第4章のStataコード

第4章 回帰分析の基礎

サンプルデータ

temperature_aug.csv:2014年8月の東京都の気温データ.

wage.csv:1976年米国における男性労働者の賃金に関するデータ.

4.1 回帰分析の考え方

4.1.4 数値例:気温と電力使用量

import delimited "temperature_aug.csv", clear

twoway (scatter temp time), ytitle(気温) xtitle(時刻)

4.1.6 ノンパラメトリック回帰の実行

summarize temp, detail
                            temp
-------------------------------------------------------------
      Percentiles      Smallest
 1%         20.1           19.8
 5%         21.4           19.8
10%         22.3           19.9       Obs                 744
25%        25.65           19.9       Sum of wgt.         744

50%           28                      Mean           27.66841
                        Largest       Std. dev.      3.545453
75%        30.15           34.7
90%         32.3             35       Variance       12.57024
95%         33.2           35.2       Skewness      -.2632042
99%         34.6           35.5       Kurtosis       2.405295
tabulate time, summarize(temp)
            |           Summary of temp
       time |        Mean   Std. dev.       Freq.
------------+------------------------------------
          0 |    26.36129   2.7476872          31
          1 |   26.096774   2.7306269          31
          2 |   25.896774   2.6930139          31
          3 |   25.835484   2.6072396          31
          4 |   25.725806   2.6659542          31
          5 |   25.832258   2.7607229          31
          6 |   26.425806   3.0999965          31
          7 |   27.258064   3.4144569          31
          8 |   28.135484   3.5356798          31
          9 |   29.038709   3.8001472          31
         10 |   29.422581   3.8813838          31
         11 |   29.622581   3.8733883          31
         12 |   29.874194   4.0120625          31
         13 |    30.03871   3.9090218          31
         14 |   29.706452   3.4909345          31
         15 |   29.309677   3.4641358          31
         16 |   28.880645   3.3710949          31
         17 |   28.290323   3.4171483          31
         18 |   27.716129   3.3325764          31
         19 |   27.454839    3.160679          31
         20 |   27.187097   3.1749796          31
         21 |    26.93871   3.0445227          31
         22 |   26.629032   2.9762608          31
         23 |   26.364516   2.8868147          31
------------+------------------------------------
      Total |   27.668414   3.5454533         744
bysort time: egen mean_temp = mean(temp)
line mean_temp time, ytitle(気温) xtitle(時刻)

4.1.7 グラフを重ねる

twoway (scatter temp time) || (line mean_temp time), ytitle(気温) xtitle(時刻) legend(off)

4.2 単回帰分析

4.2.1 ノンパラメトリック回帰の限界

cor temp elec
(obs=744)

             |     temp     elec
-------------+------------------
        temp |   1.0000
        elec |   0.7198   1.0000
bysort temp: egen mean_elec = mean(elec)
line mean_elec temp, ytitle(電気使用量(万kw)) xtitle(気温)

4.2.4 Rによる線形回帰分析

regress elec temp
      Source |       SS           df       MS      Number of obs   =       744
-------------+----------------------------------   F(1, 742)       =    797.86
       Model |   196449050         1   196449050   Prob > F        =    0.0000
    Residual |   182695164       742  246219.898   R-squared       =    0.5181
-------------+----------------------------------   Adj R-squared   =    0.5175
       Total |   379144214       743  510288.309   Root MSE        =    496.21

------------------------------------------------------------------------------
        elec | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        temp |   145.0303   5.134467    28.25   0.000     134.9505    155.1101
       _cons |  -614.2722   143.2226    -4.29   0.000     -895.442   -333.1024
------------------------------------------------------------------------------

4.2.6 単回帰の図示

quietly regress elec temp
quietly regress elec temp
generate predicted = e(b)[1, 2] + e(b)[1, 1] * temp
twoway (scatter elec temp) || (line predicted temp), ytitle(電気使用量(万kw)) xtitle(気温) legend(off)

twoway (scatter elec temp) || (lfit elec temp), ytitle(電気使用量(万kw)) xtitle(気温) legend(off)

4.3 重回帰分析

4.3.1 説明変数を追加する

twoway (scatter elec time), ytitle(電気使用量(万kw)) xtitle(時刻) legend(off)

generate daytime = inrange(time, 9, 18)
generate elec100 = elec / 100
generate time12  = mod(time, 12)
generate ampm    = 1 if time <  12 & !missing(time)
replace  ampm    = 2 if time >= 12 & !missing(time)
label define label_ampm 1 "a.m." 2 "p.m."
label values ampm label_ampm

list in 1/10, separator(0)
     +------------------------------------------------------------------------------------------------------------+
     |      date   time   elec   prec   temp   mean_t~p   mean_e~c   predic~d   daytime   elec100   time12   ampm |
     |------------------------------------------------------------------------------------------------------------|
  1. | 2014/8/28      2   2405    1.5   19.8   25.89677       2435   2257.328         0     24.05        2   a.m. |
  2. | 2014/8/28      1   2465      1   19.8   26.09677       2435   2257.328         0     24.65        1   a.m. |
  3. | 2014/8/28      6   2584     .5   19.9   26.42581       2506   2271.831         0     25.84        6   a.m. |
  4. | 2014/8/28      4   2428      0   19.9   25.72581       2506   2271.831         0     24.28        4   a.m. |
  5. | 2014/8/27     23   2818      1     20   26.36452       2818   2286.334         0     28.18       11   p.m. |
  6. | 2014/8/28      5   2463      0   20.1   25.83226       2495   2300.837         0     24.63        5   a.m. |
  7. | 2014/8/30      6   2420    1.5   20.1   26.42581       2495   2300.837         0      24.2        6   a.m. |
  8. | 2014/8/28      0   2602     .5   20.1   26.36129       2495   2300.837         0     26.02        0   a.m. |
  9. | 2014/8/27     22   2986      1   20.2   26.62903     2724.6    2315.34         0     29.86       10   p.m. |
 10. | 2014/8/30      5   2406    2.5   20.2   25.83226     2724.6    2315.34         0     24.06        5   a.m. |
     +------------------------------------------------------------------------------------------------------------+

4.3.2 重回帰分析

regress elec temp daytime
      Source |       SS           df       MS      Number of obs   =       744
-------------+----------------------------------   F(2, 741)       =    675.82
       Model |   244889502         2   122444751   Prob > F        =    0.0000
    Residual |   134254712       741  181180.448   R-squared       =    0.6459
-------------+----------------------------------   Adj R-squared   =    0.6449
       Total |   379144214       743  510288.309   Root MSE        =    425.65

------------------------------------------------------------------------------
        elec | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        temp |   116.9782   4.726761    24.75   0.000     107.6988    126.2577
     daytime |   555.4424   33.96961    16.35   0.000     488.7543    622.1306
       _cons |  -69.55031   127.2952    -0.55   0.585    -319.4525    180.3518
------------------------------------------------------------------------------
regress elec temp daytime prec
      Source |       SS           df       MS      Number of obs   =       744
-------------+----------------------------------   F(3, 740)       =    449.97
       Model |   244896760         3  81632253.3   Prob > F        =    0.0000
    Residual |   134247454       740  181415.478   R-squared       =    0.6459
-------------+----------------------------------   Adj R-squared   =    0.6445
       Total |   379144214       743  510288.309   Root MSE        =    425.93

------------------------------------------------------------------------------
        elec | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        temp |   116.8024   4.810879    24.28   0.000     107.3578     126.247
     daytime |   556.1478   34.17412    16.27   0.000     489.0581    623.2376
        prec |  -3.366072   16.82885    -0.20   0.842    -36.40405    29.67191
       _cons |  -64.50256   129.8536    -0.50   0.620    -319.4279    190.4228
------------------------------------------------------------------------------
generate sunday = (date == "2014/8/3") 
replace  sunday = (date == "2014/8/10") 
replace  sunday = (date == "2014/8/17") 
replace  sunday = (date == "2014/8/24") 
replace  sunday = (date == "2014/8/31")

list in 1/10, separator(0)
     +---------------------------------------------------------------------------------------------------------------------+
     |      date   time   elec   prec   temp   mean_t~p   mean_e~c   predic~d   daytime   elec100   time12   ampm   sunday |
     |---------------------------------------------------------------------------------------------------------------------|
  1. | 2014/8/28      2   2405    1.5   19.8   25.89677       2435   2257.328         0     24.05        2   a.m.        0 |
  2. | 2014/8/28      1   2465      1   19.8   26.09677       2435   2257.328         0     24.65        1   a.m.        0 |
  3. | 2014/8/28      4   2428      0   19.9   25.72581       2506   2271.831         0     24.28        4   a.m.        0 |
  4. | 2014/8/28      6   2584     .5   19.9   26.42581       2506   2271.831         0     25.84        6   a.m.        0 |
  5. | 2014/8/27     23   2818      1     20   26.36452       2818   2286.334         0     28.18       11   p.m.        0 |
  6. | 2014/8/28      0   2602     .5   20.1   26.36129       2495   2300.837         0     26.02        0   a.m.        0 |
  7. | 2014/8/30      6   2420    1.5   20.1   26.42581       2495   2300.837         0      24.2        6   a.m.        0 |
  8. | 2014/8/28      5   2463      0   20.1   25.83226       2495   2300.837         0     24.63        5   a.m.        0 |
  9. | 2014/8/27     22   2986      1   20.2   26.62903     2724.6    2315.34         0     29.86       10   p.m.        0 |
 10. | 2014/8/27     21   3026      1   20.2   26.93871     2724.6    2315.34         0     30.26        9   p.m.        0 |
     +---------------------------------------------------------------------------------------------------------------------+
drop sunday
generate date2 = date(date, "YMD")
drop date
rename date2 date
generate dow = dow(date)
label define label_dow 0 "日" 1 "月" 2 "火" 3 "水" 4 "木" 5 "金" 6 "土" 
label values dow label_dow
generate sunday = (dow == 0) 

list in 1/10, separator(0)
     +-----------------------------------------------------------------------------------------------------------------------+
     | time   elec   prec   temp   mean_t~p   mean_e~c   predic~d   daytime   elec100   time12   ampm    date   dow   sunday |
     |-----------------------------------------------------------------------------------------------------------------------|
  1. |    2   2405    1.5   19.8   25.89677       2435   2257.328         0     24.05        2   a.m.   19963    木        0 |
  2. |    1   2465      1   19.8   26.09677       2435   2257.328         0     24.65        1   a.m.   19963    木        0 |
  3. |    4   2428      0   19.9   25.72581       2506   2271.831         0     24.28        4   a.m.   19963    木        0 |
  4. |    6   2584     .5   19.9   26.42581       2506   2271.831         0     25.84        6   a.m.   19963    木        0 |
  5. |   23   2818      1     20   26.36452       2818   2286.334         0     28.18       11   p.m.   19962    水        0 |
  6. |    0   2602     .5   20.1   26.36129       2495   2300.837         0     26.02        0   a.m.   19963    木        0 |
  7. |    6   2420    1.5   20.1   26.42581       2495   2300.837         0      24.2        6   a.m.   19965    土        0 |
  8. |    5   2463      0   20.1   25.83226       2495   2300.837         0     24.63        5   a.m.   19963    木        0 |
  9. |   22   2986      1   20.2   26.62903     2724.6    2315.34         0     29.86       10   p.m.   19962    水        0 |
 10. |   21   3026      1   20.2   26.93871     2724.6    2315.34         0     30.26        9   p.m.   19962    水        0 |
     +-----------------------------------------------------------------------------------------------------------------------+
generate recess = inrange(date, td("11aug2014"), td("16aug2014"))


regress elec temp daytime prec sunday recess
      Source |       SS           df       MS      Number of obs   =       744
-------------+----------------------------------   F(5, 738)       =    405.47
       Model |   277960791         5  55592158.2   Prob > F        =    0.0000
    Residual |   101183423       738  137104.909   R-squared       =    0.7331
-------------+----------------------------------   Adj R-squared   =    0.7313
       Total |   379144214       743  510288.309   Root MSE        =    370.28

------------------------------------------------------------------------------
        elec | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        temp |   113.4773   4.192549    27.07   0.000     105.2466    121.7081
     daytime |   563.5284   29.71622    18.96   0.000     505.1899    621.8668
        prec |     14.275   14.70092     0.97   0.332     -14.5856     43.1356
      sunday |   -448.392   38.12452   -11.76   0.000    -523.2375   -373.5466
      recess |  -438.2301   35.19863   -12.45   0.000    -507.3315   -369.1287
       _cons |   179.0707   114.0884     1.57   0.117    -44.90582    403.0471
------------------------------------------------------------------------------
display (179.0707 + 113.4773 * 28 + 563.5284 * 1 + 14.275 * 0 - 448.392 * 0 - 438.2301 * 0)
3919.9635
quietly regress elec temp daytime prec sunday recess
matrix yhat =  e(b) * [28, 1, 0, 0, 0, 1]'
display yhat[1, 1]
3919.9643
quietly regress elec temp daytime prec sunday recess
adjust temp = 28 daytime = 1 prec = 0 sunday = 0 recess = 0
------------------------------------------------------------------------------------------------------
     Dependent variable: elec     Command: regress
Covariates set to value: temp = 28, daytime = 1, prec = 0, sunday = 0, recess = 0
------------------------------------------------------------------------------------------------------

----------------------
      All |         xb
----------+-----------
          |    3919.96
----------------------
     Key:  xb  =  Linear Prediction

4.4 決定係数と回帰分析

4.4.2 決定係数の出力

regress elec temp daytime prec sunday recess
      Source |       SS           df       MS      Number of obs   =       744
-------------+----------------------------------   F(5, 738)       =    405.47
       Model |   277960791         5  55592158.2   Prob > F        =    0.0000
    Residual |   101183423       738  137104.909   R-squared       =    0.7331
-------------+----------------------------------   Adj R-squared   =    0.7313
       Total |   379144214       743  510288.309   Root MSE        =    370.28

------------------------------------------------------------------------------
        elec | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        temp |   113.4773   4.192549    27.07   0.000     105.2466    121.7081
     daytime |   563.5284   29.71622    18.96   0.000     505.1899    621.8668
        prec |     14.275   14.70092     0.97   0.332     -14.5856     43.1356
      sunday |   -448.392   38.12452   -11.76   0.000    -523.2375   -373.5466
      recess |  -438.2301   35.19863   -12.45   0.000    -507.3315   -369.1287
       _cons |   179.0707   114.0884     1.57   0.117    -44.90582    403.0471
------------------------------------------------------------------------------
display e(r2)
.73312682
regress elec temp
      Source |       SS           df       MS      Number of obs   =       744
-------------+----------------------------------   F(1, 742)       =    797.86
       Model |   196449050         1   196449050   Prob > F        =    0.0000
    Residual |   182695164       742  246219.898   R-squared       =    0.5181
-------------+----------------------------------   Adj R-squared   =    0.5175
       Total |   379144214       743  510288.309   Root MSE        =    496.21

------------------------------------------------------------------------------
        elec | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        temp |   145.0303   5.134467    28.25   0.000     134.9505    155.1101
       _cons |  -614.2722   143.2226    -4.29   0.000     -895.442   -333.1024
------------------------------------------------------------------------------
display e(r2)
.51813807

4.4.3 決定係数は「モデルの正しさ」を保証しない

clear
set seed 2022
set obs 15
generate a = ceil(runiform(0, 6))
generate b = ceil(runiform(0, 6))
generate c = ceil(runiform(0, 6))
generate d = ceil(runiform(0, 6))
generate e = ceil(runiform(0, 6))
generate f = ceil(runiform(0, 6))
generate g = ceil(runiform(0, 6))

regress a b c d e f g
      Source |       SS           df       MS      Number of obs   =        15
-------------+----------------------------------   F(6, 8)         =      0.70
       Model |  13.6670698         6  2.27784497   Prob > F        =    0.6592
    Residual |  26.0662635         8  3.25828294   R-squared       =    0.3440
-------------+----------------------------------   Adj R-squared   =   -0.1481
       Total |  39.7333333        14  2.83809524   Root MSE        =    1.8051

------------------------------------------------------------------------------
           a | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           b |  -.4272472   .3836928    -1.11   0.298    -1.312044    .4575499
           c |   .1912033   .5685463     0.34   0.745    -1.119867    1.502274
           d |   .1235303    .344219     0.36   0.729    -.6702402    .9173007
           e |  -.3732129   .3376293    -1.11   0.301    -1.151788    .4053617
           f |   .6556865   .4250799     1.54   0.162    -.3245496    1.635923
           g |  -.1543043   .5476053    -0.28   0.785    -1.417084    1.108476
       _cons |   2.940337   4.507969     0.65   0.533    -7.455059    13.33573
------------------------------------------------------------------------------
generate h = ceil(runiform(0, 6))

regress a b c d e f g h
      Source |       SS           df       MS      Number of obs   =        15
-------------+----------------------------------   F(7, 7)         =      0.70
       Model |  16.3734493         7  2.33906419   Prob > F        =    0.6746
    Residual |   23.359884         7  3.33712629   R-squared       =    0.4121
-------------+----------------------------------   Adj R-squared   =   -0.1758
       Total |  39.7333333        14  2.83809524   Root MSE        =    1.8268

------------------------------------------------------------------------------
           a | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           b |  -.3092509   .4098178    -0.75   0.475    -1.278316    .6598142
           c |   .1101777   .5823762     0.19   0.855    -1.266923    1.487278
           d |   .0266849   .3645802     0.07   0.944    -.8354103    .8887801
           e |   -.524795   .3808991    -1.38   0.211    -1.425478    .3758882
           f |   .4013125   .5146376     0.78   0.461     -.815612    1.618237
           g |  -.0357412    .569615    -0.06   0.952    -1.382667    1.311184
           h |   .3935343   .4369931     0.90   0.398    -.6397902    1.426859
       _cons |    2.76026   4.566565     0.60   0.565    -8.037949    13.55847
------------------------------------------------------------------------------

補足

import delimited "temperature_aug.csv", clear
quietly regress elec temp
predict res, residuals
quietly summarize res
display r(sum)
-.00039506
generate tempXres = temp * res
quietly summarize tempXres
display r(sum)
-.00851059