第4章のStataコード
第4章 回帰分析の基礎
4.1 回帰分析の考え方
4.1.4 数値例:気温と電力使用量
"temperature_aug.csv", clear
import delimited
twoway (scatter temp time), ytitle(気温) xtitle(時刻)
4.1.6 ノンパラメトリック回帰の実行
summarize temp, detail
temp
-------------------------------------------------------------
Percentiles Smallest
1% 20.1 19.8
5% 21.4 19.8
10% 22.3 19.9 Obs 744
25% 25.65 19.9 Sum of wgt. 744
50% 28 Mean 27.66841
Largest Std. dev. 3.545453
75% 30.15 34.7
90% 32.3 35 Variance 12.57024
95% 33.2 35.2 Skewness -.2632042
99% 34.6 35.5 Kurtosis 2.405295
tabulate time, summarize(temp)
| Summary of temp
time | Mean Std. dev. Freq.
------------+------------------------------------
0 | 26.36129 2.7476872 31
1 | 26.096774 2.7306269 31
2 | 25.896774 2.6930139 31
3 | 25.835484 2.6072396 31
4 | 25.725806 2.6659542 31
5 | 25.832258 2.7607229 31
6 | 26.425806 3.0999965 31
7 | 27.258064 3.4144569 31
8 | 28.135484 3.5356798 31
9 | 29.038709 3.8001472 31
10 | 29.422581 3.8813838 31
11 | 29.622581 3.8733883 31
12 | 29.874194 4.0120625 31
13 | 30.03871 3.9090218 31
14 | 29.706452 3.4909345 31
15 | 29.309677 3.4641358 31
16 | 28.880645 3.3710949 31
17 | 28.290323 3.4171483 31
18 | 27.716129 3.3325764 31
19 | 27.454839 3.160679 31
20 | 27.187097 3.1749796 31
21 | 26.93871 3.0445227 31
22 | 26.629032 2.9762608 31
23 | 26.364516 2.8868147 31
------------+------------------------------------
Total | 27.668414 3.5454533 744
bysort time: egen mean_temp = mean(temp)
line mean_temp time, ytitle(気温) xtitle(時刻)
4.1.7 グラフを重ねる
twoway (scatter temp time) || (line mean_temp time), ytitle(気温) xtitle(時刻) legend(off)
4.2 単回帰分析
4.2.1 ノンパラメトリック回帰の限界
cor temp elec
(obs=744)
| temp elec
-------------+------------------
temp | 1.0000
elec | 0.7198 1.0000
bysort temp: egen mean_elec = mean(elec)
line mean_elec temp, ytitle(電気使用量(万kw)) xtitle(気温)
4.2.4 Rによる線形回帰分析
regress elec temp
Source | SS df MS Number of obs = 744
-------------+---------------------------------- F(1, 742) = 797.86
Model | 196449050 1 196449050 Prob > F = 0.0000
Residual | 182695164 742 246219.898 R-squared = 0.5181
-------------+---------------------------------- Adj R-squared = 0.5175
Total | 379144214 743 510288.309 Root MSE = 496.21
------------------------------------------------------------------------------
elec | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
temp | 145.0303 5.134467 28.25 0.000 134.9505 155.1101
_cons | -614.2722 143.2226 -4.29 0.000 -895.442 -333.1024
------------------------------------------------------------------------------
4.2.6 単回帰の図示
quietly regress elec temp
quietly regress elec temp
generate predicted = e(b)[1, 2] + e(b)[1, 1] * temp
twoway (scatter elec temp) || (line predicted temp), ytitle(電気使用量(万kw)) xtitle(気温) legend(off)
twoway (scatter elec temp) || (lfit elec temp), ytitle(電気使用量(万kw)) xtitle(気温) legend(off)
4.3 重回帰分析
4.3.1 説明変数を追加する
twoway (scatter elec time), ytitle(電気使用量(万kw)) xtitle(時刻) legend(off)
generate daytime = inrange(time, 9, 18)
generate elec100 = elec / 100
generate time12 = mod(time, 12)
generate ampm = 1 if time < 12 & !missing(time)
replace ampm = 2 if time >= 12 & !missing(time)
label define label_ampm 1 "a.m." 2 "p.m."
label values ampm label_ampm
list in 1/10, separator(0)
+------------------------------------------------------------------------------------------------------------+
| date time elec prec temp mean_t~p mean_e~c predic~d daytime elec100 time12 ampm |
|------------------------------------------------------------------------------------------------------------|
1. | 2014/8/28 2 2405 1.5 19.8 25.89677 2435 2257.328 0 24.05 2 a.m. |
2. | 2014/8/28 1 2465 1 19.8 26.09677 2435 2257.328 0 24.65 1 a.m. |
3. | 2014/8/28 6 2584 .5 19.9 26.42581 2506 2271.831 0 25.84 6 a.m. |
4. | 2014/8/28 4 2428 0 19.9 25.72581 2506 2271.831 0 24.28 4 a.m. |
5. | 2014/8/27 23 2818 1 20 26.36452 2818 2286.334 0 28.18 11 p.m. |
6. | 2014/8/28 5 2463 0 20.1 25.83226 2495 2300.837 0 24.63 5 a.m. |
7. | 2014/8/30 6 2420 1.5 20.1 26.42581 2495 2300.837 0 24.2 6 a.m. |
8. | 2014/8/28 0 2602 .5 20.1 26.36129 2495 2300.837 0 26.02 0 a.m. |
9. | 2014/8/27 22 2986 1 20.2 26.62903 2724.6 2315.34 0 29.86 10 p.m. |
10. | 2014/8/30 5 2406 2.5 20.2 25.83226 2724.6 2315.34 0 24.06 5 a.m. |
+------------------------------------------------------------------------------------------------------------+
4.3.2 重回帰分析
regress elec temp daytime
Source | SS df MS Number of obs = 744
-------------+---------------------------------- F(2, 741) = 675.82
Model | 244889502 2 122444751 Prob > F = 0.0000
Residual | 134254712 741 181180.448 R-squared = 0.6459
-------------+---------------------------------- Adj R-squared = 0.6449
Total | 379144214 743 510288.309 Root MSE = 425.65
------------------------------------------------------------------------------
elec | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
temp | 116.9782 4.726761 24.75 0.000 107.6988 126.2577
daytime | 555.4424 33.96961 16.35 0.000 488.7543 622.1306
_cons | -69.55031 127.2952 -0.55 0.585 -319.4525 180.3518
------------------------------------------------------------------------------
regress elec temp daytime prec
Source | SS df MS Number of obs = 744
-------------+---------------------------------- F(3, 740) = 449.97
Model | 244896760 3 81632253.3 Prob > F = 0.0000
Residual | 134247454 740 181415.478 R-squared = 0.6459
-------------+---------------------------------- Adj R-squared = 0.6445
Total | 379144214 743 510288.309 Root MSE = 425.93
------------------------------------------------------------------------------
elec | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
temp | 116.8024 4.810879 24.28 0.000 107.3578 126.247
daytime | 556.1478 34.17412 16.27 0.000 489.0581 623.2376
prec | -3.366072 16.82885 -0.20 0.842 -36.40405 29.67191
_cons | -64.50256 129.8536 -0.50 0.620 -319.4279 190.4228
------------------------------------------------------------------------------
generate sunday = (date == "2014/8/3")
replace sunday = (date == "2014/8/10")
replace sunday = (date == "2014/8/17")
replace sunday = (date == "2014/8/24")
replace sunday = (date == "2014/8/31")
list in 1/10, separator(0)
+---------------------------------------------------------------------------------------------------------------------+
| date time elec prec temp mean_t~p mean_e~c predic~d daytime elec100 time12 ampm sunday |
|---------------------------------------------------------------------------------------------------------------------|
1. | 2014/8/28 2 2405 1.5 19.8 25.89677 2435 2257.328 0 24.05 2 a.m. 0 |
2. | 2014/8/28 1 2465 1 19.8 26.09677 2435 2257.328 0 24.65 1 a.m. 0 |
3. | 2014/8/28 4 2428 0 19.9 25.72581 2506 2271.831 0 24.28 4 a.m. 0 |
4. | 2014/8/28 6 2584 .5 19.9 26.42581 2506 2271.831 0 25.84 6 a.m. 0 |
5. | 2014/8/27 23 2818 1 20 26.36452 2818 2286.334 0 28.18 11 p.m. 0 |
6. | 2014/8/28 0 2602 .5 20.1 26.36129 2495 2300.837 0 26.02 0 a.m. 0 |
7. | 2014/8/30 6 2420 1.5 20.1 26.42581 2495 2300.837 0 24.2 6 a.m. 0 |
8. | 2014/8/28 5 2463 0 20.1 25.83226 2495 2300.837 0 24.63 5 a.m. 0 |
9. | 2014/8/27 22 2986 1 20.2 26.62903 2724.6 2315.34 0 29.86 10 p.m. 0 |
10. | 2014/8/27 21 3026 1 20.2 26.93871 2724.6 2315.34 0 30.26 9 p.m. 0 |
+---------------------------------------------------------------------------------------------------------------------+
drop sunday
generate date2 = date(date, "YMD")
drop date
rename date2 date
generate dow = dow(date)
label define label_dow 0 "日" 1 "月" 2 "火" 3 "水" 4 "木" 5 "金" 6 "土"
label values dow label_dow
generate sunday = (dow == 0)
list in 1/10, separator(0)
+-----------------------------------------------------------------------------------------------------------------------+
| time elec prec temp mean_t~p mean_e~c predic~d daytime elec100 time12 ampm date dow sunday |
|-----------------------------------------------------------------------------------------------------------------------|
1. | 2 2405 1.5 19.8 25.89677 2435 2257.328 0 24.05 2 a.m. 19963 木 0 |
2. | 1 2465 1 19.8 26.09677 2435 2257.328 0 24.65 1 a.m. 19963 木 0 |
3. | 4 2428 0 19.9 25.72581 2506 2271.831 0 24.28 4 a.m. 19963 木 0 |
4. | 6 2584 .5 19.9 26.42581 2506 2271.831 0 25.84 6 a.m. 19963 木 0 |
5. | 23 2818 1 20 26.36452 2818 2286.334 0 28.18 11 p.m. 19962 水 0 |
6. | 0 2602 .5 20.1 26.36129 2495 2300.837 0 26.02 0 a.m. 19963 木 0 |
7. | 6 2420 1.5 20.1 26.42581 2495 2300.837 0 24.2 6 a.m. 19965 土 0 |
8. | 5 2463 0 20.1 25.83226 2495 2300.837 0 24.63 5 a.m. 19963 木 0 |
9. | 22 2986 1 20.2 26.62903 2724.6 2315.34 0 29.86 10 p.m. 19962 水 0 |
10. | 21 3026 1 20.2 26.93871 2724.6 2315.34 0 30.26 9 p.m. 19962 水 0 |
+-----------------------------------------------------------------------------------------------------------------------+
generate recess = inrange(date, td("11aug2014"), td("16aug2014"))
regress elec temp daytime prec sunday recess
Source | SS df MS Number of obs = 744
-------------+---------------------------------- F(5, 738) = 405.47
Model | 277960791 5 55592158.2 Prob > F = 0.0000
Residual | 101183423 738 137104.909 R-squared = 0.7331
-------------+---------------------------------- Adj R-squared = 0.7313
Total | 379144214 743 510288.309 Root MSE = 370.28
------------------------------------------------------------------------------
elec | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
temp | 113.4773 4.192549 27.07 0.000 105.2466 121.7081
daytime | 563.5284 29.71622 18.96 0.000 505.1899 621.8668
prec | 14.275 14.70092 0.97 0.332 -14.5856 43.1356
sunday | -448.392 38.12452 -11.76 0.000 -523.2375 -373.5466
recess | -438.2301 35.19863 -12.45 0.000 -507.3315 -369.1287
_cons | 179.0707 114.0884 1.57 0.117 -44.90582 403.0471
------------------------------------------------------------------------------
display (179.0707 + 113.4773 * 28 + 563.5284 * 1 + 14.275 * 0 - 448.392 * 0 - 438.2301 * 0)
3919.9635
quietly regress elec temp daytime prec sunday recess
matrix yhat = e(b) * [28, 1, 0, 0, 0, 1]'
display yhat[1, 1]
3919.9643
quietly regress elec temp daytime prec sunday recess
adjust temp = 28 daytime = 1 prec = 0 sunday = 0 recess = 0
------------------------------------------------------------------------------------------------------
Dependent variable: elec Command: regress
Covariates set to value: temp = 28, daytime = 1, prec = 0, sunday = 0, recess = 0
------------------------------------------------------------------------------------------------------
----------------------
All | xb
----------+-----------
| 3919.96
----------------------
Key: xb = Linear Prediction
4.4 決定係数と回帰分析
4.4.2 決定係数の出力
regress elec temp daytime prec sunday recess
Source | SS df MS Number of obs = 744
-------------+---------------------------------- F(5, 738) = 405.47
Model | 277960791 5 55592158.2 Prob > F = 0.0000
Residual | 101183423 738 137104.909 R-squared = 0.7331
-------------+---------------------------------- Adj R-squared = 0.7313
Total | 379144214 743 510288.309 Root MSE = 370.28
------------------------------------------------------------------------------
elec | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
temp | 113.4773 4.192549 27.07 0.000 105.2466 121.7081
daytime | 563.5284 29.71622 18.96 0.000 505.1899 621.8668
prec | 14.275 14.70092 0.97 0.332 -14.5856 43.1356
sunday | -448.392 38.12452 -11.76 0.000 -523.2375 -373.5466
recess | -438.2301 35.19863 -12.45 0.000 -507.3315 -369.1287
_cons | 179.0707 114.0884 1.57 0.117 -44.90582 403.0471
------------------------------------------------------------------------------
display e(r2)
.73312682
regress elec temp
Source | SS df MS Number of obs = 744
-------------+---------------------------------- F(1, 742) = 797.86
Model | 196449050 1 196449050 Prob > F = 0.0000
Residual | 182695164 742 246219.898 R-squared = 0.5181
-------------+---------------------------------- Adj R-squared = 0.5175
Total | 379144214 743 510288.309 Root MSE = 496.21
------------------------------------------------------------------------------
elec | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
temp | 145.0303 5.134467 28.25 0.000 134.9505 155.1101
_cons | -614.2722 143.2226 -4.29 0.000 -895.442 -333.1024
------------------------------------------------------------------------------
display e(r2)
.51813807
4.4.3 決定係数は「モデルの正しさ」を保証しない
clear
set seed 2022
set obs 15
generate a = ceil(runiform(0, 6))
generate b = ceil(runiform(0, 6))
generate c = ceil(runiform(0, 6))
generate d = ceil(runiform(0, 6))
generate e = ceil(runiform(0, 6))
generate f = ceil(runiform(0, 6))
generate g = ceil(runiform(0, 6))
regress a b c d e f g
Source | SS df MS Number of obs = 15
-------------+---------------------------------- F(6, 8) = 0.70
Model | 13.6670698 6 2.27784497 Prob > F = 0.6592
Residual | 26.0662635 8 3.25828294 R-squared = 0.3440
-------------+---------------------------------- Adj R-squared = -0.1481
Total | 39.7333333 14 2.83809524 Root MSE = 1.8051
------------------------------------------------------------------------------
a | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
b | -.4272472 .3836928 -1.11 0.298 -1.312044 .4575499
c | .1912033 .5685463 0.34 0.745 -1.119867 1.502274
d | .1235303 .344219 0.36 0.729 -.6702402 .9173007
e | -.3732129 .3376293 -1.11 0.301 -1.151788 .4053617
f | .6556865 .4250799 1.54 0.162 -.3245496 1.635923
g | -.1543043 .5476053 -0.28 0.785 -1.417084 1.108476
_cons | 2.940337 4.507969 0.65 0.533 -7.455059 13.33573
------------------------------------------------------------------------------
generate h = ceil(runiform(0, 6))
regress a b c d e f g h
Source | SS df MS Number of obs = 15
-------------+---------------------------------- F(7, 7) = 0.70
Model | 16.3734493 7 2.33906419 Prob > F = 0.6746
Residual | 23.359884 7 3.33712629 R-squared = 0.4121
-------------+---------------------------------- Adj R-squared = -0.1758
Total | 39.7333333 14 2.83809524 Root MSE = 1.8268
------------------------------------------------------------------------------
a | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
b | -.3092509 .4098178 -0.75 0.475 -1.278316 .6598142
c | .1101777 .5823762 0.19 0.855 -1.266923 1.487278
d | .0266849 .3645802 0.07 0.944 -.8354103 .8887801
e | -.524795 .3808991 -1.38 0.211 -1.425478 .3758882
f | .4013125 .5146376 0.78 0.461 -.815612 1.618237
g | -.0357412 .569615 -0.06 0.952 -1.382667 1.311184
h | .3935343 .4369931 0.90 0.398 -.6397902 1.426859
_cons | 2.76026 4.566565 0.60 0.565 -8.037949 13.55847
------------------------------------------------------------------------------
補足
"temperature_aug.csv", clear
import delimited quietly regress elec temp
predict res, residuals
quietly summarize res
display r(sum)
-.00039506
generate tempXres = temp * res
quietly summarize tempXres
display r(sum)
-.00851059