Stata Code for Chapter 3

Chapter 3: Foundations of Probability Theory

Sample data

wage.csv: wage data for male workers.

3.1 Events and Probability

3.1.3 Coin-Toss Simulation

clear
display exp(lnfactorial(100)) / (exp(lnfactorial(50))^2) / (2^100)
.07958924
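* The same probability, P(50 heads in 100 fair tosses), can be cross-checked
* with Stata's built-in functions comb() and binomialp(); both lines should
* reproduce the value above.
display comb(100, 50) / 2^100
display binomialp(100, 50, 0.5)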
clear
set seed 2022
set obs 100

generate Z = round(runiform(0, 1))
quietly summarize Z
display r(sum)
43
* Repeat the 100-toss experiment 100000 times; each head count is posted to sums100.dta
postfile sim sums using sums100, replace

forvalues i = 1/100000 {
    quietly capture drop x
    quietly set obs 100
    quietly generate x = round(runiform(0, 1))
    quietly summarize x
    post sim (r(sum))
}

postclose sim

use sums100, clear
histogram sums, bin(15)

summarize sums
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        sums |    100,000    49.97471    5.000654         26         72
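* For comparison, a sketch of the theoretical benchmark, assuming the number of
* heads in 100 fair tosses is Binomial(100, 0.5): the mean is np and the
* standard deviation is sqrt(np(1-p)). These evaluate to 50 and 5, close to
* the simulated 49.97471 and 5.000654.
display 100 * 0.5
display sqrt(100 * 0.5 * (1 - 0.5))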

3.2 Random Variables and Independence

3.2.4 Independence and the Correlation Coefficient

clear
set seed 2022
set obs 100000

generate X = rnormal(50, 10)
generate Y = rnormal(50, 10)
cor X Y
(obs=100,000)

             |        X        Y
-------------+------------------
           X |   1.0000
           Y |   0.0012   1.0000
generate Z = -((X - 50)^2) / 10
cor X Z
(obs=100,000)

             |        X        Z
-------------+------------------
           X |   1.0000
           Z |   0.0122   1.0000
twoway (scatter Z X)
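* A minimal check, assuming X and Z from the commands above are still in memory:
* Z is an exact function of X, so the two are dependent even though their
* correlation is close to zero. W is a hypothetical helper variable; its
* correlation with Z should be -1 up to rounding.
generate W = (X - 50)^2
correlate Z W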

3.3 Expectation and Conditional Expectation

3.3.1 Distribution Functions

clear
set obs 1001

generate x = (_n - 1) / 10
generate z = (x - 50) / 10
generate y = normal(z)

line y x

display normal(1) - normal(-1)
.68268949

3.3.2 Probability Density Functions

display normal(0.01) - normal(0)
.00398936
display normalden(50, 50, 10)
.03989423
display normalden(80, 50, 10)
.00044318
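* A small numerical check of how normal() and normalden() are linked: the
* density is the derivative of the distribution function, so a difference
* quotient of normal() over a short interval should be close to normalden()
* at that point. The step width 0.01 mirrors the first command in this
* subsection; dividing by 10 assumes the N(50, 10^2) scale used throughout,
* so the last line should agree with normalden(50, 50, 10).
display (normal(0.01) - normal(0)) / 0.01
display normalden(0)
display normalden(0) / 10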
clear
set obs 1001

generate x = (_n - 1) / 10
generate y = normalden(x, 50, 10)

line y x

3.3.7 Estimating Conditional Expectations from Data

import delimited "wage.csv", clear

list in 1/10, separator(0)
     +---------------------+
     | educ   exper   wage |
     |---------------------|
  1. |    7      16    548 |
  2. |   12       9    481 |
  3. |   12      16    721 |
  4. |   11      10    250 |
  5. |   12      16    729 |
  6. |   12       8    500 |
  7. |   18       9    565 |
  8. |   14       9    608 |
  9. |   12      10    425 |
 10. |   12      11    515 |
     +---------------------+
summarize
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        educ |      3,010    13.26346    2.676913          1         18
       exper |      3,010    8.856146    4.141672          0         23
        wage |      3,010    577.2824    262.9583        100       2404
graph matrix educ exper wage, msize(1pt)

quietly summarize wage if educ == 12
display r(mean)
563.53427
quietly summarize wage if educ == 16
display r(mean)
642.89325
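* The same conditional means can be tabulated for every education level at
* once. A sketch, assuming the wage.csv data are still in memory; the rows for
* educ == 12 and educ == 16 should match the two means above.
tabstat wage, by(educ) statistics(mean n)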
clear
set seed 2022

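* Draw 10000 samples of size 10000 from N(50, 10^2); the standardized sample mean of each is posted to muhat10000.dta
postfile sim Zn using muhat10000, replace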

forvalues i = 1/10000 {
    quietly capture drop X
    quietly set obs 10000
    quietly generate X = rnormal(50, 10)
    quietly summarize X
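    * sqrt(_N) = 100 scales the statistic to unit variance; with _n (which
    * evaluates to 1 here) Zn would be 100 times smaller, as in the summary
    * listed after the loop.
    post sim (sqrt(_N) * (r(mean) - 50) / sqrt(r(Var)))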
}

postclose sim

use muhat10000, clear
histogram Zn, bin(15)

summarize Zn
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          Zn |     10,000   -.0000216    .0099583  -.0346809   .0348891

3.4 Central Limit Theorem

3.4.3 Confidence Interval Simulation

clear
set seed 2022

* rec = 1 when the 95% interval of a sample contains the true mean 50, and 0 otherwise
postfile sim rec using ci10000, replace

forvalues i = 1/10000 {
    quietly capture drop X
    quietly set obs 10000
    quietly generate X = rnormal(50, 10)
    quietly summarize X
    post sim ((r(mean) - 1.96 * sqrt(r(Var) / _N) < 50) * (50 < r(mean) + 1.96 * sqrt(r(Var) / _N)))
}

postclose sim

use ci10000, clear
quietly summarize rec
display r(mean)
.9494
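* A rough gauge of simulation noise, assuming the 10000 replications are
* independent with true coverage 0.95: the Monte Carlo standard error of the
* coverage rate is sqrt(0.95 * 0.05 / 10000), about 0.0022, so .9494 is
* well within sampling error of 0.95.
display sqrt(0.95 * 0.05 / 10000)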

3.4.4 Deriving the Confidence Interval

display normal((69.6 - 50) / 10) - normal((30.4 - 50) / 10)
.95000421
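* Where the bounds 30.4 and 69.6 come from: a sketch assuming X ~ N(50, 10^2).
* The 97.5th percentile of the standard normal is about 1.96, so the central
* 95% interval is 50 minus and plus 1.96 * 10.
display invnormal(0.975)
display 50 - invnormal(0.975) * 10
display 50 + invnormal(0.975) * 10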