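In a social survey of how residents of a town commute to and from work, the dependent variable y = 1 means the resident mainly takes the bus and y = 0 means the resident mainly rides a bicycle. The independent variables are x1, the respondent's age; x2, the respondent's monthly income; and x3, the respondent's gender (x3 = 1 for male, x3 = 0 for female).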
| Transport (y) | Age | Monthly income | Gender | Transport (y) | Age | Monthly income | Gender |
|---|---|---|---|---|---|---|---|
| 0 | 18 | 850 | 0 | 0 | 20 | 1000 | 1 |
| 0 | 21 | 1200 | 0 | 0 | 25 | 1200 | 1 |
| 1 | 23 | 850 | 0 | 0 | 27 | 1300 | 1 |
| 1 | 23 | 950 | 0 | 0 | 28 | 1500 | 1 |
| 1 | 28 | 1200 | 0 | 1 | 30 | 950 | 1 |
| 0 | 31 | 850 | 0 | 0 | 32 | 1000 | 1 |
| 1 | 36 | 1500 | 0 | 0 | 33 | 1800 | 1 |
| 1 | 42 | 1000 | 0 | 0 | 33 | 1000 | 1 |
| 1 | 46 | 950 | 0 | 0 | 38 | 1200 | 1 |
| 0 | 48 | 1200 | 0 | 0 | 41 | 1500 | 1 |
| 1 | 55 | 1800 | 0 | 1 | 45 | 1800 | 1 |
| 1 | 56 | 2100 | 0 | 0 | 48 | 1000 | 1 |
| 1 | 58 | 1800 | 0 | 1 | 52 | 1500 | 1 |
| 0 | 18 | 850 | 1 | 1 | 56 | 1800 | 1 |
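Estimate the parameters of the logit model

$$P(y=1 \mid x_1,x_2,x_3)=\frac{1}{1+\exp\left(-(\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3)\right)}$$

by maximum likelihood: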
//y: 1 = bus, 0 = bicycle. x: one row per regressor (age, monthly income, gender), transposed by ` to 28 rows x 3 columns.
y:=array(0.00,0.00,1.00,1.00,1.00,0.00,1.00,1.00,1.00,0.00,1.00,1.00,1.00,0.00,0.00,0.00,0.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,1.00,0.00,1.00,1.00);
x:=`array((18,21,23,23,28,31,36,42,46,48,55,56,58,18,20,25,27,28,30,32,33,33,38,41,45,48,52,56),(850,1200,850,950,1200,850,1500,1000,950,1200,1800,2100,1800,850,1000,1200,1300,1500,950,1000,1800,1000,1200,1500,1800,1000,1500,1800),(0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1));
constant:=true;
alpha:=0.05;
ret:=Regress_Binary(y,x,"logit",3,constant,alpha);
Result:
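The fit can be cross-checked outside of `Regress_Binary`. The following is a minimal Python sketch, not the library's implementation: it assumes the logit model with a constant term, maximizes the log-likelihood by plain Newton-Raphson iterations, and uses only the standard library. The helper names `sigmoid`, `solve`, `gradient`, and `fit_logit` are illustrative and do not come from the original example.

```python
import math

# Data copied from the example: y (1 = bus, 0 = bicycle), age, monthly income, gender.
y = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0,
     0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1]
age = [18, 21, 23, 23, 28, 31, 36, 42, 46, 48, 55, 56, 58, 18,
       20, 25, 27, 28, 30, 32, 33, 33, 38, 41, 45, 48, 52, 56]
income = [850, 1200, 850, 950, 1200, 850, 1500, 1000, 950, 1200, 1800, 2100, 1800, 850,
          1000, 1200, 1300, 1500, 950, 1000, 1800, 1000, 1200, 1500, 1800, 1000, 1500, 1800]
gender = [0] * 13 + [1] * 15
# Design matrix; the leading 1.0 in each row is the constant term.
X = [[1.0, a, m, g] for a, m, g in zip(age, income, gender)]


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def gradient(beta, X, y):
    # Score vector of the log-likelihood: sum of (y_i - p_i) * x_i.
    g = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        for j, v in enumerate(xi):
            g[j] += (yi - p) * v
    return g


def fit_logit(X, y, iterations=50, tol=1e-10):
    # Newton-Raphson: beta <- beta + H^{-1} g, starting from zero.
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        g = gradient(beta, X, y)
        # Information matrix: sum of p_i * (1 - p_i) * x_i x_i^T.
        H = [[0.0] * k for _ in range(k)]
        for xi in X:
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for r in range(k):
                for c in range(k):
                    H[r][c] += w * xi[r] * xi[c]
        step = solve(H, g)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


beta = fit_logit(X, y)
print(beta)  # order: constant, age, monthly income, gender
```

At the maximum of the log-likelihood the score vector is zero, so every component of `gradient(beta, X, y)` should be close to zero. The printed estimates can then be compared with the `Coefficient` entry returned by `Regress_Binary`.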
Next, compute the in-sample misclassification rate:
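//linear predictor: a column of ones (the constant) joined to x, times the coefficients
a:=(ones(length(x))|x):*`ret["Coefficient"];
//logistic transform to fitted probabilities
a::=1/(1+exp(-mcell));
a:=a[:,0];
//predicted class: 1 if the fitted probability is at least 0.5
a::=mcell>=0.5;
//to inspect actual and predicted values side by side:
//return y|a;
b:=y-a;
//absolute difference: 1 = misclassified, 0 = correctly classified
b::=abs(mcell);
//misclassification rate = share of misclassified observations
return sum(b)/length(b);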
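Result:
With the threshold set to 0.5, the misclassification rate is 0.17857, that is, 5 of the 28 observations are misclassified.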
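The thresholding step can also be written as a small standalone function. This is a Python sketch under the same rule as above (predict 1 when the fitted probability is at least the threshold). The function name `misclassification_rate` and the demo inputs are made up for illustration; the demo values are not the fitted probabilities of this example.

```python
def misclassification_rate(y_true, probabilities, threshold=0.5):
    # Predict 1 when the fitted probability reaches the threshold, else 0.
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    # Each wrong prediction contributes |y - prediction| = 1.
    errors = sum(abs(t - q) for t, q in zip(y_true, predicted))
    return errors / len(y_true)


# Made-up inputs for illustration only.
y_demo = [0, 1, 1, 0]
p_demo = [0.2, 0.7, 0.4, 0.9]
print(misclassification_rate(y_demo, p_demo))  # 2 of 4 wrong -> 0.5
```

Raising or lowering `threshold` trades one kind of error for the other, so the rate reported above is specific to the 0.5 cutoff.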