Chapter 13 Regression designs
The Framingham Heart Study is one of the longest-running health studies. It has followed the original subjects, their children, and their grandchildren, looking for factors that affect cardiac health. These data include only subjects whose cholesterol was measured at the first exam.
Source: “Statistical Methods in Epidemiology” by H.A. Kahn and C.T. Sempos
For ease of use, the data analyzed here are a random sample of 500 of the 1,406 cases in the DASL data set.
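Drawing such a subset amounts to simple random sampling without replacement. The chapter does not say how its 500 cases were chosen, so the Python sketch below is only illustrative: it assumes the cases are indexed 0 to 1,405, and the helper name `draw_subsample` and the seed (2024) are hypothetical choices that make the draw repeatable.

```python
import random

N_TOTAL = 1406   # cases in the full DASL data set
N_SAMPLE = 500   # size of the subsample used in this chapter


def draw_subsample(n_total: int, n_sample: int, seed: int) -> list[int]:
    """Return sorted row indices of a simple random sample without replacement.

    Hypothetical helper: the chapter's actual sampling procedure is not stated.
    """
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    return sorted(rng.sample(range(n_total), n_sample))


rows = draw_subsample(N_TOTAL, N_SAMPLE, seed=2024)  # seed chosen arbitrarily
print(len(rows), "distinct rows, from", rows[0], "to", rows[-1])
```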
13.1 Description
For the continuous variables, we present descriptive statistics.
| Statistic | Age | Cholesterol | CIG | DBP | SBP |
|---|---|---|---|---|---|
| Mean | 52.26 | 233.98 | 8.26 | 90.40 | 148.17 |
| Std.Dev | 4.78 | 47.93 | 11.39 | 13.72 | 27.84 |
| Min | 45.00 | 121.00 | 0.00 | 52.00 | 90.00 |
| Median | 52.00 | 230.00 | 0.00 | 90.00 | 142.00 |
| Max | 62.00 | 430.00 | 50.00 | 140.00 | 290.00 |
| N.Valid | 500.00 | 500.00 | 500.00 | 500.00 | 500.00 |
| Pct.Valid | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
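Each row of the table is a standard summary. A hypothetical `describe()` helper in Python is sketched below; it uses the sample standard deviation (dividing by \(n - 1\)), which we assume is what the table reports. It runs on made-up values, not the study data, chosen to mimic the CIG column: as there, a median of 0 next to a clearly positive mean means at least half the values are zero and the distribution is right-skewed.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Return the summary rows used in the descriptive table (hypothetical helper)."""
    return {
        "Mean": statistics.mean(values),
        "Std.Dev": statistics.stdev(values),  # sample SD: divides by n - 1
        "Min": min(values),
        "Median": statistics.median(values),
        "Max": max(values),
        "N.Valid": len(values),
    }


# Hypothetical cigarettes-per-day values, NOT taken from the study:
# six non-smokers and four smokers, mimicking the shape of the CIG column.
cig = [0, 0, 0, 0, 0, 0, 10, 20, 20, 30]
summary = describe(cig)
for stat, value in summary.items():
    print(f"{stat:<8} {value:8.2f}")
```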
For the categorical variable, Sex, we present frequencies.
| Sex | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
|---|---|---|---|---|---|
| FEM | 258 | 51.60 | 51.60 | 51.60 | 51.60 |
| MALE | 242 | 48.40 | 100.00 | 48.40 | 100.00 |
| <NA> | 0 | | | 0.00 | 100.00 |
| Total | 500 | 100.00 | 100.00 | 100.00 | 100.00 |
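The percentage columns follow directly from the counts: "valid" percentages divide by the number of non-missing cases and "total" percentages by all cases, so with no missing values the two coincide. A Python sketch with a hypothetical helper, `freq_table`, reproduces the rows for Sex; the optional `n_missing` argument is our own addition to show how the columns diverge when values are missing.

```python
def freq_table(counts: dict[str, int], n_missing: int = 0) -> list[tuple]:
    """Rows of (level, freq, %valid, %valid cum., %total, %total cum.).

    Hypothetical helper; levels are reported in the order given.
    """
    n_valid = sum(counts.values())
    n_total = n_valid + n_missing
    rows, cum = [], 0
    for level, freq in counts.items():
        cum += freq  # running count for the cumulative columns
        rows.append((level, freq,
                     100 * freq / n_valid, 100 * cum / n_valid,
                     100 * freq / n_total, 100 * cum / n_total))
    return rows


table = freq_table({"FEM": 258, "MALE": 242})
for level, freq, pv, pvc, pt, ptc in table:
    print(f"{level:<5}{freq:>5}{pv:>8.2f}{pvc:>8.2f}{pt:>8.2f}{ptc:>8.2f}")
```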
13.2 Visualization
13.3 Regression analysis
Model statistics

| Statistic | Value |
|---|---|
| Multiple \(R^2\) | 0.049 |
| Multiple \(R^2_{adj}\) | 0.040 |
| \(F_{5,494}\) | 5.128 |
| Sig. (\(p\)) | < 0.001 |
| Resid. SD | 46.972 |
| Resid. df | 494 |
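These statistics are linked by standard formulas. As a check, the Python sketch below recomputes them from two residual sums of squares reported in the sequential model table: that of the intercept-only model, which equals the total sum of squares, and that of the full model. The \(p\)-value is left out because the \(F\) distribution is not in Python's standard library.

```python
import math

n, p = 500, 5            # observations; slope terms (Age, SexMALE, SBP, DBP, CIG)
tss = 1_146_509.712      # residual SS of Cholesterol ~ 1 (= total SS)
rss = 1_089_935.082      # residual SS of the full model

r2 = 1 - rss / tss                                   # multiple R^2
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)        # penalized for p slopes
f_stat = ((tss - rss) / p) / (rss / (n - p - 1))     # F on (p, n - p - 1) df
resid_sd = math.sqrt(rss / (n - p - 1))              # residual standard deviation

print(f"R2={r2:.3f}  adj R2={r2_adj:.3f}  F={f_stat:.3f}  resid SD={resid_sd:.3f}")
```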
Regression coefficients

| Term | Coef. | Std. Err. | \(t\) stat. | Sig. (\(p\)) | VIF |
|---|---|---|---|---|---|
| (Intercept) | 175.671 | 28.784 | 6.103 | < 0.001 | |
| Age | 0.668 | 0.465 | 1.434 | 0.152 | 1.120 |
| SexMALE | −18.924 | 4.575 | −4.136 | < 0.001 | 1.185 |
| SBP | −0.037 | 0.120 | −0.307 | 0.759 | 2.523 |
| DBP | 0.413 | 0.241 | 1.711 | 0.088 | 2.481 |
| CIG | 0.082 | 0.202 | 0.407 | 0.684 | 1.196 |
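Each \(t\) statistic is the coefficient divided by its standard error, and each variance inflation factor satisfies \(\mathrm{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) is the \(R^2\) from regressing predictor \(j\) on the other predictors. The Python sketch recovers both from the rounded table values, so its \(t\) statistics can differ from the table in the third decimal. It shows that SBP and DBP, with VIFs near 2.5, each have about 60% of their variance explained by the other predictors, which inflates their standard errors.

```python
# term: (estimate, std. error, VIF), copied from the coefficient table
coefs = {
    "Age": (0.668, 0.465, 1.120),
    "SexMALE": (-18.924, 4.575, 1.185),
    "SBP": (-0.037, 0.120, 2.523),
    "DBP": (0.413, 0.241, 2.481),
    "CIG": (0.082, 0.202, 1.196),
}

results = {}
for term, (est, se, vif) in coefs.items():
    t = est / se            # t statistic for H0: coefficient = 0
    r2_j = 1 - 1 / vif      # R^2 of this predictor regressed on the others
    results[term] = (t, r2_j)
    print(f"{term:<8} t={t:7.3f}  R2_j={r2_j:.3f}")
```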
13.3.1 Model checking
13.3.2 Sequential model testing
| Model | Resid. SS | Resid. df | Add. SS | Add. df | \(F\) | Sig. (\(p\)) | AIC |
|---|---|---|---|---|---|---|---|
| Cholesterol ~ 1 | 1,146,509.712 | 499 | | | | | 5,291.751 |
| Cholesterol ~ Age | 1,144,509.033 | 498 | 2,000.679 | 1 | 0.907 | 0.341 | 5,292.878 |
| Cholesterol ~ Age + Sex | 1,102,051.249 | 497 | 42,457.783 | 1 | 19.243 | < 0.001 | 5,275.976 |
| Cholesterol ~ Age + Sex + SBP + DBP | 1,090,300.348 | 495 | 11,750.901 | 2 | 2.663 | 0.071 | 5,274.616 |
| Cholesterol ~ Age + Sex + SBP + DBP + CIG | 1,089,935.082 | 494 | 365.266 | 1 | 0.166 | 0.684 | 5,276.449 |
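The \(F\) and AIC columns can be reproduced from the residual sums of squares. The \(F\) values match when every comparison divides by the residual mean square of the largest model (1,089,935.082 / 494), which is also the default of R's `anova()` for a sequence of nested linear fits. The AIC column matches \(n[\log(2\pi) + \log(\mathrm{RSS}/n) + 1] + 2(k + 1)\) for a model with \(k\) coefficients, counting the error standard deviation as one extra parameter. The Python sketch below is a check under those assumptions, not the code that produced the table.

```python
import math

N = 500  # observations

# (model, residual SS, residual df), copied from the sequential table
MODELS = [
    ("Cholesterol ~ 1", 1_146_509.712, 499),
    ("+ Age", 1_144_509.033, 498),
    ("+ Sex", 1_102_051.249, 497),
    ("+ SBP + DBP", 1_090_300.348, 495),
    ("+ CIG", 1_089_935.082, 494),
]


def gaussian_aic(rss: float, n: int, k: int) -> float:
    """AIC of a Gaussian linear model with k coefficients.

    Assumes sigma counts as one extra parameter (as R's AIC() does for lm fits).
    """
    neg2_loglik = n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return neg2_loglik + 2 * (k + 1)


# Denominator for every F test: residual mean square of the largest model.
ms_full = MODELS[-1][1] / MODELS[-1][2]

results = []
for i, (label, rss, df) in enumerate(MODELS):
    aic = gaussian_aic(rss, N, k=N - df)  # k = n - residual df, intercept included
    f_stat = None
    if i > 0:
        prev_rss, prev_df = MODELS[i - 1][1], MODELS[i - 1][2]
        add_ss, add_df = prev_rss - rss, prev_df - df  # SS and df of the added terms
        f_stat = (add_ss / add_df) / ms_full
    results.append((label, f_stat, aic))

for label, f_stat, aic in results:
    f_txt = "" if f_stat is None else f"{f_stat:.3f}"
    print(f"{label:<16} F={f_txt:>7}  AIC={aic:,.3f}")
```

AIC is lowest for the model with Age, Sex, SBP and DBP, even though the blood-pressure block is not significant at the 5% level (\(p = 0.071\)).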
| Term | Intercept | +Age | +Sex | +BPr | +Cig |
|---|---|---|---|---|---|
| (Intercept) | 233.98*** | 212.09*** | 214.46*** | 178.16*** | 175.67*** |
| | (2.14) | (23.56) | (23.14) | (28.10) | (28.78) |
| Age | | 0.42 | 0.54 | 0.63 | 0.67 |
| | | (0.45) | (0.44) | (0.46) | (0.47) |
| Sex (Male) | | | -18.48*** | -18.22*** | -18.92*** |
| | | | (4.22) | (4.23) | (4.58) |
| Sys. Bld Pres. | | | | -0.04 | -0.04 |
| | | | | (0.12) | (0.12) |
| Dia. Bld Pres. | | | | 0.41 | 0.41 |
| | | | | (0.24) | (0.24) |
| Cig/day | | | | | 0.08 |
| | | | | | (0.20) |
| \(R^2\) | 0.00 | 0.00 | 0.04 | 0.05 | 0.05 |
| Adj. \(R^2\) | 0.00 | -0.00 | 0.03 | 0.04 | 0.04 |
| Num. obs. | 500 | 500 | 500 | 500 | 500 |

Standard errors in parentheses. \*\*\*p < 0.001; \*\*p < 0.01; \*p < 0.05.
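The \(R^2\) and adjusted \(R^2\) rows follow from the residual sums of squares in the sequential table, with \(R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p - 1)\) for \(p\) slope terms. Adjusted \(R^2\) falls whenever an added term has a partial \(F\) below 1, which is why the +Age column shows -0.00: Age alone has \(F = 0.907\). A small Python sketch, using the column labels of the table as dictionary keys:

```python
n = 500
tss = 1_146_509.712  # residual SS of the intercept-only model

# column label: (residual SS, number of slope terms), from the sequential table
fits = {
    "+Age": (1_144_509.033, 1),
    "+Sex": (1_102_051.249, 2),
    "+BPr": (1_090_300.348, 4),
    "+Cig": (1_089_935.082, 5),
}

results = {}
for label, (rss, p) in fits.items():
    r2 = 1 - rss / tss
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # can go below 0 for weak predictors
    results[label] = (r2, adj)
    print(f"{label:<5} R2={r2:.4f}  adj R2={adj:+.4f}")
```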