Chapter 13 Regression designs

## Parsed with column specification:
## cols(
##   Cholesterol = col_double(),
##   Age = col_double(),
##   Sex = col_character(),
##   SBP = col_double(),
##   DBP = col_double(),
##   CIG = col_double()
## )

DASL

The Framingham Heart Study is one of the longest-running health studies. It has followed the original subjects, their children, and their grandchildren, looking for factors that affect cardiac health. These data include only subjects whose cholesterol was measured in the first exam.

Source: “Statistical Methods in Epidemiology” by H.A. Kahn and C.T. Sempos

For ease of use, the data linked below are a random sample of 500 of the 1,406 cases in the DASL data set.

13.1 Description

For the continuous variables, we present descriptive statistics.

## Non-numerical variable(s) ignored: Sex
| | Age | Cholesterol | CIG | DBP | SBP |
|---|---:|---:|---:|---:|---:|
| Mean | 52.26 | 233.98 | 8.26 | 90.40 | 148.17 |
| Std.Dev | 4.78 | 47.93 | 11.39 | 13.72 | 27.84 |
| Min | 45.00 | 121.00 | 0.00 | 52.00 | 90.00 |
| Median | 52.00 | 230.00 | 0.00 | 90.00 | 142.00 |
| Max | 62.00 | 430.00 | 50.00 | 140.00 | 290.00 |
| N.Valid | 500.00 | 500.00 | 500.00 | 500.00 | 500.00 |
| Pct.Valid | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |

For the discrete variables, we present frequencies.

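| | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
|---|---:|---:|---:|---:|---:|
| FEM | 258 | 51.60 | 51.60 | 51.60 | 51.60 |
| MALE | 242 | 48.40 | 100.00 | 48.40 | 100.00 |
| `<NA>` | 0 | | | 0.00 | 100.00 |
| Total | 500 | 100.00 | 100.00 | 100.00 | 100.00 |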

13.2 Visualization

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Figure 13.1: Relationships between key variables.

13.3 Regression analysis

Model statistics

| Statistic | Value |
|---|---:|
| Multiple \(R^2\) | 0.049 |
| Multiple \(R^2_{adj}\) | 0.040 |
| \(F_{5,494}\) | 5.128 |
| Sig. (\(p\)) | 0.000 |
| Resid. SD | 46.972 |
| Resid. df | 494 |

Regression coefficients

| | Coef. | Std. Err. | \(t\) stat. | Sig. (\(p\)) | VIF |
|---|---:|---:|---:|---:|---:|
| (Intercept) | 175.671 | 28.784 | 6.103 | 0.000 | |
| Age | 0.668 | 0.465 | 1.434 | 0.152 | 1.120 |
| SexMALE | −18.924 | 4.575 | −4.136 | 0.000 | 1.185 |
| SBP | −0.037 | 0.120 | −0.307 | 0.759 | 2.523 |
| DBP | 0.413 | 0.241 | 1.711 | 0.088 | 2.481 |
| CIG | 0.082 | 0.202 | 0.407 | 0.684 | 1.196 |
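
As a quick consistency check on the coefficient table, each \(t\) statistic should equal the estimate divided by its standard error. The Python sketch below (standard library only, used here purely for illustration) recomputes the ratios from the rounded values printed above. The p-values use a normal approximation to the \(t\) distribution with 494 degrees of freedom, which is an assumption of this sketch rather than the method behind the reported values, so small discrepancies in the third decimal place are expected.

```python
from statistics import NormalDist

# (estimate, standard error) copied from the regression coefficient table.
coefs = {
    "(Intercept)": (175.671, 28.784),
    "Age": (0.668, 0.465),
    "SexMALE": (-18.924, 4.575),
    "SBP": (-0.037, 0.120),
    "DBP": (0.413, 0.241),
    "CIG": (0.082, 0.202),
}


def t_and_p(estimate, std_err):
    """Return the t statistic and an approximate two-sided p-value.

    The p-value uses the standard normal distribution as a stand-in for
    the t distribution (an approximation that is close at 494 df).
    """
    t = estimate / std_err
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


results = {term: t_and_p(est, se) for term, (est, se) in coefs.items()}

for term, (t, p) in results.items():
    print(f"{term:12s} t = {t:7.3f}   p ~ {p:.3f}")
```

Because the inputs are already rounded to three decimals, the recomputed ratios can differ slightly from the printed \(t\) statistics.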

13.3.1 Model checking

13.3.2 Sequential model testing

| Model | Resid. SS | Resid. df | Add. SS | Add. df | \(F\) | Sig. (\(p\)) | AIC |
|---|---:|---:|---:|---:|---:|---:|---:|
| Cholesterol ~ 1 | 1,146,509.712 | 499 | | | | | 5,291.751 |
| Cholesterol ~ Age | 1,144,509.033 | 498 | 2,000.679 | 1 | 0.907 | 0.341 | 5,292.878 |
| Cholesterol ~ Age + Sex | 1,102,051.249 | 497 | 42,457.783 | 1 | 19.243 | 0.000 | 5,275.976 |
| Cholesterol ~ Age + Sex + SBP + DBP | 1,090,300.348 | 495 | 11,750.901 | 2 | 2.663 | 0.071 | 5,274.616 |
| Cholesterol ~ Age + Sex + SBP + DBP + CIG | 1,089,935.082 | 494 | 365.266 | 1 | 0.166 | 0.684 | 5,276.449 |
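
The \(F\) and AIC columns can be reproduced from the residual sums of squares alone. The Python sketch below (standard library only, for illustration) assumes two things that the table does not state but that its numbers are consistent with: each \(F\) divides the added mean square by the residual mean square of the largest model, and AIC is the Gaussian form \(n(\log 2\pi + \log(\mathrm{RSS}/n) + 1) + 2k\), where \(k\) counts the regression coefficients plus the error variance.

```python
import math

n = 500  # number of observations

# (formula, residual SS, residual df) copied from the sequential table.
models = [
    ("Cholesterol ~ 1", 1146509.712, 499),
    ("Cholesterol ~ Age", 1144509.033, 498),
    ("Cholesterol ~ Age + Sex", 1102051.249, 497),
    ("Cholesterol ~ Age + Sex + SBP + DBP", 1090300.348, 495),
    ("Cholesterol ~ Age + Sex + SBP + DBP + CIG", 1089935.082, 494),
]

# Assumed denominator for every F test: residual mean square of the
# largest model in the sequence.
full_rss, full_df = models[-1][1], models[-1][2]
mse_full = full_rss / full_df


def aic(rss, resid_df):
    """Gaussian AIC; k = number of coefficients + 1 for the error variance."""
    k = (n - resid_df) + 1
    return n * (math.log(2 * math.pi) + math.log(rss / n) + 1) + 2 * k


# F statistic for each step: (added SS / added df) / full-model MSE.
f_stats = []
for (_, rss0, df0), (_, rss1, df1) in zip(models, models[1:]):
    add_ss, add_df = rss0 - rss1, df0 - df1
    f_stats.append((add_ss / add_df) / mse_full)

aics = [aic(rss, df) for _, rss, df in models]

for (formula, _, _), value in zip(models, aics):
    print(f"AIC = {value:9.3f}  {formula}")
for (formula, _, _), f in zip(models[1:], f_stats):
    print(f"F   = {f:9.3f}  {formula}")
```

Read this way, the lowest AIC belongs to the model with Age, Sex, SBP, and DBP, and adding CIG raises AIC again, which agrees with the small \(F\) for that last step.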
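Statistical models

| | Intercept | +Age | +Sex | +BPr | +Cig |
|---|---:|---:|---:|---:|---:|
| (Intercept) | 233.98*** | 212.09*** | 214.46*** | 178.16*** | 175.67*** |
| | (2.14) | (23.56) | (23.14) | (28.10) | (28.78) |
| Age | | 0.42 | 0.54 | 0.63 | 0.67 |
| | | (0.45) | (0.44) | (0.46) | (0.47) |
| Sex (Male) | | | -18.48*** | -18.22*** | -18.92*** |
| | | | (4.22) | (4.23) | (4.58) |
| Sys. Bld Pres. | | | | -0.04 | -0.04 |
| | | | | (0.12) | (0.12) |
| Dia. Bld Pres. | | | | 0.41 | 0.41 |
| | | | | (0.24) | (0.24) |
| Cig/day | | | | | 0.08 |
| | | | | | (0.20) |
| R2 | 0.00 | 0.00 | 0.04 | 0.05 | 0.05 |
| Adj. R2 | 0.00 | -0.00 | 0.03 | 0.04 | 0.04 |
| Num. obs. | 500 | 500 | 500 | 500 | 500 |

***p < 0.001; **p < 0.01; *p < 0.05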
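
The R2 and Adj. R2 rows, and the model statistics reported at the start of Section 13.3, follow from the residual sums of squares in the sequential table of Section 13.3.2. The Python sketch below (standard library only, for illustration) uses the usual definitions \(R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}\) and \(R^2_{adj} = 1 - (\mathrm{RSS}/\mathrm{df}_{resid}) / (\mathrm{TSS}/(n-1))\), taking the intercept-only residual SS as the total sum of squares. The dictionary keys simply reuse the column labels of the table above.

```python
import math

n = 500                 # number of observations
tss = 1146509.712       # residual SS of the intercept-only model (total SS)

# column label -> (residual SS, residual df), from the sequential table.
fits = {
    "+Age": (1144509.033, 498),
    "+Sex": (1102051.249, 497),
    "+BPr": (1090300.348, 495),
    "+Cig": (1089935.082, 494),
}


def r_squared(rss):
    """Proportion of total variation explained by the model."""
    return 1 - rss / tss


def adj_r_squared(rss, resid_df):
    """R-squared penalised for the number of fitted coefficients."""
    return 1 - (rss / resid_df) / (tss / (n - 1))


summary = {
    label: (r_squared(rss), adj_r_squared(rss, df))
    for label, (rss, df) in fits.items()
}

# Overall F test and residual SD for the largest model.
rss_full, df_full = fits["+Cig"]
model_df = (n - 1) - df_full                      # 5 predictors
overall_f = ((tss - rss_full) / model_df) / (rss_full / df_full)
resid_sd = math.sqrt(rss_full / df_full)

for label, (r2, adj) in summary.items():
    print(f"{label:5s} R2 = {r2:.3f}  Adj. R2 = {adj:.3f}")
print(f"F({model_df},{df_full}) = {overall_f:.3f}, Resid. SD = {resid_sd:.3f}")
```

Even the largest model explains only about 5% of the variation in cholesterol, and nearly all of that gain arrives at the step where Sex is added.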