I’m going to “skip” this step because you can trust all the data is clean in this case and I want to get down to business. I’m going to develop a model with the outcome of interest being whether the patient was examined by the ED physician or not and the primary exposure of interest being the patients presenting complaint/condition. The presenting complaint/conditions are coded as four indicator variables because patients, as you know, can have more than one complaint/condition: headache, neurological deficit, vision complaint , or diastolic blood pressure >120 mmHg. The other variables of interest (i.e., potential confounders and interacting variables) are the examination method available (which depended on the phase of the study), sex, race, age, body mass index, and mean arterial pressure.
The outcome is dichotomous so I’ll use a logistic regression. In R, I usually use the rms
library. The data is found in the adm
package.
library(rms)
## Loading required package: Hmisc
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
## Loading required package: SparseM
## Loading required package: methods
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
library(adm)
##
## Attaching package: 'adm'
## The following object is masked from 'package:Hmisc':
##
## html
head(edfun)
## cc.ha cc.neuro cc.vision cc.bp phase sex race.black age bmi map examined
## 1 Yes No No No 1 M black 38 32 110 No
## 2 Yes No No No 1 W black 51 31 121 No
## 3 Yes Yes No No 1 W other/unknown 28 19 108 No
## 4 No No Yes No 1 W black 52 64 110 No
## 5 No Yes No No 1 M black 44 37 104 No
## 6 No Yes No No 1 W black 75 24 148 No
Is there any missing data in the variables we care about?
all(complete.cases(edfun))
## [1] TRUE
No; so with that out of the way, let’s now look for non-linearity:
(m <- lrm(examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + phase + sex +
race.black + rcs(age, 5) + rcs(bmi, 5) + rcs(map, 5), edfun))
## Logistic Regression Model
##
## lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp +
## phase + sex + race.black + rcs(age, 5) + rcs(bmi, 5) + rcs(map,
## 5), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 268.34 R2 0.428 C 0.832
## No 417 d.f. 19 g 1.763 Dxy 0.664
## Yes 287 Pr(> chi2) <0.0001 gr 5.831 gamma 0.666
## max |deriv| 5e-08 gp 0.326 tau-a 0.321
## Brier 0.159
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -3.6022 4.0638 -0.89 0.3754
## cc.ha=Yes -0.1888 0.2788 -0.68 0.4983
## cc.neuro=Yes -0.6583 0.2854 -2.31 0.0211
## cc.vision=Yes 0.7543 0.2818 2.68 0.0074
## cc.bp=Yes -0.4267 0.6580 -0.65 0.5166
## phase=2 2.8759 0.2206 13.04 <0.0001
## sex=W 0.1871 0.2186 0.86 0.3921
## race.black=black -0.0312 0.2047 -0.15 0.8787
## age -0.1100 0.0430 -2.56 0.0105
## age' 0.7562 0.3223 2.35 0.0190
## age'' -2.0778 0.9999 -2.08 0.0377
## age''' 1.8010 1.1553 1.56 0.1190
## bmi -0.0298 0.0986 -0.30 0.7624
## bmi' 0.3962 1.4378 0.28 0.7829
## bmi'' -0.4190 4.0608 -0.10 0.9178
## bmi''' -0.7846 4.1680 -0.19 0.8507
## map 0.0578 0.0382 1.51 0.1308
## map' -0.6494 0.4397 -1.48 0.1397
## map'' 2.7494 1.7229 1.60 0.1105
## map''' -3.6201 2.1319 -1.70 0.0895
##
anova(m)
## Wald Statistics Response: examined
##
## Factor Chi-Square d.f. P
## cc.ha 0.46 1 0.4983
## cc.neuro 5.32 1 0.0211
## cc.vision 7.16 1 0.0074
## cc.bp 0.42 1 0.5166
## phase 169.93 1 <.0001
## sex 0.73 1 0.3921
## race.black 0.02 1 0.8787
## age 8.52 4 0.0743
## Nonlinear 7.49 3 0.0577
## bmi 4.10 4 0.3924
## Nonlinear 4.10 3 0.2507
## map 5.80 4 0.2149
## Nonlinear 4.50 3 0.2123
## TOTAL NONLINEAR 16.64 9 0.0547
## TOTAL 180.55 19 <.0001
Note that age is close to being non-linear and that the “total” nonlinearity is also nearly significant. I’m going to choose to get rid of the non-linear term in this model to keep it “simple.” You could choose to be extra careful about confounding control and keep age as non-linear for now.
Next, collinearity:
library(perturb) # helps print with colldiag.alt - install if don't have
(m <- lrm(examined ~ cc.ha + cc.neuro + cc.vision + cc.bp +
phase + sex + race.black + age + bmi + map, edfun))
## Logistic Regression Model
##
## lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp +
## phase + sex + race.black + age + bmi + map, data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 250.87 R2 0.404 C 0.823
## No 417 d.f. 10 g 1.626 Dxy 0.646
## Yes 287 Pr(> chi2) <0.0001 gr 5.085 gamma 0.648
## max |deriv| 4e-09 gp 0.313 tau-a 0.313
## Brier 0.163
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -2.5260 0.7638 -3.31 0.0009
## cc.ha=Yes -0.1508 0.2712 -0.56 0.5783
## cc.neuro=Yes -0.5987 0.2761 -2.17 0.0301
## cc.vision=Yes 0.7548 0.2768 2.73 0.0064
## cc.bp=Yes -1.0076 0.5493 -1.83 0.0666
## phase=2 2.8054 0.2137 13.13 <0.0001
## sex=W 0.1469 0.2095 0.70 0.4830
## race.black=black 0.0689 0.1960 0.35 0.7253
## age -0.0059 0.0060 -0.98 0.3251
## bmi -0.0029 0.0120 -0.24 0.8102
## map 0.0088 0.0063 1.39 0.1636
##
colldiag.alt(m)
## Condition
## Index Variance Decomposition Proportions
## Intercept cc.ha=Yes cc.neuro=Yes cc.vision=Yes cc.bp=Yes phase=2
## 1 1.000 0.000 0.002 0.002 0.002 0.001 0.004
## 2 2.614 0.000 0.002 0.004 0.109 0.367 0.004
## 3 2.716 0.000 0.010 0.174 0.193 0.041 0.018
## 4 3.021 0.000 0.090 0.089 0.235 0.058 0.002
## 5 4.156 0.000 0.009 0.001 0.000 0.005 0.022
## 6 5.012 0.000 0.003 0.075 0.026 0.016 0.514
## 7 5.408 0.002 0.063 0.043 0.033 0.001 0.337
## 8 7.293 0.000 0.456 0.437 0.219 0.136 0.036
## 9 9.648 0.004 0.194 0.061 0.067 0.060 0.014
## 10 14.346 0.100 0.141 0.090 0.075 0.140 0.004
## 11 27.179 0.893 0.030 0.023 0.041 0.174 0.044
## sex=W race.black=black age bmi map
## 1 0.004 0.005 0.002 0.001 0.000
## 2 0.000 0.001 0.000 0.000 0.000
## 3 0.000 0.000 0.000 0.000 0.000
## 4 0.007 0.001 0.001 0.000 0.000
## 5 0.000 0.884 0.011 0.000 0.001
## 6 0.417 0.020 0.000 0.000 0.000
## 7 0.542 0.005 0.022 0.005 0.003
## 8 0.016 0.003 0.295 0.003 0.002
## 9 0.005 0.077 0.337 0.496 0.003
## 10 0.001 0.001 0.329 0.478 0.189
## 11 0.007 0.002 0.003 0.016 0.801
Only mean arterial pressure and the intercept appear collinear so we will proceed.
(m <- lrm(examined ~ (cc.ha + cc.neuro + cc.vision + cc.bp) * (phase + sex +
race.black + age + bmi + map), edfun))
## Logistic Regression Model
##
## lrm(formula = examined ~ (cc.ha + cc.neuro + cc.vision + cc.bp) *
## (phase + sex + race.black + age + bmi + map), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 282.03 R2 0.445 C 0.839
## No 417 d.f. 34 g 1.843 Dxy 0.679
## Yes 287 Pr(> chi2) <0.0001 gr 6.315 gamma 0.680
## max |deriv| 3e-05 gp 0.331 tau-a 0.328
## Brier 0.154
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -0.6725 2.2227 -0.30 0.7622
## cc.ha=Yes -2.5957 2.0535 -1.26 0.2062
## cc.neuro=Yes -3.0118 2.3282 -1.29 0.1958
## cc.vision=Yes 0.6559 2.2422 0.29 0.7699
## cc.bp=Yes 0.8323 5.3171 0.16 0.8756
## phase=2 2.6843 0.6622 4.05 <0.0001
## sex=W -0.9600 0.7706 -1.25 0.2128
## race.black=black 0.6558 0.6662 0.98 0.3249
## age -0.0073 0.0205 -0.36 0.7217
## bmi -0.0334 0.0456 -0.73 0.4631
## map 0.0070 0.0192 0.36 0.7169
## cc.ha=Yes * phase=2 0.4048 0.6113 0.66 0.5079
## cc.ha=Yes * sex=W 1.0288 0.7222 1.42 0.1543
## cc.ha=Yes * race.black=black -0.6535 0.6219 -1.05 0.2933
## cc.ha=Yes * age -0.0064 0.0188 -0.34 0.7353
## cc.ha=Yes * bmi 0.0141 0.0431 0.33 0.7440
## cc.ha=Yes * map 0.0144 0.0174 0.83 0.4072
## cc.neuro=Yes * phase=2 -0.1191 0.6524 -0.18 0.8551
## cc.neuro=Yes * sex=W 1.4851 0.7690 1.93 0.0535
## cc.neuro=Yes * race.black=black 0.1222 0.6374 0.19 0.8479
## cc.neuro=Yes * age 0.0197 0.0196 1.01 0.3133
## cc.neuro=Yes * bmi 0.0569 0.0443 1.29 0.1984
## cc.neuro=Yes * map -0.0130 0.0191 -0.68 0.4964
## cc.vision=Yes * phase=2 0.0950 0.6256 0.15 0.8793
## cc.vision=Yes * sex=W 0.3248 0.7073 0.46 0.6461
## cc.vision=Yes * race.black=black -1.2265 0.6401 -1.92 0.0554
## cc.vision=Yes * age -0.0269 0.0198 -1.36 0.1729
## cc.vision=Yes * bmi 0.0308 0.0412 0.75 0.4546
## cc.vision=Yes * map 0.0053 0.0188 0.28 0.7757
## cc.bp=Yes * phase=2 0.2486 1.1550 0.22 0.8296
## cc.bp=Yes * sex=W 0.6239 1.1956 0.52 0.6018
## cc.bp=Yes * race.black=black 0.7420 1.2434 0.60 0.5507
## cc.bp=Yes * age 0.0535 0.0345 1.55 0.1215
## cc.bp=Yes * bmi 0.0518 0.0645 0.80 0.4220
## cc.bp=Yes * map -0.0482 0.0368 -1.31 0.1907
##
anova(m)
## Wald Statistics Response: examined
##
## Factor Chi-Square d.f. P
## cc.ha (Factor+Higher Order Factors) 4.21 7 0.7552
## All Interactions 3.56 6 0.7357
## cc.neuro (Factor+Higher Order Factors) 10.80 7 0.1474
## All Interactions 6.70 6 0.3492
## cc.vision (Factor+Higher Order Factors) 10.74 7 0.1502
## All Interactions 5.92 6 0.4319
## cc.bp (Factor+Higher Order Factors) 4.93 7 0.6681
## All Interactions 4.36 6 0.6283
## phase (Factor+Higher Order Factors) 168.88 5 <.0001
## All Interactions 1.05 4 0.9028
## sex (Factor+Higher Order Factors) 5.50 5 0.3577
## All Interactions 4.90 4 0.2982
## race.black (Factor+Higher Order Factors) 8.03 5 0.1545
## All Interactions 7.72 4 0.1024
## age (Factor+Higher Order Factors) 11.23 5 0.0469
## All Interactions 10.39 4 0.0344
## bmi (Factor+Higher Order Factors) 3.11 5 0.6823
## All Interactions 3.11 4 0.5392
## map (Factor+Higher Order Factors) 8.32 5 0.1397
## All Interactions 6.07 4 0.1939
## cc.ha * phase (Factor+Higher Order Factors) 0.44 1 0.5079
## cc.ha * sex (Factor+Higher Order Factors) 2.03 1 0.1543
## cc.ha * race.black (Factor+Higher Order Factors) 1.10 1 0.2933
## cc.ha * age (Factor+Higher Order Factors) 0.11 1 0.7353
## cc.ha * bmi (Factor+Higher Order Factors) 0.11 1 0.7440
## cc.ha * map (Factor+Higher Order Factors) 0.69 1 0.4072
## cc.neuro * phase (Factor+Higher Order Factors) 0.03 1 0.8551
## cc.neuro * sex (Factor+Higher Order Factors) 3.73 1 0.0535
## cc.neuro * race.black (Factor+Higher Order Factors) 0.04 1 0.8479
## cc.neuro * age (Factor+Higher Order Factors) 1.02 1 0.3133
## cc.neuro * bmi (Factor+Higher Order Factors) 1.65 1 0.1984
## cc.neuro * map (Factor+Higher Order Factors) 0.46 1 0.4964
## cc.vision * phase (Factor+Higher Order Factors) 0.02 1 0.8793
## cc.vision * sex (Factor+Higher Order Factors) 0.21 1 0.6461
## cc.vision * race.black (Factor+Higher Order Factors) 3.67 1 0.0554
## cc.vision * age (Factor+Higher Order Factors) 1.86 1 0.1729
## cc.vision * bmi (Factor+Higher Order Factors) 0.56 1 0.4546
## cc.vision * map (Factor+Higher Order Factors) 0.08 1 0.7757
## cc.bp * phase (Factor+Higher Order Factors) 0.05 1 0.8296
## cc.bp * sex (Factor+Higher Order Factors) 0.27 1 0.6018
## cc.bp * race.black (Factor+Higher Order Factors) 0.36 1 0.5507
## cc.bp * age (Factor+Higher Order Factors) 2.40 1 0.1215
## cc.bp * bmi (Factor+Higher Order Factors) 0.64 1 0.4220
## cc.bp * map (Factor+Higher Order Factors) 1.71 1 0.1907
## TOTAL INTERACTION 27.55 24 0.2796
## TOTAL 184.20 34 <.0001
There are a couple of significant second order terms race and vision complaint and sex and neuro complaint. Make sure there is no evidence of additional interaction if we reduce the model to just keep those two terms:
(m2 <- lrm(examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + phase +
sex + race.black + age + bmi + map + race.black * cc.vision +
sex * cc.neuro, edfun))
## Logistic Regression Model
##
## lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp +
## phase + sex + race.black + age + bmi + map + race.black *
## cc.vision + sex * cc.neuro, data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 257.61 R2 0.413 C 0.826
## No 417 d.f. 12 g 1.682 Dxy 0.651
## Yes 287 Pr(> chi2) <0.0001 gr 5.377 gamma 0.653
## max |deriv| 2e-08 gp 0.317 tau-a 0.315
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -2.4223 0.7900 -3.07 0.0022
## cc.ha=Yes -0.1946 0.2736 -0.71 0.4769
## cc.neuro=Yes -1.1675 0.4223 -2.76 0.0057
## cc.vision=Yes 1.1042 0.3618 3.05 0.0023
## cc.bp=Yes -1.0681 0.5526 -1.93 0.0533
## phase=2 2.8326 0.2167 13.07 <0.0001
## sex=W -0.0901 0.2567 -0.35 0.7255
## race.black=black 0.2395 0.2186 1.10 0.2732
## age -0.0061 0.0061 -1.00 0.3178
## bmi -0.0005 0.0121 -0.04 0.9646
## map 0.0082 0.0064 1.28 0.1994
## cc.vision=Yes * race.black=black -0.8202 0.4679 -1.75 0.0796
## cc.neuro=Yes * sex=W 0.7813 0.4409 1.77 0.0764
##
# non-significant likelihood ratio test
# between the two models so ok to reduce
lrtest(m, m2)
##
## Model 1: examined ~ (cc.ha + cc.neuro + cc.vision + cc.bp) * (phase +
## sex + race.black + age + bmi + map)
## Model 2: examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + phase + sex +
## race.black + age + bmi + map + race.black * cc.vision + sex *
## cc.neuro
##
## L.R. Chisq d.f. P
## 24.4228124 22.0000000 0.3254631
# even though the two terms are non-sig at 0.05 they are at 0.1,
# and in addition total interaction is significant at 0.05 level now
anova(m2)
## Wald Statistics Response: examined
##
## Factor Chi-Square d.f. P
## cc.ha 0.51 1 0.4769
## cc.neuro (Factor+Higher Order Factors) 7.86 2 0.0196
## All Interactions 3.14 1 0.0764
## cc.vision (Factor+Higher Order Factors) 9.39 2 0.0092
## All Interactions 3.07 1 0.0796
## cc.bp 3.74 1 0.0533
## phase 170.92 1 <.0001
## sex (Factor+Higher Order Factors) 3.78 2 0.1514
## All Interactions 3.14 1 0.0764
## race.black (Factor+Higher Order Factors) 3.21 2 0.2006
## All Interactions 3.07 1 0.0796
## age 1.00 1 0.3178
## bmi 0.00 1 0.9646
## map 1.65 1 0.1994
## cc.vision * race.black (Factor+Higher Order Factors) 3.07 1 0.0796
## cc.neuro * sex (Factor+Higher Order Factors) 3.14 1 0.0764
## TOTAL INTERACTION 6.63 2 0.0364
## TOTAL 179.41 12 <.0001
Now compare the adjusted ORs for the complaints (but we have to deal with the interactions too in this case because they effect the exposures of interest) with the model where age, BMI, and MAP are removed. Kept phase because it is such an important variable.
(m3 <- lrm(examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + phase +
sex + race.black + race.black * cc.vision + sex * cc.neuro, edfun))
## Logistic Regression Model
##
## lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp +
## phase + sex + race.black + race.black * cc.vision + sex *
## cc.neuro, data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 255.53 R2 0.411 C 0.823
## No 417 d.f. 9 g 1.662 Dxy 0.646
## Yes 287 Pr(> chi2) <0.0001 gr 5.271 gamma 0.666
## max |deriv| 1e-10 gp 0.315 tau-a 0.312
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -1.8670 0.3763 -4.96 <0.0001
## cc.ha=Yes -0.1528 0.2696 -0.57 0.5708
## cc.neuro=Yes -1.1932 0.4204 -2.84 0.0045
## cc.vision=Yes 1.0887 0.3607 3.02 0.0025
## cc.bp=Yes -0.6598 0.4567 -1.44 0.1486
## phase=2 2.8075 0.2148 13.07 <0.0001
## sex=W -0.1193 0.2534 -0.47 0.6377
## race.black=black 0.2728 0.2145 1.27 0.2035
## cc.vision=Yes * race.black=black -0.7835 0.4659 -1.68 0.0927
## cc.neuro=Yes * sex=W 0.8270 0.4374 1.89 0.0587
##
You can eyeball that the only one that is an issue is the DBP > 120 condition. Nothing else will be confounded if we get rid age, BMI, and MAP. So, now, I’m going to look just at the DBP > 120 coefficient for all subsets of age, BMI, and MAP:
# create all subsets of age, bmi, map
# combn(list, n) creates the permutations on the rows
# when we use sapply we get to supply a different number to the function for
# 1, 2, or 3 of the variables simultaneously
(vargroups <- sapply(1:3, function(x) combn(Cs(age, bmi, map), x)))
## [[1]]
## [,1] [,2] [,3]
## [1,] "age" "bmi" "map"
##
## [[2]]
## [,1] [,2] [,3]
## [1,] "age" "age" "bmi"
## [2,] "bmi" "map" "map"
##
## [[3]]
## [,1]
## [1,] "age"
## [2,] "bmi"
## [3,] "map"
# next I need to run each list element through a function
# that will create the right model based on the rows of
# each list element; paste them together with collapse = ' + '
(formulas <- lapply(vargroups, function(m) apply(m, 2, function(x) paste(x, collapse = ' + '))))
## [[1]]
## [1] "age" "bmi" "map"
##
## [[2]]
## [1] "age + bmi" "age + map" "bmi + map"
##
## [[3]]
## [1] "age + bmi + map"
# now make them a single vector of formulas
(formulas <- do.call(c, formulas))
## [1] "age" "bmi" "map" "age + bmi"
## [5] "age + map" "bmi + map" "age + bmi + map"
# paste the remainder of the formula using paste with sep = ' + '
# (collapse would have made one really long formula)
# rather than 7 single formulas
baseform <- "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase"
(formulas <- paste(baseform, formulas, sep = ' + '))
## [1] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age"
## [2] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi"
## [3] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + map"
## [4] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi"
## [5] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + map"
## [6] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi + map"
## [7] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi + map"
# and don't forget the baseform is a formula too!
(formulas <- c(baseform, formulas))
## [1] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase"
## [2] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age"
## [3] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi"
## [4] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + map"
## [5] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi"
## [6] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + map"
## [7] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi + map"
## [8] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi + map"
# now run lrm on each of these formulas
(models <- lapply(formulas, function(form) lrm(as.formula(form), edfun)))
## [[1]]
## Logistic Regression Model
##
## lrm(formula = as.formula(form), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 255.53 R2 0.411 C 0.823
## No 417 d.f. 9 g 1.662 Dxy 0.646
## Yes 287 Pr(> chi2) <0.0001 gr 5.271 gamma 0.666
## max |deriv| 1e-10 gp 0.315 tau-a 0.312
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -1.8670 0.3763 -4.96 <0.0001
## cc.ha=Yes -0.1528 0.2696 -0.57 0.5708
## cc.neuro=Yes -1.1932 0.4204 -2.84 0.0045
## cc.vision=Yes 1.0887 0.3607 3.02 0.0025
## cc.bp=Yes -0.6598 0.4567 -1.44 0.1486
## race.black=black 0.2728 0.2145 1.27 0.2035
## sex=W -0.1193 0.2534 -0.47 0.6377
## phase=2 2.8075 0.2148 13.07 <0.0001
## cc.vision=Yes * race.black=black -0.7835 0.4659 -1.68 0.0927
## cc.neuro=Yes * sex=W 0.8270 0.4374 1.89 0.0587
##
##
## [[2]]
## Logistic Regression Model
##
## lrm(formula = as.formula(form), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 255.94 R2 0.411 C 0.824
## No 417 d.f. 10 g 1.669 Dxy 0.648
## Yes 287 Pr(> chi2) <0.0001 gr 5.308 gamma 0.650
## max |deriv| 6e-09 gp 0.316 tau-a 0.313
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -1.6764 0.4771 -3.51 0.0004
## cc.ha=Yes -0.1776 0.2722 -0.65 0.5142
## cc.neuro=Yes -1.1933 0.4204 -2.84 0.0045
## cc.vision=Yes 1.0836 0.3606 3.01 0.0027
## cc.bp=Yes -0.6701 0.4576 -1.46 0.1431
## race.black=black 0.2605 0.2154 1.21 0.2266
## sex=W -0.1163 0.2532 -0.46 0.6461
## phase=2 2.8118 0.2152 13.07 <0.0001
## age -0.0037 0.0058 -0.64 0.5193
## cc.vision=Yes * race.black=black -0.7884 0.4660 -1.69 0.0906
## cc.neuro=Yes * sex=W 0.8292 0.4378 1.89 0.0582
##
##
## [[3]]
## Logistic Regression Model
##
## lrm(formula = as.formula(form), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 255.54 R2 0.411 C 0.823
## No 417 d.f. 10 g 1.663 Dxy 0.647
## Yes 287 Pr(> chi2) <0.0001 gr 5.275 gamma 0.653
## max |deriv| 3e-09 gp 0.315 tau-a 0.313
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -1.9053 0.4839 -3.94 <0.0001
## cc.ha=Yes -0.1545 0.2699 -0.57 0.5671
## cc.neuro=Yes -1.1942 0.4205 -2.84 0.0045
## cc.vision=Yes 1.0899 0.3609 3.02 0.0025
## cc.bp=Yes -0.6641 0.4581 -1.45 0.1471
## race.black=black 0.2686 0.2171 1.24 0.2159
## sex=W -0.1228 0.2549 -0.48 0.6300
## phase=2 2.8065 0.2149 13.06 <0.0001
## bmi 0.0015 0.0120 0.13 0.8999
## cc.vision=Yes * race.black=black -0.7863 0.4665 -1.69 0.0919
## cc.neuro=Yes * sex=W 0.8294 0.4378 1.89 0.0582
##
##
## [[4]]
## Logistic Regression Model
##
## lrm(formula = as.formula(form), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 256.61 R2 0.412 C 0.825
## No 417 d.f. 10 g 1.671 Dxy 0.651
## Yes 287 Pr(> chi2) <0.0001 gr 5.318 gamma 0.652
## max |deriv| 1e-08 gp 0.316 tau-a 0.315
## Brier 0.161
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -2.5391 0.7515 -3.38 0.0007
## cc.ha=Yes -0.1545 0.2703 -0.57 0.5675
## cc.neuro=Yes -1.1732 0.4218 -2.78 0.0054
## cc.vision=Yes 1.1068 0.3618 3.06 0.0022
## cc.bp=Yes -0.9596 0.5402 -1.78 0.0757
## race.black=black 0.2611 0.2150 1.21 0.2246
## sex=W -0.1012 0.2550 -0.40 0.6913
## phase=2 2.8213 0.2156 13.09 <0.0001
## map 0.0063 0.0060 1.04 0.2979
## cc.vision=Yes * race.black=black -0.8055 0.4671 -1.72 0.0846
## cc.neuro=Yes * sex=W 0.7892 0.4394 1.80 0.0725
##
##
## [[5]]
## Logistic Regression Model
##
## lrm(formula = as.formula(form), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 255.96 R2 0.411 C 0.824
## No 417 d.f. 11 g 1.670 Dxy 0.648
## Yes 287 Pr(> chi2) <0.0001 gr 5.310 gamma 0.650
## max |deriv| 6e-09 gp 0.316 tau-a 0.313
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -1.7179 0.5623 -3.05 0.0023
## cc.ha=Yes -0.1795 0.2726 -0.66 0.5101
## cc.neuro=Yes -1.1944 0.4205 -2.84 0.0045
## cc.vision=Yes 1.0850 0.3608 3.01 0.0026
## cc.bp=Yes -0.6750 0.4590 -1.47 0.1414
## race.black=black 0.2557 0.2181 1.17 0.2410
## sex=W -0.1201 0.2546 -0.47 0.6372
## phase=2 2.8107 0.2153 13.05 <0.0001
## age -0.0037 0.0058 -0.65 0.5175
## bmi 0.0017 0.0120 0.14 0.8890
## cc.vision=Yes * race.black=black -0.7915 0.4665 -1.70 0.0897
## cc.neuro=Yes * sex=W 0.8320 0.4382 1.90 0.0576
##
##
## [[6]]
## Logistic Regression Model
##
## lrm(formula = as.formula(form), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 257.61 R2 0.413 C 0.826
## No 417 d.f. 11 g 1.682 Dxy 0.651
## Yes 287 Pr(> chi2) <0.0001 gr 5.377 gamma 0.653
## max |deriv| 2e-08 gp 0.317 tau-a 0.315
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -2.4320 0.7594 -3.20 0.0014
## cc.ha=Yes -0.1952 0.2733 -0.71 0.4752
## cc.neuro=Yes -1.1680 0.4222 -2.77 0.0057
## cc.vision=Yes 1.1046 0.3617 3.05 0.0023
## cc.bp=Yes -1.0678 0.5526 -1.93 0.0533
## race.black=black 0.2381 0.2164 1.10 0.2711
## sex=W -0.0915 0.2550 -0.36 0.7198
## phase=2 2.8322 0.2164 13.09 <0.0001
## age -0.0061 0.0061 -1.00 0.3181
## map 0.0082 0.0063 1.29 0.1971
## cc.vision=Yes * race.black=black -0.8211 0.4675 -1.76 0.0790
## cc.neuro=Yes * sex=W 0.7824 0.4402 1.78 0.0755
##
##
## [[7]]
## Logistic Regression Model
##
## lrm(formula = as.formula(form), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 256.61 R2 0.412 C 0.825
## No 417 d.f. 11 g 1.671 Dxy 0.650
## Yes 287 Pr(> chi2) <0.0001 gr 5.317 gamma 0.652
## max |deriv| 1e-08 gp 0.316 tau-a 0.315
## Brier 0.161
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -2.5345 0.7820 -3.24 0.0012
## cc.ha=Yes -0.1543 0.2706 -0.57 0.5686
## cc.neuro=Yes -1.1729 0.4219 -2.78 0.0054
## cc.vision=Yes 1.1066 0.3619 3.06 0.0022
## cc.bp=Yes -0.9597 0.5402 -1.78 0.0756
## race.black=black 0.2618 0.2173 1.21 0.2282
## sex=W -0.1006 0.2567 -0.39 0.6951
## phase=2 2.8215 0.2158 13.07 <0.0001
## bmi -0.0003 0.0121 -0.02 0.9828
## map 0.0063 0.0061 1.03 0.3013
## cc.vision=Yes * race.black=black -0.8050 0.4675 -1.72 0.0851
## cc.neuro=Yes * sex=W 0.7887 0.4401 1.79 0.0731
##
##
## [[8]]
## Logistic Regression Model
##
## lrm(formula = as.formula(form), data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 257.61 R2 0.413 C 0.826
## No 417 d.f. 12 g 1.682 Dxy 0.651
## Yes 287 Pr(> chi2) <0.0001 gr 5.377 gamma 0.653
## max |deriv| 2e-08 gp 0.317 tau-a 0.315
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -2.4223 0.7900 -3.07 0.0022
## cc.ha=Yes -0.1946 0.2736 -0.71 0.4769
## cc.neuro=Yes -1.1675 0.4223 -2.76 0.0057
## cc.vision=Yes 1.1042 0.3618 3.05 0.0023
## cc.bp=Yes -1.0681 0.5526 -1.93 0.0533
## race.black=black 0.2395 0.2186 1.10 0.2732
## sex=W -0.0901 0.2567 -0.35 0.7255
## phase=2 2.8326 0.2167 13.07 <0.0001
## age -0.0061 0.0061 -1.00 0.3178
## bmi -0.0005 0.0121 -0.04 0.9646
## map 0.0082 0.0064 1.28 0.1994
## cc.vision=Yes * race.black=black -0.8202 0.4679 -1.75 0.0796
## cc.neuro=Yes * sex=W 0.7813 0.4409 1.77 0.0764
##
# no need to look at each of them; just pull out the exp of the cc.bp coef
(ors <- sapply(models, function(x) exp(coef(x)['cc.bp=Yes'])))
## cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes
## 0.5169713 0.5116456 0.5147410 0.3830430 0.5091617 0.3437730 0.3829957 0.3436484
# match with the formulas
cbind(formulas, ors)
## formulas
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase"
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age"
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi"
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + map"
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi"
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + map"
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi + map"
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi + map"
## ors
## cc.bp=Yes "0.51697130753288"
## cc.bp=Yes "0.511645643233362"
## cc.bp=Yes "0.514741013826027"
## cc.bp=Yes "0.383043040433255"
## cc.bp=Yes "0.509161700241673"
## cc.bp=Yes "0.343773032111207"
## cc.bp=Yes "0.382995700075936"
## cc.bp=Yes "0.343648427724841"
Note the simplest model that fully controls for confounding is the 3rd from the bottom which requires keeping age and MAP. So, let’s plan to drop BMI.
# new function that makes it much easier to create new models:
# . means what was there before
(m4 <- update(m2, . ~ . - bmi))
## Logistic Regression Model
##
## lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp +
## phase + sex + race.black + age + map + cc.vision:race.black +
## cc.neuro:sex, data = edfun)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 704 LR chi2 257.61 R2 0.413 C 0.826
## No 417 d.f. 11 g 1.682 Dxy 0.651
## Yes 287 Pr(> chi2) <0.0001 gr 5.377 gamma 0.653
## max |deriv| 2e-08 gp 0.317 tau-a 0.315
## Brier 0.162
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -2.4320 0.7594 -3.20 0.0014
## cc.ha=Yes -0.1952 0.2733 -0.71 0.4752
## cc.neuro=Yes -1.1680 0.4222 -2.77 0.0057
## cc.vision=Yes 1.1046 0.3617 3.05 0.0023
## cc.bp=Yes -1.0678 0.5526 -1.93 0.0533
## phase=2 2.8322 0.2164 13.09 <0.0001
## sex=W -0.0915 0.2550 -0.36 0.7198
## race.black=black 0.2381 0.2164 1.10 0.2711
## age -0.0061 0.0061 -1.00 0.3181
## map 0.0082 0.0063 1.29 0.1971
## cc.vision=Yes * race.black=black -0.8211 0.4675 -1.76 0.0790
## cc.neuro=Yes * sex=W 0.7824 0.4402 1.78 0.0755
##
Look at all the coefficients for the complaints and conditions. They are generally a little smaller. No need to keep age and BMI for political reasons so we have the final model (m4).
It’s all good:
# no nonlinearity at 0.05
anova(update(m4, . ~ . - age + rcs(age, 5) - map + rcs(map, 5)))
## Wald Statistics Response: examined
##
## Factor Chi-Square d.f. P
## cc.ha 0.65 1 0.4188
## cc.neuro (Factor+Higher Order Factors) 8.04 2 0.0180
## All Interactions 3.03 1 0.0818
## cc.vision (Factor+Higher Order Factors) 8.09 2 0.0175
## All Interactions 2.25 1 0.1336
## cc.bp 0.32 1 0.5694
## phase 168.45 1 <.0001
## sex (Factor+Higher Order Factors) 4.11 2 0.1283
## All Interactions 3.03 1 0.0818
## race.black (Factor+Higher Order Factors) 2.26 2 0.3233
## All Interactions 2.25 1 0.1336
## age 7.52 4 0.1109
## Nonlinear 6.37 3 0.0951
## map 6.52 4 0.1637
## Nonlinear 5.26 3 0.1535
## cc.vision * race.black (Factor+Higher Order Factors) 2.25 1 0.1336
## cc.neuro * sex (Factor+Higher Order Factors) 3.03 1 0.0818
## TOTAL NONLINEAR 11.79 6 0.0669
## TOTAL INTERACTION 5.60 2 0.0609
## TOTAL NONLINEAR + INTERACTION 18.12 8 0.0204
## TOTAL 180.79 17 <.0001
# no collinearity (only intercept and map - safe to ignore)
colldiag.alt(m4)
## Condition
## Index Variance Decomposition Proportions
## Intercept cc.ha=Yes cc.neuro=Yes cc.vision=Yes cc.bp=Yes phase=2
## 1 1.000 0.000 0.002 0.001 0.002 0.001 0.005
## 2 2.042 0.000 0.001 0.018 0.059 0.000 0.004
## 3 2.256 0.000 0.015 0.020 0.026 0.097 0.003
## 4 2.669 0.000 0.048 0.004 0.004 0.351 0.000
## 5 3.539 0.001 0.000 0.000 0.151 0.010 0.007
## 6 4.095 0.001 0.031 0.031 0.000 0.029 0.171
## 7 4.840 0.001 0.001 0.020 0.022 0.001 0.472
## 8 5.986 0.000 0.232 0.061 0.235 0.045 0.116
## 9 6.697 0.000 0.299 0.000 0.265 0.061 0.150
## 10 8.460 0.002 0.006 0.529 0.017 0.012 0.012
## 11 12.304 0.078 0.331 0.260 0.159 0.218 0.014
## 12 26.633 0.917 0.033 0.055 0.059 0.176 0.045
## sex=W race.black=black age map cc.vision=Yes * race.black=black
## 1 0.004 0.005 0.002 0.000 0.001
## 2 0.000 0.001 0.000 0.000 0.082
## 3 0.000 0.001 0.000 0.000 0.044
## 4 0.002 0.002 0.000 0.000 0.020
## 5 0.000 0.322 0.009 0.001 0.116
## 6 0.170 0.000 0.011 0.000 0.000
## 7 0.061 0.287 0.007 0.002 0.150
## 8 0.115 0.273 0.004 0.001 0.475
## 9 0.225 0.040 0.172 0.001 0.077
## 10 0.331 0.013 0.287 0.000 0.003
## 11 0.057 0.055 0.502 0.140 0.015
## 12 0.034 0.001 0.005 0.854 0.015
## cc.neuro=Yes * sex=W
## 1 0.002
## 2 0.024
## 3 0.032
## 4 0.005
## 5 0.001
## 6 0.110
## 7 0.005
## 8 0.001
## 9 0.117
## 10 0.567
## 11 0.101
## 12 0.035
# model has no evidence of lack of fit
residuals(m4, "gof") # need to put x = TRUE, y = TRUE in fit for this to work (see error)
## Error in residuals.lrm(m4, "gof"): you did not specify y=TRUE in the fit
residuals(update(m4, . ~ ., x = TRUE, y = TRUE), "gof")
## Sum of squared errors Expected value|H0 SD
## 113.6978708 113.4894754 0.4999796
## Z P
## 0.4168078 0.6768190
Create tables…