Modeling Example

Step 1. Variable Specification

I’m going to “skip” this step because you can trust all the data is clean in this case and I want to get down to business. I’m going to develop a model with the outcome of interest being whether the patient was examined by the ED physician or not and the primary exposure of interest being the patients presenting complaint/condition. The presenting complaint/conditions are coded as four indicator variables because patients, as you know, can have more than one complaint/condition: headache, neurological deficit, vision complaint , or diastolic blood pressure >120 mmHg. The other variables of interest (i.e., potential confounders and interacting variables) are the examination method available (which depended on the phase of the study), sex, race, age, body mass index, and mean arterial pressure.

Step 2. Model Specification

The outcome is dichotomous so I’ll use a logistic regression. In R, I usually use the rms library. The data is found in the adm package.

library(rms)

## Loading required package: Hmisc

## Loading required package: lattice

## Loading required package: survival

## Loading required package: Formula

## Loading required package: ggplot2

## 
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units

## Loading required package: SparseM

## Loading required package: methods

## 
## Attaching package: 'SparseM'

## The following object is masked from 'package:base':
## 
##     backsolve

library(adm)

## 
## Attaching package: 'adm'

## The following object is masked from 'package:Hmisc':
## 
##     html

head(edfun)

##   cc.ha cc.neuro cc.vision cc.bp phase sex    race.black age bmi map examined
## 1   Yes       No        No    No     1   M         black  38  32 110       No
## 2   Yes       No        No    No     1   W         black  51  31 121       No
## 3   Yes      Yes        No    No     1   W other/unknown  28  19 108       No
## 4    No       No       Yes    No     1   W         black  52  64 110       No
## 5    No      Yes        No    No     1   M         black  44  37 104       No
## 6    No      Yes        No    No     1   W         black  75  24 148       No

Is there any missing data in the variables we care about?

all(complete.cases(edfun))

## [1] TRUE

No; so with that out of the way, let’s now look for non-linearity:

(m <- lrm(examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + phase + sex + 
            race.black + rcs(age, 5) + rcs(bmi, 5) + rcs(map, 5), edfun))

## Logistic Regression Model
##  
##  lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + 
##      phase + sex + race.black + rcs(age, 5) + rcs(bmi, 5) + rcs(map, 
##      5), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     268.34    R2       0.428    C       0.832    
##   No           417    d.f.            19    g        1.763    Dxy     0.664    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.831    gamma   0.666    
##  max |deriv| 5e-08                          gp       0.326    tau-a   0.321    
##                                             Brier    0.159                     
##  
##                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept        -3.6022 4.0638 -0.89  0.3754  
##  cc.ha=Yes        -0.1888 0.2788 -0.68  0.4983  
##  cc.neuro=Yes     -0.6583 0.2854 -2.31  0.0211  
##  cc.vision=Yes     0.7543 0.2818  2.68  0.0074  
##  cc.bp=Yes        -0.4267 0.6580 -0.65  0.5166  
##  phase=2           2.8759 0.2206 13.04  <0.0001 
##  sex=W             0.1871 0.2186  0.86  0.3921  
##  race.black=black -0.0312 0.2047 -0.15  0.8787  
##  age              -0.1100 0.0430 -2.56  0.0105  
##  age'              0.7562 0.3223  2.35  0.0190  
##  age''            -2.0778 0.9999 -2.08  0.0377  
##  age'''            1.8010 1.1553  1.56  0.1190  
##  bmi              -0.0298 0.0986 -0.30  0.7624  
##  bmi'              0.3962 1.4378  0.28  0.7829  
##  bmi''            -0.4190 4.0608 -0.10  0.9178  
##  bmi'''           -0.7846 4.1680 -0.19  0.8507  
##  map               0.0578 0.0382  1.51  0.1308  
##  map'             -0.6494 0.4397 -1.48  0.1397  
##  map''             2.7494 1.7229  1.60  0.1105  
##  map'''           -3.6201 2.1319 -1.70  0.0895  
##

anova(m)

##                 Wald Statistics          Response: examined 
## 
##  Factor          Chi-Square d.f. P     
##  cc.ha             0.46      1   0.4983
##  cc.neuro          5.32      1   0.0211
##  cc.vision         7.16      1   0.0074
##  cc.bp             0.42      1   0.5166
##  phase           169.93      1   <.0001
##  sex               0.73      1   0.3921
##  race.black        0.02      1   0.8787
##  age               8.52      4   0.0743
##   Nonlinear        7.49      3   0.0577
##  bmi               4.10      4   0.3924
##   Nonlinear        4.10      3   0.2507
##  map               5.80      4   0.2149
##   Nonlinear        4.50      3   0.2123
##  TOTAL NONLINEAR  16.64      9   0.0547
##  TOTAL           180.55     19   <.0001

Note that age is close to being non-linear and that the “total” nonlinearity is also nearly significant. I’m going to choose to get rid of the non-linear term in this model to keep it “simple.” You could choose to be extra careful about confounding control and keep age as non-linear for now.

Next, collinearity:

library(perturb) # helps print with colldiag.alt - install if don't have
(m <- lrm(examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + 
            phase + sex + race.black + age + bmi + map, edfun))

## Logistic Regression Model
##  
##  lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + 
##      phase + sex + race.black + age + bmi + map, data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     250.87    R2       0.404    C       0.823    
##   No           417    d.f.            10    g        1.626    Dxy     0.646    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.085    gamma   0.648    
##  max |deriv| 4e-09                          gp       0.313    tau-a   0.313    
##                                             Brier    0.163                     
##  
##                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept        -2.5260 0.7638 -3.31  0.0009  
##  cc.ha=Yes        -0.1508 0.2712 -0.56  0.5783  
##  cc.neuro=Yes     -0.5987 0.2761 -2.17  0.0301  
##  cc.vision=Yes     0.7548 0.2768  2.73  0.0064  
##  cc.bp=Yes        -1.0076 0.5493 -1.83  0.0666  
##  phase=2           2.8054 0.2137 13.13  <0.0001 
##  sex=W             0.1469 0.2095  0.70  0.4830  
##  race.black=black  0.0689 0.1960  0.35  0.7253  
##  age              -0.0059 0.0060 -0.98  0.3251  
##  bmi              -0.0029 0.0120 -0.24  0.8102  
##  map               0.0088 0.0063  1.39  0.1636  
##

colldiag.alt(m)

## Condition
## Index    Variance Decomposition Proportions
##            Intercept cc.ha=Yes cc.neuro=Yes cc.vision=Yes cc.bp=Yes phase=2
## 1    1.000 0.000     0.002     0.002        0.002         0.001     0.004  
## 2    2.614 0.000     0.002     0.004        0.109         0.367     0.004  
## 3    2.716 0.000     0.010     0.174        0.193         0.041     0.018  
## 4    3.021 0.000     0.090     0.089        0.235         0.058     0.002  
## 5    4.156 0.000     0.009     0.001        0.000         0.005     0.022  
## 6    5.012 0.000     0.003     0.075        0.026         0.016     0.514  
## 7    5.408 0.002     0.063     0.043        0.033         0.001     0.337  
## 8    7.293 0.000     0.456     0.437        0.219         0.136     0.036  
## 9    9.648 0.004     0.194     0.061        0.067         0.060     0.014  
## 10  14.346 0.100     0.141     0.090        0.075         0.140     0.004  
## 11  27.179 0.893     0.030     0.023        0.041         0.174     0.044  
##    sex=W race.black=black age   bmi   map  
## 1  0.004 0.005            0.002 0.001 0.000
## 2  0.000 0.001            0.000 0.000 0.000
## 3  0.000 0.000            0.000 0.000 0.000
## 4  0.007 0.001            0.001 0.000 0.000
## 5  0.000 0.884            0.011 0.000 0.001
## 6  0.417 0.020            0.000 0.000 0.000
## 7  0.542 0.005            0.022 0.005 0.003
## 8  0.016 0.003            0.295 0.003 0.002
## 9  0.005 0.077            0.337 0.496 0.003
## 10 0.001 0.001            0.329 0.478 0.189
## 11 0.007 0.002            0.003 0.016 0.801

Only mean arterial pressure and the intercept appear collinear so we will proceed.

Step 3. Interaction assessment

(m <- lrm(examined ~ (cc.ha + cc.neuro + cc.vision + cc.bp) * (phase + sex + 
                                        race.black + age + bmi + map), edfun))

## Logistic Regression Model
##  
##  lrm(formula = examined ~ (cc.ha + cc.neuro + cc.vision + cc.bp) * 
##      (phase + sex + race.black + age + bmi + map), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     282.03    R2       0.445    C       0.839    
##   No           417    d.f.            34    g        1.843    Dxy     0.679    
##   Yes          287    Pr(> chi2) <0.0001    gr       6.315    gamma   0.680    
##  max |deriv| 3e-05                          gp       0.331    tau-a   0.328    
##                                             Brier    0.154                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -0.6725 2.2227 -0.30  0.7622  
##  cc.ha=Yes                        -2.5957 2.0535 -1.26  0.2062  
##  cc.neuro=Yes                     -3.0118 2.3282 -1.29  0.1958  
##  cc.vision=Yes                     0.6559 2.2422  0.29  0.7699  
##  cc.bp=Yes                         0.8323 5.3171  0.16  0.8756  
##  phase=2                           2.6843 0.6622  4.05  <0.0001 
##  sex=W                            -0.9600 0.7706 -1.25  0.2128  
##  race.black=black                  0.6558 0.6662  0.98  0.3249  
##  age                              -0.0073 0.0205 -0.36  0.7217  
##  bmi                              -0.0334 0.0456 -0.73  0.4631  
##  map                               0.0070 0.0192  0.36  0.7169  
##  cc.ha=Yes * phase=2               0.4048 0.6113  0.66  0.5079  
##  cc.ha=Yes * sex=W                 1.0288 0.7222  1.42  0.1543  
##  cc.ha=Yes * race.black=black     -0.6535 0.6219 -1.05  0.2933  
##  cc.ha=Yes * age                  -0.0064 0.0188 -0.34  0.7353  
##  cc.ha=Yes * bmi                   0.0141 0.0431  0.33  0.7440  
##  cc.ha=Yes * map                   0.0144 0.0174  0.83  0.4072  
##  cc.neuro=Yes * phase=2           -0.1191 0.6524 -0.18  0.8551  
##  cc.neuro=Yes * sex=W              1.4851 0.7690  1.93  0.0535  
##  cc.neuro=Yes * race.black=black   0.1222 0.6374  0.19  0.8479  
##  cc.neuro=Yes * age                0.0197 0.0196  1.01  0.3133  
##  cc.neuro=Yes * bmi                0.0569 0.0443  1.29  0.1984  
##  cc.neuro=Yes * map               -0.0130 0.0191 -0.68  0.4964  
##  cc.vision=Yes * phase=2           0.0950 0.6256  0.15  0.8793  
##  cc.vision=Yes * sex=W             0.3248 0.7073  0.46  0.6461  
##  cc.vision=Yes * race.black=black -1.2265 0.6401 -1.92  0.0554  
##  cc.vision=Yes * age              -0.0269 0.0198 -1.36  0.1729  
##  cc.vision=Yes * bmi               0.0308 0.0412  0.75  0.4546  
##  cc.vision=Yes * map               0.0053 0.0188  0.28  0.7757  
##  cc.bp=Yes * phase=2               0.2486 1.1550  0.22  0.8296  
##  cc.bp=Yes * sex=W                 0.6239 1.1956  0.52  0.6018  
##  cc.bp=Yes * race.black=black      0.7420 1.2434  0.60  0.5507  
##  cc.bp=Yes * age                   0.0535 0.0345  1.55  0.1215  
##  cc.bp=Yes * bmi                   0.0518 0.0645  0.80  0.4220  
##  cc.bp=Yes * map                  -0.0482 0.0368 -1.31  0.1907  
##

anova(m)

##                 Wald Statistics          Response: examined 
## 
##  Factor                                                Chi-Square d.f. P     
##  cc.ha  (Factor+Higher Order Factors)                    4.21      7   0.7552
##   All Interactions                                       3.56      6   0.7357
##  cc.neuro  (Factor+Higher Order Factors)                10.80      7   0.1474
##   All Interactions                                       6.70      6   0.3492
##  cc.vision  (Factor+Higher Order Factors)               10.74      7   0.1502
##   All Interactions                                       5.92      6   0.4319
##  cc.bp  (Factor+Higher Order Factors)                    4.93      7   0.6681
##   All Interactions                                       4.36      6   0.6283
##  phase  (Factor+Higher Order Factors)                  168.88      5   <.0001
##   All Interactions                                       1.05      4   0.9028
##  sex  (Factor+Higher Order Factors)                      5.50      5   0.3577
##   All Interactions                                       4.90      4   0.2982
##  race.black  (Factor+Higher Order Factors)               8.03      5   0.1545
##   All Interactions                                       7.72      4   0.1024
##  age  (Factor+Higher Order Factors)                     11.23      5   0.0469
##   All Interactions                                      10.39      4   0.0344
##  bmi  (Factor+Higher Order Factors)                      3.11      5   0.6823
##   All Interactions                                       3.11      4   0.5392
##  map  (Factor+Higher Order Factors)                      8.32      5   0.1397
##   All Interactions                                       6.07      4   0.1939
##  cc.ha * phase  (Factor+Higher Order Factors)            0.44      1   0.5079
##  cc.ha * sex  (Factor+Higher Order Factors)              2.03      1   0.1543
##  cc.ha * race.black  (Factor+Higher Order Factors)       1.10      1   0.2933
##  cc.ha * age  (Factor+Higher Order Factors)              0.11      1   0.7353
##  cc.ha * bmi  (Factor+Higher Order Factors)              0.11      1   0.7440
##  cc.ha * map  (Factor+Higher Order Factors)              0.69      1   0.4072
##  cc.neuro * phase  (Factor+Higher Order Factors)         0.03      1   0.8551
##  cc.neuro * sex  (Factor+Higher Order Factors)           3.73      1   0.0535
##  cc.neuro * race.black  (Factor+Higher Order Factors)    0.04      1   0.8479
##  cc.neuro * age  (Factor+Higher Order Factors)           1.02      1   0.3133
##  cc.neuro * bmi  (Factor+Higher Order Factors)           1.65      1   0.1984
##  cc.neuro * map  (Factor+Higher Order Factors)           0.46      1   0.4964
##  cc.vision * phase  (Factor+Higher Order Factors)        0.02      1   0.8793
##  cc.vision * sex  (Factor+Higher Order Factors)          0.21      1   0.6461
##  cc.vision * race.black  (Factor+Higher Order Factors)   3.67      1   0.0554
##  cc.vision * age  (Factor+Higher Order Factors)          1.86      1   0.1729
##  cc.vision * bmi  (Factor+Higher Order Factors)          0.56      1   0.4546
##  cc.vision * map  (Factor+Higher Order Factors)          0.08      1   0.7757
##  cc.bp * phase  (Factor+Higher Order Factors)            0.05      1   0.8296
##  cc.bp * sex  (Factor+Higher Order Factors)              0.27      1   0.6018
##  cc.bp * race.black  (Factor+Higher Order Factors)       0.36      1   0.5507
##  cc.bp * age  (Factor+Higher Order Factors)              2.40      1   0.1215
##  cc.bp * bmi  (Factor+Higher Order Factors)              0.64      1   0.4220
##  cc.bp * map  (Factor+Higher Order Factors)              1.71      1   0.1907
##  TOTAL INTERACTION                                      27.55     24   0.2796
##  TOTAL                                                 184.20     34   <.0001

There are a couple of significant second order terms race and vision complaint and sex and neuro complaint. Make sure there is no evidence of additional interaction if we reduce the model to just keep those two terms:

(m2 <- lrm(examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + phase + 
             sex + race.black + age + bmi + map + race.black * cc.vision + 
             sex * cc.neuro, edfun))

## Logistic Regression Model
##  
##  lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + 
##      phase + sex + race.black + age + bmi + map + race.black * 
##      cc.vision + sex * cc.neuro, data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     257.61    R2       0.413    C       0.826    
##   No           417    d.f.            12    g        1.682    Dxy     0.651    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.377    gamma   0.653    
##  max |deriv| 2e-08                          gp       0.317    tau-a   0.315    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -2.4223 0.7900 -3.07  0.0022  
##  cc.ha=Yes                        -0.1946 0.2736 -0.71  0.4769  
##  cc.neuro=Yes                     -1.1675 0.4223 -2.76  0.0057  
##  cc.vision=Yes                     1.1042 0.3618  3.05  0.0023  
##  cc.bp=Yes                        -1.0681 0.5526 -1.93  0.0533  
##  phase=2                           2.8326 0.2167 13.07  <0.0001 
##  sex=W                            -0.0901 0.2567 -0.35  0.7255  
##  race.black=black                  0.2395 0.2186  1.10  0.2732  
##  age                              -0.0061 0.0061 -1.00  0.3178  
##  bmi                              -0.0005 0.0121 -0.04  0.9646  
##  map                               0.0082 0.0064  1.28  0.1994  
##  cc.vision=Yes * race.black=black -0.8202 0.4679 -1.75  0.0796  
##  cc.neuro=Yes * sex=W              0.7813 0.4409  1.77  0.0764  
##

# non-significant likelihood ratio test 
# between the two models so ok to reduce
lrtest(m, m2)

## 
## Model 1: examined ~ (cc.ha + cc.neuro + cc.vision + cc.bp) * (phase + 
##     sex + race.black + age + bmi + map)
## Model 2: examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + phase + sex + 
##     race.black + age + bmi + map + race.black * cc.vision + sex * 
##     cc.neuro
## 
## L.R. Chisq       d.f.          P 
## 24.4228124 22.0000000  0.3254631

# even though the two terms are non-sig at 0.05 they are at 0.1, 
# and in addition total interaction is significant at 0.05 level now
anova(m2)

##                 Wald Statistics          Response: examined 
## 
##  Factor                                                Chi-Square d.f. P     
##  cc.ha                                                   0.51      1   0.4769
##  cc.neuro  (Factor+Higher Order Factors)                 7.86      2   0.0196
##   All Interactions                                       3.14      1   0.0764
##  cc.vision  (Factor+Higher Order Factors)                9.39      2   0.0092
##   All Interactions                                       3.07      1   0.0796
##  cc.bp                                                   3.74      1   0.0533
##  phase                                                 170.92      1   <.0001
##  sex  (Factor+Higher Order Factors)                      3.78      2   0.1514
##   All Interactions                                       3.14      1   0.0764
##  race.black  (Factor+Higher Order Factors)               3.21      2   0.2006
##   All Interactions                                       3.07      1   0.0796
##  age                                                     1.00      1   0.3178
##  bmi                                                     0.00      1   0.9646
##  map                                                     1.65      1   0.1994
##  cc.vision * race.black  (Factor+Higher Order Factors)   3.07      1   0.0796
##  cc.neuro * sex  (Factor+Higher Order Factors)           3.14      1   0.0764
##  TOTAL INTERACTION                                       6.63      2   0.0364
##  TOTAL                                                 179.41     12   <.0001

Step 4. Confounding assessment

Now compare the adjusted ORs for the complaints (but we have to deal with the interactions too in this case because they effect the exposures of interest) with the model where age, BMI, and MAP are removed. Kept phase because it is such an important variable.

(m3 <- lrm(examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + phase + 
             sex + race.black + race.black * cc.vision + sex * cc.neuro, edfun))

## Logistic Regression Model
##  
##  lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + 
##      phase + sex + race.black + race.black * cc.vision + sex * 
##      cc.neuro, data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     255.53    R2       0.411    C       0.823    
##   No           417    d.f.             9    g        1.662    Dxy     0.646    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.271    gamma   0.666    
##  max |deriv| 1e-10                          gp       0.315    tau-a   0.312    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -1.8670 0.3763 -4.96  <0.0001 
##  cc.ha=Yes                        -0.1528 0.2696 -0.57  0.5708  
##  cc.neuro=Yes                     -1.1932 0.4204 -2.84  0.0045  
##  cc.vision=Yes                     1.0887 0.3607  3.02  0.0025  
##  cc.bp=Yes                        -0.6598 0.4567 -1.44  0.1486  
##  phase=2                           2.8075 0.2148 13.07  <0.0001 
##  sex=W                            -0.1193 0.2534 -0.47  0.6377  
##  race.black=black                  0.2728 0.2145  1.27  0.2035  
##  cc.vision=Yes * race.black=black -0.7835 0.4659 -1.68  0.0927  
##  cc.neuro=Yes * sex=W              0.8270 0.4374  1.89  0.0587  
##

You can eyeball that the only one that is an issue is the DBP > 120 condition. Nothing else will be confounded if we get rid age, BMI, and MAP. So, now, I’m going to look just at the DBP > 120 coefficient for all subsets of age, BMI, and MAP:

# create all subsets of age, bmi, map
# combn(list, n) creates the permutations on the rows
# when we use sapply we get to supply a different number to the function for 
# 1, 2, or 3 of the variables simultaneously
(vargroups <- sapply(1:3, function(x) combn(Cs(age, bmi, map), x)))

## [[1]]
##      [,1]  [,2]  [,3] 
## [1,] "age" "bmi" "map"
## 
## [[2]]
##      [,1]  [,2]  [,3] 
## [1,] "age" "age" "bmi"
## [2,] "bmi" "map" "map"
## 
## [[3]]
##      [,1] 
## [1,] "age"
## [2,] "bmi"
## [3,] "map"

# next I need to run each list element through a function 
# that will create the right model based on the rows of 
# each list element; paste them together with collapse = ' + '
(formulas <- lapply(vargroups, function(m) apply(m, 2, function(x) paste(x, collapse = ' + '))))

## [[1]]
## [1] "age" "bmi" "map"
## 
## [[2]]
## [1] "age + bmi" "age + map" "bmi + map"
## 
## [[3]]
## [1] "age + bmi + map"

# now make them a single vector of formulas
(formulas <- do.call(c, formulas))

## [1] "age"             "bmi"             "map"             "age + bmi"      
## [5] "age + map"       "bmi + map"       "age + bmi + map"

# paste the remainder of the formula using paste with sep = ' + ' 
# (collapse would have made one really long formula)
# rather than 7 single formulas
baseform <- "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase"
(formulas <- paste(baseform, formulas, sep = ' + '))

## [1] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age"            
## [2] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi"            
## [3] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + map"            
## [4] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi"      
## [5] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + map"      
## [6] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi + map"      
## [7] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi + map"

# and don't forget the baseform is a formula too!
(formulas <- c(baseform, formulas))

## [1] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase"                  
## [2] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age"            
## [3] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi"            
## [4] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + map"            
## [5] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi"      
## [6] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + map"      
## [7] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi + map"      
## [8] "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi + map"

# now run lrm on each of these formulas
(models <- lapply(formulas, function(form) lrm(as.formula(form), edfun)))

## [[1]]
## Logistic Regression Model
##  
##  lrm(formula = as.formula(form), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     255.53    R2       0.411    C       0.823    
##   No           417    d.f.             9    g        1.662    Dxy     0.646    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.271    gamma   0.666    
##  max |deriv| 1e-10                          gp       0.315    tau-a   0.312    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -1.8670 0.3763 -4.96  <0.0001 
##  cc.ha=Yes                        -0.1528 0.2696 -0.57  0.5708  
##  cc.neuro=Yes                     -1.1932 0.4204 -2.84  0.0045  
##  cc.vision=Yes                     1.0887 0.3607  3.02  0.0025  
##  cc.bp=Yes                        -0.6598 0.4567 -1.44  0.1486  
##  race.black=black                  0.2728 0.2145  1.27  0.2035  
##  sex=W                            -0.1193 0.2534 -0.47  0.6377  
##  phase=2                           2.8075 0.2148 13.07  <0.0001 
##  cc.vision=Yes * race.black=black -0.7835 0.4659 -1.68  0.0927  
##  cc.neuro=Yes * sex=W              0.8270 0.4374  1.89  0.0587  
##  
## 
## [[2]]
## Logistic Regression Model
##  
##  lrm(formula = as.formula(form), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     255.94    R2       0.411    C       0.824    
##   No           417    d.f.            10    g        1.669    Dxy     0.648    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.308    gamma   0.650    
##  max |deriv| 6e-09                          gp       0.316    tau-a   0.313    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -1.6764 0.4771 -3.51  0.0004  
##  cc.ha=Yes                        -0.1776 0.2722 -0.65  0.5142  
##  cc.neuro=Yes                     -1.1933 0.4204 -2.84  0.0045  
##  cc.vision=Yes                     1.0836 0.3606  3.01  0.0027  
##  cc.bp=Yes                        -0.6701 0.4576 -1.46  0.1431  
##  race.black=black                  0.2605 0.2154  1.21  0.2266  
##  sex=W                            -0.1163 0.2532 -0.46  0.6461  
##  phase=2                           2.8118 0.2152 13.07  <0.0001 
##  age                              -0.0037 0.0058 -0.64  0.5193  
##  cc.vision=Yes * race.black=black -0.7884 0.4660 -1.69  0.0906  
##  cc.neuro=Yes * sex=W              0.8292 0.4378  1.89  0.0582  
##  
## 
## [[3]]
## Logistic Regression Model
##  
##  lrm(formula = as.formula(form), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     255.54    R2       0.411    C       0.823    
##   No           417    d.f.            10    g        1.663    Dxy     0.647    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.275    gamma   0.653    
##  max |deriv| 3e-09                          gp       0.315    tau-a   0.313    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -1.9053 0.4839 -3.94  <0.0001 
##  cc.ha=Yes                        -0.1545 0.2699 -0.57  0.5671  
##  cc.neuro=Yes                     -1.1942 0.4205 -2.84  0.0045  
##  cc.vision=Yes                     1.0899 0.3609  3.02  0.0025  
##  cc.bp=Yes                        -0.6641 0.4581 -1.45  0.1471  
##  race.black=black                  0.2686 0.2171  1.24  0.2159  
##  sex=W                            -0.1228 0.2549 -0.48  0.6300  
##  phase=2                           2.8065 0.2149 13.06  <0.0001 
##  bmi                               0.0015 0.0120  0.13  0.8999  
##  cc.vision=Yes * race.black=black -0.7863 0.4665 -1.69  0.0919  
##  cc.neuro=Yes * sex=W              0.8294 0.4378  1.89  0.0582  
##  
## 
## [[4]]
## Logistic Regression Model
##  
##  lrm(formula = as.formula(form), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     256.61    R2       0.412    C       0.825    
##   No           417    d.f.            10    g        1.671    Dxy     0.651    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.318    gamma   0.652    
##  max |deriv| 1e-08                          gp       0.316    tau-a   0.315    
##                                             Brier    0.161                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -2.5391 0.7515 -3.38  0.0007  
##  cc.ha=Yes                        -0.1545 0.2703 -0.57  0.5675  
##  cc.neuro=Yes                     -1.1732 0.4218 -2.78  0.0054  
##  cc.vision=Yes                     1.1068 0.3618  3.06  0.0022  
##  cc.bp=Yes                        -0.9596 0.5402 -1.78  0.0757  
##  race.black=black                  0.2611 0.2150  1.21  0.2246  
##  sex=W                            -0.1012 0.2550 -0.40  0.6913  
##  phase=2                           2.8213 0.2156 13.09  <0.0001 
##  map                               0.0063 0.0060  1.04  0.2979  
##  cc.vision=Yes * race.black=black -0.8055 0.4671 -1.72  0.0846  
##  cc.neuro=Yes * sex=W              0.7892 0.4394  1.80  0.0725  
##  
## 
## [[5]]
## Logistic Regression Model
##  
##  lrm(formula = as.formula(form), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     255.96    R2       0.411    C       0.824    
##   No           417    d.f.            11    g        1.670    Dxy     0.648    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.310    gamma   0.650    
##  max |deriv| 6e-09                          gp       0.316    tau-a   0.313    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -1.7179 0.5623 -3.05  0.0023  
##  cc.ha=Yes                        -0.1795 0.2726 -0.66  0.5101  
##  cc.neuro=Yes                     -1.1944 0.4205 -2.84  0.0045  
##  cc.vision=Yes                     1.0850 0.3608  3.01  0.0026  
##  cc.bp=Yes                        -0.6750 0.4590 -1.47  0.1414  
##  race.black=black                  0.2557 0.2181  1.17  0.2410  
##  sex=W                            -0.1201 0.2546 -0.47  0.6372  
##  phase=2                           2.8107 0.2153 13.05  <0.0001 
##  age                              -0.0037 0.0058 -0.65  0.5175  
##  bmi                               0.0017 0.0120  0.14  0.8890  
##  cc.vision=Yes * race.black=black -0.7915 0.4665 -1.70  0.0897  
##  cc.neuro=Yes * sex=W              0.8320 0.4382  1.90  0.0576  
##  
## 
## [[6]]
## Logistic Regression Model
##  
##  lrm(formula = as.formula(form), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     257.61    R2       0.413    C       0.826    
##   No           417    d.f.            11    g        1.682    Dxy     0.651    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.377    gamma   0.653    
##  max |deriv| 2e-08                          gp       0.317    tau-a   0.315    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -2.4320 0.7594 -3.20  0.0014  
##  cc.ha=Yes                        -0.1952 0.2733 -0.71  0.4752  
##  cc.neuro=Yes                     -1.1680 0.4222 -2.77  0.0057  
##  cc.vision=Yes                     1.1046 0.3617  3.05  0.0023  
##  cc.bp=Yes                        -1.0678 0.5526 -1.93  0.0533  
##  race.black=black                  0.2381 0.2164  1.10  0.2711  
##  sex=W                            -0.0915 0.2550 -0.36  0.7198  
##  phase=2                           2.8322 0.2164 13.09  <0.0001 
##  age                              -0.0061 0.0061 -1.00  0.3181  
##  map                               0.0082 0.0063  1.29  0.1971  
##  cc.vision=Yes * race.black=black -0.8211 0.4675 -1.76  0.0790  
##  cc.neuro=Yes * sex=W              0.7824 0.4402  1.78  0.0755  
##  
## 
## [[7]]
## Logistic Regression Model
##  
##  lrm(formula = as.formula(form), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     256.61    R2       0.412    C       0.825    
##   No           417    d.f.            11    g        1.671    Dxy     0.650    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.317    gamma   0.652    
##  max |deriv| 1e-08                          gp       0.316    tau-a   0.315    
##                                             Brier    0.161                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -2.5345 0.7820 -3.24  0.0012  
##  cc.ha=Yes                        -0.1543 0.2706 -0.57  0.5686  
##  cc.neuro=Yes                     -1.1729 0.4219 -2.78  0.0054  
##  cc.vision=Yes                     1.1066 0.3619  3.06  0.0022  
##  cc.bp=Yes                        -0.9597 0.5402 -1.78  0.0756  
##  race.black=black                  0.2618 0.2173  1.21  0.2282  
##  sex=W                            -0.1006 0.2567 -0.39  0.6951  
##  phase=2                           2.8215 0.2158 13.07  <0.0001 
##  bmi                              -0.0003 0.0121 -0.02  0.9828  
##  map                               0.0063 0.0061  1.03  0.3013  
##  cc.vision=Yes * race.black=black -0.8050 0.4675 -1.72  0.0851  
##  cc.neuro=Yes * sex=W              0.7887 0.4401  1.79  0.0731  
##  
## 
## [[8]]
## Logistic Regression Model
##  
##  lrm(formula = as.formula(form), data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     257.61    R2       0.413    C       0.826    
##   No           417    d.f.            12    g        1.682    Dxy     0.651    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.377    gamma   0.653    
##  max |deriv| 2e-08                          gp       0.317    tau-a   0.315    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -2.4223 0.7900 -3.07  0.0022  
##  cc.ha=Yes                        -0.1946 0.2736 -0.71  0.4769  
##  cc.neuro=Yes                     -1.1675 0.4223 -2.76  0.0057  
##  cc.vision=Yes                     1.1042 0.3618  3.05  0.0023  
##  cc.bp=Yes                        -1.0681 0.5526 -1.93  0.0533  
##  race.black=black                  0.2395 0.2186  1.10  0.2732  
##  sex=W                            -0.0901 0.2567 -0.35  0.7255  
##  phase=2                           2.8326 0.2167 13.07  <0.0001 
##  age                              -0.0061 0.0061 -1.00  0.3178  
##  bmi                              -0.0005 0.0121 -0.04  0.9646  
##  map                               0.0082 0.0064  1.28  0.1994  
##  cc.vision=Yes * race.black=black -0.8202 0.4679 -1.75  0.0796  
##  cc.neuro=Yes * sex=W              0.7813 0.4409  1.77  0.0764  
##

# no need to look at each of them; just pull out the exp of the cc.bp coef
(ors <- sapply(models, function(x) exp(coef(x)['cc.bp=Yes'])))

## cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes cc.bp=Yes 
## 0.5169713 0.5116456 0.5147410 0.3830430 0.5091617 0.3437730 0.3829957 0.3436484

# match with the formulas
cbind(formulas, ors)

##           formulas                                                                                                             
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase"                  
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age"            
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi"            
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + map"            
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi"      
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + map"      
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + bmi + map"      
## cc.bp=Yes "examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + cc.vision * race.black + cc.neuro * sex + phase + age + bmi + map"
##           ors                
## cc.bp=Yes "0.51697130753288" 
## cc.bp=Yes "0.511645643233362"
## cc.bp=Yes "0.514741013826027"
## cc.bp=Yes "0.383043040433255"
## cc.bp=Yes "0.509161700241673"
## cc.bp=Yes "0.343773032111207"
## cc.bp=Yes "0.382995700075936"
## cc.bp=Yes "0.343648427724841"

Note the simplest model that fully controls for confounding is the 3rd from the bottom which requires keeping age and MAP. So, let’s plan to drop BMI.

Step 5. Precision

# new function that makes it much easier to create new models:
# . means what was there before
(m4 <- update(m2, . ~ . - bmi))

## Logistic Regression Model
##  
##  lrm(formula = examined ~ cc.ha + cc.neuro + cc.vision + cc.bp + 
##      phase + sex + race.black + age + map + cc.vision:race.black + 
##      cc.neuro:sex, data = edfun)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           704    LR chi2     257.61    R2       0.413    C       0.826    
##   No           417    d.f.            11    g        1.682    Dxy     0.651    
##   Yes          287    Pr(> chi2) <0.0001    gr       5.377    gamma   0.653    
##  max |deriv| 2e-08                          gp       0.317    tau-a   0.315    
##                                             Brier    0.162                     
##  
##                                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept                        -2.4320 0.7594 -3.20  0.0014  
##  cc.ha=Yes                        -0.1952 0.2733 -0.71  0.4752  
##  cc.neuro=Yes                     -1.1680 0.4222 -2.77  0.0057  
##  cc.vision=Yes                     1.1046 0.3617  3.05  0.0023  
##  cc.bp=Yes                        -1.0678 0.5526 -1.93  0.0533  
##  phase=2                           2.8322 0.2164 13.09  <0.0001 
##  sex=W                            -0.0915 0.2550 -0.36  0.7198  
##  race.black=black                  0.2381 0.2164  1.10  0.2711  
##  age                              -0.0061 0.0061 -1.00  0.3181  
##  map                               0.0082 0.0063  1.29  0.1971  
##  cc.vision=Yes * race.black=black -0.8211 0.4675 -1.76  0.0790  
##  cc.neuro=Yes * sex=W              0.7824 0.4402  1.78  0.0755  
##

Look at all the coefficients for the complaints and conditions. They are generally a little smaller. No need to keep age and BMI for political reasons so we have the final model (m4).

Step 6. Recheck assumptions and GOF

It’s all good:

# no nonlinearity at 0.05
anova(update(m4, . ~ . - age + rcs(age, 5) - map + rcs(map, 5)))

##                 Wald Statistics          Response: examined 
## 
##  Factor                                                Chi-Square d.f. P     
##  cc.ha                                                   0.65      1   0.4188
##  cc.neuro  (Factor+Higher Order Factors)                 8.04      2   0.0180
##   All Interactions                                       3.03      1   0.0818
##  cc.vision  (Factor+Higher Order Factors)                8.09      2   0.0175
##   All Interactions                                       2.25      1   0.1336
##  cc.bp                                                   0.32      1   0.5694
##  phase                                                 168.45      1   <.0001
##  sex  (Factor+Higher Order Factors)                      4.11      2   0.1283
##   All Interactions                                       3.03      1   0.0818
##  race.black  (Factor+Higher Order Factors)               2.26      2   0.3233
##   All Interactions                                       2.25      1   0.1336
##  age                                                     7.52      4   0.1109
##   Nonlinear                                              6.37      3   0.0951
##  map                                                     6.52      4   0.1637
##   Nonlinear                                              5.26      3   0.1535
##  cc.vision * race.black  (Factor+Higher Order Factors)   2.25      1   0.1336
##  cc.neuro * sex  (Factor+Higher Order Factors)           3.03      1   0.0818
##  TOTAL NONLINEAR                                        11.79      6   0.0669
##  TOTAL INTERACTION                                       5.60      2   0.0609
##  TOTAL NONLINEAR + INTERACTION                          18.12      8   0.0204
##  TOTAL                                                 180.79     17   <.0001

# no collinearity (only intercept and map - safe to ignore)
colldiag.alt(m4)

## Condition
## Index    Variance Decomposition Proportions
##            Intercept cc.ha=Yes cc.neuro=Yes cc.vision=Yes cc.bp=Yes phase=2
## 1    1.000 0.000     0.002     0.001        0.002         0.001     0.005  
## 2    2.042 0.000     0.001     0.018        0.059         0.000     0.004  
## 3    2.256 0.000     0.015     0.020        0.026         0.097     0.003  
## 4    2.669 0.000     0.048     0.004        0.004         0.351     0.000  
## 5    3.539 0.001     0.000     0.000        0.151         0.010     0.007  
## 6    4.095 0.001     0.031     0.031        0.000         0.029     0.171  
## 7    4.840 0.001     0.001     0.020        0.022         0.001     0.472  
## 8    5.986 0.000     0.232     0.061        0.235         0.045     0.116  
## 9    6.697 0.000     0.299     0.000        0.265         0.061     0.150  
## 10   8.460 0.002     0.006     0.529        0.017         0.012     0.012  
## 11  12.304 0.078     0.331     0.260        0.159         0.218     0.014  
## 12  26.633 0.917     0.033     0.055        0.059         0.176     0.045  
##    sex=W race.black=black age   map   cc.vision=Yes * race.black=black
## 1  0.004 0.005            0.002 0.000 0.001                           
## 2  0.000 0.001            0.000 0.000 0.082                           
## 3  0.000 0.001            0.000 0.000 0.044                           
## 4  0.002 0.002            0.000 0.000 0.020                           
## 5  0.000 0.322            0.009 0.001 0.116                           
## 6  0.170 0.000            0.011 0.000 0.000                           
## 7  0.061 0.287            0.007 0.002 0.150                           
## 8  0.115 0.273            0.004 0.001 0.475                           
## 9  0.225 0.040            0.172 0.001 0.077                           
## 10 0.331 0.013            0.287 0.000 0.003                           
## 11 0.057 0.055            0.502 0.140 0.015                           
## 12 0.034 0.001            0.005 0.854 0.015                           
##    cc.neuro=Yes * sex=W
## 1  0.002               
## 2  0.024               
## 3  0.032               
## 4  0.005               
## 5  0.001               
## 6  0.110               
## 7  0.005               
## 8  0.001               
## 9  0.117               
## 10 0.567               
## 11 0.101               
## 12 0.035

# model has no evidence of lack of fit
residuals(m4, "gof") # need to put x = TRUE, y = TRUE in fit for this to work (see error)

## Error in residuals.lrm(m4, "gof"): you did not specify y=TRUE in the fit

residuals(update(m4, . ~ ., x = TRUE, y = TRUE), "gof")

## Sum of squared errors     Expected value|H0                    SD 
##           113.6978708           113.4894754             0.4999796 
##                     Z                     P 
##             0.4168078             0.6768190

Step 7. Report

Create tables…