Exercise

R comes with functions for ordinary linear regression (lm), for generalized linear models (glm), and time-to-event models (package survival - do not need to install - comes with R).
However, a more powerful package for these common models is rms. You’ll see some messages after your load the package.

## install.packages("rms")
library(rms)

We’ll use data in the adm package:

# library(devtools); install_github("advdatamgmt/adm")
library(adm)
## 
## Attaching package: 'adm'
## The following object is masked from 'package:Hmisc':
## 
##     html

Linear regression

Let’s start with a linear regression of the optic nerve head data using the “built-in” version:

lm(mean ~ age, onhlong)
## 
## Call:
## lm(formula = mean ~ age, data = onhlong)
## 
## Coefficients:
## (Intercept)          age  
##      2.0065       0.1399

You don’t get much back that way. It turns out that R returns an object that has a lot of the useful data. You can access many aspects of the model. For a quick summary start with summary:

summary(lm(mean ~ age, onhlong))
## 
## Call:
## lm(formula = mean ~ age, data = onhlong)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8121 -0.6617  0.2338  0.6715  1.2797 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.00651    0.10378  19.335  < 2e-16 ***
## age          0.13995    0.01827   7.661 7.12e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7968 on 112 degrees of freedom
## Multiple R-squared:  0.3439, Adjusted R-squared:  0.338 
## F-statistic:  58.7 on 1 and 112 DF,  p-value: 7.125e-12

That is you usual summary… it will get tedious to keep typing the model so save it in a variable:

m <- lm(mean ~ age, onhlong)
summary(m)
## 
## Call:
## lm(formula = mean ~ age, data = onhlong)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8121 -0.6617  0.2338  0.6715  1.2797 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.00651    0.10378  19.335  < 2e-16 ***
## age          0.13995    0.01827   7.661 7.12e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7968 on 112 degrees of freedom
## Multiple R-squared:  0.3439, Adjusted R-squared:  0.338 
## F-statistic:  58.7 on 1 and 112 DF,  p-value: 7.125e-12
# coefficients
coef(m)
## (Intercept)         age 
##   2.0065072   0.1399477
# diagnostic plots
plot(m)

# influence measures
influence.measures(m)
## Influence measures of
##   lm(formula = mean ~ age, data = onhlong) :
## 
##       dfb.1_   dfb.age    dffit cov.r   cook.d     hat inf
## 1   -0.08896  0.032659 -0.09777 1.011 4.78e-03 0.00987    
## 2   -0.03687  0.012971 -0.04085 1.025 8.41e-04 0.00976    
## 3   -0.15262  0.097382 -0.15309 1.005 1.17e-02 0.01473    
## 4    0.11563 -0.072672  0.11613 1.016 6.75e-03 0.01442    
## 5    0.00468  0.002013  0.00870 1.027 3.81e-05 0.00927    
## 6   -0.11027  0.073173 -0.11037 1.020 6.10e-03 0.01565    
## 7   -0.15860  0.105248 -0.15875 1.005 1.25e-02 0.01565    
## 8    0.01792  0.000322  0.02524 1.026 3.21e-04 0.00877    
## 9    0.05986 -0.019993  0.06697 1.020 2.25e-03 0.00963    
## 10  -0.08797  0.047602 -0.08996 1.019 4.06e-03 0.01218    
## 11   0.00201 -0.001082  0.00206 1.031 2.13e-06 0.01213    
## 12   0.04087 -0.016278  0.04421 1.025 9.84e-04 0.01015    
## 13  -0.21844  0.149166 -0.21847 0.984 2.35e-02 0.01643    
## 14  -0.16493  0.111371 -0.16499 1.004 1.35e-02 0.01611    
## 15  -0.01895  0.012050 -0.01901 1.033 1.82e-04 0.01466    
## 16  -0.13828  0.060978 -0.14665 0.993 1.07e-02 0.01061    
## 17  -0.13974  0.068785 -0.14519 0.996 1.05e-02 0.01131    
## 18  -0.10898  0.069536 -0.10932 1.019 5.99e-03 0.01473    
## 19  -0.13417  0.084323 -0.13474 1.010 9.06e-03 0.01442    
## 20   0.13026 -0.078226  0.13138 1.009 8.61e-03 0.01359    
## 21  -0.05403 -0.048891 -0.13179 0.998 8.63e-03 0.01017    
## 22  -0.07187  0.045861 -0.07210 1.027 2.61e-03 0.01473    
## 23  -0.13490  0.092184 -0.13492 1.015 9.09e-03 0.01645    
## 24   0.22695 -0.704426 -0.79343 0.830 2.81e-01 0.04142   *
## 25   0.15272 -0.078901  0.15734 0.992 1.23e-02 0.01172    
## 26  -0.03797  0.023194 -0.03823 1.031 7.37e-04 0.01388    
## 27  -0.21508  0.146971 -0.21510 0.985 2.28e-02 0.01645    
## 31   0.09021 -0.059861  0.09029 1.025 4.09e-03 0.01565    
## 32   0.02051  0.018561  0.05003 1.024 1.26e-03 0.01017    
## 33   0.02518  0.034182  0.07615 1.020 2.91e-03 0.01099    
## 34   0.03412 -0.098001 -0.10880 1.063 5.96e-03 0.04649   *
## 35   0.02317  0.050774  0.09583 1.017 4.60e-03 0.01220    
## 36   0.04934 -0.011033  0.05899 1.020 1.75e-03 0.00909    
## 37   0.13359 -0.071933  0.13671 1.003 9.30e-03 0.01213    
## 38   0.08252 -0.020225  0.09733 1.009 4.74e-03 0.00917    
## 40   0.00891  0.019534  0.03687 1.029 6.85e-04 0.01220    
## 41   0.00314  0.070223  0.10074 1.025 5.09e-03 0.01706    
## 43   0.03845 -0.108445 -0.11998 1.064 7.24e-03 0.04793   *
## 44   0.04581 -0.030933  0.04582 1.032 1.06e-03 0.01611    
## 46   0.03843 -0.011564  0.04382 1.024 9.67e-04 0.00943    
## 47   0.03416  0.026261  0.07747 1.017 3.01e-03 0.00991    
## 48  -0.00677  0.029589  0.03526 1.048 6.27e-04 0.02965    
## 49   0.13717 -0.086208  0.13776 1.009 9.46e-03 0.01442    
## 50   0.01204  0.010053  0.02831 1.027 4.04e-04 0.01004    
## 51   0.05893  0.004702  0.08663 1.012 3.76e-03 0.00880    
## 53   0.06817 -0.017967  0.07949 1.015 3.17e-03 0.00924    
## 54   0.04164  0.048082  0.11491 1.007 6.59e-03 0.01063    
## 55  -0.00766  0.073678  0.09536 1.033 4.57e-03 0.02176    
## 60   0.01069  0.043985  0.07229 1.026 2.63e-03 0.01393    
## 61   0.14216 -0.064527  0.14993 0.992 1.11e-02 0.01077    
## 62   0.07257 -0.000685  0.10025 1.006 5.02e-03 0.00877    
## 63   0.11438 -0.243451 -0.25510 1.117 3.27e-02 0.09830   *
## 64   0.13189 -0.087523  0.13202 1.014 8.71e-03 0.01565    
## 65   0.01828  0.075209  0.12361 1.013 7.63e-03 0.01393    
## 66   0.08134 -0.178334 -0.18791 1.109 1.78e-02 0.08831   *
## 67   0.01336  0.054972  0.09035 1.022 4.10e-03 0.01393    
## 68   0.17438 -0.109592  0.17512 0.995 1.52e-02 0.01442    
## 110  0.07135 -0.026193  0.07841 1.017 3.08e-03 0.00987    
## 28   0.11471 -0.040353  0.12710 0.998 8.03e-03 0.00976    
## 39  -0.16203  0.103389 -0.16254 1.001 1.31e-02 0.01473    
## 42  -0.05466  0.034352 -0.05489 1.029 1.52e-03 0.01442    
## 52  -0.03463 -0.014879 -0.06428 1.019 2.08e-03 0.00927    
## 69  -0.11668  0.077431 -0.11679 1.018 6.83e-03 0.01565    
## 71  -0.11989  0.079561 -0.12001 1.018 7.21e-03 0.01565    
## 81  -0.07473 -0.001342 -0.10523 1.004 5.52e-03 0.00877    
## 91  -0.04655  0.015548 -0.05208 1.023 1.37e-03 0.00963    
## 101 -0.10452  0.056562 -0.10689 1.014 5.72e-03 0.01218    
## 111  0.02107 -0.011346  0.02156 1.030 2.34e-04 0.01213    
## 121 -0.11990  0.047754 -0.12970 0.999 8.36e-03 0.01015    
## 131 -0.18145  0.123908 -0.18148 0.999 1.63e-02 0.01643    
## 141 -0.09194  0.062086 -0.09197 1.025 4.25e-03 0.01611    
## 151 -0.09573  0.060878 -0.09605 1.022 4.63e-03 0.01466    
## 161 -0.12070  0.053226 -0.12801 1.001 8.15e-03 0.01061    
## 171 -0.07930  0.039035 -0.08239 1.019 3.41e-03 0.01131    
## 181 -0.12140  0.077464 -0.12178 1.015 7.42e-03 0.01473    
## 191 -0.10958  0.068869 -0.11005 1.018 6.06e-03 0.01442    
## 201  0.15416 -0.092579  0.15548 1.000 1.20e-02 0.01359    
## 211 -0.05403 -0.048891 -0.13179 0.998 8.63e-03 0.01017    
## 221 -0.08422  0.053737 -0.08448 1.024 3.59e-03 0.01473    
## 231 -0.22693  0.155074 -0.22696 0.980 2.53e-02 0.01645    
## 241  0.03131 -0.097176 -0.10945 1.057 6.03e-03 0.04142   *
## 251 -0.08361  0.043193 -0.08614 1.019 3.72e-03 0.01172    
## 261 -0.17102  0.104459 -0.17218 0.994 1.47e-02 0.01388    
## 271 -0.23203  0.158557 -0.23206 0.978 2.64e-02 0.01645    
## 311  0.06788 -0.045043  0.06794 1.029 2.32e-03 0.01565    
## 321  0.02785  0.025204  0.06794 1.020 2.32e-03 0.01017    
## 331  0.02254  0.030593  0.06816 1.022 2.33e-03 0.01099    
## 341  0.05015 -0.144042 -0.15992 1.058 1.28e-02 0.04649   *
## 351  0.02317  0.050774  0.09583 1.017 4.60e-03 0.01220    
## 361  0.05136 -0.011486  0.06140 1.020 1.90e-03 0.00909    
## 371  0.11972 -0.064468  0.12252 1.008 7.49e-03 0.01213    
## 381  0.07423 -0.018193  0.08755 1.012 3.84e-03 0.00917    
## 401  0.01568  0.034377  0.06489 1.024 2.12e-03 0.01220    
## 411  0.00345  0.077229  0.11079 1.023 6.15e-03 0.01706    
## 431  0.06162 -0.173801 -0.19228 1.055 1.85e-02 0.04793   *
## 441  0.07490 -0.050580  0.07493 1.028 2.82e-03 0.01611    
## 461  0.03412 -0.010268  0.03891 1.025 7.63e-04 0.00943    
## 471  0.04312  0.033146  0.09778 1.011 4.78e-03 0.00991    
## 481 -0.00762  0.033311  0.03970 1.048 7.95e-04 0.02965    
## 491  0.12178 -0.076533  0.12230 1.014 7.48e-03 0.01442    
## 501  0.01312  0.010952  0.03084 1.027 4.79e-04 0.01004    
## 511  0.06545  0.005222  0.09620 1.008 4.63e-03 0.00880    
## 531  0.07027 -0.018520  0.08194 1.014 3.37e-03 0.00924    
## 541  0.04452  0.051410  0.12287 1.003 7.52e-03 0.01063    
## 551 -0.00706  0.067825  0.08778 1.034 3.88e-03 0.02176    
## 601  0.00936  0.038503  0.06328 1.027 2.02e-03 0.01393    
## 611  0.14728 -0.066849  0.15533 0.989 1.19e-02 0.01077    
## 621  0.06910 -0.000652  0.09546 1.008 4.56e-03 0.00877    
## 631  0.12617 -0.268550 -0.28140 1.114 3.97e-02 0.09830   *
## 641  0.12224 -0.081119  0.12236 1.017 7.49e-03 0.01565    
## 651  0.01694  0.069676  0.11452 1.015 6.56e-03 0.01393    
## 661  0.09907 -0.217219 -0.22888 1.106 2.63e-02 0.08831   *
## 671  0.01069  0.043985  0.07229 1.026 2.63e-03 0.01393    
## 681  0.19628 -0.123358  0.19712 0.985 1.91e-02 0.01442

After you have loaded the rms package you can use the ols function. OLS is ordinary least squares:

m <- ols(mean ~ age, onhlong)
m # no need for summary
## Linear Regression Model
##  
##  ols(formula = mean ~ age, data = onhlong)
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs     114    LR chi2     48.04    R2       0.344    
##  sigma0.7968    d.f.            1    R2 adj   0.338    
##  d.f.    112    Pr(> chi2) 0.0000    g        0.597    
##  
##  Residuals
##  
##      Min      1Q  Median      3Q     Max 
##  -2.8121 -0.6617  0.2338  0.6715  1.2797 
##  
##  
##            Coef   S.E.   t     Pr(>|t|)
##  Intercept 2.0065 0.1038 19.33 <0.0001 
##  age       0.1399 0.0183  7.66 <0.0001 
## 
# you can make nice plots with rms, but first do the following to let
# rms know the distribution of your dataset
options(datadist = "dd")
dd <- datadist(onhlong)

# now you can plot the actual line
plot(Predict(m, age))

# and if you do summary you get your CIs and an effect size based on the
# IQR of the different variables
summary(m)
##              Effects              Response : mean 
## 
##  Factor Low  High Diff. Effect  S.E.     Lower 0.95 Upper 0.95
##  age    0.67 5.83 5.16  0.72213 0.094256 0.53537    0.90889
# let's add in the case variable
m2 <- ols(mean ~ age + case, onhlong)
m2
## Linear Regression Model
##  
##  ols(formula = mean ~ age + case, data = onhlong)
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs     114    LR chi2    151.36    R2       0.735    
##  sigma0.5087    d.f.            2    R2 adj   0.730    
##  d.f.    111    Pr(> chi2) 0.0000    g        0.928    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -1.47290 -0.28740 -0.03718  0.24526  1.70644 
##  
##  
##            Coef    S.E.   t      Pr(>|t|)
##  Intercept  2.9292 0.0979  29.91 <0.0001 
##  age        0.0647 0.0131   4.95 <0.0001 
##  case=1    -1.3711 0.1072 -12.80 <0.0001 
## 
plot(Predict(m2, case))

summary(m2)
##              Effects              Response : mean 
## 
##  Factor     Low  High Diff. Effect  S.E.     Lower 0.95 Upper 0.95
##  age        0.67 5.83 5.16   0.3336 0.067406  0.20003    0.46717  
##  case - 1:0 1.00 2.00   NA  -1.3711 0.107150 -1.58340   -1.15880
plot(m2, which = 1) # check the help for plot.lm

which.influence(m2)
## Error in residuals.ols(fit, "dfbetas"): did not specify x=TRUE in fit

Sometimes rms will ask you to specify x or y in the fit. Do it like this:

m2 <- ols(mean ~ age + case, onhlong, x = TRUE)

It just saves extra info about the data which makes the model object bigger but doesn’t change anything else:

which.influence(m2)
## $Intercept
## [1] 24 39 81
## 
## $age
## [1] 24 81 95
## 
## $case
## [1]  5  8  9 24 25 58 59 81
show.influence(which.influence(m2), onhlong)
##    Count    age case
## 5      1   4.92   *1
## 8      1   4.00   *1
## 9      1   2.67   *1
## 24     3 *11.83   *1
## 25     1   1.58   *1
## 39     1   0.58    1
## 81     3 * 4.00   *1

If you want to specify interaction (R keeps things HWF) use the * for all the terms:

(m3 <- ols(mean ~ age * case , onhlong))
## Linear Regression Model
##  
##  ols(formula = mean ~ age * case, data = onhlong)
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs     114    LR chi2    154.29    R2       0.742    
##  sigma0.5045    d.f.            3    R2 adj   0.735    
##  d.f.    110    Pr(> chi2) 0.0000    g        0.932    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -1.89925 -0.28756 -0.02924  0.22494  1.72189 
##  
##  
##               Coef    S.E.   t      Pr(>|t|)
##  Intercept     2.9922 0.1040  28.78 <0.0001 
##  age           0.0535 0.0145   3.68 0.0004  
##  case=1       -1.5176 0.1370 -11.08 <0.0001 
##  age * case=1  0.0543 0.0320   1.69 0.0930  
## 

Finally, one useful function for removing or adding terms as you model is update. The . means keep everything on that side and the - will take terms away while + would add:

update(m2, . ~ . - age)
## Linear Regression Model
##  
##  ols(formula = mean ~ case, data = onhlong, x = TRUE)
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs     114    LR chi2    128.63    R2       0.676    
##  sigma0.5596    d.f.            1    R2 adj   0.674    
##  d.f.    112    Pr(> chi2) 0.0000    g        0.806    
##  
##  Residuals
##  
##       Min       1Q   Median       3Q      Max 
##  -1.06340 -0.36089 -0.05339  0.30661  1.69660 
##  
##  
##            Coef    S.E.   t      Pr(>|t|)
##  Intercept  3.2934 0.0711  46.34 <0.0001 
##  case=1    -1.6100 0.1052 -15.30 <0.0001 
## 
update(m2, . ~ . - case)
## Linear Regression Model
##  
##  ols(formula = mean ~ age, data = onhlong, x = TRUE)
##  
##                 Model Likelihood     Discrimination    
##                    Ratio Test           Indexes        
##  Obs     114    LR chi2     48.04    R2       0.344    
##  sigma0.7968    d.f.            1    R2 adj   0.338    
##  d.f.    112    Pr(> chi2) 0.0000    g        0.597    
##  
##  Residuals
##  
##      Min      1Q  Median      3Q     Max 
##  -2.8121 -0.6617  0.2338  0.6715  1.2797 
##  
##  
##            Coef   S.E.   t     Pr(>|t|)
##  Intercept 2.0065 0.1038 19.33 <0.0001 
##  age       0.1399 0.0183  7.66 <0.0001 
## 

You can also add additional valid arguements (like x = TRUE):

which.influence(update(m3, x = TRUE))
## $Intercept
## [1] 39 85 96
## 
## $age
## [1] 39 95
## 
## $case
## [1]  4 24 25 39 81
## 
## $`age * case`
## [1]  5 24 81

Logistic regression

For logistic regression use rms’s lrm instead of ols.

(m <- lrm(case ~ age, onhlong))
## Logistic Regression Model
##  
##  lrm(formula = case ~ age, data = onhlong)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           114    LR chi2      29.36    R2       0.304    C       0.793    
##   0             62    d.f.             1    g        1.517    Dxy     0.586    
##   1             52    Pr(> chi2) <0.0001    gr       4.560    gamma   0.594    
##  max |deriv| 4e-08                          gp       0.268    tau-a   0.293    
##                                             Brier    0.185                     
##  
##            Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept  1.0033 0.3149  3.19  0.0014  
##  age       -0.3556 0.0844 -4.21  <0.0001 
## 
plot(Predict(m, age)) # remember that the model forces the log-odds to be linear

summary(m)
##              Effects              Response : case 
## 
##  Factor      Low  High Diff. Effect  S.E.    Lower 0.95 Upper 0.95
##  age         0.67 5.83 5.16  -1.8351 0.43574 -2.689100  -0.98103  
##   Odds Ratio 0.67 5.83 5.16   0.1596      NA  0.067942   0.37493
summary(m, age = c(4,5))
##              Effects              Response : case 
## 
##  Factor      Low High Diff. Effect   S.E.     Lower 0.95 Upper 0.95
##  age         4   5    1     -0.35563 0.084446 -0.52114   -0.19012  
##   Odds Ratio 4   5    1      0.70073       NA  0.59384    0.82686

Time-to-event

For time-to-event you can use the built-in functions from the survival package.

library(survival)

# the Surv specifies the follow-up time and the 
# status of the subject; survfit is K-M
survfit(Surv(futime, fustat) ~ rx, data = ovarian)
## Call: survfit(formula = Surv(futime, fustat) ~ rx, data = ovarian)
## 
##       n events median 0.95LCL 0.95UCL
## rx=1 13      7    638     268      NA
## rx=2 13      5     NA     475      NA
survdiff(Surv(futime, fustat) ~ rx, data = ovarian)
## Call:
## survdiff(formula = Surv(futime, fustat) ~ rx, data = ovarian)
## 
##       N Observed Expected (O-E)^2/E (O-E)^2/V
## rx=1 13        7     5.23     0.596      1.06
## rx=2 13        5     6.77     0.461      1.06
## 
##  Chisq= 1.1  on 1 degrees of freedom, p= 0.303
# stratify us strata in the survival package
survdiff(Surv(time, status) ~ pat.karno + strata(inst), data=lung)
## Call:
## survdiff(formula = Surv(time, status) ~ pat.karno + strata(inst), 
##     data = lung)
## 
## n=224, 4 observations deleted due to missingness.
## 
##                N Observed Expected (O-E)^2/E (O-E)^2/V
## pat.karno=30   2        1    0.692   0.13720   0.15752
## pat.karno=40   2        1    1.099   0.00889   0.00973
## pat.karno=50   4        4    1.166   6.88314   7.45359
## pat.karno=60  30       27   16.298   7.02790   9.57333
## pat.karno=70  41       31   26.358   0.81742   1.14774
## pat.karno=80  50       38   41.938   0.36978   0.60032
## pat.karno=90  60       38   47.242   1.80800   3.23078
## pat.karno=100 35       21   26.207   1.03451   1.44067
## 
##  Chisq= 21.4  on 7 degrees of freedom, p= 0.00326
# plot it
plot(survfit(Surv(futime, fustat) ~ rx, data = ovarian))

# Cox model
(m <- coxph(Surv(futime, fustat) ~ rx + age, data = ovarian))
## Call:
## coxph(formula = Surv(futime, fustat) ~ rx + age, data = ovarian)
## 
##        coef exp(coef) se(coef)     z      p
## rx  -0.8040    0.4475   0.6320 -1.27 0.2034
## age  0.1473    1.1587   0.0461  3.19 0.0014
## 
## Likelihood ratio test=15.9  on 2 df, p=0.000355
## n= 26, number of events= 12
summary(m)
## Call:
## coxph(formula = Surv(futime, fustat) ~ rx + age, data = ovarian)
## 
##   n= 26, number of events= 12 
## 
##         coef exp(coef) se(coef)      z Pr(>|z|)   
## rx  -0.80397   0.44755  0.63205 -1.272  0.20337   
## age  0.14733   1.15873  0.04615  3.193  0.00141 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##     exp(coef) exp(-coef) lower .95 upper .95
## rx     0.4475      2.234    0.1297     1.545
## age    1.1587      0.863    1.0585     1.268
## 
## Concordance= 0.798  (se = 0.091 )
## Rsquare= 0.457   (max possible= 0.932 )
## Likelihood ratio test= 15.89  on 2 df,   p=0.0003551
## Wald test            = 13.47  on 2 df,   p=0.00119
## Score (logrank) test = 18.56  on 2 df,   p=9.341e-05
# plot(m) - doesn't work

With the rms package:

# Cox model
## only need once per session and we did above
# options(datadist = "dd")
dd <- datadist(ovarian)
(m <- cph(Surv(futime, fustat) ~ rx + age, data = ovarian))
## Cox Proportional Hazards Model
##  
##  cph(formula = Surv(futime, fustat) ~ rx + age, data = ovarian)
##  
##                      Model Tests       Discrimination    
##                                           Indexes        
##  Obs        26    LR chi2     15.89    R2       0.490    
##  Events     12    d.f.            2    Dxy      0.596    
##  Center 7.0687    Pr(> chi2) 0.0004    g        1.734    
##                   Score chi2  18.56    gr       5.661    
##                   Pr(> chi2) 0.0001                      
##  
##      Coef    S.E.   Wald Z Pr(>|Z|)
##  rx  -0.8040 0.6320 -1.27  0.2034  
##  age  0.1473 0.0461  3.19  0.0014  
## 
# need x and y
m <- update(m, x = TRUE, y = TRUE)
survplot(m, rx)

survplot(m, age)

Extra rms functions

More powerful than summary:

describe(ovarian)
## ovarian 
## 
##  6  Variables      26  Observations
## ---------------------------------------------------------------------------
## futime 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       26        0       26        1    599.5    390.3    125.2    212.0 
##      .25      .50      .75      .90      .95 
##    368.0    476.0    794.8   1117.5   1186.8 
## 
## lowest :   59  115  156  268  329, highest: 1040 1106 1129 1206 1227
## ---------------------------------------------------------------------------
## fustat 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##       26        0        2    0.747       12   0.4615   0.5169 
## 
## ---------------------------------------------------------------------------
## age 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       26        0       26        1    56.17    11.64    40.23    43.13 
##      .25      .50      .75      .90      .95 
##    50.17    56.85    62.38    69.40    73.95 
## 
## lowest : 38.8932 39.2712 43.1233 43.1370 44.2055
## highest: 64.4247 66.4658 72.3315 74.4932 74.5041
## ---------------------------------------------------------------------------
## resid.ds 
##        n  missing distinct     Info     Mean      Gmd 
##       26        0        2    0.733    1.577   0.5077 
##                       
## Value          1     2
## Frequency     11    15
## Proportion 0.423 0.577
## ---------------------------------------------------------------------------
## rx 
##        n  missing distinct     Info     Mean      Gmd 
##       26        0        2    0.751      1.5     0.52 
##                   
## Value        1   2
## Frequency   13  13
## Proportion 0.5 0.5
## ---------------------------------------------------------------------------
## ecog.ps 
##        n  missing distinct     Info     Mean      Gmd 
##       26        0        2    0.747    1.462   0.5169 
##                       
## Value          1     2
## Frequency     14    12
## Proportion 0.538 0.462
## ---------------------------------------------------------------------------

Really awesome for univariate analysis:

summary(rx ~ age + futime + fustat, ovarian, method = "reverse", test = TRUE)
## 
## 
## Descriptive Statistics by rx
## 
## +------+---------------------------+---------------------------+------------------------------+
## |      |1                          |2                          |  Test                        |
## |      |(N=13)                     |(N=13)                     |Statistic                     |
## +------+---------------------------+---------------------------+------------------------------+
## |age   |    43.1370/56.4301/66.4658|    53.9068/57.0521/59.6301|   F=0.14 d.f.=1,24 P=0.709   |
## +------+---------------------------+---------------------------+------------------------------+
## |futime|          268/448/803      |          421/563/770      |   F=1.48 d.f.=1,24 P=0.236   |
## +------+---------------------------+---------------------------+------------------------------+
## |fustat|           54%  (7)        |           38%  (5)        |Chi-square=0.62 d.f.=1 P=0.431|
## +------+---------------------------+---------------------------+------------------------------+

Read more about the function above in the help for summary.formula.

Explore and Extend

Look at the edfun dataset in the adm package. It has four variables representing four presenting complaints (headache, neurologic, vision, and elevated blood pressure), phase of the study (1 or 2), sex, race, age, body mass index, and mean arterial pressure (MAP). Is there a linear relationship between higher MAP and older age?