Exercise

High-level plotting functions

High-level plotting functions clear the canvas (where the plot is drawn) and produce an entire plot, including axis labels, points, lines, boxes, etc., depending on the type of plot requested. All the usual types of figures are available. We will see many examples using random and built-in datasets. For some of the examples, you may need to break down the data or the processing to understand the options that the high-level plot function takes. Remember to work inside out.

Scatter plot

set.seed(536) # set seed to keep random plots the same
# create 50 random normally distributed points
x <- rnorm(50)
y <- rnorm(50)
plot(x, y)

Bar plot

# 50 random Poisson distributed points with mean 3
(x <- rpois(50, lambda = 3))
##  [1] 3 3 2 5 3 2 2 1 4 3 2 1 1 2 3 3 4 1 4 2 2 7 1 5 7 1 4 0 4 1 3 3 5 2 3
## [36] 5 1 2 3 2 7 2 1 3 4 5 4 1 3 2
table(x)
## x
##  0  1  2  3  4  5  7 
##  1 10 12 12  7  5  3
barplot(table(x))

Curve

curve plots a function, or an expression written in terms of x:

curve(exp)

curve(x ^ 3 - 3 * x)
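
By default curve draws over the interval from 0 to 1, which hides most of the shape of the cubic above. As a small sketch (the interval from -2 to 2 and the second expression are arbitrary choices for illustration), you can set the range with from and to, and overlay a second curve with add = TRUE:

```r
# draw the cubic over a wider interval than the default 0 to 1
curve(x^3 - 3 * x, from = -2, to = 2)
# overlay a second expression on the same plot as a dashed line
curve(x^2 - 2, from = -2, to = 2, add = TRUE, lty = 2)
```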

Histogram

x <- rnorm(100)
hist(x)

Dot chart

head(VADeaths)  # displays up to 6 rows by default; VADeaths has only 5
##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0
dotchart(VADeaths)

Image

x <- 10*(1:NROW(volcano))
y <- 10*(1:NCOL(volcano))
image(x, y, volcano) 

Matrix plot

x <- 1:10
a <- c(15, 36, 54, 60, 68, 71, 73, 75, 78, 78)
b <- c(20, 49, 58, 69, 75, 80, 83, 86, 88, 89)
c <- c(24, 58, 68, 75, 83, 90, 93, 93, 95, 96)
df <- data.frame(a, b, c)
matplot(x, df)    # see below when combined with additional options

Mosaic plot

mosaicplot(HairEyeColor)

Box and whiskers plot

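In the built-in InsectSprays dataset, count is numeric and spray is a factor. Note that ~ is commonly used in R to separate the dependent variable (on the left) from the independent variable(s) (on the right) in models and in situations like this.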

boxplot(count ~ spray, data = InsectSprays)

Since spray is a factor, R even makes a boxplot by default if you do this (note the missing box):

plot(count ~ spray, data = InsectSprays)

Contour plot

x <- 10*(1:NROW(volcano))
y <- 10*(1:NCOL(volcano))
contour(x, y, volcano)

Common options

You can find all the options available (there are lots!) in the help for par (?par). The function par can be used to set options so that they persist for subsequent plots in the R session. However, most options can also be set within the call to the plot function, which is helpful because you generally do not want to change the settings for every later plot.
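
When you do change settings with par, one common idiom is to save the previous values, which par returns, and restore them when you are done. The following is only a sketch; the margin values and the name old are arbitrary choices for illustration.

```r
# par() returns (invisibly) the previous values of the settings it changes
old <- par(mar = c(4, 4, 1, 1), las = 1)  # smaller margins, horizontal axis labels
plot(rnorm(50), rnorm(50))
par(old)  # restore the previous settings
```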

Titles and axis labels

The main argument sets the main title (although you rarely want one for publication graphics). The xlab and ylab arguments set the axis labels. See their use in the “Point type” section below.
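
As a minimal illustration (the title and label text are made up for this example), all three can be passed directly to plot:

```r
x <- rnorm(50)
y <- rnorm(50)
# main, xlab, and ylab are passed as arguments to the high-level plot call
plot(x, y, main = "Fifty random points", xlab = "x value", ylab = "y value")
```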

Color

There are many ways to specify colors (including transparency):

x <- rnorm(1000)
y <- rnorm(1000)

By name (see the R Color Chart):

plot(x, y, col = "red")

By number:

plot(x, y, col = 2) # color 2 of the current palette (a red; see palette())

The rgb function allows you to specify the amount of red, green, and blue as well as the degree of transparency (1.0 is opaque, 0.0 completely invisible). Transparency lets you better see the density of the data.

plot(x, y, col = rgb(1, 0, 0, 0.2)) 
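
Under the hood, rgb simply returns a color string, so you can store it in a variable and reuse it. The sketch below also uses adjustcolor from grDevices to make a named color transparent; the variable name red20 and the choice of pch = 16 (filled circles) are just for illustration.

```r
x <- rnorm(1000)
y <- rnorm(1000)
red20 <- rgb(1, 0, 0, 0.2)  # a hex string of the form "#RRGGBBAA"
red20
## [1] "#FF000033"
plot(x, y, col = red20, pch = 16)  # filled circles make overlap easier to see
# adjustcolor() adds transparency to a color given by name
plot(x, y, col = adjustcolor("blue", alpha.f = 0.2), pch = 16)
```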

Point type

The pch option selects the plotting symbol. For symbols 21 through 25, you can specify both a border color (col = ) and a fill color (bg = ).

plot(x = 1:25, y = rep(1, 25), pch = 1:25, 
     xlab = "Point type number", 
     ylab = "", main = "Point type (pch)", bg = "red") 

plot(x, y, pch = 5) 
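
To see the border and fill colors working together on real data, here is a short sketch using symbol 21 (a fillable circle); the colors are arbitrary choices.

```r
x <- rnorm(50)
y <- rnorm(50)
# pch 21 is a circle that can be filled: col sets the border, bg sets the fill
plot(x, y, pch = 21, col = "black", bg = "lightblue")
```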

Line type

df <- data.frame(lty1 = c(1, 1), lty2 = c(2, 2), lty3 = c(3, 3), 
                 lty4 = c(4, 4), lty5 = c(5, 5), lty6 = c(6, 6))
matplot(x = 1:2, df, lty = 1:6, type = "l", 
        ylab = "lty", xlab = "", col = 1) 

Here you see that type = "l" specifies lines instead of points for matplot.
Read the help for plot to learn more about the type option.

Line thickness

# fractional lwd values, such as 0.5, are also allowed
matplot(x = 1:2, df, lwd = c(1, 2, 3, 5, 10, 25), lty = 1, 
        type = "l", ylab = "", yaxt = "n", xlab = "", col = 1) 
axis(2, at = 1:6, labels = c(1, 2, 3, 5, 10, 25)) 

See section about low-level plotting functions below for more about the axis function.

Size

Several related options control the size of various elements: cex affects the size of the plotting symbols and text, cex.axis the axis annotations, cex.lab the axis labels, and cex.main the title.

x <- rnorm(5000)
y <- rnorm(5000)
plot(x, y, main = "5000 Random Points", cex = 0.35, cex.main = 3)
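
The other size options work the same way. In this sketch the specific values are arbitrary; each one is a multiplier relative to the default size of 1.

```r
x <- rnorm(100)
y <- rnorm(100)
plot(x, y, main = "100 Random Points",
     cex = 1.5,       # larger plotting symbols
     cex.axis = 0.8,  # smaller tick labels
     cex.lab = 1.4,   # larger axis labels
     cex.main = 2)    # larger title
```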

Multiple plots

par(mfrow = c(2, 3)) # makes two rows of plots in three columns, fills by row first
for(i in 1:6) {
    x <- rnorm(50)
    y <- rnorm(50)
    plot(x, y)
}

par(mfrow = c(1, 1)) # put it back, if you don't it will stay 2 rows, 3 columns

Note you can also use mfcol which will do the same thing, but fills down the column first.
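
For comparison, here is a sketch of the same grid using mfcol. The plot titles are added only so you can see the order in which the panels are filled, and the old settings are saved and restored rather than reset by hand.

```r
old <- par(mfcol = c(2, 3))  # two rows, three columns, filled column by column
for (i in 1:6) {
    plot(rnorm(50), rnorm(50), main = paste("Plot", i))
}
par(old)  # back to the previous layout
```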

Low-level plotting functions

Low-level plotting functions allow you to build a plot from scratch. Usually, though, you will not build a plot completely from scratch. Instead, you will use them to modify elements of a plot that cannot easily be changed through the high-level plot's options (as we did with the function axis when we changed the y-axis for line thickness above), or to add or overlay additional elements on a high-level plot.

Lines

There are two primary line functions: lines and abline. lines takes points and connects them with line segments on the plot. xlim and ylim, used below, are options you have not seen yet; they set the range of the x- and y-axes.

# make a blank plot that is 0-10 by 0-10
plot(x = 1, y = 1, type = "n", xlim = c(0, 10), ylim = c(0, 10), xlab = "x", ylab = "y") 
lines(x = c(2, 8, 9, 10), y = c(5, 7, 2, 3))

Use abline to add horizontal and vertical lines and regression lines.

x <- rnorm(50)
y <- rnorm(50)
plot(x, y)
abline(h = 0, lty = 3, col = "blue")
abline(v = 0, lty = 3, col = "red")
m <- lm(y ~ x) # linear model
abline(m, lty = 2) # add the regression line
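
abline can also draw a line from an intercept (a) and a slope (b). The sketch below uses simulated data with a made-up intercept of 2 and slope of 0.5 (and an arbitrary seed) so that the fitted line can be compared with the line used to generate the data.

```r
set.seed(1)                   # arbitrary seed so the example is repeatable
x <- rnorm(50)
y <- 2 + 0.5 * x + rnorm(50)  # simulated data: intercept 2, slope 0.5
m <- lm(y ~ x)
coef(m)                       # named vector with "(Intercept)" and "x"
plot(x, y)
# passing the coefficients explicitly draws the same line as abline(m)
abline(a = coef(m)[[1]], b = coef(m)[[2]], lty = 2)
abline(a = 2, b = 0.5, col = "gray")  # the line used to simulate the data
```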

Points

points works like lines but adds points instead of lines.

plot(x, y, type = "n") # blank plot of the right size
# points also accepts a data frame with x and y columns
# instead of the separate x & y vectors we passed to lines above
df <- data.frame(x = x, y = y) 
points(subset(df, x < 0 & y < 0), col = 1)
points(subset(df, x > 0 & y < 0), col = 2)
points(subset(df, x > 0 & y > 0), col = 3)
points(subset(df, x < 0 & y > 0), col = 4)

Text

You can add the main title later with title, axis labels with mtext, legends with legend, and general text with text. You can use expression to make math expressions too for your titles or axes. Read the help for plotmath to learn all the things you can do.

plot(x, y)
title("50 random points")
text(0.75, -2, "there are 50 points on here")
mtext(expression(alpha ^ 3))
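
legend is the one function in that list not shown above. As a rough sketch (the grouping by the sign of y and the colors are invented for this example), you give it a position, the labels, and the same symbols and colors used in the plot:

```r
x <- rnorm(50)
y <- rnorm(50)
below <- y < 0  # TRUE for points below zero
plot(x, y, pch = 16, col = ifelse(below, "red", "black"))
# the legend repeats the symbol and colors used for each group
legend("topright", legend = c("y >= 0", "y < 0"),
       pch = 16, col = c("black", "red"))
```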

Saving plots

You are going to want to save your plots for publication or to bring them into other programs (Word, PowerPoint, etc.). Doing this in R is quite simple: use a different graphics device than the default one that draws on your screen. These include pdf, tiff, jpeg, png, and others. You open the new device and then repeat the commands that draw the plot. When you are done, you call dev.off(), which writes the file and returns you to your screen device. Here is an example of making a PDF:

x <- rnorm(50)
y <- rnorm(50)
pdf("plot01.pdf")
plot(x, y)   # nothing comes up on screen
dev.off()   # you'll find your pdf plot in plot01.pdf
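
The pdf device also lets you control the page size, and every high-level plot call made while the device is open starts a new page in the same file. In this sketch the file name plot02.pdf and the 6 by 4 inch page size are arbitrary choices.

```r
x <- rnorm(50)
y <- rnorm(50)
pdf("plot02.pdf", width = 6, height = 4)  # page size in inches
plot(x, y)   # page 1
hist(x)      # page 2: each high-level plot starts a new page
dev.off()    # close the device to finish writing the file
```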

In RStudio, you can also use the “Export” button in the “Plots” tab of the bottom right panel of your screen (where the plot is drawn).

Explore and Extend

Try changing the options for the high-level plot examples, or adding features to them (extra lines, points, or text).

Evaluate

Recreate the following plot exactly (e.g., pay particular attention to the type and color of points and lines and the axis labels, etc.) using the data described below.

The caption of the figure which should help you recreate it from the datasets is: “Figure. Mean optic nerve size vs. age in patients with optic nerve hypoplasia (ONH) and controls. Linear regression of the mean optic nerve size of controls (black points: individual control optic nerve measurements, black line: mean optic nerve size of controls, dashed black lines: 95% prediction intervals of mean optic nerve size of controls). Red points are measurements of optic nerves with clinical ONH. Blue points represent the clinically unaffected eye of patients with clinically unilateral ONH. The contralateral optic nerve of ONH patients was generally smaller than control optic nerves.”

There are two datasets. Dataset onhlong contains one individual measurement per row: case tells us whether the patient was a case or not (factor: 1 vs. 0), clinhypo whether the nerve was clinically hypoplastic (factor: Yes vs. No), age is the age of the subject, and mean is the mean optic nerve measurement. Dataset onhfit contains the predicted values for the regression line (mean) and the upper (upr95) and lower (lwr95) limits of the 95% prediction interval. Plot each of these against onhfit’s age variable to create the lines.

Some hints:

# replace the ... with your code for the right variables and options:
with(subset(onhlong, clinhypo == "No" & case == 1), points(...))  
# also use the with function for plotting the lines; replace the ... with the right variables and options
with(onhfit, lines(...))

Load in the datasets with:

library(devtools); install_github("advdatamgmt/adm") # if you've not refreshed adm recently
library(adm)
onhfit
##    age     mean    lwr95    upr95
## 1    0 2.992167 2.241136 3.743197
## 2    1 3.045639 2.297685 3.793592
## 3    2 3.099111 2.353644 3.844577
## 4    3 3.152583 2.409007 3.896159
## 5    4 3.206055 2.463769 3.948341
## 6    5 3.259527 2.517928 4.001127
## 7    6 3.312999 2.571480 4.054518
## 8    7 3.366471 2.624428 4.108515
## 9    8 3.419944 2.676770 4.163117
## 10   9 3.473416 2.728511 4.218320
## 11  10 3.526888 2.779655 4.274121
## 12  11 3.580360 2.830207 4.330513
## 13  12 3.633832 2.880173 4.387491
## 14  13 3.687304 2.929563 4.445045
## 15  14 3.740776 2.978385 4.503167
## 16  15 3.794248 3.026650 4.561847
## 17  16 3.847720 3.074369 4.621072
## 18  17 3.901193 3.121553 4.680832
## 19  18 3.954665 3.168217 4.741113
head(onhlong) # shows first 6 observations by default
##   case  age clinhypo     mean
## 1    1 2.50      Yes 1.580000
## 2    1 2.58      Yes 2.040000
## 3    1 0.58      Yes 1.100000
## 4    1 0.67       No 2.860000
## 5    1 4.92       No 2.766667
## 6    1 0.33      Yes 1.360000