Rcel, Part 2

Author: Beau B. Bruce, MD, PhD
Affiliation: Emory University

Exposition

Introduction

In the first part of this exercise we learned about vector and factor, two composite data types, as we build toward a data.frame. As you recall, a vector requires each element to be of the same atomic data type.

Now, we meet the first composite R data type that can contain different atomic types as elements simultaneously: list. A specialized list called data.frame will be used to store our datasets and will allow us to operate on our datasets as spreadsheet-like objects.

Lists

A list allows you to mix ‘n’ match atomic data types. You build a list with the function list, like this:

l <- list("A", 2, TRUE)

See the subtle difference between l (“ell”) and 1 (“one”). Notice that there is also a difference between O (“oh”) and 0 (“zero”). This is important so keep an eye out for it throughout this lesson.

Try creating l with the expression above now.

Note

Type l <- list("A", 2, TRUE).

Note
l <- list("A", 2, TRUE)
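
To see why a list is needed for mixed types, it may help to compare it with a vector built from the same values. This is only a small sketch; the names mixed_vector and mixed_list are made up for illustration and are not part of the exercise.

```r
# A vector silently converts everything to one atomic type (here, character).
mixed_vector <- c("A", 2, TRUE)
class(mixed_vector)        # "character"

# A list keeps each element's own type.
mixed_list <- list("A", 2, TRUE)
sapply(mixed_list, class)  # "character" "numeric" "logical"
```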

To extract an element from a list, use double square brackets. For example, l[[1]] will extract the first element of l.

Try it now.

Note
l[[1]]
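
The reason for the double brackets becomes clearer when you compare them with single brackets on a list. The sketch below uses a hypothetical list called example_list (same contents as l) so that it does not disturb your own objects.

```r
example_list <- list("A", 2, TRUE)

inner <- example_list[[1]]  # double brackets: the element itself
outer <- example_list[1]    # single brackets: still a list, of length one

is.character(inner)  # TRUE
is.list(outer)       # TRUE
```

In other words, single brackets on a list give back a smaller list, while double brackets reach inside and pull out the element.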

Now test if the first element of l really is a character.

Note

Use is.character, l, and [[1]] somehow.

Note
is.character(l[[1]])

Now test if the second element of l really is numeric.

Note

Use is.numeric, l, and [[2]] somehow.

Note
is.numeric(l[[2]])

Now check if the third element of l really is logical.

Note

Use is.logical, l, and [[3]] somehow.

Note
is.logical(l[[3]])

Finally, what function do you think checks if something really is a list? Try it now on l.

Note

is.list?

Note
is.list(l)

Each element does not have to be a single atomic value. You can use longer vectors.

Reassign l to be list("A", c(1,3,2,4), TRUE).

Note

Type l <- list("A", c(1,3,2,4), TRUE)

Note
l <- list("A", c(1,3,2,4), TRUE)

Now examine the value of l and notice how R gives you hints about how to access the elements.

Note

Nothing more than just typing l (“ell”).

Note
l

Now examine the second element of l.

Note

Don’t forget the double bracket!

Note
l[[2]]

You can also name the elements in a list like this:

l2 <- list(a = "A", b = c(1, 3, 2, 4), c = TRUE)

Try it now.

Note

Type l2 <- list(a = "A", b = c(1, 3, 2, 4), c = TRUE)

Note
l2 <- list(a = "A", b = c(1, 3, 2, 4), c = TRUE)

Now examine the value of l2 and notice how R gives you hints about how to access the elements.

Note

Type l2, that is ell two (l2) not twelve (12).

Note
l2

So, those hints tell you to use a dollar sign to access the element by name. For example, to access the logical value in l2, type l2$c.

Note

Type l2$c

Note
l2$c
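
A named element can be reached in more than one way. The following sketch uses a hypothetical copy called named_list (same contents as l2) to show that the dollar sign, a quoted name in double brackets, and the position all return the same thing.

```r
named_list <- list(a = "A", b = c(1, 3, 2, 4), c = TRUE)

names(named_list)   # "a" "b" "c"

named_list$b        # access by name with the dollar sign
named_list[["b"]]   # same element, with the name given as a string
named_list[[2]]     # same element, by position
```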

Recall that with a vector you can assign to a specific element. Let’s create a vector to try this with. Type x <- c(1, 3, 2, 4).

Note

Type x <- c(1, 3, 2, 4)

Note
x <- c(1, 3, 2, 4)

If we wanted to reassign the second element to be 100, what would we do?

Note

Type x[2] <- 100

Note
x[2] <- 100

Now examine x to check the result.

Note

Type x

Note
x

Now, change the third element of x to 5.

Note

Type x[3] <- 5

Note
x[3] <- 5

You can also assign to parts of a list. Let’s remember what l looks like first. Type l.

Note

Type l (“ell”)

Note
l

So if you wanted to assign to the 3rd element of the vector that is the 2nd element of l, what would you do? Start by showing what the second element of l is.

Note

Type l[[2]] (don’t forget the double brackets)

Note
l[[2]]

l[[2]] is a vector, and we want its third element. Recall that single square brackets extract an element from a vector. If I tell you that you can treat l[[2]] just like the name of that vector, what would you type to get the third element?

Note

Type l[[2]][3]. If you didn’t get it, let’s talk more about it in class.

Note
l[[2]][3]

Now, change that same value to 42.

Note

Use l[[2]][3], the assignment operator (<-), and 42.

Note
l[[2]][3] <- 42
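
Putting the two kinds of assignment side by side may make the pattern easier to see. The sketch below repeats the steps on hypothetical copies (demo_vector and demo_list) so you can compare the results with your own x and l.

```r
# Assigning into a vector: single brackets pick the position.
demo_vector <- c(1, 3, 2, 4)
demo_vector[2] <- 100
demo_vector[3] <- 5
demo_vector            # 1 100 5 4

# Assigning into a vector stored inside a list:
# [[2]] reaches the vector, [3] picks the position within it.
demo_list <- list("A", c(1, 3, 2, 4), TRUE)
demo_list[[2]][3] <- 42
demo_list[[2]]         # 1 3 42 4
```

Only the one value changes; the other elements of the list are left as they were.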

Dataframes

Now, to the last major data type that we are going to learn about in this course, the data.frame. A data.frame is a specialized list where all the elements of the list have equal length. It is perfect for representing data where there are several values (in columns) per observation (in rows).

You create a data.frame with the data.frame function. Type:

df <- data.frame(age = c(3, 2, 3, 3, 1, 2, 4, 4), 
                 sex = factor(c("m", "f", "m", "f", "m", "f", "m", "f")), 
                 sorethroat = factor(c("no", "no", "no", "no", "yes", 
                                       "yes", "yes", "yes")
                                    )
                )

Here, the line breaks are not strictly required; they are there to make it clearer where the parentheses open and close.

Note

Just type it out carefully. I know it is a pain, but it will make you appreciate the other ways we will create these in the future!

Note
df <- data.frame(age = c(3, 2, 3, 3, 1, 2, 4, 4), sex = factor(c("m", "f", "m", "f", "m", "f", "m", "f")), sorethroat = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes")))

Now type df to take a look at your new data.frame!

Note

Type df

Note
df
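
You can check for yourself that a data.frame really is a list whose elements all have the same length. This sketch rebuilds the same data under the hypothetical name demo_df so it runs on its own.

```r
demo_df <- data.frame(
  age = c(3, 2, 3, 3, 1, 2, 4, 4),
  sex = factor(c("m", "f", "m", "f", "m", "f", "m", "f")),
  sorethroat = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

is.list(demo_df)         # TRUE: a data.frame is a list of columns
length(demo_df)          # 3, the number of columns (list elements)
nrow(demo_df)            # 8, the number of observations
sapply(demo_df, length)  # every column has length 8
```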

OK, do you remember how to access an element of a list by name? The $, right? So how could you get just the age column out of df?

Note

Does df$age make sense?

Note
df$age

You can also use summary on data.frames. Try it now on df.

Note

Does summary(df) make sense?

Note
summary(df)

You can check what variables are in a data.frame like this (it also works for a list): names(df). Try it now.

Note

Type names(df)

Note
names(df)

There are several ways to access elements in a data.frame and each will be useful somewhere as we get better at programming. We can access a single value by row and column. Rows first, columns second. Try df[2, 3].

Note

Type df[2, 3]

Note
df[2, 3]

If we leave out the column specification, you get the whole row: df[2, ]. Try it.

Note

Type df[2, ]

Note
df[2, ]

If we leave out the row specification, you get the whole column: df[, 3]. Try it.

Note

Type df[, 3]

Note
df[, 3]

You can also use the column name instead of the index when you are accessing elements or columns. An example would be: df[, "sorethroat"]. Try it.

Note

Type df[, "sorethroat"]

Note
df[, "sorethroat"]
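
The different access styles are interchangeable, which the following sketch tries to make concrete. It rebuilds the data under the hypothetical name demo_df and checks that position, column name, and the dollar sign all lead to the same value.

```r
demo_df <- data.frame(
  age = c(3, 2, 3, 3, 1, 2, 4, 4),
  sex = factor(c("m", "f", "m", "f", "m", "f", "m", "f")),
  sorethroat = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

# Three ways to reach row 2 of the sorethroat column.
by_position <- demo_df[2, 3]
by_name     <- demo_df[2, "sorethroat"]
by_dollar   <- demo_df$sorethroat[2]

identical(by_position, by_name)    # TRUE
identical(by_position, by_dollar)  # TRUE

# Two ways to reach the whole column.
identical(demo_df[, 3], demo_df$sorethroat)  # TRUE
```

Names tend to be safer than positions, because they keep working if the column order changes.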

Finally, you can use the subset function to find rows that match certain criteria. Try subset(df, age == 2) to find the observations where the age of the subject is 2.

Note

Type subset(df, age == 2)

Note
subset(df, age == 2)
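
Under the hood, the condition is just a logical vector with one TRUE or FALSE per row. As a rough sketch (again on a hypothetical copy called demo_df), the same rows can be picked by putting that logical vector in the row position of the square brackets.

```r
demo_df <- data.frame(
  age = c(3, 2, 3, 3, 1, 2, 4, 4),
  sex = factor(c("m", "f", "m", "f", "m", "f", "m", "f")),
  sorethroat = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

keep <- demo_df$age == 2
keep   # FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE

by_subset <- subset(demo_df, age == 2)  # rows where the condition is TRUE
by_index  <- demo_df[keep, ]            # same rows via logical indexing

all.equal(by_subset, by_index)          # TRUE
```

Note that inside subset you write age directly, while with square brackets you have to spell out demo_df$age.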

Experimentation

Now try to combine your knowledge of the &, |, and ! operators (you may only need one of those) to find those subjects who are both age 2 and have a sore throat using the subset function.

Note

The wording implies and (&). Don’t forget the quotes around "yes" for sorethroat, because "yes" is a value to compare against, not a variable name.

Note
subset(df, age == 2 & sorethroat == "yes")
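
To get a feel for how the three operators differ, you can count the rows each one selects. This is a sketch on a hypothetical copy of the data called demo_df; nrow is used only to keep the output short.

```r
demo_df <- data.frame(
  age = c(3, 2, 3, 3, 1, 2, 4, 4),
  sex = factor(c("m", "f", "m", "f", "m", "f", "m", "f")),
  sorethroat = factor(c("no", "no", "no", "no", "yes", "yes", "yes", "yes"))
)

# and: both conditions must hold
nrow(subset(demo_df, age == 2 & sorethroat == "yes"))  # 1

# or: at least one condition must hold
nrow(subset(demo_df, age == 2 | sorethroat == "yes"))  # 5

# not: flips TRUE and FALSE
nrow(subset(demo_df, !(sorethroat == "yes")))          # 4
```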

What comes out of the subset function is a data.frame that you can further subset using the operators you already know. For example, try subset(df, sorethroat == "yes")$age.

Note

Type subset(df, sorethroat == "yes")$age

Note
subset(df, sorethroat == "yes")$age

And we can apply functions to that like summary, e.g., summary(subset(df, sorethroat == "yes")$age). Try it.

Note

Type summary(subset(df, sorethroat == "yes")$age)

Note
summary(subset(df, sorethroat == "yes")$age)

R has a lot of built-in data.frames. One we’ll use a lot in this course is the esoph dataset. You load it with data(esoph). Do that now.

Note

Type data(esoph)

Note
data(esoph)

Now that it is loaded (under the name esoph) use summary to examine it.

Note

Type summary(esoph)

Note
summary(esoph)

Extract esoph’s 4th row.

Note

Type esoph[4, ]

Note
esoph[4, ]

List the variable names in esoph.

Note

Type names(esoph)

Note
names(esoph)
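
When you compare a factor against a string, the string has to match one of the factor’s categories exactly. One way to look up the exact spellings is the levels function, which has not been introduced in this lesson, so treat this as an optional aside.

```r
data(esoph)

# The categories of each grouping variable, spelled exactly as R stores them.
levels(esoph$agegp)
levels(esoph$tobgp)
```

Copying a value from this output into your comparison (inside quotes) avoids typos such as a missing hyphen or plus sign.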

esoph is in a case-control format where each row represents multiple cases and controls, stored in the variables ncases and ncontrols. So, if you are asked how many observations there are, you’d add up the number of cases and controls.
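
As a small sketch of that idea, the code below contrasts the number of rows with the number of observations. It relies on the function sum, which adds up a vector; the names total_cases and total_controls are made up for illustration.

```r
data(esoph)

nrow(esoph)                      # number of rows, not the number of subjects

total_cases    <- sum(esoph$ncases)     # add up the cases across all rows
total_controls <- sum(esoph$ncontrols)  # add up the controls across all rows

total_cases + total_controls     # number of observations
```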

So, write code that will give the number of cases which have tobacco consumption (tobgp) of 30+ gm/day. You’ll find the function sum helpful, which takes a vector and adds it up. You’ll use it in a pattern just like you did with summary above on a column of a subset.

Note

subset(esoph, tobgp == "30+") is the kernel of what you need. Then, that is a little data.frame from which you need the variable ncases and you then need to add it up - how do you put it all together?

Note

A couple of examples that work are sum(subset(esoph, tobgp == "30+")$ncases) and sum(esoph$ncases[esoph$tobgp == "30+"]).

OK, here is your most challenging problem yet. What is the number of observations (ncases + ncontrols) where both the age group (agegp) is 25-34 years and tobacco consumption is 10-19 gm/day?

Note

Don’t overthink it. It is a straightforward extension of the last problem. One solution is to calculate each of the two pieces separately, even though it requires repetition, and simply add them together with +.

Note

Some things that work are sum(subset(esoph, agegp == "25-34" & tobgp == "10-19")$ncases) + sum(subset(esoph, agegp == "25-34" & tobgp == "10-19")$ncontrols), sum(subset(esoph, agegp == "25-34" & tobgp == "10-19")[, c("ncases", "ncontrols")]), and sum(esoph$ncases[esoph$agegp == "25-34" & esoph$tobgp == "10-19"],esoph$ncontrols[esoph$agegp == "25-34" & esoph$tobgp == "10-19"]).
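
If the repetition bothers you, one possible tidy-up is to store the subset in a variable first and then reuse it. This is just a sketch of that approach; the names young_light and total_observations are hypothetical.

```r
data(esoph)

# Do the subsetting once and keep the result.
young_light <- subset(esoph, agegp == "25-34" & tobgp == "10-19")

# Reuse the stored subset for both pieces.
total_observations <- sum(young_light$ncases) + sum(young_light$ncontrols)
total_observations
```

Saving the intermediate result also makes it easy to print young_light and check that the right rows were selected before adding anything up.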

Evaluation

Submit Your Assignment

Submit your assignment below.