Rcel, Part 1

Author: Beau B. Bruce, MD, PhD
Affiliation: Emory University

Exposition

Introduction

Once you learn R well, you will likely find that you do not tend to use spreadsheets like Microsoft Excel as much for common tasks that you might have previously done in spreadsheet software. We will learn about a spreadsheet-like structure, data.frame, that develops naturally as we work with the composite data types vector, factor, and list.

These are foundational exercises, so do not just type mindlessly at the R console to get the “right” answer; make sure you understand what is happening.

This lesson will only cover the vector and factor data types.

Vectors

A vector is constructed using a function simply called c. Note that c is lower case which, as you now know, matters because R is a case-sensitive language. If you type a capital, or upper case, “C” instead of lower case “c” then R will not know what you mean. This seems a minor inconvenience when you realize that “c” stands for “combine” or “concatenate.” Let’s thank the designers of R for giving this important and commonly used function a short name!

Try it out! Type and run c(1, 2, 3).

Note

Just type: c(1, 2, 3)

Note
c(1, 2, 3)

A convenient way to make a vector of consecutive numbers is to put a colon, i.e., :, between the first and last numbers, like 1:5.

Try it now.

Note

Type 1:5

Note
1:5
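
As a small illustration of the colon operator (a sketch, with endpoints chosen arbitrarily), the sequence can start at any whole number, and it counts down when the first number is the larger one:

```r
# The colon operator builds a vector of consecutive whole numbers.
1:5      # 1 2 3 4 5
3:7      # 3 4 5 6 7

# When the first number is larger, the sequence counts down.
5:1      # 5 4 3 2 1
```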

A vector can only hold values of one atomic type, but that type does not have to be numeric. For example, we can make a character vector like this:

c("a", "b", "c")

Now, you try to make a logical vector.

Note

Use the function c, TRUE, and FALSE.

Note

An example solution would be c(TRUE, FALSE, FALSE)

The is.numeric, is.character, and is.logical functions that you learned about in the Atomic Data Types lesson work on a vector too. Type one of those functions with a vector below, e.g., is.numeric(c("a", "b")); think about how R will respond, and then run it.

Note

Use the function c and one of these functions: is.numeric, is.character, or is.logical.

Note

Lots of valid choices here! For example:

is.logical(c(TRUE, FALSE, TRUE))
is.character(c("a", "b", "c"))
is.numeric(c(1, 2, 3))
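
A short sketch of what these calls have in common: each function looks at the type of the whole vector and answers with a single TRUE or FALSE, not with one answer per element.

```r
# One answer for the whole vector, however long it is.
is.numeric(c(1, 2, 3))        # TRUE
is.numeric(c("a", "b"))       # FALSE: this is a character vector
is.character(c("a", "b"))     # TRUE
is.logical(c(TRUE, FALSE))    # TRUE
is.logical(1:5)               # FALSE: this is a numeric vector
```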

By now you are probably pretty tired of typing things like c(TRUE,TRUE,TRUE,FALSE,TRUE) and wonder if there is a way you could save things like that for later use. There is! It is called assignment.

We can assign a name to represent a vector (or any R object). We do this using the <- operator (that is made of two characters, the less than symbol, <, and a hyphen, - with no space between). Think of it as an arrow that directs the value on the right-hand side of <- into the name on the left-hand side.

Type x <- c(TRUE, TRUE, TRUE, FALSE, TRUE) below and run it.

Note

Type x <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

Note
x <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

Now you can use the name x in place of the vector. Type is.logical(x) below and run it.

Note

Type is.logical(x)

Note
is.logical(x)

The name x is just one of a nearly infinite number of names, but R does have some rules for valid names:

A valid name can only consist of:

  • letters,
  • numbers,
  • the dot or period character (.), and
  • the underscore or underline character (_),

And it can only start with:

  • a letter, or
  • the dot not followed by a number.
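
The rules above can be checked with a small sketch. It relies on the base R function make.names, which is not covered in this lesson: it returns a valid name unchanged and alters an invalid one, so comparing its output with its input shows which candidates are valid. The candidate names are made up for illustration.

```r
# Candidate names, written as character strings so that R does not
# try to look them up. These names are made up for this example.
candidates <- c("x", "my_value", "value.2", ".hidden", "2x", "_x", ".2x")

# make.names() leaves a valid name alone and changes an invalid one,
# so TRUE below means "already a valid name".
make.names(candidates) == candidates
# TRUE TRUE TRUE TRUE FALSE FALSE FALSE
```

The last three fail because they start with a number, an underscore, and a dot followed by a number, respectively.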

You can see what the name x contains by typing that name. Try it now, just type x.

Note

Type x

Note
x

When you assign to a name that is already in use, the old value is replaced. Type and run x <- c(1, 2, 3).

Note

Type x <- c(1, 2, 3)

Note
x <- c(1, 2, 3)

Now look at what is in the variable x.

x

As you can see, x now contains that new numeric vector. The name can stand in for the object that is assigned to it anywhere that the object can be used, even in assignment. See that now by assigning x to y.

Note

Type y <- x

Note
y <- x
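
A brief sketch of what y <- x does: y receives the value that x has at that moment, so giving x a new value later leaves y as it was. The replacement vector c(10, 20) is arbitrary.

```r
x <- c(1, 2, 3)
y <- x            # y now holds 1 2 3

x <- c(10, 20)    # reuse the name x for a different vector

x                 # 10 20
y                 # still 1 2 3
```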

Finally, you cannot use any of R’s reserved words as names. You have seen a few of these already, e.g., TRUE and NaN.

It could really wreak havoc on R if you could change the value of those names.
Try to assign 5 to the name TRUE.

Note

Type TRUE <- 5

Note

Hopefully, the error makes a little sense. You have an invalid left-hand side (i.e., TRUE) for the assignment.

Factors

OK, it is time to turn to a new data type, i.e., factor.

A factor is similar to a vector, but is used to represent a nominal or ordinal variable. This allows R to automatically give you back the right statistics for something that is not numeric. For now, we will use the summary function to show the difference. Let’s start by assigning c(1, 1, 2, 3, 1) to x.

Note

Type x <- c(1, 1, 2, 3, 1)

Note
x <- c(1, 1, 2, 3, 1)

Now let’s create a factor version of x in the variable y like this:

y <- factor(x)
Note
y <- factor(x)

Now apply the summary function to x, like this: summary(x)

Note
summary(x)

Do the same for y.

Note
summary(y)

That last output may be a little confusing at first, but notice that it is a type of table with the different values in the first row and the number of elements in the factor that take that value in the second row. So you see in the first instance, x is a numeric vector and R provides summary statistics that make sense for a continuous variable. In the second case, y is a factor that R understands to be categorical.
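
To make the contrast concrete, here is a sketch that puts the two summaries side by side. The levels function, which lists the categories a factor knows about, has not been introduced in this lesson and is used here only for illustration.

```r
x <- c(1, 1, 2, 3, 1)
y <- factor(x)

# Numeric vector: minimum, quartiles, median, mean, and maximum.
summary(x)

# Factor: a count for each category (three 1s, one 2, one 3).
summary(y)

# The categories themselves are stored as text labels.
levels(y)    # "1" "2" "3"
```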

When you give the values labels, as you often will, it can be much easier to understand the summary of a factor. Make a new factor from x called state using labels. (You do not always have to create a new name; we could have overwritten y instead, but for now let’s do it this way.) Type: state <- factor(x, labels = c("GA", "FL", "AL"))

Note
state <- factor(x, labels = c("GA", "FL", "AL"))

Now make a summary of state and examine the results.

Note
summary(state)

The labels = c(...) is a named argument to the factor function. The argument’s name is labels and the argument itself is the c(...). Notice the = sign that connects the name to the argument. We will discuss this more in the future. For now, just use this exact syntax if you need to label a numeric vector when creating a factor.

Note that when you use numbers, the labels are matched to the values in sorted order (smallest first), not in the order the values appear in the vector you convert to a factor. Try the following, which puts x in a different order even though it represents the same data:

x <- c(2, 3, 1, 1, 1)

Note
x <- c(2, 3, 1, 1, 1)

Now create a factor named state from x in the same way as before using labels = c("GA", "FL", "AL").

Note
state <- factor(x, labels = c("GA", "FL", "AL"))

Examine the summary of state.

Note
summary(state)
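
A sketch of why the order of x does not change the summary: factor sorts the distinct values first (1, 2, 3) and then pairs them with the labels in that order, so 1 is always GA, 2 is FL, and 3 is AL. The as.character function is an addition to this lesson, used here only to display each element's label.

```r
x <- c(2, 3, 1, 1, 1)
state <- factor(x, labels = c("GA", "FL", "AL"))

# Each element shown with its label: 2 -> FL, 3 -> AL, 1 -> GA.
as.character(state)   # "FL" "AL" "GA" "GA" "GA"

# The counts per label: GA 3, FL 1, AL 1.
summary(state)
```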

As you can see, the results are exactly the same even though the numeric vector was in a different order. You can also use a character vector to create a factor. Try this: state <- factor(c("GA", "GA", "FL", "AL", "GA"))

Note
state <- factor(c("GA", "GA", "FL", "AL", "GA"))

Examine the summary of state again.

Note
summary(state)

Now they are alphabetically ordered. Thus, how the data is structured can have an influence on the order of the names in a factor. There are many other ways to influence the order and we will likely see those as we progress.
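
As a preview of one such way (an addition to this lesson, not something you need yet), the factor function also accepts a levels argument that states the order explicitly. A minimal sketch, using the made-up name states for the character vector:

```r
states <- c("GA", "GA", "FL", "AL", "GA")

# Default: the levels are sorted alphabetically (AL, FL, GA).
summary(factor(states))

# With levels =, the order is the one you list (GA, FL, AL).
summary(factor(states, levels = c("GA", "FL", "AL")))
```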

Extracting Elements

When you want to extract a specific element from a vector or factor, you use square brackets (i.e., [ ]) and the index (the number) of the element within the data structure. So to get the fifth element of x from before, which was c(2, 3, 1, 1, 1), you’d type x[5].

Try it now.

Note
x[5]

As you can see, we got 1, which is the fifth element of the vector x. Since x is c(2, 3, 1, 1, 1), what would you type to get R to return 3 instead of 1?

Note
x[2]

Just like names, these extracted values can be used anywhere a value would be valid. Try x[2] + x[5].

Note
x[2] + x[5]

What if you want a new vector made from a subset of the values in the vector x? We need our trusty friend c again. Try x[c(2, 5)].

Note
x[c(2, 5)]

So you see that returns a new vector of length two made up of the second and fifth elements of x. The square bracket notation also works on a factor. Now extract the third element of state.

Note
state[3]
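
The extraction examples above can be collected into one sketch. It also shows two details that follow from the same notation: a colon sequence works as an index, and an element taken from a factor still remembers all of the factor's categories. The levels function is an addition to this lesson.

```r
x <- c(2, 3, 1, 1, 1)
state <- factor(c("GA", "GA", "FL", "AL", "GA"))

x[5]          # 1
x[c(2, 5)]    # 3 1
x[2:4]        # 3 1 1  (2:4 is the index vector 2, 3, 4)

# The third element is FL, and the result is still a factor
# with the levels AL, FL, and GA.
state[3]
levels(state[3])   # "AL" "FL" "GA"
```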

Now is where you’ll see how vector operations in R are a lot like using functions on columns in Excel. Keep in mind x is c(2, 3, 1, 1, 1).
Type x * 2.

Note
x * 2

How about x < 2?

Note
x < 2

See how it applies the expression elementwise?
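
To tie this back to the spreadsheet comparison, here is a small sketch: each line acts like a formula filled down a whole column, producing one result per element of x. The + and == lines are extra examples beyond the two you just ran.

```r
x <- c(2, 3, 1, 1, 1)

x * 2     # 4 6 2 2 2
x + 10    # 12 13 11 11 11
x < 2     # FALSE FALSE TRUE TRUE TRUE
x == 1    # FALSE FALSE TRUE TRUE TRUE
```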

You can also apply operators to two vectors of the same length, in which case the resulting vector will be the result of the operator applied to the first element of each vector, then the second element of each vector, and so on. Remember x is c(2, 3, 1, 1, 1) and try:

x - c(1, 0, 1, 0, 1)
Note
x - c(1, 0, 1, 0, 1)

You can even use a vector that is shorter than the other. The shorter one will be recycled. Try:

c(1, 0, 1, 0, 1, 0) + c(0, 1)
Note
c(1, 0, 1, 0, 1, 0) + c(0, 1)
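
Recycling can be pictured as R repeating the shorter vector until it is as long as the longer one. The sketch below spells that out with the rep function, which is not part of this lesson and is used here only for illustration; the names long and short are made up.

```r
long  <- c(1, 0, 1, 0, 1, 0)
short <- c(0, 1)

long + short                    # 1 1 1 1 1 1

# The same calculation with the recycling written out by hand:
# rep(short, times = 3) is 0 1 0 1 0 1.
long + rep(short, times = 3)    # 1 1 1 1 1 1
```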

However, you will get a warning if the length of the longer vector is not a multiple of the length of the shorter one, because this is usually unintentional and most often indicates a bug in your program. In this case, though, the recycling still works the way we expect. Try c(1, 0, 1, 0, 1) + c(0, 1).

Note
c(1, 0, 1, 0, 1) + c(0, 1)
Note

A warning is issued by a program when it can continue executing but is not sure it did what you were expecting it to do. A warning does not rise to the level of an error, which is something that the program cannot recover from.
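
A sketch of the difference in practice: the mismatched addition still returns a result, and the length function together with the remainder operator %% (both additions to this lesson) shows in advance whether the lengths fit evenly. The names a, b, and result are made up for the example.

```r
a <- c(1, 0, 1, 0, 1)
b <- c(0, 1)

# 5 is not a multiple of 2, so the remainder is not zero.
length(a) %% length(b)    # 1

# R warns, but still recycles b and returns five values.
result <- a + b
result                    # 1 1 1 1 1
```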

Experimentation

Let’s make a really long vector of sequential integers from 5 to 123.

Note
5:123
Note

Examine the output. Now that you know how to extract elements from a vector with square brackets, can you explain why R has been printing [1] at the beginning of the output all this time and why R is now printing other numbers between the square brackets ([ ])? Come ready to discuss in class.

I bet you can also guess what function can test if something is a vector (of any type). Find out if a factor is also a vector by trying that function on state.

Note
is.vector(state)

Remember that I said that a vector can only be made from the same atomic type?

Try typing and running c(TRUE, 0, FALSE, 3) and look at R’s response.

Now, try typing and running c("a", 3, TRUE) and likewise examine R’s response.

What is R doing? Why does this always work? Come to class prepared to discuss. Now submit your assignment!

Evaluation

Submit Your Assignment