Functions

Author

Affiliation

Beau B. Bruce, MD, PhD

Emory University

Exposition

Now that we have discussed the major basic datatypes in R, we have a solid foundation with respect to the “nouns” of R, and it is a good time to turn to the R language’s “verbs”, functions.

I also like to think of functions as a type of creature that eats and processes data, thereby providing a useful output. Think of yeast, which “eat” sugar and process it into alcohol.

Functions, like creatures, also have an anatomy: a name, a mouth, and a body. Take the c function as an example that we learned about back in Rcel, part 1. The name of the function is c. The mouth is the parentheses.

c("e", "a", "t")

The characters of the word “eat” are in this case the food, called arguments, for the function c. They go in the mouth separated by commas (I like to think of them as teeth, but now, maybe I’m taking the analogy too far… 😆).

Where is the body? R functions are generally shy and do not like to show off their body, so they hide it. However, if you call their name without offering food, they will show it to you as they turn to run away realizing you do not plan to offer them anything to eat. (Yes, the analogy is definitely getting thin 🤣.)

Try it now with the function named c. Just type c to see the body of the function.

c has an unusually strange body. Trust me, dark magic is going on there so I suggest you just look away, unless you are into that sort of thing. Let’s look at a more typical function, factor, which you also met in Rcel, part 1. How can you make factor show you its body?

There is a lot going on there but look at the general structure. The keyword function followed by the mouth enclosed in parentheses () tells you some details about the foods this creature eats. Next, you see a section enclosed in curly brackets {}: that’s the body. What is inside? Even now, I hope you can appreciate that is the very R code we have been learning so far! The body of a function holds a recipe of R code that can be reused to repeat a useful action.
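If you want to poke at this anatomy from the console, base R has a few helpers. The sketch below only assumes a standard R installation; it hints at why c looks so strange (it is built into R itself, so it has no body written in R code) while factor is ordinary R code wrapped in curly brackets.

```r
# c is a "primitive": implemented inside R itself, so there is no R-code body.
is.primitive(c)
is.null(body(c))

# factor is written in ordinary R code, so it has a body to inspect.
is.primitive(factor)

# The body of factor is one big expression wrapped in curly brackets.
class(body(factor))
```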

If you thought c was odd, there are even functions that do not show their mouth without some coaxing. Can you think of one? What is an “action” you have taken on data that did not need parentheses? Think about it for a moment before continuing.

How about the operators, like +, *, >, etc.? Can you believe that they are ALL functions? They are, but you have to know how to get these painfully shy creatures to show even their mouth.

To do so, you need to get bold and call their name wrapped in backticks. A backtick looks like ` and is not a single quote ('). On most US keyboards, the backtick shares a key with the tilde (~) at the upper left edge of the keyboard, just below the Esc key.

Try `*`(2, 2) to see * acting as a regular function.
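The same trick works for the other operators. Here is a small sketch, using only base R, that calls a few of them by name and then hands one to another function as if it were any ordinary function.

```r
# Operators called by name, with their arguments in the "mouth".
`+`(1, 2)
`>`(3, 2)

# Because `+` is a regular function, it can be passed to other functions.
# Reduce() applies it across the vector: ((1 + 2) + 3) + 4.
Reduce(`+`, 1:4)
```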

Can you guess how to get * to show you its body?

All arguments have names, but you do not always need to use them. Look at the help for factor by typing ?factor. The help is quite detailed. After you run the command, scroll down and I’ll give you some tips for walking through the help.

Near the top of the help, you’ll see the arguments for the function. In order, the names of the arguments for the factor function are x, levels, labels, exclude, ordered, and nmax. After many you will note an equals sign with something after it. These are the defaults for the argument if you do not specify them. However, just because levels does not have an explicit default in the list does not mean that it does not have one. If you read further into the help it says that levels does have a default value.
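If you would rather list the argument names at the console than read them off the help page, one option is formals(), which returns a function's arguments along with any explicit defaults. A minimal sketch using base R:

```r
# The names of factor's arguments, in order.
names(formals(factor))

# An explicit default can be looked up by name; exclude defaults to NA.
formals(factor)$exclude
```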

Since every argument has a default, the function factor will return valid output even if run with no arguments, like this:

factor()

Try it now.

R’s response means that you created an empty (length 0) factor. On the other hand, + does not have default arguments, so if you type `+`() you’ll get an error. Try it now.
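You can also check both claims with code rather than by eye. This is just a sketch: tryCatch() is one of several ways to catch an error so that the script keeps running, and the variable name result is my own choice.

```r
# An empty factor has length 0 and no levels.
length(factor())
levels(factor())

# Calling `+` with no arguments raises an error; tryCatch() lets us
# capture it instead of stopping.
result <- tryCatch(`+`(), error = function(e) "error caught")
result
```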

So why are the arguments named? The most important reason, as you will learn in the next section, is so that you can refer to the arguments by their name in the body of the function. A secondary use has to do with the way you call functions.

If you do not name the arguments when you call the function, R assumes that you are using them in order starting with the first listed. Names are useful when you need to skip an argument, want to be explicit about which argument something is, or if you are (or are afraid you might be) using them out of order.
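To see these rules in action on a function other than factor, here is a sketch with base R's seq, whose first three arguments are from, to, and by. The variable names are my own and are only for illustration.

```r
# Unnamed: R matches the values to from, to, and by, in that order.
by_position <- seq(2, 10, 4)

# Named: the same call, but explicit about which argument is which.
by_name <- seq(from = 2, to = 10, by = 4)

# Named and out of order: still the same call.
out_of_order <- seq(by = 4, to = 10, from = 2)

# Skipping by: name a later argument (length.out) to use it instead.
skipped <- seq(2, 10, length.out = 3)

by_position
```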

Let’s explore this. Start by creating a variable named a that contains the vector c(1, 1, 2, 2).

OK, when we use the labels argument to give the values nice labels, we have to type factor(a, labels = c("M", "F")). Why? Because we are skipping the second argument, levels, and labels is the third, so it has to be named.

Look back at the help for factor. What is the name of the first argument according to the help file? Now specify that name even though you do not have to and run the same command: factor(a, labels = c("M", "F"))

Finally, put labels before the first argument!

Now you know most everything you need to know about how to call R functions that already exist, but a lot of programming should be about writing your own functions.

Functions are your friend. Have you ever cut and pasted a piece of code to run it again after just tweaking it to run on slightly different data? Try not to do that, and instead use functions. Why?

When you use functions, you encapsulate little pieces of logic in your program. It makes your code, and the thinking behind it, much easier to understand and modify. The code will also be shorter overall.
If you cut and paste and then realize you need to change something, you have to change it everywhere without making any mistakes. With a function, you change it in one place: the function.
You will also likely find your functions are useful in your next project, and a function is a good way to carry something useful from one project to another.

A major principle of programming is DRY, which stands for “Don’t Repeat Yourself”. I might cut and paste something twice, but by the third paste I am going to be trying to make a function out of it, unless I’m seriously not going to do it again, and then only if it is a very short (one-line) piece of code. This practice of turning code into a function by its third use is also known as the “rule of three” in computer programming.
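Here is a sketch of what that looks like in practice. The temperature example and every name in it (boston_c, f_to_c, and so on) are made up for illustration. The function syntax is explained right after this, so for now just compare the shape of the two versions.

```r
# Repetitive version: the same formula pasted three times.
boston_c  <- (50 - 32) * 5 / 9
atlanta_c <- (68 - 32) * 5 / 9
miami_c   <- (86 - 32) * 5 / 9

# DRY version: the formula lives in exactly one place.
f_to_c <- function(temp_f) {
  (temp_f - 32) * 5 / 9
}

# One call handles all three temperatures.
f_to_c(c(50, 68, 86))
```

If the formula ever needs fixing, only the body of f_to_c changes.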

So how do you write a function? Name it (via assignment with <-), then use the keyword function followed by the mouth with a list of arguments. Then, write the R code inside the body. Use it just like any other R function. Use the name you chose for the argument in the body of the function to work on the data that will be passed in. The names are NOT magic. You can pick anything you like and it will stand in for the object that is passed, just as if you assigned that data to that name outside the function. Type this example of a trivial function:

add2 <- function(x) { x + 2 }

Now try it out, like this: add2(5)

And again, like this: add2(2)

See how R takes your number, assigns it to x, adds 2 to x, and returns it. It returns x + 2 because it is the last thing that is executed in the body of the function. Let’s make a very slightly more complicated function. Look at the function below.
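The function in question, add2sub5, can be sketched as follows. Treat the exact body as an assumption: it is a minimal version that matches the description here (take x, add 2, then subtract 5, changing x along the way).

```r
# Assumed reconstruction of add2sub5; the exact body is a guess
# consistent with the surrounding text.
add2sub5 <- function(x) {
  x <- x + 2   # first statement: overwrite the local x
  x - 5        # last statement: its value is what the function returns
}
```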

This second example function shows that within a function body you need to write R code just like you would at the console with each statement on a separate line. R returns the result of the last statement.

The variables defined within the function are local to the function. Changing their values as we did in add2sub5 does not change them outside the function. Therefore, using functions also makes your programs much safer so you don’t accidentally change something you did earlier in a long program.
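The same holds for brand new variables created inside a function. In this sketch, scale_score and shifted are hypothetical names used only for illustration, and I am assuming you have not already created something called shifted at the console.

```r
scale_score <- function(score) {
  shifted <- score - 50   # shifted exists only while the function runs
  shifted / 10
}

scale_score(75)

# Ask R whether anything named shifted can be found out here.
exists("shifted")
```

The function used shifted to do its work, but nothing by that name was left behind afterward.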

To see what I mean type and run x <- 10.

Now try add2sub5(10).

Did x change? How can you take a peek?

Yep! x is still 10. So, functions help us protect our data because changes inside are “hidden” from the outside. However, now look at the function addysub5 below.
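A sketch of addysub5 is given here. As before, the exact body is an assumption; what matters for the discussion is that it uses y without y being an argument or being created inside the function.

```r
# Assumed reconstruction of addysub5; the exact body is a guess
# consistent with the surrounding text.
addysub5 <- function(x) {
  x <- x + y   # y is not an argument and is not created in this body
  x - 5
}
```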

Try to run addysub5(10).

Now assign y <- 5 so the function addysub5 can find y.

Try to run addysub5(10) again.

Did y change? How can you take a peek?

If a variable is neither an argument nor created inside the function, then R has to search for it and looks outside the function to try to find it. Once you assign a number to y, R can find it and will use it in the function.
Avoiding this behavior is usually best; instead, pass y as another argument. However, occasionally this approach is the best solution to a problem.
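Passing y as an argument could look like the sketch below. The name addysub5_arg is hypothetical; I chose it only to set this version apart from addysub5.

```r
# y is now part of the "mouth", so the function no longer depends on
# whatever happens to be named y outside it.
addysub5_arg <- function(x, y) {
  x + y - 5
}

addysub5_arg(10, y = 5)
```

Now anyone reading the call can see exactly what the function is being fed.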

Experimentation

OK, now it is your turn to write a function from scratch. Your function must be named add5 for the grading code to find it. Guess what it should do? Take one argument (you can name that anything you like), add 5 to it, and return the result.

Now write a function named cube that takes one argument (again you can choose the name of the argument) and returns that number cubed, i.e., raised to the 3rd power.

Evaluation

Submit Your Assignment

Submit your assignment below.