Writing functions

You can write your own functions in R. For example, let’s write a function that calculates the mean (average) of a vector of numbers. As input, the function takes a vector of numbers. As output, it returns a single number: the mean of those numbers.

There are many ways this function could be written. Here is one possible solution:

calculate_mean <- function(values){
     total <- 0.0
     count <- 0
     for (value in values){
         total <- total + value
         count <- count + 1
     }
     return(total / count)
}

We can then use this function to calculate, for example, the average height of a group of people:

person_heights <- c(1.62, 1.80, 1.56, 1.73, 1.91)

average_height <- calculate_mean(person_heights)

cat(sprintf("The average height is %.2f m\n", average_height))
The average height is 1.72 m
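
As a sanity check, the result can be compared against R's built-in mean function, which computes the same quantity. This is a minimal sketch that repeats the definitions above so that it runs on its own; all.equal is used rather than == because floating-point results can differ in their last digits.

```r
# Same function as above, repeated so this snippet runs on its own
calculate_mean <- function(values){
    total <- 0.0
    count <- 0
    for (value in values){
        total <- total + value
        count <- count + 1
    }
    return(total / count)
}

person_heights <- c(1.62, 1.80, 1.56, 1.73, 1.91)

# Compare our result with R's built-in mean(), allowing for rounding error
matches_builtin <- isTRUE(all.equal(calculate_mean(person_heights),
                                    mean(person_heights)))
print(matches_builtin)
```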

Scaffolding

To explain how this worked, we need to look at how the function was defined. There is some scaffolding that is common to all functions. First, we define the function name. In R, this is a variable that holds the code of the function. We define this variable and assign the function to it in the same way that we would assign a number to a numeric variable or a string to a string variable, namely using the syntax variable <- value.

# Variable  assigned value
#     ↓         ↓      ↓
calculate_mean <- function(...
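
Because the function is stored in an ordinary variable, it can be inspected and copied like any other value. The sketch below uses a small hypothetical function, add_one, purely for illustration; it is not part of the lesson's own code.

```r
# add_one is a hypothetical example function
add_one <- function(x) {
    return(x + 1)
}

# The variable holds data of type function
function_class <- class(add_one)
print(function_class)

# The same function can be assigned to a second variable name
increment <- add_one
result <- increment(41)
print(result)
```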

Next comes the keyword function, which says that the value being assigned is of type function, meaning that it contains code. The arguments listed in the parentheses after function are the inputs that your new function will accept.

#                 keyword  arguments
#                    ↓        ↓
calculate_mean <- function(values) {
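
A function can accept more than one argument, with the names separated by commas. As an illustration only, here is a hypothetical function, convert_height, that takes a height and a multiplier; neither name appears elsewhere in this lesson.

```r
# Hypothetical example: a function with two arguments
convert_height <- function(height, multiplier) {
    return(height * multiplier)
}

# Convert a height from metres to centimetres
height_cm <- convert_height(1.80, 100)
cat(sprintf("The height is %.0f cm\n", height_cm))
```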

Next, you need the body of the function. This body is the lines of code that will be run when your function is called. Just like with for loops or if statements, the body of code is contained within curly brackets.

#                             Open curly bracket
#                                  ↓
calculate_mean <- function(values) {
    # body of the function is the
    # lines of code within the curly brackets
} # <- Close curly bracket

The inputs to the function are the arguments passed to it, in this case values. Our code loops through every value in values to calculate the mean. Once the loop has finished, we reach the final part of the function, which is return. This returns the output of the function to the caller.

#                          Input(s)
#                            ↓
calculate_mean <- function(values){
     total <- 0.0
     count <- 0
     for (value in values){
         total <- total + value
         count <- count + 1
     }
     return(total / count)
#      ↑
#    Return output
}
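
One detail worth knowing is that return ends the function immediately, so any lines after it are not run. The sketch below uses a hypothetical function, first_above, that stops looping as soon as it finds a value larger than a threshold; the function and its argument names are assumptions made for this example.

```r
# Hypothetical example: return the first value greater than threshold
first_above <- function(values, threshold) {
    for (value in values) {
        if (value > threshold) {
            # return stops the loop and the function straight away
            return(value)
        }
    }
    # Only reached if no value was greater than threshold
    return(NA)
}

person_heights <- c(1.62, 1.80, 1.56, 1.73, 1.91)

first_tall <- first_above(person_heights, 1.75)
print(first_tall)
```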

Finally, when we call the function, the arguments that we pass to it are used as the input. The output is then returned and assigned to the result variable. So, in this case:

#                Call function      with input(s)
#                        ↓             ↓  
average_height <- calculate_mean(person_heights)
#     ↑         ↑
#  Output     assigned

calculate_mean is called with person_heights. The data referred to by person_heights is passed to calculate_mean, where it is referred to as values. The function loops over this data and calculates the mean, which is returned at the end of the function and assigned to the variable average_height.
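
The name values is only used inside the function, so the same function works with any numeric vector, whatever that vector is called where the function is used. A minimal sketch, repeating the definition so that it runs on its own; person_weights is made-up data for illustration.

```r
calculate_mean <- function(values){
    total <- 0.0
    count <- 0
    for (value in values){
        total <- total + value
        count <- count + 1
    }
    return(total / count)
}

person_heights <- c(1.62, 1.80, 1.56, 1.73, 1.91)
person_weights <- c(60, 70, 80)  # made-up example data

# Each call passes a different vector, which the function sees as values
average_height <- calculate_mean(person_heights)
average_weight <- calculate_mean(person_weights)

# A vector can also be passed directly, without naming it first
literal_mean <- calculate_mean(c(2, 4, 6))

cat(sprintf("The average weight is %.1f kg\n", average_weight))
```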

Exercise

Write a function called calculate_max that returns the largest value in a vector of numbers. Use it to find the largest height in person_heights above.

Hint: start with a variable called max_value set to NA. Then, for each value in values, use if (is.na(max_value) || value > max_value) to test whether it is the largest value seen so far. The || is a logical OR.
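
The order of the two tests in the hint matters. Comparing a number with NA gives NA rather than TRUE or FALSE, and an if statement cannot work with NA. Because || stops as soon as its left-hand side is TRUE, the comparison is skipped while max_value is still NA. A minimal sketch of this behaviour:

```r
max_value <- NA
value <- 1.62

# Comparing with NA does not give TRUE or FALSE; it gives NA
comparison <- value > max_value
print(comparison)

# is.na(max_value) is TRUE, so || does not need the comparison
first_test <- is.na(max_value) || value > max_value
print(first_test)

# Once max_value holds a number, the comparison decides the result
max_value <- 1.80
second_test <- is.na(max_value) || value > max_value
print(second_test)
```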

calculate_max <- function(values){
    max_value <- NA

    for (value in values){
        if (is.na(max_value) || value > max_value){
            max_value <- value
        }
    }

    return(max_value)
}

person_heights <- c(1.62, 1.80, 1.56, 1.73, 1.91)

max_height <- calculate_max(person_heights)

cat("The tallest person is", max_height, "m.\n")
The tallest person is 1.91 m.
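
The answer can be checked against R's built-in max function. The sketch below repeats the solution so that it runs on its own, and also shows that an empty input leaves max_value at its starting value of NA, because the loop body never runs.

```r
calculate_max <- function(values){
    max_value <- NA

    for (value in values){
        if (is.na(max_value) || value > max_value){
            max_value <- value
        }
    }

    return(max_value)
}

person_heights <- c(1.62, 1.80, 1.56, 1.73, 1.91)

# Our function and the built-in max() should agree
agrees <- calculate_max(person_heights) == max(person_heights)
print(agrees)

# With no values to loop over, the starting value NA is returned
empty_result <- calculate_max(c())
print(empty_result)
```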