Pipes %>%

The tidyverse makes heavy use of forward pipes. The forward pipe, written %>%, is provided by the magrittr package and is made available automatically when you load the tidyverse.

The forward pipe %>% passes the value on its left as the first argument of the function on its right, e.g.

"kitten" %>% print()
[1] "kitten"

will forward the string “kitten” so that it is the first argument to the function print. Hence this is equivalent to

print("kitten")
[1] "kitten"
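The same rule applies when the function takes extra arguments: the value on the left fills the first argument, and the arguments written inside the parentheses keep their places. A minimal sketch, with numbers and variable names made up for illustration:

```r
library(magrittr)

# The left-hand value becomes the first argument; digits = 2 stays where it is.
rounded_piped  <- 3.14159 %>% round(digits = 2)
rounded_direct <- round(3.14159, digits = 2)
identical(rounded_piped, rounded_direct)   # TRUE

# Pipes can be chained: take the square roots, then add them up.
total_of_roots <- c(1, 4, 9) %>% sqrt() %>% sum()
total_of_roots                             # 6, the same as sum(sqrt(c(1, 4, 9)))
```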

This is useful because it enables you to chain together a lot of functions. For example, the tidyverse dplyr package provides the function filter, for filtering data.

library(tidyverse)

cats <- read_csv("https://raw.githubusercontent.com/Bristol-Training/intermediate-r/refs/heads/main/data/cats.csv")

cats %>% filter(Sex=="F")
# A tibble: 47 × 3
   Sex   BodyWeight HeartWeight
   <chr>      <dbl>       <dbl>
 1 F            2           7  
 2 F            2           7.4
 3 F            2           9.5
 4 F            2.1         7.2
 5 F            2.1         7.3
 6 F            2.1         7.6
 7 F            2.1         8.1
 8 F            2.1         8.2
 9 F            2.1         8.3
10 F            2.1         8.5
# ℹ 37 more rows

has filtered the cats data set from the last page to return a tibble that contains data only for female cats. This is equivalent to typing filter(cats, Sex=="F").

The power of pipes is that we can now chain filters, e.g.

cats %>% filter(Sex=="F") %>% filter(BodyWeight > 2.5)
# A tibble: 11 × 3
   Sex   BodyWeight HeartWeight
   <chr>      <dbl>       <dbl>
 1 F            2.6         8.7
 2 F            2.6        10.1
 3 F            2.6        10.1
 4 F            2.7         8.5
 5 F            2.7        10.2
 6 F            2.7        10.8
 7 F            2.9         9.9
 8 F            2.9        10.1
 9 F            2.9        10.1
10 F            3          10.6
11 F            3          13  
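Chaining two filter() calls should give the same rows as a single filter() call containing both conditions, because dplyr combines comma-separated conditions with a logical AND. The sketch below checks this on a small made-up tibble. The name toy_cats and its values are hypothetical and are not taken from cats.csv.

```r
library(dplyr)

# Made-up rows for illustration only; column names mirror the cats data set.
toy_cats <- tibble(
    Sex         = c("F", "F", "M", "F"),
    BodyWeight  = c(2.3, 2.7, 3.1, 3.0),
    HeartWeight = c(9.0, 10.2, 12.5, 11.0)
)

# Two filters chained with pipes
chained <- toy_cats %>% filter(Sex == "F") %>% filter(BodyWeight > 2.5)

# One filter with both conditions (comma means "and")
combined <- toy_cats %>% filter(Sex == "F", BodyWeight > 2.5)

# Both approaches keep the same two rows
isTRUE(all.equal(chained, combined))
```

Which form to use is a matter of style: one condition per filter() can be easier to read and to comment out while exploring the data.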

We can then use the dplyr summarise() function to create a new data frame containing the mean of a specified column of the filtered cats data. For example,

cats %>% 
    filter(Sex=="F") %>% 
    filter(BodyWeight>2.5) %>% 
    summarise(mean=mean(HeartWeight))
# A tibble: 1 × 1
   mean
  <dbl>
1  10.2

is the mean heart weight, in grams, of female cats whose body weight is greater than 2.5 kg.

Note how we have split this over multiple lines, putting the forward pipe %>% at the end of each line so that R knows the expression continues on the next line. If you are using the R Console you can start a new line with Shift+Enter without running the command.
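The position of the pipe matters to R as well as to the reader. A line that ends in %>% is an incomplete expression, so R keeps reading. A line that starts with %>% follows an expression that is already complete, so R cannot parse it. The sketch below illustrates both cases with made-up values; parse() and tryCatch() are used only to show the error without stopping the script.

```r
library(magrittr)

# Pipe at the end of the line: the expression is incomplete, so R reads on.
total <- c(1, 2, 3) %>%
    sum()
total   # 6

# Pipe at the start of the next line: "c(1, 2, 3)" is already complete,
# so the second line cannot be parsed.
bad_code <- "c(1, 2, 3)\n    %>% sum()"
result <- tryCatch(
    parse(text = bad_code),
    error = function(e) "parse error"
)
result  # "parse error"
```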

To save the result of this pipeline to a variable, we would use the assignment operator <- as normal

average_heart_weight <- cats %>%
    filter(Sex=="F") %>%
    filter(BodyWeight>2.5) %>%
    summarise(mean=mean(HeartWeight))

This is a very dense piece of code, which is typical for R. You will often see dense blocks that use forward pipes to push data through several functions to produce a final result. This makes it important to name your variables, data, columns and functions clearly, so that future readers of your code can understand what is going on.
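For comparison, here is a sketch of the same kind of pipeline written without pipes, as nested function calls. The nested form has to be read from the inside out, whereas the piped form reads top to bottom in the order the steps happen. The toy_cats tibble and its values are made up for illustration and are not the real cats data.

```r
library(dplyr)

# Made-up rows for illustration only
toy_cats <- tibble(
    Sex         = c("F", "F", "M", "F"),
    BodyWeight  = c(2.3, 2.7, 3.1, 3.0),
    HeartWeight = c(9.0, 10.2, 12.5, 11.0)
)

# Piped: read from top to bottom
piped <- toy_cats %>%
    filter(Sex == "F") %>%
    filter(BodyWeight > 2.5) %>%
    summarise(mean = mean(HeartWeight))

# Nested: the first step is buried in the innermost call
nested <- summarise(
    filter(filter(toy_cats, Sex == "F"), BodyWeight > 2.5),
    mean = mean(HeartWeight)
)

# Both forms produce the same 1 x 1 tibble
isTRUE(all.equal(piped, nested))
```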

Finally, note that average_heart_weight is a 1x1 tibble. You can extract the actual numeric value by typing as.numeric(average_heart_weight).
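As a small sketch of that extraction step, the code below builds a hypothetical 1 x 1 tibble (toy_summary, with a made-up value) and pulls out the number in two ways: with as.numeric(), as described above, and with the dplyr pull() function, which extracts a single column as a vector and is one possible alternative.

```r
library(dplyr)

# Hypothetical stand-in for a 1 x 1 summary result; the value is made up.
toy_summary <- tibble(mean = 10.2)

# Option 1: convert the whole 1 x 1 tibble to a number
value_from_as_numeric <- as.numeric(toy_summary)

# Option 2: pull out the column called "mean" as a plain numeric vector
value_from_pull <- toy_summary %>% pull(mean)

identical(value_from_as_numeric, value_from_pull)   # TRUE
```

pull() names the column explicitly, which can be clearer when a summary contains more than one column.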

Exercise

Calculate the average heart weight of male cats whose body weight is greater than or equal to 3.0 kg.

Using the dplyr summarise() function, we can write

cats %>%
     filter(Sex=="M") %>%
     filter(BodyWeight>=3.0) %>%
     summarise(mean=mean(HeartWeight))
# A tibble: 1 × 1
   mean
  <dbl>
1  13.2

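Alternatively, in this case we could use the dplyr select() function to pick out the HeartWeight column, convert it to a plain vector with unlist(), and take its mean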

cats %>%
     filter(Sex=="M") %>%
     filter(BodyWeight>=3.0) %>%
     select(HeartWeight) %>%
     unlist() %>%
     mean()
[1] 13.16889

Exercise

Calculate the maximum body weight of the male cats, and of the female cats, that have a heart weight of less than or equal to 9 grams.

max_male <- cats %>%
   filter(Sex=="M") %>%
   filter(HeartWeight <= 9.0) %>%
   summarise(max=max(BodyWeight))

max_female <- cats %>%
   filter(Sex=="F") %>%
   filter(HeartWeight <= 9.0) %>%
   summarise(max=max(BodyWeight))

cat( "Maximum body weight: male =",
     as.numeric(max_male) %>% round(digits=2),
     "kg, female =",
     as.numeric(max_female) %>% round(digits=2),
     "kg\n" )
Maximum body weight: male = 2.7 kg, female = 2.7 kg

Exercise

Look back at the "broom and dplyr" vignette that you found when searching for Pearson’s product-moment correlation. How much more of this vignette do you now understand?