Tibbles and readr

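A tibble is the modern tidyverse version of a data.frame. A tibble is a data.frame (it inherits from the data.frame class), so it can be used in most of the same ways. It also adds more powerful features and removes some of data.frame’s inconsistent and confusing behaviour, such as silently partially matching column names with $ and dropping to a plain vector when you select a single column with [ ].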
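The sketch below compares a small data.frame with the equivalent tibble to show those two differences. The objects df and tb and their values are made up for illustration. The behaviour shown is that of the tibble package's tibble() and its $ and [ methods.

```r
library(tibble)

# Made-up example data: the same two columns as a data.frame and as a tibble
df <- data.frame(Sex = c("F", "M"), BodyWeight = c(2.0, 3.1))
tb <- tibble(Sex = c("F", "M"), BodyWeight = c(2.0, 3.1))

# A tibble is still a data.frame, so code written for data.frames keeps working
inherits(tb, "data.frame")

# Selecting one column with [ drops a data.frame to a plain numeric vector...
class(df[, "BodyWeight"])
# ...but a tibble stays a tibble
class(tb[, "BodyWeight"])

# data.frame's $ silently partially matches "Body" to "BodyWeight"
df$Body
# tibble's $ does not: it warns about an unknown column and returns NULL
tb$Body
```

The tibble behaviour is the safer default: a typo in a column name shows up as a warning instead of quietly returning a different column.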

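In the same way, readr provides modern tidyverse replacements for R’s standard file-reading functions. For example, readr’s read_csv is a better way of reading CSV files than R’s standard read.csv. It returns a tibble rather than a plain data.frame, is typically much faster on large files, reports the type it has guessed for each column, and does not rewrite column names that contain spaces or other non-syntactic characters.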
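One of those differences is easy to check directly. This minimal sketch writes a tiny CSV file to a temporary location and reads it with both functions. The file contents, including the column name "Body Weight", are made up for illustration.

```r
library(readr)

# Write a small, made-up CSV file with a space in one column name
tmp <- tempfile(fileext = ".csv")
writeLines(c("Sex,Body Weight",
             "F,2.0",
             "M,3.1"), tmp)

# Base R: returns a plain data.frame and makes names syntactic ("Body.Weight")
base_df <- read.csv(tmp)
names(base_df)

# readr: returns a tibble and keeps the name exactly as written ("Body Weight")
tidy_tb <- read_csv(tmp, show_col_types = FALSE)
names(tidy_tb)
class(tidy_tb)
```

A column name with a space can still be used from a tibble by wrapping it in backticks, for example tidy_tb$`Body Weight`.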

Let’s now use read_csv to read the dataset Anatomical Data from Domestic Cats into a tibble. The first thing you will notice is that read_csv prints some useful information about what it has read.

library(tidyverse)  # load the tidyverse (which includes readr and tibble) if you haven't already

cats <- read_csv("https://raw.githubusercontent.com/Bristol-Training/intermediate-r/refs/heads/main/data/cats.csv")
Rows: 144 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Sex
dbl (2): BodyWeight, HeartWeight

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

This tells you that read_csv found three columns: Sex, which is treated as a column of strings (characters), and BodyWeight and HeartWeight, which are both treated as columns of floating-point numbers (doubles).
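The last two lines of that message point at two readr features: giving the column types yourself, or hiding the message with show_col_types = FALSE. The sketch below shows both. So that it runs without a network connection, it writes a small stand-in file containing only the first three rows of cats.csv shown in this section. In real use you would pass the URL instead of the temporary file.

```r
library(readr)

# Stand-in for cats.csv: the header plus the first three rows printed in this section
tmp <- tempfile(fileext = ".csv")
writeLines(c("Sex,BodyWeight,HeartWeight",
             "F,2,7",
             "F,2,7.4",
             "F,2,9.5"), tmp)

# Option 1: state each column's type explicitly.
# read_csv then uses these types instead of guessing, and prints no specification message.
cats_small <- read_csv(
  tmp,
  col_types = cols(
    Sex         = col_character(),
    BodyWeight  = col_double(),
    HeartWeight = col_double()
  )
)

# Option 2: let read_csv guess the types, but don't print the message
cats_quiet <- read_csv(tmp, show_col_types = FALSE)

# spec() retrieves the column specification read_csv used
spec(cats_quiet)
```

Stating the types is the more robust choice for scripts you rerun: if the file later contains a value that doesn't fit the declared type, readr reports a parsing problem instead of silently guessing a different type. The same specification can also be written compactly as col_types = "cdd" (character, double, double).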

Next, if you type cats and press return you will see

cats
# A tibble: 144 × 3
   Sex   BodyWeight HeartWeight
   <chr>      <dbl>       <dbl>
 1 F            2           7  
 2 F            2           7.4
 3 F            2           9.5
 4 F            2.1         7.2
 5 F            2.1         7.3
 6 F            2.1         7.6
 7 F            2.1         8.1
 8 F            2.1         8.2
 9 F            2.1         8.3
10 F            2.1         8.5
# ℹ 134 more rows

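The tibble summarises itself to the screen. The first line gives its dimensions (144 × 3), the row under the column names gives each column’s type, and only the first 10 rows are printed. This makes it much easier to quickly look at some data without it overflowing your console.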
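When you do want to see more of the data, you can ask for it. This minimal sketch uses a hypothetical 25-row tibble, tb, which is long enough to trigger the shortened display. It uses print() with the n argument and glimpse(), which dplyr provides.

```r
library(tibble)  # tibble() and the print method for tibbles
library(dplyr)   # glimpse()

# Hypothetical example data: 25 rows, so the default print is truncated
tb <- tibble(id = 1:25, squared = id^2)

# Default: tibbles with more than 20 rows show only their first 10 rows
tb

# Ask for a specific number of rows; n = Inf would print every row
print(tb, n = 15)

# glimpse() turns the data sideways: one line per column, handy for wide data
glimpse(tb)
```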

Because a tibble is a data.frame, you can access its data in the same ways, for example using $ to extract a column or [ ] to select rows:

cats$BodyWeight
  [1] 2.0 2.0 2.0 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.2 2.2 2.2 2.2 2.2 2.2
 [19] 2.3 2.3 2.3 2.3 2.3 2.3 2.3 2.3 2.3 2.3 2.3 2.3 2.4 2.4 2.4 2.4 2.5 2.5
 [37] 2.6 2.6 2.6 2.7 2.7 2.7 2.9 2.9 2.9 3.0 3.0 2.0 2.0 2.1 2.2 2.2 2.2 2.2
 [55] 2.2 2.2 2.2 2.2 2.3 2.4 2.4 2.4 2.4 2.4 2.5 2.5 2.5 2.5 2.5 2.5 2.5 2.5
 [73] 2.6 2.6 2.6 2.6 2.6 2.6 2.7 2.7 2.7 2.7 2.7 2.7 2.7 2.7 2.7 2.8 2.8 2.8
 [91] 2.8 2.8 2.8 2.8 2.9 2.9 2.9 2.9 2.9 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
[109] 3.1 3.1 3.1 3.1 3.1 3.1 3.2 3.2 3.2 3.2 3.2 3.2 3.3 3.3 3.3 3.3 3.3 3.4
[127] 3.4 3.4 3.4 3.4 3.5 3.5 3.5 3.5 3.5 3.6 3.6 3.6 3.6 3.7 3.8 3.8 3.9 3.9
cats[1,]
# A tibble: 1 × 3
  Sex   BodyWeight HeartWeight
  <chr>      <dbl>       <dbl>
1 F              2           7
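The other familiar data.frame idioms work too. This sketch uses a small stand-in tibble, cats_head, built from the first five rows of cats printed above, so that it runs without downloading the file. The same calls work on the full cats tibble.

```r
library(tibble)

# Stand-in for cats: the first five rows printed earlier
cats_head <- tibble(
  Sex         = c("F", "F", "F", "F", "F"),
  BodyWeight  = c(2.0, 2.0, 2.0, 2.1, 2.1),
  HeartWeight = c(7.0, 7.4, 9.5, 7.2, 7.3)
)

# [[ ]] with a column name extracts that column as a plain vector
heart <- cats_head[["HeartWeight"]]

# A logical vector in the row position keeps only the matching rows
heavier <- cats_head[cats_head$BodyWeight > 2, ]

# Rows and columns can be selected together; the result is still a tibble
subset_tb <- cats_head[1:2, c("Sex", "HeartWeight")]
```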
Exercise

Load the cats dataset into a tibble using read_csv. Use the mean and max functions to calculate the mean and maximum body weight and heart weight of the cats.

Note that, according to the dataset’s description, body weight is in kilograms and heart weight is in grams. As is common with real-world data, the units are not consistent across columns.
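Mixed units matter as soon as you combine columns. As an illustration (not part of the exercise), this sketch converts body weight to grams before expressing heart weight as a percentage of body weight. It uses a stand-in tibble built from the first three rows of cats shown earlier. The name heart_pct is hypothetical.

```r
library(tibble)

# Stand-in for cats: first three rows (BodyWeight in kg, HeartWeight in g)
cats_head <- tibble(
  Sex         = c("F", "F", "F"),
  BodyWeight  = c(2.0, 2.0, 2.0),
  HeartWeight = c(7.0, 7.4, 9.5)
)

# Convert body weight from kilograms to grams so both columns share a unit
body_weight_g <- cats_head$BodyWeight * 1000

# Heart weight as a percentage of body weight; e.g. 7 g / 2000 g * 100 = 0.35
heart_pct <- cats_head$HeartWeight / body_weight_g * 100
heart_pct
```

Forgetting the factor of 1000 would make every percentage 1000 times too large, which is exactly the kind of error inconsistent units invite.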

library(tidyverse)

cats <- read_csv("https://raw.githubusercontent.com/Bristol-Training/intermediate-r/refs/heads/main/data/cats.csv")
Rows: 144 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Sex
dbl (2): BodyWeight, HeartWeight

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Body weight (kg): store the summaries, then format them with sprintf()
mean_body_weight <- mean(cats$BodyWeight)
max_body_weight <- max(cats$BodyWeight)
cat(sprintf("Body weight: mean = %.2f kg, max = %.2f kg\n",
            mean_body_weight, max_body_weight))
Body weight: mean = 2.72 kg, max = 3.90 kg
# Heart weight (g): compute and round inline, and let cat() join the pieces with spaces
cat("Heart weight: mean =",
    mean(cats$HeartWeight) %>% round(digits = 2),
    "g, max =",
    max(cats$HeartWeight) %>% round(digits = 2),
    "g\n")
Heart weight: mean = 10.63 g, max = 20.5 g
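Because a tibble is a list of columns, you can also apply the same function to both weight columns in one call. This is a sketch of an alternative approach using base R’s sapply(), shown on a stand-in tibble built from the first three rows of cats. Applied to the full cats tibble, it should reproduce the means and maxima calculated above.

```r
library(tibble)

# Stand-in for cats: first three rows printed earlier
cats_head <- tibble(
  Sex         = c("F", "F", "F"),
  BodyWeight  = c(2.0, 2.0, 2.0),
  HeartWeight = c(7.0, 7.4, 9.5)
)

# Keep only the numeric columns, then apply each summary function column by column
weights <- cats_head[c("BodyWeight", "HeartWeight")]
sapply(weights, mean)  # named vector: one mean per column
sapply(weights, max)   # named vector: one maximum per column
```

Remember that the two entries are in different units (kilograms and grams), so the results should be labelled accordingly when reported.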