Data structures
Until now all the variables we have used have contained a single piece of information. For example, a <- 4 makes a variable a containing a single number, 4. It’s very common to want to refer to collections of data. Think, for example, of a bank statement that lists the expenses you had last month.
R has several built-in data structures that facilitate working with this common kind of data. In this beginner's course we will cover four of the most used data structures: vector, list, matrix and data.frame. Keep in mind that there are other built-in data structures.
Vectors
Vectors are the most basic data structure in R. They are one-dimensional arrays that can hold elements of the same data type.
# Create a numeric vector
num_vector <- c(1, 2, 3, 4, 5)
# Create a vector of strings
str_vector <- c("apple", "banana", "cherry")
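Because a vector holds a single data type, mixing types inside c() does not fail; R converts the elements to a common type instead. The sketch below illustrates this with two hypothetical vectors, mixed and num_and_lgl, which are not part of the course material.

```r
# Mixing types in c() does not raise an error; R silently coerces
# every element to the most flexible type present.
mixed <- c(1, "two", TRUE)
class(mixed)          # "character"
mixed                 # "1" "two" "TRUE"

# Logical values mixed with numbers become 1 (TRUE) and 0 (FALSE)
num_and_lgl <- c(1.5, TRUE, FALSE)
class(num_and_lgl)    # "numeric"
```

If a numeric vector unexpectedly behaves like text, checking its class() is a quick first diagnostic.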
The elements of a vector can be accessed using square brackets; the first element has index 1:
# Access the third element
cat(num_vector[3])
3
As well as being able to select individual elements from a data structure, you can also grab sections of it at once. This process of asking for subsections of a data structure is called slicing.
# Access the second through fourth elements
cat(num_vector[2:4])
2 3 4
# Access the first, third and fifth elements
cat(num_vector[c(1,3,5)])
1 3 5
Adding elements to a vector can be done with the function append:
# Append a new element to the end of the vector
str_vector <- append(str_vector, "orange")
cat(str_vector)
apple banana cherry orange
Elements can also be removed using a negative sign while indexing them, as in:
# Remove the second element
str_vector <- str_vector[-2]
cat(str_vector)
apple cherry orange
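As a small extension of the two operations above, the sketch below uses the after argument of append to insert at a chosen position and a negative index vector to drop several elements at once. The vector fruits is a hypothetical stand-in, so the example runs on its own.

```r
fruits <- c("apple", "banana", "cherry")   # hypothetical example vector

# append() adds at the end by default; `after` chooses the position
fruits <- append(fruits, "orange", after = 1)
fruits   # "apple" "orange" "banana" "cherry"

# A negative index vector drops several positions at once
fewer <- fruits[-c(1, 3)]
fewer    # "orange" "cherry"

# Indexing returns a new vector; `fruits` still has four elements
length(fruits)   # 4
```

Neither operation changes a vector in place: the result has to be assigned back to a variable, as the examples above do.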
Matrices
Matrices are two-dimensional arrays that contain elements of the same data type.
# Create a 3x3 matrix
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
print(my_matrix)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
Elements in a matrix can be accessed using row and column indices:
cat(my_matrix[2, 3]) # Returns the element in the 2nd row, 3rd column
8
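Leaving one of the two indices empty selects a whole row or column. The sketch below shows this on a fresh 3x3 matrix (named m here to keep the example self-contained) and also shows the byrow argument, since matrix fills column by column unless told otherwise.

```r
m <- matrix(1:9, nrow = 3, ncol = 3)   # filled column by column

m[2, ]            # whole second row:   2 5 8
m[, 3]            # whole third column: 7 8 9
m[1:2, c(1, 3)]   # rows 1-2 of columns 1 and 3, still a matrix

# byrow = TRUE fills the matrix row by row instead
m_by_row <- matrix(1:9, nrow = 3, byrow = TRUE)
m_by_row[2, 3]    # 6
```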
Using the functions rbind and cbind we can add rows and columns, respectively, to a matrix.
my_matrix <- rbind(my_matrix, c(20, 21, 22))
print(my_matrix)
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
[4,]   20   21   22
my_matrix <- cbind(my_matrix, 31:34)
print(my_matrix)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   31
[2,]    2    5    8   32
[3,]    3    6    9   33
[4,]   20   21   22   34
You can also delete rows and columns using the same method that we saw for vectors.
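A minimal sketch of that idea, using a fresh 3x3 matrix m so the example does not depend on earlier code: a negative row or column index removes the corresponding row or column.

```r
m <- matrix(1:9, nrow = 3, ncol = 3)

no_row2 <- m[-2, ]   # drop the second row   -> 2 x 3 matrix
no_col1 <- m[, -1]   # drop the first column -> 3 x 2 matrix
dim(no_row2)         # 2 3
dim(no_col1)         # 3 2

# Dropping down to a single row returns a plain vector by default;
# drop = FALSE keeps the result a matrix
one_row <- m[-c(2, 3), , drop = FALSE]
dim(one_row)         # 1 3
```

As with vectors, the original matrix is unchanged unless the result is assigned back to it.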
Lists
Lists are more versatile data structures, as they can contain elements of different data types, including other lists.
my_list <- list(
  numbers = 1:5,
  fruit = c("apple", "banana", "cherry"),
  nested_list = list("a", "b", "c")
)
You can access list elements by name or position:
cat(my_list$fruit)
apple banana cherry
cat(my_list[[2]])
apple banana cherry
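One detail worth seeing in code is the difference between single and double square brackets on a list. The sketch below rebuilds the same list so it can run on its own; it is an illustration, not part of the original course code.

```r
my_list <- list(
  numbers = 1:5,
  fruit = c("apple", "banana", "cherry"),
  nested_list = list("a", "b", "c")
)

class(my_list["fruit"])     # "list": single brackets return a sub-list
class(my_list[["fruit"]])   # "character": double brackets return the element

my_list$fruit[2]            # "banana": index into the extracted vector
my_list$nested_list[[3]]    # "c": element of the nested list
```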
You can have as many items in a list as you like, even zero items, as in:
my_empty_list <- list()
cat(my_empty_list)
We can add elements to a list just by indexing a new name. For instance:
my_list["new_element"] <- TRUE
print(my_list)
$numbers
[1] 1 2 3 4 5
$fruit
[1] "apple" "banana" "cherry"
$nested_list
$nested_list[[1]]
[1] "a"
$nested_list[[2]]
[1] "b"
$nested_list[[3]]
[1] "c"
$new_element
[1] TRUE
We can remove any element of a list by assigning it the value NULL.
my_list[c("nested_list", "new_element")] <- NULL
print(my_list)
$numbers
[1] 1 2 3 4 5
$fruit
[1] "apple" "banana" "cherry"
Data Frames
Data frames are table-like structures that can contain columns of different data types. They are one of the most commonly used data structures for data analysis in R. Note that all the columns in a data frame must have the same number of elements.
df <- data.frame(
  name = c("Jean", "Thomas", "Daniel"),
  age = c(25, 30, 35),
  is_student = c(TRUE, FALSE, FALSE)
)
You can access data frame columns using the $ operator or square brackets []:
cat(df$name)
Jean Thomas Daniel
cat(df[, "age"])
25 30 35
You can also access a data frame by rows:
print(df[2, ])
    name age is_student
2 Thomas  30      FALSE
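Row and column indices can also be combined. The sketch below rebuilds the same data frame and shows a few such combinations; the variable older and the age threshold of 28 are arbitrary choices made for illustration.

```r
df <- data.frame(
  name = c("Jean", "Thomas", "Daniel"),
  age = c(25, 30, 35),
  is_student = c(TRUE, FALSE, FALSE)
)

df[2, "age"]     # single cell: 30
df$age[2]        # same cell, via the column vector: 30
df[c(1, 3), ]    # rows 1 and 3, all columns

# A logical condition in the row position keeps only matching rows
older <- df[df$age > 28, ]
older$name       # "Thomas" "Daniel"
```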
You can add a new column to a data frame by assigning values to a new column name:
df$height <- c(165, 180, 175)
print(df)
    name age is_student height
1   Jean  25       TRUE    165
2 Thomas  30      FALSE    180
3 Daniel  35      FALSE    175
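New rows can be added using rbind. Passing the new row as a one-row data frame keeps the type of each column:
df <- rbind(df, data.frame(name = "Patricia", age = 18, is_student = FALSE, height = 160))
print(df)
      name age is_student height
1     Jean  25       TRUE    165
2   Thomas  30      FALSE    180
3   Daniel  35      FALSE    175
4 Patricia  18      FALSE    160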
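How the new row is built matters when using rbind. A row created with c() that mixes text and numbers is a character vector, so the columns it is bound to are converted to character as well. The sketch below compares that with a one-row data frame; the names with_c, new_row and with_df are hypothetical.

```r
df <- data.frame(
  name = c("Jean", "Thomas", "Daniel"),
  age = c(25, 30, 35),
  is_student = c(TRUE, FALSE, FALSE),
  height = c(165, 180, 175)
)

# c() builds a character vector, so the numeric columns become character
with_c <- rbind(df, c("Patricia", 18, FALSE, 160))
class(with_c$age)    # "character"

# A one-row data frame keeps each column's type
new_row <- data.frame(name = "Patricia", age = 18,
                      is_student = FALSE, height = 160)
with_df <- rbind(df, new_row)
class(with_df$age)   # "numeric"
mean(with_df$age)    # 27
```

Both results look the same when printed, which is why checking the column classes is useful after adding rows.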
Errors while working with data structures
It is very likely that indexing data structures is the first time you will see an R error. Seeing R errors (also sometimes called exceptions) is not a sign that you’re a bad programmer or that you’re doing something terrible. Even experienced programmers see R errors on their screen.
Error messages are in fact a very useful feedback mechanism for the programmer, but they can be a bit daunting when you first see them. Let’s recreate a typical error message: a data frame with three columns does not have a column at index 6 (the highest index in that case is 3), so R produces an error if we ask for it.
dataframe.R
df <- data.frame(
  name = c("Jean", "Thomas", "Daniel"),
  age = c(25, 30, 35),
  is_student = c(TRUE, FALSE, FALSE)
)
cat(df[, 6])
Error in `[.data.frame`(df, , 6): undefined columns selected
Likewise, it will return an error if there’s no matching column name.
cat(df[,"Will"])
Error in `[.data.frame`(df, , "Will"): undefined columns selected
The error message has two parts. The first is the text before the colon, which tells you where the error was found. The second part is usually a slightly more descriptive message, in this case telling us that the selected column does not exist.
Take your time to read error messages when they are printed to the screen; they will most likely help you solve the issue. If you think that you’ve fixed the problem but the error persists, make sure that you’ve saved the script file and rerun your code afterwards.
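One way to investigate an "undefined columns selected" error is to look at what the data frame actually contains before indexing it. The sketch below is a suggestion rather than part of the course material: it checks the number of columns and their names, then uses tryCatch to capture the error message as text instead of stopping the script. The variable msg is a hypothetical name.

```r
df <- data.frame(
  name = c("Jean", "Thomas", "Daniel"),
  age = c(25, 30, 35),
  is_student = c(TRUE, FALSE, FALSE)
)

ncol(df)                 # 3: the highest valid column index
names(df)                # "name" "age" "is_student"
"Will" %in% names(df)    # FALSE: this column does not exist

# tryCatch() lets the script continue and keeps the message for inspection
msg <- tryCatch(
  df[, "Will"],
  error = function(e) conditionMessage(e)
)
msg
```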