We cannot cover all of ggplot2 in this workshop, so you are strongly encouraged to read the book and cheat sheet. For today, we will give a quick overview of how to create some simple graphs.
First we load the tidyverse in the same way as before:
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Next we will load some data to plot. We will use the climate data from the previous section (available from cetml1659on.txt).
Plotting works best with tidy data, so we load and tidy the data exactly as in the previous section, producing the tibble historical_temperature that is used in the rest of this section.
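As a reminder of what "tidy" means here, the following is a minimal sketch on a toy table rather than the real file. The column names JAN and FEB, the values, and the names wide and toy_tidy are assumptions made for illustration only; the actual loading code is the one from the previous section.

```r
library(tidyverse)

# Toy stand-in for a wide climate table: one row per year, one column per month.
# Names and values are illustrative placeholders, not real measurements.
wide <- tibble(
  year = c(1659L, 1660L),
  JAN  = c(3.0, 0.0),
  FEB  = c(4.0, 4.0)
)

# Tidy it so that each row holds one observation (year, month, temperature)
toy_tidy <- wide %>%
  pivot_longer(-year, names_to = "month", values_to = "temperature")

toy_tidy
```

The tidy form has one column per variable, which is the shape that the aes mappings used below expect.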
Next, we will use ggplot to draw a graph (we will explain how this works after drawing). You should see a graph similar to this:
ggplot(historical_temperature, aes(x = year, y = temperature)) +
    geom_point()
Warning: Removed 5 rows containing missing values or values outside the scale range
(`geom_point()`).
This command has drawn a scatter plot of the data contained in the tibble historical_temperature, putting the year column on the x-axis, and the temperature column on the y-axis.
ggplot is written to follow a specific “grammar of graphics” (the “gg” in its name). There are three key components:
the data,
a set of aesthetic mappings between variables in the data and visual properties of the graph, and
one or more layers that describe how to render each observation.
The data is a tibble containing tidy data with one observation per row, and one variable per column.
The aesthetic is a mapping specified via the aes function, which maps the variables to axes, colours or other graphical properties.
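The layers are added to the plot with +, so a plot is written generically as ggplot(data, aesthetic) + layer1 + layer2 and so on. Each layer is a specific rendering of the data, e.g. geom_point() will draw points (scatter plot), geom_line() will draw lines (line graph) etc.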
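The three components can be seen separately in the short sketch below. The tibble toy and its values are illustrative placeholders, and the names base and p are hypothetical; only ggplot, aes, geom_point and geom_line come from ggplot2 itself.

```r
library(tidyverse)

# Illustrative placeholder values only
toy <- tibble(
  year        = c(1700, 1701, 1702, 1703),
  temperature = c(8.5, 9.1, 8.8, 9.4)
)

# Data plus aesthetic mapping, with no layers yet (this draws empty axes)
base <- ggplot(toy, aes(x = year, y = temperature))

# Layers are added with `+`; each one renders the same data in a different way
p <- base + geom_point() + geom_line()
p
```

Because the data and the mapping are set once in ggplot, both layers share them without repeating any arguments.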
Plotting the analysis
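You can combine analysis with plotting, e.g. here we plot the average temperature in each decade as a line graph. To do this, we first add a decade column to historical_temperature, computed as (year %/% 10) * 10.
(%/% means “integer division”, so 1655 %/% 10 equals 165, which becomes 1650 when multiplied by 10)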
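A minimal sketch of this calculation, using dplyr's mutate on a toy tibble (the tibble toy and its years are placeholders chosen for illustration; the workshop's own code may add the column in a different way):

```r
library(tidyverse)

# A few example years, chosen only to show the rounding behaviour
toy <- tibble(year = c(1655, 1659, 1660, 1701))

# Integer-divide by 10, then multiply by 10, to round each year down to its decade
toy <- toy %>%
  mutate(decade = (year %/% 10) * 10)

toy$decade
# decade is 1650, 1650, 1660, 1700
```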
This enables us to draw a line graph of the average temperature in each decade:
ggplot(
    historical_temperature %>%
        group_by(decade) %>%
        summarise(temperature = mean(temperature)),
    aes(x = decade, y = temperature)
) +
    geom_line()
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).
A line chart is not a good choice for this plot, as it hides the spread of the values behind each average. Instead, a box-and-whisker or violin plot would be better. To draw one, we need to specify the grouping in the aesthetic so that one violin is drawn per decade, e.g.
ggplot(historical_temperature, aes(x = decade, y = temperature, group = decade)) +
    geom_violin()
Warning: Removed 5 rows containing non-finite outside the scale range
(`stat_ydensity()`).
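The box-and-whisker alternative mentioned above uses the same aesthetics with geom_boxplot() in place of geom_violin(). The sketch below runs on synthetic random numbers rather than the climate data, so toy, its values, and p_box are assumptions made purely so that the example is self-contained.

```r
library(tidyverse)

# Synthetic stand-in data: random numbers, not real temperatures
set.seed(1)
toy <- tibble(
  year        = rep(1650:1679, each = 12),
  temperature = rnorm(360, mean = 9, sd = 5)
) %>%
  mutate(decade = (year %/% 10) * 10)

# Same aesthetics as the violin plot, with a box-and-whisker layer instead
p_box <- ggplot(toy, aes(x = decade, y = temperature, group = decade)) +
  geom_boxplot()
p_box
```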
Finally, the global aesthetic set in ggplot can be overridden or extended by setting aesthetics in the layers themselves. For example, here we overlay a smooth trend line on the violin plot:
ggplot(historical_temperature, aes(x = year, y = temperature)) +
    geom_violin(aes(group = decade)) +
    geom_smooth()
Warning: Removed 5 rows containing non-finite outside the scale range
(`stat_ydensity()`).
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
Warning: Removed 5 rows containing non-finite outside the scale range
(`stat_smooth()`).
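Note that the group aesthetic is set only on the violin layer. Setting it globally would also make geom_smooth() fit a separate curve for each decade, rather than one trend line across all years.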
You can save your plots to a file using the ggsave function. This will save the last plot drawn to a file, with filename, size, format etc. all controlled via arguments to this function, e.g.
ggsave("violin.pdf", device="pdf", dpi="print")
would save the plot to a file called violin.pdf, in PDF format, using a resolution (dpi) that is suitable for printing.
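Rather than relying on the last plot drawn, you can also pass a specific plot to ggsave through its plot argument and set the size explicitly. This is a minimal sketch: the tibble toy, its values, the chosen dimensions, and the temporary output file are all illustrative assumptions.

```r
library(tidyverse)

# Illustrative placeholder values only
toy <- tibble(
  year        = c(1700, 1701, 1702),
  temperature = c(8.5, 9.1, 8.8)
)

# Keep the plot in a variable so that it can be saved explicitly
p <- ggplot(toy, aes(x = year, y = temperature)) +
  geom_point()

# Write to a temporary file; the format is inferred from the .pdf extension,
# and width and height are interpreted in the given units
out_file <- tempfile(fileext = ".pdf")
ggsave(out_file, plot = p, width = 15, height = 10, units = "cm")
```

Saving a named plot makes a script less fragile, because the saved result no longer depends on which plot happened to be drawn most recently.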
Exercise
Create a graph that shows the average maximum temperature by month. Draw this as a line graph. Note that you may need to add group="decade" to the aesthetic (any constant value, such as group = 1, has the same effect) so that the line layer treats all of the points as a single group and can join them together.
Create a graph that shows the change in average temperature in December per decade. Draw this as a scatter plot, with a smooth trend line added.
Create a graph that shows the change in average temperature by century. Draw this as a line graph with a smooth trend line added. Note that you may need to adjust the span value if the number of data points is too few to draw a very smooth line.
Answer
First load the tidyverse, then read in all of the data and tidy it up…