breast <- read.csv("https://raw.githubusercontent.com/Bristol-Training/beginning-r/refs/heads/main/data/GDS3716.soft", sep = "\t", skip = 99)
Using tidyverse, answer the exercises below.
Exercise 1
Read the dataset. You may want to look at the documentation for the read_delim function. Remember to specify the separator character and the number of lines you want to skip.
Answer 1
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
breast <- read_delim("https://raw.githubusercontent.com/Bristol-Training/beginning-r/refs/heads/main/data/GDS3716.soft", delim = "\t", skip = 99)
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 22284 Columns: 44
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (2): ID_REF, IDENTIFIER
dbl (42): GSM512539, GSM512540, GSM512541, GSM512542, GSM512543, GSM512544, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
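The warning above suggests calling problems() on the result. As a minimal sketch on made-up inline data (the `toy` tibble and its contents are hypothetical and not taken from GDS3716.soft), the following forces a parsing issue by declaring a numeric column that contains text, and then lists the issues that readr recorded:

```r
library(readr)

# Hypothetical inline data: the second column is declared as double ("d"),
# but the last row contains text, which cannot be parsed as a number.
toy <- read_delim(
  I("id\tvalue\na\t1.5\nb\toops\n"),
  delim = "\t",
  col_types = "cd"
)

# One row per parsing issue, including the expected and the actual content.
issues <- problems(toy)
issues
```

The same call on the real data would be problems(breast). Values that could not be parsed are stored as NA, so it is worth checking the reported rows before computing summaries.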
The read_delim function can guess the delimiter: if we don’t specify the delimiter character, it will try to work it out from the contents of the file, although the guess may not always be correct.
breast <- read_delim("https://raw.githubusercontent.com/Bristol-Training/beginning-r/refs/heads/main/data/GDS3716.soft", skip = 99)
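To see the delimiter guessing in isolation, here is a small sketch that reads the same made-up table twice, once tab-separated and once comma-separated, without passing delim. The inline text and the names `tab_text`, `comma_text`, `from_tab` and `from_comma` are assumptions for illustration only, and the behaviour relies on readr 2.0 or later.

```r
library(readr)

# Hypothetical inline data, once tab-separated and once comma-separated.
tab_text   <- I("gene\ts1\ts2\nTP53\t1.0\t2.0\nBRCA1\t3.0\t4.0\n")
comma_text <- I("gene,s1,s2\nTP53,1.0,2.0\nBRCA1,3.0,4.0\n")

# No delim argument: read_delim() guesses the separator from the content.
from_tab   <- read_delim(tab_text, show_col_types = FALSE)
from_comma <- read_delim(comma_text, show_col_types = FALSE)

# Both calls should produce the same three column names.
names(from_tab)
names(from_comma)
```

Passing delim explicitly, as in the first answer, remains the safer choice when you already know the file format.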
Exercise 2
Calculate the average expression levels for genes matching the name TP53.
Answer 2
First we can filter the rows that match the gene name and remove the non-numeric columns.
expr_tp53 <- breast %>%
  filter(IDENTIFIER == "TP53") %>%
  select(!c(ID_REF, IDENTIFIER))
expr_tp53
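Once only numeric columns remain, the average can be taken in more than one way. The sketch below uses a made-up miniature table (the `toy` tibble, its probe IDs and the sample columns `GSM1` and `GSM2` are hypothetical) and shows two possible readings of "average": one overall mean across all matching rows and samples, and one mean per matching row.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature of the dataset: two probes for TP53, one for another gene.
toy <- tribble(
  ~ID_REF, ~IDENTIFIER, ~GSM1, ~GSM2,
  "p1",    "TP53",      1,     3,
  "p2",    "TP53",      5,     7,
  "p3",    "BRCA1",     9,     11
)

# Same filtering and column selection as in the answer above.
toy_tp53 <- toy %>%
  filter(IDENTIFIER == "TP53") %>%
  select(!c(ID_REF, IDENTIFIER))

# Reading 1: a single mean over every value in the filtered table.
overall_mean <- mean(as.matrix(toy_tp53))

# Reading 2: one mean per matching row (probe), averaged across samples.
per_probe_mean <- rowMeans(toy_tp53)

overall_mean
per_probe_mean
```

Which of the two is appropriate depends on whether the matching rows are meant to be pooled or reported separately.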