breast <- read.csv("https://raw.githubusercontent.com/Bristol-Training/beginning-r/refs/heads/main/data/GDS3716.soft", sep = "\t", skip = 99)
Using the tidyverse, answer the exercises below.
Exercise 1
Read the dataset. You may want to look at the documentation for the read_delim function. Remember to specify the separator character and the number of lines to skip.
Answer 1
library(tidyverse)
breast <- read_delim("https://raw.githubusercontent.com/Bristol-Training/beginning-r/refs/heads/main/data/GDS3716.soft", delim = "\t", skip = 99)
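To see what `skip` and `delim` do without downloading the real file, here is a minimal sketch on a small hypothetical file: two made-up metadata lines followed by a tab-separated table. The file contents, the `toy_lines` and `toy_path` names, and the `GSM1`/`GSM2` columns are invented for illustration. The real GDS3716.soft file has many more header lines, which is why it needs `skip = 99`.

```r
library(readr)  # part of the tidyverse

# Hypothetical toy file: 2 metadata lines, then a tab-separated table
toy_lines <- c(
  "!toy_header_one",
  "!toy_header_two",
  "ID_REF\tIDENTIFIER\tGSM1\tGSM2",
  "p1\tTP53\t5.0\t7.0",
  "p2\tBRCA1\t2.0\t3.0"
)
toy_path <- tempfile(fileext = ".soft")
writeLines(toy_lines, toy_path)

# skip = 2 drops the two metadata lines, so line 3 becomes the header row;
# delim = "\t" splits fields on tab characters
toy <- read_delim(toy_path, delim = "\t", skip = 2, show_col_types = FALSE)
print(toy)
```

If `skip` were too small, a metadata line would be read as the header and the column names would be wrong, so checking `names()` after reading is a quick sanity test.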
If we don't specify the delimiter, read_delim will try to guess it from the contents of the file, although the guess may not always be correct.
breast <- read_delim("https://raw.githubusercontent.com/Bristol-Training/beginning-r/refs/heads/main/data/GDS3716.soft", skip = 99)
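A cheap way to check a guessed delimiter is to read the same table with and without `delim` and compare the results. This sketch uses a small hypothetical tab-separated file (made-up contents, no metadata lines); the variable names are illustrative only.

```r
library(readr)

# Hypothetical tab-separated file without metadata lines
toy_lines <- c(
  "ID_REF\tIDENTIFIER\tGSM1\tGSM2",
  "p1\tTP53\t5.0\t7.0",
  "p2\tBRCA1\t2.0\t3.0"
)
toy_path <- tempfile(fileext = ".tsv")
writeLines(toy_lines, toy_path)

explicit <- read_delim(toy_path, delim = "\t", show_col_types = FALSE)
guessed  <- read_delim(toy_path, show_col_types = FALSE)  # delimiter guessed

# If the guess worked, both reads produce the same columns
print(identical(names(explicit), names(guessed)))
```

A failed guess typically shows up as a single wide column, so comparing `ncol()` against what you expect is usually enough to catch it.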
Exercise 2
Calculate the average expression levels for genes matching the name TP53.
Answer 2
First, we filter the rows that match the gene name and remove the non-numeric columns.
expr_tp53 <- breast %>%
  filter(IDENTIFIER == "TP53") %>%
  select(!c(ID_REF, IDENTIFIER))
expr_tp53
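The filter-then-select step can be tried on a small in-memory table. The probe IDs, the patient columns `GSM1` to `GSM3` and all values below are made up; only the column names `ID_REF` and `IDENTIFIER` mirror the real dataset.

```r
library(dplyr)  # provides filter(), select() and %>%

# Hypothetical expression table: one row per probe, one column per patient
toy <- data.frame(
  ID_REF     = c("p1", "p2", "p3"),
  IDENTIFIER = c("TP53", "BRCA1", "TP53"),
  GSM1       = c(5, 2, 1),
  GSM2       = c(7, 3, 3),
  GSM3       = c(6, 4, 2)
)

# Keep only the TP53 probes, then drop the two non-numeric ID columns;
# !c(...) selects every column except the ones listed
toy_tp53 <- toy %>%
  filter(IDENTIFIER == "TP53") %>%
  select(!c(ID_REF, IDENTIFIER))

print(toy_tp53)
```

Dropping the ID columns matters because `rowMeans()` and `colMeans()` need an all-numeric input.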
Now, to calculate the mean of each row, we can run:
expr_mean_tp53 <- breast %>%
  filter(IDENTIFIER == "TP53") %>%
  select(!c(ID_REF, IDENTIFIER)) %>%
  rowMeans()
expr_mean_tp53
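Here is the row-mean step on a toy table with one missing value (hypothetical data). The sketch adds `na.rm = TRUE`, which the TP53 answer does not use: without it, any row containing an `NA` gets an `NA` mean. Whether that matters for the real file depends on whether its TP53 rows have missing values.

```r
library(dplyr)

# Hypothetical toy table with the same layout as the real data
toy <- data.frame(
  ID_REF     = c("p1", "p2", "p3"),
  IDENTIFIER = c("TP53", "BRCA1", "TP53"),
  GSM1       = c(5, 2, 1),
  GSM2       = c(7, 3, NA),
  GSM3       = c(6, 4, 2)
)

# rowMeans() averages across the patient columns of each remaining row;
# na.rm = TRUE skips missing values instead of returning NA
toy_mean_tp53 <- toy %>%
  filter(IDENTIFIER == "TP53") %>%
  select(!c(ID_REF, IDENTIFIER)) %>%
  rowMeans(na.rm = TRUE)

print(toy_mean_tp53)
```

Here the first TP53 row averages 5, 7 and 6 to 6, and the second averages only the non-missing 1 and 2 to 1.5.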
Exercise 3
Find the patient with the highest average expression level across the whole genome.
Answer 3
max_idx <- breast %>%
  select(!c(ID_REF, IDENTIFIER)) %>%
  colMeans(na.rm = TRUE) %>%
  which.max()
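`which.max()` returns a position, but because `colMeans()` keeps the column names, that position is named after the winning column. One way to get the patient identifier is therefore `names()`. The sketch below shows this on a hypothetical toy table; the column names and values are made up.

```r
library(dplyr)

# Hypothetical toy table: patient columns GSM1 to GSM3, one missing value
toy <- data.frame(
  ID_REF     = c("p1", "p2", "p3"),
  IDENTIFIER = c("TP53", "BRCA1", "TP53"),
  GSM1       = c(5, 2, 1),
  GSM2       = c(7, 3, 3),
  GSM3       = c(6, NA, 2)
)

# colMeans() gives one average per patient column (NAs skipped);
# which.max() returns the index of the largest, named after its column
toy_max_idx <- toy %>%
  select(!c(ID_REF, IDENTIFIER)) %>%
  colMeans(na.rm = TRUE) %>%
  which.max()

print(names(toy_max_idx))  # [1] "GSM2"
```

The column means here are about 2.67, 4.33 and 4, so `GSM2` wins. If several columns tie, `which.max()` returns the first of them.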