cp ~/classdata/Session5/RNAseq-Analysis/Heatmaps.r ~/mydata/[workdir]
5.1 Venn diagrams
Looking for conserved or specific reponses between treatment types or time points can often reveal key bioilkoguical insights. The most strainghtforward way of doing this is to take a venn diagram based response to search for Gene ID that are shred or specific between groups. The simplest on line tool is Venny 2.1 which allows you to paste uptoo 4 lists and derive the overlapping and unique GeneID, export these list by selecting the relevant list within the graphical representation of the venn diagram and also customise representation (colour, lablers ect) and save it as a png image by ‘right hand clicking’ in venn diagram representation.
More informative representations can be generated by using programs such as Divenny Diagram where you upload the DEG list indicating if the ‘change’ in expression is UP or DOWN and then the intersections display conservation of up and down regulation as well as creating a attractive representation.
There are tools for creating venn diagram with more that 4 samples both on line with interactivenn and in R ggVennDiagram. Over 4 samples the classical venn diagram representations can become hard to interpret so you may want to consider a UpSet plot - ggVennDiagram supports UpSet plot generation.
5.2 Visualising Expression Patterns
Once an analysis has yielded a list of DEGs, it’s common to visualise them in a heatmap. There is a wonderful R package called ComplexHeatmap, which will do almost anything you can think of to a heatmap. We have create a script that you can copy into your workign directory and use to visualise your data.
##Heatmaps Script
##Load libraries
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
::install("ComplexHeatmap", lib.loc="/home/sbi9srj/mydata/Rpackages")
BiocManagerlibrary(ComplexHeatmap)
## Prepare your data for the heatmap
#Read in your Rdata that you saved previously
load("projectName.RData")
## Create a list of the output files from SARtools
<- list.files(path="tables", pattern="*.txt", full.names=TRUE, recursive=FALSE)
tables <- read.delim("/mnt/clusters/hawker/data/classdata/Bioinformatics/REFS/Homo_sapiens.GRCh38.100.map.txt", header=T)
IDtoGene
# Now add column to tables with gene name
lapply(tables, function(x) {
<-read.table(x, header=TRUE)
t <- select(t, Id, log2FoldChange, pvalue, padj)
t.cols <- merge(t.cols, IDtoGene, by='Id')
t.annot write.table(t.annot, file=paste(x, ".annot.csv", sep=""), sep="\t", quote=FALSE, row.names=FALSE, col.names=TRUE)
})
#################
## Read in data ##
#################
# We are going to draw a heatmap of all the genes differentially upregulated in the deletion compared to the control
# Files must be in the format with a header (samples) and gene names (rows)
<- data.frame(read.table("tables/DeletionvsControl.up.txt", sep="\t", header = TRUE, row.names= 1))
infile #Add in the name of each gene
$names <- IDtoGene$Gene[match(rownames(infile), IDtoGene$Id)]
infile
#Click on 'infile' to see what it looks like.
# The first set of columns are the raw reads for each gene in each sample
# The next set of columns are the normalised reads for each gene in each sample
# At the end are some columns with other useful information, e.g. the fold change and P values
# We want to use the read counts that have been normalised, so we will select just those columns
<- infile[,grep("norm", names(infile))]
normVals
# Now we read in the metadata (the information about each sample)
# Metadata column names must match sample names!
= data.frame(read.table(targetFile, header = TRUE, row.names= 1))
x.meta
# Ensure that the samples are in the same order in both files
<- normVals[ , order(names(normVals))]
normVals <- x.meta[order(row.names(x.meta)), ]
x.meta
# Check that they are in the same order
all.equal(gsub("norm.","",names(normVals)), rownames(x.meta))
# Create a copy of the data into gene-scaled format
= t(apply(normVals, 1, function(x) {
scaleVals = quantile(x, 0.1)
q10 = quantile(x, 0.9)
q90 < q10] = q10
x[x > q90] = q90
x[x scale(x)
}))colnames(scaleVals) = colnames(normVals)
## NOTE: You now have two datasets:
## -- normVals (raw normalised count data)
## -- scaleVals (thes same data, but each gene is on its own scale)
##
## Choose which to plot yourself!
#####################
## Complex Heatmap ##
#####################
# Plot a very basic heatmap
Heatmap(data.matrix(scaleVals))
# This is a lot of genes! Let's try to filter them down
# Try excluding genes with a fold change of less than 1
<-scaleVals[infile$FoldChange > 1,]
scaleValsFC1
Heatmap(data.matrix(scaleValsFC1))
# Try just the 50 most expressed genes
<-scaleVals[order(infile$baseMean, decreasing = T)[1:50],]
scaleVals50
Heatmap(data.matrix(scaleVals50))
# A fancier heatmap!
Heatmap(data.matrix(scaleVals50),
cluster_columns = TRUE,
col = colorRampPalette(c("white", "black", "red"))(50),
heatmap_legend_param = list(
title = "Scaled counts"),
top_annotation = columnAnnotation(Type = x.meta$Type),
row_labels = infile$names[rownames(infile) %in% rownames(scaleVals50)],
row_names_gp = gpar(fontsize = 8),
column_split = x.meta$Type,
cluster_column_slices = FALSE,
border = TRUE,
clustering_distance_rows = "pearson",
clustering_distance_columns = "pearson"
)
# Try commenting out different lines in the heatmap above. Can you work out what each option does?
# ComplexHeatmap has a very good reference manual at https://jokergoo.github.io/ComplexHeatmap-reference/book/
# If you want, try looking for more option to play with on there
# Try switching between scaled and non-scaled counts. Does it make a difference?
5.3 PCAexplorer
Exploring the multi-varient relationships between your rawdata or filtered lists provides a powerful approach to exploring the relationship between your samples. Both SARTools and Geo2R incorporate a simple PCA approach but this approach can be deployed to explore the full dimensionality of the data. PCAexplorer provides a R based shinny app
(menu driven app) that allows you to explore PCA full functionality. You will need to create a raw data matrix and customised metadata file but these are easy to adapt from the SRATools metadata file and the SARTools ‘complete’ outout file - you can use raw or normalised counts. The documentation is very straightforward - have fun.
5.4 Extension: Useful Packages for Downstream Analysis
Use the list of ~200 most differentially expressed genes in gProfiler, or other online resources below, to investigate the functional connotations (More on this in the extension session).