WeChat official account: 研平方
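It has been a long time since I last ran any data through R. A friend recently needed a GSVA analysis, so I took the chance to brush up on R, and I am sharing the steps here.
1. GSVA overview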
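GSVA stands for gene set variation analysis. It is a non-parametric, unsupervised method. Unlike GSEA, GSVA does not need the samples to be grouped in advance: it computes an enrichment score for each gene set in every individual sample. In other words, GSVA transforms the expression data from a matrix whose features are single genes into a matrix whose features are gene sets. Because enrichment is quantified per sample, downstream statistics become much more convenient. Just as limma can be run on an expression matrix to find genes that are differentially expressed between groups of samples, running limma on the GSVA output (which is still a matrix) finds gene sets that differ significantly between groups. Compared with individual genes, these "differentially expressed" gene sets are usually easier to interpret biologically, and they can feed into further biology-driven work such as tumor subtype classification.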
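To make the change of feature space concrete, here is a toy sketch in base R. It is not the GSVA algorithm: it simply averages per-gene z-scores within each set, an assumption chosen only to show how a gene-by-sample matrix turns into a gene-set-by-sample matrix. The gene names, set names, and `toy_scores` are made up for illustration.

```r
set.seed(1)
# Hypothetical expression matrix: 6 genes (rows) x 4 samples (columns)
expr <- matrix(rnorm(24), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))
# Hypothetical gene sets; "geneX" is absent from expr on purpose
gene_sets <- list(setA = c("gene1", "gene2", "gene3"),
                  setB = c("gene4", "gene5", "gene6", "geneX"))
# Standardize each gene across samples
z <- t(scale(t(expr)))
# One score per set and sample: mean z-score of the set's genes present in expr
toy_scores <- t(sapply(gene_sets, function(genes) {
  colMeans(z[intersect(genes, rownames(z)), , drop = FALSE])
}))
dim(expr)        # genes x samples
dim(toy_scores)  # gene sets x samples
```

The result keeps one column per sample, which is why the same matrix-based tools can be applied to it afterwards.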
2. Preparing the data
2.1 Load the required packages
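setwd(" ")                          # fill in your own working directory
rm(list = ls())                     # clear the workspace
options(stringsAsFactors = FALSE)   # keep strings as character (already the default since R 4.0)
library(GSVA)
library(GSEABase)
library(msigdbr)
library(clusterProfiler)
library(org.Hs.eg.db)
library(enrichplot)
library(limma)
library(readr)                      # provides read_delim(), used in 2.3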
2.2 Expression Data
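# The file is comma-separated despite the .txt extension; the first column holds the gene IDs
exprSet <- read.table("exprSet.txt", header = TRUE, sep = ",")
rownames(exprSet) <- exprSet$X   # the unnamed first column is read in as "X"
exprSet <- exprSet[, -1]         # drop the ID column so only expression values remain
str(exprSet)                     # confirm that every remaining column is numeric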
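Since `gsva()` is later given `as.matrix(exprSet)`, it is worth confirming that the table really becomes a numeric matrix with unique gene identifiers as row names. The sketch below replays the same steps on a tiny inline table. The gene symbols, sample names, and values are invented, and the layout (an empty first header cell, which `read.table()` names `X`) is an assumption about what `exprSet.txt` looks like.

```r
# Invented stand-in for exprSet.txt: comma-separated, first header cell empty
csv_text <- c(",s1,s2,s3",
              "TP53,5.1,6.2,4.8",
              "CD3E,7.0,6.9,7.4",
              "TLR4,3.3,2.9,3.8")
demo <- read.table(text = csv_text, header = TRUE, sep = ",",
                   stringsAsFactors = FALSE)
# Duplicated identifiers would make the rownames() assignment fail, so check first
stopifnot(!anyDuplicated(demo$X))
rownames(demo) <- demo$X   # gene identifiers become row names
demo <- demo[, -1]         # drop the identifier column
demo_mat <- as.matrix(demo)
# Checks worth repeating on the real data
stopifnot(is.numeric(demo_mat))  # no stray text columns
stopifnot(!anyNA(demo_mat))      # no missing values
dim(demo_mat)
```

If `is.numeric()` fails on real data, a leftover annotation column is the usual suspect, because a single text column turns the whole matrix into character.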
2.3 Custom gene sets
2.3.1 Version 1: painful to look at
# pathway.txt is tab-delimited with one column per pathway; read_delim() comes from readr
pathway <- readr::read_delim("pathway.txt", delim = "\t",
                             escape_double = FALSE, trim_ws = TRUE)
pathway <- as.data.frame(pathway)
# For each pathway column, drop the NA padding and any duplicated genes
if (TRUE) {
  T_cell_activation <- unique(na.omit(pathway$`T cell activation`))
  toll_like_receptor_signaling_pathway <- unique(na.omit(pathway$`toll-like receptor signaling pathway`))
  leukocyte_differentiation <- unique(na.omit(pathway$`leukocyte differentiation`))
  positive_regulation_of_cell_death <- unique(na.omit(pathway$`positive regulation of cell death`))
  neutrophil_activation <- unique(na.omit(pathway$`neutrophil activation`))
  positive_regulation_of_immune_response <- unique(na.omit(pathway$`positive regulation of immune response`))
}
# Collect the vectors into a named list, the form passed to gsva() in section 3
pathway_list <- list(T_cell_activation, toll_like_receptor_signaling_pathway, leukocyte_differentiation,
                     positive_regulation_of_cell_death, neutrophil_activation, positive_regulation_of_immune_response)
names(pathway_list) <- c("T cell activation", "toll-like receptor signaling pathway", "leukocyte differentiation",
                         "positive regulation of cell death", "neutrophil activation", "positive regulation of immune response")
2.3.2 Version 2: a for loop
# length() of a data frame is its number of columns, i.e. the number of pathways
pathway_list <- vector("list", length(pathway))
for (i in seq_along(pathway)) {
  pathway_list[[i]] <- unique(na.omit(pathway[, i]))
}
# Take the names from the column headers so they cannot drift out of order
names(pathway_list) <- names(pathway)
2.3.3 Version 3: lapply()
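# lapply() walks over the columns and keeps the column names as list names
pathway_list <- lapply(pathway, function(x) {
  unique(na.omit(x))
})
I have to say, the apply() family really is a pleasure to use!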
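To check that the loop and the `lapply()` version really build the same list, the sketch below runs both on a small made-up table. `pathway_demo` and its gene symbols are hypothetical and only mimic the assumed layout of `pathway.txt` (one column per pathway, shorter columns padded with `NA`).

```r
# Hypothetical pathway table: columns of unequal length padded with NA
pathway_demo <- data.frame(
  `T cell activation`     = c("CD3E", "CD28", "CD3E", NA),
  `neutrophil activation` = c("ELANE", "MPO", NA, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)
# Version 2: explicit loop, names assigned afterwards
by_loop <- vector("list", length(pathway_demo))
for (i in seq_along(pathway_demo)) {
  by_loop[[i]] <- unique(na.omit(pathway_demo[, i]))
}
names(by_loop) <- names(pathway_demo)
# Version 3: lapply() keeps the column names automatically
by_lapply <- lapply(pathway_demo, function(x) unique(na.omit(x)))
identical(by_loop, by_lapply)
lengths(by_lapply)
```

The `lapply()` form is shorter mainly because the names travel with the columns, so there is no separate naming step to keep in sync.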
3. Running GSVA
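# Classic gsva() interface (GSVA < 1.50); newer releases expect gsva(gsvaParam(...)) instead
# kcdf = "Gaussian" is meant for continuous, log-scale expression values
gsva_matrix_BD <- gsva(as.matrix(exprSet), pathway_list, method = "gsva",
                       kcdf = "Gaussian", abs.ranking = TRUE)
# Save the gene-set-by-sample score matrix
write.csv(gsva_matrix_BD, file = "gsva_matrix_BD.csv")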
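The overview mentions running limma on the GSVA scores. As a rough, dependency-free illustration of that idea, the sketch below compares two groups of samples with a per-gene-set Welch t-test from base R. This is a stand-in for limma, not limma itself, and both the score matrix and the `group` labels are simulated. With real data you would use your own scores and your own sample annotation.

```r
set.seed(42)
# Simulated gene-set-by-sample scores (stand-in for gsva_matrix_BD)
scores <- matrix(rnorm(3 * 8, sd = 0.3), nrow = 3,
                 dimnames = list(c("set1", "set2", "set3"), paste0("s", 1:8)))
# Hypothetical sample annotation: 4 controls followed by 4 cases
group <- factor(rep(c("control", "case"), each = 4),
                levels = c("control", "case"))
# Shift set1 upward in the cases so there is something to detect
scores["set1", group == "case"] <- scores["set1", group == "case"] + 1
# Row-wise Welch t-test: case versus control for every gene set
res <- t(apply(scores, 1, function(x) {
  tt <- t.test(x[group == "case"], x[group == "control"])
  c(diff = unname(tt$estimate[1] - tt$estimate[2]), p = tt$p.value)
}))
res <- as.data.frame(res)
# Benjamini-Hochberg correction across gene sets
res$p_adj <- p.adjust(res$p, method = "BH")
res[order(res$p), ]
```

limma would additionally moderate the variances across gene sets, which matters with few samples, so treat the t-test only as a way to see the shape of the analysis.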