
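[r<-packages|datasets|public databases] Introduction to the UCSCXenaTools package: searching and downloading datasets from TCGA, GDC, ICGC, and other public databases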

王诗翔 (Shixiang Wang)
2018.08.07 14:46

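The XenaR package provides a simple interface to UCSC Xena. It gives access to information stored in UCSC Xena, including more than a thousand datasets from databases such as GDC, TCGA, ICGC, GTEx, and CCLE. In particular, UCSC has normalized part of the TCGA data (hg19 version) very well, so it is ready to use right after download. Recently I wanted to download these data through code instead of clicking through web pages every time. Because the original author of XenaR has not updated it for three years, I built on it and fixed it to work with the Hug API that UCSC Xena currently provides, so that the original package's functionality works again (see https://github.com/DataGeeker/XenaR). Based on that package, I am now building the UCSCXenaTools package.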

Click to view the datasets Xena currently provides.

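The package can currently search datasets, download them, and import them into R. Below is a brief walkthrough of its usage. I have not had time to write documentation yet, so this article is the main reference for using the package.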

Usage

Installation

Install from GitHub by running the following code:

if(!require(devtools)){
    install.packages("devtools", dependencies = TRUE)
}

devtools::install_github("ShixiangWang/UCSCXenaTools")

Loading

library(UCSCXenaTools)

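Some usage has changed since this was written; please read the manual at https://cran.r-project.org/package=UCSCXenaTools

Exploring

XenaHub() retrieves all available resources. You can also restrict the result to what interests you through the arguments hosts, cohorts, and datasets.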

xe <- XenaHub()
xe
## class: XenaHub 
## hosts():
##   https://ucscpublic.xenahubs.net
##   https://tcga.xenahubs.net
##   https://gdc.xenahubs.net
##   https://icgc.xenahubs.net
##   https://toil.xenahubs.net
## cohorts() (137 total):
##   (unassigned)
##   1000_genomes
##   Acute lymphoblastic leukemia (Mullighan 2008)
##   ...
##   TCGA Pan-Cancer (PANCAN)
##   TCGA TARGET GTEx
## datasets() (1521 total):
##   parsons2008cgh_public/parsons2008cgh_genomicMatrix
##   parsons2008cgh_public/parsons2008cgh_public_clinicalMatrix
##   vijver2002_public/vijver2002_genomicMatrix
##   ...
##   TCGA_survival_data
##   mc3.v0.2.8.PUBLIC.toil.xena
head(cohorts(xe))
## [1] "(unassigned)"                                 
## [2] "1000_genomes"                                 
## [3] "Acute lymphoblastic leukemia (Mullighan 2008)"
## [4] "B cells (Basso 2005)"                         
## [5] "Breast Cancer (Caldas 2007)"                  
## [6] "Breast Cancer (Chin 2006)"

The result is a XenaHub object.

To avoid typing out the host URLs for hosts, we can use hostName to restrict the search to TCGA, as follows:

XenaHub(hostName = "TCGA")
## class: XenaHub 
## hosts():
##   https://tcga.xenahubs.net
## cohorts() (39 total):
##   (unassigned)
##   TCGA Acute Myeloid Leukemia (LAML)
##   TCGA Adrenocortical Cancer (ACC)
##   ...
##   TCGA Thyroid Cancer (THCA)
##   TCGA Uterine Carcinosarcoma (UCS)
## datasets() (879 total):
##   TCGA.OV.sampleMap/HumanMethylation27
##   TCGA.OV.sampleMap/HumanMethylation450
##   TCGA.OV.sampleMap/Gistic2_CopyNumber_Gistic2_all_data_by_genes
##   ...
##   TCGA.MESO.sampleMap/MESO_clinicalMatrix
##   TCGA.MESO.sampleMap/Pathway_Paradigm_RNASeq_And_Copy_Number

The functions hosts(), cohorts(), datasets(), and samples() return the corresponding contents; each takes a XenaHub object as its argument.

hosts(xe)
## [1] "https://ucscpublic.xenahubs.net" "https://tcga.xenahubs.net"      
## [3] "https://gdc.xenahubs.net"        "https://icgc.xenahubs.net"      
## [5] "https://toil.xenahubs.net"
cohorts(xe)
##   [1] "(unassigned)"                                                         
##   [2] "1000_genomes"                                                         
##   [3] "Acute lymphoblastic leukemia (Mullighan 2008)"                        
##   [4] "B cells (Basso 2005)"                                                 
##   [5] "Breast Cancer (Caldas 2007)"                                          
##   [6] "Breast Cancer (Chin 2006)"                                            
##   [7] "Breast Cancer (Haverty 2008)"                                         
##   [8] "Breast Cancer (Hess 2006)"                                            
##   [9] "Breast Cancer (Miller 2005)"                                          
##  [10] "Breast Cancer (vantVeer 2002)"                                        
##  [11] "Breast Cancer (Vijver 2002)"                                          
##  [12] "Breast Cancer (Yau 2010)"                                             
##  [13] "Breast Cancer Cell Lines (Heiser 2012)"                               
##  [14] "Breast Cancer Cell Lines (Neve 2006)"                                 
##  [15] "Cancer Cell Line Encyclopedia (Breast)"                               
##  [16] "Cancer Cell Line Encyclopedia (CCLE)"                                 
##  [17] "Connectivity Map"                                                     
##  [18] "DIPG and Pediatric Non-Brainstem High-Grade Glioma (Wu 2014, St Jude)"
##  [19] "Ewing Sarcoma Family of Tumors (Brohl 2014)"                          
##  [20] "GBM (Parsons 2008)"                                                   
##  [21] "Glioma (Kotliarov 2006)"                                              
##  [22] "Inbred mouse (Cutler 2007)"                                           
##  [23] "Lung Adenocarcinoma (Ding 2008)"                                      
##  [24] "Lung Cancer (Raponi 2006)"                                            
##  [25] "Lung Cancer CGH (Weir 2007)"                                          
##  [26] "lymph-node-negative breast cancer (Wang 2005)"                        
##  [27] "MAGIC"                                                                
##  [28] "Melanoma (Lin 2008)"                                                  
##  [29] "Mouse and Human Colon Tumors (Kaiser 2007)"                           
##  [30] "Mouse pancreatic adenocarcinoma (Bardeesy 2006)"                      
##  [31] "Mouse Tumors (Maser 2007)"                                            
##  [32] "NCI60"                                                                
##  [33] "Neuroblastoma (Khan)"                                                 
##  [34] "Neuroblastoma (Sausen 2013)"                                          
##  [35] "Node-negative breast cancer (Desmedt 2007)"                           
##  [36] "Ovarian Cancer (Etemadmoghadam 2009)"                                 
##  [37] "Pancreatic Cancer (Balagurunathan 2008)"                              
##  [38] "Pancreatic Cancer (Harada 2008)"                                      
##  [39] "Pancreatic Cancer (Jones 2008)"                                       
##  [40] "Pediatric diffuse intrinsic pontine gliomas (Puget 2012)"             
##  [41] "Pediatric tumor (Khan)"                                               
##  [42] "POG TCGA TARGET_NBL"                                                  
##  [43] "Single-cell RNA-seq mouse cortex (Zeisel)"                            
##  [44] "St Jude PCGP pan-cancer"                                              
##  [45] "TARGET Acute Lymphoblastic Leukemia"                                  
##  [46] "TARGET neuroblastoma"                                                 
##  [47] "(unassigned)"                                                         
##  [48] "TCGA Acute Myeloid Leukemia (LAML)"                                   
##  [49] "TCGA Adrenocortical Cancer (ACC)"                                     
##  [50] "TCGA Bile Duct Cancer (CHOL)"                                         
##  [51] "TCGA Bladder Cancer (BLCA)"                                           
##  [52] "TCGA Breast Cancer (BRCA)"                                            
##  [53] "TCGA Cervical Cancer (CESC)"                                          
##  [54] "TCGA Colon and Rectal Cancer (COADREAD)"                              
##  [55] "TCGA Colon Cancer (COAD)"                                             
##  [56] "TCGA Endometrioid Cancer (UCEC)"                                      
##  [57] "TCGA Esophageal Cancer (ESCA)"                                        
##  [58] "TCGA Formalin Fixed Paraffin-Embedded Pilot Phase II (FPPP)"          
##  [59] "TCGA Glioblastoma (GBM)"                                              
##  [60] "TCGA Head and Neck Cancer (HNSC)"                                     
##  [61] "TCGA Kidney Chromophobe (KICH)"                                       
##  [62] "TCGA Kidney Clear Cell Carcinoma (KIRC)"                              
##  [63] "TCGA Kidney Papillary Cell Carcinoma (KIRP)"                          
##  [64] "TCGA Large B-cell Lymphoma (DLBC)"                                    
##  [65] "TCGA Liver Cancer (LIHC)"                                             
##  [66] "TCGA Lower Grade Glioma (LGG)"                                        
##  [67] "TCGA lower grade glioma and glioblastoma (GBMLGG)"                    
##  [68] "TCGA Lung Adenocarcinoma (LUAD)"                                      
##  [69] "TCGA Lung Cancer (LUNG)"                                              
##  [70] "TCGA Lung Squamous Cell Carcinoma (LUSC)"                             
##  [71] "TCGA Melanoma (SKCM)"                                                 
##  [72] "TCGA Mesothelioma (MESO)"                                             
##  [73] "TCGA Ocular melanomas (UVM)"                                          
##  [74] "TCGA Ovarian Cancer (OV)"                                             
##  [75] "TCGA Pan-Cancer (PANCAN)"                                             
##  [76] "TCGA Pancreatic Cancer (PAAD)"                                        
##  [77] "TCGA Pheochromocytoma & Paraganglioma (PCPG)"                         
##  [78] "TCGA Prostate Cancer (PRAD)"                                          
##  [79] "TCGA Rectal Cancer (READ)"                                            
##  [80] "TCGA Sarcoma (SARC)"                                                  
##  [81] "TCGA Stomach Cancer (STAD)"                                           
##  [82] "TCGA Testicular Cancer (TGCT)"                                        
##  [83] "TCGA Thymoma (THYM)"                                                  
##  [84] "TCGA Thyroid Cancer (THCA)"                                           
##  [85] "TCGA Uterine Carcinosarcoma (UCS)"                                    
##  [86] "(unassigned)"                                                         
##  [87] "GDC Pan-Cancer (PANCAN)"                                              
##  [88] "GDC TARGET-AML"                                                       
##  [89] "GDC TARGET-CCSK"                                                      
##  [90] "GDC TARGET-NBL"                                                       
##  [91] "GDC TARGET-OS"                                                        
##  [92] "GDC TARGET-RT"                                                        
##  [93] "GDC TARGET-WT"                                                        
##  [94] "GDC TCGA Acute Myeloid Leukemia (LAML)"                               
##  [95] "GDC TCGA Adrenocortical Cancer (ACC)"                                 
##  [96] "GDC TCGA Bile Duct Cancer (CHOL)"                                     
##  [97] "GDC TCGA Bladder Cancer (BLCA)"                                       
##  [98] "GDC TCGA Breast Cancer (BRCA)"                                        
##  [99] "GDC TCGA Cervical Cancer (CESC)"                                      
## [100] "GDC TCGA Colon Cancer (COAD)"                                         
## [101] "GDC TCGA Endometrioid Cancer (UCEC)"                                  
## [102] "GDC TCGA Esophageal Cancer (ESCA)"                                    
## [103] "GDC TCGA Glioblastoma (GBM)"                                          
## [104] "GDC TCGA Head and Neck Cancer (HNSC)"                                 
## [105] "GDC TCGA Kidney Chromophobe (KICH)"                                   
## [106] "GDC TCGA Kidney Clear Cell Carcinoma (KIRC)"                          
## [107] "GDC TCGA Kidney Papillary Cell Carcinoma (KIRP)"                      
## [108] "GDC TCGA Large B-cell Lymphoma (DLBC)"                                
## [109] "GDC TCGA Liver Cancer (LIHC)"                                         
## [110] "GDC TCGA Lower Grade Glioma (LGG)"                                    
## [111] "GDC TCGA Lung Adenocarcinoma (LUAD)"                                  
## [112] "GDC TCGA Lung Squamous Cell Carcinoma (LUSC)"                         
## [113] "GDC TCGA Melanoma (SKCM)"                                             
## [114] "GDC TCGA Mesothelioma (MESO)"                                         
## [115] "GDC TCGA Ocular melanomas (UVM)"                                      
## [116] "GDC TCGA Ovarian Cancer (OV)"                                         
## [117] "GDC TCGA Pancreatic Cancer (PAAD)"                                    
## [118] "GDC TCGA Pheochromocytoma & Paraganglioma (PCPG)"                     
## [119] "GDC TCGA Prostate Cancer (PRAD)"                                      
## [120] "GDC TCGA Rectal Cancer (READ)"                                        
## [121] "GDC TCGA Sarcoma (SARC)"                                              
## [122] "GDC TCGA Stomach Cancer (STAD)"                                       
## [123] "GDC TCGA Testicular Cancer (TGCT)"                                    
## [124] "GDC TCGA Thymoma (THYM)"                                              
## [125] "GDC TCGA Thyroid Cancer (THCA)"                                       
## [126] "GDC TCGA Uterine Carcinosarcoma (UCS)"                                
## [127] "(unassigned)"                                                         
## [128] "ICGC (donor centric)"                                                 
## [129] "ICGC (specimen centric)"                                              
## [130] "ICGC (US donors with both RNA and SNV data)"                          
## [131] "PACA-AU"                                                              
## [132] "(unassigned)"                                                         
## [133] "GTEX"                                                                 
## [134] "TARGET Pan-Cancer (PANCAN)"                                           
## [135] "TCGA and TARGET Pan-Cancer (PANCAN)"                                  
## [136] "TCGA Pan-Cancer (PANCAN)"                                             
## [137] "TCGA TARGET GTEx"
datasets(xe)[1:10]
##  [1] "parsons2008cgh_public/parsons2008cgh_genomicMatrix"                      
##  [2] "parsons2008cgh_public/parsons2008cgh_public_clinicalMatrix"              
##  [3] "vijver2002_public/vijver2002_genomicMatrix"                              
##  [4] "vijver2002_public/vijver2002_public_clinicalMatrix"                      
##  [5] "chin2006_public/chin2006Exp_genomicMatrix"                               
##  [6] "chin2006_public/ucsfChinCGH2006_genomicMatrix"                           
##  [7] "chin2006_public/chin2006_public_clinicalMatrix"                          
##  [8] "Treehouse/Treehouse_Khan_neuroblastoma/expression"                       
##  [9] "Treehouse/Treehouse_Khan_neuroblastoma/neuroblastoma_affy_clinicalMatrix"
## [10] "Treehouse/NBL_Sausen_et_al_2013_SNV.tsv"
# samples(xe)[1:10]
# For the usage of samples, see <https://github.com/DataGeeker/XenaR/blob/master/inst/README.Rmd>
# The output is too long to show here, and it is not the focus of this package

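Downloading and importing data

To let you download exactly the data you need, the package provides a three-step workflow: XenaQuery, XenaDownload, and XenaPrepare.

The following uses downloading and importing TCGA clinical data as an example; other data work similarly.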

filter

Look at the datasets of interest:

xe = XenaHub(hostName = "TCGA")
xe
## class: XenaHub 
## hosts():
##   https://tcga.xenahubs.net
## cohorts() (39 total):
##   (unassigned)
##   TCGA Acute Myeloid Leukemia (LAML)
##   TCGA Adrenocortical Cancer (ACC)
##   ...
##   TCGA Thyroid Cancer (THCA)
##   TCGA Uterine Carcinosarcoma (UCS)
## datasets() (879 total):
##   TCGA.OV.sampleMap/HumanMethylation27
##   TCGA.OV.sampleMap/HumanMethylation450
##   TCGA.OV.sampleMap/Gistic2_CopyNumber_Gistic2_all_data_by_genes
##   ...
##   TCGA.MESO.sampleMap/MESO_clinicalMatrix
##   TCGA.MESO.sampleMap/Pathway_Paradigm_RNASeq_And_Copy_Number

There are more than 800 datasets, which is too many. Use the filterXena() function to filter them. You can pass either a full name or a regular expression.

(filterXena(xe, filterDatasets = "clinical") -> xe2)
## class: XenaHub 
## hosts():
##   https://tcga.xenahubs.net
## cohorts() (39 total):
##   (unassigned)
##   TCGA Acute Myeloid Leukemia (LAML)
##   TCGA Adrenocortical Cancer (ACC)
##   ...
##   TCGA Thyroid Cancer (THCA)
##   TCGA Uterine Carcinosarcoma (UCS)
## datasets() (37 total):
##   TCGA.OV.sampleMap/OV_clinicalMatrix
##   TCGA.DLBC.sampleMap/DLBC_clinicalMatrix
##   TCGA.KIRC.sampleMap/KIRC_clinicalMatrix
##   ...
##   TCGA.READ.sampleMap/READ_clinicalMatrix
##   TCGA.MESO.sampleMap/MESO_clinicalMatrix

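That is a much shorter list. Note that the function's two arguments, filterCohorts and filterDatasets, are independent of each other, because the underlying XenaR has no mechanism for updating one when the other changes (the cohorts section above still reports 39 entries). I will look for another way to solve this later. Since we are focused on downloading and using datasets here, the cohorts can be ignored.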

datasets(xe2)
##  [1] "TCGA.OV.sampleMap/OV_clinicalMatrix"            
##  [2] "TCGA.DLBC.sampleMap/DLBC_clinicalMatrix"        
##  [3] "TCGA.KIRC.sampleMap/KIRC_clinicalMatrix"        
##  [4] "TCGA.SARC.sampleMap/SARC_clinicalMatrix"        
##  [5] "TCGA.COAD.sampleMap/COAD_clinicalMatrix"        
##  [6] "TCGA.PRAD.sampleMap/PRAD_clinicalMatrix"        
##  [7] "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix"        
##  [8] "TCGA.ACC.sampleMap/ACC_clinicalMatrix"          
##  [9] "TCGA.KICH.sampleMap/KICH_clinicalMatrix"        
## [10] "TCGA.UCS.sampleMap/UCS_clinicalMatrix"          
## [11] "TCGA.COADREAD.sampleMap/COADREAD_clinicalMatrix"
## [12] "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix"        
## [13] "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"        
## [14] "TCGA.FPPP.sampleMap/FPPP_clinicalMatrix"        
## [15] "TCGA.LAML.sampleMap/LAML_clinicalMatrix"        
## [16] "TCGA.GBM.sampleMap/GBM_clinicalMatrix"          
## [17] "TCGA.KIRP.sampleMap/KIRP_clinicalMatrix"        
## [18] "TCGA.PAAD.sampleMap/PAAD_clinicalMatrix"        
## [19] "TCGA.CHOL.sampleMap/CHOL_clinicalMatrix"        
## [20] "TCGA.CESC.sampleMap/CESC_clinicalMatrix"        
## [21] "TCGA.SKCM.sampleMap/SKCM_clinicalMatrix"        
## [22] "TCGA.LGG.sampleMap/LGG_clinicalMatrix"          
## [23] "TCGA.PCPG.sampleMap/PCPG_clinicalMatrix"        
## [24] "TCGA.TGCT.sampleMap/TGCT_clinicalMatrix"        
## [25] "TCGA.BLCA.sampleMap/BLCA_clinicalMatrix"        
## [26] "TCGA.THYM.sampleMap/THYM_clinicalMatrix"        
## [27] "TCGA.BRCA.sampleMap/BRCA_clinicalMatrix"        
## [28] "TCGA.UVM.sampleMap/UVM_clinicalMatrix"          
## [29] "TCGA.UCEC.sampleMap/UCEC_clinicalMatrix"        
## [30] "TCGA.LIHC.sampleMap/LIHC_clinicalMatrix"        
## [31] "TCGA.GBMLGG.sampleMap/GBMLGG_clinicalMatrix"    
## [32] "TCGA.THCA.sampleMap/THCA_clinicalMatrix"        
## [33] "TCGA.HNSC.sampleMap/HNSC_clinicalMatrix"        
## [34] "TCGA.ESCA.sampleMap/ESCA_clinicalMatrix"        
## [35] "TCGA.STAD.sampleMap/STAD_clinicalMatrix"        
## [36] "TCGA.READ.sampleMap/READ_clinicalMatrix"        
## [37] "TCGA.MESO.sampleMap/MESO_clinicalMatrix"

I only want the lung cancer datasets, so I add one more condition:

(filterXena(xe2, filterDatasets = "LUAD|LUSC|LUNG")) -> xe2

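If you know exactly what you want, you can chain the filters with the pipe operator provided by dplyr; otherwise, it is better to narrow the selection step by step.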

suppressMessages(require(dplyr))
## Warning: package 'dplyr' was built under R version 3.5.1
xe %>% 
    filterXena(filterDatasets = "clinical") %>% 
    filterXena(filterDatasets = "luad|lusc|lung")
## class: XenaHub 
## hosts():
##   https://tcga.xenahubs.net
## cohorts() (39 total):
##   (unassigned)
##   TCGA Acute Myeloid Leukemia (LAML)
##   TCGA Adrenocortical Cancer (ACC)
##   ...
##   TCGA Thyroid Cancer (THCA)
##   TCGA Uterine Carcinosarcoma (UCS)
## datasets() (3 total):
##   TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
##   TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
##   TCGA.LUAD.sampleMap/LUAD_clinicalMatrix

The filtered result is still a XenaHub object.
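
As a rough illustration of what chained regular-expression filtering does to a list of dataset names, here is a base R sketch. It does not call UCSCXenaTools at all: filter_names is a hypothetical helper, the names are a handful copied from the listings above, and the case-insensitive matching is an assumption inferred from the lowercase pattern working in the pipe example.

```r
# A few dataset names copied from the TCGA hub listing above
dataset_names <- c(
  "TCGA.OV.sampleMap/OV_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.OV.sampleMap/HumanMethylation27"
)

# Hypothetical helper (not part of UCSCXenaTools): keep the names that
# match a regular expression, ignoring case (an assumption, see above)
filter_names <- function(x, pattern) {
  x[grepl(pattern, x, ignore.case = TRUE)]
}

# Step 1: keep clinical matrices; step 2: keep lung-related ones
clinical <- filter_names(dataset_names, "clinical")
lung <- filter_names(clinical, "luad|lusc|lung")
print(lung)
```

Filtering in two steps gives the same names as a single combined pattern would, but inspecting the intermediate result makes it easier to spot a pattern that matches too much.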

query

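Next, we prepare to download the three selected datasets.

First build a query object. It is not yet wrapped in a class; it is simply a data frame that stores the host address, the dataset name, and the download URL.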

xe2_query = XenaQuery(xe2)

xe2_query
##                       hosts                                datasets
## 1 https://tcga.xenahubs.net TCGA.LUSC.sampleMap/LUSC_clinicalMatrix
## 2 https://tcga.xenahubs.net TCGA.LUNG.sampleMap/LUNG_clinicalMatrix
## 3 https://tcga.xenahubs.net TCGA.LUAD.sampleMap/LUAD_clinicalMatrix
##                                                                             url
## 1 https://tcga.xenahubs.net/download/TCGA.LUSC.sampleMap/LUSC_clinicalMatrix.gz
## 2 https://tcga.xenahubs.net/download/TCGA.LUNG.sampleMap/LUNG_clinicalMatrix.gz
## 3 https://tcga.xenahubs.net/download/TCGA.LUAD.sampleMap/LUAD_clinicalMatrix.gz
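
Judging from the printed query, each URL appears to be the host, followed by /download/, the dataset name, and a .gz suffix. The base R sketch below rebuilds a data frame of the same shape by hand. It only illustrates the layout of the table and is not the package's implementation; the variable names are my own.

```r
# Host and dataset names copied from the query printed above
hosts <- rep("https://tcga.xenahubs.net", 3)
datasets <- c(
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
)

# Assumed URL pattern, inferred from the output: <host>/download/<dataset>.gz
query_sketch <- data.frame(
  hosts = hosts,
  datasets = datasets,
  url = paste0(hosts, "/download/", datasets, ".gz"),
  stringsAsFactors = FALSE
)
print(query_sketch$url)
```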

download

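By default, the XenaDownload function saves data to the Xena_Data directory under the current working directory. If a file has already been downloaded, the function reports this and skips it; use force=TRUE to download it again. Some additional arguments are passed on to the download.file function.

Note that the function returns a value, which can be used to import the data later.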

xe2_download = XenaDownload(xe2_query, destdir = "E:/Github/XenaData/test/")
## We will download files to directory E:/Github/XenaData/test/.
## E:/Github/XenaData/test//TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz, the file has been download!
## E:/Github/XenaData/test//TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz, the file has been download!
## E:/Github/XenaData/test//TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz, the file has been download!
## Note fileNames transfromed from datasets name and / chracter all changed to __ character.
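
The file names in the messages above follow a simple rule: every / in the dataset name becomes __, and .gz is appended. The base R sketch below reproduces that naming together with a skip-unless-forced check. The force variable, the temporary directory, and the empty placeholder file are illustrative assumptions, and nothing is actually downloaded.

```r
datasets <- c(
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix"
)

# Map dataset names to local file names: "/" becomes "__", then add ".gz"
file_names <- paste0(gsub("/", "__", datasets, fixed = TRUE), ".gz")

# Use a fresh temporary directory so the sketch has no side effects
destdir <- file.path(tempdir(), "Xena_Data_demo")
unlink(destdir, recursive = TRUE)
dir.create(destdir, recursive = TRUE)
dest_files <- file.path(destdir, file_names)

# Pretend the first file was downloaded earlier (empty placeholder)
invisible(file.create(dest_files[1]))

# Only files that are missing, or all files when force is TRUE, are fetched
force <- FALSE
to_download <- dest_files[force | !file.exists(dest_files)]
print(basename(to_download))
```

Flattening the path into a single file name keeps everything in one directory, which is what lets a whole directory be imported in one call later.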

prepare

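Once the data are downloaded, they can be imported into R; under the hood this uses the read_tsv function from the readr package.

Four import modes are supported. When more than one file is imported, the result is a list:

  • a local directory (every file in the directory is imported)
  • one or more local files
  • URLs; for a small number of files you can import directly from the URL without downloading to disk first (not recommended)
  • the object returned by the XenaDownload function

Way 1: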

# way1:  directory
cli1 = XenaPrepare("E:/Github/XenaData/test/")
names(cli1)
## [1] "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz"
## [2] "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz"
## [3] "TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz"

Way 2:

# way2: local files
cli2 = XenaPrepare("E:/Github/XenaData/test/TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz")
class(cli2)
## [1] "tbl_df"     "tbl"        "data.frame"

cli2 = XenaPrepare(c("E:/Github/XenaData/test/TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz",
                     "E:/Github/XenaData/test/TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz"))
class(cli2)
## [1] "list"
names(cli2)
## [1] "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz"
## [2] "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz"

Way 3:

# way3: urls
cli3 = XenaPrepare(xe2_download$url[1:2])
names(cli3)
## [1] "LUSC_clinicalMatrix.gz" "LUNG_clinicalMatrix.gz"

Way 4:

# way4: xenadownload object
cli4 = XenaPrepare(xe2_download)
names(cli4)
## [1] "TCGA.LUSC.sampleMap__LUSC_clinicalMatrix.gz"
## [2] "TCGA.LUNG.sampleMap__LUNG_clinicalMatrix.gz"
## [3] "TCGA.LUAD.sampleMap__LUAD_clinicalMatrix.gz"
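
To make the "one file gives a table, several files give a named list" behaviour concrete, here is a self-contained sketch. It uses base R's read.delim rather than readr::read_tsv to avoid extra dependencies, writes two tiny made-up gzip files to a temporary directory, and prepare_files is a hypothetical stand-in, not XenaPrepare itself.

```r
demo_dir <- file.path(tempdir(), "xena_prepare_demo")
dir.create(demo_dir, showWarnings = FALSE, recursive = TRUE)

# Write a few tab-separated lines to a gzip-compressed file
write_demo <- function(path, lines) {
  con <- gzfile(path, "w")
  on.exit(close(con))
  writeLines(lines, con)
}

# Made-up miniature "clinical matrices" (not real Xena data)
paths <- file.path(demo_dir, c("a_clinicalMatrix.gz", "b_clinicalMatrix.gz"))
write_demo(paths[1], c("sampleID\tgender", "S1\tMALE", "S2\tFEMALE"))
write_demo(paths[2], c("sampleID\tgender", "S3\tFEMALE"))

# Hypothetical stand-in for the import step: read each file, name the
# results after the files, and unwrap the list when there is only one
prepare_files <- function(paths) {
  tables <- lapply(paths, function(p) {
    read.delim(gzfile(p), stringsAsFactors = FALSE)
  })
  names(tables) <- basename(paths)
  if (length(tables) == 1) tables[[1]] else tables
}

one <- prepare_files(paths[1])   # a single data frame
many <- prepare_files(paths)     # a named list of data frames
print(names(many))
```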

License

GPL-3

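Next steps

Finding datasets of interest and downloading them is the core of this package. Beyond fixing bugs, I plan to make it run faster, to support keeping hosts, cohorts, and datasets in sync when one of them is filtered, and to add tools for exploring and analyzing data after download.

You are welcome to use the package, follow it, star it, and ask questions.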
