Analyzing TCGA Data in R via the TCGAretriever package

A brief introduction to TCGA, cBioPortal and the Cancer Genomics Data Server (CGDS)

The Cancer Genome Atlas (TCGA) is a program aimed at improving our understanding of the molecular basis of cancer. TCGA stores multidimensional genomic data sets generated by the analysis of tumor tissues (as well as tumor-matched non-cancer tissues) from more than 10,000 patients. TCGA data are publicly available (only access to raw sequencing data is usually restricted) and can be used by oncologists and researchers worldwide.

A straightforward way to access TCGA data is via cBioPortal, an open-access online resource for “interactive exploration of multidimensional cancer genomic datasets” (including TCGA). cBioPortal is developed and maintained by the Computational Biology (cBio) Center at Memorial Sloan Kettering Cancer Center. It provides access to TCGA and other data sets via an interactive web interface (http://www.cbioportal.org) or via the Cancer Genomics Data Server (CGDS) interface.

TCGAretriever is an R package that queries CGDS to retrieve TCGA (as well as non-TCGA) genomic data sets. More information about TCGA (http://cancergenome.nih.gov/abouttcga/overview), cBioPortal (http://www.cbioportal.org/faq.jsp) and CGDS (http://www.cbioportal.org/web_api.jsp) is available online. cBio also maintains an official R package for accessing CGDS: cgdsr.

 

Installing TCGAretriever

There are two ways to install TCGAretriever: from CRAN or from GitHub. The latest official release of TCGAretriever can be installed from CRAN by typing the following in your R console:

install.packages("TCGAretriever")

Alternatively, the latest development (beta) release can be installed from GitHub via the “devtools” package. The source code of TCGAretriever is hosted on GitHub in the dami82/TCGAretriever repository.

library(devtools)

install_github("dami82/TCGAretriever")

 

Basic TCGAretriever usage (core functions)

get_cancer_types(): returns the list of all cancer types available on the server

get_cancer_studies(): returns the list of all cancer studies available on the server

get_genetic_profiles(): returns the types of data available for a cancer study of interest

get_case_lists(): given a cancer study of interest, returns which patients (cases) were tested by which type of analysis

expand_cases(): returns a list where each case_list_id is reported together with the individual case identifiers
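The core functions above can be chained to explore a single study. The snippet below is a minimal sketch, not an official workflow: explore_study() is a hypothetical helper (it is not part of TCGAretriever), "brca_tcga" is an assumed study identifier, and the study identifier is passed positionally because argument names and return formats may differ between package versions. The queries need the package to be installed and the server to be reachable.

```r
# Hypothetical helper (not part of TCGAretriever): gathers the metadata
# returned by the core functions for one cancer study identifier (csid).
explore_study <- function(csid) {
  if (!requireNamespace("TCGAretriever", quietly = TRUE)) {
    stop("Please install TCGAretriever first.")
  }
  list(
    # Types of data available for the study
    profiles   = TCGAretriever::get_genetic_profiles(csid),
    # Which cases were tested by which type of analysis
    case_lists = TCGAretriever::get_case_lists(csid),
    # Each case_list_id together with the individual case identifiers
    cases      = TCGAretriever::expand_cases(csid)
  )
}

# Run the queries only in an interactive session, since they need network access.
if (interactive()) {
  types   <- TCGAretriever::get_cancer_types()
  studies <- TCGAretriever::get_cancer_studies()
  print(head(studies))

  # "brca_tcga" is an assumption; pick an identifier listed in `studies`.
  info <- explore_study("brca_tcga")
  str(info, max.level = 1)
}
```

Starting from get_cancer_studies() and only then querying a specific study mirrors the order of the function list: the study identifier returned by the server is the input required by the study-level functions.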

 



The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer. The TCGA dataset comprises 2.5 petabytes of data. These data have contributed to more than a thousand studies of cancer by independent researchers and to the TCGA research network publications.

 

http://www.cbioportal.org/faq.jsp

http://www.cbioportal.org/data_sets.jsp

 

About Author

Damiano
Postdoc Research Fellow at Northwestern University (Chicago)
