Publication

Whole-genome sequencing of nine esophageal adenocarcinoma cell lines.

F1000Research | 2016

Esophageal adenocarcinoma (EAC) is highly mutated and molecularly heterogeneous. The number of cell lines available for study is limited and their genome has been only partially characterized. The availability of an accurate annotation of their mutational landscape is crucial for accurate experimental design and correct interpretation of genotype-phenotype findings. We performed high-coverage, paired-end whole-genome sequencing on eight EAC cell lines (ESO26, ESO51, FLO-1, JH-EsoAd1, OACM5.1 C, OACP4 C, OE33, and SK-GT-4), all verified against original patient material, and one esophageal high-grade dysplasia cell line, CP-D. We have made available the aligned sequence data and report single nucleotide variants (SNVs), small insertions and deletions (indels), and copy number alterations, identified by comparison with the human reference genome and known single nucleotide polymorphisms (SNPs). We compare these putative mutations to mutations found in primary tissue EAC samples, to inform the use of these cell lines as a model of EAC.

PubMed ID: 27594985

Research resources used in this publication

None found

Additional research tools detected in this publication

Antibodies used in this publication

None found

Associated grants

None

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.

European Nucleotide Archive (ENA) (tool)

RRID:SCR_006515

Public archive providing a comprehensive record of the world's nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation. All submitted data, once public, will be exchanged with the NCBI and DDBJ as part of the INSDC data exchange agreement. The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources, including submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centers, and routine and comprehensive exchange with its partners in the International Nucleotide Sequence Database Collaboration (INSDC). Provision of nucleotide sequence data to ENA or its INSDC partners has become a central and mandatory step in the dissemination of research findings to the scientific community. ENA works with publishers of scientific literature and funding bodies to ensure compliance with these principles and to provide optimal submission systems and data access tools that work seamlessly with the published literature. ENA is made up of a number of distinct databases, including the EMBL Nucleotide Sequence Database (EMBL-Bank), the newly established Sequence Read Archive (SRA) and the Trace Archive. The main tool for downloading ENA data is the ENA Browser, which is available through REST URLs for easy programmatic use.
All ENA data are available through the ENA Browser. Note: EMBL-Bank is entirely included within this resource.

Variant Effect Predictor (tool)

RRID:SCR_007931

Data analysis service to predict the functional consequences of known and unknown variants.

ATCC (tool)

RRID:SCR_001672

Global nonprofit biological resource center (BRC) and research organization that provides biological products, technical services and educational programs to private industry, government and academic organizations. Its mission is to acquire, authenticate, preserve, develop and distribute biological materials, information, technology, intellectual property and standards for the advancement and application of scientific knowledge. The primary purpose of ATCC is to use its resources and experience as a BRC to become the world leader in standard biological reference materials management, intellectual property resource management and translational research as applied to biomaterial development, standardization and certification. ATCC characterizes cell lines, bacteria, viruses, fungi and protozoa, as well as develops and evaluates assays and techniques for validating research resources and preserving and distributing biological materials to the public and private sector research communities.

GATK (tool)

RRID:SCR_001876

A software package to analyze next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. GATK is, first, a software library that makes it very easy to write efficient analysis tools for next-generation sequencing data and, second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller and a local realigner. (entry from Genetic Analysis Software)

FastQC (tool)

RRID:SCR_014583

Quality control software that performs checks on raw sequence data coming from high-throughput sequencing pipelines. This software also provides a modular set of analyses which can give a quick impression of the quality of the data prior to further analysis.

VARIANT (tool)

RRID:SCR_005194

Analysis tool that can report the functional properties of any variant in all the human, mouse or rat genes (and soon new model organisms will be added) and the corresponding neighborhoods. Other non-coding extra-genic regions, such as miRNAs, are also included in the analysis. It not only reports the obvious functional effects in the coding regions but also analyzes noncoding SNVs, situated both within the gene and in the neighborhood, that could affect different regulatory motifs, splicing signals, and other structural elements. These include Jaspar regulatory motifs, miRNA targets, splice sites, exonic splicing silencers, calculations of selective pressures on the particular polymorphic positions, etc. Software analysis pipelines used in the analysis of NGS data are highly modular, heterogeneous, and rapidly evolving. VARIANT can easily be incorporated into an NGS resequencing pipeline, either as a CLI or invoked as a web service. It takes input directly from the most widely used programs for SNV detection.

Picard (tool)

RRID:SCR_006525

Java toolset for working with next generation sequencing data in the BAM format.

1000 Genomes Project and AWS (tool)

RRID:SCR_008801

A dataset containing the full genomic sequence of 1,700 individuals, freely available for research use. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalogue of human genetic variation. The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, that researchers can now access on AWS free of charge for use in disease research. The dataset is available to all via Amazon S3 (http://s3.amazonaws.com/1000genomes). The 1000 Genomes Project aims to include the genomes of more than 2,662 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the data collection this year. Public Data Sets on AWS provide a centralized repository of public data hosted on Amazon Simple Storage Service (Amazon S3). The data can be seamlessly accessed from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), which provide organizations with the highly scalable compute resources needed to take advantage of these large data collections. AWS is storing the public data sets at no charge to the community. Researchers pay only for the additional AWS resources they need for further processing or analysis of the data. All 200 TB of the latest 1000 Genomes Project data are available in a public Amazon S3 bucket. You can access the data via simple HTTP requests, or take advantage of the AWS SDKs in languages such as Ruby, Java, Python, .NET and PHP. Researchers can use the Amazon EC2 utility computing service to dive into this data without the usual capital investment required to work with data at this scale.
AWS also provides a number of orchestration and automation services to help teams make their research available to others to remix and reuse. Making the data available via a bucket in Amazon S3 also means that customers can crunch the information using Hadoop via Amazon Elastic MapReduce, and take advantage of the growing collection of tools for running bioinformatics job flows, such as CloudBurst and Crossbow.
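
As a rough illustration of the "simple HTTP requests" mentioned in the description above, the following Python sketch builds URLs against the bucket address given there. It is only a sketch under assumptions: the helper names (object_url, listing_url, fetch_listing) and the example keys are hypothetical, the query parameters are the standard Amazon S3 bucket-listing parameters (prefix, max-keys), and it assumes the bucket still allows anonymous access at that address, which may no longer be the case.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Bucket address as given in the resource description above.
BUCKET_URL = "http://s3.amazonaws.com/1000genomes"


def object_url(key: str) -> str:
    """Build the plain HTTP URL for one object key (the key itself is hypothetical)."""
    return f"{BUCKET_URL}/{quote(key.lstrip('/'))}"


def listing_url(prefix: str = "", max_keys: int = 10) -> str:
    """Build a bucket-listing URL using the standard S3 'prefix' and 'max-keys' parameters."""
    query = urlencode({"prefix": prefix, "max-keys": max_keys})
    return f"{BUCKET_URL}/?{query}"


def fetch_listing(prefix: str = "", max_keys: int = 10, timeout: float = 30.0) -> str:
    """Fetch the XML listing as text. Requires network access and anonymous
    read permission on the bucket (an assumption, not verified here)."""
    with urlopen(listing_url(prefix, max_keys), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only the URLs are printed here; call fetch_listing() to hit the network.
    print(object_url("some/dir/example.txt"))
    print(listing_url("some/prefix/", 5))
```

For larger transfers or authenticated workflows, one of the AWS SDKs mentioned in the description would likely be a better fit than raw HTTP requests.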

Control-FREEC (tool)

RRID:SCR_010822

Prediction of copy number alterations and loss of heterozygosity using deep-sequencing data.
