Whole genome and exome sequencing reference datasets from a multi-center and cross-platform benchmark study.

Scientific data | 2021

With the rapid advancement of sequencing technologies, next generation sequencing (NGS) analysis has been widely applied in cancer genomics research. More recently, NGS has been adopted in clinical oncology to advance personalized medicine. Clinical applications of precision oncology require accurate tests that can distinguish tumor-specific mutations from artifacts introduced during NGS processes or data analysis. Therefore, there is an urgent need to develop best practices in cancer mutation detection using NGS, as well as standard reference data sets for systematically measuring accuracy and reproducibility across platforms and methods. Within the SEQC2 consortium context, we established paired tumor-normal reference samples and generated whole-genome (WGS) and whole-exome sequencing (WES) data using sixteen library protocols and seven sequencing platforms at six different centers. We systematically interrogated somatic mutations in the reference samples to identify factors affecting detection reproducibility and accuracy in cancer genomes. These large cross-platform/site WGS and WES datasets using well-characterized reference samples will represent a powerful resource for benchmarking NGS technologies and bioinformatics pipelines, and for cancer genomics studies.

PubMed ID: 34753956

Associated grants

  • Agency: NCI NIH HHS, United States
    Id: HHSN261201500003C
  • Agency: NIH HHS, United States
    Id: S10 OD019960
  • Agency: NCI NIH HHS, United States
    Id: HHSN261201500003I
  • Agency: NCI NIH HHS, United States
    Id: HHSN261201800001C

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


ATCC (tool)

RRID:SCR_001672

Global nonprofit biological resource center (BRC) and research organization that provides biological products, technical services and educational programs to private industry, government and academic organizations. Its mission is to acquire, authenticate, preserve, develop and distribute biological materials, information, technology, intellectual property and standards for the advancement and application of scientific knowledge. The primary purpose of ATCC is to use its resources and experience as a BRC to become the world leader in standard biological reference materials management, intellectual property resource management and translational research as applied to biomaterial development, standardization and certification. ATCC characterizes cell lines, bacteria, viruses, fungi and protozoa, and it develops and evaluates assays and techniques for validating research resources and for preserving and distributing biological materials to the public and private sector research communities.


GATK (tool)

RRID:SCR_001876

A software package to analyze next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. It is, first, a software library that makes it easy to write efficient analysis tools using next-generation sequencing data and, second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth of coverage analyzer, a quality score recalibrator, a SNP/indel caller and a local realigner. (entry from Genetic Analysis Software)


COSMIC - Catalogue Of Somatic Mutations In Cancer (tool)

RRID:SCR_002260

Database that stores and displays somatic mutation information and related details for human cancers. The mutation data and associated information are extracted from the primary literature. To provide a consistent view of the data, a histology and tissue ontology has been created and all mutations are mapped to a single version of each gene. The data can be queried by tissue, histology or gene and displayed as a graph, as a table or exported in various formats.
Some key features of COSMIC are:
* Contains information on publications, samples and mutations. Includes samples that were found to be negative for mutations during screening, which enables mutation frequencies to be calculated for different genes in different cancer types.
* Samples entered include benign neoplasms and other benign proliferations, in situ and invasive tumours, recurrences, metastases and cancer cell lines.
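
Why mutation-negative samples matter for the frequency calculation mentioned above can be shown with a minimal sketch. The function name and the sample counts are hypothetical and purely illustrative; they are not COSMIC's API or real COSMIC data.

```python
def mutation_frequency(mutated_samples: int, negative_samples: int) -> float:
    """Fraction of screened samples that carry a mutation in a given gene.

    Hypothetical helper: the denominator needs the samples that were
    screened and found negative, not only the mutated ones.
    """
    if mutated_samples < 0 or negative_samples < 0:
        raise ValueError("sample counts must be non-negative")
    screened = mutated_samples + negative_samples
    if screened == 0:
        raise ValueError("no screened samples, frequency is undefined")
    return mutated_samples / screened


# Illustrative counts only (not taken from COSMIC):
# 25 mutated samples and 75 screened-negative samples.
print(mutation_frequency(25, 75))
```

Without the negative samples the denominator would be unknown, so only a count of mutations could be reported, not a frequency.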


SomaticSniper (tool)

RRID:SCR_005108

Software program to identify single nucleotide positions that are different between tumor and normal (or, in theory, any two BAM files). It takes a tumor BAM and a normal BAM and compares the two to determine the differences. It outputs a file in a format very similar to Samtools consensus format. It uses the genotype likelihood model of MAQ (as implemented in Samtools) and then calculates the probability that the tumor and normal genotypes are different. This probability is reported as a somatic score. The somatic score is the Phred-scaled probability (between 0 and 255) that the tumor and normal genotypes are not different, where 0 means there is no probability that the genotypes are different and 255 means there is a probability of 1 - 10^(-255/10) that the genotypes are different between tumor and normal. This is consistent with how the SAM format reports such probabilities. It is currently available as source code via GitHub or as a Debian APT package.
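
The Phred scaling of the somatic score described above can be worked through with a short sketch. This is not SomaticSniper code; the function names are hypothetical, and the only facts taken from the description are the Phred scaling and the 0 to 255 range.

```python
import math

MAX_SCORE = 255  # upper bound of the somatic score, per the description above


def prob_same_genotype(score: int) -> float:
    """P(tumor and normal genotypes are NOT different) for a Phred-scaled score."""
    if not 0 <= score <= MAX_SCORE:
        raise ValueError("score must be between 0 and 255")
    return 10 ** (-score / 10)


def prob_different_genotype(score: int) -> float:
    """P(tumor and normal genotypes ARE different), i.e. 1 - 10^(-score/10)."""
    return 1.0 - prob_same_genotype(score)


def somatic_score(p_same: float) -> int:
    """Phred-scale P(genotypes not different), capped at MAX_SCORE."""
    if not 0.0 <= p_same <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_same == 0.0:
        return MAX_SCORE  # avoid log10(0); the score saturates at the cap
    return min(MAX_SCORE, int(round(-10 * math.log10(p_same))))


for s in (0, 10, 30, 255):
    print(s, prob_different_genotype(s))
```

A score of 0 corresponds to a probability of 0 that the genotypes differ, and each additional 10 points reduces the probability that they are the same by a factor of ten.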


Picard (tool)

RRID:SCR_006525

Java toolset for working with next generation sequencing data in the BAM format.


Trimmomatic (tool)

RRID:SCR_011848

Java software pipeline for trimming Illumina paired-end and single-end data. A flexible, pair-aware preprocessing tool optimized for Illumina next generation sequencing data, it includes several processing steps for read trimming and filtering. Operating systems: Unix/Linux, Mac OS, Windows.


Agilent Technologies (tool)

RRID:SCR_013575

Company that provides laboratories worldwide with analytical instruments and supplies, clinical and diagnostic testing services, consumables, applications and expertise in life sciences and applied chemical markets.


Bamtools (tool)

RRID:SCR_015987

Software that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating genome sequence alignment files in the BAM and SAM formats. It is used for research analysis and management of data produced by sequencing technologies.


HCC1395 (tool)

RRID:CVCL_1249

HCC1395 is a cancer cell line whose species of origin is Homo sapiens (human).
