U87MG decoded: the genomic sequence of a cytogenetically aberrant human cancer cell line.

PLoS genetics | Jan 29, 2010

U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30x genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.
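The tallies in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not part of the publication. It uses only the counts quoted above, and the variable names are illustrative. It assumes each homozygously mutated gene is counted in exactly one variant class, which fits the reported total but is not stated explicitly. The detected-SNP count is an approximation derived from the rounded 93.83% figure.

```python
# Counts of homozygously mutated genes per variant class, as quoted in the abstract.
homozygous_by_class = {
    "SNVs": 154,
    "small indels": 178,
    "large microdeletions": 145,
    "interchromosomal translocations": 35,
}
reported_total = 512  # total stated in the abstract

# Assumption: the classes are disjoint, so the per-class counts should sum to the total.
total = sum(homozygous_by_class.values())

# Share of each class, in percent of the summed total.
shares = {name: round(100 * n / total, 1) for name, n in homozygous_by_class.items()}

# Approximate number of array SNPs detected, from the rounded percentage in the abstract.
array_snps = 219_187
detected_fraction = 0.9383
approx_detected = round(array_snps * detected_fraction)

print("per-class counts match reported total:", total == reported_total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
print("approximate SNPs detected:", approx_detected)
```

Under these assumptions, small indels make up the largest share of the homozygously mutated genes and translocations the smallest, in line with the abstract's statement that coding sequences were disrupted mainly by indels, large deletions, and translocations.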

PubMed ID: 20126413

MeSH terms: Cell Line, Tumor | Genome, Human | Genotype | Glioma | Humans | Molecular Sequence Data | Mutation | Polymorphism, Single Nucleotide | Proteins | Sequence Analysis, DNA

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


Ensembl

A collection of genome databases for vertebrates and other eukaryotic species with DNA and protein sequence search capabilities. The goal of Ensembl is to automatically annotate the genome, integrate this annotation with other available biological data and make the data publicly available via the web. The range of available data has also expanded to include comparative genomics, variation and regulatory data. Ensembl allows users to: upload and analyze data and save it to an Ensembl account; search for a DNA or protein sequence using BLAST or BLAT; fetch desired data from the public database, using the Perl API; download the databases via FTP in FASTA, MySQL and other formats; and mine Ensembl with BioMart and export sequences or tables in text, HTML, or Excel format. The DNA sequences and assemblies used in the Ensembl genebuild are provided by various projects around the world. Ensembl has entered into an agreement with UCSC and NCBI with regard to sequence identifiers in order to improve consistency between the data provided by different genome browsers. The site also links to the Ensembl blog with updates on new species and sequences as they are added to the database.

tool


RefSeq

Database that provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. It provides a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis (especially RefSeqGene records), expression studies, and comparative analyses. Included are sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes. Each RefSeq is constructed wholly from sequence data submitted to the International Nucleotide Sequence Database Collaboration (INSDC). It is a unique resource because it provides a large, multi-species, curated sequence database representing separate but explicitly linked records from genomes to transcripts and translation products, as appropriate. Unlike the sequence redundancy found in the public sequence repositories that comprise the INSDC (i.e., NCBI's GenBank, the European Nucleotide Archive, and the DNA Data Bank of Japan), the RefSeq collection aims to provide, for each included species, a complete set of non-redundant, extensively cross-linked, and richly annotated nucleic acid and protein records. It is recognized, however, that the coverage and finishing of public sequence data varies from organism to organism, so intermediate genomic records are provided in some circumstances. The RefSeq collection is available without restriction and can be retrieved in several different ways, such as by searching or by available links in NCBI resources, including PubMed, Nucleotide, Protein, Gene, and Map Viewer, searching with a sequence via BLAST, and downloading from the RefSeq FTP site.

tool


UCSC Genome Browser

A collection of genomes which include reference sequences and working draft assemblies, as well as a variety of tools to explore these sequences. The Genome Browser zooms and scrolls over chromosomes, showing the work of annotators worldwide. The Gene Sorter shows expression, homology and other information on groups of genes that can be related in many ways. Blat quickly maps your sequence to the genome. The Table Browser provides access to the underlying database. VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns. Genome Graphs allows you to upload and display genome-wide data sets. Also provided is a portal to the Encyclopedia of DNA Elements (ENCODE) and Neandertal projects.

tool


YanHuang Project

This database presents the first diploid genome sequence of a Han Chinese individual, a representative of the Asian population. The genome, named YH, marks the start of the YanHuang Project, which aims to sequence 100 Chinese individuals in 3 years. It was assembled from 3.3 billion reads (117.7 Gbp of raw data) generated by the Illumina Genome Analyzer. In total, 102.9 Gbp of nucleotides were mapped onto the NCBI human reference genome (Build 36) using the self-developed software SOAP (Short Oligonucleotide Alignment Program), and 3.07 million SNPs were identified. The personal genome data is displayed in MapView, which is powered by GBrowse. A new module was developed for browsing large-scale short-read alignments; it lets users track detailed divergences between the consensus and the sequencing reads. A total of 53,643 HGMD records were used to screen YH SNPs for phenotype-related information, giving a first-pass interpretation of the donor's genome. A BLAST service for aligning query sequences against the YH genome consensus is also provided.

tool


Consensus CDS

Database (anonymous FTP) resulting from a collaborative effort to identify a core set of human and mouse protein-coding regions that are consistently annotated and of high quality. The long-term goal is to support convergence towards a standard set of gene annotations. The collaborators are EBI, NCBI, UCSC, and WTSI, and the initial results are also available from the participants' genome browser websites. In addition, CCDS identifiers are indicated on the relevant NCBI RefSeq and Entrez Gene records and in Map Viewer displays of RNA (RefSeq) and Gene annotations on the reference assembly.

tool


Mammalian Gene Collection

A trans-NIH initiative providing full-length open reading frame (FL-ORF) clones for human, mouse, and rat genes. In 2005, the project added the cow cDNAs generated by Genome Canada. MGC cDNA clones were obtained by screening of cDNA libraries, by transcript-specific RT-PCR cloning, and by DNA synthesis of cDNA inserts. All MGC sequences are deposited in GenBank, and the clones can be purchased from distributors of the IMAGE consortium. The document "A Guide to Finding Mammalian Gene Collection (MGC) Clones and Evaluating Their Sequence" can help determine whether MGC cDNA clones for human, mouse, or rat genes and transcripts of interest are available for purchase or sequence investigation. The ORFeome Collaboration (OC) was formed to provide the research community with sequence-validated, full-ORF human cDNA clones in the Gateway vector format.

* Total MGC full ORF clones: Human 29,818; Mouse 27,285; Rat 6,763; Bovine 9,104
* Non-redundant genes: Human 17,592; Mouse 17,701; Rat 6,486; Bovine 8,724

With the conclusion of the MGC project in March 2009, the GenBank records of MGC sequences will be frozen, without further updates. Since the definition of what constitutes a full-length coding region may change in the future for some of the genes and transcripts that have MGC clones, users planning to order MGC clones will need to monitor for these changes. Users can make use of genome browsers and gene-specific databases, such as the UCSC Genome Browser, NCBI's Map Viewer, and Entrez Gene, to view the relevant regions of the genome (browsers) or gene-related information (Entrez Gene).

tool


VEGA

Central repository for high-quality, frequently updated manual annotation of finished vertebrate genome sequence. Human, mouse, and zebrafish are in the process of being completely annotated, whereas for other species the annotation covers only specific genomic regions of particular biological interest. The majority of the annotation is from the HAVANA group at the Wellcome Trust Sanger Institute. Users can BLAST, search for specific text, export, and download data. Genomes and details of the projects for each species are available through the homepages for human, mouse, and zebrafish. The website is built upon code from the Ensembl (http://www.ensembl.org) project. Some Ensembl features are not available in Vega; from the user's point of view, perhaps the most significant of these is MartView. However, because Vega human and mouse data are included in Ensembl, they can be queried using Ensembl MartView. Vega contains annotation of the human MHC region in eight haplotypes and the LRC region in three haplotypes. Vega also contains annotation of the Insulin Dependent Diabetes (IDD) regions on non-reference assemblies for mouse.

tool


Pompep

FTP site to access Schizosaccharomyces pombe protein data.

tool
