Rare variant phasing and haplotypic expression from RNA sequencing with phASER.

Nature Communications | 2016

Haplotype phasing of genetic variants is important for clinical interpretation of the genome, population genetic analysis and functional genomic analysis of allelic activity. Here we present phASER, an accurate approach for phasing variants that are overlapped by sequencing reads, including those from RNA sequencing (RNA-seq), which often span multiple exons due to splicing. Using diverse RNA-seq data we demonstrate that this provides more accurate phasing of rare variants compared with population-based phasing and allows phasing of variants in the same gene up to hundreds of kilobases away that cannot be obtained from DNA sequencing (DNA-seq) reads. We show that in the context of medical genetic studies this improves the resolution of compound heterozygotes. Additionally, phASER provides measures of haplotypic expression that increase power and accuracy in studies of allelic expression. In summary, phasing using RNA-seq and phASER is accurate and improves studies where rare variant haplotypes or allelic expression is needed.
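The read-backed idea in the abstract, phasing variants that are overlapped by the same sequencing reads, can be illustrated with a minimal sketch. This is not phASER's actual algorithm or API: the function name `phase_pair`, the read representation (a dict from variant ID to the observed allele) and the simple majority vote are all assumptions made for illustration. A "trans" result for two damaging variants in one gene corresponds to the compound heterozygote case mentioned above.

```python
from collections import Counter


def phase_pair(reads, var_a, var_b):
    """Vote on the phase of two heterozygous variants from reads covering both.

    reads: list of dicts mapping a variant ID to the allele seen on that
           read, either "ref" or "alt" (hypothetical representation).
    Returns (phase, cis_count, trans_count); phase is "cis", "trans",
    or None when the evidence is tied or absent.
    """
    counts = Counter()
    for read in reads:
        # Only reads that overlap both variants are informative.
        if var_a in read and var_b in read:
            counts[(read[var_a], read[var_b])] += 1

    # cis: both alt alleles on the same haplotype; trans: on opposite ones.
    cis = counts[("ref", "ref")] + counts[("alt", "alt")]
    trans = counts[("ref", "alt")] + counts[("alt", "ref")]

    if cis == trans:
        return None, cis, trans
    return ("cis" if cis > trans else "trans"), cis, trans


# Toy data: four reads span both variants, one read covers only v1.
example_reads = [
    {"v1": "alt", "v2": "ref"},
    {"v1": "ref", "v2": "alt"},
    {"v1": "alt", "v2": "ref"},
    {"v1": "alt", "v2": "alt"},  # a discordant read, e.g. a sequencing error
    {"v1": "alt"},
]
result = phase_pair(example_reads, "v1", "v2")
print(result)
```

A real tool would also weigh base qualities, mapping quality and conflicting evidence across more than two variants; the majority vote here only shows why reads spanning several exons carry phase information.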

PubMed ID: 27605262

Associated grants

  • Agency: NIDA NIH HHS, United States
    Id: R01 DA006227
  • Agency: NICHD NIH HHS, United States
    Id: R01 HD057036
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH101782
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH101810
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH101819
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH090936
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH090951
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH101820
  • Agency: NIDDK NIH HHS, United States
    Id: P30 DK026687
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH101822
  • Agency: NCRR NIH HHS, United States
    Id: UL1 RR024156
  • Agency: NIDA NIH HHS, United States
    Id: R01 DA033684
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH106842
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH101825
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH090948
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH090941
  • Agency: NIGMS NIH HHS, United States
    Id: R01 GM122924
  • Agency: CCR NIH HHS, United States
    Id: HHSN261200800001C
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH090937
  • Agency: NHLBI NIH HHS, United States
    Id: HHSN268201000029C
  • Agency: NCI NIH HHS, United States
    Id: HHSN261200800001E
  • Agency: NIMH NIH HHS, United States
    Id: R01 MH101814

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


1000 Genomes: A Deep Catalog of Human Genetic Variation (tool)

RRID:SCR_006828

International collaboration producing an extensive public catalog of human genetic variation, including SNPs and structural variants, and their haplotype contexts, in an effort to provide a foundation for investigating the relationship between genotype and phenotype. The genomes of about 2500 unidentified people from about 25 populations around the world were sequenced using next-generation sequencing technologies. Redundant sequencing on various platforms and by different groups of scientists of the same samples can be compared. The results of the study are freely and publicly accessible to researchers worldwide. The consortium identified the following populations whose DNA will be sequenced: Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from northern and western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States. The goal of the Project is to find most genetic variants that have frequencies of at least 1% in the populations studied. Sequencing is still too expensive to deeply sequence the many samples being studied for this project. However, any particular region of the genome generally contains a limited number of haplotypes, so data can be combined across many samples to allow efficient detection of most of the variants in a region. The Project currently plans to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes for each sample that were not seen directly by the light sequencing.
All samples from the 1000 Genomes Project are available as lymphoblastoid cell lines (LCLs) and LCL-derived DNA from the Coriell Cell Repository as part of the NHGRI Catalog. The sequence and alignment data generated by the 1000 Genomes Project are made available as quickly as possible via its mirrored FTP sites: ftp://ftp.1000genomes.ebi.ac.uk and ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes

NumPy (tool)

RRID:SCR_008633

NumPy is the fundamental package needed for scientific computing with Python. It contains, among other things:

  • a powerful N-dimensional array object
  • sophisticated (broadcasting) functions
  • tools for integrating C/C++ and Fortran code
  • useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined, which allows NumPy to integrate seamlessly and speedily with a wide variety of databases. Sponsored by Enthought.

GATK (tool)

RRID:SCR_001876

A software package to analyze next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. It is, first, a software library that makes it easy to write efficient analysis tools for next-generation sequencing data and, second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth-of-coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner. (entry from Genetic Analysis Software)

GitHub (tool)

RRID:SCR_002630

A web-based hosting service for software development projects that use the Git revision control system, offering powerful collaboration, code review, and code management. It offers both paid plans for private repositories and free accounts for open source projects. Large or small, every repository comes with the same powerful tools. These tools are open to the community for public projects and secure for private projects. Features include:

  • Integrated issue tracking
  • Collaborative code review
  • Easily manage teams within organizations
  • Text entry with understated power
  • A growing list of programming languages and data formats
  • On the desktop and in your pocket: Android app and mobile web views let you keep track of your projects on the go

Systems Transcriptional Activity Reconstruction (tool)

RRID:SCR_005622

A next-generation web-based application that aims to provide an integrated solution for both visualization and analysis of deep-sequencing data, along with simple access to public datasets.

BEDTools (tool)

RRID:SCR_006646

A powerful toolset for genome arithmetic allowing one to address common genomics tasks such as finding feature overlaps and computing coverage. Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
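The "intersect two interval files" task mentioned above can be sketched in plain Python. This is not how BEDTools is implemented and does not call it; the function name `intersect` and the tuple layout are assumptions for illustration. Intervals follow the BED convention of 0-based, half-open coordinates, and the quadratic loop is chosen for clarity rather than speed.

```python
def intersect(intervals_a, intervals_b):
    """Return the overlapping segments between two interval lists.

    Each interval is a (chrom, start, end) tuple with a 0-based start and
    an exclusive end, as in BED files. Hypothetical helper, O(len(a) * len(b)).
    """
    overlaps = []
    for chrom_a, start_a, end_a in intervals_a:
        for chrom_b, start_b, end_b in intervals_b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # a non-empty half-open overlap
                overlaps.append((chrom_a, start, end))
    return overlaps


# Toy example: one exon-like interval against three features.
exons = [("chr1", 100, 200)]
features = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 150, 250)]
shared = intersect(exons, features)
print(shared)
```

The second feature starts exactly where the first interval ends, so under half-open coordinates it does not count as an overlap.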

SciPy (tool)

RRID:SCR_008058

A Python-based environment of open-source software for mathematics, science, and engineering. The core packages of SciPy include: NumPy, a base N-dimensional array package; SciPy Library, a fundamental library for scientific computing; and IPython, an enhanced interactive console.

HapCUT (tool)

RRID:SCR_010791

A max-cut based algorithm for haplotype assembly using sequence reads from the two chromosomes of an individual.

Phaser (tool)

RRID:SCR_014219

Crystallographic software which solves structures using algorithms and automated rapid search calculations to perform molecular replacement and experimental phasing methods.
