Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference.

Nature communications | 2021

The complete human genome sequence is used as a reference for next-generation sequencing analyses. However, some ethnic ancestries are under-represented in the reference genome (e.g., GRCh37) due to its bias toward European and African ancestries. Here, we perform de novo assembly of three Japanese male genomes using > 100× Pacific Biosciences long reads and Bionano Genomics optical maps per sample. We integrate the genomes using the major allele for consensus and anchor the scaffolds using genetic and radiation hybrid maps to reconstruct each chromosome. The resulting genome sequence, JG1, is contiguous, accurate, and carries the Japanese major allele at most loci. We adopt JG1 as the reference for confirmatory exome re-analyses of seven rare-disease Japanese families and find that re-analysis using JG1 reduces total candidate variant calls versus GRCh37 while retaining disease-causing variants. These results suggest that integrating multiple genomes from a single population can aid genome analyses of that population.
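The abstract states that the three assemblies were integrated "using the major allele for consensus" but does not describe the procedure in detail. As a rough illustration only, the idea of a per-position majority vote can be sketched as follows. This toy example assumes the sequences are already aligned to equal length, and the function name `major_allele_consensus` and the tie-breaking rule are hypothetical; it is not the pipeline used to build JG1.

```python
from collections import Counter


def major_allele_consensus(aligned_seqs):
    """Return the per-position majority base across pre-aligned sequences.

    Toy sketch: assumes equal-length, already-aligned input strings.
    Ties are broken in favour of the earliest input sequence (an
    arbitrary choice made here for determinism).
    """
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        top_count = max(counts.values())
        # Pick the first base in input order that reaches the top count.
        winner = next(base for base in column if counts[base] == top_count)
        consensus.append(winner)
    return "".join(consensus)


if __name__ == "__main__":
    samples = ["ACGT", "ACGA", "TCGA"]
    print(major_allele_consensus(samples))
```

With three samples, any position where two of them agree takes the shared base, which is why an odd number of genomes makes a simple vote convenient.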

PubMed ID: 33431880

Publication data is provided by the National Library of Medicine ® and PubMed ®. Data is retrieved from PubMed ® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


PLINK (tool)

RRID:SCR_001757

Open source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. Used for analysis of genotype/phenotype data. Through integration with gPLINK and Haploview, there is some support for subsequent visualization, annotation and storage of results. PLINK 1.9 is the improved second generation of the software.


GATK (tool)

RRID:SCR_001876

A software package to analyze next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. First, it is a software library that makes it easy to write efficient analysis tools for next-generation sequencing data; second, it is a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth-of-coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner. (entry from Genetic Analysis Software)


SAMTOOLS (tool)

RRID:SCR_002105

The original SAMTOOLS package has been split into three separate repositories: Samtools, BCFtools and HTSlib. Samtools manipulates next-generation sequencing data and is used for reading, writing, editing, indexing and viewing nucleotide alignments in SAM, BAM and CRAM formats. BCFtools is used for reading and writing BCF2, VCF and gVCF files and for calling, filtering and summarising SNP and short indel sequence variants. HTSlib is used for reading and writing high-throughput sequencing data.


Eigensoft (tool)

RRID:SCR_004965

EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes. Source code, documentation and executables for using EIGENSOFT 3.0 on a Linux platform can be downloaded. New features of EIGENSOFT 3.0 include supporting either 32-bit or 64-bit Linux machines, a utility to merge different data sets, a utility to identify related samples (accounting for population structure), and supporting multiple file formats for EIGENSTRAT stratification correction.


SnpEff (tool)

RRID:SCR_005191

Genetic variant annotation and effect prediction software toolbox that annotates and predicts effects of variants on genes (such as amino acid changes). By using standards, such as VCF, SnpEff makes it easy to integrate with other programs.


Picard (tool)

RRID:SCR_006525

Java toolset for working with next generation sequencing data in the BAM format.


BEDTools (tool)

RRID:SCR_006646

A powerful toolset for genome arithmetic allowing one to address common genomics tasks such as finding feature overlaps and computing coverage. Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
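To make the "intersect two interval files" operation concrete, here is a minimal Python sketch of the underlying idea. It assumes BED-style half-open coordinates given as `(chrom, start, end)` tuples; the function name `intersect_intervals` is hypothetical, and the quadratic loop is for illustration only, not a reflection of how bedtools itself is implemented.

```python
def intersect_intervals(a, b):
    """Return the overlapping segments between two interval lists.

    Each interval is a (chrom, start, end) tuple using BED-style
    0-based, half-open coordinates. Illustrative O(len(a) * len(b))
    comparison; real tools use sorted sweeps or index structures.
    """
    overlaps = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            # Half-open intervals overlap only if at least one base is shared.
            if start < end:
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


if __name__ == "__main__":
    features = [("chr1", 100, 200), ("chr1", 300, 400)]
    targets = [("chr1", 150, 350), ("chr2", 100, 200)]
    for region in intersect_intervals(features, targets):
        print(*region, sep="\t")
```

Intervals that merely touch (for example one ending at 200 and another starting at 200) share no base under half-open coordinates, so they are not reported.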


UniSTS (tool)

RRID:SCR_006843

THIS RESOURCE IS NO LONGER IN SERVICE, documented August 22, 2016. Database of sequence tagged sites (STSs) derived from STS-based maps and other experiments. STSs are defined by PCR primer pairs and are associated with additional information such as genomic position, genes, and sequences. Chromosome maps are labeled by name of the originating organism, the map title, total markers, total UniSTSs and links to view maps as well as research documents available through PubMed, another NCBI database. The search functions within UniSTS allow the user to search by gene marker, chromosome, gene symbol and gene description terms to locate markers on specified genes. A representation of the UniSTS datasets is available by ftp. NOTE: All data from this resource have been moved to the Probe database, http://www.ncbi.nlm.nih.gov/probe. You can retrieve all UniSTS records by searching the Probe database using the search term unists(properties) (use brackets instead of parentheses). Additionally, legacy data remain on the NCBI FTP Site in the UniSTS Repository (ftp://ftp.ncbi.nih.gov/pub/ProbeDB/legacy_unists).


RepeatMasker (tool)

RRID:SCR_012954

Software tool that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).


TraceTuner (tool)

RRID:SCR_013019

Software tool for base and quality calling of trace files from DNA sequencing instruments.


ggplot2 (tool)

RRID:SCR_014601

Open source software package for the statistical programming language R that creates plots based on the grammar of graphics. Used for data visualization, it breaks graphs up into semantic components such as scales and layers.


Pilon (tool)

RRID:SCR_014731

Software tool to automatically improve draft assemblies and find variation among strains, including large event detection. It takes a FASTA file of the genome along with one or more BAM files of aligned reads as input. Read alignment analysis is used to identify inconsistencies between the input genome and the evidence in the reads, and the tool then attempts to make improvements to the genome.


GENCODE (tool)

RRID:SCR_014966

Human and mouse genome annotation project which aims to identify all gene features in the human genome using computational analysis, manual annotation, and experimental validation.


BWA-MEM2 (tool)

RRID:SCR_022192

Software tool for sequence mapping. The next version of BWA-MEM. Used for aligning sequencing reads against a large reference genome.
