Genome Evolution and Innovation across the Four Major Lineages of Cryptococcus gattii.

mBio | 2015

Cryptococcus gattii is a fungal pathogen of humans, causing pulmonary infections in otherwise healthy hosts. To characterize genomic variation among the four major lineages of C. gattii (VGI, -II, -III, and -IV), we generated, annotated, and compared 16 de novo genome assemblies, including the first for the rarely isolated lineages VGIII and VGIV. By identifying syntenic regions across assemblies, we found 15 structural rearrangements, which were almost exclusive to the VGI-III-IV lineages. Using synteny to inform orthology prediction, we identified a core set of 87% of C. gattii genes present as single copies in all four lineages. Remarkably, 737 genes are variably inherited across lineages and are overrepresented for response to oxidative stress, mitochondrial import, and metal binding and transport. Specifically, VGI has an expanded set of iron-binding genes thought to be important to the virulence of Cryptococcus, while VGII has expansions in the stress-related heat shock proteins relative to the other lineages. We also characterized genes uniquely absent in each lineage, including a copper transporter absent from VGIV, which influences Cryptococcus survival during pulmonary infection and the onset of meningoencephalitis. Through inclusion of population-level data for an additional 37 isolates, we identified a new transcontinental clonal group that we name VGIIx, mitochondrial recombination between VGII and VGIII, and positive selection of multidrug transporters and the iron-sulfur protein aconitase along multiple branches of the phylogenetic tree. Our results suggest that gene expansion or contraction and positive selection have introduced substantial variation with links to mechanisms of pathogenicity across this species complex.

PubMed ID: 26330512

Associated grants

  • Agency: NIAID NIH HHS, United States
    Id: R37 AI039115
  • Agency: NHGRI NIH HHS, United States
    Id: U54 HG003067
  • Agency: Medical Research Council, United Kingdom
    Id: MR/K000373/1
  • Agency: NIAID NIH HHS, United States
    Id: R01 AI050113
  • Agency: Wellcome Trust, United Kingdom

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


RepeatMasker (tool)

RRID:SCR_012954

Software tool that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).

GATK (tool)

RRID:SCR_001876

A software package to analyze next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. It is both a software library that makes it easy to write efficient analysis tools for next-generation sequencing data and a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner. (entry from Genetic Analysis Software)

SAMTOOLS (tool)

RRID:SCR_002105

The original SAMtools package has been split into three separate repositories: Samtools, BCFtools and HTSlib. Samtools is used for manipulating next-generation sequencing data: reading, writing, editing, indexing and viewing nucleotide alignments in SAM, BAM and CRAM formats. BCFtools is used for reading and writing BCF2, VCF and gVCF files and for calling, filtering and summarising SNP and short indel sequence variants. HTSlib is used for reading and writing high-throughput sequencing data.

UnifiedGenotyper (tool)

RRID:SCR_004710

A multiple-sample, technology-aware SNP and indel caller.

Pfam (tool)

RRID:SCR_004726

A database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Users can analyze protein sequences for Pfam matches, view Pfam family annotation and alignments, see groups of related families, look at the domain organization of a protein sequence, find the domains on a PDB structure, and query Pfam by keywords. There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high quality, manually curated families that may automatically generate a supplement using the ADDA database. These automatically generated entries are called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans (collections of Pfam-A entries which are related by similarity of sequence, structure or profile-HMM).

Blast2GO (tool)

RRID:SCR_005828

An ALL in ONE tool for functional annotation of (novel) sequences and the analysis of annotation data. Blast2GO (B2G) joins in one universal application similarity search based GO annotation and functional analysis. B2G offers the possibility of direct statistical analysis on gene function information and visualization of relevant functional features on a highlighted GO directed acyclic graph (DAG). Furthermore, B2G includes various statistics charts summarizing the results obtained at BLASTing, GO-mapping, annotation and enrichment analysis (Fisher's Exact Test). All analysis process steps are configurable, and data import and export are supported at any stage. The application also accepts pre-existing BLAST or annotation files and takes them to subsequent steps. The tool offers a very suitable platform for high throughput functional genomics research in non-model species. B2G is a species-independent, intuitive and interactive desktop application which allows monitoring and comprehending the whole annotation and analysis process, supported by additional features like GO Slim integration, evidence code (EC) consideration, a Batch-Mode or GO-Multilevel-Pies. Platform: Windows compatible, Mac OS X compatible, Linux compatible, Unix compatible

RAxML (tool)

RRID:SCR_006086

Software program for phylogenetic analyses of large datasets under maximum likelihood.

Picard (tool)

RRID:SCR_006525

Java toolset for working with next generation sequencing data in the BAM format.

MUSCLE (tool)

RRID:SCR_011812

Multiple sequence alignment method with reduced time and space complexity. Multiple sequence alignment with high accuracy and high throughput. Data analysis service for multiple sequence comparison by log-expectation.

GeneMark (tool)

RRID:SCR_011930

A family of gene prediction programs developed at Georgia Institute of Technology.

ProtTest (tool)

RRID:SCR_014628

Web-based software used for the selection of best-fit models of protein evolution.

Pilon (tool)

RRID:SCR_014731

Software tool to automatically improve draft assemblies and find variation among strains, including large event detection. It takes as input a FASTA file of the genome along with one or more BAM files of aligned reads. Read alignment analysis is used to identify inconsistencies between the input genome and the evidence in the reads, and the tool then attempts to make improvements to the genome.

RepeatModeler (tool)

RRID:SCR_015027

Sequence analysis software that performs repeat family identification and creates models for sequence data. RepeatModeler utilizes RepeatScout and RECON to identify repeat element boundaries and family relationships.

GeneWise (tool)

RRID:SCR_015054

Gene alignment tool from the EBI which predicts gene structure using similar protein sequences. See also the associated GenomeWise tool.
