Cryptococcus gattii is a fungal pathogen of humans, causing pulmonary infections in otherwise healthy hosts. To characterize genomic variation among the four major lineages of C. gattii (VGI, -II, -III, and -IV), we generated, annotated, and compared 16 de novo genome assemblies, including the first for the rarely isolated lineages VGIII and VGIV. By identifying syntenic regions across assemblies, we found 15 structural rearrangements, which were almost exclusive to the VGI-III-IV lineages. Using synteny to inform orthology prediction, we identified a core set of 87% of C. gattii genes present as single copies in all four lineages. Remarkably, 737 genes are variably inherited across lineages and are overrepresented for response to oxidative stress, mitochondrial import, and metal binding and transport. Specifically, VGI has an expanded set of iron-binding genes thought to be important to the virulence of Cryptococcus, while VGII has expansions in the stress-related heat shock proteins relative to the other lineages. We also characterized genes uniquely absent in each lineage, including a copper transporter absent from VGIV, which influences Cryptococcus survival during pulmonary infection and the onset of meningoencephalitis. Through inclusion of population-level data for an additional 37 isolates, we identified a new transcontinental clonal group that we name VGIIx, mitochondrial recombination between VGII and VGIII, and positive selection of multidrug transporters and the iron-sulfur protein aconitase along multiple branches of the phylogenetic tree. Our results suggest that gene expansion or contraction and positive selection have introduced substantial variation with links to mechanisms of pathogenicity across this species complex.
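The abstract separates a core set of genes, present once in every lineage, from variably inherited genes, and it also tracks genes absent from a single lineage. The Python sketch below shows that bookkeeping under stated assumptions. It assumes orthogroups have already been inferred and summarized as per-lineage copy numbers. The orthogroup IDs and counts are hypothetical, and the classification rules are a simplification, not the authors' exact criteria.

```python
from typing import Dict, List

LINEAGES = ("VGI", "VGII", "VGIII", "VGIV")
CopyTable = Dict[str, Dict[str, int]]  # orthogroup ID -> lineage -> gene count


def classify_orthogroups(table: CopyTable) -> Dict[str, List[str]]:
    """Split orthogroups into core single-copy and variably inherited sets."""
    classes: Dict[str, List[str]] = {"core_single_copy": [], "variable": []}
    for og, counts in table.items():
        per_lineage = [counts.get(lin, 0) for lin in LINEAGES]
        if all(c == 1 for c in per_lineage):
            classes["core_single_copy"].append(og)
        else:
            # absent, duplicated, or expanded in at least one lineage
            classes["variable"].append(og)
    return classes


def uniquely_absent(table: CopyTable, lineage: str) -> List[str]:
    """Orthogroups missing from `lineage` but present in every other lineage."""
    others = [lin for lin in LINEAGES if lin != lineage]
    return [og for og, counts in table.items()
            if counts.get(lineage, 0) == 0
            and all(counts.get(lin, 0) > 0 for lin in others)]


# Hypothetical copy-number table
example: CopyTable = {
    "OG0001": {"VGI": 1, "VGII": 1, "VGIII": 1, "VGIV": 1},
    "OG0002": {"VGI": 3, "VGII": 1, "VGIII": 1, "VGIV": 1},  # expanded in VGI
    "OG0003": {"VGI": 1, "VGII": 1, "VGIII": 1, "VGIV": 0},  # absent from VGIV
}
classes = classify_orthogroups(example)
print(classes["core_single_copy"], uniquely_absent(example, "VGIV"))  # prints ['OG0001'] ['OG0003']
```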
PubMed ID: 26330512
Software tool that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines, including nhmmer, cross_match, ABBlast/WUBlast, RMBlast, and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).
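The masked output described above amounts to overwriting each annotated repeat interval in the query with N. The following is a minimal Python sketch. It assumes the repeat annotations are already available as 0-based, half-open (start, end) intervals. It is not RepeatMasker's code and performs no repeat detection itself.

```python
from typing import Iterable, Tuple


def hard_mask(seq: str, repeats: Iterable[Tuple[int, int]], mask_char: str = "N") -> str:
    """Replace every annotated repeat interval (0-based, half-open) with mask_char."""
    chars = list(seq)
    for start, end in repeats:
        # clip intervals to the sequence; overlapping intervals are harmless
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def masked_fraction(masked: str, mask_char: str = "N") -> float:
    """Fraction of positions equal to mask_char (pre-existing Ns are counted too)."""
    return masked.count(mask_char) / len(masked) if masked else 0.0


masked = hard_mask("ACGTACGTAC", [(2, 5)])
print(masked, masked_fraction(masked))  # prints ACNNNCGTAC 0.3
```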
A software package to analyze next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping and a strong emphasis on data quality assurance. Its robust architecture, powerful processing engine, and high-performance computing features make it capable of handling projects of any size. It is, first, a software library that makes it easy to write efficient analysis tools for next-generation sequencing data and, second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include a depth-of-coverage analyzer, a quality score recalibrator, a SNP/indel caller, and a local realigner. (entry from Genetic Analysis Software)
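A quality score recalibrator compares each reported Phred score with the mismatch rate actually observed for bases carrying that score. The Python sketch below illustrates that idea in simplified form. It assumes the mismatch observations come from sites believed not to be true variants, and it conditions only on the reported score. It is not this toolkit's recalibration model, which the entry above does not describe.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def empirical_quality(observations: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each reported quality score to an empirical Phred score.

    observations: (reported_quality, is_mismatch) pairs from non-variant sites.
    """
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # q -> [mismatches, total]
    for q, mismatch in observations:
        counts[q][0] += int(mismatch)
        counts[q][1] += 1
    result = {}
    for q, (mismatches, total) in counts.items():
        # pseudocounts keep the estimate finite when no mismatches were seen
        error_rate = (mismatches + 1) / (total + 2)
        result[q] = -10 * math.log10(error_rate)
    return result


# Hypothetical data: bases reported as Q30 and Q20, with observed mismatch counts
obs = [(30, False)] * 9989 + [(30, True)] * 9 + [(20, False)] * 48 + [(20, True)] * 2
print({q: round(v, 1) for q, v in empirical_quality(obs).items()})  # prints {30: 30.0, 20: 12.4}
```

In this made-up example the Q30 bases are as accurate as reported, while the Q20 bases behave more like Q12, so a recalibrator would lower their scores.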
The original SAMtools package has been split into three separate repositories: Samtools, BCFtools, and HTSlib. Samtools is used to manipulate next-generation sequencing data, including reading, writing, editing, indexing, and viewing nucleotide alignments in SAM, BAM, and CRAM format. BCFtools is used to read and write BCF2, VCF, and gVCF files and to call, filter, and summarise SNP and short indel sequence variants. HTSlib is used to read and write high-throughput sequencing data.
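The Python sketch below makes the SAM format concrete. It parses the mandatory fields of one alignment line, tests FLAG bits, and derives the alignment's end coordinate from its CIGAR string. The field order, FLAG values, and reference-consuming CIGAR operations follow the SAM specification. The helper names are hypothetical, and real pipelines would rely on Samtools or HTSlib rather than hand-parsing.

```python
import re
from typing import NamedTuple

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")


class SamRecord(NamedTuple):
    qname: str
    flag: int
    rname: str
    pos: int  # 1-based leftmost mapping position
    mapq: int
    cigar: str


def parse_sam_line(line: str) -> SamRecord:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("SAM alignment lines have 11 mandatory fields")
    return SamRecord(fields[0], int(fields[1]), fields[2],
                     int(fields[3]), int(fields[4]), fields[5])


def reference_end(rec: SamRecord) -> int:
    """1-based inclusive end of the alignment on the reference."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(rec.cigar)
               if op in CONSUMES_REFERENCE)
    return rec.pos + span - 1


line = "\t".join(["read1", "16", "chr1", "100", "60", "5S10M2D8M",
                  "*", "0", "0", "A" * 23, "*"])
rec = parse_sam_line(line)
print(bool(rec.flag & FLAG_REVERSE), reference_end(rec))  # prints True 119
```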
A multiple-sample, technology-aware SNP and indel caller.
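A variant caller scores each candidate genotype by its likelihood given the reads. The Python sketch below computes diploid genotype likelihoods for one sample at one biallelic site. It assumes independent reads and a uniform error model derived from Phred base qualities. A multiple-sample caller would combine such per-sample values across samples. A technology-aware one could replace the uniform model with platform-specific error profiles.

```python
import math
from typing import Dict, List, Tuple


def base_likelihood(base: str, allele: str, error: float) -> float:
    """P(observed base | true allele), errors spread evenly over the other 3 bases."""
    return 1 - error if base == allele else error / 3


def genotype_log10_likelihoods(reads: List[Tuple[str, int]],
                               ref: str, alt: str) -> Dict[str, float]:
    """reads: (observed base, Phred base quality) pairs covering one site."""
    genotypes = {"0/0": (ref, ref), "0/1": (ref, alt), "1/1": (alt, alt)}
    result = {}
    for gt, (a1, a2) in genotypes.items():
        total = 0.0
        for base, qual in reads:
            error = 10 ** (-qual / 10)
            # each read comes from either chromosome with probability 1/2
            total += math.log10(0.5 * base_likelihood(base, a1, error)
                                + 0.5 * base_likelihood(base, a2, error))
        result[gt] = total
    return result


reads = [("A", 30)] * 5 + [("G", 30)] * 5
gls = genotype_log10_likelihoods(reads, ref="A", alt="G")
print(max(gls, key=gls.get))  # prints 0/1
```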
A database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Users can analyze protein sequences for Pfam matches, view Pfam family annotations and alignments, see groups of related families, look at the domain organization of a protein sequence, find the domains on a PDB structure, and query Pfam by keyword. Pfam has two components: Pfam-A and Pfam-B. Pfam-A entries are high-quality, manually curated families. Pfam-B is a supplement of automatically generated entries derived from the ADDA database; although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also groups related families into higher-level collections called clans, which are sets of Pfam-A entries related by similarity of sequence, structure, or profile HMM.
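Viewing the domain organization of a protein means turning a list of possibly overlapping family matches into a left-to-right architecture. The Python sketch below does this greedily. It keeps the most significant hits first and discards any hit that overlaps one already kept. The family names, coordinates, and single E-value cutoff are hypothetical. Pfam's own searches use family-specific thresholds rather than one global cutoff.

```python
from typing import List, NamedTuple


class DomainHit(NamedTuple):
    family: str
    start: int  # 1-based start on the protein
    end: int    # 1-based inclusive end
    evalue: float


def domain_architecture(hits: List[DomainHit], max_evalue: float = 1e-5) -> List[DomainHit]:
    """Greedily keep the most significant non-overlapping hits, reported left to right."""
    kept: List[DomainHit] = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            break  # remaining hits are even less significant
        if all(hit.end < k.start or hit.start > k.end for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


hits = [
    DomainHit("FamA", 10, 80, 1e-20),
    DomainHit("FamB", 60, 150, 1e-8),   # overlaps FamA and is less significant
    DomainHit("FamC", 160, 240, 1e-12),
    DomainHit("FamD", 250, 300, 0.1),   # above the cutoff
]
print([h.family for h in domain_architecture(hits)])  # prints ['FamA', 'FamC']
```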
An all-in-one tool for functional annotation of (novel) sequences and analysis of the resulting annotation data. Blast2GO (B2G) combines similarity-search-based GO annotation and functional analysis in a single application. B2G supports direct statistical analysis of gene function information and visualization of relevant functional features on a highlighted GO directed acyclic graph (DAG). It also includes statistics charts summarizing the results of the BLAST, GO-mapping, annotation, and enrichment analysis (Fisher's exact test) steps. Every step of the analysis is configurable, and data import and export are supported at any stage; the application also accepts pre-existing BLAST or annotation files and carries them through the subsequent steps. The tool is well suited to high-throughput functional genomics research in non-model species. B2G is a species-independent, intuitive, interactive desktop application that lets users monitor and understand the whole annotation and analysis process, with additional features such as GO Slim integration, evidence code (EC) consideration, a Batch-Mode, and GO-Multilevel-Pies. Platform: Windows, Mac OS X, Linux, and Unix compatible.
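The enrichment analysis above relies on Fisher's exact test. The common one-sided question asks whether a GO term is over-represented in a test set relative to a reference set, and its p-value is a hypergeometric upper tail. The Python sketch below computes it. The argument names are hypothetical, and Blast2GO's own implementation may differ, for example in using two-sided tests or multiple-testing correction.

```python
from math import comb


def enrichment_p_value(k: int, n: int, big_k: int, big_n: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    k: test-set genes annotated with the GO term
    n: size of the test set
    big_k: reference-set genes annotated with the term
    big_n: size of the reference set (test set included)
    """
    # probability of drawing k or more annotated genes in a random set of n
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i)
               for i in range(k, min(n, big_k) + 1))
    return tail / comb(big_n, n)


# 3 of 4 test genes carry the term, versus 5 of 20 genes in the reference
print(round(enrichment_p_value(3, 4, 5, 20), 4))  # prints 0.032
```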
Software program for phylogenetic analyses of large datasets under maximum likelihood.
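Maximum likelihood phylogenetics chooses the branch lengths and topology that maximize the probability of the aligned data under a substitution model. The smallest case is two DNA sequences under the Jukes-Cantor (JC69) model, which has a closed-form branch-length estimate. The Python sketch below computes that estimate and can be checked against the likelihood function itself. It illustrates the criterion only, not how this program searches trees for large datasets.

```python
import math


def jc69_log_likelihood(seq1: str, seq2: str, t: float) -> float:
    """Log-likelihood of two aligned sequences separated by branch length t
    (expected substitutions per site) under JC69."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay  # probability of one specific different base
    # stationary base frequency (1/4) times the transition probability, per site
    return sum(math.log(0.25 * (p_same if a == b else p_diff))
               for a, b in zip(seq1, seq2))


def jc69_ml_distance(seq1: str, seq2: str) -> float:
    """Closed-form ML branch length; undefined at saturation (p >= 0.75)."""
    p = sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)
    if p >= 0.75:
        raise ValueError("sequences too divergent for JC69")
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


s1 = "A" * 90 + "C" * 10
s2 = "A" * 100
t_hat = jc69_ml_distance(s1, s2)  # observed mismatch proportion p = 0.1
print(round(t_hat, 4))  # prints 0.1073
```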
Java toolset for working with next-generation sequencing data in the BAM format.
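BAM is the binary, BGZF-compressed form of SAM. Because BGZF blocks are ordinary gzip members, a BAM header can be read with Python's standard library, as the sketch below shows. It follows the header layout in the SAM/BAM specification and builds a tiny test stream in memory. The test stream is plain gzip, not true BGZF. The sketch is not part of the Java toolset, and the demo helper is hypothetical.

```python
import gzip
import io
import struct
from typing import BinaryIO, List, Tuple


def read_bam_header(stream: BinaryIO) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (SAM header text, [(reference name, length), ...]) from a BAM stream."""
    with gzip.GzipFile(fileobj=stream, mode="rb") as bam:  # reads consecutive gzip members
        if bam.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", bam.read(4))  # little-endian int32
        text = bam.read(l_text).rstrip(b"\x00").decode("utf-8")
        (n_ref,) = struct.unpack("<i", bam.read(4))
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", bam.read(4))
            name = bam.read(l_name).rstrip(b"\x00").decode("utf-8")  # NUL-terminated
            (l_ref,) = struct.unpack("<i", bam.read(4))
            refs.append((name, l_ref))
    return text, refs


def demo_bam_bytes() -> bytes:
    """Hypothetical minimal BAM header with one 1000-bp reference and no alignments."""
    header = b"@HD\tVN:1.6\n"
    name = b"chr1\x00"
    payload = (b"BAM\x01" + struct.pack("<i", len(header)) + header
               + struct.pack("<i", 1)
               + struct.pack("<i", len(name)) + name + struct.pack("<i", 1000))
    return gzip.compress(payload)


print(read_bam_header(io.BytesIO(demo_bam_bytes())))
# prints ('@HD\tVN:1.6\n', [('chr1', 1000)])
```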
Multiple sequence alignment method with reduced time and space complexity, offering high accuracy and high throughput. Also available as a data analysis service for multiple sequence comparison by log-expectation.
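Fast progressive aligners commonly estimate pairwise distances from shared k-mers, because counting k-mers is far cheaper than full pairwise alignment. The distances can then feed a guide tree. The Python sketch below implements one simple k-mer distance: shared k-mer counts, clipped per k-mer, divided by the number of k-mers in the shorter sequence. Whether this method uses exactly this measure is an assumption.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """1 minus the fraction of shared k-mers (each k-mer counted min(count_a, count_b) times)."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        raise ValueError("both sequences must be at least k long")
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())
    return 1.0 - shared / denom


print(kmer_distance("ACDEFG", "ACDEFW"))  # prints 0.25
```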
A family of gene prediction programs developed at Georgia Institute of Technology.
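Gene prediction starts from the question of where a protein-coding stretch could begin and end. The Python sketch below only scans the forward strand for open reading frames, each running from an ATG to the first in-frame stop codon. It is not these programs' method, which the entry does not describe. It also ignores introns, so on its own it is a weak predictor for intron-containing eukaryotic genes.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int]]:
    """Forward-strand ORFs as 0-based, half-open (start, end), stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


print(find_orfs("CC" + "ATG" + "AAA" * 3 + "TAA" + "GG", min_codons=3))  # prints [(2, 17)]
```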
Web-based software used for the selection of best-fit models of protein evolution.
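Selecting a best-fit model typically means penalizing each candidate's maximized log-likelihood by its number of free parameters, using information criteria such as AIC or BIC. The Python sketch below ranks hypothetical fits both ways. The model names, log-likelihoods, and parameter counts are made up, and using alignment sites as the BIC sample size is an assumption. Note that the two criteria can disagree.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    return n_params * math.log(n_sites) - 2 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]], n_sites: int, criterion: str = "aic") -> str:
    """fits maps a model name to (maximized log-likelihood, free parameter count)."""
    def score(name: str) -> float:
        ll, k = fits[name]
        return aic(ll, k) if criterion == "aic" else bic(ll, k, n_sites)
    return min(fits, key=score)  # lower criterion value is better


# Hypothetical fits for two candidate models on a 1000-site alignment
fits = {"M1": (-10000.0, 1), "M2": (-9980.0, 20)}
print(best_model(fits, 1000, "aic"), best_model(fits, 1000, "bic"))  # prints M2 M1
```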
Software tool to automatically improve draft assemblies and find variation among strains, including large event detection. It takes as input a FASTA file of the genome along with one or more BAM files of reads aligned to it. It analyzes the read alignments to identify inconsistencies between the input genome and the evidence in the reads, then attempts to improve the genome.
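Read-evidence correction can be illustrated at the level of single bases: where enough aligned reads agree on a base that differs from the assembly, substitute it. The Python sketch below does that under stated assumptions. It takes a precomputed pileup mapping each position to its list of read bases, and it uses hypothetical depth and agreement thresholds. It handles only substitutions, not the indels or large events the tool also targets.

```python
from collections import Counter
from typing import Dict, List


def polish(reference: str, pileup: Dict[int, List[str]],
           min_depth: int = 5, min_fraction: float = 0.8) -> str:
    """Substitute reference bases that aligned reads overwhelmingly contradict.

    pileup maps 0-based reference positions to the read bases aligned there.
    """
    bases = list(reference)
    for pos, observed in pileup.items():
        if len(observed) < min_depth:
            continue  # too little evidence to override the assembly
        base, count = Counter(observed).most_common(1)[0]
        if base != bases[pos] and count / len(observed) >= min_fraction:
            bases[pos] = base
    return "".join(bases)


pileup = {
    2: ["T"] * 9 + ["G"],       # 90% of reads say T: corrected
    5: ["A"] * 3,               # depth 3 is below min_depth: left alone
    6: ["G"] * 6 + ["C"] * 4,   # majority agrees with the assembly
}
print(polish("ACGTACGT", pileup))  # prints ACTTACGT
```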
Sequence analysis software that performs repeat family identification and creates models for sequence data. RepeatModeler utilizes RepeatScout and RECON to identify repeat element boundaries and family relationships.
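Repeat families begin as sequence that recurs far more often than chance would predict. The Python sketch below shows only that seed-counting idea by listing k-mers that occur at least a minimum number of times. It is not the algorithm of RepeatScout or RECON, and the k value, count threshold, and example repeat unit are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def repeat_seeds(seq: str, k: int = 12, min_count: int = 3) -> List[Tuple[str, int]]:
    """k-mers occurring at least min_count times, most frequent first.

    Windows containing N (e.g. already-masked or gap sequence) are skipped.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1)
                     if "N" not in seq[i:i + k])
    frequent = [(kmer, c) for kmer, c in counts.items() if c >= min_count]
    return sorted(frequent, key=lambda kc: (-kc[1], kc[0]))


unit = "ACGTTGCAAC"  # hypothetical 10-bp tandem repeat unit
seeds = repeat_seeds("GG" + unit * 4 + "TT", k=10, min_count=3)
print(seeds[0], len(seeds))  # prints ('ACGTTGCAAC', 4) 10
```

Here the full unit occurs four times and each of its nine rotations three times, which is the kind of high-frequency signal a repeat finder would extend into a family consensus.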
Gene alignment tool from the EBI which predicts gene structure using similar protein sequences. See also the associated GenomeWise tool.
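Predicting a gene structure from a similar protein means finding exons whose spliced translation matches that protein. The Python sketch below does only the easier reverse check. Given a candidate structure of hypothetical exon coordinates on the forward strand, it splices the exons, translates them with the standard genetic code, and reports ungapped identity to a homolog. It is not the tool's alignment algorithm.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def splice_and_translate(genome: str, exons: List[Tuple[int, int]]) -> str:
    """Join 0-based, half-open exons in order and translate ('X' for unknown codons)."""
    cds = "".join(genome[s:e] for s, e in sorted(exons)).upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity over the shorter sequence (real tools align with gaps)."""
    n = min(len(a), len(b))
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n if n else 0.0


genome = "ATGAAA" + "GTAAGTTTTCAG" + "TGGTAA"  # exon, GT...AG intron, exon
protein = splice_and_translate(genome, [(0, 6), (18, 24)])
print(protein, percent_identity(protein, "MKW*"))  # prints MKW* 100.0
```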