Relative to the commonly used mitochondrial and nuclear protein-coding genes, noncoding intron sequences are a promising source of informative markers that have the potential to resolve difficult phylogenetic nodes such as rapid radiations and recent divergences. Yet many issues exist in the use of intron markers, and these prevent their extensive application as conventional markers. We used snakes, a diverse group, as an example to try to pave the way for large-scale identification and application of intron markers. We performed a series of bioinformatics screenings that identified appropriate introns between single-copy, conserved exons from two snake genomes, adding particular constraints on sequence length variability and sequence variability. A total of 1,273 candidate intron loci were retrieved. Primers for nested polymerase chain reaction (PCR) were designed for over a hundred candidates and tested in 16 snake representatives. In total, 96 intron markers were developed that could be amplified across a broad range of snake taxa with high PCR success rates. The markers were then applied to 49 snake samples, and the resulting large number of amplicons was subjected to next-generation sequencing (NGS). An analytic strategy was developed to accurately recover the amplicon sequences, and approximately 76% of the marker sequences were recovered. The average p-distances of the intron markers at interfamily, intergenus, interspecies, and intraspecies levels were 0.168, 0.052, 0.015, and 0.004, respectively, suggesting that they are useful for studying snake relationships at different evolutionary depths. A snake phylogeny was constructed with the intron markers, which produced concordant results with robust support at both interfamily and intragenus levels. The intron markers provide a convenient way to explore the signals in the noncoding regions to address the controversies on the snake tree. Our improved strategy of genome screening is effective and can be applied to other animal groups. NGS coupled with appropriate sequence processing can greatly facilitate the extensive application of molecular markers.
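The p-distance reported in the abstract is the proportion of aligned sites at which two sequences differ. The following is a minimal sketch of that calculation, not the authors' pipeline: the function names `p_distance` and `mean_pairwise_p_distance` are hypothetical, and skipping gaps and ambiguity codes (pairwise deletion) is an assumption, since the abstract does not say how such sites were handled.

```python
from itertools import combinations

# Assumption: only unambiguous bases are compared; gaps and IUPAC
# ambiguity codes are skipped (pairwise deletion).
VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the proportion of compared aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue  # skip gaps and ambiguous sites
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return mismatches / compared


def mean_pairwise_p_distance(aligned: dict[str, str]) -> float:
    """Average p-distance over all pairs of sequences in one alignment."""
    pairs = list(combinations(aligned.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy alignment with made-up sequences, for illustration only.
    toy = {
        "sample_1": "ACGTACGT",
        "sample_2": "ACGTACGA",
        "sample_3": "AC-TACGA",
    }
    print(mean_pairwise_p_distance(toy))
```

Averaging such values separately over pairs drawn from different families, genera, species, or populations would give per-level summaries of the kind quoted above.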
PubMed ID: 29238535
Web application to search nucleotide databases using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast.
The original SAMtools package has been split into three separate repositories: Samtools, BCFtools, and HTSlib. Samtools manipulates next-generation sequencing data and is used for reading, writing, editing, indexing, and viewing nucleotide alignments in SAM, BAM, and CRAM formats. BCFtools is used for reading and writing BCF2, VCF, and gVCF files and for calling, filtering, and summarising SNP and short indel sequence variants. HTSlib is used for reading and writing high-throughput sequencing data.
Collection of genome databases for vertebrates and other eukaryotic species with DNA and protein sequence search capabilities. Used to automatically annotate genomes, integrate this annotation with other available biological data, and make the data publicly available via the web. Ensembl tools include BLAST, BLAT, BioMart, and the Variant Effect Predictor (VEP) for all supported species.
THIS RESOURCE IS NO LONGER IN SERVICE, documented on January 19, 2022. Command-line version of the multiple sequence alignment program Clustal for DNA or proteins. Alignment is progressive and considers sequence redundancy. No longer being maintained. Please consider using Clustal Omega instead, which accepts nucleic acid or protein sequences in multiple sequence formats: NBRF/PIR, EMBL/UniProt, Pearson (FASTA), GDE, ALN/ClustalW, GCG/MSF, RSF.
Ultrafast, memory-efficient software tool for aligning sequencing reads. Bowtie is a short-read aligner.
Software program for phylogenetic analyses of large datasets under maximum likelihood.
Multiple alignment software package for amino acid or nucleotide sequences. Can align up to 500 sequences or a maximum file size of 1 MB. The first version of MAFFT used an algorithm based on progressive alignment, in which sequences were clustered with the help of the fast Fourier transform. Subsequent versions have added other algorithms and modes of operation, including options for faster alignment of large numbers of sequences, higher-accuracy alignments, alignment of noncoding RNA sequences, and addition of new sequences to existing alignments.
Source code that infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. It uses the Jukes-Cantor or generalized time-reversible (GTR) models of nucleotide evolution and the JTT, WAG, or LG models of amino acid evolution.
Software tool that serves as a partner script to the popular ncbi-genome-download script. Allows sequences to be downloaded from GenBank/RefSeq by accession through the NCBI Entrez API.