The ancestor of cetaceans underwent a macroevolutionary transition from land to water early in the Eocene Epoch, more than 50 million years ago. However, little is known about how diverse retroviruses evolved during this shift from terrestrial to aquatic environments. Did retroviruses transition into water along with their hosts? Or did retroviruses infect cetaceans through cross-species transmission after cetaceans invaded aquatic environments? Endogenous retroviruses (ERVs) provide important molecular fossils for tracing the evolution of retroviruses during this macroevolutionary transition. Here, we use a phylogenomic approach to study the origin and evolution of ERVs in cetaceans. We identify a total of 8,724 ERVs within the genomes of 25 cetaceans, and phylogenetic analyses suggest that these ERVs cluster into 315 independent lineages, each of which represents one or more independent endogenization events. We find that cetacean ERVs originated through two possible routes. A total of 298 ERV lineages may derive from retrovirus endogenization that occurred before or during the cetacean transition from land to water, and most of these ERVs are reaching evolutionary dead ends. The remaining 17 ERV lineages are likely to have arisen from independent retrovirus endogenization events that occurred after the split of mysticetes and odontocetes, indicating that diverse retroviruses infected cetaceans through cross-species transmission from non-cetacean mammals after cetaceans transitioned to aquatic life. Both integration time and synteny analyses support the recent or ongoing activity of multiple retroviral lineages in cetaceans, some of which proliferated into hundreds of copies within the host genomes. Although ERVs record only a proportion of past retroviral infections, our findings illuminate the complex evolution of retroviruses during one of the most marked macroevolutionary transitions in vertebrate history.
PubMed ID: 34252162
Software tool to search for open reading frames (ORFs) in a DNA sequence. The program returns the range of each ORF, along with its protein translation. It is used to search newly sequenced DNA for potential protein-encoding segments and to verify predicted proteins. Searches are limited to a subrange of the query sequence up to 50 kb long.
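To make the description above concrete, here is a minimal Python sketch of an ORF search that reports the range of each ORF and its protein translation. It is not the tool's own code: the function name `find_orfs`, the `min_codons` parameter, the restriction to the forward strand, and the ATG-only start rule are all assumptions made for illustration.

```python
from typing import List, Tuple

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_orfs(seq: str, min_codons: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, protein) for forward-strand ORFs.

    Hypothetical helper: coordinates are 1-based and inclusive, and the end
    coordinate includes the stop codon. An ORF runs from the first ATG in a
    frame to the next in-frame stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new ORF at the first start codon
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    # Translate every codon before the stop; unknown codons become X.
                    protein = "".join(
                        CODON_TABLE.get(seq[j:j + 3], "X")
                        for j in range(start, i, 3)
                    )
                    orfs.append((start + 1, i + 3, protein))
                start = None  # close the ORF and keep scanning this frame
    return orfs


if __name__ == "__main__":
    for start, end, protein in find_orfs("CCATGGCTTGATT"):
        print(start, end, protein)
```

A fuller search would also scan the reverse complement and allow alternative start codons and genetic codes; those are omitted here to keep the sketch short.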
Web application to search nucleotide databases using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast.
Software package providing a multiple alignment program for amino acid or nucleotide sequences. Can align up to 500 sequences or a maximum file size of 1 MB. The first version of MAFFT used an algorithm based on progressive alignment, in which sequences were clustered with the help of the fast Fourier transform. Subsequent versions have added other algorithms and modes of operation, including options for faster alignment of large numbers of sequences, higher-accuracy alignments, alignment of non-coding RNA sequences, and addition of new sequences to existing alignments.
Multiple sequence alignment method with reduced time and space complexity. Multiple sequence alignment with high accuracy and high throughput. Data analysis service for multiple sequence comparison by log-expectation.
Tool to search translated nucleotide databases using a protein query.
Package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. PAML estimates parameters and tests hypotheses to study the evolutionary process from a phylogenetic tree.
Source code that infers approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide or protein sequences. It uses the Jukes-Cantor or generalized time-reversible (GTR) models of nucleotide evolution and the JTT, WAG, or LG models of amino acid evolution.
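As a small illustration of the Jukes-Cantor model named above, the sketch below computes the Jukes-Cantor corrected distance between two aligned nucleotide sequences, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. This is the textbook formula only, not the tool's implementation; the function names `p_distance` and `jc69_distance` and the choice to skip gaps and ambiguous bases are assumptions.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Hypothetical helper: sites where either sequence has a gap or an
    ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (x, y)
        for x, y in zip(a.upper(), b.upper())
        if x in "ACGT" and y in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance in substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The correction is undefined at saturation (p >= 3/4).
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(jc69_distance("ACGTACGT", "ACGTACGA"))
```

The corrected distance is always at least as large as the raw proportion of differences, because the model accounts for multiple substitutions at the same site.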
Public knowledge base for information on the evolutionary timescale of life. Data from thousands of published studies are assembled into a searchable tree of life scaled to time.