The evolution of land flora transformed the terrestrial environment. Land plants evolved from an ancestral charophycean alga from which they inherited developmental, biochemical, and cell biological attributes. Additional biochemical and physiological adaptations to land, and a life cycle with an alternation between multicellular haploid and diploid generations that facilitated efficient dispersal of desiccation tolerant spores, evolved in the ancestral land plant. We analyzed the genome of the liverwort Marchantia polymorpha, a member of a basal land plant lineage. Relative to charophycean algae, land plant genomes are characterized by genes encoding novel biochemical pathways, new phytohormone signaling pathways (notably auxin), expanded repertoires of signaling pathways, and increased diversity in some transcription factor families. Compared with other sequenced land plants, M. polymorpha exhibits low genetic redundancy in most regulatory pathways, with this portion of its genome resembling that predicted for the ancestral land plant.
PubMed ID: 28985561
Database of protein families and domains that is based on the observation that, while there is a huge number of different proteins, most of them can be grouped, on the basis of similarities in their sequences, into a limited number of families. Proteins or protein domains belonging to a particular family generally share functional attributes and are derived from a common ancestor. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. ScanProsite finds matches of your protein sequences to PROSITE signatures. PROSITE currently contains patterns and profiles specific for more than a thousand protein families or domains. Each of these signatures comes with documentation providing background information on the structure and function of these proteins. The database is available via FTP.
Software for aligning high-throughput RNA-seq data. It aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.
Database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana. Available data include the complete genome sequence along with gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications, and information about the Arabidopsis research community. Gene product function data are updated every two weeks from the latest published research literature and community data submissions. Gene structures are updated 1-2 times per year using computational and manual methods as well as community submissions of new and updated genes. TAIR also provides extensive linkouts from data pages to other Arabidopsis resources. The data can be searched, viewed, and analyzed, and datasets can be downloaded. Pages on news, job postings, conference announcements, Arabidopsis lab protocols, and useful links are also provided.
A database of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). Users can analyze protein sequences for Pfam matches, view Pfam family annotation and alignments, see groups of related families, look at the domain organization of a protein sequence, find the domains on a PDB structure, and query Pfam by keywords. There are two components to Pfam: Pfam-A and Pfam-B. Pfam-A entries are high-quality, manually curated families. They are supplemented by entries generated automatically using the ADDA database, which are called Pfam-B. Although of lower quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A entries are found. Pfam also generates higher-level groupings of related families, known as clans (collections of Pfam-A entries that are related by similarity of sequence, structure, or profile HMM).
Software program for phylogenetic analyses of large datasets under maximum likelihood.
Software for aligning sequencing reads against a large reference genome. It consists of three algorithms: BWA-backtrack, BWA-SW, and BWA-MEM. The first is designed for sequence reads up to 100 bp; the other two are for longer sequences ranging from 70 bp to 1 Mbp.
Multiple sequence alignment method with reduced time and space complexity, offering high accuracy and high throughput. Data analysis service for multiple sequence comparison by log-expectation.
A generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms.
Database that describes the families of structurally related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds. This specialist database is dedicated to the display and analysis of genomic, structural, and biochemical information on Carbohydrate-Active Enzymes (CAZymes). CAZy data are accessible either by browsing sequence-based families or by browsing the carbohydrate-active enzyme content of genomes. New genomes are added regularly, shortly after they appear in the daily releases of GenBank. New families are created based on published evidence for the activity of at least one member of the family, and all families are regularly updated in both content and description. An original aspect of the CAZy database is its attempt to cover all carbohydrate-active enzymes across organisms and across subfields of glycosciences. CAZy family pages can be searched using the protein accession (GenPept accession, UniProt accession, or PDB ID), CAZy family name, or EC number. In addition, genomes can be searched using the NCBI TaxID. This search can be complemented by Google-based searches on the CAZy site.
Software tool that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines, including nhmmer, cross_match, ABBlast/WUBlast, RMBlast, and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (profile HMM library) and RepBase (consensus sequence library).
Portal providing access to all JGI genomic databases and analytical tools, as well as sequencing projects and their status. Users can search for and download assemblies and annotations of sequenced genomes, interactively explore those genomes, and compare them with other sequenced microbes, fungi, plants, or metagenomes using specialized systems tailored to each class of organisms. The Department of Energy (DOE) Joint Genome Institute (JGI) is a national user facility with massive-scale DNA sequencing and analysis capabilities dedicated to advancing genomics for bioenergy and environmental applications. Beyond generating tens of trillions of DNA bases annually, the Institute develops and maintains data management systems and specialized analytical capabilities to manage and interpret complex genomic data sets, and to enable an expanding community of users around the world to analyze these data in different contexts over the web.