The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement.
PubMed ID: 11130711
MeSH terms: Animals | Arabidopsis | Biological Transport | Cell Membrane | Cell Nucleus | Centromere | Chloroplasts | Chromosome Mapping | DNA Repair | DNA Transposable Elements | DNA, Plant | DNA, Ribosomal | Gene Duplication | Gene Expression Regulation, Plant | Genome, Plant | Humans | Light | Mitochondria | Photosynthesis | Plant Diseases | Proteome | Recombination, Genetic | Repetitive Sequences, Nucleic Acid | Sequence Analysis, DNA | Signal Transduction | Species Specificity | Telomere