Consortium that puts sequences into a chromosome context and provides the best possible reference assembly for human, mouse, and zebrafish via FTP. The consortium does this both by generating multiple representations (alternate loci) for regions that are too complex to be represented by a single path and by releasing regional fixes known as patches. Patches allow users who are interested in a specific locus to get an improved representation without affecting users who need chromosome coordinate stability. The resource also provides mechanisms by which the scientific community can report loci in need of further review.
The GRC has built tools to facilitate the curation of genome assemblies based on the sequence overlaps of long, high-quality sequences (clones and PCR products, not short sequence reads). The GRC currently supports production of assemblies for human, mouse, and zebrafish. If your assembly data fits this model and you are interested in using these tools, please contact us using the 'Contact Us' page.
The human genome assembly was produced as part of the Human Genome Project (HGP). The previous assembly (NCBI36) was the last one produced by the HGP and was described in 2004 (PMID: 15496913); this was the starting point for the GRC. The assembly is based largely on assembling overlapping clone sequences.
For mouse, the GRC has produced an updated assembly (GRCm38). This is an update of the last MGSC assembly (MGSCv37), which was described in 2009 (PMID: 19468303). The primary assembly is based on assembling overlapping BAC clones derived from the C57BL/6J strain, and several loci have sequence available from other strains.
The zebrafish genome assembly was produced at the Sanger Institute. The last assembly produced from the original project was Zv9 and will be described in 2010. This assembly is the starting point for the GRC. The assembly is based on assembling overlapping BAC clones and integrating these sequences with the whole genome shotgun assembly.
A set of TPFs (tiling path files) is maintained for each assembled chromosome and partial assembly. These files are stored in a central database that manages TPF tracking and validation. Sequences (also known as components) that are adjacent on the TPF are expected to have a specific type of sequence alignment known as a full dovetail. A program called 'find_overlaps' assesses all adjacent component sequences to determine whether they have an appropriate overlap.
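The adjacency check described above can be illustrated with a minimal Python sketch. This is not the GRC's 'find_overlaps' program, whose implementation is not described here. It assumes a simplified model in which both components are in the same orientation, coordinates are 1-based and inclusive, and a full dovetail means the alignment runs to the end of the first component and begins at the start of the second. The names `Component`, `Alignment`, `is_full_dovetail`, `check_tiling_path`, and the `slop` tolerance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Component:
    """A sequence placed on the tiling path (hypothetical model)."""
    accession: str
    length: int


@dataclass
class Alignment:
    """Alignment between component a and the next component b.

    Coordinates are 1-based and inclusive; same orientation is assumed.
    """
    a_start: int
    a_end: int
    b_start: int
    b_end: int


def is_full_dovetail(a: Component, aln: Alignment, slop: int = 0) -> bool:
    """Return True if the alignment reaches the end of `a` and the start of `b`.

    `slop` is the number of unaligned bases tolerated at each junction.
    """
    reaches_end_of_a = (a.length - aln.a_end) <= slop
    begins_at_start_of_b = (aln.b_start - 1) <= slop
    return reaches_end_of_a and begins_at_start_of_b


def check_tiling_path(
    path: List[Component],
    alignments: Dict[Tuple[str, str], Alignment],
    slop: int = 0,
) -> List[Tuple[str, str]]:
    """Return adjacent accession pairs that lack a full dovetail overlap."""
    problems = []
    for a, b in zip(path, path[1:]):
        aln = alignments.get((a.accession, b.accession))
        if aln is None or not is_full_dovetail(a, aln, slop):
            problems.append((a.accession, b.accession))
    return problems


if __name__ == "__main__":
    # Made-up accessions and coordinates, for illustration only.
    path = [Component("A", 1000), Component("B", 800), Component("C", 500)]
    alignments = {
        ("A", "B"): Alignment(901, 1000, 1, 100),  # reaches end of A, starts at B:1
        ("B", "C"): Alignment(600, 700, 1, 101),   # stops 100 bases short of B's end
    }
    print(check_tiling_path(path, alignments))  # [('B', 'C')]
```

A real check would also need to handle reverse-complemented components, alignment identity thresholds, and joins that curators have explicitly approved despite an imperfect overlap; those are left out to keep the sketch short.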