Reference Sequences
We strongly recommend the use of genomic reference sequences containing proper annotation for optimal use of Mutalyzer's capabilities to generate descriptions for all transcripts and protein isoforms of the gene(s) affected by the sequence variation.
Mutalyzer accepts the following reference sequences:
- GenBank files
GenBank records (e.g., NG_007400.1) are specified by a GenBank accession
number (NG_007400) and a version number (.1). Omission of the version
number automatically results in selection of the most recent version of that
record. In case of outdated versions, Mutalyzer will issue a warning.
Alternatively, the unique GenInfo identifier (gi) of the reference sequence
(e.g., 4506864) can be used with or without the letters gi.
Mutalyzer does not accept GenBank records that contain no sequence (e.g., chromosomal reference sequence identifiers referring to contig accession numbers) or files larger than 10 MB. Mutalyzer also accepts user-defined files in GenBank format, including slices of chromosomal reference sequences. These files are specified by unique UD identifiers, which Mutalyzer returns after upload (see the Reference File Loader section for more information).
- LRG files
Locus Reference Genomic (LRG) files contain unique and stable reference
DNA sequences along with all relevant transcript and protein sequences
essential to the description of gene variants (see the
LRG website for more information). LRG files
are based on NCBI's
RefSeqGene project and are created in
collaboration with the community of research and diagnostic labs,
LSDB curators and mutation consortia. LRG
files are specified by the prefix "LRG_" followed by a number (e.g., LRG_1).
The LRG website lists existing LRG sequences and has an FTP site for
downloading LRGs. To maintain LRG stability, Mutalyzer's Reference File Loader
does not accept user-defined LRG files.
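The identifier forms described above (accession with optional version, gi with or without the letters gi, and LRG_ followed by a number) can be told apart with a few regular expressions. The following is a minimal sketch, not Mutalyzer's own parser: the function name `classify_reference` and the exact patterns are assumptions derived only from the examples in this section, and UD identifiers are left out because their format is not specified here.

```python
import re

# Illustrative patterns based on the examples in this section.
# They are assumptions, not Mutalyzer's actual validation rules.
LRG_RE = re.compile(r"^LRG_(?P<number>\d+)$")
GI_RE = re.compile(r"^(?:gi)?(?P<gi>\d+)$", re.IGNORECASE)
GENBANK_RE = re.compile(r"^(?P<accession>[A-Z]+_?\d+)(?:\.(?P<version>\d+))?$")


def classify_reference(identifier):
    """Return a dict describing the identifier, or None if unrecognised."""
    identifier = identifier.strip()

    # Check LRG first: "LRG_1" would otherwise also match the GenBank pattern.
    match = LRG_RE.match(identifier)
    if match:
        return {"type": "lrg", "number": int(match.group("number"))}

    # A bare number, or a number prefixed with "gi", is a GenInfo identifier.
    match = GI_RE.match(identifier)
    if match:
        return {"type": "gi", "gi": match.group("gi")}

    # Accession with an optional ".version" suffix.
    match = GENBANK_RE.match(identifier)
    if match:
        version = match.group("version")
        return {
            "type": "genbank",
            "accession": match.group("accession"),
            # None means "no version given", i.e. most recent version.
            "version": int(version) if version is not None else None,
        }

    return None


for example in ["NG_007400.1", "NG_007400", "gi4506864", "4506864", "LRG_1"]:
    print(example, classify_reference(example))
```

A `version` of `None` corresponds to the case where the version number is omitted and the most recent version of the record is selected.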
The Human Genome Variation Society (HGVS) recommends the use of LRGs or RefSeqGene reference sequences in gene variant databases and clinical reports. The quality of chromosomal reference sequences has improved, but Bio-IT's interview with Deanna Church about the past, present and future of the reference genome illustrates that the next human genome build, hg20 (GRCh38) (release planned in early August), will not solve all remaining challenges.