EVE Nomenclature
In DIGS-for-EVEs, we apply a systematic nomenclature to endogenous viral elements (EVEs) to ensure a consistent, standardized approach to identifying and discussing these sequences. The EVE IDs are designed to provide detailed information about the origin and characteristics of each element in a concise format. Below, we outline how this nomenclature is constructed and how it should be used in practice.
Each EVE ID in DIGS-for-EVEs is constructed from a defined set of components, organized to provide information about the virus family, subgroup, host species, and the unique insertion event. The ID components are as follows:
- Classifier: This component usually denotes the virus family or group from which the EVE derives. For example:
  - ERV for endogenous retroviruses.
  - eHBV for endogenous hepadnaviruses.
  - EBLL for endogenous bornavirus-like L-proteins.
  Classifiers generally follow established conventions as closely as possible. In the case of segmented viruses, the classifier may also include a gene or segment identifier, such as EBLN (endogenous borna-like nucleoprotein).
- Subgroup and Numeric ID: This is a composite of two distinct subcomponents:
  - The subgroup denotes the specific virus subgroup the EVE derives from, such as Cultervirus.
  - The numeric ID is an integer that uniquely identifies the insertion locus associated with the initial germline infection. Orthologous copies in different species retain the same numeric ID to reflect their shared origin.
  An example might be Cultervirus.10, where 10 represents the unique insertion event.
- Duplicate ID (if applicable): If an EVE sequence has been duplicated within the germline (e.g., via segmental duplication or transposition), an additional 'duplicate ID' is appended to the numeric ID, separated by a period. This indicates multiple copies derived from the same insertion event.
- Host Species Identifier: This component specifies the host species in which the EVE occurs. The full scientific name of the host species is used for clarity:
  - For example: Myotis_daubentonii for the bat species.
  - In contexts where species abbreviations are clear and unambiguous, a shortened form like MyoDau may be used.
An example of a complete EVE ID is:
EBLL-Cultervirus.10-Myotis_daubentonii
This ID indicates an endogenous bornavirus-like L-protein element derived from the Cultervirus subgroup, found in the bat species Myotis daubentonii, with a unique numeric identifier of 10.
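The ID structure described above can be sketched as a small parser. This is only an illustration, not part of DIGS-for-EVEs: the names `EVE_ID_PATTERN`, `EveId`, and `parse_eve_id` are hypothetical, and the pattern assumes that components are joined by hyphens, that the classifier and subgroup contain no hyphens, and that the optional duplicate ID is an integer.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout (hypothetical, inferred from the examples in this page):
#   <classifier>-<subgroup>.<numeric_id>[.<duplicate_id>]-<host>
EVE_ID_PATTERN = re.compile(
    r"^(?P<classifier>[^-]+)-"
    r"(?P<subgroup>[^.-]+)\.(?P<numeric_id>\d+)(?:\.(?P<duplicate_id>\d+))?-"
    r"(?P<host>.+)$"
)


class EveId(NamedTuple):
    classifier: str
    subgroup: str
    numeric_id: int
    duplicate_id: Optional[int]  # None when the element has no duplicate ID
    host: str


def parse_eve_id(eve_id: str) -> EveId:
    """Split an EVE ID into its components, or raise ValueError."""
    match = EVE_ID_PATTERN.match(eve_id)
    if match is None:
        raise ValueError(f"Not a recognised EVE ID: {eve_id!r}")
    dup = match.group("duplicate_id")
    return EveId(
        classifier=match.group("classifier"),
        subgroup=match.group("subgroup"),
        numeric_id=int(match.group("numeric_id")),
        duplicate_id=int(dup) if dup is not None else None,
        host=match.group("host"),
    )
```

Calling `parse_eve_id("EBLL-Cultervirus.10-Myotis_daubentonii")` would yield the classifier `EBLL`, the subgroup `Cultervirus`, the numeric ID `10`, no duplicate ID, and the host `Myotis_daubentonii`.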
When referencing EVEs, always start by using the full identifier (ID) to provide clear and unambiguous information. For example:
EBLL-Cultervirus.10-Myotis_daubentonii
This full ID provides all the necessary details for identifying the EVE, including its virus origin, insertion event, and host species.
In contexts where a focus on specific host species is implied, a slightly abbreviated form can be used:
EBLL-Cultervirus.10-MyoDau
Note: For official records, always use the unabbreviated host species name to ensure that the taxonomy is clear and precise.
2. Abbreviate IDs to Facilitate Clear Discussion
While full-form EVE IDs are essential for the initial reference, they can be cumbersome in extended discussions. It is advisable to abbreviate or shorten the IDs once the context has been established.
Examples of abbreviations:
- If the genus Cultervirus is referred to as CV, the ID can be shortened to:
  EBLL-CV.10-MyoDau
- Further compression can occur if a two-letter abbreviation like Md is sufficient to identify the host species:
  EBLL-Cultervirus.10-Md
- In discussions specifically about EBLL elements, the classifier can be omitted if it becomes redundant:
  CV.10-MyoDau or CV.10-Md
These shortened forms are permissible as long as they unambiguously refer to the specific EVE element in the given context.
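The abbreviation steps above can be expressed as a short helper. This is a minimal sketch under stated assumptions: the function `abbreviate_eve_id` and its parameters are hypothetical, and the abbreviation tables are supplied by the caller, since this page does not define a fixed list of abbreviations.

```python
from typing import Mapping, Optional


def abbreviate_eve_id(
    eve_id: str,
    subgroup_abbrev: Optional[Mapping[str, str]] = None,
    host_abbrev: Optional[Mapping[str, str]] = None,
    drop_classifier: bool = False,
) -> str:
    """Shorten a full EVE ID using caller-supplied abbreviation tables."""
    # Assumes the hyphen-separated layout: classifier-subgroup.number-host
    classifier, locus, host = eve_id.split("-", 2)
    # Everything after the first period (numeric ID, optional duplicate ID)
    # is carried over unchanged.
    subgroup, _, numbering = locus.partition(".")
    if subgroup_abbrev:
        subgroup = subgroup_abbrev.get(subgroup, subgroup)
    if host_abbrev:
        host = host_abbrev.get(host, host)
    short = f"{subgroup}.{numbering}-{host}"
    return short if drop_classifier else f"{classifier}-{short}"


full_id = "EBLL-Cultervirus.10-Myotis_daubentonii"
print(abbreviate_eve_id(full_id, {"Cultervirus": "CV"},
                        {"Myotis_daubentonii": "MyoDau"}))
print(abbreviate_eve_id(full_id, {"Cultervirus": "CV"},
                        {"Myotis_daubentonii": "Md"}, drop_classifier=True))
```

Keeping the tables outside the function reflects the rule that a shortened form is only valid within a context where it stays unambiguous; the same abbreviation could mean something different in another discussion.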
It is crucial to distinguish between species-specific copies of an EVE (referred to as EVE alleles) and the broader EVE locus that may contain orthologous copies across multiple species.
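One way to make the locus/allele distinction concrete is to group full IDs by everything except the host species. The sketch below is an illustration only: `locus_key` and `group_by_locus` are hypothetical names, the locus key format (classifier, subgroup, and numeric ID, with any duplicate ID dropped) is an assumption rather than a documented convention, and `Example_species` is a placeholder host, not a real record.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def locus_key(eve_id: str) -> str:
    """Return the host-independent part of an EVE ID (assumed locus key)."""
    classifier, locus, _host = eve_id.split("-", 2)
    # Keep subgroup and numeric ID; ignore an optional duplicate ID.
    subgroup, numeric_id = locus.split(".")[:2]
    return f"{classifier}-{subgroup}.{numeric_id}"


def group_by_locus(eve_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Collect species-specific copies under their shared locus."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for eve_id in eve_ids:
        groups[locus_key(eve_id)].append(eve_id)
    return dict(groups)


copies = [
    "EBLL-Cultervirus.10-Myotis_daubentonii",
    "EBLL-Cultervirus.10-Example_species",  # placeholder orthologous copy
]
print(group_by_locus(copies))
```

Because orthologous copies retain the same numeric ID, both entries fall under the single key `EBLL-Cultervirus.10`.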
For example, a species-specific copy might be represented as:
DIGS-for-EVEs by Robert J Gifford Lab.
For questions, issues, or feedback, please open an issue on the GitHub repository.