Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding some info about Nextstrain clades to homepage #13

Merged
merged 3 commits into from
Dec 11, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 16 additions & 3 deletions pages/index.qmd
Original file line number Diff line number Diff line change
@@ -1,18 +1,31 @@
---
title: "Introduction"
include-after-body:
file: includes/after-body.html
---



The United States SARS-CoV-2 Variant Nowcast Hub has been designed by researchers from the [US CDC Center of Forecasting and Outbreak Analytics (CFA)](https://www.cdc.gov/forecast-outbreak-analytics/index.html) and [the Reich Lab at UMass Amherst](https://reichlab.io/), in consultation with folks from [the NextStrain project](https://nextstrain.org/).
The United States SARS-CoV-2 Variant Nowcast Hub has been designed by researchers from the [US CDC Center of Forecasting and Outbreak Analytics (CFA)](https://www.cdc.gov/forecast-outbreak-analytics/index.html) and [the Reich Lab at UMass Amherst](https://reichlab.io/), in consultation with [the NextStrain project](https://nextstrain.org/) team.

Teams may submit predictions of the prevalence of the predominant SARS-CoV-2 clades in the US, at a daily timescale and the geographic resolution of all 50 United States plus Washington, DC and Puerto Rico.

### Predicted clades
## Nextstrain clades and pango lineages

[Nextstrain clade names](https://docs.nextstrain.org/projects/ncov/en/latest/reference/naming_clades.html), such as "19A" or "24H", "label genetically well defined clades that have reached significant frequency and geographic spread". The Variant Nowcast Hub uses Nextstrain clades, although model outputs, as shown in the model reports, are translated so that clades are matched with pango lineages. One reason the hub chose to use the Nextstrain clades as the grouping for the modeling is that the broader "clade" grouping allows for a smaller number of categories to be modeled in any given week compared with pango lineages, which do not have a similar high-level grouping.

Nextstrain provides [an interactive phylogenetic tree for SARS-CoV-2](https://nextstrain.org/nextclade/sars-cov-2), with Nextstrain clade labels annotating branches of the tree, and each sequence receiving both a pango lineage and Nextstrain clade assignment.

The following list provides a partial history of Nextstrain clade announcements, which include detailed discussion about the origin and genetic differences with other lineages and clades:

- [24H (LF.7) and 24I (MV.1)](https://github.com/nextstrain/ncov/pull/1158) (18 Oct 2024)
- [24F (XEC) and 24G (KP.2.3)](https://github.com/nextstrain/ncov/pull/1152) (25 Sep 2024)

nickreich marked this conversation as resolved.
Show resolved Hide resolved
## How does the hub determine which clades to predict?

Each week the hub designates up to nine NextStrain clades with the highest reported prevalence of at least 1% across the US in any of the three complete [USA/CDC epidemiological weeks](https://ndc.services.cdc.gov/wp-content/uploads/MMWR_Week_overview.pdf) (a.k.a. MMWR weeks) preceding the Wednesday submission date. Any clades with prevalence of less than 1% are grouped into an “other” category for which predictions of combined prevalence are also collected. No more than 10 clades (including “other”) are selected in a given week.

### Prediction horizon
## What dates are predictions available for?

Genomic sequences tend to be reported weeks after being collected. Therefore, recent data is subject to quite a lot of backfill. For this reason, the hub collects "nowcasts" (predictions for data relevant to times prior to the current time, but not yet observed) and some "forecasts" (predictions for future observations). Counting the Wednesday submission date as a prediction horizon of zero, we collect daily-level predictions for 10 days into the future (the Saturday that ends the epidemic week after the Wednesday submission) and -31 days into the past (the Sunday that starts the epidemic week four weeks prior to the Wednesday submission date). Overall, six weeks (42 days) of predicted values are solicited each week.