Replaced pooled global Aes with population average Aes #155
standage commented on Sep 25, 2024
Comment on lines -141 to -158
    agg_tallies = defaultdict(Counter)
    for n, row in haplotypes.iterrows():
        for haplokey in ("Haplotype1", "Haplotype2"):
            mhallele = [row[haplokey]]
            if not pd.isna(mhallele):
                # The following line could arguably be moved into the conditional block below to
                # excluded admixed individuals from the aggregate haplotype tallies. But as of
                # today, I think including them in the aggregate totals is appropriate.
                # -- DSS, 2023-02-28.
                agg_tallies[row["Marker"]].update(mhallele)
                pop_tallies[row["Marker"]][row["Population"]].update(mhallele)
                if row["Population"] not in admixed:
                    pop_tallies[row["Marker"]][row["Superpopulation"]].update(mhallele)
    for marker, popcounts in sorted(pop_tallies.items()):
        total_count = sum(agg_tallies[marker].values())
        for mhallele, agg_count in sorted(agg_tallies[marker].items()):
            freq = agg_count / total_count
            yield marker, "1KGP", mhallele, freq, total_count
No need to calculate pooled frequencies any more.
Comment on lines +158 to +169
    superpops = ("AFR", "AMR", "EAS", "EUR", "SAS")
    for marker, marker_data in frequencies.groupby("Marker"):
        population_aes = list()
        for population, pop_data in marker_data.groupby("Population"):
            ae = 1.0 / sum([f**2 for f in pop_data.Frequency])
            entry = (marker, population, ae)
            aes.append(entry)
            if population not in superpops:
                population_aes.append(ae)
        avg_ae = sum(population_aes) / len(population_aes)
        entry = (marker, "1KGP", avg_ae)
        aes.append(entry)
Instead, we average population-level Ae values here.
The default Ae values currently in MicroHapDB are calculated from a pool of all observed haplotypes in the entire global 26-population data set. This PR changes the build process so that the default values are computed as the mean of the 26 population-specific Ae values, rather than from the entire pool.
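To illustrate how the two definitions can diverge, here is a minimal standalone sketch, not the MicroHapDB build code. It uses the same formula as the diff (Ae = 1 / sum of squared haplotype frequencies). The function names, the population labels `POP1` and `POP2`, and the haplotype counts are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
from collections import Counter


def effective_alleles(counts):
    """Ae = 1 / sum(p_i^2), computed from the haplotype counts of one group."""
    total = sum(counts.values())
    return 1.0 / sum((n / total) ** 2 for n in counts.values())


def pooled_ae(pop_counts):
    """Old behavior: pool every observed haplotype, then compute a single Ae."""
    pooled = Counter()
    for counts in pop_counts.values():
        pooled.update(counts)
    return effective_alleles(pooled)


def mean_population_ae(pop_counts):
    """New behavior: compute Ae per population, then take the unweighted mean."""
    values = [effective_alleles(counts) for counts in pop_counts.values()]
    return sum(values) / len(values)


# Hypothetical toy data: two populations with mirrored haplotype frequencies.
toy_counts = {
    "POP1": Counter({"A|C": 90, "G|T": 10}),
    "POP2": Counter({"A|C": 10, "G|T": 90}),
}

pooled = pooled_ae(toy_counts)             # pooled frequencies are 0.5 / 0.5
averaged = mean_population_ae(toy_counts)  # each population has frequencies 0.9 / 0.1

print(f"pooled Ae:   {pooled:.3f}")
print(f"averaged Ae: {averaged:.3f}")
```

In this toy case the pooled value is 2.0 while the per-population mean is about 1.22. Pooling mixes populations with different haplotype frequencies, so it reports more diversity than is present within either population alone.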