
Support distributions in GaussianCopulaSynthesizer that better capture extreme values #2258

Open
srinify opened this issue Oct 8, 2024 · 1 comment
Labels
feature: modeling Related to training the model itself feature request Request for a new feature

Comments

srinify commented Oct 8, 2024

Problem Description

If a column contains a few extreme values, we don't have a clear distribution in GaussianCopulaSynthesizer to recommend for it. One example is a 'horseshoe distribution' (image borrowed from this blog).

[image: example of a horseshoe distribution]

Originally suggested here: #2240

We currently support the norm, beta, truncnorm, uniform, gamma, and gaussian_kde distributions.


npatki commented Oct 8, 2024

Just a note that the beta distribution can take on a "horseshoe-like" shape when its parameters alpha and beta are both less than 1. For an example, see the Wikipedia article.
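As a rough illustration of that shape, here is a minimal standard-library sketch that evaluates the beta density at a few points. The helper name `beta_pdf` and the parameter values alpha = beta = 0.5 are assumptions picked for illustration; they are not SDV code and are not fitted to any real data.

```python
import math


def beta_pdf(x: float, a: float, b: float) -> float:
    """Density of Beta(a, b) at x, valid for 0 < x < 1."""
    # Normalizing constant 1 / B(a, b), written with gamma functions.
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


# Hypothetical parameters, both below 1, chosen only to show the shape.
a, b = 0.5, 0.5
for x in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(f"x={x:.2f}  pdf={beta_pdf(x, a, b):.3f}")
```

With both parameters below 1, the density is lowest in the middle of the interval and grows without bound toward 0 and 1, so most of the mass sits near the two ends of the column's range.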

SDV is designed to estimate parameters based on the shape of the real data itself. If the desire is to artificially synthesize extreme values (diverging from the real data), then conditional sampling is the recommended approach.
