Likelihood of the generated data regarding the real data #401

Open
celsofranssa opened this issue Sep 29, 2024 · 2 comments
Labels
question General question about the software under discussion Issue is currently being discussed

Comments

@celsofranssa

What metrics can we use to measure the likelihood of the generated data with respect to the real data?

@celsofranssa celsofranssa added new Label applied to new issues question General question about the software labels Sep 29, 2024
@19956406179

I would also like to know: if I just run pip install ctgan and then write some code that uses CTGAN to generate synthetic data for my dataset, how can I evaluate whether the generated data is good or bad?

@npatki
Contributor

npatki commented Oct 11, 2024

Hi @celsofranssa and @19956406179, great to see interest in CTGAN.

I would recommend looking into the SDMetrics library for anything metrics-related. There is a Quality Report you can run for a general estimate of data quality, visualization utilities, and other speciality metrics such as Data likelihood.
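To give a feel for what a column-level quality score measures, here is a minimal standard-library sketch of a KS-complement style score (1 minus the two-sample Kolmogorov-Smirnov statistic) for a single numerical column. SDMetrics documents a KSComplement metric along these lines, but this is not its implementation: the function name `ks_complement` and the toy values are hypothetical, and for real evaluations the library's Quality Report is the better tool.

```python
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - KS statistic for two numeric samples.

    1.0 means the empirical CDFs coincide; 0.0 means they never overlap.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    if not real_sorted or not synth_sorted:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The largest gap between two empirical CDFs occurs at an observed value.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


# Hypothetical toy column values, for illustration only.
real_ages = [23, 35, 41, 52, 60]
close_synthetic = [24, 34, 43, 50, 61]
far_synthetic = [80, 82, 85, 90, 95]

print("close:", ks_complement(real_ages, close_synthetic))
print("far:  ", ks_complement(real_ages, far_synthetic))
```

A synthetic column whose values track the real ones scores near 1, while one drawn from a completely different range scores 0. This only checks one column's marginal shape; it says nothing about relationships between columns or about likelihood under a fitted model.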

BTW -- while you are welcome to try using CTGAN as a standalone library, we actually recommend using it via the SDV library instead. Within SDV, the CTGANSynthesizer is a wrapper around this one, but it allows additional features such as data pre-processing, as well as more convenient visualizations, metrics, etc.

@npatki npatki added under discussion Issue is currently being discussed and removed new Label applied to new issues labels Oct 11, 2024

3 participants