Refactor _fit Method of GaussianCopulaSynthesizer for Modularity #2267

Open
pvk-developer opened this issue Oct 22, 2024 · 0 comments
Labels
internal (The issue doesn't change the API or functionality), maintenance (Tasks related to infrastructure & dependencies)

Description

To improve code reuse and maintainability in the SDV library, the _fit method of the GaussianCopulaSynthesizer class should be split into several smaller, well-defined methods, each handling one specific task within the fitting process. This will make the code easier to extend.

Expected Steps

  • Log Numerical Distributions: Keep the existing call to the standalone function log_numerical_distributions_error. This step remains unchanged.

  • Learn Number of Rows: Move the logic for determining the number of rows (self._num_rows = len(processed_data)) into a new method, e.g., self._learn_num_rows.

  • Extract Numerical Distributions for Modeling: Refactor the logic inside the for loop that assigns a distribution to each column into a new method, e.g., self._get_numerical_distributions.

  • Initialize the Model: Move the logic for initializing the model (self._model = GaussianMultivariate(...)) to its own method, e.g., self._initialize_model.

  • Fit the Model: Finally, create a new method that encapsulates fitting the model, including the handling of scipy warnings, e.g., self._fit_model.
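
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual SDV code: StubTable, StubGaussianMultivariate, SketchSynthesizer, and the stubbed log_numerical_distributions_error are hypothetical stand-ins so the example runs without SDV installed, and the default-distribution fallback and the exact warning filter are guesses at the existing logic. Only the method names come from this issue.

```python
import warnings
from copy import deepcopy


class StubTable:
    """Hypothetical stand-in for the pandas DataFrame passed to _fit."""

    def __init__(self, data):
        self._data = data
        self.columns = list(data)

    def __len__(self):
        # Number of rows, taken from the first column (0 if there are none).
        return len(next(iter(self._data.values()), []))


class StubGaussianMultivariate:
    """Hypothetical stand-in for GaussianMultivariate; it only records calls."""

    def __init__(self, distribution):
        self.distribution = distribution
        self.fitted = False

    def fit(self, data):
        self.fitted = True


def log_numerical_distributions_error(numerical_distributions, columns, logger=None):
    """Stub for the existing helper; this step is meant to stay unchanged."""


class SketchSynthesizer:
    """Illustrative class showing only the proposed split of _fit."""

    def __init__(self, numerical_distributions=None, default_distribution='beta'):
        self._numerical_distributions = numerical_distributions or {}
        self._default_distribution = default_distribution
        self._num_rows = None
        self._model = None

    def _learn_num_rows(self, processed_data):
        return len(processed_data)

    def _get_numerical_distributions(self, processed_data):
        # Assumption: columns without an explicit distribution get the default.
        distributions = deepcopy(self._numerical_distributions)
        for column in processed_data.columns:
            distributions.setdefault(column, self._default_distribution)
        return distributions

    def _initialize_model(self, numerical_distributions):
        return StubGaussianMultivariate(distribution=numerical_distributions)

    def _fit_model(self, processed_data):
        # Assumption: warnings raised from scipy modules are silenced during fit.
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', module='scipy')
            self._model.fit(processed_data)

    def _fit(self, processed_data):
        log_numerical_distributions_error(
            self._numerical_distributions, processed_data.columns
        )
        self._num_rows = self._learn_num_rows(processed_data)
        numerical_distributions = self._get_numerical_distributions(processed_data)
        self._model = self._initialize_model(numerical_distributions)
        self._fit_model(processed_data)


table = StubTable({'age': [31, 45, 27], 'income': [10.0, 20.0, 30.0]})
synthesizer = SketchSynthesizer(numerical_distributions={'age': 'norm'})
synthesizer._fit(table)
```

In this sketch the helper methods return values and _fit does the assignments, which is one possible design choice: it would let a subclass override a single step (for example _initialize_model) without touching the others. Having each helper set the attribute directly, as the bullet for self._learn_num_rows suggests, would work equally well.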
