Refactor _fit Method of GaussianCopulaSynthesizer for Modularity #2267
Labels
internal: The issue doesn't change the API or functionality
maintenance: Tasks related to infrastructure & dependencies
Description
To improve code reuse and maintainability in the SDV library, the _fit method of the GaussianCopulaSynthesizer class should be modularized by splitting it into multiple, well-defined functions. This will make the code easier to extend.
We need to break down the _fit method into smaller steps. These steps should be implemented as separate functions to handle specific tasks within the fitting process.
Expected Steps
Log Numerical Distributions: Keep the existing call to log_numerical_distributions_error as a standalone function. This step will remain unchanged.
Learn Number of Rows: Move the logic for determining the number of rows (self._num_rows = len(processed_data)) into a new method, e.g., self._learn_num_rows.
Extract Numerical Distributions for Modeling: Create a new method to extract numerical distributions for modeling. The logic inside the for loop that assigns distributions to each column should be refactored into a method, e.g., self._get_numerical_distributions.
Initialize the Model: Move the logic for initializing the model (self._model = GaussianMultivariate(...)) to its own method, e.g., self._initialize_model.
Fit the Model: Finally, create a new method to encapsulate the logic for fitting the model with scipy warnings handling, e.g., self._fit_model.
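The steps listed in this issue could be laid out roughly as in the sketch below. It is a minimal illustration under stated assumptions, not the SDV implementation. SketchSynthesizer and StubGaussianMultivariate are hypothetical stand-ins so the example runs without sdv or copulas installed. The local log_numerical_distributions_error only mimics the role of the existing helper, and its signature here is assumed. The data is a plain list of row dicts rather than a DataFrame. Whether the helper methods return values or set attributes directly is a design choice the issue leaves open; this sketch has them return values.

```python
import logging
import warnings

LOGGER = logging.getLogger(__name__)


class StubGaussianMultivariate:
    """Hypothetical stand-in for GaussianMultivariate; it only records its inputs."""

    def __init__(self, distribution):
        self.distribution = distribution
        self.fitted_rows = None

    def fit(self, data):
        self.fitted_rows = len(data)


def log_numerical_distributions_error(numerical_distributions, columns, logger):
    """Stand-in for the existing helper (signature assumed for this sketch)."""
    for column in numerical_distributions:
        if column not in columns:
            logger.info("Distribution set for unknown column '%s'.", column)


class SketchSynthesizer:
    """Toy class showing the proposed split of _fit into smaller methods."""

    def __init__(self, numerical_distributions=None, default_distribution='beta'):
        # 'beta' is only a placeholder default for the sketch.
        self._numerical_distributions = numerical_distributions or {}
        self._default_distribution = default_distribution
        self._num_rows = None
        self._model = None

    @staticmethod
    def _get_columns(processed_data):
        # Hypothetical helper: column names taken from the first row.
        return list(processed_data[0]) if processed_data else []

    def _learn_num_rows(self, processed_data):
        return len(processed_data)

    def _get_numerical_distributions(self, processed_data):
        # Assumed logic: use the configured distribution, else the default.
        return {
            column: self._numerical_distributions.get(column, self._default_distribution)
            for column in self._get_columns(processed_data)
        }

    def _initialize_model(self, numerical_distributions):
        return StubGaussianMultivariate(distribution=numerical_distributions)

    def _fit_model(self, processed_data):
        # Suppress warnings raised from scipy modules while fitting.
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', module='scipy')
            self._model.fit(processed_data)

    def _fit(self, processed_data):
        log_numerical_distributions_error(
            self._numerical_distributions, self._get_columns(processed_data), LOGGER
        )
        self._num_rows = self._learn_num_rows(processed_data)
        numerical_distributions = self._get_numerical_distributions(processed_data)
        self._model = self._initialize_model(numerical_distributions)
        self._fit_model(processed_data)


# Usage: each step can now be overridden or tested on its own.
synthesizer = SketchSynthesizer(numerical_distributions={'age': 'norm'})
synthesizer._fit([{'age': 30, 'income': 1000.0}, {'age': 41, 'income': 2500.0}])
```

With this layout, a subclass that needs a different model or a different rule for choosing distributions would only override _initialize_model or _get_numerical_distributions, leaving the rest of _fit untouched.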