Rework introduction.rst (#1110)

scikit-learn-contrib · Dec 20, 2024 · bc94b25
1 parent 03078d5
commit bc94b25
Showing 1 changed file with 34 additions and 29 deletions.
diff --git a/doc/introduction.rst b/doc/introduction.rst
@@ -9,41 +9,45 @@ Introduction
 API's of imbalanced-learn samplers
 ----------------------------------
 
-The available samplers follows the scikit-learn API using the base estimator
-and adding a sampling functionality through the ``sample`` method:
+The available samplers follow the
+`scikit-learn API <https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics>`_
+using the base estimator
+and incorporating a sampling functionality via the ``fit_resample`` method:
 
 :Estimator:
 
-    The base object, implements a ``fit`` method to learn from data, either::
+    The base object implements a ``fit`` method to learn from data::
 
       estimator = obj.fit(data, targets)
 
 :Resampler:
 
-    To resample a data sets, each sampler implements::
+    To resample a data set, each sampler implements a ``fit_resample`` method::
 
       data_resampled, targets_resampled = obj.fit_resample(data, targets)
 
-Imbalanced-learn samplers accept the same inputs that in scikit-learn:
+Imbalanced-learn samplers accept the same inputs as scikit-learn estimators:
 
-* `data`:
-   * 2-D :class:`list`,
-   * 2-D :class:`numpy.ndarray`,
-   * :class:`pandas.DataFrame`,
-   * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
-* `targets`:
-   * 1-D :class:`numpy.ndarray`,
-   * :class:`pandas.Series`.
+* `data`, 2-dimensional array-like structures, such as:
+   * Python's list of lists :class:`list`,
+   * Numpy arrays :class:`numpy.ndarray`,
+   * Pandas dataframes :class:`pandas.DataFrame`,
+   * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+* `targets`, 1-dimensional array-like structures, such as:
+   * Numpy arrays :class:`numpy.ndarray`,
+   * Pandas series :class:`pandas.Series`.
 
 The output will be of the following type:
 
-* `data_resampled`:
-   * 2-D :class:`numpy.ndarray`,
-   * :class:`pandas.DataFrame`,
-   * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
-* `targets_resampled`:
-   * 1-D :class:`numpy.ndarray`,
-   * :class:`pandas.Series`.
+* `data_resampled`, 2-dimensional array-like structures, such as:
+   * Numpy arrays :class:`numpy.ndarray`,
+   * Pandas dataframes :class:`pandas.DataFrame`,
+   * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+* `targets_resampled`, 1-dimensional array-like structures, such as:
+   * Numpy arrays :class:`numpy.ndarray`,
+   * Pandas series :class:`pandas.Series`.
 
 .. topic:: Pandas in/out
 
@@ -62,18 +66,19 @@ The output will be of the following type:
 Problem statement regarding imbalanced data sets
 ------------------------------------------------
 
-The learning phase and the subsequent prediction of machine learning algorithms
-can be affected by the problem of imbalanced data set. The balancing issue
-corresponds to the difference of the number of samples in the different
-classes. We illustrate the effect of training a linear SVM classifier with
-different levels of class balancing.
+The learning and prediction phases of machine learning algorithms
+can be impacted by the issue of **imbalanced datasets**. This imbalance
+refers to the difference in the number of samples across different classes.
+We demonstrate the effect of training a `Logistic Regression classifier
+<https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`_
+with varying levels of class balancing by adjusting their weights.
 
 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_001.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
    :scale: 60
    :align: center
 
-As expected, the decision function of the linear SVM varies greatly depending
-upon how imbalanced the data is. With a greater imbalanced ratio, the decision
-function favors the class with the larger number of samples, usually referred
-as the majority class.
+As expected, the decision function of the Logistic Regression classifier varies significantly
+depending on how imbalanced the data is. With a greater imbalance ratio, the decision function
+tends to favour the class with the larger number of samples, usually referred to as the
+**majority class**.
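
The ``fit_resample`` contract described in this diff can be illustrated with a small, dependency-free sketch. The class ``ToyRandomOverSampler`` below is hypothetical and is not imbalanced-learn code: it works on plain Python lists and simply duplicates randomly chosen minority rows until every class matches the majority class size. It is only meant to show the call shape (``data`` and ``targets`` in, ``data_resampled`` and ``targets_resampled`` out) and how the class counts change.

```python
import random
from collections import Counter


class ToyRandomOverSampler:
    """Hypothetical list-based sampler; an illustration, not imbalanced-learn's implementation."""

    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit_resample(self, data, targets):
        rng = random.Random(self.random_state)
        counts = Counter(targets)
        n_majority = max(counts.values())

        # Start from a copy of the original samples, in their original order.
        data_resampled = list(data)
        targets_resampled = list(targets)

        for label, count in counts.items():
            # Positions of the rows that belong to this class.
            indices = [i for i, t in enumerate(targets) if t == label]
            # Duplicate randomly chosen rows until the class reaches the majority size.
            for _ in range(n_majority - count):
                i = rng.choice(indices)
                data_resampled.append(data[i])
                targets_resampled.append(label)

        return data_resampled, targets_resampled


# Toy data: four samples of class 0 (majority) and one of class 1 (minority).
data = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [0.3, 0.8], [2.0, 2.1]]
targets = [0, 0, 0, 0, 1]

sampler = ToyRandomOverSampler(random_state=0)
data_resampled, targets_resampled = sampler.fit_resample(data, targets)

print("before:", Counter(targets))
print("after: ", Counter(targets_resampled))
```

With this toy input, the class counts go from 4 versus 1 to 4 versus 4. The samplers shipped with imbalanced-learn follow the same call pattern but additionally accept NumPy arrays, pandas objects and SciPy sparse matrices, as listed in the diff above.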