Rework introduction.rst (#1110)
virchan authored Dec 20, 2024
1 parent 03078d5 commit bc94b25
Showing 1 changed file with 34 additions and 29 deletions.
doc/introduction.rst (63 changes: 34 additions and 29 deletions)
@@ -9,41 +9,45 @@ Introduction
API's of imbalanced-learn samplers
----------------------------------

-The available samplers follows the scikit-learn API using the base estimator
-and adding a sampling functionality through the ``sample`` method:
+The available samplers follow the
+`scikit-learn API <https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics>`_,
+building on the base estimator
+and adding sampling functionality through the ``fit_resample`` method:

:Estimator:

-The base object, implements a ``fit`` method to learn from data, either::
+The base object; it implements a ``fit`` method to learn from data::

estimator = obj.fit(data, targets)

:Resampler:

-To resample a data sets, each sampler implements::
+To resample a data set, each sampler implements a ``fit_resample`` method::

data_resampled, targets_resampled = obj.fit_resample(data, targets)

-Imbalanced-learn samplers accept the same inputs that in scikit-learn:
+Imbalanced-learn samplers accept the same inputs as scikit-learn estimators:

-* `data`:
-  * 2-D :class:`list`,
-  * 2-D :class:`numpy.ndarray`,
-  * :class:`pandas.DataFrame`,
-  * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
-* `targets`:
-  * 1-D :class:`numpy.ndarray`,
-  * :class:`pandas.Series`.
+* `data`, 2-dimensional array-like structures, such as:
+  * Python lists of lists :class:`list`,
+  * NumPy arrays :class:`numpy.ndarray`,
+  * pandas DataFrames :class:`pandas.DataFrame`,
+  * SciPy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+* `targets`, 1-dimensional array-like structures, such as:
+  * NumPy arrays :class:`numpy.ndarray`,
+  * pandas Series :class:`pandas.Series`.

The output will be of the following type:

-* `data_resampled`:
-  * 2-D :class:`numpy.ndarray`,
-  * :class:`pandas.DataFrame`,
-  * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
-* `targets_resampled`:
-  * 1-D :class:`numpy.ndarray`,
-  * :class:`pandas.Series`.
+* `data_resampled`, 2-dimensional array-like structures, such as:
+  * NumPy arrays :class:`numpy.ndarray`,
+  * pandas DataFrames :class:`pandas.DataFrame`,
+  * SciPy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+* `targets_resampled`, 1-dimensional array-like structures, such as:
+  * NumPy arrays :class:`numpy.ndarray`,
+  * pandas Series :class:`pandas.Series`.

.. topic:: Pandas in/out

@@ -62,18 +66,19 @@ The output will be of the following type:
Problem statement regarding imbalanced data sets
------------------------------------------------

-The learning phase and the subsequent prediction of machine learning algorithms
-can be affected by the problem of imbalanced data set. The balancing issue
-corresponds to the difference of the number of samples in the different
-classes. We illustrate the effect of training a linear SVM classifier with
-different levels of class balancing.
+The learning and prediction phases of machine learning algorithms
+can be impacted by **imbalanced datasets**. This imbalance
+refers to the difference in the number of samples across different classes.
+We demonstrate the effect of training a `Logistic Regression classifier
+<https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html>`_
+with varying levels of class imbalance, obtained by changing the proportion of samples in each class.

.. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_001.png
:target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
:scale: 60
:align: center

-As expected, the decision function of the linear SVM varies greatly depending
-upon how imbalanced the data is. With a greater imbalanced ratio, the decision
-function favors the class with the larger number of samples, usually referred
-as the majority class.
+As expected, the decision function of the Logistic Regression classifier varies significantly
+depending on how imbalanced the data is. With a greater imbalance ratio, the decision function
+tends to favour the class with the larger number of samples, usually referred to as the
+**majority class**.
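The shift of the decision function toward the minority class can be reproduced without plotting. This is a minimal sketch and not the gallery example behind the figure. It assumes a hypothetical 1-D toy setup: class 0 (majority) drawn from a unit-variance Gaussian centred at -1, and class 1 (minority) drawn from one centred at +1. The majority:minority ratio is varied, and the script reports where scikit-learn's `LogisticRegression` places its boundary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)


def decision_threshold(n_majority, n_minority):
    """Fit a 1-D logistic regression; return the x where P(y=1 | x) = 0.5."""
    x_major = rng.normal(loc=-1.0, scale=1.0, size=n_majority)  # class 0
    x_minor = rng.normal(loc=1.0, scale=1.0, size=n_minority)  # class 1
    X = np.concatenate([x_major, x_minor]).reshape(-1, 1)
    y = np.concatenate([np.zeros(n_majority), np.ones(n_minority)])
    clf = LogisticRegression().fit(X, y)
    # The boundary is where coef * x + intercept = 0.
    return -clf.intercept_[0] / clf.coef_[0, 0]


thresholds = {}
for ratio in (1, 5, 20):
    thresholds[ratio] = decision_threshold(n_majority=200 * ratio, n_minority=200)
    print(f"majority:minority = {ratio}:1 -> boundary at x = {thresholds[ratio]:.2f}")
```

In this idealized setup the log-odds are 2x + ln(n_minority / n_majority). The boundary should therefore land roughly at x = ln(ratio) / 2, which is about 0, 0.80, and 1.50 for the three ratios. As the imbalance grows, the boundary moves toward the minority class, and more of the input space is assigned to the majority class.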
