diff --git a/doc/introduction.rst b/doc/introduction.rst
index bc0e2bb61..5e9f54686 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -9,41 +9,45 @@ Introduction
 API's of imbalanced-learn samplers
 ----------------------------------
 
-The available samplers follows the scikit-learn API using the base estimator
-and adding a sampling functionality through the ``sample`` method:
+The available samplers follow the
+`scikit-learn API `_
+using the base estimator
+and incorporating a sampling functionality via the ``sample`` method:
 
 :Estimator:
 
-    The base object, implements a ``fit`` method to learn from data, either::
+    The base object, implements a ``fit`` method to learn from data::
 
       estimator = obj.fit(data, targets)
 
 :Resampler:
 
-    To resample a data sets, each sampler implements::
+    To resample a data set, each sampler implements a ``fit_resample`` method::
 
       data_resampled, targets_resampled = obj.fit_resample(data, targets)
 
-Imbalanced-learn samplers accept the same inputs that in scikit-learn:
+Imbalanced-learn samplers accept the same inputs as scikit-learn estimators:
 
-* `data`:
-  * 2-D :class:`list`,
-  * 2-D :class:`numpy.ndarray`,
-  * :class:`pandas.DataFrame`,
-  * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
-* `targets`:
-  * 1-D :class:`numpy.ndarray`,
-  * :class:`pandas.Series`.
+* `data`, 2-dimensional array-like structures, such as:
+  * Python's list of lists :class:`list`,
+  * Numpy arrays :class:`numpy.ndarray`,
+  * Pandas dataframes :class:`pandas.DataFrame`,
+  * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+* `targets`, 1-dimensional array-like structures, such as:
+  * Numpy arrays :class:`numpy.ndarray`,
+  * Pandas series :class:`pandas.Series`.
 
 The output will be of the following type:
 
-* `data_resampled`:
-  * 2-D :class:`numpy.ndarray`,
-  * :class:`pandas.DataFrame`,
-  * :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
-* `targets_resampled`:
-  * 1-D :class:`numpy.ndarray`,
-  * :class:`pandas.Series`.
+* `data_resampled`, 2-dimensional array-like structures, such as:
+  * Numpy arrays :class:`numpy.ndarray`,
+  * Pandas dataframes :class:`pandas.DataFrame`,
+  * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
+
+* `targets_resampled`, 1-dimensional array-like structures, such as:
+  * Numpy arrays :class:`numpy.ndarray`,
+  * Pandas series :class:`pandas.Series`.
 
 .. topic:: Pandas in/out
 
@@ -62,18 +66,19 @@ The output will be of the following type:
 Problem statement regarding imbalanced data sets
 ------------------------------------------------
 
-The learning phase and the subsequent prediction of machine learning algorithms
-can be affected by the problem of imbalanced data set. The balancing issue
-corresponds to the difference of the number of samples in the different
-classes. We illustrate the effect of training a linear SVM classifier with
-different levels of class balancing.
+The learning and prediction phases of machine learning algorithms
+can be impacted by the issue of **imbalanced datasets**. This imbalance
+refers to the difference in the number of samples across different classes.
+We demonstrate the effect of training a `Logistic Regression classifier
+`_
+with varying levels of class balancing by adjusting their weights.
 
 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_001.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
    :scale: 60
    :align: center
 
-As expected, the decision function of the linear SVM varies greatly depending
-upon how imbalanced the data is. With a greater imbalanced ratio, the decision
-function favors the class with the larger number of samples, usually referred
-as the majority class.
+As expected, the decision function of the Logistic Regression classifier varies significantly
+depending on how imbalanced the data is. With a greater imbalance ratio, the decision function
+tends to favour the class with the larger number of samples, usually referred to as the
+**majority class**.
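
Note on the ``fit_resample`` contract documented in this patch: the call shape (2-dimensional ``data`` and 1-dimensional ``targets`` in, resampled versions of both out) can be illustrated with a small, dependency-free sketch. This is not imbalanced-learn's implementation. The class name ``NaiveRandomOverSampler`` and its ``seed`` parameter are hypothetical and exist only for illustration; the sketch assumes plain Python lists as input and simply duplicates minority-class rows at random until every class matches the majority count.

```python
import random
from collections import Counter


class NaiveRandomOverSampler:
    """Hypothetical stand-in for a sampler; NOT part of imbalanced-learn."""

    def __init__(self, seed=0):
        # `seed` is an illustrative parameter, not a library argument.
        self.seed = seed

    def fit_resample(self, data, targets):
        """Return (data_resampled, targets_resampled) with balanced classes."""
        rng = random.Random(self.seed)
        counts = Counter(targets)
        n_majority = max(counts.values())

        # Start from copies of the inputs so the originals are kept intact.
        data_resampled = list(data)
        targets_resampled = list(targets)

        for label, count in counts.items():
            # Row indices belonging to the current class.
            indices = [i for i, t in enumerate(targets) if t == label]
            # Draw with replacement until the class reaches the majority size.
            for _ in range(n_majority - count):
                i = rng.choice(indices)
                data_resampled.append(data[i])
                targets_resampled.append(targets[i])

        return data_resampled, targets_resampled


# Toy imbalanced data set: four samples of class 0, one sample of class 1.
data = [[0.0, 1.0], [0.1, 0.9], [0.2, 1.1], [0.3, 0.8], [5.0, 5.0]]
targets = [0, 0, 0, 0, 1]

sampler = NaiveRandomOverSampler(seed=42)
data_resampled, targets_resampled = sampler.fit_resample(data, targets)
class_counts = Counter(targets_resampled)
print(class_counts)
```

After resampling, both classes have four samples, and the original five rows are still the first five rows of ``data_resampled``. The library's own samplers use more careful strategies and also accept the array, dataframe, and sparse-matrix inputs listed in the patch.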