diff --git a/doc/introduction.rst b/doc/introduction.rst
index 5e9f54686..3398cd510 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -31,7 +31,7 @@ Imbalanced-learn samplers accept the same inputs as scikit-learn estimators:
 * `data`, 2-dimensional array-like structures, such as:
    * Python's list of lists :class:`list`,
    * Numpy arrays :class:`numpy.ndarray`,
-   * Panda dataframes :class:`pandas.DataFrame`,
+   * Pandas dataframes :class:`pandas.DataFrame`,
    * Scipy sparse matrices :class:`scipy.sparse.csr_matrix` or :class:`scipy.sparse.csc_matrix`;
 
 * `targets`, 1-dimensional array-like structures, such as:
diff --git a/doc/over_sampling.rst b/doc/over_sampling.rst
index 3bc975b89..683b289a0 100644
--- a/doc/over_sampling.rst
+++ b/doc/over_sampling.rst
@@ -6,21 +6,26 @@ Over-sampling
 
 .. currentmodule:: imblearn.over_sampling
 
-A practical guide
-=================
+As :ref:`discussed earlier <problem_statement>`, the decision function of a
+classifier trained on imbalanced data can favour the majority class and, in the
+extreme case, degenerate into always predicting it (see, for example, a
+`Dummy classifier
+<https://scikit-learn.org/stable/modules/generated/sklearn.dummy.DummyClassifier.html>`_).
+
+One approach to address this issue is to generate new samples for the under-represented
+classes, a technique known as **over-sampling**.
 
-You can refer to
-:ref:`sphx_glr_auto_examples_over-sampling_plot_comparison_over_sampling.py`.
+Please refer to :ref:`sphx_glr_auto_examples_over-sampling_plot_comparison_over_sampling.py`
+for the full example that produces the figures shown in this document.
 
 .. _random_over_sampler:
 
-Naive random over-sampling
---------------------------
+Naive Random Over-Sampling
+==========================
 
-One way to fight this issue is to generate new samples in the classes which are
-under-represented. The most naive strategy is to generate new samples by
-randomly sampling with replacement the current available samples. The
-:class:`RandomOverSampler` offers such scheme::
+The most naive strategy is to generate new samples by
+**randomly sampling with replacement** from the existing samples. The
+:class:`RandomOverSampler` implements this approach::
 
    >>> from sklearn.datasets import make_classification
    >>> X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
@@ -35,28 +40,27 @@ randomly sampling with replacement the current available samples. The
    >>> print(sorted(Counter(y_resampled).items()))
    [(0, 4674), (1, 4674), (2, 4674)]
 
-The augmented data set should be used instead of the original data set to train
-a classifier::
+The **augmented data set** `(X_resampled, y_resampled)` should be used
+instead of the original data set to train a classifier::
 
   >>> from sklearn.linear_model import LogisticRegression
   >>> clf = LogisticRegression()
   >>> clf.fit(X_resampled, y_resampled)
   LogisticRegression(...)
 
-In the figure below, we compare the decision functions of a classifier trained
-using the over-sampled data set and the original data set.
+In the figure below, we compare the decision function of a classifier trained
+on the augmented dataset with that of a classifier trained on the original dataset.
 
 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_002.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
    :scale: 60
    :align: center
 
-As a result, the majority class does not take over the other classes during the
-training process. Consequently, all classes are represented by the decision
-function.
+We observe that the majority class does not dominate the other classes during training.
+Consequently, the decision function represents all classes.
 
-In addition, :class:`RandomOverSampler` allows to sample heterogeneous data
-(e.g. containing some strings)::
+In addition, :class:`RandomOverSampler` supports **heterogeneous data**
+(e.g., strings, datetimes, or categorical features)::
 
   >>> import numpy as np
   >>> X_hetero = np.array([['xxx', 1, 1.0], ['yyy', 2, 2.0], ['zzz', 3, 3.0]],
@@ -71,7 +75,7 @@ In addition, :class:`RandomOverSampler` allows to sample heterogeneous data
   >>> print(y_resampled)
   [0 0 1 1]
 
-It would also work with pandas dataframe::
+It also supports Pandas DataFrames::
 
   >>> from sklearn.datasets import fetch_openml
   >>> df_adult, y_adult = fetch_openml(
@@ -80,13 +84,14 @@ It would also work with pandas dataframe::
   >>> df_resampled, y_resampled = ros.fit_resample(df_adult, y_adult)
   >>> df_resampled.head()  # doctest: +SKIP
 
-If repeating samples is an issue, the parameter `shrinkage` allows to create a
-smoothed bootstrap. However, the original data needs to be numerical. The
-`shrinkage` parameter controls the dispersion of the new generated samples. We
-show an example illustrate that the new samples are not overlapping anymore
-once using a smoothed bootstrap. This ways of generating smoothed bootstrap is
-also known a Random Over-Sampling Examples
-(ROSE) :cite:`torelli2014rose`.
+If exactly repeating samples is an issue, the `shrinkage` parameter can be used to
+perform a **smoothed bootstrap** (i.e., adding noise to the resampled observations).
+However, the original data must be numerical.
+
+The `shrinkage` parameter controls the dispersion of the newly generated samples.
+The figure below shows that, with a smoothed bootstrap, the new samples no longer
+exactly overlap the original ones. This way of generating a smoothed bootstrap is
+also known as **Random Over-Sampling Examples (ROSE)** :cite:`torelli2014rose`.
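+
+For instance, a smoothed bootstrap can be requested as follows (a minimal sketch
+reusing the numerical `X`, `y` generated above; the `shrinkage` value is only
+illustrative)::
+
+  >>> from imblearn.over_sampling import RandomOverSampler
+  >>> ros_smoothed = RandomOverSampler(shrinkage=0.2, random_state=0)
+  >>> X_smoothed, y_smoothed = ros_smoothed.fit_resample(X, y)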
 
 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_003.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
@@ -95,14 +100,19 @@ also known a Random Over-Sampling Examples
 
 .. _smote_adasyn:
 
-From random over-sampling to SMOTE and ADASYN
----------------------------------------------
+From Random Over-Sampling to SMOTE and ADASYN
+=============================================
+
+Apart from the random sampling with replacement, two popular methods
+for oversampling minority classes are:
+
+1. **Synthetic Minority Oversampling Technique (SMOTE)** :class:`SMOTE`
+   :cite:`chawla2002smote`; and
 
-Apart from the random sampling with replacement, there are two popular methods
-to over-sample minority classes: (i) the Synthetic Minority Oversampling
-Technique (SMOTE) :cite:`chawla2002smote` and (ii) the Adaptive Synthetic
-(ADASYN) :cite:`he2008adasyn` sampling method. These algorithms can be used in
-the same manner::
+2. **Adaptive Synthetic (ADASYN)** :class:`ADASYN`
+   :cite:`he2008adasyn`.
+
+These algorithms can be applied in the same way::
 
   >>> from imblearn.over_sampling import SMOTE, ADASYN
   >>> X_resampled, y_resampled = SMOTE().fit_resample(X, y)
@@ -114,70 +124,73 @@ the same manner::
   [(0, 4673), (1, 4662), (2, 4674)]
   >>> clf_adasyn = LogisticRegression().fit(X_resampled, y_resampled)
 
-The figure below illustrates the major difference of the different
-over-sampling methods.
+The figure below illustrates the key differences between the various oversampling methods.
 
 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_004.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
    :scale: 60
    :align: center
 
-Ill-posed examples
-------------------
+Ill-Posed Examples
+==================
+
+While :class:`RandomOverSampler` over-samples by duplicating samples
+from the minority class, :class:`SMOTE` and :class:`ADASYN` generate
+new samples through interpolation. However, the samples used as a basis for
+the interpolation differ between the two methods.
 
-While the :class:`RandomOverSampler` is over-sampling by duplicating some of
-the original samples of the minority class, :class:`SMOTE` and :class:`ADASYN`
-generate new samples in by interpolation. However, the samples used to
-interpolate/generate new synthetic samples differ. In fact, :class:`ADASYN`
-focuses on generating samples next to the original samples which are wrongly
-classified using a k-Nearest Neighbors classifier while the basic
-implementation of :class:`SMOTE` will not make any distinction between easy and
-hard samples to be classified using the nearest neighbors rule. Therefore, the
-decision function found during training will be different among the algorithms.
+Specifically, :class:`ADASYN` focuses on generating samples next to
+the original samples that are misclassified by a k-nearest neighbours classifier
+(the neighbours being found with an efficient search structure such as a `KDTree
+<https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KDTree.html>`_).
+In contrast, the basic implementation of :class:`SMOTE` does not distinguish between
+samples that are easy or hard to classify under the nearest neighbours rule.
+Consequently, the decision functions learned during training will differ between these algorithms.
 
 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_005.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
    :align: center
 
-The sampling particularities of these two algorithms can lead to some peculiar
-behavior as shown below.
+The specific sampling characteristics of these two algorithms can result in
+distinctive behaviours, as demonstrated below.
 
 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_006.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
    :scale: 60
    :align: center
 
-SMOTE variants
---------------
+SMOTE Variants
+==============
+
+:class:`SMOTE` might connect inliers with outliers, while :class:`ADASYN`
+might focus solely on outliers. Both cases can lead to a
+sub-optimal decision function. To address this, three variants of
+:class:`SMOTE` are available for generating samples:
+
+1. :class:`BorderlineSMOTE` :cite:`han2005borderline`
+2. :class:`SVMSMOTE` :cite:`nguyen2009borderline`
+3. :class:`KMeansSMOTE` :cite:`last2017oversampling`
+
+These methods focus on samples near the decision boundary and generate samples
+in the opposite direction of the nearest neighbour class.
+These variants are illustrated in the figure below.
 
-SMOTE might connect inliers and outliers while ADASYN might focus solely on
-outliers which, in both cases, might lead to a sub-optimal decision
-function. In this regard, SMOTE offers three additional options to generate
-samples. Those methods focus on samples near the border of the optimal
-decision function and will generate samples in the opposite direction of the
-nearest neighbors class. Those variants are presented in the figure below.
+:class:`BorderlineSMOTE` itself comes in two flavours, selected through its `kind`
+parameter: `kind="borderline-1"` and `kind="borderline-2"`.
 
 .. image:: ./auto_examples/over-sampling/images/sphx_glr_plot_comparison_over_sampling_007.png
    :target: ./auto_examples/over-sampling/plot_comparison_over_sampling.html
    :scale: 60
    :align: center
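+
+For instance, :class:`BorderlineSMOTE` is applied in the same way as the other
+samplers, and the `kind` parameter selects the variant (reusing the `X`, `y`
+generated earlier)::
+
+  >>> from imblearn.over_sampling import BorderlineSMOTE
+  >>> X_resampled, y_resampled = BorderlineSMOTE().fit_resample(X, y)
+  >>> print(sorted(Counter(y_resampled).items()))
+  [(0, 4674), (1, 4674), (2, 4674)]
+  >>> X_res2, y_res2 = BorderlineSMOTE(kind="borderline-2").fit_resample(X, y)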
 
+However, none of these SMOTE variants (nor, in fact,
+any of the methods presented so far, except :class:`RandomOverSampler`) can handle
+categorical features. For mixed data types (continuous and categorical features),
+the **Synthetic Minority Over-sampling Technique for Nominal and Continuous**
+:class:`SMOTENC` :cite:`chawla2002smote`, an extension of the :class:`SMOTE` algorithm,
+treats the categorical features differently.
 
-The :class:`BorderlineSMOTE` :cite:`han2005borderline`,
-:class:`SVMSMOTE` :cite:`nguyen2009borderline`, and
-:class:`KMeansSMOTE` :cite:`last2017oversampling` offer some variant of the
-SMOTE algorithm::
-
-  >>> from imblearn.over_sampling import BorderlineSMOTE
-  >>> X_resampled, y_resampled = BorderlineSMOTE().fit_resample(X, y)
-  >>> print(sorted(Counter(y_resampled).items()))
-  [(0, 4674), (1, 4674), (2, 4674)]
-
-When dealing with mixed data type such as continuous and categorical features,
-none of the presented methods (apart of the class :class:`RandomOverSampler`)
-can deal with the categorical features. The :class:`SMOTENC`
-:cite:`chawla2002smote` is an extension of the :class:`SMOTE` algorithm for
-which categorical data are treated differently::
+We start by creating a dataset that includes both continuous and categorical features::
 
   >>> # create a synthetic data set with continuous and categorical features
   >>> rng = np.random.RandomState(42)
@@ -190,12 +203,17 @@ which categorical data are treated differently::
   >>> print(sorted(Counter(y).items()))
   [(0, 20), (1, 30)]
 
-In this data set, the first and last features are considered as categorical
-features. One needs to provide this information to :class:`SMOTENC` via the
-parameters ``categorical_features`` either by passing the indices, the feature
-names when `X` is a pandas DataFrame, a boolean mask marking these features,
-or relying on `dtype` inference if the columns are using the
-:class:`pandas.CategoricalDtype`::
+Here, the first and last features are categorical.
+This information must be provided to :class:`SMOTENC` via the `categorical_features` parameter
+in one of the following ways:
+
+- By relying on `dtype` inference if the columns use the :class:`pandas.CategoricalDtype`.
+- By passing the indices of the categorical features.
+- By specifying the feature names when `X` is a Pandas DataFrame.
+- By providing a Boolean mask identifying the categorical features.
+
+As a result, the samples generated in the first and last columns belong to the same categories
+as the original data, without any additional interpolation::
 
   >>> from imblearn.over_sampling import SMOTENC
   >>> smote_nc = SMOTENC(categorical_features=[0, 2], random_state=0)
@@ -209,22 +227,17 @@ or relying on `dtype` inference if the columns are using the
    ['B' 0.37... 2]
    ['B' 0.33... 2]]
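+
+The categorical columns could equivalently be declared with a boolean mask
+(a minimal sketch not present in the original example; the mask marks the
+first and last columns as categorical)::
+
+  >>> smote_nc = SMOTENC(categorical_features=[True, False, True], random_state=0)
+  >>> X_resampled, y_resampled = smote_nc.fit_resample(X, y)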
 
-Therefore, it can be seen that the samples generated in the first and last
-columns are belonging to the same categories originally presented without any
-other extra interpolation.
-
-However, :class:`SMOTENC` is only working when data is a mixed of numerical and
-categorical features. If data are made of only categorical data, one can use
-the :class:`SMOTEN` variant :cite:`chawla2002smote`. The algorithm changes in
-two ways:
+However, :class:`SMOTENC` only works when the data is a mixture of continuous and
+categorical features. If the data consists only of categorical features,
+the **Synthetic Minority Over-sampling Technique for Nominal** variant, :class:`SMOTEN`
+:cite:`chawla2002smote`, can be used instead. The algorithm changes in two ways:
 
-* the nearest neighbors search does not rely on the Euclidean distance. Indeed,
-  the value difference metric (VDM) also implemented in the class
-  :class:`~imblearn.metrics.ValueDifferenceMetric` is used.
-* a new sample is generated where each feature value corresponds to the most
-  common category seen in the neighbors samples belonging to the same class.
+- The nearest neighbours search uses the **value difference metric (VDM)**
+  :class:`~imblearn.metrics.pairwise.ValueDifferenceMetric` instead of the Euclidean distance.
+- A new sample is generated where each feature value corresponds to the most
+  common category among the neighbour samples belonging to the same class.
 
-Let's take the following example::
+Let's consider the following example to see how :class:`SMOTEN` handles categorical data::
 
    >>> import numpy as np
    >>> X = np.array(["green"] * 5 + ["red"] * 10 + ["blue"] * 7,
@@ -232,10 +245,9 @@ Let's take the following example::
    >>> y = np.array(["apple"] * 5 + ["not apple"] * 3 + ["apple"] * 7 +
    ...              ["not apple"] * 5 + ["apple"] * 2, dtype=object)
 
-We generate a dataset associating a color to being an apple or not an apple.
-We strongly associated "green" and "red" to being an apple. The minority class
-being "not apple", we expect new data generated belonging to the category
-"blue"::
+We generate a dataset that associates a colour with being an `apple` or `not apple`.
+The colours `green` and `red` are strongly associated with `apple`. Since the minority
+class is `not apple`, we expect the newly generated samples to belong to the colour `blue`::
 
    >>> from imblearn.over_sampling import SMOTEN
    >>> sampler = SMOTEN(random_state=0)
@@ -251,25 +263,31 @@ being "not apple", we expect new data generated belonging to the category
    array(['not apple', 'not apple', 'not apple', 'not apple', 'not apple',
           'not apple'], dtype=object)
 
-Mathematical formulation
-========================
-
-Sample generation
------------------
+Sample Generation
+=================
 
 Both :class:`SMOTE` and :class:`ADASYN` use the same algorithm to generate new
-samples. Considering a sample :math:`x_i`, a new sample :math:`x_{new}` will be
-generated considering its k neareast-neighbors (corresponding to
-``k_neighbors``). For instance, the 3 nearest-neighbors are included in the
-blue circle as illustrated in the figure below. Then, one of these
-nearest-neighbors :math:`x_{zi}` is selected and a sample is generated as
-follows:
+samples. Given a sample :math:`x_i`, a new sample :math:`x_{new}` is generated
+by considering its :math:`k` nearest neighbours (corresponding to the
+``k_neighbors`` parameter of :class:`SMOTE`, or ``n_neighbors`` of :class:`ADASYN`).
+For example, the three nearest neighbours of :math:`x_i` are included in the
+blue circle in the figure below. One of these nearest neighbours, :math:`x_{zi}`,
+is then selected, and a new sample is generated as follows:
+
 .. math::
 
-   x_{new} = x_i + \lambda \times (x_{zi} - x_i)
+   x_{new} = x_i + \lambda (x_{zi} - x_i)
 
-where :math:`\lambda` is a random number in the range :math:`[0, 1]`. This
+where :math:`\lambda` is a random number drawn from the interval :math:`[0, 1]`. This
 interpolation will create a sample on the line between :math:`x_{i}` and
 :math:`x_{zi}` as illustrated in the image below:
 
@@ -278,58 +296,67 @@ interpolation will create a sample on the line between :math:`x_{i}` and
    :scale: 60
    :align: center
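+
+The following minimal NumPy sketch (not imbalanced-learn's internal code)
+illustrates this interpolation::
+
+   >>> import numpy as np
+   >>> rng = np.random.RandomState(0)
+   >>> x_i, x_zi = np.array([1.0, 2.0]), np.array([3.0, 1.0])
+   >>> lam = rng.uniform(0, 1)
+   >>> x_new = x_i + lam * (x_zi - x_i)  # a point on the segment between x_i and x_zi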
 
-SMOTE-NC slightly change the way a new sample is generated by performing
-something specific for the categorical features. In fact, the categories of a
-new generated sample are decided by picking the most frequent category of the
-nearest neighbors present during the generation.
+The sample generation process in :class:`SMOTENC` is slightly different because
+the categorical features are treated specifically: the category of each categorical
+feature in a newly generated sample is set to the most frequent category among
+the nearest neighbours used during the generation.
 
 .. warning::
-   Be aware that SMOTE-NC is not designed to work with only categorical data.
-
-The other SMOTE variants and ADASYN differ from each other by selecting the
-samples :math:`x_i` ahead of generating the new samples.
-
-The **regular** SMOTE algorithm --- cf. to the :class:`SMOTE` object --- does not
-impose any rule and will randomly pick-up all possible :math:`x_i` available.
-
-The **borderline** SMOTE --- cf. to the :class:`BorderlineSMOTE` with the
-parameters ``kind='borderline-1'`` and ``kind='borderline-2'`` --- will
-classify each sample :math:`x_i` to be (i) noise (i.e. all nearest-neighbors
-are from a different class than the one of :math:`x_i`), (ii) in danger
-(i.e. at least half of the nearest neighbors are from the same class than
-:math:`x_i`, or (iii) safe (i.e. all nearest neighbors are from the same class
-than :math:`x_i`). **Borderline-1** and **Borderline-2** SMOTE will use the
-samples *in danger* to generate new samples. In **Borderline-1** SMOTE,
-:math:`x_{zi}` will belong to the same class than the one of the sample
-:math:`x_i`. On the contrary, **Borderline-2** SMOTE will consider
-:math:`x_{zi}` which can be from any class.
-
-**SVM** SMOTE --- cf. to :class:`SVMSMOTE` --- uses an SVM classifier to find
-support vectors and generate samples considering them. Note that the ``C``
-parameter of the SVM classifier allows to select more or less support vectors.
-
-For both borderline and SVM SMOTE, a neighborhood is defined using the
-parameter ``m_neighbors`` to decide if a sample is in danger, safe, or noise.
-
-**KMeans** SMOTE --- cf. to :class:`KMeansSMOTE` --- uses a KMeans clustering
-method before to apply SMOTE. The clustering will group samples together and
-generate new samples depending of the cluster density.
-
-ADASYN works similarly to the regular SMOTE. However, the number of
-samples generated for each :math:`x_i` is proportional to the number of samples
-which are not from the same class than :math:`x_i` in a given
-neighborhood. Therefore, more samples will be generated in the area that the
-nearest neighbor rule is not respected. The parameter ``m_neighbors`` is
-equivalent to ``k_neighbors`` in :class:`SMOTE`.
-
-Multi-class management
-----------------------
-
-All algorithms can be used with multiple classes as well as binary classes
-classification.  :class:`RandomOverSampler` does not require any inter-class
-information during the sample generation. Therefore, each targeted class is
-resampled independently. In the contrary, both :class:`ADASYN` and
-:class:`SMOTE` need information regarding the neighbourhood of each sample used
-for sample generation. They are using a one-vs-rest approach by selecting each
-targeted class and computing the necessary statistics against the rest of the
-data set which are grouped in a single class.
+   Note that :class:`SMOTENC` is not designed to handle datasets
+   consisting solely of categorical features.
+
+The other SMOTE variants and :class:`ADASYN` differ in how they select
+the samples :math:`x_i` before generating new samples:
+
+- :class:`SMOTE` imposes no specific rules and randomly selects
+  from all available :math:`x_i`.
+
+- :class:`BorderlineSMOTE` classifies each sample :math:`x_i` into one of
+  three categories:
+
+  i. **Noise**: All nearest neighbours belong to a different class than :math:`x_i`.
+
+  ii. **In danger**: At least half of the nearest neighbours belong to the same
+      class as :math:`x_i`.
+
+  iii. **Safe**: All nearest neighbours belong to the same class as :math:`x_i`.
+
+  Both ``kind="borderline-1"`` and ``kind="borderline-2"`` use samples
+  classified as *in danger* to generate new samples.
+
+  - In ``kind="borderline-1"``, :math:`x_{zi}` is selected from the same class
+    as :math:`x_i`.
+
+  - In contrast, ``kind="borderline-2"`` allows :math:`x_{zi}` to be from any class.
+
+- :class:`SVMSMOTE` uses an `SVM classifier
+  <https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>`_
+  to identify support vectors and generate samples based on them.
+  Note that the ``C`` parameter of the SVM classifier influences the number of support vectors.
+
+For both :class:`BorderlineSMOTE` and :class:`SVMSMOTE`, the neighbourhood used to
+determine whether a sample is noise, in danger, or safe is defined by the parameter
+``m_neighbors`` rather than ``k_neighbors``.
+
+- :class:`KMeansSMOTE` employs a `k-means clustering method
+  <https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html>`_
+  before applying :class:`SMOTE`.
+  The clustering groups samples together, and new samples are generated based on
+  the density of each cluster.
+
+- :class:`ADASYN` works similarly to :class:`SMOTE`. However, the number of samples generated for each
+  :math:`x_i` is proportional to the number of neighbours that do not belong to the same class as
+  :math:`x_i`. Thus, more samples are generated in areas where the *nearest-neighbour rule* is not satisfied.
+  The parameter ``n_neighbors`` of :class:`ADASYN` is equivalent to ``k_neighbors`` in :class:`SMOTE`.
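+
+For reference, these neighbourhood sizes are exposed as constructor parameters
+(a minimal sketch; the values shown are simply the defaults)::
+
+   >>> from imblearn.over_sampling import SMOTE, BorderlineSMOTE, SVMSMOTE, ADASYN
+   >>> smote = SMOTE(k_neighbors=5)
+   >>> borderline = BorderlineSMOTE(k_neighbors=5, m_neighbors=10)
+   >>> svm_smote = SVMSMOTE(k_neighbors=5, m_neighbors=10)
+   >>> adasyn = ADASYN(n_neighbors=5)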
+
+Multi-Class Management
+======================
+
+All algorithms can be applied to both binary and multi-class classification.
+
+:class:`RandomOverSampler` does not rely on inter-class information during sample generation,
+meaning each target class is resampled independently.
+
+In contrast, both :class:`ADASYN` and :class:`SMOTE` require neighbourhood information
+for each sample to generate new ones. These algorithms use a *one-vs-rest* approach,
+where each target class is selected in turn, and the necessary statistics are computed
+against the rest of the dataset, which is grouped into a single class.