Currently, some functions do not check whether the input Series consists only of strings, so they give unexpected results, e.g. when the Series contains missing values.
Example:
import texthero as hero
import pandas as pd
import numpy as np
s = pd.Series(["Test", np.nan])
hero.noun_chunks(s)
>>0 []
>>1 [(nan, NP, 0, 3)]
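The stray `nan` chunk can be reproduced with pandas alone. In the sketch below, `s.map(str)` stands in for the element-wise string conversion the issue attributes to `s.astype('unicode')`. It is an illustrative substitute, not the call texthero makes. Because `str(np.nan) == "nan"`, the missing value becomes ordinary text that no longer registers as missing.

```python
import numpy as np
import pandas as pd

# Object dtype: one real string and one float NaN, as in the example above.
s = pd.Series(["Test", np.nan], dtype=object)

# Element-wise string conversion (stand-in for astype('unicode')):
# the NaN is turned into the literal text "nan".
converted = s.map(str)

print(converted.tolist())         # ['Test', 'nan']
print(s.isna().tolist())          # [False, True]
print(converted.isna().tolist())  # [False, False]
```

After the conversion, nothing downstream can distinguish the missing value from a document that literally reads "nan".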
This could be fixed by no longer using s.astype('unicode'), which, e.g., converts np.nan to "nan". Instead, a function should check whether the Series consists only of strings. Something along the lines of:
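import pandas as pd

def _check_series_strings(s):
    # Raise if any element is not a str (e.g. np.nan, None, numbers).
    if not s.map(lambda value: isinstance(value, str)).all():
        raise TypeError("Non-string values in series. Use hero.drop_no_content(s) to drop those values.")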
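A self-contained version of the proposed check, run against a clean Series and the example Series, is sketched below. It assumes the type test is written per element with `isinstance` (so `str` subclasses also pass) and keeps the error message from the proposal. `np.nan`, `None`, and numbers all fail the check.

```python
import numpy as np
import pandas as pd


def _check_series_strings(s):
    """Raise TypeError unless every element of ``s`` is a Python str."""
    # Per-element isinstance test (an assumption about how the check is written).
    if not s.map(lambda value: isinstance(value, str)).all():
        raise TypeError(
            "Non-string values in series. "
            "Use hero.drop_no_content(s) to drop those values."
        )


clean = pd.Series(["Test", "Another sentence"], dtype=object)
_check_series_strings(clean)  # passes silently

dirty = pd.Series(["Test", np.nan], dtype=object)
dirty_error = None
try:
    _check_series_strings(dirty)
except TypeError as err:
    dirty_error = str(err)

print(dirty_error)
```

Raising instead of converting leaves the decision about missing values to the caller, who can drop them explicitly before calling functions such as `noun_chunks`.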
We might want a different name for this function. If we agree on names for the kinds of pandas Series defined in #60, we could call it _check_is_text_series or something similar.
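One possible way to apply such a check consistently is a small decorator that validates the first argument before the wrapped function runs. This is a hypothetical sketch, not texthero's API. `_check_is_text_series` uses the name suggested above, and `requires_text_series` and the stand-in `count_words` are illustrative names only.

```python
import functools

import numpy as np
import pandas as pd


def _check_is_text_series(s):
    """Hypothetical renamed check: every element must be a Python str."""
    if not isinstance(s, pd.Series):
        raise TypeError("Expected a pandas Series.")
    if not s.map(lambda value: isinstance(value, str)).all():
        raise TypeError(
            "Non-string values in series. "
            "Use hero.drop_no_content(s) to drop those values."
        )


def requires_text_series(func):
    """Hypothetical decorator: validate the first argument, then call func."""

    @functools.wraps(func)
    def wrapper(s, *args, **kwargs):
        _check_is_text_series(s)
        return func(s, *args, **kwargs)

    return wrapper


@requires_text_series
def count_words(s):
    # Illustrative stand-in for a texthero function such as noun_chunks.
    return s.str.split().str.len()


print(count_words(pd.Series(["a b", "c"], dtype=object)).tolist())  # [2, 1]

rejected = False
try:
    count_words(pd.Series(["Test", np.nan], dtype=object))
except TypeError:
    rejected = True
```

Centralizing the check in one wrapper means each public function gets the same error message without repeating the validation code.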