Feature calculation is stucked (issue on Multiprocessing lib & Windows) #35

Open
GGA-PERSO opened this issue Jun 18, 2023 · 8 comments

GGA-PERSO commented Jun 18, 2023

What happened + What you expected to happen

As your docs mention, it should be possible to add a custom feature (I copy-pasted the function from your README), but nothing happens, even after several long minutes.

Could you please check?

Versions / Dependencies

0.4.2 (the latest version)

Reproduction script

import pandas as pd
import numpy as np
from tsfeatures import tsfeatures

periods = 24
ind = pd.date_range(start='2021-01-01', periods=periods, freq='MS')
vals = np.random.rand(periods)
df = pd.DataFrame({'ds':ind, 'y':vals, 'unique_id':1})

def number_zeros(x, freq):
    number = (x == 0).sum()
    return {'number_zeros': number}

features_df = tsfeatures(df,freq=12, features=[number_zeros])
features_df

Issue Severity

None

@GGA-PERSO GGA-PERSO added the bug label Jun 18, 2023

truonghm commented Aug 28, 2023

I'm having a similar issue. If I understand correctly, the number_zeros function will count the number of zeros for each unique_id.

def number_zeros(x, freq):

    number = (x == 0).sum()
    return {'number_zeros': number}

features = tsf.tsfeatures(data, features=[tsf.stl_features, number_zeros], dict_freqs={'MS': 12,})

The result is wrong: number_zeros should not be zero for every series, because some unique ids in the data contain zeros.

unique_id number_zeros
0 282998 0
1 347809 0
2 489552 0
3 594474 0
4 594861 0
5 595209 0
6 595956 0
7 600426 0
8 600429 0

Currently I'm having to do this instead:

features = pd.merge(
    data[["unique_id", "y"]].query("y>0").groupby("unique_id").count().reset_index(),
    features,
    how="left",
    on="unique_id",
)

features.rename(columns={"y": "series_length"}, inplace=True)
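Note that the merge above counts the rows with y > 0 per unique_id rather than the zeros themselves. As a minimal sketch, assuming the same long format (unique_id, y) used in this thread, the zeros can be counted directly with a pandas groupby, with no tsfeatures call involved. The toy data and the column name is_zero are illustrative only.

```python
import pandas as pd

# Toy data in the long format used in this thread (values are made up).
data = pd.DataFrame({
    "unique_id": [1, 1, 1, 2, 2, 2],
    "y": [0.0, 2.0, 0.0, 1.0, 3.0, 4.0],
})

# Flag zero observations, then sum the flags per series.
zeros = (
    data.assign(is_zero=data["y"].eq(0))
    .groupby("unique_id", as_index=False)["is_zero"]
    .sum()
    .rename(columns={"is_zero": "number_zeros"})
)

print(zeros)
```

The resulting frame has one row per unique_id and can be merged into the features table on unique_id, as in the workaround above.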

@ngupta23
Member

I think the issue is that the scale argument of tsfeatures is set to True by default. Try setting it to False and then rerun.
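To illustrate why scaling could explain the all-zero counts, here is a small sketch. It assumes that scaling means z-score standardization (subtract the mean, divide by the standard deviation); that is an assumption about what the scale option does, not something confirmed in this thread. After standardization the original zeros are no longer equal to 0, so an exact `x == 0` test finds none.

```python
import numpy as np

def number_zeros(x, freq=None):
    # Same custom feature as in the issue, returning a plain int.
    return {"number_zeros": int((x == 0).sum())}

y = np.array([0.0, 0.0, 3.0, 5.0, 4.0])

# Assumed scaling step: z-score standardization (not taken from the library source).
scaled = (y - y.mean()) / y.std(ddof=1)

raw_count = number_zeros(y)["number_zeros"]
scaled_count = number_zeros(scaled)["number_zeros"]

print(raw_count)     # 2
print(scaled_count)  # 0
```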

@GGA-PERSO
Author

Actually, the issue (an infinite loop) comes from multiprocessing, so I think tsfeatures cannot be used on Windows with Jupyter Notebook / IPython.

@ngupta23
Member

I have used tsfeatures in Jupyter notebooks and did not have any issues.

@GGA-PERSO
Author

OK @ngupta23, but what is your OS?

@ngupta23
Member

I used it in WSL

@GGA-PERSO
Author

Windows Subsystem for Linux is not pure Windows. ;)
Multiprocessing works differently on Linux and on Windows.
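The platform difference can be checked from Python itself. The sketch below only inspects which process start methods the standard multiprocessing module offers; the helper name describe_start_methods is hypothetical. On native Windows only "spawn" is available, so every worker starts a fresh interpreter and has to re-import the function it is asked to run. A function defined in a notebook cell may not be importable that way, which is one plausible explanation for the hang described here, not a confirmed diagnosis.

```python
import multiprocessing as mp

def describe_start_methods():
    # List the start methods this platform supports.
    # Native Windows reports only ["spawn"]; Linux (including WSL) also offers "fork".
    available = mp.get_all_start_methods()
    return {"available": available, "spawn_only": available == ["spawn"]}

info = describe_start_methods()
print(info)
```

If spawn_only is True, a common precaution is to define the feature function in an importable .py module and to run the calling code from a script under an `if __name__ == "__main__":` guard.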

GGA-PERSO changed the title from "Custom feature doesn't work" to "Custom feature is stuck (issue on Multiprocessing lib & Windows)" on Jan 12, 2025
GGA-PERSO changed the title from "Custom feature is stuck (issue on Multiprocessing lib & Windows)" to "Feature calculation is stucked (issue on Multiprocessing lib & Windows)" on Jan 12, 2025

@GGA-PERSO
Author

[Two attached images, not reproduced here]
