
Inference data loader #81

Open · wants to merge 3 commits into base: dev
Conversation

@EthanMarx (Collaborator) commented Nov 27, 2023

Adds a simple inference dataloader that yields stride_size-length batches sequentially from a single file.

Was seeing ~12,000 seconds of data processed per second from the following simple profiling script:

import h5py
from tempfile import TemporaryDirectory
from pathlib import Path
import numpy as np
from ml4gw.dataloading import InferenceDataset
import time

channels = ["H1", "L1"]
fname = "a.h5"
length = 20000
sample_rate = 2048

update_length = 16
stride_size = int(update_length * sample_rate)

with TemporaryDirectory(dir=Path.cwd()) as tmpdir:
    # Write the file inside the temporary directory so it gets cleaned up
    fname = str(Path(tmpdir) / fname)
    with h5py.File(fname, "w") as f:
        for channel in channels:
            f.create_dataset(channel, data=np.arange(sample_rate * length), chunks=None)

    dataset = InferenceDataset(
        fname=fname,
        channels=channels,
        stride_size=stride_size,
    )


    start = time.time()
    for x in dataset:
        continue

    stop = time.time()
    duration = stop - start
    print("Throughput (s/s) ", length / duration)
Throughput (s/s)  11789.3695364983

@EthanMarx EthanMarx changed the base branch from main to dev November 27, 2023 23:46