
Inference data loader #81

Open · wants to merge 3 commits into base: dev
Conversation

@EthanMarx (Collaborator) commented Nov 27, 2023

Adds a simple inference dataloader that yields stride_size-length batches sequentially from a single file.

Was seeing ~12,000 seconds of data processed per second from the following simple profiling script:

import h5py
from tempfile import TemporaryDirectory
from pathlib import Path
import numpy as np
from ml4gw.dataloading import InferenceDataset
import time

channels = ["H1", "L1"]
fname = "a.h5"
length = 20000
sample_rate = 2048

update_length = 16
stride_size = int(update_length * sample_rate)

with TemporaryDirectory(dir=Path.cwd()) as tmpdir:
    # Write the file inside the temporary directory so it gets cleaned up
    fname = str(Path(tmpdir) / fname)
    with h5py.File(fname, "w") as f:
        for channel in channels:
            f.create_dataset(channel, data=np.arange(sample_rate * length), chunks=None)

    dataset = InferenceDataset(
        fname=fname,
        channels=channels,
        stride_size=stride_size,
    )


    start = time.time()
    for x in dataset:
        continue

    stop = time.time()
    duration = stop - start
    print("Throughput (s/s) ", length / duration)
Throughput (s/s)  11789.3695364983

@EthanMarx EthanMarx changed the base branch from main to dev November 27, 2023 23:46