Provide info of exact processed bytes to resume a large file processing #446

Open
tmokmss opened this issue Nov 30, 2024 · 0 comments
Summary

I want to know exactly how many bytes the parser had consumed when it parsed the current record. This information is useful when a process is suspended and we want to restart it from the last record without reading the target CSV from the beginning again, which can take a very long time for a large file.

Motivation

We can resume processing a csv file from a certain position using createReadStream's start option:

const parser = createReadStream('foo.csv', { start: startPosition }).pipe(parse({}));

To use this option, however, we need the exact byte offset of the first byte of the last record. CSV parse does not currently expose that information (ref), which makes it difficult to resume processing.
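To make the requested value concrete, here is a minimal sketch of the byte offsets in question. It does not use csv-parse at all: `recordStartOffsets` is a hypothetical helper that scans a buffer and assumes `\n` or `\r\n` record delimiters and double-quote quoting. The sample data is made up.

```javascript
// Sketch only: a tiny byte-level scanner, not csv-parse's parser.
// Assumes LF-terminated records and RFC 4180-style double-quote quoting.
function recordStartOffsets(buf) {
  const QUOTE = 0x22; // "
  const LF = 0x0a; // \n
  const offsets = [];
  let inQuotes = false;
  let atRecordStart = true;
  for (let i = 0; i < buf.length; i++) {
    if (atRecordStart) {
      offsets.push(i); // byte offset of the first byte of this record
      atRecordStart = false;
    }
    const byte = buf[i];
    if (byte === QUOTE) {
      // An escaped quote ("") toggles twice, leaving the state unchanged.
      inQuotes = !inQuotes;
    } else if (byte === LF && !inQuotes) {
      atRecordStart = true;
    }
  }
  return offsets;
}

// "ë" takes two bytes in UTF-8, and the third record contains a quoted newline.
const csv = Buffer.from('id,name\n1,"Zoë"\n2,"multi\nline"\n3,end\n', 'utf8');
const offsets = recordStartOffsets(csv);
const lastOffset = offsets[offsets.length - 1];

// Slicing at the byte offset yields the last record, which is what
// createReadStream's start option would deliver from a file.
const resumed = csv.subarray(lastOffset).toString('utf8');
console.log(offsets, JSON.stringify(resumed));
```

Note that the last record starts at byte 32 but at character index 31, because of the multi-byte character earlier in the data. A character count or a line count is therefore not a substitute for the byte position.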

Alternative

  1. parser.info.bytes seems to hold the number of bytes the parser has read so far (it reads the file eagerly). Because that value does not necessarily correspond to the exact position of the start of a record, it cannot be used for this purpose.

  2. The parse function has from and from_line options, but the parser still has to read the file from the beginning, so they do not shorten the processing time at all.
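The problem with the first alternative can be illustrated with a small simulation. This is not csv-parse's implementation: `simulateChunkedRead` is a hypothetical function, the chunk size of 10 is arbitrary, and quoting is ignored. It only assumes, as described above, that the byte counter advances as input is read ahead rather than per record.

```javascript
// Sketch only: simulates a reader whose byte counter advances per chunk.
// Records are split on LF; quoted fields are deliberately not handled.
function simulateChunkedRead(buf, chunkSize) {
  const LF = 0x0a; // \n
  const seen = [];
  let bytesRead = 0; // chunk-granular "bytes read so far" counter
  let recordStart = 0; // true byte offset of the record being assembled
  for (let pos = 0; pos < buf.length; pos += chunkSize) {
    const chunk = buf.subarray(pos, pos + chunkSize);
    bytesRead += chunk.length;
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] === LF) {
        // A record is complete: note where it began and what the counter says.
        seen.push({ recordStart, bytesRead });
        recordStart = pos + i + 1;
      }
    }
  }
  return seen;
}

const data = Buffer.from('a,b\n1,2\n3,4\n5,6\n', 'utf8');
const seen = simulateChunkedRead(data, 10);
console.log(seen);
```

In this run the first two records both report a counter of 10 although they start at bytes 0 and 4, and the last two both report 16 although they start at bytes 8 and 12. A counter of this kind identifies how far the input has been read, not where a given record begins.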

