FileStream doesn't agree to be tricked #106917
-
Is there a reason you're not using the S3 client SDK, that has a method to do this directly?
I wouldn't. I'd just use

```csharp
public async Task Traditional()
{
    await using var readStream = GetReadStreamFromS3();
    await using var writeStream = new FileStream("something.mp4", FileMode.Create, FileAccess.Write, FileShare.None, 0, FileOptions.Asynchronous);
    await readStream.CopyToAsync(writeStream, BufferSize);
}
```

(unless I needed to do something really funky at each buffer chunk, I suppose...) ... if you really wanted to go low-level you could dip into

For two buffers, it would be better to use

```csharp
public async Task TwoBuffers()
{
    await using var readStream = GetReadStreamFromS3();
    await using var writeStream = new FileStream("something.mp4", FileMode.Create, FileAccess.Write, FileShare.None, 0, FileOptions.Asynchronous);
    // TODO: use pooled/recyclable memory buffers. Implementations exist.
    var readBuffer = new Memory<byte>(new byte[BufferSize]);
    var writeBuffer = new Memory<byte>(new byte[BufferSize]);
    var bytesRead = await readStream.ReadAsync(writeBuffer);
    // TODO: I feel like other improvements are possible here in some fashion...
    while (bytesRead > 0)
    {
        // WriteAsync/ReadAsync return ValueTask(<int>); convert to Task
        // so both can be passed to Task.WhenAll and awaited together.
        var writeTask = writeStream.WriteAsync(writeBuffer[..bytesRead]).AsTask();
        var readTask = readStream.ReadAsync(readBuffer).AsTask();
        await Task.WhenAll(writeTask, readTask);
        bytesRead = await readTask;
        (readBuffer, writeBuffer) = (writeBuffer, readBuffer);
    }
}
```
Maybe, but this isn't actually guaranteed.
-
This analysis is overlooking work that the operating system does concurrently with your program. On the receive side, the operating system will receive data over the network and buffer it in the kernel, even if you are not actively calling `ReadAsync`. The overhead the C# program is adding is mostly copying data to and from the kernel. Modern memory bandwidth is measured in tens of gigabytes per second, so the C# part of this data copying might not be adding much overhead. I would personally use `CopyToAsync`.
-
I'm implementing a method that downloads objects from S3. Each invocation splits the object into chunks. On each chunk I open a FileStream and copy the data from the network stream to it iteratively using a buffer.
Traditionally one would write something like this to implement the copying.
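(The code block from the original post did not survive in this copy of the thread; a minimal sketch of such a sequential copy loop, where `GetReadStreamFromS3` and `BufferSize` are placeholders standing in for the real S3 download code, might look like this:)

```csharp
// Hypothetical sketch of the "traditional" sequential copy described above.
public async Task SequentialCopy()
{
    await using var readStream = GetReadStreamFromS3();
    await using var writeStream = new FileStream(
        "something.mp4", FileMode.Create, FileAccess.Write,
        FileShare.None, 0, FileOptions.Asynchronous);

    var buffer = new Memory<byte>(new byte[BufferSize]);
    int bytesRead;
    // Each iteration fully awaits the read, then fully awaits the write,
    // so the two operations never overlap in time.
    while ((bytesRead = await readStream.ReadAsync(buffer)) > 0)
    {
        await writeStream.WriteAsync(buffer[..bytesRead]);
    }
}
```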
Focus on the while loop. Do you see how I first await `readStream.ReadAsync` and then `writeStream.WriteAsync`, so the total runtime equals the sum of reading from the source stream in isolation and writing to the target stream in isolation? I figured that these operations don't have to be sequential. With a little twist we can arrive at something like this.
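(This code block was also lost in this copy of the thread; a reconstruction consistent with the description, two alternating buffers with the read and the write awaited together, would be:)

```csharp
// Reconstruction of the two-buffer variant described above; BufferSize and
// GetReadStreamFromS3 are placeholders for the real S3 download code.
public async Task TwoBuffers()
{
    await using var readStream = GetReadStreamFromS3();
    await using var writeStream = new FileStream(
        "something.mp4", FileMode.Create, FileAccess.Write,
        FileShare.None, 0, FileOptions.Asynchronous);

    var readBuffer = new Memory<byte>(new byte[BufferSize]);
    var writeBuffer = new Memory<byte>(new byte[BufferSize]);
    var bytesRead = await readStream.ReadAsync(writeBuffer);

    while (bytesRead > 0)
    {
        // Start the write of the previous chunk and the read of the next
        // chunk, then await both, so read and write overlap in time.
        var writeTask = writeStream.WriteAsync(writeBuffer[..bytesRead]).AsTask();
        var readTask = readStream.ReadAsync(readBuffer).AsTask();
        await Task.WhenAll(writeTask, readTask);
        bytesRead = await readTask;
        (readBuffer, writeBuffer) = (writeBuffer, readBuffer);
    }
}
```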
Here by utilizing two buffers, one for reading and one for writing, we manage to wait for the read task and the write task simultaneously, so we should expect the runtime to be equal to either reading from the source in isolation or writing to the target stream in isolation, depending on which one is slower, right?
For example if reading from the source in isolation takes 5 seconds, writing to the target in isolation takes 3 seconds, then the first method should take 8 seconds while the second should take 5, right?
However, my benchmark results show that the first one is actually faster, so my trick doesn't work.
Since I cannot accurately simulate the network bandwidth between S3 and the EC2 instances on my local machine, I use another FileStream as the source stream. My code is as follows.
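(The benchmark code did not survive in this copy of the thread; a minimal timing harness under the same assumptions, with a large local file "source.bin" standing in for the S3 object and `Stopwatch` for timing, might look like the following. A proper benchmark would use a framework such as BenchmarkDotNet instead.)

```csharp
using System.Diagnostics;

// Hypothetical harness; "source.bin" stands in for the S3 object and should
// be large enough (hundreds of MB) for the timings to be meaningful.
static Stream OpenSource() =>
    new FileStream("source.bin", FileMode.Open, FileAccess.Read,
                   FileShare.Read, 0, FileOptions.Asynchronous);

static async Task<TimeSpan> TimeAsync(Func<Task> copy)
{
    var sw = Stopwatch.StartNew();
    await copy();
    sw.Stop();
    return sw.Elapsed;
}

// Usage (after a throwaway warm-up run to prime the file cache):
// Console.WriteLine($"Traditional: {await TimeAsync(Traditional)}");
// Console.WriteLine($"TwoBuffers:  {await TimeAsync(TwoBuffers)}");
```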
And the benchmark results are as follows.
The results clearly indicate that the traditional method is substantially faster than the two-buffers version. This is interesting, and I would like to know why. Thank you.