FileStream doesn't agree to be tricked #106917
-
Is there a reason you're not using the S3 client SDK, that has a method to do this directly?
I wouldn't. I'd just use

```csharp
public async Task Traditional()
{
    await using var readStream = GetReadStreamFromS3();
    await using var writeStream = new FileStream("something.mp4", FileMode.Create, FileAccess.Write, FileShare.None, 0, FileOptions.Asynchronous);
    await readStream.CopyToAsync(writeStream, BufferSize);
}
```

(unless I needed to do something really funky at each buffer chunk, I suppose...) ... if you really wanted to go low-level you could dip into

For two buffers, it would be better to use

```csharp
public async Task TwoBuffers()
{
    await using var readStream = GetReadStreamFromS3();
    await using var writeStream = new FileStream("something.mp4", FileMode.Create, FileAccess.Write, FileShare.None, 0, FileOptions.Asynchronous);
    // TODO: use pooled/recyclable memory buffers. Implementations exist.
    var readBuffer = new Memory<byte>(new byte[BufferSize]);
    var writeBuffer = new Memory<byte>(new byte[BufferSize]);
    var bytesRead = await readStream.ReadAsync(writeBuffer);
    // TODO: I feel like other improvements are possible here in some fashion...
    while (bytesRead > 0)
    {
        // WriteAsync/ReadAsync return ValueTask(<int>); convert to Task
        // so both can be passed to Task.WhenAll and awaited together.
        var writeTask = writeStream.WriteAsync(writeBuffer[..bytesRead]).AsTask();
        var readTask = readStream.ReadAsync(readBuffer).AsTask();
        await Task.WhenAll(writeTask, readTask);
        bytesRead = await readTask;
        (readBuffer, writeBuffer) = (writeBuffer, readBuffer);
    }
}
```
Maybe, but this isn't actually guaranteed.
-
This analysis is overlooking work that the operating system does concurrently with your program. On the receive side, the operating system will receive data over the network and buffer it in the kernel, even if you are not actively calling `ReadAsync`. The overhead the C# program is adding is mostly copying data to and from the kernel. Modern memory bandwidth is measured in tens of gigabytes per second, so the C# part of this data copying might not be adding much overhead. I would personally use `CopyToAsync`.
-
I'm implementing a method that downloads objects from S3. Each invocation splits the object into chunks. On each chunk I open a FileStream and copy the data from the network stream to it iteratively using a buffer.
Traditionally one would write something like this to implement the copying.
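(The code block from the original post did not survive in this copy of the thread; a minimal sketch of such a sequential copy loop, where `GetReadStreamFromS3` and `BufferSize` are placeholders standing in for the real S3 download code, might look like this:)

```csharp
// Hypothetical sketch of the "traditional" sequential copy described above.
public async Task SequentialCopy()
{
    await using var readStream = GetReadStreamFromS3();
    await using var writeStream = new FileStream(
        "something.mp4", FileMode.Create, FileAccess.Write,
        FileShare.None, 0, FileOptions.Asynchronous);

    var buffer = new Memory<byte>(new byte[BufferSize]);
    int bytesRead;
    // Each iteration fully awaits the read, then fully awaits the write,
    // so the two operations never overlap in time.
    while ((bytesRead = await readStream.ReadAsync(buffer)) > 0)
    {
        await writeStream.WriteAsync(buffer[..bytesRead]);
    }
}
```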
Focus on the while loop. Do you see how I first await `readStream.ReadAsync` and then `writeStream.WriteAsync`, so the total runtime equals the sum of reading from the source stream in isolation and writing to the target stream in isolation? I figured that these operations don't have to be sequential. With a little twist we can arrive at something like this.
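(This code block was also lost in this copy of the thread; a reconstruction consistent with the description, two alternating buffers with the read and the write awaited together, would be:)

```csharp
// Reconstruction of the two-buffer variant described above; BufferSize and
// GetReadStreamFromS3 are placeholders for the real S3 download code.
public async Task TwoBuffers()
{
    await using var readStream = GetReadStreamFromS3();
    await using var writeStream = new FileStream(
        "something.mp4", FileMode.Create, FileAccess.Write,
        FileShare.None, 0, FileOptions.Asynchronous);

    var readBuffer = new Memory<byte>(new byte[BufferSize]);
    var writeBuffer = new Memory<byte>(new byte[BufferSize]);
    var bytesRead = await readStream.ReadAsync(writeBuffer);

    while (bytesRead > 0)
    {
        // Start the write of the previous chunk and the read of the next
        // chunk, then await both, so read and write overlap in time.
        var writeTask = writeStream.WriteAsync(writeBuffer[..bytesRead]).AsTask();
        var readTask = readStream.ReadAsync(readBuffer).AsTask();
        await Task.WhenAll(writeTask, readTask);
        bytesRead = await readTask;
        (readBuffer, writeBuffer) = (writeBuffer, readBuffer);
    }
}
```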
Here by utilizing two buffers, one for reading and one for writing, we manage to wait for the read task and the write task simultaneously, so we should expect the runtime to be equal to either reading from the source in isolation or writing to the target stream in isolation, depending on which one is slower, right?
For example if reading from the source in isolation takes 5 seconds, writing to the target in isolation takes 3 seconds, then the first method should take 8 seconds while the second should take 5, right?
However, my benchmark results show that the first one is actually faster, so my trick doesn't work.
Since I cannot accurately simulate the network bandwidth between S3 and the EC2 instances on my local machine, I use another FileStream as the source stream. My code is as follows.
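(The benchmark code did not survive in this copy of the thread; a minimal timing harness under the same assumptions, with a large local file "source.bin" standing in for the S3 object and `Stopwatch` for timing, might look like the following. A proper benchmark would use a framework such as BenchmarkDotNet instead.)

```csharp
using System.Diagnostics;

// Hypothetical harness; "source.bin" stands in for the S3 object and should
// be large enough (hundreds of MB) for the timings to be meaningful.
static Stream OpenSource() =>
    new FileStream("source.bin", FileMode.Open, FileAccess.Read,
                   FileShare.Read, 0, FileOptions.Asynchronous);

static async Task<TimeSpan> TimeAsync(Func<Task> copy)
{
    var sw = Stopwatch.StartNew();
    await copy();
    sw.Stop();
    return sw.Elapsed;
}

// Usage (after a throwaway warm-up run to prime the file cache):
// Console.WriteLine($"Traditional: {await TimeAsync(Traditional)}");
// Console.WriteLine($"TwoBuffers:  {await TimeAsync(TwoBuffers)}");
```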
And the benchmark results are as follows.
The results clearly indicate that the traditional method is substantially faster than the two-buffers version. This is interesting, and I would like to know why. Thank you.