FileStream not suitable for FileIO on POSIX systems #1846

Open
BCSharp opened this issue Dec 20, 2024 · 0 comments

BCSharp commented Dec 20, 2024

_io.FileIO is implemented mostly on top of FileStream to access files in the OS file system. Unfortunately, this class does not work well when there are multiple simultaneous writers. This is possibly a Win32 legacy: according to the documentation, simultaneous writes to a file may cause an exception during a write through another handle. I have not observed exceptions, but I have noticed that simultaneous writes overwrite each other. This is not POSIX behaviour. POSIX safely allows multiple writes through the same descriptor, a duplicated descriptor, or another descriptor opened on the same file, provided that appropriate file mode flags are used (e.g. O_APPEND).

Consider the following example:

// Test code that accesses one file opened in Append mode simultaneously on two threads
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

string filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "testfile.txt");

// Start from a clean slate so the final file size reflects this run only
if (File.Exists(filePath)) {
    File.Delete(filePath);
}

// Number of writes performed by each task
const int ndata = 100200300;

Task task1 = Task.Run(() => WriteToFile(filePath, Encoding.ASCII.GetBytes("xxxxxxxxx\n")));
Task task2 = Task.Run(() => WriteToFile(filePath, Encoding.ASCII.GetBytes("zzzzzzzzz\n")));

Task.WaitAll(task1, task2);

// Each task opens its own FileStream in Append mode and allows other writers (FileShare.Write)
void WriteToFile(string name, byte[] data) {
    using (var fs = new FileStream(name, FileMode.Append, FileAccess.Write, FileShare.Write)) {
        for (int i = 0; i < ndata; i++) {
            fs.Write(data, 0, data.Length);
        }
    }
}

This snippet runs two tasks, each performing 100200300 writes of 10 bytes, so each task produces 1002003000 bytes. Together, the two tasks should produce a file twice that size, that is, 2004006000 bytes. However, the file created is only 1002003000 bytes long (sometimes a bit more) and contains a mixture of x's and z's, a clear sign that the tasks are overwriting each other's data.
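The size arithmetic above, and the check for overwritten records, can be reproduced with a small helper. This is only a sketch: the names `expected_size` and `summarize` are made up for illustration and are not part of IronPython or of the test code above.

```python
import os


def expected_size(writers, writes_per_writer, record_len):
    """Size of the file if no writer ever overwrites another."""
    return writers * writes_per_writer * record_len


def summarize(path, records):
    """Return (file size, count of intact lines per record, count of other lines).

    `records` is an iterable of the exact byte strings the writers emit,
    e.g. [b"xxxxxxxxx\n", b"zzzzzzzzz\n"]. Any line that matches none of
    them (such as a mix of x's and z's) is counted as damaged.
    """
    counts = {record: 0 for record in records}
    damaged = 0
    with open(path, "rb") as f:
        for line in f:
            if line in counts:
                counts[line] += 1
            else:
                damaged += 1
    return os.path.getsize(path), counts, damaged


if __name__ == "__main__":
    # Two writers, 100200300 writes each, 10 bytes per write.
    print(expected_size(2, 100200300, 10))
```

Running `summarize` on the file produced by a test run shows at a glance whether the size falls short of `expected_size(...)` and whether any lines were damaged.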

For comparison, here is the equivalent example in Python:

import os
import threading

file_path = os.path.join(os.path.expanduser("~"), "testfile.txt")
if os.path.exists(file_path):
    os.remove(file_path)

# Number of writes
ndata = 100200300

def write_to_file(file_path, data):
    with open(file_path, 'ab') as f:
        for _ in range(ndata):
            f.write(data)

thread1 = threading.Thread(target=write_to_file, args=(file_path, b"xxxxxxxxx\n"))
thread2 = threading.Thread(target=write_to_file, args=(file_path, b"zzzzzzzzz\n"))

thread1.start()
thread2.start()

thread1.join()
thread2.join()

This code, when run with CPython on Linux or macOS (not Windows), correctly produces a file that is 2004006000 bytes long. IronPython, obviously, does not.
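The underlying POSIX behaviour can also be seen without threads, at the descriptor level. The sketch below is an illustration under assumptions, not IronPython code: `interleaved_write` is a hypothetical helper, and it is meant for Linux or macOS only. It opens the same file twice and alternates writes between the two descriptors. With O_APPEND every write lands at the current end of file. Without it each descriptor keeps its own offset, so the second writer overwrites the first.

```python
import os


def interleaved_write(path, extra_flags, rounds=1000):
    """Alternate 10-byte writes through two independent descriptors.

    `extra_flags` is either os.O_APPEND or 0. Returns the final file size.
    The file at `path` is assumed not to exist yet.
    """
    fd1 = os.open(path, os.O_WRONLY | os.O_CREAT | extra_flags, 0o644)
    fd2 = os.open(path, os.O_WRONLY | extra_flags)
    try:
        for _ in range(rounds):
            os.write(fd1, b"xxxxxxxxx\n")
            os.write(fd2, b"zzzzzzzzz\n")
    finally:
        os.close(fd1)
        os.close(fd2)
    return os.path.getsize(path)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        appended = interleaved_write(os.path.join(tmp, "append.txt"), os.O_APPEND)
        plain = interleaved_write(os.path.join(tmp, "plain.txt"), 0)
        print("with O_APPEND:", appended, "bytes")
        print("without O_APPEND:", plain, "bytes")
```

The run without O_APPEND ends up with half the data, which matches the file size observed with FileStream above.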

I am considering the following possible solutions:

  1. In place of System.IO.FileStream, use Mono.Unix.UnixStream (which operates directly on the file descriptor) for all file access in IronPython when running on POSIX OSes. However:
    1. UnixStream is unbuffered, which changes the runtime profile of IronPython. This may actually not be a bad thing, since at this level FileIO is supposed to provide "raw" (unbuffered) access to the file. Nevertheless, it is a change, and it relies on the buffered wrappers above it doing a good job of buffering.
    2. OS errors inside UnixStream are translated to native CLR exceptions as much as possible. This is not desirable for IronPython which, to match CPython, should raise OSError with the appropriate errno code.
    3. UnixStream does not support the efficient ReadOnlySpan<byte> interfaces of .NET.
      All three concerns can be addressed in various ways (proxy class, exception unpacking, etc.).
  2. Write a dedicated stream class that makes low-level OS calls to perform IO operations (e.g. using Mono.Unix.Native). Such a class can be easily integrated into the rest of the IronPython runtime.
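
To make the shape of option 2 more concrete, here is a rough analogue written in Python rather than C#. It is only a sketch of the idea: the class name `FdRawStream` is hypothetical, and the actual IronPython class would be written against Mono.Unix.Native or similar. The point is that a raw stream which forwards directly to OS calls is unbuffered by construction, and OS failures surface as OSError with the original errno.

```python
import io
import os

# Mask for the access-mode bits of the open flags (assumes a POSIX platform).
_ACCESS_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


class FdRawStream(io.RawIOBase):
    """Hypothetical raw stream that forwards every call to the OS."""

    def __init__(self, path, flags, mode=0o644):
        super().__init__()
        self._fd = -1
        self._access = flags & _ACCESS_MASK
        # os.open raises OSError carrying the OS errno on failure.
        self._fd = os.open(path, flags, mode)

    def readable(self):
        return self._access in (os.O_RDONLY, os.O_RDWR)

    def writable(self):
        return self._access in (os.O_WRONLY, os.O_RDWR)

    def fileno(self):
        return self._fd

    def readinto(self, buffer):
        # One read(2) call per request; no buffering at this level.
        view = memoryview(buffer).cast("B")
        data = os.read(self._fd, len(view))
        view[:len(data)] = data
        return len(data)

    def write(self, data):
        # One write(2) call per request; returns the number of bytes written.
        return os.write(self._fd, data)

    def close(self):
        if not self.closed:
            try:
                if self._fd >= 0:
                    os.close(self._fd)
            finally:
                super().close()
```

Buffering, if wanted, is then layered on top (in Python, `io.BufferedWriter(FdRawStream(...))`), which mirrors the split between FileIO and the buffered wrappers mentioned in point 1.1.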