_io.FileIO is implemented mostly on top of FileStream to access files in the OS file system. Unfortunately, FileStream does not work well when there are multiple simultaneous writers. This is possibly a Win32 legacy: according to the documentation, simultaneous writes to a file may cause an exception during a write through another handle. I have not observed exceptions, but I have noticed that simultaneous writes overwrite each other. This is not POSIX behaviour, which safely allows multiple writes through the same descriptor, a duplicated descriptor, or another descriptor opened on the same file, provided appropriate file mode flags are used (e.g. O_APPEND).
Consider the following example:
    // Test code that accesses one file opened in Append mode simultaneously on two threads
    string filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "testfile.txt");
    if (File.Exists(filePath)) {
        File.Delete(filePath);
    }

    // Number of writes
    const int ndata = 100200300;

    Task task1 = Task.Run(() => WriteToFile(filePath, Encoding.ASCII.GetBytes("xxxxxxxxx\n")));
    Task task2 = Task.Run(() => WriteToFile(filePath, Encoding.ASCII.GetBytes("zzzzzzzzz\n")));
    Task.WaitAll(task1, task2);

    void WriteToFile(string name, byte[] data) {
        using (var fs = new FileStream(name, FileMode.Append, FileAccess.Write, FileShare.Write)) {
            for (int i = 0; i < ndata; i++) {
                fs.Write(data, 0, data.Length);
            }
        }
    }
This snippet uses two tasks, each performing 100200300 writes of 10 bytes, so each task produces 1002003000 bytes. Together the two tasks should produce a file twice that size, that is, 2004006000 bytes. However, the file created is only 1002003000 bytes long (sometimes a bit more) and contains a mixture of x's and z's, a clear sign that the tasks are overwriting each other's data.
For comparison, here is the equivalent example in Python:
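The Python listing is not preserved in this copy of the issue. The block below is a reconstruction sketch that mirrors the C# test above, not the author's original code. The function names `write_to_file` and `run_test` are hypothetical, the write count is a parameter rather than the fixed 100200300, and opening with `buffering=0` is an assumption made so that every `write()` maps to one OS-level write.

```python
import os
import threading


def write_to_file(name, data, ndata):
    # Mode "ab" opens the file for appending (O_APPEND on POSIX).
    # buffering=0 returns a raw FileIO object, so each write() call
    # is passed straight to the OS instead of being collected in a buffer.
    with open(name, "ab", buffering=0) as f:
        for _ in range(ndata):
            f.write(data)


def run_test(file_path, ndata):
    # Start from a clean state, as the C# test does.
    if os.path.exists(file_path):
        os.remove(file_path)

    # Two threads append 10-byte records to the same file at the same time,
    # each through its own file object.
    t1 = threading.Thread(target=write_to_file, args=(file_path, b"xxxxxxxxx\n", ndata))
    t2 = threading.Thread(target=write_to_file, args=(file_path, b"zzzzzzzzz\n", ndata))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    # With no lost writes the size is 2 * ndata * 10 bytes.
    return os.path.getsize(file_path)
```

To match the numbers quoted in this issue, call `run_test(os.path.join(os.path.expanduser("~"), "testfile.txt"), 100200300)`; a much smaller count is enough to see whether writes are being lost.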
This code, when run with CPython on Linux or macOS (not Windows), correctly produces a file that is 2004006000 bytes long. IronPython, obviously, does not.
I am considering the following possible solutions:
1. In place of System.IO.FileStream, use Mono.Unix.UnixStream (which operates directly on the file descriptor) for all file access in IronPython when run on POSIX OSes. However:
   - UnixStream is unbuffered, which changes the runtime profile of IronPython. This may not actually be a bad thing, since at this level FileIO is supposed to provide raw (unbuffered) access to the file. Nevertheless, it is a change, and it relies on the buffered wrappers above it doing a good job of buffering.
   - OS errors inside UnixStream are translated to native CLR exceptions as much as possible. This is not desirable for IronPython which, to match CPython, should produce OSError with an appropriate errno code.
   - UnixStream does not support the efficient ReadOnlySpan<byte> interfaces of .NET.

   All three concerns can be addressed in various ways (proxy class, exception unpacking, etc.).
2. Write a dedicated stream class that makes low-level OS calls to perform IO operations (e.g. using Mono.Unix.Native). Such a class can be easily integrated into the rest of the IronPython runtime.
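For reference, the behaviour that either option would need to reproduce can be shown from the CPython side with direct descriptor calls. This is only a sketch of the target semantics, not a proposed IronPython implementation. The helper names `append_via_descriptors` and `errno_of_missing_open` are hypothetical, and a POSIX OS is assumed.

```python
import errno
import os


def append_via_descriptors(path):
    """Append through an original, a duplicated, and an independently
    opened descriptor, then return the resulting file content."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND
    fd1 = os.open(path, flags, 0o644)
    fd2 = os.dup(fd1)                  # duplicate of fd1
    fd3 = os.open(path, flags, 0o644)  # separately opened descriptor
    try:
        # With O_APPEND every write lands at the current end of the file,
        # whichever descriptor it goes through.
        os.write(fd1, b"first\n")
        os.write(fd2, b"second\n")
        os.write(fd3, b"third\n")
    finally:
        for fd in (fd1, fd2, fd3):
            os.close(fd)
    with open(path, "rb") as f:
        return f.read()


def errno_of_missing_open(path):
    """Return the errno carried by the OSError raised for a missing file."""
    try:
        os.close(os.open(path, os.O_WRONLY))
    except OSError as e:
        return e.errno
    return None
```

On a POSIX system, `append_via_descriptors` returns all three records in order, and `errno_of_missing_open` returns `errno.ENOENT` for a path that does not exist. That errno value is the information that is lost when an OS error is turned into a generic CLR exception.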