Why does the API have file handles and Paths in several calls? #22

Open
pedrocr opened this issue Jan 13, 2018 · 9 comments

pedrocr commented Jan 13, 2018

I've been trialing a FUSE filesystem implementation to see if it's workable to have a local filesystem that overflows into a NAS, so you can have your multi-TB photo/video collection always available on your laptop. So far I've just implemented a basic naive in-RAM filesystem to try out the API:

https://github.com/pedrocr/syncer/blob/1110097ecac53dca8133b0fd8d883c1ea597dba4/src/main.rs

This seems to work fine, but I have ignored the file handles in the API completely and gone with the Paths at all times. Is this way of working completely broken? Having file handles is problematic for my design, as I'm planning on having the backend be content addressable instead of having some kind of sequential inode number.

Owner

wfraser commented Jan 13, 2018

File handles are used when working on an opened file. When FUSE calls open, you can give it some arbitrary u64 as a handle, and then when FUSE calls read, write, etc., it passes along that same handle you gave it.
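
A minimal sketch of that handle round trip, assuming an in-memory table. `HandleTable` and `OpenFile` are hypothetical names for illustration and are not part of fuse-mt; the methods only mirror what `open`, `read` and `release` implementations might do with the u64.

```rust
use std::collections::HashMap;

/// Hypothetical per-open state: whatever the filesystem needs to serve
/// reads without consulting the path again.
struct OpenFile {
    data: Vec<u8>,
}

/// Hands out opaque u64 handles and maps them back to open-file state.
struct HandleTable {
    next: u64,
    open: HashMap<u64, OpenFile>,
}

impl HandleTable {
    fn new() -> Self {
        HandleTable { next: 1, open: HashMap::new() }
    }

    /// What an `open` implementation might do: store state, return the handle.
    fn open(&mut self, file: OpenFile) -> u64 {
        let fh = self.next;
        self.next += 1;
        self.open.insert(fh, file);
        fh
    }

    /// What a `read` implementation might do with the handle it is passed.
    /// Returns None for an unknown handle (a real filesystem could map that to EBADF).
    fn read(&self, fh: u64, offset: usize, size: usize) -> Option<&[u8]> {
        let file = self.open.get(&fh)?;
        let start = offset.min(file.data.len());
        let end = start.saturating_add(size).min(file.data.len());
        Some(&file.data[start..end])
    }

    /// What `release` might do: forget the handle. Returns false if it was unknown.
    fn release(&mut self, fh: u64) -> bool {
        self.open.remove(&fh).is_some()
    }
}
```

The handle is just a key chosen by the filesystem, so it could equally be an index into a slab or a counter tied to a content hash.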

Looking at your code, you don't seem to be implementing open at all, so the file handle never gets set, so you can ignore it.

(To be honest, I'm surprised that it works with open unimplemented. FUSE must have some special logic to ignore the ENOSYS that comes from the default implementation of that function.)

Be aware, however, that doing things this way can behave strangely if you also implement unlink (which you have), because in a proper POSIX filesystem, once you have a file open, even if you unlink that file, it should still be accessible through the open file handle. But in your filesystem, unlinking an open file causes opened handles to that file to no longer work. Other programs may behave badly in this case because they're not expecting that.

For example, the following C code will do the wrong thing:

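int fd = open("file.txt", O_RDWR);
unlink("file.txt");  /* the name is gone, but fd should stay valid */
errno = 0;           /* write() does not reset errno on success */
ssize_t result = write(fd, "hello", 5);
printf("result = %zd, errno = %d\n", result, errno);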

On a normal filesystem, it should print result = 5, errno = 0, meaning that it wrote 5 bytes even though the file was unlinked.

On yours, it will print result = -1, errno = 2, where 2 is ENOENT.
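
One way to get the POSIX behaviour in an in-RAM filesystem is to let open handles share ownership of the file contents, so that unlinking only removes the name. The sketch below assumes that design; `MemFs` and its methods are hypothetical, use `&str` paths for brevity, and are not taken from the linked code or from fuse-mt.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

const ENOENT: i32 = 2; // Linux errno values
const EBADF: i32 = 9;

/// File contents shared between the directory entry and any open handles.
type Contents = Arc<Mutex<Vec<u8>>>;

#[derive(Default)]
struct MemFs {
    names: HashMap<String, Contents>, // directory entries
    handles: HashMap<u64, Contents>,  // open files, keyed by handle
    next_fh: u64,
}

impl MemFs {
    fn create(&mut self, path: &str) {
        self.names.insert(path.to_string(), Contents::default());
    }

    /// Returns a handle that shares the file's contents.
    fn open(&mut self, path: &str) -> Result<u64, i32> {
        let contents = self.names.get(path).ok_or(ENOENT)?.clone();
        self.next_fh += 1;
        self.handles.insert(self.next_fh, contents);
        Ok(self.next_fh)
    }

    /// Removes only the name; contents stay alive while a handle holds them.
    fn unlink(&mut self, path: &str) -> Result<(), i32> {
        self.names.remove(path).map(|_| ()).ok_or(ENOENT)
    }

    /// Appends through the handle and never looks at the path.
    fn write(&mut self, fh: u64, data: &[u8]) -> Result<usize, i32> {
        let contents = self.handles.get(&fh).ok_or(EBADF)?;
        contents.lock().unwrap().extend_from_slice(data);
        Ok(data.len())
    }

    /// Drops the handle; the contents are freed once no name or handle remains.
    fn release(&mut self, fh: u64) {
        self.handles.remove(&fh);
    }
}
```

With this layout, the sequence from the C example (open, unlink, write) succeeds through the handle, while a fresh open of the same path fails with ENOENT.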

Author

pedrocr commented Jan 13, 2018

Thanks for the extremely prompt reply :)

I understand why unix has the file handles, my question was more about the API having several cases where a handle and a path are given. Sometimes the handle is optional. It seems the API is half way to abstracting away the handles but not quite there. For example chmod() takes a &Path and an Option<u64> while in read() the file handle is no longer optional but the path is still provided. How did you envision this being used? If read needs to always take the handle why is a path provided? If chmod always receives a Path why is the handle optional?

Owner

wfraser commented Jan 14, 2018

The file handles are optional for some calls because the kernel may call them with or without having previously opened the file. Using chmod as an example again: there exist two Linux syscalls -- chmod, which takes a path, and fchmod, which takes an open file descriptor. FUSE turns them both into one call which gets an inode number and maybe a file handle. Fuse-MT then turns the inode number back into a path, using a map it maintains.

Author

pedrocr commented Jan 14, 2018

The chmod()/fchmod() case I had seen in the man page and assumed something of the sort was happening. I'm more puzzled how the Fuse-MT write() is supposed to be used. Normal UNIX write() is always on a file descriptor as far as I can tell, so why does the Fuse-MT API even give me a path? Should I read it as "this is the path the original file descriptor was obtained for but it may have been invalidated the moment it was used so don't trust it ever again for anything"? Why even have it in the API? For read-only filesystems to be easier to write?

Owner

wfraser commented Jan 16, 2018

> For read-only filesystems to be easier to write?

Pretty much, yeah. FUSE gives us the file handle and the inode number, and Fuse-MT maintains an inode <-> path mapping, so it's easy and efficient to figure out which path is being referred to, so might as well provide it to the filesystem. It lets you write stateless read-only filesystems that don't depend on file handles, makes debugging / logging easier, etc.

But yeah, if you allow unlinking and/or hard linking, take that path with a grain of salt. In the case of hard links, the path given may not be the same as the path used to open the file, and in the case of unlinks, the path may not exist any more. Prefer using the file handle.
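
To illustrate the hard-link caveat, here is a rough sketch of an inode <-> path table. It is an assumption about how such a mapping could look, not Fuse-MT's actual implementation; `InodeTable` and its methods are hypothetical.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical two-way mapping between inode numbers and paths.
#[derive(Default)]
struct InodeTable {
    by_ino: HashMap<u64, PathBuf>,
    by_path: HashMap<PathBuf, u64>,
    next_ino: u64,
}

impl InodeTable {
    /// Returns the inode number for `path`, assigning a fresh one on first lookup.
    fn lookup(&mut self, path: &Path) -> u64 {
        if let Some(&ino) = self.by_path.get(path) {
            return ino;
        }
        self.next_ino += 1;
        let ino = self.next_ino;
        self.by_ino.insert(ino, path.to_path_buf());
        self.by_path.insert(path.to_path_buf(), ino);
        ino
    }

    /// Records a hard link: a second path for an existing inode.
    /// The inode -> path direction still remembers only the first path.
    fn link(&mut self, ino: u64, new_path: &Path) {
        self.by_path.insert(new_path.to_path_buf(), ino);
    }

    /// The single path this table would report for an inode.
    fn path_of(&self, ino: u64) -> Option<&Path> {
        self.by_ino.get(&ino).map(|p| p.as_path())
    }
}
```

Because one inode maps back to one path, a file opened through a second hard link is still reported under its first name, which is why the handle is the safer key.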

Author

pedrocr commented Jan 16, 2018

I've switched over to a fully inode based structure that I can then associate handles to:

https://github.com/pedrocr/syncer/blob/6b1a36f811942af35550532b595e5a348dabb922/src/main.rs

This seems to work fine although I need to run some kind of test suite on it to check for corner cases. Rust's more modern features over C/C++ really help here. I created a set of with_path()/with_handle()/with_path_optional_handle() methods that abstract away all that stuff and allow the filesystem method implementations to be really simple.

For future reference it would probably help to add something to the FilesystemMT documentation about paths/handles. Something like:

  • If you're implementing a filesystem that does not allow removing files/directories or hard linking files, then you can ignore file and directory handles and just use the Paths for everything. fuse_mt keeps an internal mapping that solves everything for you.
  • If you've implemented any of those features you should return handles from open()/opendir() and use them exclusively (ignoring the path) in read()/write()/readdir(). In all the other calls that take an optional file handle you should use it if it is passed (and again ignore the path) and only if that's not the case use the path instead.
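
The second rule can be sketched as a single resolution helper: use the handle when one is passed, and fall back to the path otherwise. The `Fs` type, its maps and `resolve()` are hypothetical and are not the with_path_optional_handle() code mentioned above; only the `&Path` plus `Option<u64>` shape of chmod() comes from this thread.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

const ENOENT: i32 = 2; // Linux errno values
const EBADF: i32 = 9;

/// Hypothetical inode-based filesystem state.
#[derive(Default)]
struct Fs {
    inode_by_path: HashMap<PathBuf, u64>,
    inode_by_handle: HashMap<u64, u64>,
    modes: HashMap<u64, u32>,
}

impl Fs {
    /// Prefer the handle when present and ignore the path in that case.
    fn resolve(&self, path: &Path, fh: Option<u64>) -> Result<u64, i32> {
        match fh {
            Some(fh) => self.inode_by_handle.get(&fh).copied().ok_or(EBADF),
            None => self.inode_by_path.get(path).copied().ok_or(ENOENT),
        }
    }

    /// chmod-like operation that accepts an optional handle.
    fn chmod(&mut self, path: &Path, fh: Option<u64>, mode: u32) -> Result<(), i32> {
        let ino = self.resolve(path, fh)?;
        self.modes.insert(ino, mode);
        Ok(())
    }
}
```

Once the path entry has been removed, a chmod through a still-open handle keeps working, while a path-only chmod returns ENOENT.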

droundy commented Jan 16, 2018 via email

Author

pedrocr commented Jan 16, 2018

@droundy that's an interesting solution. Maybe a SimpleFilesystemMT trait that just removes all the handles as well as open()/opendir()/rmdir()/unlink()/link()/release()/releasedir(), while FilesystemMT drops the path from read()/write()/flush()/release()/fsync()/readdir()/releasedir()/fsyncdir()?

asomers commented Jan 9, 2019

The problem is worse than @pedrocr and @droundy realize. Some FUSE filesystems actually need inode numbers, because that's how they organize themselves. While I like the other things you've done with fuse-mt, converting inode numbers into paths makes it useless for such filesystems. Would you consider making the conversion optional? Perhaps methods could have signatures like this:

struct File {
    ...
}
impl File {
    fn path(&self) -> Option<&Path> {...}
    fn ino(&self) -> u64 {...}
    fn fh(&self) -> Option<u64> {...}
}
fn operation(
    &self,
    req: RequestInfo,
    file: File,
    args: SomethingElse,
) -> ResultData

Even better would be if the FilesystemMT::init method allowed the filesystem to return a flag indicating whether or not it wants pathname translation. Filesystems that don't care can disable it.
