Request: Cache whole file #10
Open

christianreiss opened this issue May 12, 2017 · 5 comments

Comments

@christianreiss

Hey!

Great job! :) In the age of cloud drives this will be very handy (and already is). Would it be possible to add a switch to make backfs cache entire files (regardless of size) upon first read?

Other people will surely have this use case too, if they don't already.
Cheers,
-Chris.

@wfraser
Owner

wfraser commented May 24, 2017

Thanks!

Unfortunately, adding such functionality would be hard to do correctly. Currently, each read blocks the caller while it fetches the block. Because the block size is typically quite small, this delay isn't very noticeable. But if we block the first read on fetching the entire file, which might be very large, then the stall would be very noticeable.

Of course, the proper fix would be to fetch the data asynchronously and complete the first read once the first block has been fetched, but this is complex to implement.
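In the meantime, you can approximate that from userspace by kicking off a background read of the whole file as soon as you open it; the background read warms blocks ahead of whatever you're doing in the foreground. A rough example (the path is just a placeholder):

  cat /mnt/backfs/path/to/file > /dev/null &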

I'm curious what your use case is. When I've needed to ensure entire files are loaded into the cache, I just do something like cat file > /dev/null beforehand. Would that not be sufficient for your use case?
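For a whole directory tree, the same idea works (the mount point here is hypothetical):

  find /mnt/backfs -type f -exec cat {} + > /dev/null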

@christianreiss
Author

Hey,

thanks for replying; I'm just about to head out, so I must be brief.
I am using Google Drive and Amazon Cloud Drive, which I mount on my home file server. I then use backfs to accelerate those mounts, which I then share out. So one use case is "remote" ISO files that need to be pulled over; that one I could work around by simply copying the file to my local PC.

I also have a set-top box that records TV locally and pushes all recordings into the cloud during the night. The STB has the cloud mounted via said server over CIFS. This setup would greatly benefit if read-ahead or cache-whole-file were implemented. The read-ahead would need to take effect upon the first access of the file. Read-ahead would be preferable, as caching the whole file would really be bad: you would open/stream a video file, backfs would start caching the whole (10 GB) file, and after 2-3 minutes you'd notice you have already seen this show, quit, open the next one, and another full cache would start...

The current state of backfs would not help at all with this setup:

  • I would need read-ahead of some sort
  • Acceleration must occur upon first access
  • Re-reading the same file will essentially never happen
  • Due to the immense size of the folder, it would be awesome if the directory hierarchy were cached locally, too.

Thanks for reading and your consideration!
-Chris.

@wfraser
Owner

wfraser commented May 24, 2017

As a quick workaround, you could also experiment with increasing the block size. The default is 128 KiB (0x20000 bytes); you could bump this up to several megabytes and effectively get very large read-ahead, though it may still stall periodically as I mentioned, depending on how userspace does its reads.

You can specify -o block_size=$((10*1024*1024)) when mounting backfs to get a 10 MiB block size, for example.

(note that you will have to delete your cache every time you change the block size)
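For reference, a full mount command would then look something along these lines (paths are placeholders, and option names other than block_size are from memory, so double-check against the README):

  backfs -o cache=/var/cache/backfs,block_size=$((10*1024*1024)) /path/to/clouddrive /mnt/backfs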

@christianreiss
Author

Hey,

Will try that when I get home. Would this affect first access as well, or 'only' subsequent accesses to the files?

Cheers,
-Chris

@wfraser
Owner

wfraser commented May 25, 2017

It'll mean the first access of each block will take much longer.

Say you set the block size to 10M. The kernel still issues individual read calls with pretty small buffers -- usually on the order of a few kbytes. Imagine a program reading a file sequentially from the start to the end. It'll go like this:

  1. first read, 4K at offset 0. This causes the first 10M of the file to be fetched, and the call blocks for a while until this is done.
  2. subsequent reads of 4K from offsets 4K to (10M - 4K): these are cache hits and complete basically immediately.
  3. next read of 4K at offset 10M. This causes the next 10M to be fetched, and blocks for a while.
    ... etc.

Of course this is just the first time you read that file. Afterwards, it'll be in the cache, and all these calls will be cache hits and it'll be read very quickly.
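If you want to observe that pattern, here's a rough way to do it from the shell with dd (paths are placeholders, and kernel readahead may blur the boundaries a bit):

  # First 4K at offset 0: fetches block 0 (10M), so this one stalls
  time dd if=/mnt/backfs/big.iso of=/dev/null bs=4k count=1
  # 4K at offset 400K: still inside block 0, cache hit, returns right away
  time dd if=/mnt/backfs/big.iso of=/dev/null bs=4k count=1 skip=100
  # 4K at offset 10M: fetches block 1, stalls again
  time dd if=/mnt/backfs/big.iso of=/dev/null bs=4k count=1 skip=2560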
