Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle failing mmap on FileReader. #867

Merged
merged 5 commits into from
Apr 3, 2024
Merged

Conversation

mgautierfr
Copy link
Collaborator

If we try to mmap a region to high (>4GB), makeMmappedBuffer will throw a MMapException. In this case, we want to do a read in the file instead of mmap.

This case is handled in MultiPartFileReader but was missing in FileReader.

Fix #866

@mgautierfr
Copy link
Collaborator Author

@BPerlakiH can you test this on your side ?

Copy link

codecov bot commented Mar 22, 2024

Codecov Report

Attention: Patch coverage is 76.92308% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 58.07%. Comparing base (48da5b8) to head (916afdf).

Files Patch % Lines
src/file_reader.cpp 71.42% 1 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #867      +/-   ##
==========================================
+ Coverage   58.00%   58.07%   +0.07%     
==========================================
  Files         101      101              
  Lines        4622     4623       +1     
  Branches     1923     1922       -1     
==========================================
+ Hits         2681     2685       +4     
+ Misses        667      665       -2     
+ Partials     1274     1273       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@BPerlakiH
Copy link

@mgautierfr I have tested with this branch by replacing it in kiwix-build:

diff --git a/kiwixbuild/dependencies/libzim.py b/kiwixbuild/dependencies/libzim.py
index a57e69e..88dfbd6 100644
--- a/kiwixbuild/dependencies/libzim.py
+++ b/kiwixbuild/dependencies/libzim.py
@@ -9,7 +9,7 @@ class Libzim(Dependency):
     class Source(GitClone):
         git_remote = "https://github.com/openzim/libzim.git"
         git_dir = "libzim"
-        git_ref = "trace_mmap_macos"
+        git_ref = "fix_macos_mmap"
 
     class Builder(MesonBuilder):
         test_options = ["-t", "8"]

and it is still crashing at the same point with:

mmap with flags:2 offest:0 size:80
munmap 4780654592 size:80
pageAlignedOffset (8416296960) is too big

@mgautierfr
Copy link
Collaborator Author

This branch doesn't have traces. So if you have this output you probably not testing the right thing.

@kelson42
Copy link
Contributor

@mgautierfr Are we not impacted by kiwix/kiwix-build#32? Are we sure that kiwix-build has the HEAD of the proper branch?

@mgautierfr
Copy link
Collaborator Author

@BPerlakiH you have to checkout yourself the right branch (but as you already done it for trace_mmap_macos, I suspect you are good with this)

Are we not impacted by kiwix/kiwix-build#32?

No. A checkout to the right branch and a build (of libzim AND libkiwix (as libkiwix is statically linked to libzim)) and you should be ok (kiwix-build libkiwix --config ... already compile all dependencies of libkiwix by default)

Are we sure that kiwix-build has the HEAD of the proper branch?

If there is no modification of the code (clean working directory) and kiwix-build is configured to get the branch you are on (see kiwix/kiwix-build@2359d09 for example), kiwix-build update the source code automatically

Of course, all this is true if you don't ask to not do it (--skip-source-prepare, --build-nodeps)

@mgautierfr
Copy link
Collaborator Author

@veloman-yunkan please review without waiting confirmation for @BPerlakiH, I'm pretty confident it is a real bug and we should merge it, whatever it fixes the linked issue or not.

Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reviewing this PR, I seem to have found another bug. I believe its fix should be included in this PR (and the title and description of the PR properly updated).

Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And what do you think of introducing a base class for FileReader and MultiPartFileReader that will provide a partial implementation of the get_buffer() method in terms of a new virtual function get_mmapped_buffer()? Such a class can also be a common place for the data members _offset and _size and their accessor methods.

@BPerlakiH
Copy link

@BPerlakiH you have to checkout yourself the right branch (but as you already done it for trace_mmap_macos, I suspect you are good with this)

@mgautierfr I am not sure about this part. I did as described earlier, changing the git_ref in the file: kiwixbuild/dependencies/libzim.py.
I did git clean -dfx before to delete the old build files. I did clean derived data on Xcode, but I am still getting the same results, it is still tracing and still crashing. I tried it 3 or 4 times with different combinations of clearing things first...

Maybe we need a more step by step instructions how to build this the right way, or if you could send me a complete xcframework built with your changes, that would be even easier for me to test.

@rgaudin
Copy link
Member

rgaudin commented Mar 29, 2024

When @mgautierfr mentions a clean state, I believe you need to remove SOURCE/libzim and SOURCE/libkiwix as well. you can check the git branch in those folders post-build to verify which branch was actually used

@mgautierfr
Copy link
Collaborator Author

I have launch a new build for provide you a xcframework.

I am not sure about this part. I did as described earlier, changing the git_ref in the file: kiwixbuild/dependencies/libzim.py.

You should not. The git_ref is used:

  • When no source is set, kiwix-build will checkout the git_ref.
  • When source is set, it will update the source ONLY if our git directory is clean and you haven't not change the branch (git_ref == current git branch). (The idea is that it update everything you haven't touch and don't remove your changes).

If you change the git_ref after a first run, git_ref != current git branch and update is not made (You should have a warning in the output).

You can directly goes into SOURCE/libzim and run git remote update && git pull fix_macos_mmap. If you don't have change git_ref, git_ref != fix_macos_mmap, kiwix-build will not change the source, so you will compile fix_macos_mmap.

Or you can remove SOURCE/libzim, (and update git_ref) so kiwix-build will download again the source and switch to git_ref before compiling.

@BPerlakiH
Copy link

Thank you @mgautierfr ,
I can confirm it is fixing the issue!
See below, "Canadian Prepper" now opens correctly:
Screenshot 2024-03-29 at 10 59 37 Medium

@mgautierfr
Copy link
Collaborator Author

And what do you think of introducing a base class for FileReader and MultiPartFileReader that will provide a partial implementation of the get_buffer() method in terms of a new virtual function get_mmapped_buffer()? Such a class can also be a common place for the data members _offset and _size and their accessor methods.

I think well about this. I will do it.

@mgautierfr
Copy link
Collaborator Author

I will do it.

Done. Ready for another review.

Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only one slight concern. If it is unjustified paranoia then this is an approval.

If we try to mmap a region to high (>4GB), `makeMmappedBuffer` will
throw a `MMapException`. In this case, we want to do a read in the file
instead of mmap.

This case is handled in `MultiPartFileReader` but was missing in
`FileReader`.

Fix #866
This commit move only the _size and _offset (and accessors) in
BaseFileReader.
Copy link
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Rebase&fixup is due.

BaseFileReader is now responsible for trying to get a mmap and then
fallback to reading in file.
@mgautierfr
Copy link
Collaborator Author

LGTM. Rebase&fixup is due.

Done

@mgautierfr mgautierfr merged commit d7a8cac into main Apr 3, 2024
29 of 30 checks passed
@mgautierfr mgautierfr deleted the fix_macos_mmap branch April 3, 2024 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LibZim 9.1 under macOS throws exception that cannot be caught
5 participants