Why did our ib_read_bw and ib_write_bw tests succeed without FFO installed? And why can't drivers be found after we install libibverbs? #10
Comments
Did you install the Mellanox OFED driver outside the container and mount the user-space driver path into the container, e.g. with -v /sys/class/:/sys/class/ ? You can find this in the command line given in README.md. Do you have /sys/class/infiniband_verbs/uverbs0 ?
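The check asked for above can be scripted from inside the container. This is only a minimal sketch: the helper name `list_uverbs_devices` is hypothetical, and it assumes the standard sysfs layout in which each `uverbsN` directory holds an `abi_version` file.

```python
from pathlib import Path


def list_uverbs_devices(sysfs_root="/sys/class/infiniband_verbs"):
    """Return {device_name: abi_version or None} for every uverbsN entry.

    An empty dict means no uverbs device is visible, which inside a
    container usually points at a missing /sys/class mount.
    """
    root = Path(sysfs_root)
    devices = {}
    if not root.is_dir():
        return devices
    for entry in sorted(root.glob("uverbs*")):
        try:
            # The per-device ABI version is exposed as a plain integer.
            devices[entry.name] = int((entry / "abi_version").read_text().strip())
        except (OSError, ValueError):
            # File missing or unreadable: report the device without a version.
            devices[entry.name] = None
    return devices


if __name__ == "__main__":
    found = list_uverbs_devices()
    if not found:
        print("no uverbs devices visible; check the /sys/class mount")
    for name, abi in found.items():
        print(f"{name}: abi_version={abi}")
```

Running it on the host and in the container and comparing the two outputs shows whether the mount is the problem or the driver is.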
We have found out why it fails to find devices: the abi_version of the Mellanox NICs we use is 1, which is outside the 3 to 4 range, so they need libmlx5 rather than libmlx4. However, we have to use these higher-version Mellanox NICs. Anyway, thank you for your attention.
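The range check described above amounts to the following. It is a sketch only: the bounds 3 and 4 are taken from this comment and have not been verified against the libmlx4 source, and the names are hypothetical.

```python
# Bounds as quoted in the comment above; an assumption, not a verified constant.
ASSUMED_MIN_ABI = 3
ASSUMED_MAX_ABI = 4


def abi_supported(device_abi, min_abi=ASSUMED_MIN_ABI, max_abi=ASSUMED_MAX_ABI):
    """Return True if the device ABI version falls inside the accepted range."""
    return min_abi <= device_abi <= max_abi


if __name__ == "__main__":
    for abi in (1, 3, 4):
        status = "accepted" if abi_supported(abi) else "rejected"
        print(f"abi_version={abi}: {status}")
```

With these bounds a device reporting abi_version 1 is rejected, which would match the "no driver found" symptom reported here.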
@ling0329 I think that in this implementation they hardcoded the one-sided mapping information in the code.
Also, it looks like you are trying to run FreeFlow with newer NICs as well. Did you get it to work? And can you share which versions of Ubuntu and OFED you are running (both in the container and on the host OS)?
I see. Thanks for sharing your setup.
The current architecture of FreeFlow works only with libmlx4. It is possible to use the LD_PRELOAD trick to implement a solution that works across driver versions by intercepting the relevant calls. However, that requires quite a bit of effort, and all the authors of this project are now busy with something else...
Section 4.3, which discusses one-sided operations, names two problems in supporting them. The first is that the local FFR does not know the corresponding s-mem on the other side. To solve this, FreeFlow builds a central key-value store in FFO so that all FFRs can learn the mapping between mem's pointer in the application's virtual memory space and the corresponding s-mem's pointer in the FFR's virtual memory space. However, our ib_read_bw and ib_write_bw tests all succeeded without FFO installed, and we don't know how to install FFO anyway.
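To make the mapping concrete, here is a purely illustrative sketch of such a lookup table. It is not FreeFlow's code: the class `SMemDirectory`, its methods, and the idea of translating addresses by offset within a registered region are all assumptions made for this example.

```python
class SMemDirectory:
    """Hypothetical stand-in for the central key-value store in FFO.

    Maps a region of an application's virtual memory (mem) to the base
    address of the matching shadow region (s-mem) in the FFR.
    """

    def __init__(self):
        # container_id -> list of (app_base, length, smem_base)
        self._regions = {}

    def register(self, container_id, app_base, length, smem_base):
        """Record that [app_base, app_base + length) is shadowed at smem_base."""
        self._regions.setdefault(container_id, []).append(
            (app_base, length, smem_base)
        )

    def translate(self, container_id, app_addr):
        """Translate an application address to the matching s-mem address."""
        for app_base, length, smem_base in self._regions.get(container_id, []):
            if app_base <= app_addr < app_base + length:
                return smem_base + (app_addr - app_base)
        raise KeyError(f"no s-mem registered for {container_id} at {app_addr:#x}")


if __name__ == "__main__":
    directory = SMemDirectory()
    directory.register("container-a", 0x1000, 0x100, 0x9000)
    print(hex(directory.translate("container-a", 0x1010)))
```

If the mapping were hardcoded in the router instead of being served by a store like this, one-sided tests could succeed without FFO, which is the behaviour described in this issue.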
Note that all of our ib_send/read/write_bw tests run in rdma_cm mode, because when we install libibverbs we get the warning 'no userspace device-specific driver found'.
So we installed only libmlx4 and librdmacm, and all tests use the standard RDMA libibverbs. As a result, tests in non-rdma_cm mode do not go through the router.
Have you met this problem before? We tried to solve it and found that the function try_driver in init.c fails to find drivers when executing
We then suspected driver initialization and traced the failure to the function mlx4_driver_init, defined in mlx4.c in libmlx4. We also found that many lines have been cut from mlx4.c, which confused us. The problem we finally located is in the following code: it never reaches 'goto found', so it hits 'return NULL' early.
But why? Why doesn't rdma_cm mode hit this problem, while with libibverbs installed both modes are affected?
Looking forward to your answer!