Fix a problem with faiss simdlib use #153


alexanderguzhva
Collaborator

related issue: #149

The problem is very tricky.
FAISS contains several implementations of simdlib, a utility that wraps a SIMD register in a more or less platform-independent type. Which implementation is used depends on the platform, for example utils/simdlib_avx2.h or utils/simdlib_neon.h, and I will add utils/simdlib_avx512.h at some point :) If no specialized version exists for a platform, utils/simdlib_emulated.h is used. It is slow, but it works.

I've been seeing completely inexplicable SIGSEGV crashes. For example:

I1016 17:54:05.701352 35219 index.cc:184] [KNOWHERE][Deserialize][knowhere_tests] Deserialize config dump: {"dim":128,"k":5,"metric_type":"L2","nlist":16,"nprobe":14,"radius":10.0,"range_filter":0.0,"reorder_k":500,"with_raw_data":true}
I1016 17:54:05.702730 35219 index.cc:30] [KNOWHERE][LoadConfig][knowhere_tests] Search config dump: {"dim":128,"k":5,"metric_type":"L2","nlist":16,"nprobe":14,"radius":10.0,"range_filter":0.0,"reorder_k":500,"with_raw_data":true}

Thread 21 "Knowhere_Search" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff8113f640 (LWP 35241)]
0x00007ffff7198dfe in faiss::simd16uint16::unary_func<faiss::simd16uint16::operator>>(int) const::{lambda(unsigned short)#1}>(faiss::simd16uint16 const&, faiss::simd16uint16::operator>>(int) const::{lambda(unsigned short)#1}&&) (a=..., f=...) at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/utils/simdlib_emulated.h:146
146	            c.u16[j] = f(a.u16[j]);
(gdb) bt
#0  0x00007ffff7198dfe in faiss::simd16uint16::unary_func<faiss::simd16uint16::operator>>(int) const::{lambda(unsigned short)#1}>(faiss::simd16uint16 const&, faiss::simd16uint16::operator>>(int) const::{lambda(unsigned short)#1}&&) (a=..., f=...)
    at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/utils/simdlib_emulated.h:146
#1  0x00007ffff71976bb in faiss::simd16uint16::operator>> (this=0x4, shift=-168430091)
    at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/utils/simdlib_emulated.h:176
#2  0x00007ffff73b7b62 in faiss::(anonymous namespace)::kernel_accumulate_block<1, faiss::simd_result_handlers::FixedStorageHandler<1, 2>, faiss::DummyScaler> (
    nsq=64, codes=0x6190000087a0 "\a\017", LUT=0x619000030280 "U\025ւEY\f{l\220\245\"4\030", res=..., scaler=...)
    at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/impl/pq4_fast_scan_search_qbs.cpp:58
#3  0x00007ffff75a24cf in faiss::(anonymous namespace)::accumulate_q_4step<1, faiss::simd_result_handlers::ReservoirHandler<faiss::CMax<unsigned short, long>, true>, faiss::DummyScaler> (ntotal2=16, nsq=64, codes=0x619000008780 "\n\r", LUT0=0x619000030280 "U\025ւEY\f{l\220\245\"4\030", res=..., scaler=...)
    at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/impl/pq4_fast_scan_search_qbs.cpp:129
#4  0x00007ffff770199e in faiss::pq4_accumulate_loop_qbs<faiss::simd_result_handlers::ReservoirHandler<faiss::CMax<unsigned short, long>, true>, faiss::DummyScaler> (qbs=1, ntotal2=16, nsq=64, codes=0x619000008780 "\n\r", LUT0=0x619000030280 "U\025ւEY\f{l\220\245\"4\030", res=..., scaler=...)
    at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/impl/pq4_fast_scan_search_qbs.cpp:240
#5  0x00007ffff6f6a44f in faiss::IndexIVFFastScan::search_implem_12<faiss::CMax<unsigned short, long>, faiss::DummyScaler> (this=0x617000002a80, n=1, 
    x=0x622000006900, k=500, distances=0x61d000091a80, labels=0x621000091100, impl=13, ndis_out=0x7fff801404f0, nlist_out=0x7fff80140510, scaler=..., 
    params=0x7fff801402d0) at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/IndexIVFFastScan.cpp:965
#6  0x00007ffff6f6264a in faiss::IndexIVFFastScan::search_dispatch_implem<true, faiss::DummyScaler> (this=0x617000002a80, n=1, x=0x622000006900, k=500, 
    distances=0x61d000091a80, labels=0x621000091100, scaler=..., params=0x7fff801402d0)
    at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/IndexIVFFastScan.cpp:399
#7  0x00007ffff6ec8d21 in faiss::IndexIVFFastScan::search (this=0x617000002a80, n=1, x=0x622000006900, k=500, distances=0x61d000091a80, labels=0x621000091100, 
    params_in=0x7fff801402d0) at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/IndexIVFFastScan.cpp:324
#8  0x00007ffff6a11b6f in faiss::IndexScaNN::search (this=0x607000029b50, n=1, x=0x622000006900, k=5, distances=0x611000003510, labels=0x6140000032e0, 
    params_in=0x7fff80140320) at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/thirdparty/faiss/faiss/IndexScaNN.cpp:120
#9  0x00007ffff66ba995 in knowhere::IvfIndexNode<faiss::IndexScaNN>::Search(knowhere::DataSet const&, knowhere::Config const&, knowhere::BitsetView const&) const::{lambda()#1}::operator()() const (this=0x60b000003e20) at /home/nop/projects/zilliz/20230911/upgrade_faiss/knowhere_upgr/knowhere/src/index/ivf/ivf.cc:469
#10 0x00007ffff66b9dd9 in knowhere::ThreadPool::push<knowhere::IvfIndexNode<faiss::IndexScaNN>::Search(knowhere::DataSet const&, knowhere::Config const&, knowhere::BitsetView const&) const::{lambda()#1}>(knowhere::IvfIndexNode<faiss::IndexScaNN>::Search(knowhere::DataSet const&, knowhere::Config const&, knowhere::BitsetView const&) const::{lambda()#1}&&)::{lambda(auto:1&&)#1}::operator()<folly::Try<folly::Unit> >(knowhere::IvfIndexNode<faiss::IndexScaNN>::Search(knowhere::DataSet const&, knowhere::Config const&, knowhere::BitsetView const&) const::{lambda()#1}&&) (this=0x60b000003e20)

The problem turned out to be caused by mixing simdlib implementations for different platforms in the same project. At least, the SIGSEGV above was seen when simdlib_emulated was used together with simdlib_avx2. This could be partly mitigated by turning the operators of the simdlib classes into friend operators. But the fix in this PR is much cleaner: I'm moving PQFastScan back from AVX2 to baseline.
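
The exact mechanism is not spelled out here. One plausible reading, assumed for this sketch, is that both backends define a class with the same qualified name (`faiss::simd16uint16`) but a different layout, so inline members compiled in a baseline translation unit and in an AVX2 translation unit share one symbol name while expecting different objects. The sketch below uses two hypothetical namespaces, `simd_emulated` and `simd_wide`, purely to show the layout difference safely in one file. `alignas(32)` stands in for a real 256-bit register type, and `shifted_lane` is an illustrative helper.

```cpp
#include <cstdint>

// Hypothetical miniature of an emulated backend: plain array, scalar loop.
namespace simd_emulated {
struct simd16uint16 {
    uint16_t u16[16];
    simd16uint16 operator>>(int shift) const {
        simd16uint16 c;
        for (int j = 0; j < 16; j++) {
            c.u16[j] = static_cast<uint16_t>(u16[j] >> shift);
        }
        return c;
    }
};
} // namespace simd_emulated

// Hypothetical stand-in for a vector-register backend: same class name and
// size, but 32-byte alignment, as a 256-bit register type would require.
namespace simd_wide {
struct alignas(32) simd16uint16 {
    uint16_t u16[16];
    simd16uint16 operator>>(int shift) const {
        simd16uint16 c;
        for (int j = 0; j < 16; j++) {
            c.u16[j] = static_cast<uint16_t>(u16[j] >> shift);
        }
        return c;
    }
};
} // namespace simd_wide

// Same size, different alignment: the two types are not interchangeable.
static_assert(sizeof(simd_emulated::simd16uint16) ==
                      sizeof(simd_wide::simd16uint16),
              "both hold 16 lanes of 16 bits");
static_assert(alignof(simd_emulated::simd16uint16) !=
                      alignof(simd_wide::simd16uint16),
              "layout requirements differ between the backends");

// Fills all 16 lanes with `value`, shifts right, and returns one lane.
template <class V>
uint16_t shifted_lane(uint16_t value, int shift, int lane) {
    V v{};
    for (int j = 0; j < 16; j++) {
        v.u16[j] = value;
    }
    return (v >> shift).u16[lane];
}
```

Because the two types live in different namespaces here, their `operator>>` functions get different symbol names and cannot be confused at link time. If both were declared under the same name, the program would violate the one-definition rule and its behavior would be undefined, which would be consistent with the nonsensical `this=0x4` in frame #1 above.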

Technically, PQFastScan works correctly without AVX2; it will just be too slow to be practical.

/kind improvement

@Presburger
Collaborator

I also encountered this SIGSEGV issue while using gcc-13 on Arch Linux.

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexanderguzhva, Presburger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chasingegg
Collaborator

@Presburger @alexanderguzhva But we still need a fast version of ScaNN... we need a better workaround here.

@chasingegg
Collaborator

/hold

@chasingegg
Collaborator

We can close this PR now that #155 is merged.

@chasingegg chasingegg closed this Oct 19, 2023