-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Knowhere support multi data type #210
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cqy123456 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@cqy123456 🔍 Important: PR Classification Needed! For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:
For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”. Thanks for your efforts and contribution to the community!. |
30e7f15
to
4b272a5
Compare
include/knowhere/dataset.h
Outdated
@@ -60,6 +60,11 @@ class DataSet { | |||
} | |||
} | |||
|
|||
std::shared_ptr<const DataSet> | |||
Get() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unneeded method, just use shared_from_this();
@@ -21,23 +21,52 @@ | |||
namespace knowhere { | |||
class IndexFactory { | |||
public: | |||
template <typename DataType> | |||
Index<IndexNode> | |||
Create(const std::string& name, const int32_t& version, const Object& object = nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the advantage of templatizing methods instead of using
class IndexFactoryBase {
virtual Index<IndexNode> Create(...) = 0;
...
}
template <typename DataType> class IndexFactory : public IndexFactoryBase {
Index<IndexNode> Create(...) override;
...
};
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because index factory is the entry of all indices, it should not be a template class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try to Restricting Template Parameter Types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
}; // namespace | ||
|
||
namespace knowhere { | ||
using fp32 = float; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of adding fp32
? Basically, fp16
and bf16
were added, because C++ cannot natively handle them, I understand that. Here, you're literally saying #define fp32 float
and I can't imagine that this particular line will be changed ever.
So, please feel free to keep it, if appropriate, just make sure that you're not overcomplicating the code :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to unify the names of various data types.
@@ -24,7 +24,7 @@ | |||
|
|||
namespace knowhere { | |||
|
|||
class DataSet { | |||
class DataSet : public std::enable_shared_from_this<const DataSet> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a simpler solution for this. Just pass std::shared_ptr<DataSet>
or std::shared_ptr<DataSet>&
only across the code (such as function parameters) and get rid of passing DataSet&
ones
src/index/flat/flat.cc
Outdated
auto cur_i_dis = reinterpret_cast<int32_t*>(cur_dis); | ||
|
||
faiss::SearchParameters search_params; | ||
search_params.sel = id_selector; | ||
|
||
index_->search(1, (const uint8_t*)x + index * dim / 8, k, cur_i_dis, cur_ids, &search_params); | ||
index_->search(1, (const DataType*)x + index * dim / 8, k, cur_i_dis, cur_ids, &search_params); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty confusing piece of code, because it literally implies that DataType
is uint8_t
because of the following dim / 8
, and if DataType
is uint32_t
, then this code will fail.
Same note for this whole file.
Maybe, put an Index that uses faiss::IndexBinaryFlat
into a separate class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Serialize datatype. (Don't forget to take compatibility into consideration)
- Simplify wrapper creation by directly transferring data into fp32. (More comments needed)
927d315
to
1cbb332
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cqy123456: 1 invalid OWNERS file
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
OWNERS
Outdated
@@ -1,32 +1,12 @@ | |||
filters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cannot parse file: error converting YAML to JSON: yaml: mapping values are not allowed in this context.
1cbb332
to
25f0abe
Compare
bb74446
to
288c2d2
Compare
} | ||
}; | ||
|
||
struct bf16 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while c++23 will have std::float16_t and std::bfloat16_t, leave some macro for the future support of this.
791279e
to
d79c576
Compare
57654a4
to
8435c40
Compare
@@ -18,10 +18,10 @@ | |||
#include "knowhere/dataset.h" | |||
#include "knowhere/expected.h" | |||
#include "knowhere/object.h" | |||
#include "knowhere/operands.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used in all index_node.cc
src/index/hnsw/hnsw.cc
Outdated
@@ -107,7 +113,7 @@ class HnswIndexNode : public IndexNode { | |||
expected<DataSetPtr> | |||
Search(const DataSet& dataset, const Config& cfg, const BitsetView& bitset) const override { | |||
if (!index_) { | |||
LOG_KNOWHERE_WARNING_ << "search on empty index"; | |||
LOG_KNOWHERE_WARNING_ << "search on empty indefloatx"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
src/index/hnsw/hnsw.cc
Outdated
public: | ||
using DistType = typename ResultType<DataType>::type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any other expected type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe support uint8, and DistType is uint32 in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's don't reserve anything for the future for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
88e9606
to
92c0aeb
Compare
src/index/hnsw/hnsw.cc
Outdated
public: | ||
using DistType = typename ResultType<DataType>::type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's don't reserve anything for the future for now.
return Index<IndexNodeThreadPoolWrapper>::Create(std::make_unique<GpuRaftIvfFlatIndexNode>(version, object), | ||
cuda_concurrent_size); | ||
}); | ||
KNOWHERE_REGISTER_GLOBAL( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make a macro for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a macro function for IndexNodeThreadPoolWrapper
tests/ut/test_bruteforce.cc
Outdated
@@ -38,7 +38,7 @@ TEST_CASE("Test Brute Force", "[float vector]") { | |||
}; | |||
|
|||
SECTION("Test Search") { | |||
auto res = knowhere::BruteForce::Search(train_ds, query_ds, conf, nullptr); | |||
auto res = knowhere::BruteForce::Search<float>(train_ds, query_ds, conf, nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we stick on fp32
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
tests/ut/test_diskann.cc
Outdated
@@ -52,12 +52,13 @@ constexpr float kL2RangeAp = 0.9; | |||
constexpr float kIpRangeAp = 0.9; | |||
constexpr float kCosineRangeAp = 0.9; | |||
|
|||
template <typename T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Let's avoid vague type name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
tests/ut/test_diskann.cc
Outdated
@@ -144,7 +145,9 @@ TEST_CASE("Invalid diskann params test", "[diskann]") { | |||
fs::remove(kDir); | |||
} | |||
|
|||
TEST_CASE("Test DiskANNIndexNode.", "[diskann]") { | |||
template <typename data_type> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Use capital letters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
tests/ut/test_feder.cc
Outdated
@@ -173,11 +173,11 @@ TEST_CASE("Test Feder", "[feder]") { | |||
const auto query_ds = GenDataSet(nq, dim, seed); | |||
|
|||
const knowhere::Json conf = base_gen(); | |||
auto gt = knowhere::BruteForce::Search(train_ds, query_ds, conf, nullptr); | |||
auto gt = knowhere::BruteForce::Search<float>(train_ds, query_ds, conf, nullptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same, can we use fp32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update
template<typename IN, typename OUT> | ||
void convert_types_in_file(const std::string& inFileName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used in diskann,remove it first and add it next time.
dfb2ecb
to
e1ca8c1
Compare
040392b
to
88a91e2
Compare
Signed-off-by: cqy123456 <[email protected]> Add (b)float16 interface for pyknowhere Signed-off-by: Writer-X <[email protected]>
88a91e2
to
0dab512
Compare
/lgtm |
Signed-off-by: Yudong Cai <[email protected]>
issue: #287
/kind feature