Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Knowhere support multi data type #210

Merged
merged 1 commit into from
Jan 9, 2024

Conversation

cqy123456
Copy link
Collaborator

@cqy123456 cqy123456 commented Nov 23, 2023

issue: #287
/kind feature

@sre-ci-robot sre-ci-robot requested a review from liliu-z November 23, 2023 03:55
@sre-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cqy123456

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

mergify bot commented Nov 23, 2023

@cqy123456 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!.

@cqy123456 cqy123456 force-pushed the template_data branch 3 times, most recently from 30e7f15 to 4b272a5 Compare November 23, 2023 07:56
@@ -60,6 +60,11 @@ class DataSet {
}
}

std::shared_ptr<const DataSet>
Get() const {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded method, just use shared_from_this();

@@ -21,23 +21,52 @@
namespace knowhere {
class IndexFactory {
public:
template <typename DataType>
Index<IndexNode>
Create(const std::string& name, const int32_t& version, const Object& object = nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the advantage of templatizing methods instead of using

class IndexFactoryBase { 
   virtual Index<IndexNode> Create(...) = 0; 
   ...
}

template <typename DataType> class IndexFactory : public IndexFactoryBase {
    Index<IndexNode> Create(...) override;
    ...
};

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because index factory is the entry of all indices, it should not be a template class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to Restricting Template Parameter Types.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

}; // namespace

namespace knowhere {
using fp32 = float;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of adding fp32? Basically, fp16 and bf16 were added, because C++ cannot natively handle them, I understand that. Here, you're literally saying #define fp32 float and I can't imagine that this particular line will be changed ever.

So, please feel free to keep it, if appropriate, just make sure that you're not overcomplicating the code :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to unify the names of various data types.

@@ -24,7 +24,7 @@

namespace knowhere {

class DataSet {
class DataSet : public std::enable_shared_from_this<const DataSet> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a simpler solution for this. Just pass std::shared_ptr<DataSet> or std::shared_ptr<DataSet>& only across the code (such as function parameters) and get rid of passing DataSet& ones

auto cur_i_dis = reinterpret_cast<int32_t*>(cur_dis);

faiss::SearchParameters search_params;
search_params.sel = id_selector;

index_->search(1, (const uint8_t*)x + index * dim / 8, k, cur_i_dis, cur_ids, &search_params);
index_->search(1, (const DataType*)x + index * dim / 8, k, cur_i_dis, cur_ids, &search_params);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty confusing piece of code, because it literally implies that DataType is uint8_t because of the following dim / 8, and if DataType is uint32_t, then this code will fail.
Same note for this whole file.
Maybe, put an Index that uses faiss::IndexBinaryFlat into a separate class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Copy link
Collaborator

@liliu-z liliu-z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Serialize datatype. (Don't forget to take compatibility into consideration)
  2. Simplify wrapper creation by directly transferring data into fp32. (More comments needed)

Copy link
Collaborator

@sre-ci-robot sre-ci-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cqy123456: 1 invalid OWNERS file

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

OWNERS Outdated
@@ -1,32 +1,12 @@
filters:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot parse file: error converting YAML to JSON: yaml: mapping values are not allowed in this context.

}
};

struct bf16 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while c++23 will have std::float16_t and std::bfloat16_t, leave some macro for the future support of this.

@cqy123456 cqy123456 force-pushed the template_data branch 2 times, most recently from 791279e to d79c576 Compare December 20, 2023 06:29
@cqy123456 cqy123456 force-pushed the template_data branch 2 times, most recently from 57654a4 to 8435c40 Compare January 4, 2024 07:24
@@ -18,10 +18,10 @@
#include "knowhere/dataset.h"
#include "knowhere/expected.h"
#include "knowhere/object.h"
#include "knowhere/operands.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant change

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used in all index_node.cc

@@ -107,7 +113,7 @@ class HnswIndexNode : public IndexNode {
expected<DataSetPtr>
Search(const DataSet& dataset, const Config& cfg, const BitsetView& bitset) const override {
if (!index_) {
LOG_KNOWHERE_WARNING_ << "search on empty index";
LOG_KNOWHERE_WARNING_ << "search on empty indefloatx";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

public:
using DistType = typename ResultType<DataType>::type;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any other expected type?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe support uint8, and DistType is uint32 in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's don't reserve anything for the future for now.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@cqy123456 cqy123456 force-pushed the template_data branch 2 times, most recently from 88e9606 to 92c0aeb Compare January 4, 2024 09:14
public:
using DistType = typename ResultType<DataType>::type;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's don't reserve anything for the future for now.

return Index<IndexNodeThreadPoolWrapper>::Create(std::make_unique<GpuRaftIvfFlatIndexNode>(version, object),
cuda_concurrent_size);
});
KNOWHERE_REGISTER_GLOBAL(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make a macro for this

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a macro function for IndexNodeThreadPoolWrapper

@@ -38,7 +38,7 @@ TEST_CASE("Test Brute Force", "[float vector]") {
};

SECTION("Test Search") {
auto res = knowhere::BruteForce::Search(train_ds, query_ds, conf, nullptr);
auto res = knowhere::BruteForce::Search<float>(train_ds, query_ds, conf, nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we stick on fp32 here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -52,12 +52,13 @@ constexpr float kL2RangeAp = 0.9;
constexpr float kIpRangeAp = 0.9;
constexpr float kCosineRangeAp = 0.9;

template <typename T>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Let's avoid vague type name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@@ -144,7 +145,9 @@ TEST_CASE("Invalid diskann params test", "[diskann]") {
fs::remove(kDir);
}

TEST_CASE("Test DiskANNIndexNode.", "[diskann]") {
template <typename data_type>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Use capital letters

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -173,11 +173,11 @@ TEST_CASE("Test Feder", "[feder]") {
const auto query_ds = GenDataSet(nq, dim, seed);

const knowhere::Json conf = base_gen();
auto gt = knowhere::BruteForce::Search(train_ds, query_ds, conf, nullptr);
auto gt = knowhere::BruteForce::Search<float>(train_ds, query_ds, conf, nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: same, can we use fp32

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update

Comment on lines 840 to 841
template<typename IN, typename OUT>
void convert_types_in_file(const std::string& inFileName,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the usage?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used in diskann,remove it first and add it next time.

@cqy123456 cqy123456 force-pushed the template_data branch 2 times, most recently from dfb2ecb to e1ca8c1 Compare January 5, 2024 03:51
@cqy123456 cqy123456 force-pushed the template_data branch 4 times, most recently from 040392b to 88a91e2 Compare January 9, 2024 09:42
Signed-off-by: cqy123456 <[email protected]>

Add (b)float16 interface for pyknowhere

Signed-off-by: Writer-X <[email protected]>
@liliu-z
Copy link
Collaborator

liliu-z commented Jan 9, 2024

/lgtm

@mergify mergify bot added the ci-passed label Jan 9, 2024
@sre-ci-robot sre-ci-robot merged commit 4e8244e into zilliztech:main Jan 9, 2024
9 checks passed
cydrain added a commit to cydrain/knowhere that referenced this pull request Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants