NGram index creation/usage example in the documentation #8457

Closed
MiklosPathy opened this issue Feb 11, 2025 · 6 comments
Labels
8.x Relates to 8.x client version Category: Question

Comments

@MiklosPathy

Is your feature request related to a problem? Please describe.
Actually, yes. I would like to create such an index on a field, but I don't know enough to do it with the .NET API.

Describe the solution you'd like
A section in the documentation with this particular example, maybe more advanced examples.

Describe alternatives you've considered
I tried to search for something like this. I found a similar question on Stack Overflow, but the author gave up and created his index through the web API with raw JSON. ChatGPT hallucinates something based on the old 7.x API, but it is unusable with 8.x. The closest I got was creating an analyzer, but I was not able to bind it to a field.

Additional context
Nothing special, just a code snippet. For example: here is a type: Document, it has a field DocText, this is how you create the index, with NGram analyser, this is how you use it for searching.

@flobernd
Member

Hey @MiklosPathy, could you please post the JSON request for creating the desired index?

@flobernd flobernd added 8.x Relates to 8.x client version Category: Question and removed Category: Feature labels Feb 11, 2025
@MiklosPathy
Author

> Hey @MiklosPathy, could you please post the JSON request for creating the desired index?

Unfortunately, I am not at that level with Elasticsearch yet. Any example would do; I (and probably many others) will figure out the rest.

@flobernd
Member

@MiklosPathy

> here is a type: Document, it has a field DocText

Create an index with field mappings

> this is how you use it for searching

Execute search queries

> this is how you create the index, with NGram analyser

Could you at least link to a corresponding example in the Elasticsearch REST API or paste the equivalent 7.x NEST code?

It's hard to give you an example without knowing what exactly you want to achieve.

Do you want the NGram analyzer to be used on index level? Or do you want that to apply to a single field only, etc.?

@MiklosPathy
Author

@flobernd
OK, let's try it this way.

> Could you at least link to a corresponding example in the Elasticsearch REST API or paste the equivalent 7.x NEST code?

No, I am a total newbie to Elasticsearch, so I have no idea how 7.x worked, how the REST API works, or how Elasticsearch works in general (OK, on the last point I am starting to get some understanding). All I can provide is what I have "achieved" so far.

> It's hard to give you an example without knowing what exactly you want to achieve.

I understand. It can be any theoretical example, including my case. Let it be my particular case.

So, I am trying to put an NGram analyzer/tokenizer/index (I don't really know the terms yet, or why they are necessary) on a field, to get some sort of usable full-text search experience. (NGram looks like it does the trick, for now.) For that, I have a simple collection (index?) with the type ESTextDoc:

    public class ESTextDoc
    {
        public string Text { get; set; } = "";
        public string Status { get; set; } = "";
    }

To create the index (collection?) I tried the following, based on ChatGPT hallucinations. Here doc.IndexName is the name of the index (collection?) I want to create, and NGramTokenizerName() and NGramAnalizerName() are simple extension methods that append "_tokenizer" and "_analizer" to that name.

        var createIndexResponse = await elasticclient.Indices.CreateAsync<ESTextDoc>(doc.IndexName, c => c
            .Settings(s => s
                .Analysis(a => a
                    .Tokenizers(t => t
                        .NGram(doc.IndexName.NGramTokenizerName(), ng => ng
                            .MinGram(2)
                            .MaxGram(3)
                            .TokenChars([TokenChar.Letter, TokenChar.Digit])
                        )
                    )
                    .Analyzers(an => an
                        .Custom(doc.IndexName.NGramAnalizerName(), ca => ca
                            .Tokenizer(doc.IndexName.NGramTokenizerName())
                        )
                    )
                )
            )
            .Mappings(mappings => mappings
                .Properties(properties => properties
                    .Text(field => field.Text)
                )
            )
        );

The index is created, but it does not work as expected. It looks like the analyzers and the tokenizers (whatever they are) are created but not mapped to the required field, so the search falls back to the default text indexing method.

This is the output when I query the index metadata from Elasticsearch:

{
  "teszt": {
    "aliases": {},
    "mappings": {
      "properties": {
        "status": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "text": { "type": "text" }
      }
    },
    "settings": {
      "index": {
        "routing": { "allocation": { "include": { "_tier_preference": "data_content" } } },
        "number_of_shards": "1",
        "provided_name": "teszt",
        "creation_date": "1739350250996",
        "analysis": {
          "analyzer": {
            "teszt_ngram_analyzer": {
              "type": "custom",
              "tokenizer": "teszt_ngram_tokenizer"
            }
          },
          "tokenizer": {
            "teszt_ngram_tokenizer": {
              "token_chars": [ "letter", "digit" ],
              "min_gram": "2",
              "type": "ngram",
              "max_gram": "3"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "qqrCBYD9QCyvSMg4pzwTmA",
        "version": { "created": "8521000" }
      }
    }
  }
}

> Do you want the NGram analyzer to be used on index level? Or do you want that to apply to a single field only, etc.?

I have no idea. I probably want it on a single field, because that is the field I need to search in, but I don't have enough experience with Elasticsearch to answer this question correctly.

@flobernd
Member

Hi @MiklosPathy, I'm very sorry, but this place is for issues regarding the .NET client only and not for general Elasticsearch-related questions.

I would suggest learning a little bit about the Elasticsearch basics first so that you at least know what you want to achieve in the end. I'm happy to answer specific questions about the .NET client, if there are any.

For general Elasticsearch related guidance, our Community Forums are the correct place.

Besides that, here are some useful resources to get started with ES:

That being said, the code that ChatGPT provided is actually pretty much correct. As you observed, the only thing missing is mapping the analyzer to the search field:

.Mappings(mappings => mappings
    .Properties(properties => properties
        .Text(field => field.Data, m => m.Analyzer("my_ngram_analyzer"))
    )
)

The API structure of the .NET client closely maps to the Elasticsearch REST API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html
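As a rough illustration of that correspondence, the request body for a create-index call (for example `PUT /teszt`) matching the fluent calls in this thread might look like the sketch below. The tokenizer and analyzer names are taken from the index metadata posted earlier in the thread. Attaching the analyzer to the `text` field is an assumption based on the case described here, not an official example.

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "teszt_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 3,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "teszt_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "teszt_ngram_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "teszt_ngram_analyzer"
      }
    }
  }
}
```

The `analyzer` entry under `mappings.properties.text` is the piece that was absent from the metadata output earlier in the thread. It should correspond to what the second parameter of `.Text(...)` sets in the .NET client.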

Going to close this issue for now.

@MiklosPathy
Author

Yes, thank you, this is what I was after. Given the logic of the API, it was not entirely intuitive that the analyzer goes in the second parameter of an extension method.

Could you please put an example like this in the documentation, for the next poor soul who tries to understand the structure of the fluent API?

The best way to learn is through examples... When someone does not know everything necessary, anything could help.
