add stemmer_override token filter docs opensearch-project#8445
Signed-off-by: Anton Rubin <[email protected]>
---
layout: default
title: Stemmer override
parent: Token filters
nav_order: 400
---

# Stemmer override token filter

The `stemmer_override` token filter allows you to define custom stemming rules that override the behavior of default stemmers like Porter or Snowball. This is useful when you want to apply specific stemming behavior to certain words that might not be handled correctly by the standard stemming algorithms.
## Parameters

The `stemmer_override` token filter must be configured with exactly one of the following parameters:

- `rules`: Defines the override rules inline in the filter settings.
- `rules_path`: Specifies the path to a file containing custom rules (mappings). The path can be absolute or relative to the OpenSearch config directory.
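As an alternative to inline `rules`, you can store the mappings in a file and reference it with `rules_path`. The following sketch assumes a hypothetical file named `analyzers/stemmer_override.txt` placed under the OpenSearch config directory, containing one rule per line in the same `input => output` format shown in the example below:

```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stemmer_override_filter": {
          "type": "stemmer_override",
          "rules_path": "analyzers/stemmer_override.txt"
        }
      }
    }
  }
}
```
{% include copy-curl.html %}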
## Example

The following example request creates a new index named `my-index` and configures an analyzer with a `stemmer_override` filter:
```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stemmer_override_filter": {
          "type": "stemmer_override",
          "rules": [
            "running, runner => run",
            "bought => buy",
            "best => good"
          ]
        }
      },
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stemmer_override_filter"
          ]
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
## Generated tokens

Use the following request to examine the tokens generated using the analyzer:
```json
GET /my-index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I am a runner and bought the best shoes"
}
```
{% include copy-curl.html %}
The response contains the generated tokens:
```json
{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "am",
      "start_offset": 2,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 5,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "run",
      "start_offset": 7,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "and",
      "start_offset": 14,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "buy",
      "start_offset": 18,
      "end_offset": 24,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "the",
      "start_offset": 25,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 6
    },
    {
      "token": "good",
      "start_offset": 29,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 7
    },
    {
      "token": "shoes",
      "start_offset": 34,
      "end_offset": 39,
      "type": "<ALPHANUM>",
      "position": 8
    }
  ]
}
```