[DOCS] Update documentation for index sorting and routing for logsdb (#120721) (#120904)

* [DOCS] Update documentation for index sorting and routing for logsdb

* update

* Apply suggestions from code review



* Update logs.asciidoc

* Update docs/reference/data-streams/logs.asciidoc



* Update logs.asciidoc

---------

Co-authored-by: Marci W <[email protected]>
kkrik-es and marciw authored Jan 27, 2025
1 parent 7245c05 commit 40d0eb0
Showing 1 changed file with 52 additions and 38 deletions.
90 changes: 52 additions & 38 deletions docs/reference/data-streams/logs.asciidoc
@@ -1,9 +1,9 @@
[[logs-data-stream]]
== Logs data stream

-IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
-and self-managed Elasticsearch as of version 8.17, and is enabled by default for
-logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].
+IMPORTANT: The {es} `logsdb` index mode is generally available in Elastic Cloud Hosted
+and self-managed Elasticsearch as of version 8.17, and is enabled by default for
+logs in https://www.elastic.co/elasticsearch/serverless[{serverless-full}].

A logs data stream is a data stream type that stores log data more efficiently.

@@ -54,57 +54,49 @@ DELETE _index_template/my-index-template
=== Synthetic source

If you have the required https://www.elastic.co/subscriptions[subscription], `logsdb` index mode uses <<synthetic-source,synthetic `_source`>>, which omits storing the original `_source`
-field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.
+field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.

If you don't have the required https://www.elastic.co/subscriptions[subscription], `logsdb` mode uses the original `_source` field.

-Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.
+Before using synthetic source, make sure to review the <<synthetic-source-restrictions,restrictions>>.

When working with multi-value fields, the `index.mapping.synthetic_source_keep` setting controls how field values
are preserved for <<synthetic-source,synthetic source>> reconstruction. In `logsdb`, the default value is `arrays`,
which retains both duplicate values and the order of entries. However, the exact structure of
-array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
-log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
+array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some
+log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.

[discrete]
[[logsdb-sort-settings]]
=== Index sort settings

-In `logsdb` index mode, the following sort settings are applied by default:
+In `logsdb` index mode, indices are sorted by the fields `host.name` and `@timestamp` by default.

-`index.sort.field`: `["host.name", "@timestamp"]`::
-Indices are sorted by `host.name` and `@timestamp` by default. The `@timestamp` field is automatically injected if it is not present.
-
-`index.sort.order`: `["desc", "desc"]`::
-Both `host.name` and `@timestamp` are sorted in descending (`desc`) order, prioritizing the latest data.
-
-`index.sort.mode`: `["min", "min"]`::
-The `min` mode sorts indices by the minimum value of multi-value fields.
-
-`index.sort.missing`: `["_first", "_first"]`::
-Missing values are sorted to appear `_first`.
-
-You can override these default sort settings. For example, to sort on different fields
-and change the order, manually configure `index.sort.field` and `index.sort.order`. For more details, see
-<<index-modules-index-sorting>>.
-
-When using the default sort settings, the `host.name` field is automatically injected into the index mappings as a `keyword` field to ensure that sorting can be applied. This guarantees that logs are efficiently sorted and retrieved based on the `host.name` and `@timestamp` fields.
-
-NOTE: If `subobjects` is set to `true` (default), the `host` field is mapped as an object field
-named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
+* If the `@timestamp` field is not present, it is automatically injected.
+* If the `host.name` field is not present, it is automatically injected as a `keyword` field, if possible.
+** If `host.name` can't be injected (for example, `host` is a keyword field) or can't be used for sorting
+(for example, its value is an IP address), only the `@timestamp` is used for sorting.
+** If `host.name` is injected and `subobjects` is set to `true` (default), the `host` field is mapped as
+an object field named `host` with a `name` child field of type `keyword`. If `subobjects` is set to `false`,
a single `host.name` field is mapped as a `keyword` field.
+* To prioritize the latest data, `host.name` is sorted in ascending order and `@timestamp` is sorted in
+descending order.
+
+You can override the default sort settings by manually configuring `index.sort.field`
+and `index.sort.order`. For more details, see <<index-modules-index-sorting>>.

-To apply different sort settings to an existing data stream, update the data stream's component templates, and then
-perform or wait for a <<data-streams-rollover,rollover>>.
+To modify the sort configuration of an existing data stream, update the data stream's
+component templates, and then perform or wait for a <<data-streams-rollover,rollover>>.

-NOTE: In `logsdb` mode, the `@timestamp` field is automatically injected if it's not already present. If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
-automatically added to the list of sort fields.
+NOTE: If you apply custom sort settings, the `@timestamp` field is injected into the mappings but is not
+automatically added to the list of sort fields. For best results, include it manually as the last sort
+field, with `desc` ordering.

[discrete]
[[logsdb-host-name]]
==== Existing data streams

-If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.
+If you're enabling `logsdb` index mode on a data stream that already exists, make sure to check mappings and sorting. The `logsdb` mode automatically maps `host.name` as a keyword if it's included in the sort settings. If a `host.name` field already exists but has a different type, mapping errors might occur, preventing `logsdb` mode from being fully applied.

To avoid mapping conflicts, consider these options:

@@ -114,7 +106,29 @@ To avoid mapping conflicts, consider these options:

* **Switch to a different <<index-mode-setting,index mode>>**: If resolving `host.name` mapping conflicts is not feasible, you can choose not to use `logsdb` mode.

-IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).
+IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-rollover,rollover>> (automatic or manual).

+[discrete]
+[[logsdb-sort-routing]]
+==== Optimized routing on sort fields
+
+To reduce the storage footprint of `logsdb` indexes, you can enable routing optimizations. A routing optimization uses the fields in the sort configuration (except for `@timestamp`) to route documents to shards.
+
+In benchmarks,
+routing optimizations reduced storage requirements by 20% compared to the default `logsdb` configuration, with a negligible penalty to ingestion
+performance (1-4%). Routing optimizations can benefit data streams that are expected to grow substantially over
+time. Exact results depend on the sort configuration and the nature of the logged data.
+
+To configure a routing optimization:
+
+* Include the index setting `[index.logsdb.route_on_sort_fields:true]` in the data stream configuration.
+* <<index-modules-index-sorting, Configure index sorting>> with two or more fields, in addition to `@timestamp`.
+* Make sure the <<mapping-id-field,`_id`>> field is not populated in ingested documents. It should be
+auto-generated instead.
+
+A custom sort configuration is required, to improve storage efficiency and to minimize hotspots
+from logging spikes that may route documents to a single shard. For best results, use a few sort fields
+that have a relatively low cardinality and don't co-vary (for example, `host.name` and `host.id` are not optimal).
+
[discrete]
[[logsdb-specialized-codecs]]
@@ -123,7 +137,7 @@ IMPORTANT: On existing data streams, `logsdb` mode is applied on <<data-streams-
By default, `logsdb` index mode uses the `best_compression` <<index-codec,codec>>, which applies {wikipedia}/Zstd[ZSTD]
compression to stored fields. You can switch to the `default` codec for faster compression with a slightly larger storage footprint.

-The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
+The `logsdb` index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are
encoded using the following sequence of codecs:

* **Delta encoding**:
@@ -173,9 +187,9 @@ _characters._ Using UTF-8 encoding, this results in a limit of 32764 bytes, depe

The mapping-level `ignore_above` setting takes precedence. If a specific field has an `ignore_above` value
defined in its mapping, that value overrides the index-level `index.mapping.ignore_above` value. This default
-behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.
+behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.

-If you need to customize the limit, you can override it at the mapping level or change the index level default.
+If you need to customize the limit, you can override it at the mapping level or change the index level default.

[discrete]
[[logs-db-ignore-limit]]
@@ -202,7 +216,7 @@ reconstructing the original value.
[[logsdb-settings-summary]]
=== Settings reference

-The `logsdb` index mode uses the following settings:
+The `logsdb` index mode uses the following settings:

* **`index.mode`**: `"logsdb"`

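Illustration only, not part of this commit: the custom-sort and routing guidance in the updated text could be assembled into an index template body roughly as sketched here. The helper `build_logsdb_template`, the pattern `my-logs-*`, and the `service.name` sort field are hypothetical names chosen for the example. Ascending order for the custom sort fields and explicit `keyword` mappings for them are assumptions, not something the diff prescribes. The setting names themselves (`index.mode`, `index.sort.field`, `index.sort.order`, `index.logsdb.route_on_sort_fields`) come from the documented text.

```python
import json


def build_logsdb_template(index_pattern, sort_fields, route_on_sort_fields=False):
    """Return an index template body (a plain dict) for a logsdb data stream.

    Hypothetical helper for illustration; it only builds the request body.
    """
    # Custom sort fields come first; @timestamp goes last with desc ordering,
    # as the NOTE in the updated docs recommends for custom sort settings.
    custom = [f for f in sort_fields if f != "@timestamp"]
    fields = custom + ["@timestamp"]
    # Assumption: custom fields sort ascending, mirroring the default for host.name.
    orders = ["asc"] * len(custom) + ["desc"]

    settings = {
        "index.mode": "logsdb",
        "index.sort.field": fields,
        "index.sort.order": orders,
    }
    if route_on_sort_fields:
        # The docs ask for two or more sort fields in addition to @timestamp.
        if len(custom) < 2:
            raise ValueError(
                "routing on sort fields needs at least two sort fields besides @timestamp"
            )
        settings["index.logsdb.route_on_sort_fields"] = True

    return {
        "index_patterns": [index_pattern],
        "data_stream": {},
        "template": {
            "settings": settings,
            # Assumption: each custom sort field is mapped explicitly as a keyword.
            "mappings": {"properties": {f: {"type": "keyword"} for f in custom}},
        },
    }


body = build_logsdb_template(
    "my-logs-*", ["host.name", "service.name"], route_on_sort_fields=True
)
print(json.dumps(body, indent=2))
```

The printed JSON is meant as the body of a `PUT _index_template/<name>` request. Per the text above, an existing data stream would pick up the new sort configuration only on its next rollover, and ingested documents should leave `_id` unset so that it is auto-generated.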
