Releases: cortexproject/cortex
Cortex 1.5.0
Changelog
Cortex
- [CHANGE] Blocks storage: update the default HTTP configuration values for the S3 client to the upstream Thanos default values. #3244
  - `-blocks-storage.s3.http.idle-conn-timeout` is set to 90 seconds.
  - `-blocks-storage.s3.http.response-header-timeout` is set to 2 minutes.
- [CHANGE] Improved shuffle sharding support in the write path. This work introduced some config changes: #3090
  - Introduced `-distributor.sharding-strategy` CLI flag (and its respective `sharding_strategy` YAML config option) to explicitly specify which sharding strategy should be used in the write path
  - `-experimental.distributor.user-subring-size` flag renamed to `-distributor.ingestion-tenant-shard-size`
  - `user_subring_size` limit YAML config option renamed to `ingestion_tenant_shard_size`
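To make the idea of a per-tenant shard size more concrete, here is a minimal sketch of how a tenant could be mapped to a fixed-size subset of ingesters. It uses rendezvous-style hashing purely for illustration and is not claimed to be the algorithm Cortex implements. The function names, the instance names, and the treatment of a shard size of 0 as "use every instance" are assumptions of this sketch.

```python
import hashlib
from typing import List


def _score(tenant_id: str, instance: str) -> int:
    """Deterministic pseudo-random score for a (tenant, instance) pair."""
    digest = hashlib.sha256(f"{tenant_id}/{instance}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def tenant_shard(tenant_id: str, instances: List[str], shard_size: int) -> List[str]:
    """Return the subset of instances that a tenant's series are spread across."""
    # Assumption of this sketch: a non-positive or oversized shard size
    # means the tenant uses every instance.
    if shard_size <= 0 or shard_size >= len(instances):
        return sorted(instances)
    # Rank instances by their per-tenant score and keep the lowest-scoring ones.
    ranked = sorted(instances, key=lambda inst: _score(tenant_id, inst))
    return sorted(ranked[:shard_size])


ingesters = [f"ingester-{i}" for i in range(10)]  # hypothetical instance names
small = tenant_shard("tenant-a", ingesters, 3)
large = tenant_shard("tenant-a", ingesters, 5)
print(small)
print(large)
```

Because an instance's rank depends only on the tenant and the instance, growing the shard size in this sketch only adds instances to the tenant's existing shard rather than reshuffling it.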
- [CHANGE] Dropped "blank Alertmanager configuration; using fallback" message from Info to Debug level. #3205
- [CHANGE] Zone-awareness replication for time-series now should be explicitly enabled in the distributor via the `-distributor.zone-awareness-enabled` CLI flag (or its respective YAML config option). Before, zone-aware replication was implicitly enabled if a zone was set on ingesters. #3200
- [CHANGE] Removed the deprecated CLI flag `-config-yaml`. You should use `-schema-config-file` instead. #3225
- [CHANGE] Enforced the HTTP method required by some API endpoints which did (incorrectly) allow any method before that. #3228
  - `GET /`
  - `GET /config`
  - `GET /debug/fgprof`
  - `GET /distributor/all_user_stats`
  - `GET /distributor/ha_tracker`
  - `GET /all_user_stats`
  - `GET /ha-tracker`
  - `GET /api/v1/user_stats`
  - `GET /api/v1/chunks`
  - `GET <legacy-http-prefix>/user_stats`
  - `GET <legacy-http-prefix>/chunks`
  - `GET /services`
  - `GET /multitenant_alertmanager/status`
  - `GET /status` (alertmanager microservice)
  - `GET|POST /ingester/ring`
  - `GET|POST /ring`
  - `GET|POST /store-gateway/ring`
  - `GET|POST /compactor/ring`
  - `GET|POST /ingester/flush`
  - `GET|POST /ingester/shutdown`
  - `GET|POST /flush`
  - `GET|POST /shutdown`
  - `GET|POST /ruler/ring`
  - `POST /api/v1/push`
  - `POST <legacy-http-prefix>/push`
  - `POST /push`
  - `POST /ingester/push`
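A client or test suite that talks to these endpoints may want to check its calls against the list above before upgrading. The following is a small sketch of such a check; it covers only a handful of the listed endpoints, and the table and function names are hypothetical rather than anything Cortex ships.

```python
from typing import Dict, FrozenSet

# A few of the endpoints from the list above, with the methods they accept.
ALLOWED_METHODS: Dict[str, FrozenSet[str]] = {
    "/config": frozenset({"GET"}),
    "/services": frozenset({"GET"}),
    "/ingester/ring": frozenset({"GET", "POST"}),
    "/ingester/flush": frozenset({"GET", "POST"}),
    "/api/v1/push": frozenset({"POST"}),
}


def is_method_allowed(method: str, path: str) -> bool:
    """Return True if the endpoint is in the table and accepts the HTTP method."""
    allowed = ALLOWED_METHODS.get(path)
    return allowed is not None and method.upper() in allowed


print(is_method_allowed("GET", "/config"))
print(is_method_allowed("DELETE", "/ingester/ring"))
```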
- [CHANGE] Renamed CLI flags to configure the network interface names from which to automatically detect the instance IP. #3295
  - `-compactor.ring.instance-interface` renamed to `-compactor.ring.instance-interface-names`
  - `-store-gateway.sharding-ring.instance-interface` renamed to `-store-gateway.sharding-ring.instance-interface-names`
  - `-distributor.ring.instance-interface` renamed to `-distributor.ring.instance-interface-names`
  - `-ruler.ring.instance-interface` renamed to `-ruler.ring.instance-interface-names`
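Renames like these can be applied mechanically to an existing argument list. The helper below is a minimal sketch of that idea: the mapping is copied from the four renames above, while the function name and the sample arguments are made up for illustration. It only handles single-dash flag names, either as `-flag=value` or as a bare `-flag` followed by a separate value.

```python
from typing import Dict, List

# Old flag name -> new flag name, copied from the rename entries above.
RENAMED_FLAGS: Dict[str, str] = {
    "-compactor.ring.instance-interface": "-compactor.ring.instance-interface-names",
    "-store-gateway.sharding-ring.instance-interface": "-store-gateway.sharding-ring.instance-interface-names",
    "-distributor.ring.instance-interface": "-distributor.ring.instance-interface-names",
    "-ruler.ring.instance-interface": "-ruler.ring.instance-interface-names",
}


def migrate_args(args: List[str]) -> List[str]:
    """Rewrite renamed flags in an argument list, leaving values untouched."""
    migrated = []
    for arg in args:
        # Split "-flag=value" into its name and value; a bare "-flag" has no "=".
        name, sep, value = arg.partition("=")
        migrated.append(RENAMED_FLAGS.get(name, name) + sep + value)
    return migrated


old_args = ["-target=compactor", "-compactor.ring.instance-interface=eth0"]
print(migrate_args(old_args))
```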
- [CHANGE] Renamed `-<prefix>.redis.enable-tls` CLI flag to `-<prefix>.redis.tls-enabled`, and its respective YAML config option from `enable_tls` to `tls_enabled`. #3298
- [CHANGE] Increased default `-<prefix>.redis.timeout` from `100ms` to `500ms`. #3301
- [CHANGE] `cortex_alertmanager_config_invalid` has been removed in favor of `cortex_alertmanager_config_last_reload_successful`. #3289
- [CHANGE] Query-frontend: POST requests whose body size exceeds 10MiB will be rejected. The max body size can be customised via `-frontend.max-body-size`. #3276
- [FEATURE] Shuffle sharding: added support for shuffle-sharding queriers in the query-frontend. When configured (`-frontend.max-queriers-per-tenant` globally, or using per-tenant limit `max_queriers_per_tenant`), each tenant's requests will be handled by a different set of queriers. #3113 #3257
- [FEATURE] Shuffle sharding: added support for shuffle-sharding ingesters on the read path. When ingesters shuffle-sharding is enabled and `-querier.shuffle-sharding-ingesters-lookback-period` is set, queriers will fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since 'now - lookback period'. #3252
- [FEATURE] Query-frontend: added `compression` config to support results cache with compression. #3217
- [FEATURE] Add OpenStack Swift support to blocks storage. #3303
- [FEATURE] Added support for applying Prometheus relabel configs on series received by the distributor. A `metric_relabel_configs` field has been added to the per-tenant limits configuration. #3329
- [FEATURE] Support for Cassandra client SSL certificates. #3384
- [ENHANCEMENT] Ruler: Introduces two new limits `-ruler.max-rules-per-rule-group` and `-ruler.max-rule-groups-per-tenant` to control the number of rules per rule group and the total number of rule groups for a given user. They are disabled by default. #3366
- [ENHANCEMENT] Allow specifying multiple comma-separated Cortex services in the `-target` CLI option (or its respective YAML config option). For example, `-target=all,compactor` can be used to start Cortex single-binary with compactor as well. #3275
- [ENHANCEMENT] Expose additional HTTP configs for the S3 backend client. New flags are listed below: #3244
  - `-blocks-storage.s3.http.idle-conn-timeout`
  - `-blocks-storage.s3.http.response-header-timeout`
  - `-blocks-storage.s3.http.insecure-skip-verify`
- [ENHANCEMENT] Added `cortex_query_frontend_connected_clients` metric to show the number of workers currently connected to the frontend. #3207
- [ENHANCEMENT] Shuffle sharding: improved shuffle sharding in the write path. Shuffle sharding now should be explicitly enabled via `-distributor.sharding-strategy` CLI flag (or its respective YAML config option) and guarantees stability, consistency, shuffling and balanced zone-awareness properties. #3090 #3214
- [ENHANCEMENT] Ingester: added new metric `cortex_ingester_active_series` to track active series more accurately. Also added options to control whether active series tracking is enabled (`-ingester.active-series-enabled`, defaults to false), and how often this metric is updated (`-ingester.active-series-update-period`) and max idle time for series to be considered inactive (`-ingester.active-series-idle-timeout`). #3153
- [ENHANCEMENT] Store-gateway: added zone-aware replication support to blocks replication in the store-gateway. #3200
- [ENHANCEMENT] Store-gateway: exported new metrics. #3231
  - `cortex_bucket_store_cached_series_fetch_duration_seconds`
  - `cortex_bucket_store_cached_postings_fetch_duration_seconds`
  - `cortex_bucket_stores_gate_queries_max`
- [ENHANCEMENT] Added `-version` flag to Cortex. #3233
- [ENHANCEMENT] Hash ring: added instance registered timestamp to the ring. #3248
- [ENHANCEMENT] Reduce tail latency by smoothing out spikes in rate of chunk flush operations. #3191
- [ENHANCEMENT] Use Cortex as User Agent in HTTP requests issued by Configs DB client. #3264
- [ENHANCEMENT] Experimental Ruler API: Fetch rule groups from object storage in parallel. #3218
- [ENHANCEMENT] Chunks GCS object storage client uses the `fields` selector to limit the payload size when listing objects in the bucket. #3218 #3292
- [ENHANCEMENT] Added shuffle sharding support to ruler. Added new metric `cortex_ruler_sync_rules_total`. #3235
- [ENHANCEMENT] Return an explicit error when the store-gateway is explicitly requested without a blocks storage engine. #3287
- [ENHANCEMENT] Ruler: only load rules that belong to the ruler. Improves rule syncing performance when ruler sharding is enabled. #3269
- [ENHANCEMENT] Added `-<prefix>.redis.tls-insecure-skip-verify` flag. #3298
- [ENHANCEMENT] Added `cortex_alertmanager_config_last_reload_successful_seconds` metric to show timestamp of last successful AM config reload. #3289
- [ENHANCEMENT] Blocks storage: reduced number of bucket listing operations to list block content (applies to newly created blocks only). #3363
- [ENHANCEMENT] Ruler: Include the tenant ID on the notifier logs. #3372
- [ENHANCEMENT] Blocks storage Compactor: Added `-compactor.enabled-tenants` and `-compactor.disabled-tenants` to explicitly enable or disable compaction of specific tenants. #3385
- [ENHANCEMENT] Blocks storage ingester: Creating checkpoint only once even when there are multiple Head compactions in a single `Compact()` call. #3373
- [BUGFIX] Blocks storage ingester: Read repair memory-mapped chunks file which can end up being empty on abrupt shutdowns combined with faulty disks. #3373
- [BUGFIX] Blocks storage ingester: Close TSDB resources on failed startup preventing ingester OOMing. #3373
- [BUGFIX] No-longer-needed ingester operations for queries triggered by queriers and rulers are now canceled. #3178
- [BUGFIX] Ruler: directories in the configured `rules-path` will be removed on startup and shutdown in order to ensure they don't persist between runs. #3195
- [BUGFIX] Handle hash-collisions in the query path. #3192
- [BUGFIX] Check for postgres rows errors. #3197
- [BUGFIX] Ruler Experimental API: Don't allow rule groups without names or empty rule groups. #3210
- [BUGFIX] Experimental Alertmanager API: Do not allow empty Alertmanager configurations or bad template filenames to be submitted through the configuration API. #3185
- [BUGFIX] Reduce failures to update heartbeat when using Consul. #3259
- [BUGFIX] When using ruler sharding, moving all user rule groups from ruler to a different one and then back could end up with some user groups not being evaluated at all. #3235
- [BUGFIX] Fixed shuffle sharding consistency when zone-awareness is enabled and the shard size is increased or instances in a new zone are added. #3299
- [BUGFIX] Use a valid grpc header when logging IP addresses. #3307
- [BUGFIX] Fixed the metric `cortex_prometheus_rule_group_duration_seconds` in the Ruler, which wouldn't report any values. #3310
- [BUGFIX] Fixed gRPC connections leaking in rulers when rulers sharding is enabled and APIs called. #3314
- [BUGFIX] Fixed shuffle sharding consistency when zone-awareness is enabled and the shard size is increased or instances...
Cortex 1.5.0-rc.1
Cortex 1.5.0-rc.0
Cortex 1.4.0
This Cortex release features 112 contributions from 32 authors and exciting news!
Highlights
- Cortex blocks storage is now GA.
- Cassandra support for the chunks storage is now GA.
- Redis caching backend now supports Redis sentinel and Redis cluster too.
- Introduced shuffle sharding support to store-gateway blocks sharding (blocks storage).
- The ruler and alertmanager got several improvements.
- Last but not least, many enhancements, optimisations and bug fixes.
Please refer to the changelog for the full list of changes and improvements.
Changelog
- [CHANGE] Cassandra backend support is now GA (stable). #3180
- [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
  - `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
  - `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
  - `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
  - `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
  - `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
  - `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
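Since several of these renames are wildcard (`.*`) rules, a migration script needs prefix matching rather than a plain lookup table. The sketch below shows one way to do that. The function name and the sample flag suffix are hypothetical, the leading dash on the last rename target is assumed, and applying the single-flag renames after the prefix renames (so that the two can chain) is an assumption of this sketch about how the entries combine.

```python
from typing import Dict, List, Tuple

# Wildcard renames: any flag starting with the old prefix gets the new prefix.
PREFIX_RENAMES: List[Tuple[str, str]] = [
    ("-experimental.blocks-storage.", "-blocks-storage."),
    ("-experimental.store-gateway.", "-store-gateway."),
    ("-experimental.querier.store-gateway-client.", "-querier.store-gateway-client."),
]

# Single-flag renames.
EXACT_RENAMES: Dict[str, str] = {
    "-experimental.querier.store-gateway-addresses": "-querier.store-gateway-addresses",
    "-store-gateway.replication-factor": "-store-gateway.sharding-ring.replication-factor",
    "-store-gateway.tokens-file-path": "-store-gateway.sharding-ring.tokens-file-path",
}


def migrate_flag(name: str) -> str:
    """Map an older flag name to its 1.4 name; unknown flags pass through."""
    for old_prefix, new_prefix in PREFIX_RENAMES:
        if name.startswith(old_prefix):
            name = new_prefix + name[len(old_prefix):]
            break
    # Applied after the prefix step so that both kinds of rename can chain.
    return EXACT_RENAMES.get(name, name)


# "some-option" is a made-up suffix used only to exercise the wildcard rule.
print(migrate_flag("-experimental.blocks-storage.some-option"))
print(migrate_flag("-experimental.store-gateway.replication-factor"))
```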
- [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. Only if you are running `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4` to avoid data loss. #3115
- [CHANGE] Distributor API endpoints are no longer served unless target is set to `distributor` or `all`. #3112
- [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
- [CHANGE] Blocks storage: removed the support to transfer blocks between ingesters on shutdown. When running the Cortex blocks storage, ingesters are expected to run with a persistent disk. The following metrics have been removed: #2996
  - `cortex_ingester_sent_files`
  - `cortex_ingester_received_files`
  - `cortex_ingester_received_bytes_total`
  - `cortex_ingester_sent_bytes_total`
- [CHANGE] The buckets for the `cortex_chunk_store_index_lookups_per_query` metric have been changed to 1, 2, 4, 8, 16. #3021
- [CHANGE] Blocks storage: the `operation` label value `getrange` has changed into `get_range` for the metrics `thanos_store_bucket_cache_operation_requests_total` and `thanos_store_bucket_cache_operation_hits_total`. #3000
- [CHANGE] Experimental Delete Series: `/api/v1/admin/tsdb/delete_series` and `/api/v1/admin/tsdb/cancel_delete_request` purger APIs now return status code `204` instead of `200` for success. #2946
- [CHANGE] Histogram `cortex_memcache_request_duration_seconds` `method` label value changes from `Memcached.Get` to `Memcached.GetBatched` for batched lookups, and is not reported for non-batched lookups (label value `Memcached.GetMulti` remains, and had exactly the same value as `Get` in nonbatched lookups). The same change applies to tracing spans. #3046
- [CHANGE] TLS server validation is now enabled by default; a new parameter `tls_insecure_skip_verify` can be set to true to skip validation optionally. #3030
- [CHANGE] `cortex_ruler_config_update_failures_total` has been removed in favor of `cortex_ruler_config_last_reload_successful`. #3056
- [CHANGE] `ruler.evaluation_delay_duration` field in YAML config has been moved and renamed to `limits.ruler_evaluation_delay_duration`. #3098
- [CHANGE] Removed obsolete `results_cache.max_freshness` from YAML config (deprecated since Cortex 1.2). #3145
- [CHANGE] Removed obsolete `-promql.lookback-delta` option (deprecated since Cortex 1.2, replaced with `-querier.lookback-delta`). #3144
- [CHANGE] Cache: added support for Redis Cluster and Redis Sentinel. #2961
  - The following changes have been made in Redis configuration:
    - `-redis.master_name` added
    - `-redis.db` added
    - `-redis.max-active-conns` changed to `-redis.pool-size`
    - `-redis.max-conn-lifetime` changed to `-redis.max-connection-age`
    - `-redis.max-idle-conns` removed
    - `-redis.wait-on-pool-exhaustion` removed
- [CHANGE] TLS configuration for gRPC, HTTP and etcd clients is now marked as experimental. These features are not yet fully baked, and we expect possible small breaking changes in Cortex 1.5. #3198
- [CHANGE] Fixed store-gateway CLI flags inconsistencies. #3201
  - `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
  - `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
- [FEATURE] Logging of the source IP passed along by a reverse proxy is now supported by setting the `-server.log-source-ips-enabled` flag. For non-standard headers, the settings `-server.log-source-ips-header` and `-server.log-source-ips-regex` can be used. #2985
- [FEATURE] Blocks storage: added shuffle sharding support to store-gateway blocks sharding. Added the following additional metrics to store-gateway: #3069
  - `cortex_bucket_stores_tenants_discovered`
  - `cortex_bucket_stores_tenants_synced`
- [FEATURE] Experimental blocksconvert: introduce an experimental tool `blocksconvert` to migrate long-term storage chunks to blocks. #3092 #3122 #3127 #3162
- [ENHANCEMENT] Add support for Azure storage in China, German and US Government environments. #2988
- [ENHANCEMENT] Query-tee: added a small tolerance to floating point sample values comparison. #2994
- [ENHANCEMENT] Query-tee: add support for doing a passthrough of requests to preferred backend for unregistered routes #3018
- [ENHANCEMENT] Expose `storage.aws.dynamodb.backoff_config` configuration file field. #3026
- [ENHANCEMENT] Added `cortex_request_message_bytes` and `cortex_response_message_bytes` histograms to track received and sent gRPC message and HTTP request/response sizes. Added `cortex_inflight_requests` gauge to track number of inflight gRPC and HTTP requests. #3064
- [ENHANCEMENT] Publish ruler's ring metrics. #3074
- [ENHANCEMENT] Add config validation to the experimental Alertmanager API. Invalid configs are no longer accepted. #3053
- [ENHANCEMENT] Add "integration" as a label for `cortex_alertmanager_notifications_total` and `cortex_alertmanager_notifications_failed_total` metrics. #3056
- [ENHANCEMENT] Add `cortex_ruler_config_last_reload_successful` and `cortex_ruler_config_last_reload_successful_seconds` to check status of users rule manager. #3056
- [ENHANCEMENT] The configuration validation now fails if an empty YAML node has been set for a root YAML config property. #3080
- [ENHANCEMENT] Memcached dial() calls now have a circuit-breaker to avoid hammering a broken cache. #3051, #3189
- [ENHANCEMENT] `-ruler.evaluation-delay-duration` is now overridable as a per-tenant limit, `ruler_evaluation_delay_duration`. #3098
- [ENHANCEMENT] Add TLS support to etcd client. #3102
- [ENHANCEMENT] When a tenant accesses the Alertmanager UI or its API, if we have valid `-alertmanager.configs.fallback` we'll use that to start the manager and avoid failing the request. #3073
- [ENHANCEMENT] Add `DELETE api/v1/rules/{namespace}` to the Ruler. It allows all the rule groups of a namespace to be deleted. #3120
- [ENHANCEMENT] Experimental Delete Series: Retry processing of Delete requests during failures. #2926
- [ENHANCEMENT] Improve performance of QueryStream() in ingesters. #3177
- [ENHANCEMENT] Modules included in "All" target are now visible in output of `-modules` CLI flag. #3155
- [ENHANCEMENT] Added `/debug/fgprof` endpoint to debug running Cortex process using `fgprof`. This adds up to the existing `/debug/...` endpoints. #3131
- [ENHANCEMENT] Blocks storage: optimised `/api/v1/series` for blocks storage. (#2976)
- [BUGFIX] Ruler: when loading rules from "local" storage, check for directory after resolving symlink. #3137
- [BUGFIX] Query-frontend: Fixed rounding for incoming query timestamps, to be 100% Prometheus compatible. #2990
- [BUGFIX] Querier: Merge results from chunks and blocks ingesters when using streaming of results. #3013
- [BUGFIX] Querier: query /series from ingesters regardless of the `-querier.query-ingesters-within` setting. #3035
- [BUGFIX] Blocks storage: Ingester is less likely to hit gRPC message size limit when streaming data to queriers. #3015
- [BUGFIX] Blocks storage: fixed memberlist support for the store-gateways and compactors ring used when blocks sharding is enabled. #3058 #3095
- [BUGFIX] Fix configuration for TLS server validation, TLS skip verify was hardcoded to true for all TLS configurations and prevented validation of server certificates. #3030
- [BUGFIX] Fixes the Alertmanager panicking when no `-alertmanager.web.external-url` is provided. #3017
- [BUGFIX] Fixes the registration of the Alertmanager API metrics `cortex_alertmanager_alerts_received_total` and `cortex_alertmanager_alerts_invalid_total`. #3065
- [BUGFIX] Fixes `flag needs an argument: -config.expand-env` error. #3087
- [BUGFIX] An index optimisation actually slows things down when using caching. Moved it to the right location. #2973
- [BUGFIX] Ingester: If push request contained both valid and invalid samples, valid samples were ingested but not stored to WAL of the chunks storage. This has been fixed. #3067
- [BUGFIX] Cassandra: fixed consistency setting in the CQL session when creating the keyspace. #3105
- [BUGFIX] Ruler: Config API would return both the `record` and `alert` in `YAML` response keys even when one of them must be empty. #3120
- [BUGFIX] Index page now uses configured HTTP path prefix when creating links. #3126
- [BUGFIX] Purger: fixed deadlock when reloading of tombstones failed. #3182
- [BU...
Cortex 1.4.0-rc.1
This is the second release candidate for Cortex 1.4.0.
Changelog
- [CHANGE] TLS configuration for gRPC, HTTP and etcd clients is now marked as experimental. These features are not yet fully baked, and we expect possible small breaking changes in Cortex 1.5. #3198
- [CHANGE] Fixed store-gateway CLI flags inconsistencies. #3201
  - `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
  - `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
- [BUGFIX] Handle hash-collisions in the query path. Before this fix, Cortex could occasionally mix up two different series in a query, leading to invalid results, when `-querier.ingester-streaming` was used. #3192
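The failure mode described in the hash-collision fix can be illustrated in general terms: if series are grouped by a hash of their labels alone, two different series that hash to the same value get merged. The sketch below is not Cortex's code or its actual fix. It uses a deliberately weak, hypothetical fingerprint function to force collisions and shows the usual remedy of comparing the full label set inside each hash bucket.

```python
import zlib
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]


def fingerprint(labels: Labels) -> int:
    """Deliberately weak hash (two possible values) so collisions are certain."""
    text = ",".join(f"{name}={value}" for name, value in labels)
    return zlib.crc32(text.encode("utf-8")) % 2


class SeriesMerger:
    """Groups samples per series, comparing full label sets within a hash bucket."""

    def __init__(self) -> None:
        self._buckets: Dict[int, List[Tuple[Labels, List[float]]]] = {}

    def add(self, labels: Labels, sample: float) -> None:
        bucket = self._buckets.setdefault(fingerprint(labels), [])
        for existing_labels, samples in bucket:
            if existing_labels == labels:  # same series, not just the same hash
                samples.append(sample)
                return
        bucket.append((labels, [sample]))

    def series(self) -> Dict[Labels, List[float]]:
        return {
            labels: samples
            for bucket in self._buckets.values()
            for labels, samples in bucket
        }


merger = SeriesMerger()
for job, value in [("a", 1.0), ("b", 2.0), ("c", 3.0), ("a", 4.0)]:
    merger.add((("__name__", "up"), ("job", job)), value)
print(merger.series())
```

With only two possible fingerprints and three distinct series, at least two series share a bucket, yet each keeps its own samples because the label sets are compared directly.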
Cortex 1.4.0-rc.0
This Cortex release features 112 contributions from 32 authors and exciting news!
Highlights
- Cortex blocks storage is now GA.
- Cassandra support for the chunks storage is now GA.
- Redis caching backend now supports Redis sentinel and Redis cluster too.
- Introduced shuffle sharding support to store-gateway blocks sharding (blocks storage).
- The ruler and alertmanager got several improvements.
- Last but not least, many enhancements, optimisations and bug fixes.
Please refer to the changelog for the full list of changes and improvements.
Changelog
- [CHANGE] Cassandra backend support is now GA (stable). #3180
- [CHANGE] Blocks storage is now GA (stable). The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180
  - `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
  - `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
  - `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
  - `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
- [CHANGE] Ingester: Removed deprecated untyped record from chunks WAL. Only if you are running `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4` to avoid data loss. #3115
- [CHANGE] Distributor API endpoints are no longer served unless target is set to `distributor` or `all`. #3112
- [CHANGE] Increase the default Cassandra client replication factor to 3. #3007
- [CHANGE] Blocks storage: removed the support to transfer blocks between ingesters on shutdown. When running the Cortex blocks storage, ingesters are expected to run with a persistent disk. The following metrics have been removed: #2996
  - `cortex_ingester_sent_files`
  - `cortex_ingester_received_files`
  - `cortex_ingester_received_bytes_total`
  - `cortex_ingester_sent_bytes_total`
- [CHANGE] The buckets for the `cortex_chunk_store_index_lookups_per_query` metric have been changed to 1, 2, 4, 8, 16. #3021
- [CHANGE] Blocks storage: the `operation` label value `getrange` has changed into `get_range` for the metrics `thanos_store_bucket_cache_operation_requests_total` and `thanos_store_bucket_cache_operation_hits_total`. #3000
- [CHANGE] Experimental Delete Series: `/api/v1/admin/tsdb/delete_series` and `/api/v1/admin/tsdb/cancel_delete_request` purger APIs now return status code `204` instead of `200` for success. #2946
- [CHANGE] Histogram `cortex_memcache_request_duration_seconds` `method` label value changes from `Memcached.Get` to `Memcached.GetBatched` for batched lookups, and is not reported for non-batched lookups (label value `Memcached.GetMulti` remains, and had exactly the same value as `Get` in nonbatched lookups). The same change applies to tracing spans. #3046
- [CHANGE] TLS server validation is now enabled by default; a new parameter `tls_insecure_skip_verify` can be set to true to skip validation optionally. #3030
- [CHANGE] `cortex_ruler_config_update_failures_total` has been removed in favor of `cortex_ruler_config_last_reload_successful`. #3056
- [CHANGE] `ruler.evaluation_delay_duration` field in YAML config has been moved and renamed to `limits.ruler_evaluation_delay_duration`. #3098
- [CHANGE] Removed obsolete `results_cache.max_freshness` from YAML config (deprecated since Cortex 1.2). #3145
- [CHANGE] Removed obsolete `-promql.lookback-delta` option (deprecated since Cortex 1.2, replaced with `-querier.lookback-delta`). #3144
- [CHANGE] Cache: added support for Redis Cluster and Redis Sentinel. #2961
  - The following changes have been made in Redis configuration:
    - `-redis.master_name` added
    - `-redis.db` added
    - `-redis.max-active-conns` changed to `-redis.pool-size`
    - `-redis.max-conn-lifetime` changed to `-redis.max-connection-age`
    - `-redis.max-idle-conns` removed
    - `-redis.wait-on-pool-exhaustion` removed
- [FEATURE] Logging of the source IP passed along by a reverse proxy is now supported by setting `-server.log-source-ips-enabled`. For non-standard headers, the settings `-server.log-source-ips-header` and `-server.log-source-ips-regex` can be used. #2985
- [FEATURE] Blocks storage: added shuffle sharding support to store-gateway blocks sharding. Added the following additional metrics to store-gateway: #3069
  - `cortex_bucket_stores_tenants_discovered`
  - `cortex_bucket_stores_tenants_synced`
- [FEATURE] Experimental blocksconvert: introduce an experimental tool `blocksconvert` to migrate long-term storage chunks to blocks. #3092 #3122 #3127 #3162
- [ENHANCEMENT] Add support for Azure storage in China, German and US Government environments. #2988
- [ENHANCEMENT] Query-tee: added a small tolerance to floating point sample values comparison. #2994
- [ENHANCEMENT] Query-tee: add support for doing a passthrough of requests to preferred backend for unregistered routes. #3018
- [ENHANCEMENT] Expose `storage.aws.dynamodb.backoff_config` configuration file field. #3026
- [ENHANCEMENT] Added `cortex_request_message_bytes` and `cortex_response_message_bytes` histograms to track received and sent gRPC message and HTTP request/response sizes. Added `cortex_inflight_requests` gauge to track number of inflight gRPC and HTTP requests. #3064
- [ENHANCEMENT] Publish ruler's ring metrics. #3074
- [ENHANCEMENT] Add config validation to the experimental Alertmanager API. Invalid configs are no longer accepted. #3053
- [ENHANCEMENT] Add "integration" as a label for `cortex_alertmanager_notifications_total` and `cortex_alertmanager_notifications_failed_total` metrics. #3056
- [ENHANCEMENT] Add `cortex_ruler_config_last_reload_successful` and `cortex_ruler_config_last_reload_successful_seconds` to check status of users rule manager. #3056
- [ENHANCEMENT] The configuration validation now fails if an empty YAML node has been set for a root YAML config property. #3080
- [ENHANCEMENT] Memcached dial() calls now have a circuit-breaker to avoid hammering a broken cache. #3051, #3189
- [ENHANCEMENT] `-ruler.evaluation-delay-duration` is now overridable as a per-tenant limit, `ruler_evaluation_delay_duration`. #3098
- [ENHANCEMENT] Add TLS support to etcd client. #3102
- [ENHANCEMENT] When a tenant accesses the Alertmanager UI or its API, if we have valid `-alertmanager.configs.fallback` we'll use that to start the manager and avoid failing the request. #3073
- [ENHANCEMENT] Add `DELETE api/v1/rules/{namespace}` to the Ruler. It allows all the rule groups of a namespace to be deleted. #3120
- [ENHANCEMENT] Experimental Delete Series: Retry processing of Delete requests during failures. #2926
- [ENHANCEMENT] Improve performance of QueryStream() in ingesters. #3177
- [ENHANCEMENT] Modules included in "All" target are now visible in output of `-modules` CLI flag. #3155
- [ENHANCEMENT] Added `/debug/fgprof` endpoint to debug running Cortex process using `fgprof`. This adds up to the existing `/debug/...` endpoints. #3131
- [ENHANCEMENT] Blocks storage: optimised `/api/v1/series` for blocks storage. (#2976)
- [BUGFIX] Ruler: when loading rules from "local" storage, check for directory after resolving symlink. #3137
- [BUGFIX] Query-frontend: Fixed rounding for incoming query timestamps, to be 100% Prometheus compatible. #2990
- [BUGFIX] Querier: Merge results from chunks and blocks ingesters when using streaming of results. #3013
- [BUGFIX] Querier: query /series from ingesters regardless of the `-querier.query-ingesters-within` setting. #3035
- [BUGFIX] Blocks storage: Ingester is less likely to hit gRPC message size limit when streaming data to queriers. #3015
- [BUGFIX] Blocks storage: fixed memberlist support for the store-gateways and compactors ring used when blocks sharding is enabled. #3058 #3095
- [BUGFIX] Fix configuration for TLS server validation: TLS skip verify was hardcoded to true for all TLS configurations, which prevented validation of server certificates. #3030
- [BUGFIX] Fixes the Alertmanager panicking when no `-alertmanager.web.external-url` is provided. #3017
- [BUGFIX] Fixes the registration of the Alertmanager API metrics `cortex_alertmanager_alerts_received_total` and `cortex_alertmanager_alerts_invalid_total`. #3065
- [BUGFIX] Fixes `flag needs an argument: -config.expand-env` error. #3087
- [BUGFIX] An index optimisation actually slows things down when using caching. Moved it to the right location. #2973
- [BUGFIX] Ingester: If push request contained both valid and invalid samples, valid samples were ingested but not stored to WAL of the chunks storage. This has been fixed. #3067
- [BUGFIX] Cassandra: fixed consistency setting in the CQL session when creating the keyspace. #3105
- [BUGFIX] Ruler: Config API would return both the `record` and `alert` in `YAML` response keys even when one of them must be empty. #3120
- [BUGFIX] Index page now uses configured HTTP path prefix when creating links. #3126
- [BUGFIX] Purger: fixed deadlock when reloading of tombstones failed. #3182
- [BUGFIX] Fixed panic in flusher job, when error writing chunks to the store would cause "idle" chunks to be flushed, which triggered panic. #3140
- [BUGFIX] Index page no longer shows links that are not valid for running Cortex instance. #3133
- [BUGFIX] Configs: prevent template validation from failing when using template functions. #3157
- [BUGFIX] Configuring the S3 URL with an `@` but without username and password doesn't enable the AWS static credentials anymore. #3170
- [BUGFIX] Limit errors on ranged queries (`api/v1/query_range`) no longer return a status code `500` but `422` instead. #3167
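The flag renames and removals listed in this changelog can be applied mechanically to an existing argument list. The following is a minimal Python sketch, not an official Cortex tool: `migrate_args` and its lookup tables are hypothetical names built only from the renames stated above, and the sketch assumes every flag is written in single-dash `-flag=value` form (a flag passed as two separate arguments would not be handled correctly).

```python
# Hypothetical helper: rewrites Cortex CLI arguments according to the
# renames and removals listed in the changelog above.
# Assumption: each argument has the form "-flag=value" or "-flag".

# Prefix renames from the blocks storage GA change (#3180).
PREFIX_RENAMES = [
    ("-experimental.blocks-storage.", "-blocks-storage."),
    ("-experimental.store-gateway.", "-store-gateway."),
    ("-experimental.querier.store-gateway-client.", "-querier.store-gateway-client."),
]

# One-to-one renames (#3180, #2961, #3144).
EXACT_RENAMES = {
    "-experimental.querier.store-gateway-addresses": "-querier.store-gateway-addresses",
    "-redis.max-active-conns": "-redis.pool-size",
    "-redis.max-conn-lifetime": "-redis.max-connection-age",
    "-promql.lookback-delta": "-querier.lookback-delta",
}

# Flags removed without a replacement (#2961).
REMOVED_FLAGS = {"-redis.max-idle-conns", "-redis.wait-on-pool-exhaustion"}


def migrate_args(args):
    """Return (migrated_args, warnings) for a list of CLI arguments."""
    migrated = []
    warnings = []
    for arg in args:
        name, sep, value = arg.partition("=")
        if name in REMOVED_FLAGS:
            warnings.append(f"{name} was removed; dropping it")
            continue
        if name in EXACT_RENAMES:
            name = EXACT_RENAMES[name]
        else:
            for old_prefix, new_prefix in PREFIX_RENAMES:
                if name.startswith(old_prefix):
                    name = new_prefix + name[len(old_prefix):]
                    break
        migrated.append(name + sep + value)
    return migrated, warnings


if __name__ == "__main__":
    new_args, notes = migrate_args([
        "-experimental.blocks-storage.tsdb.dir=/data",
        "-redis.max-active-conns=10",
        "-redis.max-idle-conns=5",
    ])
    print(new_args)
    print(notes)
```

Unknown flags pass through unchanged, so the helper can be run over a complete argument list. Removed flags are reported rather than silently dropped, since their behaviour may need to be reviewed by hand.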
Cortex 1.3.0
This Cortex release features 125 contributions from 37 different authors. It's yet another great milestone we have reached thanks to the amazing support from our community ❤️ Thanks!
Highlights:
- The blocks storage is getting closer to production readiness. In this release we've done several fixes and improvements. In particular, you should be aware of:
  - Some CLI flags and YAML config options have been renamed
  - The store-gateway service is now mandatory when running the blocks storage
  - Introduced support for a live cluster migration from chunks to blocks (and rollback)
  - Introduced support to flush blocks on-demand from ingesters
- The ruler and alertmanager got several improvements, including but not limited to:
  - The ruler now runs in the single binary when Cortex gets started with `-target=all`
  - Introduced new config options to fine-tune the ruler
  - Introduced support to load locally stored rules (e.g. loaded via Kubernetes config map)
  - Multiple alertmanager URLs can now be specified in the ruler; each URL is treated as a separate alertmanager group
  - Alertmanager configuration can be persisted to object storage via API
- Other changes worth noting:
  - Added optional `snappy` compression support to internal gRPC connections
  - Starting from this release we're going to publish `.rpm` and `.deb` packages too

Please refer to the full changelog for the full list of changes and improvements.
Changelog
- [CHANGE] Replace the metric `cortex_alertmanager_configs` with `cortex_alertmanager_config_invalid` exposed by Alertmanager. #2960
- [CHANGE] Experimental Delete Series: Change target flag for purger from `data-purger` to `purger`. #2777
- [CHANGE] Experimental blocks storage: The max concurrent queries against the long-term storage, configured via `-experimental.blocks-storage.bucket-store.max-concurrent`, is now a limit shared across all tenants and not a per-tenant limit anymore. The default value has changed from `20` to `100` and the following new metrics have been added: #2797
  - `cortex_bucket_stores_gate_queries_concurrent_max`
  - `cortex_bucket_stores_gate_queries_in_flight`
  - `cortex_bucket_stores_gate_duration_seconds`
- [CHANGE] Metric `cortex_ingester_flush_reasons` has been renamed to `cortex_ingester_flushing_enqueued_series_total`, and new metric `cortex_ingester_flushing_dequeued_series_total` with `outcome` label (superset of reason) has been added. #2802 #2818 #2998
- [CHANGE] Experimental Delete Series: Metric `cortex_purger_oldest_pending_delete_request_age_seconds` would track age of delete requests since they are over their cancellation period instead of their creation time. #2806
- [CHANGE] Experimental blocks storage: the store-gateway service is required in a Cortex cluster running with the experimental blocks storage. Removed the `-experimental.tsdb.store-gateway-enabled` CLI flag and `store_gateway_enabled` YAML config option. The store-gateway is now always enabled when the storage engine is `blocks`. #2822
- [CHANGE] Experimental blocks storage: removed support for `-experimental.blocks-storage.bucket-store.max-sample-count` flag because the implementation was flawed. To limit the number of samples/chunks processed by a single query you can set `-store.query-chunk-limit`, which is now supported by the blocks storage too. #2852
- [CHANGE] Ingester: Chunks flushed via /flush stay in memory until retention period is reached. This affects `cortex_ingester_memory_chunks` metric. #2778
- [CHANGE] Querier: the error message returned when the query time range exceeds `-store.max-query-length` has changed from `invalid query, length > limit (X > Y)` to `the query time range exceeds the limit (query length: X, limit: Y)`. #2826
- [CHANGE] Add `component` label to metrics exposed by chunk, delete and index store clients. #2774
- [CHANGE] Querier: when `-querier.query-ingesters-within` is configured, the time range of the query sent to ingesters is now manipulated to ensure the query start time is not older than 'now - query-ingesters-within'. #2904
- [CHANGE] KV: The `role` label which was a label of `multi` KV store client only has been added to metrics of every KV store client. If KV store client is not `multi`, then the value of `role` label is `primary`. #2837
- [CHANGE] Added the `engine` label to the metrics exposed by the Prometheus query engine, to distinguish between `ruler` and `querier` metrics. #2854
- [CHANGE] Added ruler to the single binary when started with `-target=all` (default). #2854
- [CHANGE] Experimental blocks storage: compact head when opening TSDB. This should only affect ingester startup after it was unable to compact head in previous run. #2870
- [CHANGE] Metric `cortex_overrides_last_reload_successful` has been renamed to `cortex_runtime_config_last_reload_successful`. #2874
- [CHANGE] HipChat support has been removed from the alertmanager (because removed from the Prometheus upstream too). #2902
- [CHANGE] Add constant label `name` to metric `cortex_cache_request_duration_seconds`. #2903
- [CHANGE] Add `user` label to metric `cortex_query_frontend_queue_length`. #2939
- [CHANGE] Experimental blocks storage: cleaned up the config and renamed "TSDB" to "blocks storage". #2937
  - The storage engine setting value has been changed from `tsdb` to `blocks`; this affects `-store.engine` CLI flag and its respective YAML option.
  - The root level YAML config has changed from `tsdb` to `blocks_storage`
  - The prefix of all CLI flags has changed from `-experimental.tsdb.` to `-experimental.blocks-storage.`
  - The following settings have been grouped under `tsdb` property in the YAML config and their CLI flags changed:
    - `-experimental.tsdb.dir` changed to `-experimental.blocks-storage.tsdb.dir`
    - `-experimental.tsdb.block-ranges-period` changed to `-experimental.blocks-storage.tsdb.block-ranges-period`
    - `-experimental.tsdb.retention-period` changed to `-experimental.blocks-storage.tsdb.retention-period`
    - `-experimental.tsdb.ship-interval` changed to `-experimental.blocks-storage.tsdb.ship-interval`
    - `-experimental.tsdb.ship-concurrency` changed to `-experimental.blocks-storage.tsdb.ship-concurrency`
    - `-experimental.tsdb.max-tsdb-opening-concurrency-on-startup` changed to `-experimental.blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup`
    - `-experimental.tsdb.head-compaction-interval` changed to `-experimental.blocks-storage.tsdb.head-compaction-interval`
    - `-experimental.tsdb.head-compaction-concurrency` changed to `-experimental.blocks-storage.tsdb.head-compaction-concurrency`
    - `-experimental.tsdb.head-compaction-idle-timeout` changed to `-experimental.blocks-storage.tsdb.head-compaction-idle-timeout`
    - `-experimental.tsdb.stripe-size` changed to `-experimental.blocks-storage.tsdb.stripe-size`
    - `-experimental.tsdb.wal-compression-enabled` changed to `-experimental.blocks-storage.tsdb.wal-compression-enabled`
    - `-experimental.tsdb.flush-blocks-on-shutdown` changed to `-experimental.blocks-storage.tsdb.flush-blocks-on-shutdown`
- [CHANGE] Flags `-bigtable.grpc-use-gzip-compression`, `-ingester.client.grpc-use-gzip-compression`, `-querier.frontend-client.grpc-use-gzip-compression` are now deprecated. #2940
- [CHANGE] Limit errors reported by ingester during query-time now return HTTP status code 422. #2941
- [FEATURE] Introduced `ruler.for-outage-tolerance`, Max time to tolerate outage for restoring "for" state of alert. #2783
- [FEATURE] Introduced `ruler.for-grace-period`, Minimum duration between alert and restored "for" state. This is maintained only for alerts with configured "for" time greater than grace period. #2783
- [FEATURE] Introduced `ruler.resend-delay`, Minimum amount of time to wait before resending an alert to Alertmanager. #2783
- [FEATURE] Ruler: added `local` filesystem support to store rules (read-only). #2854
- [ENHANCEMENT] Upgraded Docker base images to `alpine:3.12`. #2862
- [ENHANCEMENT] Experimental: Querier can now optionally query secondary store. This is specified by using `-querier.second-store-engine` option, with values `chunks` or `blocks`. Standard configuration options for this store are used. Additionally, this querying can be configured to happen only for queries that need data older than `-querier.use-second-store-before-time`. Default value of zero will always query secondary store. #2747
- [ENHANCEMENT] Query-tee: increased the `cortex_querytee_request_duration_seconds` metric buckets granularity. #2799
- [ENHANCEMENT] Query-tee: fail to start if the configured `-backend.preferred` is unknown. #2799
- [ENHANCEMENT] Ruler: Added the following metrics: #2786
  - `cortex_prometheus_notifications_latency_seconds`
  - `cortex_prometheus_notifications_errors_total`
  - `cortex_prometheus_notifications_sent_total`
  - `cortex_prometheus_notifications_dropped_total`
  - `cortex_prometheus_notifications_queue_length`
  - `cortex_prometheus_notifications_queue_capacity`
  - `cortex_prometheus_notifications_alertmanagers_discovered`
- [ENHANCEMENT] The behavior of the `/ready` was changed for the query frontend to indicate when it was ready to accept queries. This is intended for use by a read path load balancer that would want to wait for the frontend to have attached queriers before including it in the backend. #2733
- [ENHANCEMENT] Experimental Delete Series: Add support for deletion of chunks for remaining stores. #2801
- [ENHANCEMENT] Add `-modules` command line flag to list possible values for `-target`. Also, log warning if given target is internal component. #2752
- [ENHANCEMENT] Added `-ingester.flush-on-shutdown-with-wal-enabled` option to enable chunks flushing even when WAL is enabled. #2780
- [ENHANCEMENT] Query-tee: Support for custom API prefix by using `-server.path-prefix` option. #2814
- [ENHANCEMENT] Query-tee: Forwar...
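The "TSDB" to "blocks storage" rename (#2937) follows a regular pattern: every `-experimental.tsdb.` flag moves to the `-experimental.blocks-storage.` prefix, and the twelve settings listed in that entry additionally gain a `tsdb.` segment. The following Python sketch illustrates that rule under stated assumptions: `rename_flag` and `TSDB_GROUPED` are hypothetical names, flags are assumed to use the `-flag=value` form, and flags removed in the same release (such as `-experimental.tsdb.store-gateway-enabled`) are not special-cased.

```python
# Hypothetical helper illustrating the 1.3.0 "tsdb" -> "blocks-storage"
# CLI flag rename (#2937). Assumes arguments of the form "-flag=value".

OLD_PREFIX = "-experimental.tsdb."
NEW_PREFIX = "-experimental.blocks-storage."

# Settings that the changelog lists as grouped under the "tsdb" property.
TSDB_GROUPED = {
    "dir",
    "block-ranges-period",
    "retention-period",
    "ship-interval",
    "ship-concurrency",
    "max-tsdb-opening-concurrency-on-startup",
    "head-compaction-interval",
    "head-compaction-concurrency",
    "head-compaction-idle-timeout",
    "stripe-size",
    "wal-compression-enabled",
    "flush-blocks-on-shutdown",
}


def rename_flag(arg):
    """Return the 1.3.0 spelling of a single CLI argument."""
    name, sep, value = arg.partition("=")
    # The storage engine value itself changed from "tsdb" to "blocks".
    if name == "-store.engine" and value == "tsdb":
        return "-store.engine=blocks"
    if not name.startswith(OLD_PREFIX):
        return arg  # unrelated flag, leave untouched
    suffix = name[len(OLD_PREFIX):]
    if suffix in TSDB_GROUPED:
        suffix = "tsdb." + suffix
    return NEW_PREFIX + suffix + sep + value


if __name__ == "__main__":
    for old in ["-store.engine=tsdb", "-experimental.tsdb.dir=/data"]:
        print(old, "->", rename_flag(old))
```

Keeping the grouped settings in an explicit set mirrors the list in the changelog entry, so the sketch does not guess at flags the entry does not mention.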
Cortex 1.3.0-rc.2
This is the third release candidate for Cortex 1.3.0, including a bug fix:
- [BUGFIX] Querier: query /series from ingesters regardless of the `-querier.query-ingesters-within` setting. #3035
Cortex 1.3.0-rc.1
This is the second release candidate for Cortex 1.3.0, including a bug fix and an improvement:
Cortex 1.3.0-rc.0
This Cortex release features 125 contributions from 37 different authors. It's yet another great milestone we have reached thanks to the amazing support from our community ❤️ Thanks!
Highlights:
- The blocks storage is getting closer to production readiness. In this release we've done several fixes and improvements. In particular, you should be aware of:
  - Some CLI flags and YAML config options have been renamed
  - The store-gateway service is now mandatory when running the blocks storage
  - Introduced support for a live cluster migration from chunks to blocks (and rollback)
  - Introduced support to flush blocks on-demand from ingesters
- The ruler and alertmanager got several improvements, including but not limited to:
  - The ruler now runs in the single binary when Cortex gets started with `-target=all`
  - Introduced new config options to fine-tune the ruler
  - Introduced support to load locally stored rules (e.g. loaded via Kubernetes config map)
  - Multiple alertmanager URLs can now be specified in the ruler; each URL is treated as a separate alertmanager group
  - Alertmanager configuration can be persisted to object storage via API
- Other changes worth noting:
  - Added optional `snappy` compression support to internal gRPC connections
  - Starting from this release we're going to publish `.rpm` and `.deb` packages too

Please refer to the full changelog for the full list of changes and improvements.
Changelog
- [CHANGE] Replace the metric `cortex_alertmanager_configs` with `cortex_alertmanager_config_invalid` exposed by Alertmanager. #2960
- [CHANGE] Experimental Delete Series: Change target flag for purger from `data-purger` to `purger`. #2777
- [CHANGE] Experimental blocks storage: The max concurrent queries against the long-term storage, configured via `-experimental.blocks-storage.bucket-store.max-concurrent`, is now a limit shared across all tenants and not a per-tenant limit anymore. The default value has changed from `20` to `100` and the following new metrics have been added: #2797
  - `cortex_bucket_stores_gate_queries_concurrent_max`
  - `cortex_bucket_stores_gate_queries_in_flight`
  - `cortex_bucket_stores_gate_duration_seconds`
- [CHANGE] Metric `cortex_ingester_flush_reasons` has been renamed to `cortex_ingester_flushing_enqueued_series_total`, and new metric `cortex_ingester_flushing_dequeued_series_total` with `outcome` label (superset of reason) has been added. #2802, #2818
- [CHANGE] Experimental Delete Series: Metric `cortex_purger_oldest_pending_delete_request_age_seconds` would track age of delete requests since they are over their cancellation period instead of their creation time. #2806
- [CHANGE] Experimental blocks storage: the store-gateway service is required in a Cortex cluster running with the experimental blocks storage. Removed the `-experimental.tsdb.store-gateway-enabled` CLI flag and `store_gateway_enabled` YAML config option. The store-gateway is now always enabled when the storage engine is `blocks`. #2822
- [CHANGE] Experimental blocks storage: removed support for `-experimental.blocks-storage.bucket-store.max-sample-count` flag because the implementation was flawed. To limit the number of samples/chunks processed by a single query you can set `-store.query-chunk-limit`, which is now supported by the blocks storage too. #2852
- [CHANGE] Ingester: Chunks flushed via /flush stay in memory until retention period is reached. This affects `cortex_ingester_memory_chunks` metric. #2778
- [CHANGE] Querier: the error message returned when the query time range exceeds `-store.max-query-length` has changed from `invalid query, length > limit (X > Y)` to `the query time range exceeds the limit (query length: X, limit: Y)`. #2826
- [CHANGE] Add `component` label to metrics exposed by chunk, delete and index store clients. #2774
- [CHANGE] Querier: when `-querier.query-ingesters-within` is configured, the time range of the query sent to ingesters is now manipulated to ensure the query start time is not older than 'now - query-ingesters-within'. #2904
- [CHANGE] KV: The `role` label which was a label of `multi` KV store client only has been added to metrics of every KV store client. If KV store client is not `multi`, then the value of `role` label is `primary`. #2837
- [CHANGE] Added the `engine` label to the metrics exposed by the Prometheus query engine, to distinguish between `ruler` and `querier` metrics. #2854
- [CHANGE] Added ruler to the single binary when started with `-target=all` (default). #2854
- [CHANGE] Experimental blocks storage: compact head when opening TSDB. This should only affect ingester startup after it was unable to compact head in previous run. #2870
- [CHANGE] Metric `cortex_overrides_last_reload_successful` has been renamed to `cortex_runtime_config_last_reload_successful`. #2874
- [CHANGE] HipChat support has been removed from the alertmanager (because removed from the Prometheus upstream too). #2902
- [CHANGE] Add constant label `name` to metric `cortex_cache_request_duration_seconds`. #2903
- [CHANGE] Add `user` label to metric `cortex_query_frontend_queue_length`. #2939
- [CHANGE] Experimental blocks storage: cleaned up the config and renamed "TSDB" to "blocks storage". #2937
  - The storage engine setting value has been changed from `tsdb` to `blocks`; this affects `-store.engine` CLI flag and its respective YAML option.
  - The root level YAML config has changed from `tsdb` to `blocks_storage`
  - The prefix of all CLI flags has changed from `-experimental.tsdb.` to `-experimental.blocks-storage.`
  - The following settings have been grouped under `tsdb` property in the YAML config and their CLI flags changed:
    - `-experimental.tsdb.dir` changed to `-experimental.blocks-storage.tsdb.dir`
    - `-experimental.tsdb.block-ranges-period` changed to `-experimental.blocks-storage.tsdb.block-ranges-period`
    - `-experimental.tsdb.retention-period` changed to `-experimental.blocks-storage.tsdb.retention-period`
    - `-experimental.tsdb.ship-interval` changed to `-experimental.blocks-storage.tsdb.ship-interval`
    - `-experimental.tsdb.ship-concurrency` changed to `-experimental.blocks-storage.tsdb.ship-concurrency`
    - `-experimental.tsdb.max-tsdb-opening-concurrency-on-startup` changed to `-experimental.blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup`
    - `-experimental.tsdb.head-compaction-interval` changed to `-experimental.blocks-storage.tsdb.head-compaction-interval`
    - `-experimental.tsdb.head-compaction-concurrency` changed to `-experimental.blocks-storage.tsdb.head-compaction-concurrency`
    - `-experimental.tsdb.head-compaction-idle-timeout` changed to `-experimental.blocks-storage.tsdb.head-compaction-idle-timeout`
    - `-experimental.tsdb.stripe-size` changed to `-experimental.blocks-storage.tsdb.stripe-size`
    - `-experimental.tsdb.wal-compression-enabled` changed to `-experimental.blocks-storage.tsdb.wal-compression-enabled`
    - `-experimental.tsdb.flush-blocks-on-shutdown` changed to `-experimental.blocks-storage.tsdb.flush-blocks-on-shutdown`
- [CHANGE] Flags `-bigtable.grpc-use-gzip-compression`, `-ingester.client.grpc-use-gzip-compression`, `-querier.frontend-client.grpc-use-gzip-compression` are now deprecated. #2940
- [CHANGE] Limit errors reported by ingester during query-time now return HTTP status code 422. #2941
- [FEATURE] Introduced `ruler.for-outage-tolerance`, Max time to tolerate outage for restoring "for" state of alert. #2783
- [FEATURE] Introduced `ruler.for-grace-period`, Minimum duration between alert and restored "for" state. This is maintained only for alerts with configured "for" time greater than grace period. #2783
- [FEATURE] Introduced `ruler.resend-delay`, Minimum amount of time to wait before resending an alert to Alertmanager. #2783
- [FEATURE] Ruler: added `local` filesystem support to store rules (read-only). #2854
- [ENHANCEMENT] Upgraded Docker base images to `alpine:3.12`. #2862
- [ENHANCEMENT] Experimental: Querier can now optionally query secondary store. This is specified by using `-querier.second-store-engine` option, with values `chunks` or `blocks`. Standard configuration options for this store are used. Additionally, this querying can be configured to happen only for queries that need data older than `-querier.use-second-store-before-time`. Default value of zero will always query secondary store. #2747
- [ENHANCEMENT] Query-tee: increased the `cortex_querytee_request_duration_seconds` metric buckets granularity. #2799
- [ENHANCEMENT] Query-tee: fail to start if the configured `-backend.preferred` is unknown. #2799
- [ENHANCEMENT] Ruler: Added the following metrics: #2786
  - `cortex_prometheus_notifications_latency_seconds`
  - `cortex_prometheus_notifications_errors_total`
  - `cortex_prometheus_notifications_sent_total`
  - `cortex_prometheus_notifications_dropped_total`
  - `cortex_prometheus_notifications_queue_length`
  - `cortex_prometheus_notifications_queue_capacity`
  - `cortex_prometheus_notifications_alertmanagers_discovered`
- [ENHANCEMENT] The behavior of the `/ready` was changed for the query frontend to indicate when it was ready to accept queries. This is intended for use by a read path load balancer that would want to wait for the frontend to have attached queriers before including it in the backend. #2733
- [ENHANCEMENT] Experimental Delete Series: Add support for deletion of chunks for remaining stores. #2801
- [ENHANCEMENT] Add `-modules` command line flag to list possible values for `-target`. Also, log warning if given target is internal component. #2752
- [ENHANCEMENT] Added `-ingester.flush-on-shutdown-with-wal-enabled` option to enable chunks flushing even when WAL is enabled. #2780
- [ENHANCEMENT] Query-tee: Support for custom API prefix by using `-server.path-prefix` option. #2814
- [ENHANCEMENT] Query-tee: Forward `X-...