S03E09: ILM and data management

A presentation at Elastic Daily B(y|i)te - S03 in October 2021 by David Pilato

Slide 1

Daily Elastic Observability B(y|i)te
Index Lifecycle Management & data streams
David Pilato (@dadoonet)

Slide 2

Before data streams
• One single index per use case (for example, logs)
  ‒ mixing nginx logs with smtp logs (different fields)
  ‒ with thousands of fields (10k limit)
  ‒ most fields are empty (bad compression because of sparsity)
  ‒ very big index mapping

Slide 3

Data streams
• One data stream per dataset
• Naming convention: <type>-<dataset>-<namespace>
  ‒ logs-nginx-default, logs-smtp-default
  ‒ metrics-linux.iostat-default, metrics-system.diskio-default
• Data streams must have a matching index template
• Documents must contain the @timestamp field (configurable)
• Designed for append-only time series data
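The naming convention above can be sketched in a few lines of Python. This is only an illustration: the helper names (`format_data_stream`, `parse_data_stream`, `DataStreamName`) are hypothetical, and the sketch assumes that none of the three parts contains a hyphen, so that the name can be split unambiguously.

```python
from typing import NamedTuple


class DataStreamName(NamedTuple):
    """The three parts of a <type>-<dataset>-<namespace> name (hypothetical helper)."""
    stream_type: str
    dataset: str
    namespace: str


def format_data_stream(stream_type: str, dataset: str, namespace: str = "default") -> str:
    """Build a data stream name from its parts."""
    for part in (stream_type, dataset, namespace):
        # Assumption: no part may be empty or contain the "-" separator.
        if not part or "-" in part:
            raise ValueError(f"invalid data stream name part: {part!r}")
    return f"{stream_type}-{dataset}-{namespace}"


def parse_data_stream(name: str) -> DataStreamName:
    """Split a data stream name back into type, dataset and namespace."""
    parts = name.split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a <type>-<dataset>-<namespace> name: {name!r}")
    return DataStreamName(*parts)


if __name__ == "__main__":
    print(format_data_stream("logs", "nginx"))
    print(parse_data_stream("metrics-linux.iostat-default"))
```

Dots are allowed inside a part, which is how `linux.iostat` and `system.diskio` stay a single dataset.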

Slide 4

Data streams
• Backing index: .ds-<data-stream>-<yyyy.MM.dd>-<generation>
• Rollover and alias
  ‒ many search indices
  ‒ one single write index
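As a rough sketch of the backing index pattern, the Python below builds and parses such names and picks the write index as the one with the highest generation. The function names are hypothetical, and the six-digit zero-padded generation (for example `000001`) is an assumption that is not stated on the slide.

```python
import re
from datetime import date

# Pattern for .ds-<data-stream>-<yyyy.MM.dd>-<generation>
BACKING_INDEX = re.compile(
    r"^\.ds-(?P<stream>.+)-(?P<date>\d{4}\.\d{2}\.\d{2})-(?P<generation>\d+)$"
)


def backing_index_name(stream: str, created: date, generation: int) -> str:
    """Format a backing index name (assumes a 6-digit zero-padded generation)."""
    return f".ds-{stream}-{created:%Y.%m.%d}-{generation:06d}"


def parse_backing_index(name: str) -> tuple[str, str, int]:
    """Return (data stream, creation date string, generation) for a backing index."""
    match = BACKING_INDEX.match(name)
    if match is None:
        raise ValueError(f"not a backing index name: {name!r}")
    return match["stream"], match["date"], int(match["generation"])


def write_index(backing_indices: list[str]) -> str:
    """All backing indices are searched; only the latest generation is written to."""
    return max(backing_indices, key=lambda name: parse_backing_index(name)[2])


if __name__ == "__main__":
    first = backing_index_name("logs-nginx-default", date(2021, 10, 1), 1)
    second = backing_index_name("logs-nginx-default", date(2021, 10, 28), 2)
    print(first, second)
    print("write index:", write_index([first, second]))
```

Each rollover creates the next generation, which becomes the new write index while the older ones remain searchable.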

Slide 5

Before ILM
• Time-based indices like filebeat-2021.10.28
  ‒ some are very small
  ‒ what happens for big events (like Christmas)?
  ‒ what happens when you have unpredictable traffic?
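To make the problem concrete, here is a toy Python comparison between one index per day and indices cut by size. It is an idealized sketch under stated assumptions: the traffic numbers are invented, the function names are hypothetical, and a size-based cut is modeled as happening at exactly the threshold, which a real cluster does not guarantee.

```python
def daily_indices(daily_gb: list[int]) -> list[int]:
    """One index per day: each index is as big as that day's traffic."""
    return list(daily_gb)


def size_based_indices(daily_gb: list[int], max_gb: int) -> list[int]:
    """Start a new index whenever the current one reaches max_gb (idealized)."""
    sizes = []
    current = 0
    for volume in daily_gb:
        remaining = volume
        # Fill the current index up to the threshold as many times as needed.
        while current + remaining >= max_gb:
            remaining -= max_gb - current
            sizes.append(max_gb)
            current = 0
        current += remaining
    if current > 0:
        sizes.append(current)  # the index still being written to
    return sizes


if __name__ == "__main__":
    traffic = [1, 1, 1, 40, 1, 1, 1]  # invented numbers: one big day in a quiet week
    print("daily:     ", daily_indices(traffic))
    print("size-based:", size_based_indices(traffic, max_gb=10))
```

With this invented week, daily indices give six tiny indices and one of 40 GB, while the size-based cut gives four indices of 10 GB and one of 6 GB that is still being filled.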

Slide 6

ILM
• ILM can be applied to data streams
• With an ILM policy, an index:
  ‒ Might use rollover
  ‒ Might be moved to the warm, cold, and frozen phases
  ‒ Might be removed
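As an illustration, the Python below builds the JSON bodies one might send to the ILM policy API (`PUT _ilm/policy/<name>`) and the index template API (`PUT _index_template/<name>`) to attach a policy to a data stream. It only constructs and prints the bodies. The policy name `logs-policy`, the index pattern, and every threshold are made-up values, not recommendations from the talk, and the frozen phase is left out because it needs a snapshot repository.

```python
import json


def build_ilm_policy(rollover_size: str, rollover_age: str, delete_after: str) -> dict:
    """Body for PUT _ilm/policy/<name>; all thresholds are hypothetical values."""
    return {
        "policy": {
            "phases": {
                # Hot: roll over when a primary shard gets big or the index gets old.
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": rollover_size,
                            "max_age": rollover_age,
                        }
                    }
                },
                # Warm: the index is no longer written to, so merge its segments.
                "warm": {
                    "min_age": "7d",
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                # Cold: rarely searched, lowest recovery priority.
                "cold": {
                    "min_age": "30d",
                    "actions": {"set_priority": {"priority": 0}},
                },
                # Delete: remove the index at the end of its life.
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def build_data_stream_template(pattern: str, policy_name: str) -> dict:
    """Body for PUT _index_template/<name> that creates data streams using the policy."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},  # matching names become data streams
        "priority": 200,
        "template": {"settings": {"index.lifecycle.name": policy_name}},
    }


if __name__ == "__main__":
    policy = build_ilm_policy("50gb", "30d", "90d")
    template = build_data_stream_template("logs-nginx-*", "logs-policy")
    print(json.dumps(policy, indent=2))
    print(json.dumps(template, indent=2))
```

Because the template carries `index.lifecycle.name`, every backing index created for a matching data stream would be managed by that policy.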

Slide 7

In action