What if we searched for pieces of music?
Presented by David Pilato (@pilato.fr, @dadoonet)
GROUPE ENI

Découvrez Elasticsearch en 2H30 ! https://www.editions-eni.fr/video/elasticsearch-indexez-vos-donnees-pour-une-recherche-efficace-vtelastic 1re édition GROUPE ENI 25.03.2025

LA CARRIÈRE, SAINT HERBLAIN

Elasticsearch: You Know, for Search

These are not the droids you are looking for.

Text analysis
    GET /_analyze
    {
      "char_filter": [ "html_strip" ],
      "tokenizer": "standard",
      "filter": [ "lowercase", "stop", "snowball" ],
      "text": "These are <em>not</em> the droids you are looking for."
    }

"char_filter": "html_strip"
    Input:  These are <em>not</em> the droids you are looking for.
    Output: These are not the droids you are looking for.

"tokenizer": "standard"
    Input:  These are not the droids you are looking for.
    Tokens: These | are | not | the | droids | you | are | looking | for

"filter": "lowercase"
    Input:  These | are | not | the | droids | you | are | looking | for
    Output: these | are | not | the | droids | you | are | looking | for

"filter": "stop"
    Input:  these | are | not | the | droids | you | are | looking | for
    Output: droids | you | looking

"filter": "snowball"
    Input:  droids | you | looking
    Output: droid | you | look

These are <em>not</em> the droids you are looking for.
    {
      "tokens": [
        {
          "token": "droid",
          "start_offset": 27,
          "end_offset": 33,
          "type": "<ALPHANUM>",
          "position": 4
        },
        {
          "token": "you",
          "start_offset": 34,
          "end_offset": 37,
          "type": "<ALPHANUM>",
          "position": 5
        },
        {
          "token": "look",
          "start_offset": 42,
          "end_offset": 49,
          "type": "<ALPHANUM>",
          "position": 7
        }
      ]
    }
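
The analysis chain above can be imitated in a few lines of Python to make the order of the steps concrete. This is only a toy sketch: the regular expressions, the five-word stop list and the suffix-stripping `toy_stem` function are assumptions made for this one sentence, not Elasticsearch's actual html_strip, standard tokenizer, stop filter or Snowball stemmer.

```python
import re

# Assumption: a tiny subset of the English stop list, enough for this sentence.
STOP_WORDS = {"these", "are", "not", "the", "for"}


def html_strip(text):
    """Character filter: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def standard_tokenize(text):
    """Tokenizer: split on non-word characters (a rough approximation)."""
    return re.findall(r"\w+", text)


def toy_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Run the steps in the same order as the _analyze request."""
    tokens = standard_tokenize(html_strip(text))          # char_filter + tokenizer
    tokens = [t.lower() for t in tokens]                  # lowercase filter
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop filter
    return [toy_stem(t) for t in tokens]                  # stemming filter


print(analyze("These are <em>not</em> the droids you are looking for."))
```

The order matters: stop words are removed after lowercasing, so "These" is dropped as well as "the".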

Semantic search ≠ Literal matches

Elasticsearch: You Know, for Search

Elasticsearch: You Know, for Vector Search

Embeddings represent your data
Example: 1-dimensional vector
[Figure: characters placed along a single Realistic/Cartoon axis; each character maps to a 1-dimensional vector such as [ 1 ]]

Multiple dimensions represent different data aspects
[Figure: characters placed on two axes, Realistic/Cartoon and Human/Machine; each character maps to a 2-dimensional vector such as [ 1, 1 ] or [ 1, 0 ]]

Similar data is grouped together
[Figure: the same two axes (Realistic/Cartoon, Human/Machine); characters that look alike have close vectors, e.g. [ 1.0, 1.0 ], [ 1.0, 0.8 ], [ 1.0, 0.0 ]]

Vector search ranks objects by similarity (~relevance) to the query
[Figure: a query point on the Realistic/Cartoon and Human/Machine axes, with results ranked 1 to 5 by closeness to the query]
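
As a rough illustration of ranking by closeness, the sketch below sorts a handful of 2-dimensional vectors by their distance to a query vector. The character names, the vectors and the choice of Euclidean distance are all assumptions made for the example; the slides do not say which metric the figure uses.

```python
import math

# Hypothetical 2-dimensional embeddings (names and values are illustrative).
characters = {
    "character_a": [1.0, 1.0],
    "character_b": [1.0, 0.8],
    "character_c": [1.0, 0.0],
    "character_d": [0.0, 1.0],
    "character_e": [0.0, 0.0],
}


def rank(query, vectors):
    """Return the names ordered from closest to farthest from the query."""
    return sorted(vectors, key=lambda name: math.dist(query, vectors[name]))


query = [1.0, 0.95]
ranking = rank(query, characters)
for position, name in enumerate(ranking, start=1):
    print(position, name, round(math.dist(query, characters[name]), 3))
```

The closest vector gets rank 1, which is what "similarity (~relevance)" means here: the smaller the distance, the better the match.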

How do you index vectors?

Architecture of Vector Search

dense_vector field type
    PUT ecommerce
    {
      "mappings": {
        "properties": {
          "description": { "type": "text" },
          "desc_embedding": { "type": "dense_vector" }
        }
      }
    }

Data Ingestion and Embedding Generation
Source data, with the generated embeddings added before indexing:
    POST /ecommerce/_doc
    {
      "_id": "product-1234",
      "product_name": "Summer Dress",
      "description": "Our best-selling…",
      "Price": 118,
      "color": "blue",
      "fabric": "cotton",
      "desc_embedding": [0.452, 0.3242, …],
      "img_embedding": [0.012, 0.0, …]
    }
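
The flow on this slide (take the source document, compute a vector from a text field, store it next to the original fields) could be sketched as follows. `toy_embed` is a deliberately fake placeholder, not a real embedding model, and `with_embedding` is a hypothetical helper; in practice the vector would come from a machine learning model.

```python
import json
import math


def toy_embed(text, dims=4):
    """Placeholder embedding, NOT a real model: bucket character codes
    into `dims` slots, then L2-normalize the result."""
    vector = [0.0] * dims
    for ch in text.lower():
        vector[ord(ch) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


def with_embedding(source, text_field="description", vector_field="desc_embedding"):
    """Return a copy of the source document with the embedding field added."""
    doc = dict(source)
    doc[vector_field] = toy_embed(doc[text_field])
    return doc


source = {
    "product_name": "Summer Dress",
    "description": "Our best-selling…",
    "Price": 118,
    "color": "blue",
    "fabric": "cotton",
}

doc = with_embedding(source)
print(json.dumps(doc, ensure_ascii=False, indent=2))
```

The printed JSON has the same shape as the request body above: the original fields plus one array of floats per embedded field.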

How do you search vectors?

Architecture of Vector Search

knn query
    GET /ecommerce/_search
    {
      "query": {
        "bool": {
          "must": [{
            "knn": {
              "field": "desc_embedding",
              "query_vector": [0.123, 0.244, …]
            }
          }],
          "filter": {
            "term": {
              "department": "women"
            }
          }
        }
      },
      "size": 10
    }
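
Conceptually, this request asks for the documents in one department whose vectors are most similar to the query vector. The brute-force sketch below shows that meaning on a tiny in-memory list. It is not how Elasticsearch executes the query; the documents, their 2-dimensional vectors and the `knn_search` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def knn_search(docs, query_vector, department, size=10):
    """Keep documents matching the filter, then return the `size` most similar."""
    candidates = [d for d in docs if d["department"] == department]
    candidates.sort(
        key=lambda d: cosine(query_vector, d["desc_embedding"]),
        reverse=True,
    )
    return candidates[:size]


# Hypothetical documents with 2-dimensional embeddings.
docs = [
    {"id": "p1", "department": "women", "desc_embedding": [0.9, 0.1]},
    {"id": "p2", "department": "men", "desc_embedding": [1.0, 0.0]},
    {"id": "p3", "department": "women", "desc_embedding": [0.1, 0.9]},
    {"id": "p4", "department": "women", "desc_embedding": [0.7, 0.7]},
]

hits = knn_search(docs, [1.0, 0.0], "women", size=2)
print([h["id"] for h in hits])
```

Document p2 has the vector closest to the query but is excluded by the department filter, which is the role of the "filter" clause in the request.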

Architecture of Vector Search

But how does it really work?

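Similarity
[Figure: query vector q and document vectors d1 and d2 on the Human and Realistic axes, with the angle θ between q and a document vector d]
$\cos(\theta) = \dfrac{\vec{q} \cdot \vec{d}}{|\vec{q}| \times |\vec{d}|}$
$\_score = \dfrac{1 + \cos(\theta)}{2}$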

Similarity: cosine (cosine)
Similar vectors: θ close to 0, $\cos(\theta)$ close to 1, $\_score = \dfrac{1 + 1}{2} = 1$
Orthogonal vectors: θ close to 90°, $\cos(\theta)$ close to 0, $\_score = \dfrac{1 + 0}{2} = 0.5$
Opposite vectors: θ close to 180°, $\cos(\theta)$ close to -1, $\_score = \dfrac{1 - 1}{2} = 0$
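
The three cases can be checked numerically with a short Python sketch of the formula from the slides, _score = (1 + cos(θ)) / 2. The vectors are arbitrary examples chosen to be parallel, orthogonal and opposite to the query.

```python
import math


def cosine(q, d):
    """cos(theta) = (q . d) / (|q| * |d|) for non-zero vectors."""
    dot = sum(x * y for x, y in zip(q, d))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_d = math.sqrt(sum(y * y for y in d))
    return dot / (norm_q * norm_d)


def score(q, d):
    """Map the cosine from [-1, 1] to a score in [0, 1]."""
    return (1 + cosine(q, d)) / 2


q = [1.0, 0.0]
similar = score(q, [2.0, 0.0])      # same direction, theta = 0
orthogonal = score(q, [0.0, 3.0])   # theta = 90 degrees
opposite = score(q, [-1.0, 0.0])    # theta = 180 degrees
print(similar, orthogonal, opposite)
```

Only the direction of the vectors matters: [2.0, 0.0] is twice as long as the query vector and still gets the maximum score.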

https://djdadoo.pilato.fr/

https://github.com/dadoonet/music-search/
