🎹🎻🎸 What if we searched for pieces of music 🎼🎶?

A presentation at ENI Tech Fest in March 2025 in Nantes, France by David Pilato

Slide 1

🎹 🎤 What if we searched 🎻🎸 for pieces of music?
Presented by David Pilato (@pilato.fr, @dadoonet)
GROUPE ENI

Slide 2

Discover Elasticsearch in 2h30!
https://www.editions-eni.fr/video/elasticsearch-indexez-vos-donnees-pour-une-recherche-efficace-vtelastic

1st edition, GROUPE ENI, 25.03.2025

LA CARRIÈRE, SAINT HERBLAIN

Slide 3

Elasticsearch: You Know, for Search

Slide 4


Slide 5


Slide 6

These are not the droids you are looking for.

Slide 7

Text analysis

    GET /_analyze
    {
      "char_filter": [ "html_strip" ],
      "tokenizer": "standard",
      "filter": [ "lowercase", "stop", "snowball" ],
      "text": "These are <em>not</em> the droids you are looking for."
    }

Slide 8

"char_filter": "html_strip"

Input: These are <em>not</em> the droids you are looking for.
Output: These are not the droids you are looking for.

Slide 9

"tokenizer": "standard"

Input: These are not the droids you are looking for.
Tokens: These, are, not, the, droids, you, are, looking, for

Slide 10

"filter": "lowercase"

Input tokens: These, are, not, the, droids, you, are, looking, for
Output tokens: these, are, not, the, droids, you, are, looking, for

Slide 11

"filter": "stop"

Input tokens: these, are, not, the, droids, you, are, looking, for
Output tokens: droids, you, looking

Slide 12

"filter": "snowball"

Input tokens: droids, you, looking
Output tokens: droid, you, look

Slide 13

These are <em>not</em> the droids you are looking for.

    {
      "tokens": [
        {
          "token": "droid",
          "start_offset": 27,
          "end_offset": 33,
          "type": "<ALPHANUM>",
          "position": 4
        },
        {
          "token": "you",
          "start_offset": 34,
          "end_offset": 37,
          "type": "<ALPHANUM>",
          "position": 5
        },
        {
          "token": "look",
          "start_offset": 42,
          "end_offset": 49,
          "type": "<ALPHANUM>",
          "position": 7
        }
      ]
    }
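The analysis chain walked through on the previous slides (character filter, tokenizer, then token filters) can be approximated in a few lines of Python. This is only a sketch: the tokenizer is a simple regular expression, and `toy_stem` is a crude suffix stripper standing in for the real Snowball stemmer. The function names are made up for the illustration; the stop word list is Lucene's default English set.

```python
import re

# Lucene's default English stop word set, used by the "stop" filter by default.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

def html_strip(text):
    """Character filter: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def standard_tokenize(text):
    """Tokenizer: rough stand-in that keeps runs of word characters."""
    return re.findall(r"\w+", text)

def toy_stem(token):
    """Token filter: crude suffix stripping, NOT the real Snowball algorithm."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Apply the chain in the same order as the _analyze request."""
    tokens = standard_tokenize(html_strip(text))
    tokens = [t.lower() for t in tokens]                          # lowercase
    tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]   # stop
    return [toy_stem(t) for t in tokens]                          # stemming

tokens = analyze("These are <em>not</em> the droids you are looking for.")
print(tokens)  # ['droid', 'you', 'look']
```

Unlike the real `_analyze` response, this sketch does not track offsets or positions; it only reproduces the final terms.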

Slide 14

Semantic search ≠ Literal matches

Slide 15

Elasticsearch: You Know, for Search

Slide 16

Elasticsearch: You Know, for Vector Search

Slide 17


Slide 18

Embeddings represent your data

Example: 1-dimensional vector

[Figure: characters placed along a single axis running from Realistic to Cartoon, each labelled with a one-value character vector.]

Slide 19

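Multiple dimensions represent different data aspects

[Figure: characters placed on two axes, Realistic to Cartoon and Human to Machine, each labelled with a two-value character vector; the values that survive extraction are [ 1, 1 ] and [ 1, 0 ].]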

Slide 20

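Similar data is grouped together

[Figure: characters placed on the Realistic to Cartoon and Human to Machine axes, each labelled with a two-value character vector; the values that survive extraction are [ 1.0, 1.0 ], [ 1.0, 0.0 ], [ 1.0, 0.8 ], [ 1.0, 1.0 ] and [ 1.0, 1.0 ].]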

Slide 21

Vector search ranks objects by similarity (~relevance) to the query

[Figure: a query point and five results ranked 1 to 5 on the Realistic to Cartoon and Human to Machine axes.]
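To make the ranking idea concrete, here is a small Python sketch that sorts a handful of 2-dimensional vectors by cosine similarity to a query vector. The character names, the coordinate convention (first value from cartoon at -1 to realistic at +1, second value from machine at -1 to human at +1) and all vector values are invented for the illustration; they are not the values from the slides.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: [cartoon(-1)..realistic(+1), machine(-1)..human(+1)]
characters = {
    "realistic human": [1.0, 1.0],
    "mostly realistic human": [0.8, 1.0],
    "realistic machine": [1.0, -1.0],
    "cartoon human": [-1.0, 1.0],
    "cartoon machine": [-1.0, -1.0],
}

query = [1.0, 0.9]  # "something realistic and rather human"

# Rank every character by decreasing similarity to the query.
ranked = sorted(
    characters,
    key=lambda name: cosine(query, characters[name]),
    reverse=True,
)

for rank, name in enumerate(ranked, start=1):
    print(rank, name, round(cosine(query, characters[name]), 3))
```

The closest vectors come first and the vector pointing the opposite way comes last, which is the behaviour the slide describes.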

Slide 22

How do you index vectors?

Slide 23

Architecture of Vector Search

Slide 24

dense_vector field type

    PUT ecommerce
    {
      "mappings": {
        "properties": {
          "description": {
            "type": "text"
          },
          "desc_embedding": {
            "type": "dense_vector"
          }
        }
      }
    }

Slide 25

Data Ingestion and Embedding Generation

Source data, indexed together with its embeddings. The document ID goes in the URL, because Elasticsearch rejects `_id` inside the document body:

    PUT /ecommerce/_doc/product-1234
    {
      "product_name": "Summer Dress",
      "description": "Our best-selling…",
      "Price": 118,
      "color": "blue",
      "fabric": "cotton",
      "desc_embedding": [0.452, 0.3242,…],
      "img_embedding": [0.012, 0.0,…]
    }
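The ingestion step boils down to computing a vector for a text field and storing it next to the source data. The Python sketch below shows that shape without calling Elasticsearch or a real model: `toy_embed` is a hypothetical stand-in that hashes the text into a few numbers, and the sample description is made up rather than taken from the slide. The `dims` mapping parameter is a real `dense_vector` option; its value of 4 is chosen only to match the toy vectors.

```python
import hashlib
import json

def toy_embed(text, dims=4):
    """Hypothetical stand-in for an embedding model (not a real one):
    maps text to `dims` deterministic floats between 0 and 1."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [round(byte / 255, 4) for byte in digest[:dims]]

# Mapping in the spirit of the slide; "dims" must match the vector length.
mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},
            "desc_embedding": {"type": "dense_vector", "dims": 4},
        }
    }
}

# Made-up source data; the description is illustrative, not the slide's text.
source = {
    "product_name": "Summer Dress",
    "description": "A light cotton dress for warm days",
    "color": "blue",
    "fabric": "cotton",
}

# Enrich the document with its embedding before sending it to the index.
document = {**source, "desc_embedding": toy_embed(source["description"])}
request_body = json.dumps(document, indent=2)
print(request_body)
```

A hash carries no meaning, so these vectors are useless for semantic search; the point is only where the embedding is computed and how it ends up in the indexed document.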

Slide 26

How do you search vectors?

Slide 27

Architecture of Vector Search

Slide 28

knn query

    GET /ecommerce/_search
    {
      "query": {
        "bool": {
          "must": [{
            "knn": {
              "field": "desc_embedding",
              "query_vector": [0.123, 0.244,…]
            }
          }],
          "filter": {
            "term": {
              "department": "women"
            }
          }
        }
      },
      "size": 10
    }
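What the query asks for (the nearest vectors to `query_vector`, restricted to one department, limited to `size` hits) can be sketched as an exact, brute-force search in Python. This is a simplification under stated assumptions: Elasticsearch uses an approximate index rather than scanning every document, and this sketch filters before ranking, which is not necessarily how the engine combines the two clauses. The documents, IDs and vectors are invented.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors of the same length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical in-memory "index" with 2-dimensional embeddings.
docs = [
    {"_id": "p1", "department": "women", "desc_embedding": [0.9, 0.1]},
    {"_id": "p2", "department": "men", "desc_embedding": [1.0, 0.0]},
    {"_id": "p3", "department": "women", "desc_embedding": [0.0, 1.0]},
    {"_id": "p4", "department": "women", "desc_embedding": [0.7, 0.7]},
]

def knn_search(docs, query_vector, department, size=10):
    """Exact nearest-neighbour search over the documents of one department."""
    candidates = [d for d in docs if d["department"] == department]
    scored = [
        (cosine(query_vector, d["desc_embedding"]), d["_id"])
        for d in candidates
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:size]]

hits = knn_search(docs, [1.0, 0.0], department="women", size=2)
print(hits)  # ['p1', 'p4']
```

Note that `p2` is the exact match for the query vector but is excluded by the department filter.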

Slide 29

Architecture of Vector Search

Slide 30

But how does it really work?

Slide 31

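Similarity

$$\cos(\theta) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}| \times |\vec{d}|}$$

$$\text{\_score} = \frac{1 + \cos(\theta)}{2}$$

[Figure: query vector q and document vectors d1 and d2 drawn on Human and Realistic axes, with θ the angle between the query and a document vector.]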

Slide 32

Similarity: cosine (cosine)

Similar vectors: θ close to 0, cos(θ) close to 1, $\text{\_score} = \frac{1 + 1}{2} = 1$
Orthogonal vectors: θ close to 90°, cos(θ) close to 0, $\text{\_score} = \frac{1 + 0}{2} = 0.5$
Opposite vectors: θ close to 180°, cos(θ) close to -1, $\text{\_score} = \frac{1 - 1}{2} = 0$
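The three cases on this slide can be checked numerically. The sketch below implements the score formula from the slides, `(1 + cos(θ)) / 2`, in plain Python; the function name and the example vectors are made up for the illustration.

```python
import math

def cosine_score(q, d):
    """Map cosine similarity from [-1, 1] to a score in [0, 1]."""
    dot = sum(x * y for x, y in zip(q, d))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_d = math.sqrt(sum(x * x for x in d))
    cos_theta = dot / (norm_q * norm_d)
    return (1 + cos_theta) / 2

q = [1.0, 0.0]

similar = cosine_score(q, [2.0, 0.0])      # same direction, θ = 0
orthogonal = cosine_score(q, [0.0, 3.0])   # right angle, θ = 90°
opposite = cosine_score(q, [-1.0, 0.0])    # opposite direction, θ = 180°

print(similar, orthogonal, opposite)  # 1.0 0.5 0.0
```

Vector length does not matter here: only the angle does, which is why `[2.0, 0.0]` scores the same as `[1.0, 0.0]` would.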

Slide 33

https://djdadoo.pilato.fr/

Slide 34

https://github.com/dadoonet/music-search/

Slide 35

🎹 🎤 What if we searched 🎻🎸 for pieces of music?
Presented by David Pilato (@pilato.fr, @dadoonet)
Slides & demo