🎹 🎤 What if we searched 🎻🎸
for pieces of music? Presented by
GROUPE ENI
David Pilato @pilato.fr
@dadoonet
Slide 2
Discover Elasticsearch in 2h30!
https://www.editions-eni.fr/video/elasticsearch-indexez-vos-donnees-pour-une-recherche-efficace-vtelastic
1st edition
GROUPE ENI
25.03.2025
LA CARRIÈRE, SAINT HERBLAIN
Slide 3
Elasticsearch: You Know, for Search
Slide 4
Slide 5
Slide 6
These are not the droids you are looking for.
Slide 7
Text analysis
GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}
Slide 8
"char_filter": "html_strip"
Input: These are <em>not</em> the droids you are looking for.
Output: These are not the droids you are looking for.
Slide 9
"tokenizer": "standard"
Input: These are not the droids you are looking for.
Output: These | are | not | the | droids | you | are | looking | for
Slide 10
"filter": "lowercase"
Input: These | are | not | the | droids | you | are | looking | for
Output: these | are | not | the | droids | you | are | looking | for
Slide 11
"filter": "stop"
Input: these | are | not | the | droids | you | are | looking | for
Output: droids | you | looking
Slide 12
"filter": "snowball"
Input: droids | you | looking
Output: droid | you | look
Slide 13
These are <em>not</em> the droids you are looking for.
{
  "tokens": [
    { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
    { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
    { "token": "look", "start_offset": 42, "end_offset": 49, "type": "<ALPHANUM>", "position": 7 }
  ]
}
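The analysis chain walked through on slides 7 to 13 can be sketched in plain Python. This is a toy illustration and not how Elasticsearch implements it: the stop-word set is a small assumed subset, `toy_stem` is a hypothetical two-rule stand-in for the Snowball stemmer, and HTML tags are blanked out with spaces so that offsets still refer to the original text.

```python
import re

# Assumed subset of English stop words, just enough for this sentence.
STOP_WORDS = {"these", "are", "not", "the", "for"}


def toy_stem(token):
    """Hypothetical stand-in for Snowball: strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    # char_filter: blank out HTML tags, keeping the text length unchanged
    # so token offsets match the original input.
    stripped = re.sub(r"<[^>]+>", lambda m: " " * len(m.group()), text)
    tokens = []
    # tokenizer: split on runs of word characters, tracking each position.
    for position, match in enumerate(re.finditer(r"\w+", stripped)):
        token = match.group().lower()  # filter: lowercase
        if token in STOP_WORDS:        # filter: stop
            continue
        tokens.append({
            "token": toy_stem(token),  # filter: toy stemmer
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


result = analyze("These are <em>not</em> the droids you are looking for.")
for entry in result:
    print(entry)
```

On the sample sentence this sketch yields the same three tokens, offsets and positions as the `_analyze` response on slide 13.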
Slide 14
Semantic search ≠ Literal matches
Slide 15
Elasticsearch: You Know, for Search
Slide 16
Elasticsearch: You Know, for Vector Search
Slide 17
Slide 18
Embeddings represent your data
Example: 1-dimensional vector
[Figure: characters placed along a single Realistic / Cartoon axis, next to a Character / Vector table. Vector values as extracted: [ 1 ] and [ 1 ].]
Slide 19
Multiple dimensions represent different data aspects
[Figure: characters placed on two axes, Human / Machine and Realistic / Cartoon, next to a Character / Vector table. Vector values as extracted: [ 1, 1 ] and [ 1, 0 ].]
Slide 20
Similar data is grouped together
[Figure: characters placed on the Human / Machine and Realistic / Cartoon axes, next to a Character / Vector table. Vector values as extracted: [ 1.0, 1.0 ], [ 1.0, 0.0 ], [ 1.0, 0.8 ], [ 1.0, 1.0 ], [ 1.0, 1.0 ].]
Slide 21
Vector search ranks objects by similarity (~relevance) to the query
[Figure: a query and its results, ranked 1 to 5, on the Human / Machine and Realistic / Cartoon axes.]
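The ranking idea on this slide can be sketched as a brute-force search: compute a distance between the query vector and every document vector, then sort. The document names and vector values below are made up for illustration, and Euclidean distance is an assumption chosen for simplicity. Comparing against every document like this is only practical for tiny datasets.

```python
import math

# Hypothetical 2-dimensional embeddings; names and values are made up.
documents = {
    "a": [1.0, 1.0],
    "b": [1.0, 0.8],
    "c": [1.0, 0.0],
    "d": [0.0, 0.5],
    "e": [0.0, 0.0],
}
query = [1.0, 1.0]


def rank(query_vector, candidates):
    """Return (name, distance) pairs, closest document first."""
    distances = {
        name: math.dist(query_vector, vector)
        for name, vector in candidates.items()
    }
    return sorted(distances.items(), key=lambda item: item[1])


ranked = rank(query, documents)
for rank_number, (name, distance) in enumerate(ranked, start=1):
    print(rank_number, name, round(distance, 3))
```

The closest document gets rank 1, mirroring the Rank column of the figure.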
Slide 22
How do you index vectors?
Slide 23
Architecture of Vector Search
Slide 24
dense_vector field type
PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": { "type": "text" },
      "desc_embedding": { "type": "dense_vector" }
    }
  }
}
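As a rough sketch, the same mapping can be built as a Python dictionary and serialized to the JSON body of the `PUT ecommerce` request. The `dims` and `similarity` entries are parameters that `dense_vector` accepts, but their values here (3 and `cosine`) are assumptions for illustration; the slide does not specify them.

```python
import json

# Hypothetical value: the slide does not say how many dimensions are used.
EMBEDDING_DIMS = 3

mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},
            "desc_embedding": {
                "type": "dense_vector",
                "dims": EMBEDDING_DIMS,   # assumed vector length
                "similarity": "cosine",   # assumed similarity function
            },
        }
    }
}

# JSON body to send with: PUT ecommerce
body = json.dumps(mapping, indent=2)
print(body)
```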
Slide 25
Data Ingestion and Embedding Generation
Source data, enriched with its embeddings:
POST /ecommerce/_doc
{
  "_id": "product-1234",
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "Price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …],
  "img_embedding": [0.012, 0.0, …]
}
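The enrichment step can be sketched as follows: take the source document, compute an embedding for its description, and attach it before indexing. `toy_embedding` is a hypothetical stand-in (a hashed bag of words), not a real embedding model, and the description text is made up; in practice the vector would come from a machine learning model.

```python
import hashlib
import json
import math


def toy_embedding(text, dims=4):
    """Hypothetical embedding: hashed bag of words, L2-normalized."""
    vector = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


# Source data: field names follow the slide, the description is made up.
product = {
    "product_name": "Summer Dress",
    "description": "A light cotton dress for warm days",
    "color": "blue",
    "fabric": "cotton",
}

# Enrich a copy of the document with its embedding before indexing it.
document = dict(product, desc_embedding=toy_embedding(product["description"]))
print(json.dumps(document, indent=2))
```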
Slide 26
How do you search vectors?
Slide 27
Architecture of Vector Search
Slide 30
But how does it really work?
Slide 31
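Similarity
[Figure: query vector q and document vectors d1 and d2 on the Human / Realistic axes, with θ the angle between q and a document vector d.]
$\cos(\theta) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}| \times |\vec{d}|}$
$\mathrm{\_score} = \frac{1 + \cos(\theta)}{2}$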
Slide 32
Similarity: cosine (cosine)
Similar vectors: θ close to 0, cos(θ) close to 1, $\mathrm{\_score} = \frac{1 + 1}{2} = 1$
Orthogonal vectors: θ close to 90°, cos(θ) close to 0, $\mathrm{\_score} = \frac{1 + 0}{2} = 0.5$
Opposite vectors: θ close to 180°, cos(θ) close to -1, $\mathrm{\_score} = \frac{1 - 1}{2} = 0$
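The three cases above can be checked numerically with a minimal sketch of the formula from the slides. The function names and the sample vectors are assumptions chosen so that the angles are exactly 0°, 90° and 180°.

```python
import math


def cosine(q, d):
    """Cosine of the angle between vectors q and d."""
    dot = sum(a * b for a, b in zip(q, d))
    return dot / (math.hypot(*q) * math.hypot(*d))


def score(q, d):
    """Map the cosine from [-1, 1] to a score in [0, 1]."""
    return (1 + cosine(q, d)) / 2


q = [1.0, 0.0]
similar = score(q, [2.0, 0.0])      # same direction, θ = 0°
orthogonal = score(q, [0.0, 3.0])   # θ = 90°
opposite = score(q, [-1.0, 0.0])    # θ = 180°
print(similar, orthogonal, opposite)
```

Vector length does not affect the result: only the direction of the vectors matters for cosine similarity.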
Slide 33
https://djdadoo.pilato.fr/
Slide 34
https://github.com/dadoonet/music-search/
Slide 35
🎹 🎤 What if we searched 🎻🎸
for pieces of music? Presented by
GROUPE ENI
David Pilato @pilato.fr
@dadoonet
Slides & demo