🎹🎻🎸 Searching for similar music tracks 🎼🎶

A presentation at Elastic Sofia Meetup in May 2024 in Sofia, Bulgaria by David Pilato

Slide 1

🎹 Searching for similar music 🎻🎸 David Pilato | @dadoonet

Slide 2

Elasticsearch: You Know, for Search

Slide 3

Slide 4

Slide 5

These are not the droids you are looking for.

Slide 6

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

Slide 7

These are <em>not</em> the droids you are looking for.

{
  "tokens": [
    { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
    { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
    { "token": "look", "start_offset": 42, "end_offset": 49, "type": "<ALPHANUM>", "position": 7 }
  ]
}
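The offsets in the response point back into the original text, HTML markup included. A minimal sketch, using only the values shown on the slide, that slices the source string to recover the word each token came from (the helper name `surface_forms` is made up for this illustration):

```python
# Original text sent to the _analyze API (copied from the slide).
text = "These are <em>not</em> the droids you are looking for."

# Tokens copied from the response shown on the slide.
tokens = [
    {"token": "droid", "start_offset": 27, "end_offset": 33, "position": 4},
    {"token": "you", "start_offset": 34, "end_offset": 37, "position": 5},
    {"token": "look", "start_offset": 42, "end_offset": 49, "position": 7},
]


def surface_forms(text, tokens):
    """Pair each analyzed token with the slice of the original text it covers."""
    return [
        (t["token"], text[t["start_offset"]:t["end_offset"]])
        for t in tokens
    ]


for token, original in surface_forms(text, tokens):
    print(f"{token!r} <- {original!r}")
# 'droid' <- 'droids'
# 'you' <- 'you'
# 'look' <- 'looking'
```

The stemmed tokens ("droid", "look") differ from the words in the text ("droids", "looking"), and the jump from position 5 to position 7 marks the place where the stop word "are" was removed.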

Slide 8

Elasticsearch: You Know, for Search

Slide 9

Elasticsearch: You Know, for Vector Search

Slide 10

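Embeddings represent your data
Example: 1-dimensional vector

[Figure: characters placed along a single axis running from Realistic to Cartoon, next to a table that maps each character to a one-element vector such as [ 1 ].]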

Slide 11

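Multiple dimensions represent different data aspects

[Figure: characters placed on two axes, Realistic to Cartoon and Human to Machine, next to a table that maps each character to a two-element vector: [ 1, 1 ] and [ 1, 0 ].]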

Slide 12

Similar data is grouped together

[Figure: the same Realistic/Cartoon and Human/Machine axes, with three character vectors: [ 1.0, 1.0 ], [ 1.0, 0.0 ] and [ 1.0, 0.8 ].]

Slide 13

Vector search ranks objects by similarity (~relevance) to the query

[Figure: a query point on the Human/Machine and Realistic/Cartoon axes, with the results ranked 1 to 5.]
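To make the ranking idea concrete, here is a small sketch that orders the three example vectors from the earlier slides by cosine similarity to a query. The character labels and the query vector are assumptions made up for this illustration, and the code compares the query with every vector, which is only practical for tiny data sets.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (name, similarity) pairs, most similar first."""
    scored = [(name, cosine(query, vec)) for name, vec in documents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Vectors from the slides; the labels are hypothetical.
documents = {
    "character_a": [1.0, 1.0],
    "character_b": [1.0, 0.0],
    "character_c": [1.0, 0.8],
}

# Assumed query vector, chosen only for the example.
query = [1.0, 0.7]

for position, (name, similarity) in enumerate(rank(query, documents), start=1):
    print(position, name, round(similarity, 3))
```

With this query, `character_c` comes first, then `character_a`, then `character_b`: the closer a vector points in the query's direction, the higher it ranks.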

Slide 14

Data Ingestion and Embedding Generation

Source data:

POST /_doc
{
  "_id": "product-1234",
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "Price": 118,
  "color": "blue",
  "fabric": "cotton"
}

Source data with generated embeddings:

POST /_doc
{
  "_id": "product-1234",
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "Price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …],
  "img_embedding": [0.012, 0.0, …]
}
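The enrichment step can be sketched as a function that copies the source document and adds a vector computed from one of its fields. Everything below is illustrative: `toy_embed` is a stand-in that merely counts characters and is not a real embedding model, and the helper names are assumptions rather than anything from the talk.

```python
import math


def toy_embed(text, dims=4):
    """Stand-in embedder (NOT a real model): bucket character codes into
    `dims` counters and L2-normalize the result."""
    counts = [0.0] * dims
    for ch in text.lower():
        if ch.isalnum():
            counts[ord(ch) % dims] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else counts


def add_embedding(doc, source_field, target_field, embed=toy_embed):
    """Return a copy of `doc` with `target_field` set to the embedding of
    `doc[source_field]`; the input document is left untouched."""
    enriched = dict(doc)
    enriched[target_field] = embed(doc[source_field])
    return enriched


# Source data from the slide (the description is truncated there too).
source = {
    "product_name": "Summer Dress",
    "description": "Our best-selling…",
    "Price": 118,
    "color": "blue",
    "fabric": "cotton",
}

enriched = add_embedding(source, "description", "desc_embedding")
print(sorted(enriched))
```

In a real pipeline the `embed` argument would be replaced by a call to an actual text or image embedding model, and the enriched document would then be sent to Elasticsearch.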

Slide 15

Vector Query

GET product-catalog/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "num_candidates": 50,
          "query_vector": [0.123, 0.244, …]
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}
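When the query vector comes from application code, the request body is usually assembled programmatically. A minimal sketch that builds the same body as a Python dictionary, assuming the field names from the slides; `build_knn_search` is a hypothetical helper, and the two-element vector is only a placeholder, since a real vector must have as many dimensions as the mapped field.

```python
import json


def build_knn_search(query_vector, department, field="desc_embedding",
                     num_candidates=50, size=10):
    """Build a search body that combines a knn clause with a term filter."""
    return {
        "query": {
            "bool": {
                "must": [{
                    "knn": {
                        "field": field,
                        "num_candidates": num_candidates,
                        "query_vector": list(query_vector),
                    }
                }],
                "filter": {"term": {"department": department}},
            }
        },
        "size": size,
    }


# Placeholder vector for illustration only.
body = build_knn_search([0.123, 0.244], "women")
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of `GET product-catalog/_search` with any HTTP client or Elasticsearch client library.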

Slide 16

But how does it really work?

Slide 17

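Similarity: cosine (cosine)

$$\cos(\theta) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}| \times |\vec{d}|}$$

$$\text{\_score} = \frac{1 + \cos(\theta)}{2}$$

[Figure: query vector q and document vectors d1 and d2 drawn on the Human and Realistic axes, with the angle θ marked.]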

Slide 18

Similarity: cosine (cosine)

$$\text{\_score} = \frac{1 + 1}{2} = 1$$

$$\text{\_score} = \frac{1 + 0}{2} = 0.5$$

$$\text{\_score} = \frac{1 - 1}{2} = 0$$
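The three cases correspond to a cosine of 1, 0 and -1. A short sketch that reproduces them with the score formula from the slides; the example vectors are assumptions picked to point in the same, a perpendicular and the opposite direction.

```python
import math


def cosine(q, d):
    """Cosine of the angle between query vector q and document vector d."""
    dot = sum(x * y for x, y in zip(q, d))
    return dot / (math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in d)))


def score(q, d):
    """Map the cosine from [-1, 1] to a score in [0, 1]: (1 + cos) / 2."""
    return (1 + cosine(q, d)) / 2


q = [1.0, 0.0]
print(score(q, [2.0, 0.0]))   # same direction      -> 1.0
print(score(q, [0.0, 3.0]))   # perpendicular       -> 0.5
print(score(q, [-1.0, 0.0]))  # opposite direction  -> 0.0
```

Vector length does not matter here: `[2.0, 0.0]` scores the same as `[1.0, 0.0]` would, because cosine similarity depends only on the angle.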

Slide 19

https://djdadoo.pilato.fr/

Slide 20

https://github.com/dadoonet/music-search/

Slide 21

🎹 Searching for similar music 🎻🎸 David Pilato | @dadoonet