Search, a new era
David Pilato (@dadoonet, @pilato.fr)

Elasticsearch You Know, for Search

These are not the droids you are looking for.

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

"char_filter": "html_strip"
  These are <em>not</em> the droids you are looking for.
  → These are not the droids you are looking for.

"tokenizer": "standard"
  These are not the droids you are looking for.
  → These are not the droids you are looking for

"filter": "lowercase"
  These are not the droids you are looking for
  → these are not the droids you are looking for

"filter": "stop"
  these are not the droids you are looking for
  → droids you looking

"filter": "snowball"
  droids you looking
  → droid you look

These are <em>not</em> the droids you are looking for.

{
  "tokens": [
    { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
    { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
    { "token": "look", "start_offset": 42, "end_offset": 49, "type": "<ALPHANUM>", "position": 7 }
  ]
}
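The same chain can be baked into an index as a custom analyzer, so it runs at both index and search time. A minimal sketch, assuming a hypothetical index my_index and field quote (neither appears in the talk):

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "droid_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop", "snowball" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "quote": { "type": "text", "analyzer": "droid_analyzer" }  // uses the chain above
    }
  }
}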

Semantic search ≠ Literal matches

Elasticsearch You Know, for Search

Elasticsearch You Know, for Vector Search

What is a Vector?

Example: 1-dimensional vector
One "Character" dimension, running from Realistic to Cartoon: a realistic character gets the vector [ 1 ].

Embeddings represent your data

Multiple dimensions represent different data aspects
Adding a second dimension, Human vs Machine: a realistic human is [ 1, 1 ]; a realistic machine is [ 1, 0 ].

Similar data is grouped together
A realistic human at [ 1.0, 1.0 ] and a mostly-human character at [ 1.0, 0.8 ] sit close together; a realistic machine at [ 1.0, 0.0 ] lands further away.

Vector search ranks objects by similarity (~relevance) to the query
(figure: the query point plotted among results, ranked 1 through 5 by proximity)

How do you index vectors?

Architecture of Vector Search

Choice of Embedding Model

Start with Off-the-Shelf Models:
● Text data: Hugging Face models (like Microsoft's E5)
● Images: OpenAI's CLIP

Extend to Higher Relevance:
● Apply hybrid scoring
● Bring Your Own Model: requires expertise + labeled data

Problem: training data vs the actual use case

dense_vector field type

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": { "type": "text" },
      "desc_embedding": { "type": "dense_vector" }
    }
  }
}
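In practice the vector parameters are usually pinned down explicitly. A minimal sketch; the dims value (384) and cosine similarity are assumptions matching a typical sentence-embedding model, not values from the talk:

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": { "type": "text" },
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,             // must match the embedding model output size (assumption)
        "similarity": "cosine",
        "index": true            // build an ANN index for knn search
      }
    }
  }
}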

Data Ingestion and Embedding Generation

The source data is indexed together with embeddings generated outside Elasticsearch:

PUT /ecommerce/_doc/product-1234
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …],
  "img_embedding": [0.012, 0.0, …]
}

With Elastic ML (commercial)

The source data is sent without any embedding:

PUT /ecommerce/_doc/product-1234
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton"
}

Elasticsearch generates the embedding at ingest time, so the stored document ends up with "desc_embedding": [0.452, 0.3242, …] added.
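One way to wire this up is an ingest pipeline with an inference processor, so the embedding is generated as documents are indexed. A minimal sketch; the pipeline name and model_id are hypothetical placeholders:

PUT _ingest/pipeline/embed-description
{
  "processors": [
    {
      "inference": {
        "model_id": "my-text-embedding-model",   // hypothetical deployed model
        "input_output": [
          {
            "input_field": "description",
            "output_field": "desc_embedding"
          }
        ]
      }
    }
  ]
}

POST /ecommerce/_doc?pipeline=embed-description
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…"
}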

How do you search vectors?

Architecture of Vector Search

knn query

GET /ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "query_vector": [0.123, 0.244, …]
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

knn query (with Elastic ML, commercial)

GET /ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "query_vector_builder": {
            "text_embedding": {
              "model_text": "summer clothes",
              "model_id": <text-embedding-model>
            }
          }
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

The query text is embedded at search time by the transformer model.

semantic_text field type (new, from 8.15)

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "copy_to": [ "desc_embedding" ]
      },
      "desc_embedding": { "type": "semantic_text" }
    }
  }
}

POST ecommerce/_doc
{ "description": "Our best-selling…" }

GET ecommerce/_search
{
  "query": {
    "semantic": {
      "field": "desc_embedding",
      "query": "I'm looking for a red dress for a DJ party"
    }
  }
}
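Under the hood, semantic_text delegates chunking and embedding to an inference endpoint (a managed default is used when none is given). A hedged sketch of pointing the field at a specific endpoint; the endpoint and index names are hypothetical, and E5 is just one off-the-shelf choice:

PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",  // built-in E5 model
    "num_allocations": 1,
    "num_threads": 1
  }
}

PUT ecommerce-semantic
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "semantic_text",
        "inference_id": "my-e5-endpoint"
      }
    }
  }
}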

Architecture of Vector Search

But how does it really work?

Similarity

(figure: query vector q and document vectors d1, d2 separated by angle θ)

\cos(\theta) = \frac{\vec{q} \cdot \vec{d}}{\lVert \vec{q} \rVert \times \lVert \vec{d} \rVert}
\qquad
\_score = \frac{1 + \cos(\theta)}{2}

Similarity: cosine (cosine)

● Similar vectors: θ close to 0, cos(θ) close to 1:
  \_score = \frac{1 + 1}{2} = 1
● Orthogonal vectors: θ close to 90°, cos(θ) close to 0:
  \_score = \frac{1 + 0}{2} = 0.5
● Opposite vectors: θ close to 180°, cos(θ) close to -1:
  \_score = \frac{1 - 1}{2} = 0
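A tiny worked example, with made-up 2-d unit vectors q = (1, 0) and d = (0.6, 0.8):

\cos(\theta) = \frac{1 \times 0.6 + 0 \times 0.8}{1 \times 1} = 0.6
\qquad
\_score = \frac{1 + 0.6}{2} = 0.8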

Similarity: Dot Product (dot_product or max_inner_product)

\vec{q} \cdot \vec{d} = \lVert \vec{q} \rVert \times \lVert \vec{d} \rVert \times \cos(\theta)

\_score_{float} = \frac{1 + \mathrm{dot\_product}(q, d)}{2}
\qquad
\_score_{byte} = 0.5 + \frac{\mathrm{dot\_product}(q, d)}{32768 \times dims}
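dot_product expects vectors already normalized to unit length, in which case it equals the cosine while skipping the magnitude computation. With the same unit vectors as above:

\mathrm{dot\_product}(q, d) = 1 \times 0.6 + 0 \times 0.8 = 0.6
\qquad
\_score_{float} = \frac{1 + 0.6}{2} = 0.8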

Similarity: Euclidean distance (l2_norm)

l2\_norm_{q,d} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
\qquad
\_score = \frac{1}{1 + (l2\_norm_{q,d})^2}
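Again with q = (1, 0) and d = (0.6, 0.8):

l2\_norm_{q,d} = \sqrt{(1 - 0.6)^2 + (0 - 0.8)^2} = \sqrt{0.8} \approx 0.894
\qquad
\_score = \frac{1}{1 + 0.8} \approx 0.56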

Brute Force

Hierarchical Navigable Small Worlds (HNSW)

One popular approach: HNSW, a layered graph that simplifies access to the nearest neighbors.
● Tiered: from coarse to fine approximation over a few steps
● Balance: trading a little accuracy for a lot of scalability
● Speed: excellent query latency on large scale indices
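HNSW is what backs an indexed dense_vector field, and the graph can be tuned at mapping time. A minimal sketch; the index name and the m / ef_construction values are illustrative assumptions, not recommendations from the talk:

PUT ecommerce-hnsw
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 16,                 // graph connections per node
          "ef_construction": 100   // candidates considered while building the graph
        }
      }
    }
  }
}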

Scalar Quantization

● float32: Recall: High. Precision: High. Rescore: likely not needed. Full RAM required.
● int8 (Elasticsearch default since 8.14): Recall: Good. Precision: Good. Oversampling: moderate. Rescore: reasonable. 4X RAM savings.
● int4: Recall: Low. Precision: Low. Oversampling: needed. Rescore: may be slower. 8X RAM savings.
● bit: Recall: Bad. Precision: Bad. Oversampling: needed. Rescore: expensive and limiting. 32X RAM savings.

BBQ aka Better Binary Quantization

BBQ sits at the bit end of the float32 / int8 / int4 / bit spectrum: 32X RAM savings, yet faster & more accurate than Product Quantization.
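The quantization scheme is chosen per field through index_options. A hedged sketch with a hypothetical index name: int8_hnsw is the default since 8.14, and bbq_hnsw is the BBQ variant available in recent releases:

PUT ecommerce-quantized
{
  "mappings": {
    "properties": {
      "desc_embedding_int8": {
        "type": "dense_vector",
        "dims": 384,
        "similarity": "cosine",
        "index_options": { "type": "int8_hnsw" }   // scalar quantization, 4X RAM savings
      },
      "desc_embedding_bbq": {
        "type": "dense_vector",
        "dims": 384,
        "similarity": "cosine",
        "index_options": { "type": "bbq_hnsw" }    // BBQ, 32X RAM savings
      }
    }
  }
}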

Memory required for 100M vectors? Only 12GB!?! On one single node.
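Back-of-the-envelope check, assuming 1024-dimensional vectors (the dimension is my assumption; the slide does not state it): at 1 bit per dimension a vector costs 1024 / 8 = 128 bytes, versus 4096 bytes in float32:

10^8 \times 128\ \mathrm{bytes} \approx 12.8\ \mathrm{GB}
\qquad \text{vs} \qquad
10^8 \times 1024 \times 4\ \mathrm{bytes} \approx 410\ \mathrm{GB}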

Benchmarketing

https://djdadoo.pilato.fr/

https://github.com/dadoonet/music-search/

Elasticsearch You Know, for Hybrid Search

Hybrid scoring

Combine the term-based score with the vector similarity score through a linear combination (manual boosting).

Manual boosting

GET ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "match": {
          "description": { "query": "summer clothes" }
        }
      },{
        "semantic": {
          "field": "desc_embedding",
          "query": "summer clothes",
          "boost": 100.0
        }
      }]
    }
  }
}

PUT starwars
{
  "mappings": {
    "properties": {
      "text.tokens": { "type": "sparse_vector" }
    }
  }
}

Indexed documents: "These are not the droids you are looking for." and "Obi-Wan never told you what happened to your father."

GET starwars/_search
{
  "query": {
    "sparse_vector": {
      "field": "text.tokens",
      "query_vector": {
        "lucas": 0.50047517,
        "ship": 0.29860738,
        "dragon": 0.5300422,
        "quest": 0.5974301,
        …
      }
    }
  }
}

ELSER: Elastic Learned Sparse EncodeR (commercial)

● Not BM25, nor a (dense) vector
● A sparse vector, like BM25
● Stored as an inverted index
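A hedged sketch of deploying ELSER through the inference API and letting a semantic_text field use it; the endpoint and index names are hypothetical placeholders:

PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}

PUT starwars-semantic
{
  "mappings": {
    "properties": {
      "text": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint"   // expands text into weighted tokens at ingest
      }
    }
  }
}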

Hybrid ranking

Combine the term-based score (ranking 1), the dense vector score (ranking 2), and the sparse vector score (ranking 3) with Reciprocal Rank Fusion (RRF) to blend multiple ranking methods.

Reciprocal Rank Fusion (RRF)

\mathrm{score}(d) = \sum_{r \in R} \frac{1}{k + r(d)}

where D is the set of docs, R is a set of rankings (each a permutation on 1..|D|), and k is typically set to 60 by default.

Dense vector ranking:

Doc   Score   r(d)   k + r(d)
A     1       1      61
B     0.7     2      62
C     0.5     3      63
D     0.2     4      64
E     0.01    5      65

BM25 ranking:

Doc   Score   r(d)   k + r(d)
C     1,341   1      61
A     739     2      62
F     732     3      63
G     192     4      64
H     183     5      65

Combining both with RRF (for example, A is ranked 1st by the dense vector and 2nd by BM25, so score(A) = 1/61 + 1/62 ≈ 0.0325):

Doc   Dense   BM25   RRF Score
A     1/61    1/62   0.0325
C     1/63    1/61   0.0323
B     1/62    -      0.0161
F     -       1/63   0.0159
D     1/64    -      0.0156

Hybrid Ranking: BM25F + Sparse Vector + Dense Vector (commercial)

GET index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [{
        "standard": {
          "query": { "match": { … } }
        }
      },{
        "standard": {
          "query": { "sparse_vector": { … } }
        }
      },{
        "knn": { … }
      }]
    }
  }
}

Elastic and LLMs (ChatGPT)

Gen AI Search engines

LLM opportunities and limits

Your question → GAI / LLM → one answer, based on public internet data only.

Retrieval Augmented Generation

Your question + a context window built from your business data (documents, images, audio) → GAI / LLM → the right answer, grounded in both public internet data and your business data.

Demo: Elastic Playground

Elasticsearch You Know, for Semantic Search

Search, a new era
David Pilato (@dadoonet, @pilato.fr)