Search, a new era
David Pilato (@dadoonet, @pilato.fr)

Elasticsearch You Know, for Search

These are not the droids you are looking for.

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

"char_filter": "html_strip"
  These are <em>not</em> the droids you are looking for.
  → These are not the droids you are looking for.

"tokenizer": "standard"
  These are not the droids you are looking for.
  → These are not the droids you are looking for

"filter": "lowercase"
  These are not the droids you are looking for
  → these are not the droids you are looking for

"filter": "stop"
  these are not the droids you are looking for
  → droids you looking

"filter": "snowball"
  droids you looking
  → droid you look

These are <em>not</em> the droids you are looking for.

{
  "tokens": [
    { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
    { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
    { "token": "look", "start_offset": 42, "end_offset": 49, "type": "<ALPHANUM>", "position": 7 }
  ]
}
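The same chain can be baked into an index as a custom analyzer, so it runs at both index and search time. A minimal sketch, assuming a hypothetical index my_index and field quote (neither appears in the talk):

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "droid_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop", "snowball" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "quote": { "type": "text", "analyzer": "droid_analyzer" }  // uses the chain above
    }
  }
}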

Semantic search ≠ Literal matches

Elasticsearch You Know, for Search

Elasticsearch You Know, for Vector Search

What is a Vector?

Example: 1-dimensional vector
One "Character" dimension, running from Realistic to Cartoon: a realistic character gets the vector [ 1 ].

Embeddings represent your data

Multiple dimensions represent different data aspects
Adding a second dimension, Human vs Machine: a realistic human is [ 1, 1 ]; a realistic machine is [ 1, 0 ].

Similar data is grouped together
A realistic human at [ 1.0, 1.0 ] and a mostly-human character at [ 1.0, 0.8 ] sit close together; a realistic machine at [ 1.0, 0.0 ] lands further away.

Vector search ranks objects by similarity (~relevance) to the query
(figure: the query point plotted among results, ranked 1 through 5 by proximity)

How do you index vectors?

Architecture of Vector Search

Choice of Embedding Model

Start with Off-the-Shelf Models:
● Text data: Hugging Face models (like Microsoft's E5)
● Images: OpenAI's CLIP

Extend to Higher Relevance:
● Apply hybrid scoring
● Bring Your Own Model: requires expertise + labeled data

Problem: training data vs the actual use case

dense_vector field type

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": { "type": "text" },
      "desc_embedding": { "type": "dense_vector" }
    }
  }
}
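In practice the vector parameters are usually pinned down explicitly. A minimal sketch; the dims value (384) and cosine similarity are assumptions matching a typical sentence-embedding model, not values from the talk:

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": { "type": "text" },
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,             // must match the embedding model output size (assumption)
        "similarity": "cosine",
        "index": true            // build an ANN index for knn search
      }
    }
  }
}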

Data Ingestion and Embedding Generation

The source data is indexed together with embeddings generated outside Elasticsearch:

PUT /ecommerce/_doc/product-1234
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …],
  "img_embedding": [0.012, 0.0, …]
}

With Elastic ML (commercial)

The source data is sent without any embedding:

PUT /ecommerce/_doc/product-1234
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton"
}

Elasticsearch generates the embedding at ingest time, so the stored document ends up with "desc_embedding": [0.452, 0.3242, …] added.
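One way to wire this up is an ingest pipeline with an inference processor, so the embedding is generated as documents are indexed. A minimal sketch; the pipeline name and model_id are hypothetical placeholders:

PUT _ingest/pipeline/embed-description
{
  "processors": [
    {
      "inference": {
        "model_id": "my-text-embedding-model",   // hypothetical deployed model
        "input_output": [
          {
            "input_field": "description",
            "output_field": "desc_embedding"
          }
        ]
      }
    }
  ]
}

POST /ecommerce/_doc?pipeline=embed-description
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…"
}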

How do you search vectors?

Architecture of Vector Search

knn query

GET /ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "query_vector": [0.123, 0.244, …]
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

knn query (with Elastic ML, commercial)

GET /ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "query_vector_builder": {
            "text_embedding": {
              "model_text": "summer clothes",
              "model_id": <text-embedding-model>
            }
          }
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

The query text is embedded at search time by the transformer model.

semantic_text field type (new, from 8.15)

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "copy_to": [ "desc_embedding" ]
      },
      "desc_embedding": { "type": "semantic_text" }
    }
  }
}

POST ecommerce/_doc
{ "description": "Our best-selling…" }

GET ecommerce/_search
{
  "query": {
    "semantic": {
      "field": "desc_embedding",
      "query": "I'm looking for a red dress for a DJ party"
    }
  }
}
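Under the hood, semantic_text delegates chunking and embedding to an inference endpoint (a managed default is used when none is given). A hedged sketch of pointing the field at a specific endpoint; the endpoint and index names are hypothetical, and E5 is just one off-the-shelf choice:

PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",  // built-in E5 model
    "num_allocations": 1,
    "num_threads": 1
  }
}

PUT ecommerce-semantic
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "semantic_text",
        "inference_id": "my-e5-endpoint"
      }
    }
  }
}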

Architecture of Vector Search

But how does it really work?

Similarity

(figure: query vector q and document vectors d1, d2 separated by angle θ)

\cos(\theta) = \frac{\vec{q} \cdot \vec{d}}{\lVert \vec{q} \rVert \times \lVert \vec{d} \rVert}
\qquad
\_score = \frac{1 + \cos(\theta)}{2}

Similarity: cosine (cosine)

● Similar vectors: θ close to 0, cos(θ) close to 1:
  \_score = \frac{1 + 1}{2} = 1
● Orthogonal vectors: θ close to 90°, cos(θ) close to 0:
  \_score = \frac{1 + 0}{2} = 0.5
● Opposite vectors: θ close to 180°, cos(θ) close to -1:
  \_score = \frac{1 - 1}{2} = 0
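A tiny worked example, with made-up 2-d unit vectors q = (1, 0) and d = (0.6, 0.8):

\cos(\theta) = \frac{1 \times 0.6 + 0 \times 0.8}{1 \times 1} = 0.6
\qquad
\_score = \frac{1 + 0.6}{2} = 0.8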

Similarity: Dot Product (dot_product or max_inner_product)

\vec{q} \cdot \vec{d} = \lVert \vec{q} \rVert \times \lVert \vec{d} \rVert \times \cos(\theta)

\_score_{float} = \frac{1 + \mathrm{dot\_product}(q, d)}{2}
\qquad
\_score_{byte} = 0.5 + \frac{\mathrm{dot\_product}(q, d)}{32768 \times dims}
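dot_product expects vectors already normalized to unit length, in which case it equals the cosine while skipping the magnitude computation. With the same unit vectors as above:

\mathrm{dot\_product}(q, d) = 1 \times 0.6 + 0 \times 0.8 = 0.6
\qquad
\_score_{float} = \frac{1 + 0.6}{2} = 0.8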

Similarity: Euclidean distance (l2_norm)

l2\_norm_{q,d} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
\qquad
\_score = \frac{1}{1 + (l2\_norm_{q,d})^2}
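Again with q = (1, 0) and d = (0.6, 0.8):

l2\_norm_{q,d} = \sqrt{(1 - 0.6)^2 + (0 - 0.8)^2} = \sqrt{0.8} \approx 0.894
\qquad
\_score = \frac{1}{1 + 0.8} \approx 0.56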

Brute Force

Hierarchical Navigable Small Worlds (HNSW)

One popular approach: HNSW, a layered graph that simplifies access to the nearest neighbors.
● Tiered: from coarse to fine approximation over a few steps
● Balance: trading a little accuracy for a lot of scalability
● Speed: excellent query latency on large scale indices
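HNSW is what backs an indexed dense_vector field, and the graph can be tuned at mapping time. A minimal sketch; the index name and the m / ef_construction values are illustrative assumptions, not recommendations from the talk:

PUT ecommerce-hnsw
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 16,                 // graph connections per node
          "ef_construction": 100   // candidates considered while building the graph
        }
      }
    }
  }
}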

Scalar Quantization

● float32: Recall: High. Precision: High. Rescore: likely not needed. Full RAM required.
● int8 (Elasticsearch default since 8.14): Recall: Good. Precision: Good. Oversampling: moderate. Rescore: reasonable. 4X RAM savings.
● int4: Recall: Low. Precision: Low. Oversampling: needed. Rescore: may be slower. 8X RAM savings.
● bit: Recall: Bad. Precision: Bad. Oversampling: needed. Rescore: expensive and limiting. 32X RAM savings.

BBQ aka Better Binary Quantization

BBQ sits at the bit end of the float32 / int8 / int4 / bit spectrum: 32X RAM savings, yet faster & more accurate than Product Quantization.
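The quantization scheme is chosen per field through index_options. A hedged sketch with a hypothetical index name: int8_hnsw is the default since 8.14, and bbq_hnsw is the BBQ variant available in recent releases:

PUT ecommerce-quantized
{
  "mappings": {
    "properties": {
      "desc_embedding_int8": {
        "type": "dense_vector",
        "dims": 384,
        "similarity": "cosine",
        "index_options": { "type": "int8_hnsw" }   // scalar quantization, 4X RAM savings
      },
      "desc_embedding_bbq": {
        "type": "dense_vector",
        "dims": 384,
        "similarity": "cosine",
        "index_options": { "type": "bbq_hnsw" }    // BBQ, 32X RAM savings
      }
    }
  }
}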

Memory required for 100M vectors? Only 12GB!?! On one single node.
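Back-of-the-envelope check, assuming 1024-dimensional vectors (the dimension is my assumption; the slide does not state it): at 1 bit per dimension a vector costs 1024 / 8 = 128 bytes, versus 4096 bytes in float32:

10^8 \times 128\ \mathrm{bytes} \approx 12.8\ \mathrm{GB}
\qquad \text{vs} \qquad
10^8 \times 1024 \times 4\ \mathrm{bytes} \approx 410\ \mathrm{GB}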

Benchmarketing

https://djdadoo.pilato.fr/

https://github.com/dadoonet/music-search/

Elasticsearch You Know, for Hybrid Search

Hybrid scoring

Combine the term-based score with the vector similarity score through a linear combination (manual boosting).

Manual boosting

GET ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "match": {
          "description": { "query": "summer clothes" }
        }
      },{
        "semantic": {
          "field": "desc_embedding",
          "query": "summer clothes",
          "boost": 100.0
        }
      }]
    }
  }
}

PUT starwars
{
  "mappings": {
    "properties": {
      "text.tokens": { "type": "sparse_vector" }
    }
  }
}

Indexed documents: "These are not the droids you are looking for." and "Obi-Wan never told you what happened to your father."

GET starwars/_search
{
  "query": {
    "sparse_vector": {
      "field": "text.tokens",
      "query_vector": {
        "lucas": 0.50047517,
        "ship": 0.29860738,
        "dragon": 0.5300422,
        "quest": 0.5974301,
        …
      }
    }
  }
}

ELSER: Elastic Learned Sparse EncodeR (commercial)

● Not BM25, nor a (dense) vector
● A sparse vector, like BM25
● Stored as an inverted index
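A hedged sketch of deploying ELSER through the inference API and letting a semantic_text field use it; the endpoint and index names are hypothetical placeholders:

PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}

PUT starwars-semantic
{
  "mappings": {
    "properties": {
      "text": {
        "type": "semantic_text",
        "inference_id": "my-elser-endpoint"   // expands text into weighted tokens at ingest
      }
    }
  }
}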

Hybrid ranking

Combine the term-based score (ranking 1), the dense vector score (ranking 2), and the sparse vector score (ranking 3) with Reciprocal Rank Fusion (RRF) to blend multiple ranking methods.

Reciprocal Rank Fusion (RRF)

\mathrm{score}(d) = \sum_{r \in R} \frac{1}{k + r(d)}

where D is the set of docs, R is a set of rankings (each a permutation on 1..|D|), and k is typically set to 60 by default.

Dense vector ranking:

Doc   Score   r(d)   k + r(d)
A     1       1      61
B     0.7     2      62
C     0.5     3      63
D     0.2     4      64
E     0.01    5      65

BM25 ranking:

Doc   Score   r(d)   k + r(d)
C     1,341   1      61
A     739     2      62
F     732     3      63
G     192     4      64
H     183     5      65

Combining both with RRF (for example, A is ranked 1st by the dense vector and 2nd by BM25, so score(A) = 1/61 + 1/62 ≈ 0.0325):

Doc   Dense   BM25   RRF Score
A     1/61    1/62   0.0325
C     1/63    1/61   0.0323
B     1/62    -      0.0161
F     -       1/63   0.0159
D     1/64    -      0.0156

Hybrid Ranking: BM25F + Sparse Vector + Dense Vector (commercial)

GET index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [{
        "standard": {
          "query": { "match": { … } }
        }
      },{
        "standard": {
          "query": { "sparse_vector": { … } }
        }
      },{
        "knn": { … }
      }]
    }
  }
}

Elastic and LLMs (ChatGPT)

Gen AI Search engines

LLM opportunities and limits

Your question → GAI / LLM → one answer, based on public internet data only.

Retrieval Augmented Generation

Your question + a context window built from your business data (documents, images, audio) → GAI / LLM → the right answer, grounded in both public internet data and your business data.

Demo: Elastic Playground

Elasticsearch You Know, for Semantic Search

Search, a new era
David Pilato (@dadoonet, @pilato.fr)