Search: a new era
David Pilato @dadoonet @pilato.fr

Elasticsearch: You Know, for Search

These are not the droids you are looking for.

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

"char_filter": "html_strip"
These are <em>not</em> the droids you are looking for. → These are not the droids you are looking for.

"tokenizer": "standard"
These are not the droids you are looking for. → [These, are, not, the, droids, you, are, looking, for]

"filter": "lowercase"
[These, are, not, the, droids, you, are, looking, for] → [these, are, not, the, droids, you, are, looking, for]

"filter": "stop"
[these, are, not, the, droids, you, are, looking, for] → [droids, you, looking]

"filter": "snowball"
[droids, you, looking] → [droid, you, look]

Result for "These are <em>not</em> the droids you are looking for.":

{
  "tokens": [
    { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
    { "token": "you",   "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
    { "token": "look",  "start_offset": 42, "end_offset": 49, "type": "<ALPHANUM>", "position": 7 }
  ]
}
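The same analysis chain can be wired into an index as a custom analyzer, so a text field is processed this way at index and search time. A minimal sketch; the index, analyzer, and field names here are hypothetical:

PUT droids
{
  "settings": {
    "analysis": {
      "analyzer": {
        "html_english": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop", "snowball" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "quote": {
        "type": "text",
        "analyzer": "html_english"
      }
    }
  }
}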

Semantic search ≠ literal matches

Elasticsearch: You Know, for Vector Search

What is a vector?

Embeddings represent your data
Example: a 1-dimensional vector. A single "character" axis runs from realistic to cartoon; a realistic image gets the character vector [ 1 ].

Multiple dimensions represent different data aspects
Adding a human-vs-machine axis: a realistic human gets the character vector [ 1, 1 ]; a realistic machine gets [ 1, 0 ].

Similar data is grouped together
Nearby vectors represent similar items: a realistic human at [ 1.0, 1.0 ] sits close to [ 1.0, 0.8 ], while a realistic machine sits at [ 1.0, 0.0 ].

Vector search ranks objects by similarity (~relevance) to the query
The query itself is a vector; results are ranked 1..n by how close each object's vector is to the query vector.

How do you index vectors?

Architecture of Vector Search

dense_vector field type

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text"
      },
      "desc_embedding": {
        "type": "dense_vector"
      }
    }
  }
}
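In practice you would usually pin the vector parameters down explicitly. A minimal sketch; the 384 dims and cosine similarity are assumptions matching a MiniLM-style model, not part of the original mapping:

PUT ecommerce
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,              // must match the embedding model's output size
        "similarity": "cosine"    // or dot_product / l2_norm, see the similarity slides
      }
    }
  }
}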

Data Ingestion and Embedding Generation

Source data, with embeddings computed outside Elasticsearch:

PUT /ecommerce/_doc/product-1234
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …],
  "img_embedding": [0.012, 0.0, …]
}

With Elastic ML (commercial)

Source data, no embeddings computed client-side:

POST /ecommerce/_doc
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton"
}

Indexed document, with the embedding added by Elastic ML:

{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …]
}
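One way to wire this up is an ingest pipeline with an inference processor. A sketch assuming a recent release; the pipeline name and model_id are hypothetical:

PUT _ingest/pipeline/desc-embedding
{
  "processors": [
    {
      "inference": {
        "model_id": "my-text-embedding-model",   // a deployed text_embedding model (hypothetical id)
        "input_output": {
          "input_field": "description",
          "output_field": "desc_embedding"
        }
      }
    }
  ]
}

POST /ecommerce/_doc?pipeline=desc-embedding
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…"
}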

Eland Imports PyTorch Models (commercial)

$ eland_import_hub_model \
    --url https://cluster_URL \
    --hub-model-id BERT-MiniLM-L6 \
    --task-type text_embedding \
    --start

● Select the appropriate model
● Load it
● Manage models
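Once imported, the trained-models APIs let you check that the model is deployed and allocated. A sketch, reusing the hypothetical model id from the command above:

GET _ml/trained_models/BERT-MiniLM-L6
GET _ml/trained_models/BERT-MiniLM-L6/_stats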

Elastic's range of supported NLP models (commercial)
● Fill mask model: mask some of the words in a sentence and predict words that replace masks
● Named entity recognition model: NLP method that extracts information from text
● Text embedding model: represent individual words as numerical vectors in a predefined vector space
● Text classification model: assign a set of predefined categories to open-ended text
● Question answering model: a model that can answer questions given some or no context
● Zero-shot text classification model: a model trained on a set of labeled examples that is able to classify previously unseen examples
Full list at: ela.st/nlp-supported-models

How do you search vectors?

Architecture of Vector Search

knn query

GET /ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "query_vector": [0.123, 0.244, …]
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

knn query (with Elastic ML) (commercial)

GET /ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "query_vector_builder": {
            "text_embedding": {
              "model_text": "summer clothes",
              "model_id": <text-embedding-model>
            }
          }
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

The query text is run through the transformer model at search time to build the query vector.

new from 8.15: semantic_text field type

PUT /_inference/text_embedding/e5-small-multilingual
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small_linux-x86_64"
  }
}

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "copy_to": [ "desc_embedding" ]
      },
      "desc_embedding": {
        "type": "semantic_text",
        "inference_id": "e5-small-multilingual"
      }
    }
  }
}

POST ecommerce/_doc
{
  "description": "Our best-selling…"
}

GET ecommerce/_search
{
  "query": {
    "semantic": {
      "field": "desc_embedding",
      "query": "I'm looking for a red dress for a DJ party"
    }
  }
}

Architecture of Vector Search

Choice of Embedding Model

Start with off-the-shelf models:
● Text data: Hugging Face (like Microsoft's E5)
● Images: OpenAI's CLIP

Extend to higher relevance:
● Apply hybrid scoring
● Bring Your Own Model: requires expertise + labeled data

Problem: training data vs. actual use case

But how does it really work?

Similarity

cos(θ) = (q⃗ · d⃗) / (|q⃗| × |d⃗|)

_score = (1 + cos(θ)) / 2
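A worked example using the 2-d vectors from the earlier slides, taking the query q = [1.0, 1.0] and a document d = [1.0, 0.8]:

cos(θ) = (1.0×1.0 + 1.0×0.8) / (√(1² + 1²) × √(1² + 0.8²))
       = 1.8 / (1.4142 × 1.2806)
       ≈ 0.994

_score = (1 + 0.994) / 2 ≈ 0.997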

Similarity: cosine (cosine)

● Similar vectors: θ close to 0, cos(θ) close to 1
  _score = (1 + 1) / 2 = 1
● Orthogonal vectors: θ close to 90°, cos(θ) close to 0
  _score = (1 + 0) / 2 = 0.5
● Opposite vectors: θ close to 180°, cos(θ) close to −1
  _score = (1 − 1) / 2 = 0

Similarity: Dot Product (dot_product or max_inner_product)

q⃗ · d⃗ = |q⃗| × |d⃗| × cos(θ)

_score (float) = (1 + dot_product(q, d)) / 2
_score (byte)  = 0.5 + dot_product(q, d) / (32768 × dims)

Similarity: Euclidean distance (l2_norm)

l2_norm(q, d) = √( Σᵢ₌₁ⁿ (xᵢ − yᵢ)² )

_score = 1 / (1 + (l2_norm(q, d))²)

Brute Force

Hierarchical Navigable Small Worlds (HNSW)

One popular approach:
● HNSW: a layered approach that simplifies access to the nearest neighbor
● Tiered: from coarse to fine approximation over a few steps
● Balance: bartering a little accuracy for a lot of scalability
● Speed: excellent query latency on large scale indices
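In Elasticsearch, HNSW sits behind the dense_vector field, and the graph can be tuned per field. A minimal sketch; the m and ef_construction values shown are the documented defaults, used here for illustration only:

PUT ecommerce
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "dense_vector",
        "index_options": {
          "type": "hnsw",           // graph-based approximate nearest-neighbor structure
          "m": 16,                  // max connections per node per layer
          "ef_construction": 100    // candidates considered while building the graph
        }
      }
    }
  }
}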

Scaling Vector Search
Vector search best practices

  1. Needs lots of memory
  2. Avoid searches during indexing
  3. Indexing is slower
  4. Exclude vectors from _source
  5. Merging is slow
  6. Reduce vector dimensionality
  7. Use int8/int4/bit rather than float (see the sketch after this list)
  • Continuous improvements in Lucene + Elasticsearch
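A minimal sketch combining two of these practices: excluding the vector from _source (item 4) and using int8 scalar quantization instead of float32 (item 7). The dims value is an assumption:

PUT ecommerce
{
  "mappings": {
    "_source": {
      "excludes": [ "desc_embedding" ]   // item 4: don't store the raw vector in _source
    },
    "properties": {
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index_options": {
          "type": "int8_hnsw"            // item 7: quantized HNSW, roughly 4x less RAM
        }
      }
    }
  }
}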

Scalar Quantization (Elasticsearch 8.14 default: int8)

              float32             int8            int4            bit
Recall        High                Good            Low             Bad
Precision     High                Good            Low             Bad
Oversampling  Not needed          Moderate        Needed          Needed
Rescore       Likely not needed   Reasonable      May be slower   Expensive and limiting
RAM           Full RAM required   4X savings      8X savings      32X savings

BBQ aka Better Binary Quantization

float32 → int8 → int4 → bit → BBQ

32X RAM savings. Faster and more accurate than Product Quantization.
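In recent Elasticsearch releases BBQ is exposed as a dense_vector index option. A sketch, assuming a version where the bbq_hnsw type is available:

PUT ecommerce
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index_options": {
          "type": "bbq_hnsw"   // binary-quantized vectors behind an HNSW graph
        }
      }
    }
  }
}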

Memory required: 100M vectors? Only 12GB!?! One single node.

Benchmarketing

https://djdadoo.pilato.fr/

https://github.com/dadoonet/music-search/

Elasticsearch: You Know, for Hybrid Search

Hybrid scoring

Term-based score + vector similarity score
Combine with a linear combination (manual boosting)

GET ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "match": {
          "description": {
            "query": "summer clothes",
            "boost": 0.1
          }
        }
      },{
        "knn": {
          "field": "desc_embedding",
          "query_vector": [0.123, 0.244, …],
          "boost": 2.0,
          "filter": {
            "term": { "department": "women" }
          }
        }
      }],
      "filter": {
        "range": { "price": { "lte": 30 } }
      }
    }
  }
}

The filter inside knn is a pre-filter, applied while collecting nearest neighbors; the outer bool filter is a post-filter, applied after the kNN search.

PUT starwars
{
  "mappings": {
    "properties": {
      "text.tokens": {
        "type": "sparse_vector"
      }
    }
  }
}

Sample documents:
"These are not the droids you are looking for."
"Obi-Wan never told you what happened to your father."

GET starwars/_search
{
  "query": {
    "sparse_vector": {
      "field": "text.tokens",
      "query_vector": {
        "lucas": 0.50047517,
        "ship": 0.29860738,
        "dragon": 0.5300422,
        "quest": 0.5974301,
        …
      }
    }
  }
}

ELSER: Elastic Learned Sparse EncodeR (commercial)
● sparse_vector
● Not BM25, nor a (dense) vector
● Sparse vector like BM25
● Stored as an inverted index
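With a deployed ELSER endpoint, Elasticsearch can expand the query text into weighted tokens at search time instead of you supplying the query_vector. A sketch; the inference endpoint id is an assumption, since the default id varies by version:

GET starwars/_search
{
  "query": {
    "sparse_vector": {
      "field": "text.tokens",
      "inference_id": ".elser-2-elasticsearch",   // assumed default ELSER endpoint id
      "query": "Who never told you what happened to your father?"
    }
  }
}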

Hybrid ranking

ranking 1: term-based score
ranking 2: dense vector score
ranking 3: sparse vector score

Combine with Reciprocal Rank Fusion (RRF): blend multiple ranking methods

Reciprocal Rank Fusion (RRF)

D = set of docs
R = set of rankings as permutations on 1..|D|
k = typically set to 60 by default

RRFscore(d ∈ D) = Σ (r ∈ R) 1 / (k + r(d))

Dense vector ranking:            BM25 ranking:
Doc  Score  r(d)  k+r(d)         Doc  Score  r(d)  k+r(d)
A    1      1     61             C    1,341  1     61
B    0.7    2     62             A    739    2     62
C    0.5    3     63             F    732    3     63
D    0.2    4     64             G    192    4     64
E    0.01   5     65             H    183    5     65

Doc  RRF score
A    1/61 + 1/62 = 0.0325
C    1/63 + 1/61 = 0.0323
B    1/62 = 0.0161
F    1/63 = 0.0159
D    1/64 = 0.0156

Hybrid Ranking: BM25F + Sparse Vector + Dense Vector (commercial)

GET index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [{
        "standard": {
          "query": { "match": { … } }
        }
      },{
        "standard": {
          "query": { "sparse_vector": { … } }
        }
      },{
        "knn": { … }
      }]
    }
  }
}

ChatGPT, Elastic, and LLMs

Gen AI Search engines

LLM opportunities and limits your question Vte answer your question GAI / LLM public internet data

Retrieval Augmented Generation

your question + context window (documents, images, audio from your business data) → GAI / LLM → the right answer
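The retrieval half of RAG can be an ordinary Elasticsearch query over your business data. A sketch reusing the hypothetical semantic_text mapping from earlier; the top hits would then be pasted into the LLM's context window alongside the question:

GET ecommerce/_search
{
  "size": 3,   // only the best few hits need to fit in the context window
  "query": {
    "semantic": {
      "field": "desc_embedding",
      "query": "Which summer dresses do you have under 30 euros?"
    }
  }
}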

Demo: Elastic Playground

Elasticsearch: You Know, for Semantic Search

Search: a new era
David Pilato @dadoonet @pilato.fr