Search, a new era
David Pilato | @dadoonet

Elasticsearch You Know, for Search

These are not the droids you are looking for.

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

"char_filter": "html_strip"
These are <em>not</em> the droids you are looking for.
→ These are not the droids you are looking for.

"tokenizer": "standard"
These are not the droids you are looking for.
→ These are not the droids you are looking for

"filter": "lowercase"
These are not the droids you are looking for
→ these are not the droids you are looking for

"filter": "stop"
these are not the droids you are looking for
→ droids you looking

"filter": "snowball"
droids you looking
→ droid you look

These are <em>not</em> the droids you are looking for.

{
  "tokens": [
    { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
    { "token": "you",   "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
    { "token": "look",  "start_offset": 42, "end_offset": 49, "type": "<ALPHANUM>", "position": 7 }
  ]
}
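The same chain can be baked into an index as a custom analyzer, so it runs at both index and search time. A minimal sketch, with a hypothetical index and field name:

PUT droids
{
  "settings": {
    "analysis": {
      "analyzer": {
        "droid_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase", "stop", "snowball" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "quote": {
        "type": "text",
        "analyzer": "droid_analyzer"
      }
    }
  }
}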

Semantic search ≠ Literal matches

Elasticsearch You Know, for Vector Search

What is a Vector?

Embeddings represent your data. Example: a 1-dimensional vector. (Figure: a single "character" axis running from realistic to cartoon; a character is encoded as one number, e.g. [ 1 ].)

Multiple dimensions represent different data aspects. (Figure: a second axis from human to machine is added; a realistic human becomes [ 1, 1 ], a realistic machine [ 1, 0 ].)

Similar data is grouped together. (Figure: realistic humans cluster around [ 1.0, 1.0 ] and [ 1.0, 0.8 ], while a realistic machine sits apart at [ 1.0, 0.0 ].)

Vector search ranks objects by similarity (~relevance) to the query. (Figure: results ranked 1 to 5 by their proximity to the query vector in the human/machine, realistic/cartoon space.)
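As a worked illustration with the vectors above, take the query $q = (1.0, 1.0)$ and candidates $d_1 = (1.0, 0.8)$ and $d_2 = (1.0, 0.0)$, using Euclidean distance as the similarity proxy (the actual scoring functions are detailed later):

$$dist(q, d_1) = \sqrt{(1.0 - 1.0)^2 + (1.0 - 0.8)^2} = 0.2 \qquad dist(q, d_2) = \sqrt{(1.0 - 1.0)^2 + (1.0 - 0.0)^2} = 1.0$$

$d_1$ is closer to the query, so it ranks first.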

How do you index vectors?

Architecture of Vector Search

dense_vector field type

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text"
      },
      "desc_embedding": {
        "type": "dense_vector"
      }
    }
  }
}
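In practice the mapping usually also declares the dimension count and the similarity function to use at query time. A sketch, assuming a 384-dimension embedding model (the dims value depends on your model):

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text"
      },
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}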

Data Ingestion and Embedding Generation

Source data:

{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton"
}

Indexed with the embeddings generated outside Elasticsearch:

POST /ecommerce/_doc/product-1234
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …],
  "img_embedding": [0.012, 0.0, …]
}

With Elastic ML (commercial)

Source data, without any embedding:

POST /ecommerce/_doc/product-1234
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton"
}

Indexed document, with the embedding generated by Elastic ML at ingest time:

{
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …]
}
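One way to wire this up is an ingest pipeline with an inference processor, so Elasticsearch computes the embedding as documents arrive. A minimal sketch, assuming a text embedding model already deployed under the hypothetical id my-embedding-model:

PUT _ingest/pipeline/ecommerce-embeddings
{
  "processors": [
    {
      "inference": {
        "model_id": "my-embedding-model",
        "input_output": [
          {
            "input_field": "description",
            "output_field": "desc_embedding"
          }
        ]
      }
    }
  ]
}

POST ecommerce/_doc?pipeline=ecommerce-embeddings
{
  "product_name": "Summer Dress",
  "description": "Our best-selling…"
}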

Eland Imports PyTorch Models (commercial)

$ eland_import_hub_model \
    --url https://cluster_URL \
    --hub-model-id BERT-MiniLM-L6 \
    --task-type text_embedding \
    --start

1. Select the appropriate model
2. Load it
3. Manage models
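Once loaded, the model can be listed and exercised directly; a quick sketch using the ML trained models APIs (the <model_id> placeholder is whatever id the import created):

GET _ml/trained_models

POST _ml/trained_models/<model_id>/_infer
{
  "docs": [
    { "text_field": "summer clothes" }
  ]
}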

Elastic's range of supported NLP models (commercial)

● Fill mask model: mask some of the words in a sentence and predict words that replace masks
● Named entity recognition model: NLP method that extracts information from text
● Text embedding model: represent individual words as numerical vectors in a predefined vector space
● Text classification model: assign a set of predefined categories to open-ended text
● Question answering model: model that can answer questions given some or no context
● Zero-shot text classification model: model trained on a set of labeled examples, that is able to classify previously unseen examples

Full list at: ela.st/nlp-supported-models

How do you search vectors?

Architecture of Vector Search

knn query

GET ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "query_vector": [0.123, 0.244, …]
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

knn query (with Elastic ML, commercial)

GET ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "query_vector_builder": {
            "text_embedding": {
              "model_text": "summer clothes",
              "model_id": <text-embedding-model>
            }
          }
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

The query text is embedded by the transformer model at search time.

semantic_text field type (new in 8.15)

PUT /_inference/text_embedding/e5-small-multilingual
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small_linux-x86_64"
  }
}

PUT ecommerce
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "copy_to": [ "desc_embedding" ]
      },
      "desc_embedding": {
        "type": "semantic_text",
        "inference_id": "e5-small-multilingual"
      }
    }
  }
}

POST ecommerce/_doc
{
  "description": "Our best-selling…"
}

GET ecommerce/_search
{
  "query": {
    "semantic": {
      "field": "desc_embedding",
      "query": "I'm looking for a red dress for a DJ party"
    }
  }
}

Architecture of Vector Search

Choice of Embedding Model

Start with off-the-shelf models:
● Text data: Hugging Face models (like Microsoft's E5)
● Images: OpenAI's CLIP

Extend to higher relevance:
● Apply hybrid scoring
● Bring Your Own Model: requires expertise + labeled data

Problem: training data vs. actual use case

But how does it really work?

Similarity

(Figure: query vector q and document vectors d1, d2 plotted in the embedding space, with θ the angle between q and a document vector.)

$$\cos(\theta) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}| \times |\vec{d}|} \qquad \_score = \frac{1 + \cos(\theta)}{2}$$

Similarity: cosine (cosine)

● Similar vectors: θ close to 0, cos(θ) close to 1, \_score = (1 + 1) / 2 = 1
● Orthogonal vectors: θ close to 90°, cos(θ) close to 0, \_score = (1 + 0) / 2 = 0.5
● Opposite vectors: θ close to 180°, cos(θ) close to −1, \_score = (1 − 1) / 2 = 0

Similarity: Dot Product (dot_product or max_inner_product)

$$\vec{q} \cdot \vec{d} = |\vec{q}| \times \cos(\theta) \times |\vec{d}|$$

$$\_score_{float} = \frac{1 + dot\_product(q, d)}{2} \qquad \_score_{byte} = 0.5 + \frac{dot\_product(q, d)}{32768 \times dims}$$

Similarity: Euclidean distance (l2_norm)

$$l2\_norm_{q,d} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad \_score = \frac{1}{1 + (l2\_norm_{q,d})^2}$$
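A worked example across the three metrics, for $q = (1, 0)$ and $d = (0.6, 0.8)$ (both unit length, as dot_product requires for float vectors):

$$\cos(\theta) = \frac{1 \times 0.6 + 0 \times 0.8}{1 \times 1} = 0.6 \qquad \_score_{cosine} = \frac{1 + 0.6}{2} = 0.8$$

$$dot\_product(q, d) = 0.6 \qquad \_score_{float} = \frac{1 + 0.6}{2} = 0.8$$

$$l2\_norm_{q,d} = \sqrt{(1 - 0.6)^2 + (0 - 0.8)^2} = \sqrt{0.8} \approx 0.894 \qquad \_score = \frac{1}{1 + 0.8} \approx 0.556$$

On normalized vectors, cosine and dot product give the same score, which is why dot_product is the cheaper choice when you can normalize up front.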

Brute Force: exact search compares the query vector against every indexed vector.

Hierarchical Navigable Small Worlds (HNSW)

One popular approach. HNSW is a layered graph that simplifies access to the nearest neighbors:
● Tiered: from coarse to fine approximation over a few steps
● Balance: bartering a little accuracy for a lot of scalability
● Speed: excellent query latency on large scale indices
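In Elasticsearch, HNSW sits behind the dense_vector index options. A sketch of the main tunables, using their usual default values (illustrative, not a recommendation):

PUT ecommerce
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}

m is the number of neighbors each node keeps in the graph; ef_construction is the size of the candidate list explored while building it. Raising either improves recall at the cost of memory and indexing time.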

Scaling Vector Search: best practices

1. Needs lots of memory
2. Avoid searches during indexing
3. Indexing is slower
4. Exclude vectors from _source
5. Merging is slow
6. Reduce vector dimensionality
7. Use byte rather than float
• Continuous improvements in Lucene + Elasticsearch

Reduce Required Memory

1. Vector element size reduction ("quantize"): use byte rather than float (see the sketch below)
2. Reduce the number of dimensions per vector
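A sketch of the quantization option from item 1, assuming the embedding pipeline produces values that fit signed bytes (the field otherwise matches the earlier mapping):

PUT ecommerce
{
  "mappings": {
    "properties": {
      "desc_embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine",
        "element_type": "byte"
      }
    }
  }
}

Each dimension is stored in 1 byte instead of 4, roughly a 4x memory reduction for the vector data.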

Benchmarketing

https://djdadoo.pilato.fr/

https://github.com/dadoonet/music-search/

Elasticsearch You Know, for Hybrid Search

Hybrid scoring

Combine the term-based score and the vector similarity score with a linear combination (manual boosting).

GET ecommerce/_search
{
  "query": {
    "bool": {
      "must": [{
        "match": {
          "description": {
            "query": "summer clothes",
            "boost": 0.1
          }
        }
      },{
        "knn": {
          "field": "desc_embedding",
          "query_vector": [0.123, 0.244, …],
          "boost": 2.0,
          "filter": {
            "term": { "department": "women" }
          }
        }
      }],
      "filter": {
        "range": {
          "price": { "lte": 30 }
        }
      }
    }
  }
}

The filter inside knn is a pre-filter, applied while collecting nearest neighbors; the top-level filter is a post-filter, applied afterwards.

PUT starwars
{
  "mappings": {
    "properties": {
      "text.tokens": {
        "type": "sparse_vector"
      }
    }
  }
}

Example documents: "These are not the droids you are looking for.", "Obi-Wan never told you what happened to your father."

GET starwars/_search
{
  "query": {
    "sparse_vector": {
      "field": "text.tokens",
      "query_vector": {
        "lucas": 0.50047517,
        "ship": 0.29860738,
        "dragon": 0.5300422,
        "quest": 0.5974301,
        …
      }
    }
  }
}

ELSER: Elastic Learned Sparse EncodeR (commercial)

● sparse_vector field type
● Not BM25 nor a (dense) vector
● Sparse vector, like BM25
● Stored as an inverted index
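ELSER can be exposed through the same _inference API used earlier for E5. A minimal sketch, with an illustrative endpoint name:

PUT /_inference/sparse_embedding/my-elser
{
  "service": "elser",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1
  }
}

A semantic_text field pointing at this inference_id then stores token/weight pairs like the query_vector shown above.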

Hybrid ranking

Combine the term-based score, the dense vector score, and the sparse vector score with Reciprocal Rank Fusion (RRF), which blends multiple ranking methods.

Reciprocal Rank Fusion (RRF)

D = set of docs; R = set of rankings as permutations on 1..|D|; k is typically set to 60 by default.

$$RRF\_score(d) = \sum_{r \in R} \frac{1}{k + r(d)}$$

BM25:
Doc  Score  r(d)  k + r(d)
A    1      1     61
B    0.7    2     62
C    0.5    3     63
D    0.2    4     64
E    0.01   5     65

Dense Vector:
Doc  Score  r(d)  k + r(d)
C    1341   1     61
A    739    2     62
F    732    3     63
G    192    4     64
H    183    5     65

Combined:
Doc  RRF Score
A    1/61 + 1/62 ≈ 0.0325
C    1/63 + 1/61 ≈ 0.0323
B    1/62 ≈ 0.0161
F    1/63 ≈ 0.0159
D    1/64 ≈ 0.0156

Hybrid Ranking: BM25F + Sparse Vector + Dense Vector (commercial)

GET index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [{
        "standard": {
          "query": { "match": { … } }
        }
      },{
        "standard": {
          "query": { "sparse_vector": { … } }
        }
      },{
        "knn": { … }
      }]
    }
  }
}

ChatGPT: Elastic and LLMs

Gen AI + Search engines

LLM opportunities and limits your question one answer your question GAI / LLM public internet data

Retrieval Augmented Generation. (Diagram: your question, plus a context window retrieved from your business data (documents, images, audio), is sent to the GAI / LLM alongside its public internet training, and you get the right answer.)
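The retrieval half of RAG is just a search. A sketch reusing the semantic_text field from earlier; the top hits become the context window passed to the LLM along with the question:

GET ecommerce/_search
{
  "size": 3,
  "query": {
    "semantic": {
      "field": "desc_embedding",
      "query": "I'm looking for a red dress for a DJ party"
    }
  }
}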

Demo: Elastic Playground

Elasticsearch You Know, for Semantic Search

Search, a new era
David Pilato | @dadoonet