Elasticsearch

A presentation at BBL Talan (Private Event) in May 2024 in Amiens, France by David Pilato

Slide 1

BBL at

Slide 2

$ curl http://localhost:9200/speaker/_doc/dpilato
{
  "name" : "David Pilato",
  "jobs" : [
    { "name" : "SRA Europe (SSII)", "date" : "1995" },
    { "name" : "SFR", "date" : "1997" },
    { "name" : "e-Brands / Vivendi", "date" : "2000" },
    { "name" : "DGDDI (douane)", "date" : "2005" },
    { "name" : "elastic", "date" : "2013" }
  ],
  "motivations" : [ "family", "job", "deejay" ],
  "blog" : "https://david.pilato.fr/",
  "twitter" : [ "@dadoonet", "@elasticfr" ],
  "email" : "david@pilato.fr"
}

Slide 3

Performance that Delivers Relevant Results in Real-time

Out-of-the-Box Solutions
● Elastic Observability: Logs, APM, Tracing, Metrics, Synthetics, Profiling, RUM
● Elastic Security: SIEM, Endpoint, Cloud

Build Your Own
● Elastic Search: Generative AI Apps, Product Search, Workplace Search, Custom Search Apps

The Elastic Search AI Platform
● Ingest & Secure Storage: Data Extraction, Transformation / Normalization, Enrichment, Loading / Indexing, Intelligent Data Storage, Security / Governance
● AI / ML & Search: Full-Text / Vector Search, Machine Learning, Correlations, Analytics & Aggregations, Data Manipulation, Federated Searches & Queries
● Visualization & Automation: Share & Collaborate, Data Exploration, Data Visualization, Custom Dashboards, 3rd Party Integrations, Workflow Automation

Any Data, Any Source
● Databases, Legacy Systems, Public Cloud, Applications, SaaS Apps, On-Premises, Web Services, Files

Business Outcomes for Everyone*
● 69% Improvement in customer and employee satisfaction
● 60% Reduction in risk
● 62% Reduction in revenue disruption

*Validated by third-party research

Slide 4

Slide 5

A typical search implementation…

CREATE TABLE user (
  name VARCHAR(100),
  comments VARCHAR(1000)
);
INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

Search input: David

Slide 6

Search on term

INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

SELECT * FROM user WHERE name="David";
Empty set (0,00 sec)

Search input: David

Slide 7

Search like

INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

SELECT * FROM user WHERE name LIKE "%David%";
+--------------+----------------------+
| name         | comments             |
+--------------+----------------------+
| David Pilato | Developer at elastic |
| David Gageot | Engineer at Doctolib |
| David David  | Who is that guy?     |
+--------------+----------------------+

Search input: David

Slide 8

Search for terms

INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

SELECT * FROM user WHERE name LIKE "%David Pilato%";
+--------------+----------------------+
| name         | comments             |
+--------------+----------------------+
| David Pilato | Developer at elastic |
+--------------+----------------------+

Search input: David Pilato

Slide 9

Search with inverted terms

INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

SELECT * FROM user WHERE name LIKE "%Pilato David%";
Empty set (0,00 sec)

SELECT * FROM user WHERE name LIKE "%Pilato%David%";
Empty set (0,00 sec)

Search input: Pilato David

Slide 10

Search for terms

INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

SELECT * FROM user WHERE name LIKE "%David%" AND name LIKE "%Pilato%";
+--------------+----------------------+
| name         | comments             |
+--------------+----------------------+
| David Pilato | Developer at elastic |
+--------------+----------------------+

Search input: Pilato David
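
The word-order problem shown on the last few slides can be replayed locally. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the database used in the talk; the helper name search_all_terms is hypothetical. It builds one LIKE clause per word, so word order stops mattering, at the cost of a query that grows with every term.

```python
import sqlite3

ROWS = [
    ("David Pilato", "Developer at elastic"),
    ("Malloum Laya", "Worked with David at french customs service"),
    ("David Gageot", "Engineer at Doctolib"),
    ("David David", "Who is that guy?"),
]

def make_db():
    """Create an in-memory copy of the slide's `user` table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE user (name VARCHAR(100), comments VARCHAR(1000))")
    db.executemany("INSERT INTO user VALUES (?, ?)", ROWS)
    return db

def search_all_terms(db, text):
    """Hypothetical helper: AND together one LIKE clause per word."""
    terms = text.split()
    if not terms:
        return []
    where = " AND ".join("name LIKE ?" for _ in terms)
    params = [f"%{term}%" for term in terms]
    query = f"SELECT name FROM user WHERE {where}"
    return [row[0] for row in db.execute(query, params)]

db = make_db()

# A single LIKE pattern is sensitive to word order, as on the slide.
inverted = db.execute(
    "SELECT name FROM user WHERE name LIKE ?", ("%Pilato David%",)
).fetchall()
print(inverted)

# One clause per word finds the row whatever the order.
print(search_all_terms(db, "Pilato David"))
```

Every extra word adds another full scan condition, which is one reason this approach does not scale to real search boxes.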

Slide 11

Search in two fields

INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

SELECT * FROM user WHERE name LIKE "%David%" OR comments LIKE "%David%";
+--------------+---------------------------------------------+
| name         | comments                                    |
+--------------+---------------------------------------------+
| David Pilato | Developer at elastic                        |
| Malloum Laya | Worked with David at french customs service |
| David Gageot | Engineer at Doctolib                        |
| David David  | Who is that guy?                            |
+--------------+---------------------------------------------+

Search input: David

Slide 12

Slide 13

Search with typos

INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

SELECT * FROM user WHERE name LIKE "%Dadid%";
Empty set (0,00 sec)

Search input: Dadid

Slide 14

Search with typos

INSERT INTO user VALUES ('David Pilato', 'Developer at elastic');
INSERT INTO user VALUES ('Malloum Laya', 'Worked with David at french customs service');
INSERT INTO user VALUES ('David Gageot', 'Engineer at Doctolib');
INSERT INTO user VALUES ('David David', 'Who is that guy?');

SELECT * FROM user WHERE name LIKE "%adid%"
  OR name LIKE "%D_did%"
  OR name LIKE "%Da_id%"
  OR name LIKE "%Dad_d%"
  OR name LIKE "%Dadi%";
+--------------+----------------------+
| name         | comments             |
+--------------+----------------------+
| David Pilato | Developer at elastic |
| David Gageot | Engineer at Doctolib |
| David David  | Who is that guy?     |
+--------------+----------------------+

Search input: Dadid
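
The hand-written typo query follows a mechanical rule: drop the first character, replace each middle character with the single-character wildcard, drop the last character. The sketch below generates those patterns for any term. It assumes Python's sqlite3 module in place of the talk's database, and typo_patterns is a hypothetical helper name, not something from the slides.

```python
import sqlite3

def typo_patterns(term):
    """Hypothetical helper: LIKE patterns that tolerate one wrong or
    missing character, in the same order as the hand-written query."""
    patterns = [term[1:]]                        # first character dropped
    patterns += [term[:i] + "_" + term[i + 1:]   # one middle character replaced
                 for i in range(1, len(term) - 1)]
    patterns.append(term[:-1])                   # last character dropped
    return [f"%{p}%" for p in patterns]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user (name VARCHAR(100), comments VARCHAR(1000))")
db.executemany("INSERT INTO user VALUES (?, ?)", [
    ("David Pilato", "Developer at elastic"),
    ("Malloum Laya", "Worked with David at french customs service"),
    ("David Gageot", "Engineer at Doctolib"),
    ("David David", "Who is that guy?"),
])

patterns = typo_patterns("Dadid")
where = " OR ".join("name LIKE ?" for _ in patterns)
names = [row[0] for row in db.execute(
    f"SELECT name FROM user WHERE {where} ORDER BY name", patterns)]

print(patterns)
print(names)
```

The number of clauses grows with the length of the term, and this still covers only a single typo, which is the point the slide is making.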

Slide 15

Slide 16

User Interface

Slide 17

What is a search engine?

● Index engine (indexing documents)
● Search engine (within the created indices)
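
The two halves named above can be illustrated with a toy inverted index. This is only a sketch under simple assumptions (whitespace tokenization, lowercasing, exact word matches); the function names build_index and search are hypothetical and say nothing about how Elasticsearch is implemented internally.

```python
from collections import defaultdict

DOCS = {
    1: "David Pilato",
    2: "Malloum Laya",
    3: "David Gageot",
    4: "David David",
}

def build_index(docs):
    """Indexing step: map each lowercased word to the ids of the
    documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Search step: return the ids of documents containing every query word."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(DOCS)
print(search(index, "Pilato David"))
print(search(index, "david"))
```

Because the lookup is per word, the order of the words in the query no longer matters, unlike the LIKE patterns shown earlier.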

Slide 18

Demo time!

Slide 19

Elasticsearch You Know, for Search

Slide 20

GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}

Slide 21

These are <em>not</em> the droids you are looking for.

{
  "tokens": [
    { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
    { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
    { "token": "look", "start_offset": 42, "end_offset": 49, "type": "<ALPHANUM>", "position": 7 }
  ]
}
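
To see why only three tokens survive, the analysis chain can be imitated in a few lines. This is a rough sketch, not the real analyzers: the stop list is a small subset chosen for this sentence, toy_stem is a hypothetical suffix stripper standing in for the Snowball stemmer, and character offsets are not tracked.

```python
import re

# Small stop list chosen for this example (an assumption, not the full built-in list).
STOP_WORDS = {"these", "are", "not", "the", "for"}

def toy_stem(token):
    """Crude suffix stripping; a stand-in for a real Snowball stemmer."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text):
    """Return (token, position) pairs after the four steps of the chain."""
    stripped = re.sub(r"<[^>]+>", "", text)   # char filter: drop HTML tags
    words = re.findall(r"\w+", stripped)      # tokenizer: split on non-word characters
    tokens = []
    for position, word in enumerate(words):
        word = word.lower()                   # lowercase filter
        if word in STOP_WORDS:                # stop filter: the position is still consumed
            continue
        tokens.append((toy_stem(word), position))
    return tokens

print(analyze("These are <em>not</em> the droids you are looking for."))
```

The positions 4, 5 and 7 match the response above because removed stop words still count when positions are assigned.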

Slide 22

Elasticsearch You Know, for Vector Search

Slide 23

Embeddings represent your data

Example: 1-dimensional vector
[Figure: characters placed on a single Realistic / Cartoon axis. Character vectors: [ 1 ], [ 1 ]]

Slide 24

Multiple dimensions represent different data aspects

[Figure: characters placed on two axes, Realistic / Cartoon and Human / Machine. Character vectors: [ 1, 1 ], [ 1, 0 ]]

Slide 25

Similar data is grouped together

[Figure: characters placed on two axes, Realistic / Cartoon and Human / Machine. Character vectors: [ 1.0, 1.0 ], [ 1.0, 0.0 ], [ 1.0, 0.8 ]]

Slide 26

Vector search ranks objects by similarity (~relevance) to the query

[Figure: a query point on the Realistic / Cartoon and Human / Machine axes, with results ranked 1 to 5]
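
Ranking by similarity can be sketched with the three character vectors from the previous slide. Several things here are assumptions made for illustration: the character names, the query vector, the reading of the two dimensions, and the use of Euclidean distance as the closeness measure. It is a brute-force comparison, not how an approximate nearest-neighbour index works.

```python
import math

# Vectors from the previous slide; the names are hypothetical.
# Assumed reading of the dimensions: [realistic vs cartoon, human vs machine].
CHARACTERS = {
    "character_a": [1.0, 1.0],
    "character_b": [1.0, 0.0],
    "character_c": [1.0, 0.8],
}

def rank(query, vectors):
    """Return the names ordered from nearest to farthest from the query."""
    return sorted(vectors, key=lambda name: math.dist(query, vectors[name]))

ranking = rank([1.0, 0.95], CHARACTERS)
print(ranking)
```

The two vectors that sit close together on the slide come out first, and the distant one comes last.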

Slide 27

Data Ingestion and Embedding Generation

Source data:

POST /_doc
{
  "_id": "product-1234",
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "Price": 118,
  "color": "blue",
  "fabric": "cotton"
}

With generated embeddings:

POST /_doc
{
  "_id": "product-1234",
  "product_name": "Summer Dress",
  "description": "Our best-selling…",
  "Price": 118,
  "color": "blue",
  "fabric": "cotton",
  "desc_embedding": [0.452, 0.3242, …],
  "img_embedding": [0.012, 0.0, …]
}

Slide 28

Vector Query

GET product-catalog/_search
{
  "query": {
    "bool": {
      "must": [{
        "knn": {
          "field": "desc_embedding",
          "num_candidates": 50,
          "query_vector": [0.123, 0.244, …]
        }
      }],
      "filter": {
        "term": { "department": "women" }
      }
    }
  },
  "size": 10
}

Slide 29

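Similarity: cosine (cosine)

[Figure: query vector q and document vectors d1, d2 on the Human / Realistic axes, with angle θ]

$$\cos(\theta) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}| \times |\vec{d}|}$$

$$\mathrm{\_score} = \frac{1 + \cos(\theta)}{2}$$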

Slide 30

Similarity: cosine (cosine)

$$\mathrm{\_score} = \frac{1 + 1}{2} = 1$$

$$\mathrm{\_score} = \frac{1 + 0}{2} = 0.5$$

$$\mathrm{\_score} = \frac{1 - 1}{2} = 0$$
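
The three cases above correspond to vectors pointing in the same direction, at a right angle, and in opposite directions. A short sketch of the score formula from these slides follows; the sample vectors are made up for illustration and the function names are hypothetical.

```python
import math

def cosine(q, d):
    """Cosine of the angle between q and d (both must be non-zero vectors)."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    return dot / (norm_q * norm_d)

def score(q, d):
    """Map the cosine from [-1, 1] to a score in [0, 1]."""
    return (1 + cosine(q, d)) / 2

q = [1.0, 0.0]
same_direction = score(q, [2.0, 0.0])
orthogonal = score(q, [0.0, 3.0])
opposite = score(q, [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Vector length does not affect the result: only the angle does, which is why [2.0, 0.0] scores the same as q itself would.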

Slide 31

https://djdadoo.pilato.fr/

Slide 32

https://github.com/dadoonet/music-search/

Slide 33

Performance that Delivers Relevant Results in Real-time

Out-of-the-Box Solutions
● Elastic Observability: Logs, APM, Tracing, Metrics, Synthetics, Profiling, RUM
● Elastic Security: SIEM, Endpoint, Cloud

Build Your Own
● Elastic Search: Generative AI Apps, Product Search, Workplace Search, Custom Search Apps

The Elastic Search AI Platform
● Ingest & Secure Storage: Data Extraction, Transformation / Normalization, Enrichment, Loading / Indexing, Intelligent Data Storage, Security / Governance
● AI / ML & Search: Full-Text / Vector Search, Machine Learning, Correlations, Analytics & Aggregations, Data Manipulation, Federated Searches & Queries
● Visualization & Automation: Share & Collaborate, Data Exploration, Data Visualization, Custom Dashboards, 3rd Party Integrations, Workflow Automation

Any Data, Any Source
● Databases, Legacy Systems, Public Cloud, Applications, SaaS Apps, On-Premises, Web Services, Files

Business Outcomes for Everyone*
● 69% Improvement in customer and employee satisfaction
● 60% Reduction in risk
● 62% Reduction in revenue disruption

*Validated by third-party research

Slide 34

Elastic Observability

Converge metrics, logs, traces, and more to deliver unified visibility and actionable insights with the most widely deployed observability solution.

Slide 35

Elastic Security

Protect, investigate, and respond to complex threats with a security solution that unifies the capabilities of SIEM, endpoint security, and cloud security.

Slide 36

Slide 37

● www.meetup.com/ElasticFR
● @elasticfr
● discuss.elastic.co

Slide 38

Thank You