2024 Edition / 14_11_2024
SECURITY & CODE QUALITY DEVCON#23
The RAG approach for cybersecurity
David Pilato @dadoonet
Slide 2
DEVCON#23: thanks to the partners of DEVCON and of the security special issue (HORS-SERIE SECURITE)
[Sponsor logos: partners; special hardware partner]
Slide 3
Elasticsearch You Know, for Search
Slide 4
Slide 5
These are not the droids you are looking for.
Slide 6
GET /_analyze
{
  "char_filter": [ "html_strip" ],
  "tokenizer": "standard",
  "filter": [ "lowercase", "stop", "snowball" ],
  "text": "These are <em>not</em> the droids you are looking for."
}
Slide 7
These are <em>not</em> the droids you are looking for.
{
  "tokens": [
    { "token": "droid", "start_offset": 27, "end_offset": 33, "type": "<ALPHANUM>", "position": 4 },
    { "token": "you", "start_offset": 34, "end_offset": 37, "type": "<ALPHANUM>", "position": 5 },
    { "token": "look", "start_offset": 42, "end_offset": 49, "type": "<ALPHANUM>", "position": 7 }
  ]
}
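To make the analysis chain above concrete, here is a minimal pure-Python sketch that imitates its four steps (strip HTML, tokenize, lowercase, remove stop words, stem). It is not how Elasticsearch implements analysis: the stop list is only an illustrative subset, `toy_stem` is a crude stand-in for the snowball stemmer, and all function names are hypothetical.

```python
import re

# Illustrative subset of English stop words (assumption, not the full list).
STOP_WORDS = {"these", "are", "not", "the", "for"}


def strip_html(text):
    # Replace each tag with spaces of equal length so that token offsets
    # still point into the original text.
    return re.sub(r"<[^>]+>", lambda m: " " * len(m.group()), text)


def toy_stem(word):
    # Crude stand-in for a real stemmer: drop a trailing "ing" or "s".
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", strip_html(text))):
        term = match.group().lower()
        if term in STOP_WORDS:
            continue  # the position counter still advances, leaving gaps
        tokens.append({
            "token": toy_stem(term),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


tokens = analyze("These are <em>not</em> the droids you are looking for.")
```

On the slide's sentence, this sketch keeps the same three tokens, offsets, and positions as the response shown above.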
Slide 8
Elasticsearch You Know, for Search
Slide 9
Elasticsearch You Know, for Vector Search
Slide 10
Embeddings represent your data
Example: 1-dimensional vector
[Figure: characters placed along a single axis running from Realistic to Cartoon; each Character maps to a one-element Vector such as [ 1 ]]
Slide 11
Multiple dimensions represent different data aspects
[Figure: characters placed on two axes, Human to Machine and Realistic to Cartoon; each Character maps to a two-element Vector such as [ 1, 1 ] or [ 1, 0 ]]
Slide 12
Similar data is grouped together
[Figure: characters plotted on the Human to Machine and Realistic to Cartoon axes; Character vectors shown include [ 1.0, 1.0 ], [ 1.0, 0.0 ] and [ 1.0, 0.8 ], with similar characters close to each other]
Slide 13
Vector search ranks objects by similarity (~relevance) to the query
[Figure: a Query point plotted on the Human to Machine and Realistic to Cartoon axes; the Result points are ranked 1 to 5 by their closeness to the query]
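The ranking idea on this slide can be sketched as a brute-force search: score every stored vector against the query and sort. This is a minimal illustration, assuming cosine similarity as the measure; the document names and vectors are hypothetical, and a real engine would use an approximate index rather than scanning everything.

```python
import math


def cosine(a, b):
    # Cosine of the angle between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Return document names ordered from most to least similar to the query.
    return sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)


# Hypothetical two-dimensional embeddings.
docs = {
    "doc_a": [1.0, 0.8],
    "doc_b": [1.0, 0.0],
    "doc_c": [-1.0, -1.0],
}
ranking = rank([1.0, 1.0], docs)
```

Here `doc_a` points almost in the same direction as the query and comes first, while `doc_c` points the opposite way and comes last.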
Slide 14
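Similarity
[Figure: query vector q and document vectors d1 and d2 plotted on the Human and Realistic axes, with θ the angle between q and a document vector]
$$\cos(\theta) = \frac{\vec{q} \cdot \vec{d}}{|\vec{q}| \times |\vec{d}|}$$
$$\mathrm{\_score} = \frac{1 + \cos(\theta)}{2}$$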
Slide 15
Similarity: cosine (cosine)
Similar vectors: θ close to 0, cos(θ) close to 1
$$\mathrm{\_score} = \frac{1 + 1}{2} = 1$$
Orthogonal vectors: θ close to 90°, cos(θ) close to 0
$$\mathrm{\_score} = \frac{1 + 0}{2} = 0.5$$
Opposite vectors: θ close to 180°, cos(θ) close to -1
$$\mathrm{\_score} = \frac{1 - 1}{2} = 0$$
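The three cases above can be checked numerically. The sketch below computes cos(θ) from the dot product and norms, then maps it into the 0 to 1 range with (1 + cos(θ)) / 2 as on the slides; the function names and the sample vectors are chosen here for illustration only.

```python
import math


def cosine(q, d):
    # cos(theta) = (q . d) / (|q| * |d|)
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    return dot / (norm_q * norm_d)


def score(q, d):
    # Map cos(theta) from [-1, 1] to [0, 1].
    return (1 + cosine(q, d)) / 2


similar = score([1.0, 0.0], [2.0, 0.0])      # same direction
orthogonal = score([1.0, 0.0], [0.0, 1.0])   # 90 degrees apart
opposite = score([1.0, 0.0], [-1.0, 0.0])    # 180 degrees apart
```

Vector length does not matter here: `[2.0, 0.0]` scores the same as `[1.0, 0.0]` would, because only the angle enters the formula.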
Slide 16
LLM opportunities and limits
[Diagram: your question -> GAI / LLM (public internet data) -> one answer]
Slide 17
Slide 18
Retrieval Augmented Generation
[Diagram: your question + context window -> GAI / LLM (public internet data) -> the right answer; the context window is filled from your business data: documents, images, audio]
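The flow on this slide (retrieve relevant business data, place it in the context window, then ask the question) can be sketched in a few lines. This is a toy illustration under stated assumptions: retrieval uses simple word overlap instead of vector search, the function names and sample documents are hypothetical, and the final call to an LLM is left out.

```python
def retrieve(question, documents, top_k=2):
    # Toy retrieval: rank documents by how many words they share with the question.
    q_terms = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=2):
    # Put the retrieved documents into the context window, then the question.
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical business data.
documents = [
    "Alert 42: failed ssh login on host web-01",
    "Cafeteria menu for Monday",
    "Alert 43: ssh brute force detected on host web-01",
]
prompt = build_prompt("what happened with ssh on web-01", documents)
```

The resulting prompt carries the two alert documents and leaves out the unrelated one, so the model answers from the supplied context rather than only from what it saw during training.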
Slide 19
Attack Discovery
[Diagram: 100s of alerts (Elastic Detections, Other Detections) -> Configurable Anonymization -> Summary Prompt + Alert Context -> Handful of Discoveries mapped across MITRE ATT&CK]
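As an illustration of what an anonymization step before the prompt could look like, the sketch below replaces the values of selected alert fields with stable placeholders. It is not Elastic's implementation: the function, the field names, and the placeholder format are all assumptions made for this example.

```python
def anonymize(alerts, fields):
    # Replace values of the chosen fields with stable placeholders, so the same
    # original value always maps to the same placeholder across alerts.
    mapping = {}
    cleaned = []
    for alert in alerts:
        copy = dict(alert)
        for field in fields:
            if field in copy:
                key = (field, copy[field])
                if key not in mapping:
                    mapping[key] = f"{field}_{len(mapping) + 1}"
                copy[field] = mapping[key]
        cleaned.append(copy)
    return cleaned, mapping


# Hypothetical alerts and field names.
alerts = [
    {"user.name": "alice", "host.name": "web-01", "rule": "ssh brute force"},
    {"user.name": "alice", "host.name": "db-02", "rule": "suspicious login"},
]
cleaned, mapping = anonymize(alerts, fields=["user.name", "host.name"])
```

Keeping the mapping on the caller's side means placeholders in the model's answer can be translated back to real values without ever sending those values to the LLM.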
Slide 20
Elastic AI Assistant
[Diagram: Prebuilt/Custom Prompt + Context Window -> Response; the context window draws on Knowledge Base / User Data: user data, alerts, Elastic Provided Content]
Slide 28
Retrieval Augmented Generation
[Diagram: your question + context window -> Locally hosted LLM -> the right answer; the context window is filled from your business data: documents, images, audio]
Slide 29
new in 8.16
Slide 30