Please run setup 0 before we start!
Indexing your office documents with the Elastic Stack and FSCrawler
David Pilato, Developer | Evangelist, Community
@dadoonet https://github.com/dadoonet/JDLL

Lab 0 setup https://github.com/dadoonet/JDLL

The Elastic Search Platform
Enterprise Search | Observability | Security
Kibana: Explore, Visualize, Engage
Elasticsearch: Store, Search, Analyze
Integrations: Connect, Collect, Alert
Public cloud | Hybrid | On-premises

Lab 1 indexing json documents

ingest-attachment processor extracting from BASE64 or CBOR
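As a rough illustration of the BASE64 route, the sketch below encodes raw file bytes and wraps them in a JSON document ready to be sent through an ingest pipeline. The class name `AttachmentDocument` and the field name `data` are assumptions made for this example; the field name only has to match the `field` option configured on the attachment processor.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AttachmentDocument {
    // Encode raw file bytes as BASE64, the text form the attachment processor decodes.
    static String encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    // Wrap the encoded content in a JSON document.
    // Assumption: the pipeline reads the binary content from a field named "data".
    // The standard BASE64 alphabet needs no JSON escaping, so plain concatenation is safe here.
    static String toJson(byte[] content) {
        return "{\"data\":\"" + encode(content) + "\"}";
    }

    public static void main(String[] args) {
        // In a real run the bytes would come from a file, e.g. Files.readAllBytes(path).
        byte[] content = "Hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(toJson(content));
    }
}
```

BASE64 inflates the payload by about a third, which is one reason the CBOR route can be attractive for large files.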

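Parsing a stream and getting content and metadata

import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaCoreProperties;
import org.apache.tika.parser.DefaultParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

static void extractTextAndMetadata(InputStream stream) throws Exception {
    BodyContentHandler handler = new BodyContentHandler(); // collects the body text
    Metadata metadata = new Metadata();                    // filled in by the parser
    try (stream) { // Java 9+: closes the stream once parsing is done
        new DefaultParser().parse(stream, handler, metadata, new ParseContext());
        String extractedText = handler.toString();
        String title = metadata.get(TikaCoreProperties.TITLE);
        String keywords = metadata.get(TikaCoreProperties.KEYWORDS);
        String author = metadata.get(TikaCoreProperties.CREATOR);
    }
}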

An ingest pipeline

ingest-attachment processor using Tika behind the scenes
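To make the pipeline idea concrete, here is a minimal sketch that builds the two REST requests involved: one that defines a pipeline with a single attachment processor, and one that indexes a document through it. The pipeline id `attachment`, the index name `docs`, the field name `data`, and the `http://localhost:9200` address are all assumptions for illustration. The requests are only built and printed, not sent, since sending requires a running cluster and usually credentials.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class AttachmentPipeline {
    // Pipeline body with a single attachment processor.
    // Assumption: "data" is the field holding the BASE64 content.
    static final String PIPELINE_JSON =
            "{\"description\":\"Extract attachment information\","
            + "\"processors\":[{\"attachment\":{\"field\":\"data\"}}]}";

    // PUT /_ingest/pipeline/{id} creates or updates the pipeline.
    static HttpRequest putPipeline(String baseUrl, String pipelineId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_ingest/pipeline/" + pipelineId))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(PIPELINE_JSON))
                .build();
    }

    // PUT /{index}/_doc/{id}?pipeline={id} runs the document through the pipeline at index time.
    static HttpRequest indexDocument(String baseUrl, String index, String docId,
                                     String pipelineId, String json) {
        URI uri = URI.create(baseUrl + "/" + index + "/_doc/" + docId + "?pipeline=" + pipelineId);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest create = putPipeline("http://localhost:9200", "attachment");
        HttpRequest index = indexDocument("http://localhost:9200", "docs", "1", "attachment",
                "{\"data\":\"SGVsbG8=\"}");
        // Print the requests instead of sending them.
        System.out.println(create.method() + " " + create.uri());
        System.out.println(index.method() + " " + index.uri());
    }
}
```

Once the document is indexed through the pipeline, the extracted text and metadata land under an `attachment` object by default, while the original BASE64 field stays in the document unless it is removed explicitly.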

Lab 2 ingest attachment

FSCrawler You know, for files…

Disclaimer
This project is a community project. It is not officially supported by Elastic. Support is provided only by the FSCrawler community, on Discuss and Stack Overflow.
http://discuss.elastic.co/
https://stackoverflow.com/questions/tagged/fscrawler

FSCrawler Architecture
Inputs: Local Dir, Mount Point, SSH / SCP / FTP, HTTP Rest
Filters (FSCrawler): JSON (noop), XML, Apache Tika
Outputs: ES 6/7/8

Lab 3 fscrawler

FSCrawler even better with a UI

FSCrawler Architecture
Inputs: Local Dir, Mount Point, SSH / SCP / FTP, HTTP Rest
Filters (FSCrawler): JSON (noop), XML, Apache Tika
Outputs: ES 6/7/8, WP 7/8

Lab 4 workplace search

Since 8.2: Network drives connector package for Enterprise Search https://github.com/elastic/enterprise-search-network-drives-connector/

Thanks! PRs are warmly welcome! https://github.com/dadoonet/fscrawler