This is a community project. It is not officially supported by Elastic. Support is provided only by the FSCrawler community, on Discuss and Stack Overflow.
Slide 4
FSCrawler Architecture
Inputs
Local Dir
Mount Point
SSH / SCP
HTTP REST
Filters
JSON (noop)
XML
Apache Tika
Outputs
ES6
ES7
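The three stages above (inputs, filters, outputs) can be pictured as a small pipeline. The sketch below is only a conceptual model under stated assumptions: every function name is hypothetical, the text filter is a stand-in for Apache Tika, and a plain list stands in for an Elasticsearch output. It is not FSCrawler's actual code.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List, Tuple

# Hypothetical names throughout: a conceptual model, not FSCrawler code.
Document = Dict[str, object]


def local_dir_input(root: Path) -> Iterator[Tuple[str, bytes]]:
    """Input stage: yield (relative path, raw bytes) for each file under root."""
    for path in sorted(root.rglob("*")):
        if path.is_file():
            yield path.relative_to(root).as_posix(), path.read_bytes()


def json_noop_filter(name: str, raw: bytes) -> Document:
    """Filter stage for JSON files: parse and pass the content through as-is."""
    return json.loads(raw.decode("utf-8"))


def text_filter(name: str, raw: bytes) -> Document:
    """Filter stage stand-in for a Tika-like extractor (plain text only here)."""
    return {
        "content": raw.decode("utf-8", errors="replace"),
        "file": {"filename": name},
    }


def run_pipeline(root: Path, output: List[Tuple[str, Document]]) -> List[Tuple[str, Document]]:
    """Route each input file through a filter and append it to the output."""
    for name, raw in local_dir_input(root):
        chosen = json_noop_filter if name.endswith(".json") else text_filter
        output.append((name, chosen(name, raw)))  # the list stands in for an ES output
    return output
```

Calling `run_pipeline(Path("/some/dir"), [])` returns one `(name, document)` pair per file; a real output stage would send each document to Elasticsearch instead of appending it to a list.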
Slide 5
FSCrawler Key Features
Many more formats than the ingest attachment plugin
OCR (Tesseract)
Much more metadata than the ingest attachment plugin (see Generated Fields)
FSCrawler with the Workplace Search output does not run in watch mode (you can use systemd instead)
To transform your Workplace Search index, you first have to set dynamic mapping to true (the default is strict)
If you have other standard Workplace Search connectors, you will have to transform your data into another index, because a full sync refreshes the content source from scratch
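As an illustration of the dynamic mapping step mentioned above, here is a minimal sketch that builds an Elasticsearch update-mapping request setting `dynamic` to `true`. The helper name, the URL, and the index name are hypothetical placeholders; the real Workplace Search index name depends on your deployment, and authentication is left out.

```python
import json
import urllib.request


def build_dynamic_mapping_request(es_url: str, index: str) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_mapping request with dynamic=true."""
    body = json.dumps({"dynamic": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_url.rstrip('/')}/{index}/_mapping",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical URL and index name, for illustration only.
request = build_dynamic_mapping_request(
    "http://localhost:9200", "my-workplace-search-index"
)
```

Passing `request` to `urllib.request.urlopen` would send it. A secured cluster would also need credentials, which this sketch does not handle.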
Slide 17
Needs to be done (Help Wanted!)
New local file crawling implementation (WatchService): #399