This repository has been archived by the owner on Jul 10, 2019. It is now read-only.

Tika module

jnioche edited this page Jan 23, 2013 · 3 revisions

Tika commands are found in behemoth-tika.job.

usage: com.digitalpebble.behemoth.tika.TikaDriver -i <input> -o <output> [-t <TikaProcessorClass> -m <mimeType>]
-i, --input           The input path
-o, --output          The output path
-t, --tikaProcessor   The fully qualified name of a TikaProcessor class that handles the extraction
-m, --mimeType        The mime type to use
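
As a rough illustration of the usage above, the sketch below assembles a TikaDriver invocation with only the two required options. The job file location and the input and output paths are hypothetical placeholders, and launching through `hadoop jar` is an assumption about how the job file is run on your cluster. The command is only printed here, as a dry run.

```shell
# Hypothetical locations; adjust them to your own setup.
JOB=behemoth-tika.job
INPUT=/tmp/behemoth-corpus
OUTPUT=/tmp/behemoth-tika

# Assemble the command line with the required -i and -o options.
# Assumption: the job file is launched with `hadoop jar`.
CMD="hadoop jar $JOB com.digitalpebble.behemoth.tika.TikaDriver -i $INPUT -o $OUTPUT"

# Dry run: print the command instead of executing it.
echo "$CMD"
```

To actually launch the job, run the printed command in a shell where `hadoop` is on the PATH. The optional `-t` and `-m` options can be appended in the same way.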

Parses the Behemoth corpus using Tika.

Set the boolean parameter tika.convert.markup to false to deactivate the conversion of the original markup (e.g. HTML tags) into annotations.
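
One possible way to pass this parameter is as a Hadoop configuration property on the command line. The sketch below assumes that TikaDriver is started through Hadoop's generic option handling, so that `-D name=value` is accepted ahead of the driver's own options. This page does not confirm that, and setting the property in a Hadoop configuration file may be needed instead. The job file location and paths are hypothetical, and the command is only printed as a dry run.

```shell
# Hypothetical locations; adjust them to your own setup.
JOB=behemoth-tika.job
INPUT=/tmp/behemoth-corpus
OUTPUT=/tmp/behemoth-tika-nomarkup

# Assumption: the driver accepts Hadoop generic options, so the
# -D property is placed before the driver's own -i and -o options.
CMD="hadoop jar $JOB com.digitalpebble.behemoth.tika.TikaDriver -D tika.convert.markup=false -i $INPUT -o $OUTPUT"

# Dry run: print the command instead of executing it.
echo "$CMD"
```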

