Class TikaParser

  • All Implemented Interfaces:
    Configurable, Parser, Pluggable

    public class TikaParser
    extends Object
    implements Parser
    Wrapper for Tika parsers. Mimics the HTMLParser, but works from the XHTML representation that Tika returns as SAX events.
    • Constructor Detail

      • TikaParser

        public TikaParser()
    • Method Detail

      • getParse

        public ParseResult getParse(Content content)
        Description copied from interface: Parser

        This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.

        Note: A meta-redirect should be followed only when it comes from the original URL. For example:
        Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta-redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.
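
        The redirect rule in the note above can be sketched in isolation. This is a minimal illustration, not Nutch code: RedirectRuleSketch, Status, ParseEntry and redirectToFollow are hypothetical stand-ins for the real Parse and ParseStatus types, and the map is a plain java.util.Map rather than a ParseResult. It only shows the lookup a caller might perform: look up the entry keyed by the original URL and follow a redirect only when that entry reports one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in types; these are NOT the Nutch Parse/ParseStatus classes.
final class RedirectRuleSketch {

    /** Simplified stand-in for a parse outcome. */
    enum Status { SUCCESS, REDIRECT, FAILED }

    /** Simplified stand-in for one parse entry: a status plus an optional redirect target. */
    static final class ParseEntry {
        final Status status;
        final String redirectTarget; // null unless status is REDIRECT

        ParseEntry(Status status, String redirectTarget) {
            this.status = status;
            this.redirectTarget = redirectTarget;
        }
    }

    /**
     * Returns the redirect target to follow, or null if no redirect should be followed.
     * A redirect is followed only when the map holds an entry keyed by the original URL
     * and that entry's status indicates a redirect. Redirects recorded under any other
     * key are ignored.
     */
    static String redirectToFollow(String originalUrl, Map<String, ParseEntry> parseMap) {
        ParseEntry entry = parseMap.get(originalUrl);
        if (entry != null && entry.status == Status.REDIRECT) {
            return entry.redirectTarget;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, ParseEntry> parseMap = new LinkedHashMap<>();
        // Entry for the URL being fetched: reports a meta-redirect.
        parseMap.put("foo.bar.com/redirect.html",
                new ParseEntry(Status.REDIRECT, "foo.bar.com/target.html"));
        // Entry stored under a different key: its redirect must not be followed
        // while processing foo.bar.com/other.html.
        parseMap.put("foo.bar.com/sub.html",
                new ParseEntry(Status.REDIRECT, "foo.bar.com/elsewhere.html"));

        System.out.println(redirectToFollow("foo.bar.com/redirect.html", parseMap));
        System.out.println(redirectToFollow("foo.bar.com/other.html", parseMap));
    }
}
```

        Keying the check on the original URL means that redirects recorded for other entries in the same map do not affect the URL currently being processed.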

        Specified by:
        getParse in interface Parser
        Parameters:
        content - Content to be parsed
        Returns:
        a map containing <key, parse> pairs