Class ZipParser

  • All Implemented Interfaces:
    Configurable, Parser, Pluggable

    public class ZipParser
    extends Object
    implements Parser
    Nutch parse plugin for ZIP files (content type: application/zip). The ZipParser class is based on the MSPowerPointParser class by Stephan Strittmatter.
    • Constructor Detail

      • ZipParser

        public ZipParser()
        Creates a new instance of ZipParser.
    • Method Detail

      • getParse

        public ParseResult getParse(Content content)
        Description copied from interface: Parser

        Parses the given content and returns a map of <key, parse> pairs. Each Parse instance is persisted under its key.

        Note: Meta-redirects should be followed only when they come from the original URL. For example:
        Assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If the page at this URL contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.

        Specified by:
        getParse in interface Parser
        Parameters:
        content - Content to be parsed
        Returns:
        a map containing <key, parse> pairs
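
        Example (illustrative only):
        The meta-redirect rule described for getParse can be sketched with plain Java collections. This is a minimal sketch under stated assumptions, not Nutch code: the class RedirectRuleSketch, the Status enum, the method shouldFollowRedirect, and the archive URLs are all hypothetical stand-ins. In particular, Status stands in for a real ParseStatus, and a plain Map stands in for the returned ParseResult.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RedirectRuleSketch {

    /** Hypothetical stand-in for a parse status; NOT the Nutch ParseStatus class. */
    enum Status { SUCCESS, REDIRECT, FAILED }

    /**
     * Returns true only when the map holds an entry keyed by the original URL
     * and that entry's status indicates a redirect. Redirects recorded under
     * any other key are ignored.
     */
    static boolean shouldFollowRedirect(Map<String, Status> parseMap, String originalUrl) {
        return parseMap.get(originalUrl) == Status.REDIRECT;
    }

    public static void main(String[] args) {
        // Case 1: the redirect is recorded under the URL being processed.
        Map<String, Status> page = new LinkedHashMap<>();
        page.put("foo.bar.com/redirect.html", Status.REDIRECT);
        System.out.println(shouldFollowRedirect(page, "foo.bar.com/redirect.html")); // prints "true"

        // Case 2: the redirect is recorded under a different key (a hypothetical
        // key for a document nested inside an archive), so it is not followed.
        Map<String, Status> archive = new LinkedHashMap<>();
        archive.put("foo.bar.com/archive.zip", Status.SUCCESS);
        archive.put("foo.bar.com/archive.zip/inner.html", Status.REDIRECT);
        System.out.println(shouldFollowRedirect(archive, "foo.bar.com/archive.zip")); // prints "false"
    }
}
```

        The check is deliberately keyed on the original URL alone, which mirrors the note above: a redirect status attached to any other entry in the map does not cause the fetcher to follow it.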