Class ParserFactory


  • public final class ParserFactory
    extends Object
    Creates and caches Parser plugins.
    • Constructor Detail

    • Method Detail

      • getParsers

        public Parser[] getParsers(String contentType,
                                   String url)
                            throws ParserNotFound
        Returns an array of Parsers for the given content type. The method consults the ParserFactory's internal list of parse plugins to determine the list of pluginIds, then retrieves the corresponding extension points and instantiates them as Parsers.
        Parameters:
        contentType - The content type for which to return the array of Parsers.
        url - The URL of the content, which may allow the type to be determined from the file suffix.
        Returns:
        An array of Parsers for the given contentType. Plugins that are mapped to a contentType in the parse-plugins.xml file but were never enabled via the plugin.includes Nutch configuration property are skipped and do not appear in this array. For example, if the ordered list of parsing plugins for text/plain is [parse-text, parse-html, parse-rtf] and only parse-html and parse-rtf are enabled via plugin.includes, the returned array consists of two Parsers, in order: [parse-html, parse-rtf].
        Throws:
        ParserNotFound - if there is a runtime error locating a parser for the given content type and url
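The skipping behavior described under Returns can be illustrated with a small standalone sketch. This is not Nutch's implementation: the class EnabledPluginFilterSketch and the method filterEnabled are hypothetical names, and plain strings stand in for plugin IDs and Parser instances. It only shows an ordered plugin list being filtered by the enabled set while the configured order is preserved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the Nutch ParserFactory implementation.
final class EnabledPluginFilterSketch {

    // Keeps only the plugin IDs that are enabled, preserving the configured order.
    static List<String> filterEnabled(List<String> orderedPluginIds, Set<String> enabledPluginIds) {
        List<String> result = new ArrayList<>();
        for (String id : orderedPluginIds) {
            if (enabledPluginIds.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Ordered list as it might be mapped for text/plain (taken from the example above).
        List<String> ordered = Arrays.asList("parse-text", "parse-html", "parse-rtf");
        // Plugins assumed to be enabled via plugin.includes.
        Set<String> enabled = new HashSet<>(Arrays.asList("parse-html", "parse-rtf"));
        System.out.println(filterEnabled(ordered, enabled)); // prints [parse-html, parse-rtf]
    }
}
```

Because the loop walks the ordered list rather than the enabled set, the result keeps the preference order from the mapping, which matches the ordering described for the returned array.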
      • getParserById

        public Parser getParserById(String id)
                             throws ParserNotFound
        Returns the Parser instance whose extension ID matches the specified id. If no such Parser is found, the method throws a ParserNotFound exception. If the Parser is already in the internal PARSER_CACHE, the cached instance is returned. Otherwise, the method instantiates the Parser itself and stores it in the internal PARSER_CACHE.
        Parameters:
        id - The string extension ID (e.g., "org.apache.nutch.parse.rss.RSSParser", "org.apache.nutch.parse.rtf.RTFParseFactory") of the Parser implementation to return.
        Returns:
        A Parser implementation specified by the parameter id.
        Throws:
        ParserNotFound - If the Parser is not found (i.e., not registered with the extension point), or if a PluginRuntimeException occurs while instantiating the Parser.
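The cache-then-instantiate behavior described for getParserById can be sketched in isolation as follows. This is an illustration under stated assumptions, not the Nutch source: ParserCacheSketch, NotFoundException, getById, and the instantiator hook are hypothetical names, Object stands in for Parser, and a plain HashMap plays the role of PARSER_CACHE in a single-threaded setting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not the Nutch ParserFactory implementation.
final class ParserCacheSketch {

    // Hypothetical stand-in for ParserNotFound.
    static final class NotFoundException extends Exception {
        NotFoundException(String message) {
            super(message);
        }
    }

    // Plays the role of the internal PARSER_CACHE, keyed by extension ID.
    private final Map<String, Object> cache = new HashMap<>();

    // Assumed instantiation hook: returns null when no extension has the given ID.
    private final Function<String, Object> instantiator;

    ParserCacheSketch(Function<String, Object> instantiator) {
        this.instantiator = instantiator;
    }

    Object getById(String id) throws NotFoundException {
        Object cached = cache.get(id);
        if (cached != null) {
            return cached; // already instantiated: reuse the cached instance
        }
        Object created = instantiator.apply(id);
        if (created == null) {
            throw new NotFoundException("No parser found for id: " + id);
        }
        cache.put(id, created); // remember the new instance for later calls
        return created;
    }

    public static void main(String[] args) throws NotFoundException {
        // Only IDs under the made-up org.example prefix are treated as registered.
        ParserCacheSketch factory = new ParserCacheSketch(
                id -> id.startsWith("org.example.") ? new Object() : null);
        Object first = factory.getById("org.example.DemoParser");
        Object second = factory.getById("org.example.DemoParser");
        System.out.println(first == second); // prints true
    }
}
```

The second call returns the same instance as the first, which is the observable effect of the caching described above; a failed lookup leaves the cache unchanged.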
      • getExtensions

        protected List<Extension> getExtensions(String contentType)
        Finds the best-suited parse plugin for a given contentType.
        Parameters:
        contentType - Content-Type for which we seek a parse plugin.
        Returns:
        A list of extensions to be used for this contentType, or null if there are none.
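Since getExtensions is documented to return null rather than an empty list when nothing matches, code that consumes the result needs a null check. The sketch below shows one way to guard against that; it is a standalone illustration, with NullableExtensionsSketch, findExtensions, and findExtensionsOrEmpty as hypothetical names and strings standing in for Extension objects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Nutch ParserFactory implementation.
final class NullableExtensionsSketch {

    // Mimics the documented contract: a list when something is mapped, otherwise null.
    static List<String> findExtensions(Map<String, List<String>> mapping, String contentType) {
        return mapping.get(contentType);
    }

    // Caller-side guard that turns the null result into an empty list.
    static List<String> findExtensionsOrEmpty(Map<String, List<String>> mapping, String contentType) {
        List<String> found = findExtensions(mapping, contentType);
        return found == null ? Collections.<String>emptyList() : found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> mapping = new HashMap<>();
        mapping.put("text/html", Arrays.asList("parse-html"));
        System.out.println(findExtensionsOrEmpty(mapping, "text/html"));         // prints [parse-html]
        System.out.println(findExtensionsOrEmpty(mapping, "application/x-foo")); // prints []
    }
}
```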