Package org.apache.nutch.parse
Interface Parser
- All Superinterfaces: Configurable, Pluggable
- All Known Implementing Classes: ExtParser, FeedParser, HtmlParser, JSParseFilter, TikaParser, ZipParser
public interface Parser extends Pluggable, Configurable
A parser for content generated by a Protocol implementation. This interface is implemented by extensions. Nutch's core contains no page parsing code.
Field Summary
Modifier and Type | Field | Description
static String | X_POINT_ID | The name of the extension point.
Method Summary
Modifier and Type | Method | Description
ParseResult | getParse(Content c) | This method parses the given content and returns a map of <key, parse> pairs.
Methods inherited from interface org.apache.hadoop.conf.Configurable: getConf, setConf
Field Detail
X_POINT_ID
static final String X_POINT_ID
The name of the extension point.
Method Detail
getParse
ParseResult getParse(Content c)
This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.
Note: Meta-redirects should be followed only when they come from the original URL. That is: assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If the page at this URL contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.
- Parameters: c - Content to be parsed
- Returns: a map containing <key, parse> pairs
- Since: NUTCH-443
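The meta-redirect rule in the note for getParse can be illustrated with a minimal, self-contained sketch. It does not use the Nutch API: the names MetaRedirectSketch, ParseStub, Status, and redirectToFollow are hypothetical stand-ins invented for this example, and the map of parses is modelled as a plain java.util.Map keyed by URL. The only point is the lookup: a redirect counts only when it is stored under the key of the URL that was actually fetched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: every type here is a hypothetical stand-in,
// not one of the real Nutch classes (Parse, ParseStatus, ParseResult).
class MetaRedirectSketch {

    /** Simplified stand-in for a parse status. */
    enum Status { SUCCESS, REDIRECT }

    /** Simplified stand-in for a parse: a status plus an optional redirect target. */
    static final class ParseStub {
        final Status status;
        final String redirectTarget; // null unless status == REDIRECT

        ParseStub(Status status, String redirectTarget) {
            this.status = status;
            this.redirectTarget = redirectTarget;
        }
    }

    /**
     * Returns the redirect target to follow, or null if no redirect should
     * be followed. Only the entry keyed by the fetched URL is consulted, so
     * redirects recorded under any other key are ignored.
     */
    static String redirectToFollow(String fetchedUrl, Map<String, ParseStub> parses) {
        ParseStub parse = parses.get(fetchedUrl);
        if (parse != null && parse.status == Status.REDIRECT) {
            return parse.redirectTarget;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, ParseStub> parses = new LinkedHashMap<>();
        // Entry for the original URL, carrying a redirect status.
        parses.put("foo.bar.com/redirect.html",
                new ParseStub(Status.REDIRECT, "foo.bar.com/target.html"));
        // Entry stored under a different key; its redirect must not be followed
        // when the fetcher is processing some other URL.
        parses.put("foo.bar.com/embedded.html",
                new ParseStub(Status.REDIRECT, "foo.bar.com/elsewhere.html"));

        System.out.println(redirectToFollow("foo.bar.com/redirect.html", parses));
        System.out.println(redirectToFollow("foo.bar.com/other.html", parses));
    }
}
```

In this sketch the first call returns the target because the map holds a redirect entry under the fetched URL itself. The second call returns null because no entry exists under that key, even though other entries in the map carry redirects.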