Package org.apache.nutch.parse.html
Class HtmlParser
java.lang.Object
    org.apache.nutch.parse.html.HtmlParser
All Implemented Interfaces:
Configurable, Parser, Pluggable
public class HtmlParser extends Object implements Parser
Field Summary
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
Constructor Summary
Constructors
Constructor     Description
HtmlParser()
Method Summary
Modifier and Type   Method                         Description
Configuration       getConf()
ParseResult         getParse(Content content)      This method parses the given content and returns a map of <key, parse> pairs.
static void         main(String[] args)
void                setConf(Configuration conf)
Method Detail
getParse
public ParseResult getParse(Content content)
Description copied from interface: Parser
This method parses the given content and returns a map of <key, parse> pairs. Parse instances will be persisted under the given key.

Note: Meta-redirects should be followed only when they come from the original URL. That is: assume the fetcher is in parsing mode and is currently processing foo.bar.com/redirect.html. If this URL contains a meta redirect to another URL, the fetcher should follow the redirect only if the map contains an entry of the form <"foo.bar.com/redirect.html", Parse with a ParseStatus indicating the redirect>.
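The redirect rule above can be modeled in a few lines. This is a minimal, self-contained sketch, not Nutch code: MetaRedirectRule, ParseOutcome and redirectToFollow are hypothetical names, a plain Map<String, ParseOutcome> stands in for ParseResult, and a boolean flag stands in for a ParseStatus that indicates a redirect. The point it illustrates is that only the entry keyed by the URL being fetched is consulted.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the meta-redirect rule described for Parser.getParse.
 * These are illustrative types, not Nutch's ParseResult/Parse/ParseStatus.
 */
public class MetaRedirectRule {

    /** Stand-in for a Parse whose status may signal a redirect. */
    public static final class ParseOutcome {
        final boolean isRedirect;
        final String redirectTarget;

        public ParseOutcome(boolean isRedirect, String redirectTarget) {
            this.isRedirect = isRedirect;
            this.redirectTarget = redirectTarget;
        }
    }

    /**
     * Returns the redirect target only if the map holds an entry keyed by the
     * URL being fetched (the original URL) whose status signals a redirect;
     * returns null otherwise, so redirects stored under other keys are ignored.
     */
    public static String redirectToFollow(String fetchedUrl, Map<String, ParseOutcome> results) {
        ParseOutcome outcome = results.get(fetchedUrl);
        if (outcome != null && outcome.isRedirect) {
            return outcome.redirectTarget;
        }
        return null;
    }

    public static void main(String[] args) {
        // Entry keyed by the original URL: the redirect is followed.
        Map<String, ParseOutcome> results = new HashMap<>();
        results.put("foo.bar.com/redirect.html",
                new ParseOutcome(true, "foo.bar.com/target.html"));
        System.out.println(redirectToFollow("foo.bar.com/redirect.html", results)); // foo.bar.com/target.html

        // Redirect stored under a different key: not followed.
        Map<String, ParseOutcome> other = new HashMap<>();
        other.put("foo.bar.com/sub.html",
                new ParseOutcome(true, "foo.bar.com/elsewhere.html"));
        System.out.println(redirectToFollow("foo.bar.com/page.html", other)); // null
    }
}
```

Keying the lookup on the fetched URL is what keeps redirects found in other entries of the same result (for example, sub-documents produced during parsing) from being followed.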
setConf
public void setConf(Configuration conf)
Specified by:
setConf in interface Configurable
getConf
public Configuration getConf()
Specified by:
getConf in interface Configurable
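The setConf/getConf pair follows a common injection pattern: the framework hands the component a configuration object, and the component keeps it and reads the settings it needs at that point. The sketch below is self-contained and hypothetical: SimpleConf is a stand-in for Hadoop's Configuration, SimpleConfigurable mirrors the two Configurable methods, SketchParser is an illustrative implementer, and the property key example.default.encoding is invented. It does not claim to show what HtmlParser reads in its own setConf.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigurableSketch {

    /** Hypothetical stand-in for Hadoop's Configuration: a string key/value store. */
    public static final class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        public void set(String name, String value) {
            props.put(name, value);
        }

        public String get(String name, String defaultValue) {
            return props.getOrDefault(name, defaultValue);
        }
    }

    /** Hypothetical mirror of the setConf/getConf pair specified by Configurable. */
    public interface SimpleConfigurable {
        void setConf(SimpleConf conf);

        SimpleConf getConf();
    }

    /** Illustrative component that caches a setting when its configuration is injected. */
    public static final class SketchParser implements SimpleConfigurable {
        private SimpleConf conf;
        private String defaultEncoding;

        @Override
        public void setConf(SimpleConf conf) {
            this.conf = conf;
            // Read the (invented) setting once here rather than on every parse call.
            this.defaultEncoding = conf.get("example.default.encoding", "UTF-8");
        }

        @Override
        public SimpleConf getConf() {
            return conf; // returns the same object that was injected
        }

        public String defaultEncoding() {
            return defaultEncoding;
        }
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        conf.set("example.default.encoding", "ISO-8859-1");

        SketchParser parser = new SketchParser();
        parser.setConf(conf);
        System.out.println(parser.defaultEncoding()); // ISO-8859-1
        System.out.println(parser.getConf() == conf); // true
    }
}
```

Caching values in setConf keeps configuration lookups out of the per-document path; whether a given Nutch plugin does this is not stated on this page.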