Package org.apache.nutch.parse.feed
Class FeedParser
- java.lang.Object
  - org.apache.nutch.parse.feed.FeedParser
-
Field Summary
Fields
Modifier and Type    Field                      Description
static String        CHARSET_UTF8
static String        TEXT_PLAIN_CONTENT_TYPE
-
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
-
-
Constructor Summary
Constructors
Constructor     Description
FeedParser()
-
Method Summary
Modifier and Type    Method                         Description
Configuration        getConf()
ParseResult          getParse(Content content)      Parses the given feed, then extracts and parses all linked items within the feed, using the underlying ROME feed parsing library.
static void          main(String[] args)            Runs a command line version of this Parser.
void                 setConf(Configuration conf)    Sets the Configuration object for this Parser.
-
Field Detail
-
CHARSET_UTF8
public static final String CHARSET_UTF8
- See Also:
- Constant Field Values
-
TEXT_PLAIN_CONTENT_TYPE
public static final String TEXT_PLAIN_CONTENT_TYPE
- See Also:
- Constant Field Values
-
-
Method Detail
-
getParse
public ParseResult getParse(Content content)
Parses the given feed, then extracts and parses all linked items within the feed, using the underlying ROME feed parsing library.
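To illustrate what "extracting all linked items" from a feed involves, here is a minimal sketch that uses only the JDK's DOM parser. It is not how FeedParser is implemented (the class relies on ROME), and it assumes a plain RSS 2.0 document. The class name FeedLinkSketch, the helper extractItemLinks, and the example.com URLs are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FeedLinkSketch {

    // Hypothetical helper: returns the <link> text of every <item> in an RSS 2.0 document.
    // A JDK-only stand-in for the item extraction that FeedParser delegates to ROME.
    static List<String> extractItemLinks(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so untrusted feeds cannot load external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

            List<String> links = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                // Only look at <link> elements nested inside this <item>,
                // so the channel-level <link> is not reported.
                NodeList linkNodes = item.getElementsByTagName("link");
                if (linkNodes.getLength() > 0) {
                    String link = linkNodes.item(0).getTextContent().trim();
                    if (!link.isEmpty()) {
                        links.add(link);
                    }
                }
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    public static void main(String[] args) {
        String rss =
            "<rss version=\"2.0\"><channel>"
          + "<title>Example</title><link>http://example.com/</link>"
          + "<item><title>A</title><link>http://example.com/a</link></item>"
          + "<item><title>B</title><link>http://example.com/b</link></item>"
          + "</channel></rss>";
        for (String link : extractItemLinks(rss)) {
            System.out.println(link);
        }
    }
}
```

In a crawler, each extracted link would then typically be normalized and filtered before being parsed as its own entry, which is why the configuration described under setConf matters.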
-
setConf
public void setConf(Configuration conf)
Sets the Configuration object for this Parser. This Parser expects the following configuration properties to be set:
- URLNormalizers - properties in the configuration object to set up the default URL normalizers.
- URLFilters - properties in the configuration object to set up the default URL filters.
- Specified by:
setConf in interface Configurable
- Parameters:
conf - The Hadoop Configuration object to use to configure this Parser.
-
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
- Returns:
- The Configuration object used to configure this Parser.
-