Package org.apache.nutch.parse
Class ParseUtil
java.lang.Object
    org.apache.nutch.parse.ParseUtil
public class ParseUtil extends Object
-
-
Constructor Summary
Constructor                      Description
ParseUtil(Configuration conf)    Overloaded constructor
-
Method Summary
Modifier and Type    Method
ParseResult          parse(Content content)
ParseResult          parseByExtensionId(String extId, Content content)
-
-
-
Constructor Detail
-
ParseUtil
public ParseUtil(Configuration conf)
Overloaded constructor.
Parameters:
conf - a populated Configuration
-
-
Method Detail
-
parse
public ParseResult parse(Content content) throws ParseException
Performs a parse by iterating through a List of preferred Parsers until a successful parse is performed and a Parse object is returned. If the parse is unsuccessful, a message is logged at the WARNING level, and an empty parse is returned.
Parameters:
content - The content to try and parse.
Returns:
<key, Parse> pairs.
Throws:
ParseException - If no suitable parser is found to perform the parse.
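The fallback behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: each parser is modelled as a plain function that returns an empty Optional on failure, the parse result is a String, and an unchecked IllegalStateException stands in for ParseException. The names ParserFallbackSketch, parseWithFallback and EMPTY_PARSE are hypothetical and exist only for this illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

// Illustration only: models "try each preferred parser in order" with stand-in types.
class ParserFallbackSketch {
    private static final Logger LOG = Logger.getLogger("ParserFallbackSketch");

    // Hypothetical stand-in for an empty parse result.
    static final String EMPTY_PARSE = "";

    // Each parser is a function returning Optional.empty() when it cannot parse the content.
    static String parseWithFallback(List<Function<String, Optional<String>>> parsers,
                                    String content) {
        if (parsers.isEmpty()) {
            // Assumption: unchecked stand-in for ParseException, to keep the sketch short.
            throw new IllegalStateException("no suitable parser found");
        }
        for (Function<String, Optional<String>> parser : parsers) {
            Optional<String> result = parser.apply(content);
            if (result.isPresent()) {
                return result.get(); // first successful parse wins
            }
        }
        // Every parser was tried and none succeeded: warn and return an empty parse.
        LOG.warning("Unable to successfully parse content");
        return EMPTY_PARSE;
    }

    public static void main(String[] args) {
        Function<String, Optional<String>> failing = c -> Optional.empty();
        Function<String, Optional<String>> upper = c -> Optional.of(c.toUpperCase());
        System.out.println(parseWithFallback(List.of(failing, upper), "hello"));
    }
}
```

In this sketch the order of the list decides which parser gets the first attempt, which mirrors the "preferred" ordering mentioned in the description.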
-
parseByExtensionId
public ParseResult parseByExtensionId(String extId, Content content) throws ParseException
Parses a Content object using the Parser specified by the parameter extId, i.e., the Parser's extension ID. If a suitable Parser is not found, then a WARNING level message is logged, and a ParseException is thrown. If the parse is unsuccessful for any other reason, then a WARNING level message is logged, and a ParseStatus.getEmptyParse() is returned.
Parameters:
extId - The extension implementation ID of the Parser to use to parse the specified content.
content - The content to parse.
Returns:
<key, Parse> pairs if the parse is successful; otherwise, a single <key, ParseStatus.getEmptyParse()> pair.
Throws:
ParseException - If there is no suitable Parser found to perform the parse.
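The two failure modes described above (no parser registered for the ID versus a parser that fails) can be sketched as follows. This is a self-contained illustration with stand-in types, not the Nutch implementation: the registry is a plain Map, the parse result is a String, and NoParserException is a hypothetical unchecked stand-in for ParseException. The names ParseByIdSketch and parseById are likewise assumptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

// Illustration only: look up one parser by ID and distinguish "not found" from "failed".
class ParseByIdSketch {
    private static final Logger LOG = Logger.getLogger("ParseByIdSketch");

    // Hypothetical stand-in for an empty parse result.
    static final String EMPTY_PARSE = "";

    // Hypothetical unchecked stand-in for ParseException.
    static class NoParserException extends RuntimeException {
        NoParserException(String message) {
            super(message);
        }
    }

    static String parseById(Map<String, Function<String, Optional<String>>> registry,
                            String extId, String content) {
        Function<String, Optional<String>> parser = registry.get(extId);
        if (parser == null) {
            // No parser is registered under this ID: warn, then throw.
            LOG.warning("No suitable parser found for extension ID " + extId);
            throw new NoParserException("No suitable parser found for " + extId);
        }
        Optional<String> result = parser.apply(content);
        if (!result.isPresent()) {
            // A parser exists but the parse failed: warn, then return an empty parse.
            LOG.warning("Unable to successfully parse content with " + extId);
            return EMPTY_PARSE;
        }
        return result.get();
    }
}
```

A caller can therefore treat the exception as a configuration problem (unknown ID) and the empty result as a content problem.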
-
-