Package org.apache.nutch.parse
Class ParseData
- java.lang.Object
  - org.apache.hadoop.io.VersionedWritable
    - org.apache.nutch.parse.ParseData
- All Implemented Interfaces:
Writable
public final class ParseData extends VersionedWritable
Data extracted from a page's content.
- See Also:
Parse.getData()
-
-
Method Summary
Modifier and Type, Method, Description:
- boolean equals(Object o)
- Metadata getContentMeta(): The original Metadata retrieved from content.
- String getMeta(String name): Get a metadata single value.
- Outlink[] getOutlinks(): Get the outlinks of the page.
- Metadata getParseMeta(): Other content properties.
- ParseStatus getStatus(): Get the status of parsing the page.
- String getTitle(): Get the title of the page.
- byte getVersion()
- static void main(String[] argv)
- static ParseData read(DataInput in)
- void readFields(DataInput in)
- void setOutlinks(Outlink[] outlinks)
- void setParseMeta(Metadata parseMeta)
- String toString()
- void write(DataOutput out)
-
-
-
Field Detail
-
DIR_NAME
public static final String DIR_NAME
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
ParseData
public ParseData()
-
ParseData
public ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta)
-
ParseData
public ParseData(ParseStatus status, String title, Outlink[] outlinks, Metadata contentMeta, Metadata parseMeta)
-
-
Method Detail
-
getStatus
public ParseStatus getStatus()
Get the status of parsing the page.
- Returns:
- the ParseStatus
-
getOutlinks
public Outlink[] getOutlinks()
Get the outlinks of the page.
- Returns:
- an array of Outlink objects
-
getContentMeta
public Metadata getContentMeta()
The original Metadata retrieved from content.
- Returns:
- the original content Metadata
-
getParseMeta
public Metadata getParseMeta()
Other content properties. This is the place to find format-specific properties. Different parser implementations for different content types will populate this differently.
- Returns:
- a Metadata
-
setParseMeta
public void setParseMeta(Metadata parseMeta)
-
setOutlinks
public void setOutlinks(Outlink[] outlinks)
-
getMeta
public String getMeta(String name)
Get a single metadata value. This method first looks for the value in the parse metadata. If no value is found, it then looks in the content metadata.
- Parameters:
name - the metadata key for which to retrieve a value
- Returns:
- the (string) metadata value
- See Also:
getContentMeta(), getParseMeta()
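The lookup order described for getMeta can be sketched as follows. This is a minimal illustration, not the Nutch source: MetaLookupSketch is a hypothetical stand-in class, plain java.util.Map instances take the place of Nutch's Metadata, and returning null when neither map holds the key is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the getMeta(String) lookup order.
// Plain maps replace Nutch's Metadata so the sketch runs without Nutch.
class MetaLookupSketch {
    private final Map<String, String> contentMeta;
    private final Map<String, String> parseMeta;

    MetaLookupSketch(Map<String, String> contentMeta, Map<String, String> parseMeta) {
        this.contentMeta = contentMeta;
        this.parseMeta = parseMeta;
    }

    // Parse metadata is consulted first; content metadata is the fallback.
    // Assumption: null is returned when neither map has the key.
    String getMeta(String name) {
        String value = parseMeta.get(name);
        if (value == null) {
            value = contentMeta.get(name);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> content = new HashMap<>();
        content.put("Content-Type", "text/html");
        content.put("Server", "example-server");

        Map<String, String> parse = new HashMap<>();
        parse.put("Content-Type", "application/xhtml+xml");

        MetaLookupSketch data = new MetaLookupSketch(content, parse);
        System.out.println(data.getMeta("Content-Type")); // value from parse metadata
        System.out.println(data.getMeta("Server"));       // falls back to content metadata
        System.out.println(data.getMeta("Missing"));      // neither map has the key
    }
}
```

When the same key exists in both places, the parse metadata value wins, so a caller who specifically needs the original value should read it from getContentMeta() directly.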
-
getVersion
public byte getVersion()
- Specified by:
getVersion in class VersionedWritable
-
readFields
public final void readFields(DataInput in) throws IOException
- Specified by:
readFields in interface Writable
- Overrides:
readFields in class VersionedWritable
- Throws:
IOException
-
write
public final void write(DataOutput out) throws IOException
- Specified by:
write in interface Writable
- Overrides:
write in class VersionedWritable
- Throws:
IOException
-
read
public static ParseData read(DataInput in) throws IOException
- Throws:
IOException
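The write, readFields and static read signatures follow the usual Writable shape: an instance serializes itself to a DataOutput, and a static factory builds a fresh instance and fills it from a DataInput. The round trip might be sketched as below. VersionedRecordSketch is a hypothetical stand-in built on java.io only; its leading version byte, its single title field, and its use of a plain IOException for a version mismatch are assumptions for illustration and do not describe ParseData's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in showing a version-checked write/readFields/read trio.
class VersionedRecordSketch {
    static final byte VERSION = 1; // assumed version number for this sketch
    String title = "";

    // Write the version byte first, then the fields.
    void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);
        out.writeUTF(title);
    }

    // Read the version byte back and reject data written by another version.
    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version != VERSION) {
            throw new IOException("version mismatch: expected " + VERSION + ", got " + version);
        }
        title = in.readUTF();
    }

    // Static factory: create an empty instance and populate it from the stream.
    static VersionedRecordSketch read(DataInput in) throws IOException {
        VersionedRecordSketch record = new VersionedRecordSketch();
        record.readFields(in);
        return record;
    }

    // Helper for the demo: serialize a record to a byte array.
    static byte[] toBytes(VersionedRecordSketch record) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        record.write(new DataOutputStream(buffer));
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        VersionedRecordSketch original = new VersionedRecordSketch();
        original.title = "Example page";

        byte[] bytes = toBytes(original);
        VersionedRecordSketch copy = read(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println(copy.title);
    }
}
```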
-
-