Class WebGraph.OutlinkDb
- java.lang.Object
  - org.apache.hadoop.conf.Configured
    - org.apache.nutch.scoring.webgraph.WebGraph.OutlinkDb
- All Implemented Interfaces: Configurable
- Enclosing class: WebGraph
public static class WebGraph.OutlinkDb extends Configured
OutlinkDb creates a database of all outlinks. Outlinks to internal URLs, where "internal" is judged by domain or by host, can be ignored. The number of outlinks pointing to a given page or domain can also be limited.
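The two filtering ideas described above can be illustrated with a minimal, self-contained sketch. This is not Nutch code: the class `OutlinkFilterSketch`, the methods `hostOf` and `filterOutlinks`, and the parameters `ignoreSameHost` and `maxPerTargetHost` are hypothetical names chosen for illustration. It compares hosts only; domain-level comparison is left out for brevity.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Nutch.
class OutlinkFilterSketch {

    /** Returns the lowercase host of a URL, or "" if it cannot be determined. */
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    /**
     * Keeps outlinks in their original order, optionally dropping links whose
     * host matches the source page's host, and keeping at most
     * maxPerTargetHost links per target host.
     */
    static List<String> filterOutlinks(String sourceUrl, List<String> outlinks,
                                       boolean ignoreSameHost, int maxPerTargetHost) {
        String sourceHost = hostOf(sourceUrl);
        Map<String, Integer> keptPerHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String target : outlinks) {
            String targetHost = hostOf(target);
            if (ignoreSameHost && targetHost.equals(sourceHost)) {
                continue; // internal link by host
            }
            int seen = keptPerHost.getOrDefault(targetHost, 0);
            if (seen >= maxPerTargetHost) {
                continue; // limit for this target host already reached
            }
            keptPerHost.put(targetHost, seen + 1);
            kept.add(target);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> links = List.of(
            "http://example.com/about",
            "http://other.org/a",
            "http://other.org/b",
            "http://third.net/c");
        // prints [http://other.org/a, http://third.net/c]
        System.out.println(filterOutlinks("http://example.com/", links, true, 1));
    }
}
```

In the sketch, the internal link to `example.com` is dropped and only the first link to `other.org` survives the per-host limit. How the actual OutlinkDb job applies its limits is governed by its configuration and is not reproduced here.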
-
-
Nested Class Summary
- static class WebGraph.OutlinkDb.OutlinkDbMapper: Passes through existing LinkDatum objects from an existing OutlinkDb and maps out new LinkDatum objects from the ParseData of new crawls.
- static class WebGraph.OutlinkDb.OutlinkDbReducer
-
Field Summary
- static String URL_FILTERING
- static String URL_NORMALIZING
-
Constructor Summary
- OutlinkDb(): Default constructor.
- OutlinkDb(Configuration conf): Configurable constructor.
-
-
-
Field Detail
-
URL_NORMALIZING
public static final String URL_NORMALIZING
- See Also:
- Constant Field Values
-
URL_FILTERING
public static final String URL_FILTERING
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
OutlinkDb
public OutlinkDb()
Default constructor.
-
OutlinkDb
public OutlinkDb(Configuration conf)
Configurable constructor.
- Parameters:
conf - a populated Configuration
-
-