Class LinkDumper
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.scoring.webgraph.LinkDumper

All Implemented Interfaces: Configurable, Tool
public class LinkDumper extends Configured implements Tool
The LinkDumper tool creates a database of node-to-inlink information that can be read using the nested Reader class. This lets you quickly review the inlink and scoring state of a single url to determine why that url ranks the way it does. The tool is intended for use with the LinkRank analysis.

Nested Class Summary
static class LinkDumper.Inverter - Inverts outlinks from the WebGraph to inlinks and attaches node information.
static class LinkDumper.LinkNode - Bean class which holds url to node information.
static class LinkDumper.LinkNodes - Writable class which holds an array of LinkNode objects.
static class LinkDumper.Merger - Merges LinkNode objects into a single array value per url.
static class LinkDumper.Reader - Reader class which will print out the url and all of its inlinks to system out.
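
The inversion and merge steps summarized for Inverter and Merger can be pictured with plain Java collections. The following is a minimal sketch under stated assumptions, not the Nutch implementation: it uses no Hadoop or Nutch classes, the class InlinkSketch and its methods invert and format are hypothetical names introduced for illustration, and the node information (such as scores) that the real tool attaches is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: models outlink-to-inlink inversion and per-url merging
// with in-memory collections. All names here are hypothetical.
public class InlinkSketch {

  // For each (source -> targets) entry, record the source as an inlink of
  // every target, so that all inlinks of a url end up in one list.
  public static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
    Map<String, List<String>> inlinks = new TreeMap<>();
    for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
      for (String target : entry.getValue()) {
        inlinks.computeIfAbsent(target, k -> new ArrayList<>()).add(entry.getKey());
      }
    }
    return inlinks;
  }

  // Reader-style output: the url on one line, then one indented line per inlink.
  public static String format(String url, Map<String, List<String>> inlinks) {
    StringBuilder sb = new StringBuilder(url).append(":\n");
    for (String inlink : inlinks.getOrDefault(url, List.of())) {
      sb.append("  ").append(inlink).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // A TreeMap keeps the iteration order, and so the inlink order, deterministic.
    Map<String, List<String>> outlinks = new TreeMap<>();
    outlinks.put("http://a.example/", List.of("http://b.example/", "http://c.example/"));
    outlinks.put("http://b.example/", List.of("http://c.example/"));

    Map<String, List<String>> inlinks = invert(outlinks);
    System.out.print(format("http://c.example/", inlinks));
  }
}
```

Running main prints http://c.example/ followed by its two inlinks, http://a.example/ and http://b.example/. In the actual tool the same idea runs as Hadoop jobs over the WebGraph output, and each inlink is carried as a LinkNode rather than a bare string.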

Constructor Summary
LinkDumper()

Method Summary
void dumpLinks(Path webGraphDb) - Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database.
static void main(String[] args)
int run(String[] args) - Runs the LinkDumper tool.

Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

Field Detail

DUMP_DIR
public static final String DUMP_DIR
See Also: Constant Field Values

Method Detail

dumpLinks
public void dumpLinks(Path webGraphDb) throws IOException, InterruptedException, ClassNotFoundException
Runs the inverter and merger jobs of the LinkDumper tool to create the url to inlink node database.
Parameters:
webGraphDb - the Path to the output of WebGraph.createWebGraph(Path, Path[], boolean, boolean)
Throws:
IOException - if there is a fatal I/O issue at runtime
InterruptedException - if the Job is interrupted during execution
ClassNotFoundException - if classes required to run the Job cannot be located