Class ScoreUpdater
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.scoring.webgraph.ScoreUpdater

All Implemented Interfaces:
Configurable, Tool

public class ScoreUpdater extends Configured implements Tool
Copies scores from the WebGraph node database into the crawl database. Any crawl database entry that has no score in the node database is set to the clear score.
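
The update rule described above can be illustrated with a small plain-Java model. This is only a sketch and not the Nutch implementation: the class `ScoreUpdateSketch`, the method `updateScores`, the use of in-memory maps in place of the two databases, and the example URLs are all hypothetical. The sketch also assumes that URLs present only in the node database are ignored, which the description above does not state.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of the scoring rule; hypothetical, not the Nutch code.
final class ScoreUpdateSketch {

    // Returns a new map with one entry per crawl database URL. The score is
    // taken from nodeDb when the URL is present there, otherwise clearScore.
    // Assumption: URLs that appear only in nodeDb are skipped.
    static Map<String, Float> updateScores(Map<String, Float> crawlDb,
                                           Map<String, Float> nodeDb,
                                           float clearScore) {
        Map<String, Float> updated = new LinkedHashMap<>();
        for (String url : crawlDb.keySet()) {
            updated.put(url, nodeDb.getOrDefault(url, clearScore));
        }
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Float> crawlDb = new LinkedHashMap<>();
        crawlDb.put("http://a.example/", 1.0f);
        crawlDb.put("http://b.example/", 1.0f);

        Map<String, Float> nodeDb = new HashMap<>();
        nodeDb.put("http://a.example/", 0.75f); // b.example has no node entry

        // a.example takes the node score; b.example falls back to the clear score.
        System.out.println(updateScores(crawlDb, nodeDb, 0.0f));
    }
}
```

In the real tool the two inputs are on-disk databases processed by a Hadoop job, so the maps here stand in only for the key-to-score relationship.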

Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | ScoreUpdater.ScoreUpdaterMapper | Changes input into ObjectWritables.
static class | ScoreUpdater.ScoreUpdaterReducer | Creates new CrawlDatum objects with the updated score from the NodeDb or with a cleared score.

Constructor Summary
Constructors
Constructor | Description
ScoreUpdater() |

Method Summary
Modifier and Type | Method | Description
static void | main(String[] args) |
int | run(String[] args) | Runs the ScoreUpdater tool.
void | update(Path crawlDb, Path webGraphDb) | Copies the inlink score from the web graph node database into the crawl database.

Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf

Method Detail

update
public void update(Path crawlDb, Path webGraphDb) throws IOException, ClassNotFoundException, InterruptedException
Copies the inlink score from the web graph node database into the crawl database.
Parameters:
crawlDb - The crawl database to update.
webGraphDb - The webgraph database to use.
Throws:
IOException - If an error occurs while updating the scores.
InterruptedException - If the Job is interrupted during execution.
ClassNotFoundException - If classes required to run the Job cannot be located.