Package org.apache.nutch.indexer
Class CleaningJob
java.lang.Object
    org.apache.nutch.indexer.CleaningJob
All Implemented Interfaces: Configurable, Tool
public class CleaningJob extends Object implements Tool
The class scans CrawlDB looking for entries with status DB_GONE (404) or DB_DUPLICATE and sends delete requests to indexers for those documents.
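The selection rule described above can be illustrated with a small, self-contained sketch. It is not the Nutch implementation: the `CleaningSketch` class, the `Status` enum, the `Entry` type, and the `urlsToDelete` method are hypothetical stand-ins for CrawlDB records, and the sketch filters an in-memory list rather than running as a distributed job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these types are part of the Nutch API.
public class CleaningSketch {

    // Hypothetical stand-in for the CrawlDB statuses relevant to cleaning.
    enum Status { DB_FETCHED, DB_GONE, DB_DUPLICATE }

    // Hypothetical stand-in for one CrawlDB entry (URL plus status).
    static final class Entry {
        final String url;
        final Status status;

        Entry(String url, Status status) {
            this.url = url;
            this.status = status;
        }
    }

    // Returns the URLs whose status is DB_GONE or DB_DUPLICATE, in input order.
    static List<String> urlsToDelete(List<Entry> entries) {
        List<String> urls = new ArrayList<>();
        for (Entry e : entries) {
            if (e.status == Status.DB_GONE || e.status == Status.DB_DUPLICATE) {
                urls.add(e.url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<Entry> crawlDb = Arrays.asList(
            new Entry("http://example.com/ok", Status.DB_FETCHED),
            new Entry("http://example.com/missing", Status.DB_GONE),
            new Entry("http://example.com/copy", Status.DB_DUPLICATE));

        // A real job would send a delete request to the indexers at this point;
        // the sketch only prints the URLs that were selected.
        for (String url : urlsToDelete(crawlDb)) {
            System.out.println("delete " + url);
        }
    }
}
```

Entries with any other status, such as the hypothetical `DB_FETCHED` above, are left alone, so only gone or duplicate documents are selected for deletion.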
Nested Class Summary
Modifier and Type    Class
static class         CleaningJob.DBFilter
static class         CleaningJob.DeleterReducer
Constructor Summary
Constructor
CleaningJob()
Method Summary
Modifier and Type    Method
void                 delete(String crawldb, boolean noCommit)
Configuration        getConf()
static void          main(String[] args)
int                  run(String[] args)
void                 setConf(Configuration conf)
Method Detail
getConf
public Configuration getConf()
Specified by: getConf in interface Configurable
setConf
public void setConf(Configuration conf)
Specified by: setConf in interface Configurable
delete
public void delete(String crawldb, boolean noCommit) throws IOException, InterruptedException, ClassNotFoundException
run
public int run(String[] args) throws IOException
Specified by: run in interface Tool
Throws: IOException
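As a rough illustration of how a Tool-style `run` method might turn command-line arguments into a call shaped like `delete(String crawldb, boolean noCommit)`, here is a self-contained sketch. The `RunSketch` class, the `-noCommit` flag spelling, the usage string, and the exit codes are all assumptions made for illustration; this page does not document them, and the sketch only records its arguments instead of running a job.

```java
// Illustrative sketch only; not the Nutch implementation.
public class RunSketch {

    // Record what the stand-in delete call received, for demonstration.
    String lastCrawlDb;
    boolean lastNoCommit;

    // Stand-in with the same parameter shape as delete(String, boolean).
    void delete(String crawldb, boolean noCommit) {
        lastCrawlDb = crawldb;
        lastNoCommit = noCommit;
    }

    // Assumed convention: 0 on success, -1 on incorrect usage.
    int run(String[] args) {
        if (args.length < 1 || args.length > 2) {
            System.err.println("Usage: RunSketch <crawldb> [-noCommit]");
            return -1;
        }
        boolean noCommit = false;
        if (args.length == 2) {
            // The flag name is an assumption, not taken from this page.
            if (!"-noCommit".equals(args[1])) {
                System.err.println("Usage: RunSketch <crawldb> [-noCommit]");
                return -1;
            }
            noCommit = true;
        }
        delete(args[0], noCommit);
        return 0;
    }

    public static void main(String[] args) {
        // Fixed demonstration arguments; the path is a placeholder.
        int code = new RunSketch().run(new String[] {"crawl/crawldb", "-noCommit"});
        System.out.println("exit code: " + code);
    }
}
```

Returning an `int` from `run` rather than throwing on bad usage lets the caller pass the value straight to the process exit status, which is the usual reason a `Tool`-style method has this return type.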