Package org.apache.nutch.util
Class NutchTool
- java.lang.Object
  - org.apache.hadoop.conf.Configured
    - org.apache.nutch.util.NutchTool
- All Implemented Interfaces: Configurable
- Direct Known Subclasses: CommonCrawlDataDumper, CrawlDb, DeduplicationJob, Fetcher, Generator, IndexingJob, Injector, LinkDb, ParseSegment
public abstract class NutchTool extends Configured
Constructor Summary
- NutchTool()
- NutchTool(Configuration conf)
Method Summary
Modifier and Type, Method, Description:
- float getProgress(): Get relative progress of the tool.
- Map<String,Object> getStatus(): Returns current status of the running tool.
- boolean killJob(): Kill the job immediately.
- abstract Map<String,Object> run(Map<String,Object> args, String crawlId): Runs the tool, using a map of arguments.
- void setConf(Configuration conf)
- boolean stopJob(): Stop the job with the possibility to resume.
Methods inherited from class org.apache.hadoop.conf.Configured
- getConf
Constructor Detail
-
NutchTool
public NutchTool(Configuration conf)
-
NutchTool
public NutchTool()
-
Method Detail
-
run
public abstract Map<String,Object> run(Map<String,Object> args, String crawlId) throws Exception
Runs the tool, using a map of arguments. May return results, or null.
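The contract of run can be illustrated with a minimal sketch. It assumes nothing beyond the signature documented here: SketchTool is a hypothetical stand-in for NutchTool (so the example compiles without Nutch or Hadoop on the classpath), and EchoTool and the "seedDir" argument key are invented for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchTool: it declares run() with the documented
// signature but omits the Hadoop Configured base class.
abstract class SketchTool {
    public abstract Map<String, Object> run(Map<String, Object> args, String crawlId) throws Exception;
}

// Hypothetical tool that reads one argument and reports a small result map.
class EchoTool extends SketchTool {
    // An override may narrow the throws clause; this one throws nothing checked.
    @Override
    public Map<String, Object> run(Map<String, Object> args, String crawlId) {
        Object seedDir = args.get("seedDir"); // illustrative key, not a Nutch constant
        if (seedDir == null) {
            return null;                      // the contract allows a null result
        }
        Map<String, Object> results = new HashMap<>();
        results.put("crawlId", crawlId);
        results.put("seedDir", seedDir.toString());
        return results;
    }
}
```

Callers should therefore check the returned map for null before reading results from it.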
-
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
- Overrides: setConf in class Configured
-
getProgress
public float getProgress()
Get relative progress of the tool. Progress is represented as a float in range [0,1] where 1 is complete.
- Returns: a float in range [0,1].
-
getStatus
public Map<String,Object> getStatus()
Returns current status of the running tool.
- Returns: a populated Map, the fields of which can be accessed to obtain status.
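A caller typically reads getProgress and getStatus together while a tool is running. The sketch below shows one way to turn the two values into a report line. TrackedTool is a hypothetical stand-in that exposes only these two accessors, and the advance helper and the "phase" key are assumptions made for the example, not part of the Nutch API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in exposing the two documented accessors.
class TrackedTool {
    private final Map<String, Object> status = Collections.synchronizedMap(new HashMap<>());
    private volatile float progress = 0f;

    // Records that `done` of `total` work units are finished (illustrative helper).
    void advance(int done, int total) {
        progress = Math.min(1f, (float) done / total);              // stays within [0,1]
        status.put("phase", done < total ? "running" : "finished"); // illustrative key
    }

    public float getProgress() { return progress; }

    public Map<String, Object> getStatus() { return status; }
}

class ProgressReport {
    // Formats progress as a whole percentage followed by the status entries.
    static String describe(TrackedTool tool) {
        int percent = Math.round(tool.getProgress() * 100f);
        return percent + "% " + tool.getStatus();
    }
}
```

Because progress is defined on [0,1], multiplying by 100 gives a percentage directly; which keys the status map holds depends on the concrete tool.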
-
stopJob
public boolean stopJob() throws Exception
Stop the job with the possibility to resume. Subclasses should override this, since by default it calls killJob().
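The difference between the default and an overriding stopJob can be sketched as follows. StoppableBase is a hypothetical stand-in that mirrors only the delegation described above, and ResumableTool with its batch counter is invented for illustration. The documented methods declare throws Exception, which is omitted here for brevity.

```java
// Hypothetical stand-in mirroring the documented default: stopJob() delegates to killJob().
class StoppableBase {
    protected boolean killed = false;

    public boolean killJob() {
        killed = true;     // hard stop: no resume information is kept
        return true;
    }

    public boolean stopJob() {
        return killJob();  // default behaviour described in the text
    }
}

// Hypothetical subclass that stops gracefully and records where to resume.
class ResumableTool extends StoppableBase {
    private int nextBatch = 0;
    private Integer checkpoint = null;

    // Simulates finishing one unit of work.
    void processBatch() {
        nextBatch++;
    }

    @Override
    public boolean stopJob() {
        checkpoint = nextBatch; // remember the resume point instead of killing
        return true;
    }

    // Returns the saved resume point, or null if the tool was never stopped.
    Integer resumePoint() {
        return checkpoint;
    }
}
```

A tool that does not override stopJob offers no resume point, so in this sketch stopping it is indistinguishable from killing it.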