Package org.apache.nutch.indexer
Class IndexingJob
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.util.NutchTool
      org.apache.nutch.indexer.IndexingJob

All Implemented Interfaces:
Configurable, Tool
public class IndexingJob extends NutchTool implements Tool
Generic indexer that relies on plugins implementing IndexWriter.
-
-
Field Summary
-
Fields inherited from class org.apache.nutch.util.NutchTool
currentJob, currentJobNum, numJobs, results, status
-
-
Constructor Summary
Constructors
IndexingJob()
IndexingJob(Configuration conf)
-
Method Summary
Modifier and Type     Method
void                  index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit)
void                  index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone)
void                  index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone, String params)
void                  index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone, String params, boolean filter, boolean normalize)
void                  index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone, String params, boolean filter, boolean normalize, boolean addBinaryContent)
void                  index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone, String params, boolean filter, boolean normalize, boolean addBinaryContent, boolean base64)
static void           main(String[] args)
int                   run(String[] args)
Map<String,Object>    run(Map<String,Object> args, String crawlId)
                      Runs the tool, using a map of arguments.
-
Methods inherited from class org.apache.nutch.util.NutchTool
getProgress, getStatus, killJob, setConf, stopJob
-
Methods inherited from class org.apache.hadoop.conf.Configured
getConf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Constructor Detail
-
IndexingJob
public IndexingJob()
-
IndexingJob
public IndexingJob(Configuration conf)
-
-
Method Detail
-
index
public void index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit) throws IOException, InterruptedException, ClassNotFoundException
-
index
public void index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone) throws IOException, InterruptedException, ClassNotFoundException
-
index
public void index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone, String params) throws IOException, InterruptedException, ClassNotFoundException
-
index
public void index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone, String params, boolean filter, boolean normalize) throws IOException, InterruptedException, ClassNotFoundException
-
index
public void index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone, String params, boolean filter, boolean normalize, boolean addBinaryContent) throws IOException, InterruptedException, ClassNotFoundException
-
index
public void index(Path crawlDb, Path linkDb, List<Path> segments, boolean noCommit, boolean deleteGone, String params, boolean filter, boolean normalize, boolean addBinaryContent, boolean base64) throws IOException, InterruptedException, ClassNotFoundException
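The index overloads above differ only in how many trailing options the caller supplies. The sketch below models that shape with a stand-in class so it runs without Nutch or Hadoop on the classpath. It is not the Nutch source: `IndexOverloadSketch` is a hypothetical name, plain strings stand in for `org.apache.hadoop.fs.Path`, and the assumption that each shorter overload forwards to the next longer one with `false` or `null` for the omitted options should be checked against the actual implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Stand-in sketch, NOT the Nutch source. Plain strings replace
 * org.apache.hadoop.fs.Path, and the false/null defaults are an assumption.
 */
public class IndexOverloadSketch {

  // 4 arguments: the caller supplies only noCommit.
  public static Map<String, Object> index(String crawlDb, String linkDb,
      List<String> segments, boolean noCommit) {
    return index(crawlDb, linkDb, segments, noCommit, false);
  }

  // 5 arguments: adds deleteGone.
  public static Map<String, Object> index(String crawlDb, String linkDb,
      List<String> segments, boolean noCommit, boolean deleteGone) {
    return index(crawlDb, linkDb, segments, noCommit, deleteGone, null);
  }

  // 6 arguments: adds params.
  public static Map<String, Object> index(String crawlDb, String linkDb,
      List<String> segments, boolean noCommit, boolean deleteGone, String params) {
    return index(crawlDb, linkDb, segments, noCommit, deleteGone, params, false, false);
  }

  // 8 arguments: adds filter and normalize.
  public static Map<String, Object> index(String crawlDb, String linkDb,
      List<String> segments, boolean noCommit, boolean deleteGone, String params,
      boolean filter, boolean normalize) {
    return index(crawlDb, linkDb, segments, noCommit, deleteGone, params,
        filter, normalize, false);
  }

  // 9 arguments: adds addBinaryContent.
  public static Map<String, Object> index(String crawlDb, String linkDb,
      List<String> segments, boolean noCommit, boolean deleteGone, String params,
      boolean filter, boolean normalize, boolean addBinaryContent) {
    return index(crawlDb, linkDb, segments, noCommit, deleteGone, params,
        filter, normalize, addBinaryContent, false);
  }

  // 10 arguments: the full form. Here it only records the resolved options
  // instead of launching an indexing job.
  public static Map<String, Object> index(String crawlDb, String linkDb,
      List<String> segments, boolean noCommit, boolean deleteGone, String params,
      boolean filter, boolean normalize, boolean addBinaryContent, boolean base64) {
    Map<String, Object> resolved = new LinkedHashMap<>();
    resolved.put("crawlDb", crawlDb);
    resolved.put("linkDb", linkDb);
    resolved.put("segments", segments);
    resolved.put("noCommit", noCommit);
    resolved.put("deleteGone", deleteGone);
    resolved.put("params", params);
    resolved.put("filter", filter);
    resolved.put("normalize", normalize);
    resolved.put("addBinaryContent", addBinaryContent);
    resolved.put("base64", base64);
    return resolved;
  }

  public static void main(String[] args) {
    // Illustrative paths only; they are not taken from this page.
    List<String> segments = Arrays.asList("crawl/segments/example");
    System.out.println(index("crawl/crawldb", "crawl/linkdb", segments, false, true));
  }
}
```

With the real class, the equivalent call would be made on an `IndexingJob` instance with Hadoop `Path` arguments, and the caller would need to handle the declared `IOException`, `InterruptedException`, and `ClassNotFoundException`.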
-
run
public Map<String,Object> run(Map<String,Object> args, String crawlId) throws Exception
Description copied from class: NutchTool
Runs the tool, using a map of arguments. May return results, or null.
Specified by:
run in class NutchTool
Parameters:
args - a Map of arguments to be run with the tool
crawlId - a crawl identifier to associate with the tool invocation
Returns:
Map results object if tool executes successfully, otherwise null
Throws:
Exception - if there is an error during the tool execution
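The contract described for the map-based run method (a results map on success, otherwise null, with errors reported as exceptions) means a caller has two failure paths to handle. The following is a minimal sketch of that handling, using a hypothetical `MapTool` interface in place of `NutchTool` so that it compiles on its own; the `"crawldb"` argument key and the crawl identifier are illustrative assumptions, not names documented on this page.

```java
import java.util.HashMap;
import java.util.Map;

public class RunResultSketch {

  /** Hypothetical stand-in for the map-based entry point; not the Nutch API. */
  public interface MapTool {
    Map<String, Object> run(Map<String, Object> args, String crawlId) throws Exception;
  }

  /**
   * Returns true only when the tool hands back a non-null results map.
   * A null result and a thrown exception are both treated as failure.
   */
  public static boolean succeeded(MapTool tool, Map<String, Object> args, String crawlId) {
    try {
      return tool.run(args, crawlId) != null;
    } catch (Exception e) {
      return false;
    }
  }

  public static void main(String[] unused) {
    Map<String, Object> args = new HashMap<>();
    args.put("crawldb", "crawl/crawldb"); // hypothetical key, for illustration only

    MapTool returnsResults = (m, id) -> new HashMap<>(m); // pretend success
    MapTool returnsNull = (m, id) -> null;                // pretend failure

    System.out.println(succeeded(returnsResults, args, "example-crawl"));
    System.out.println(succeeded(returnsNull, args, "example-crawl"));
  }
}
```

Collapsing both failure paths into a boolean is a design choice for brevity; a real caller may prefer to log or rethrow the exception so the cause is not lost.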
-
-