Package org.apache.nutch.indexer
Class IndexerMapReduce
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.indexer.IndexerMapReduce

All Implemented Interfaces: Configurable

public class IndexerMapReduce extends Configured
This class is typically invoked from within IndexingJob and handles all MapReduce functionality required for indexing. The indexing itself is carried out by one or more invoked indexing plugins that extend IndexWriter. See initMRJob(Path, Path, Collection, Job, boolean) for details on the specific data structures and parameters required for indexing.

Nested Class Summary

Modifier and Type    Class
static class         IndexerMapReduce.IndexerMapper
static class         IndexerMapReduce.IndexerReducer

Field Summary

Modifier and Type    Field
static String        INDEXER_BINARY_AS_BASE64
static String        INDEXER_DELETE
static String        INDEXER_DELETE_ROBOTS_NOINDEX
static String        INDEXER_DELETE_SKIPPED
static String        INDEXER_NO_COMMIT
static String        INDEXER_PARAMS
static String        INDEXER_SKIP_NOTMODIFIED
static String        URL_FILTERING
static String        URL_NORMALIZING

Constructor Summary

Constructor
IndexerMapReduce()

Method Summary

Modifier and Type    Method
static void          initMRJob(Path crawlDb, Path linkDb, Collection<Path> segments, Job job, boolean addBinaryContent)

Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf

Field Detail

INDEXER_PARAMS
public static final String INDEXER_PARAMS
See Also: Constant Field Values

INDEXER_DELETE
public static final String INDEXER_DELETE
See Also: Constant Field Values

INDEXER_NO_COMMIT
public static final String INDEXER_NO_COMMIT
See Also: Constant Field Values

INDEXER_DELETE_ROBOTS_NOINDEX
public static final String INDEXER_DELETE_ROBOTS_NOINDEX
See Also: Constant Field Values

INDEXER_DELETE_SKIPPED
public static final String INDEXER_DELETE_SKIPPED
See Also: Constant Field Values

INDEXER_SKIP_NOTMODIFIED
public static final String INDEXER_SKIP_NOTMODIFIED
See Also: Constant Field Values

URL_FILTERING
public static final String URL_FILTERING
See Also: Constant Field Values

URL_NORMALIZING
public static final String URL_NORMALIZING
See Also: Constant Field Values

INDEXER_BINARY_AS_BASE64
public static final String INDEXER_BINARY_AS_BASE64
See Also: Constant Field Values

Method Detail

initMRJob
public static void initMRJob(Path crawlDb, Path linkDb, Collection<Path> segments, Job job, boolean addBinaryContent) throws IOException
Throws: IOException
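
The following is a minimal sketch of how a caller might prepare a Hadoop Job with initMRJob. It is not the code used by IndexingJob. The class name InitIndexJobSketch, the method configure, the job name, and the "crawldb" and "linkdb" directory names under crawlDir are all assumptions made for illustration. The sketch also assumes that the String constants on this class are Configuration property names read as booleans, and it needs the Nutch and Hadoop libraries on the classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.nutch.indexer.IndexerMapReduce;
import org.apache.nutch.util.NutchConfiguration;

// Hypothetical helper class, for illustration only.
public class InitIndexJobSketch {

  // Builds a Job and hands it to initMRJob; the caller decides how to run it.
  public static Job configure(String crawlDir, List<String> segmentDirs)
      throws IOException {
    Configuration conf = NutchConfiguration.create();

    // Assumption: INDEXER_DELETE names a boolean configuration property.
    // Set options before creating the Job, which copies the configuration.
    conf.setBoolean(IndexerMapReduce.INDEXER_DELETE, true);

    Job job = Job.getInstance(conf, "indexer sketch"); // hypothetical job name

    // Convert the segment directory names into the Collection<Path> parameter.
    Collection<Path> segments = new ArrayList<>();
    for (String dir : segmentDirs) {
      segments.add(new Path(dir));
    }

    // Assumed layout: crawlDir/crawldb and crawlDir/linkdb.
    Path crawlDb = new Path(crawlDir, "crawldb");
    Path linkDb = new Path(crawlDir, "linkdb");

    // false: do not add binary content to the indexed documents.
    IndexerMapReduce.initMRJob(crawlDb, linkDb, segments, job, false);
    return job;
  }
}
```

The sketch stops once the job is configured. Any further setup a real run may need, such as an output path, and the job submission itself are left to the caller, as IndexingJob normally handles them.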