Package org.apache.nutch.crawl
Class Injector.InjectReducer
- java.lang.Object
-
- org.apache.hadoop.mapreduce.Reducer<Text,CrawlDatum,Text,CrawlDatum>
-
- org.apache.nutch.crawl.Injector.InjectReducer
-
- Enclosing class:
- Injector
public static class Injector.InjectReducer extends Reducer<Text,CrawlDatum,Text,CrawlDatum>
Combines multiple new entries for a URL.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Reducer
Reducer.Context
-
-
Constructor Summary
Constructor   Description
InjectReducer()
-
Method Summary
Modifier and Type   Method   Description
void   reduce(Text key, Iterable<CrawlDatum> values, Reducer.Context context)   Merge the input records of one URL as per the rules given in Method Detail.
void   setup(Reducer.Context context)
-
-
-
Method Detail
-
setup
public void setup(Reducer.Context context)
- Overrides:
setup
in class Reducer<Text,CrawlDatum,Text,CrawlDatum>
-
reduce
public void reduce(Text key, Iterable<CrawlDatum> values, Reducer.Context context) throws IOException, InterruptedException
Merge the input records of one URL as per the rules below:
1. If there is ONLY a new injected record ==> emit the injected record
2. If there is ONLY an old record ==> emit the existing record
3. If BOTH new and old records are present:
   (a) If 'overwrite' is true ==> emit the injected record
   (b) If 'overwrite' is false:
       (i) If 'update' is false ==> emit the existing record
       (ii) If 'update' is true ==> update the existing record and emit it
For more details, see NUTCH-1405.
- Overrides:
reduce
in class Reducer<Text,CrawlDatum,Text,CrawlDatum>
- Throws:
IOException
InterruptedException
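The merge rules for reduce can be read as a small decision table. The sketch below is an illustration only, not the Nutch implementation: the class InjectMergeSketch, the enum Outcome, and the method decide are hypothetical names, and the records are reduced to two presence flags so that the example runs without Hadoop or Nutch on the classpath. What "update existing record" actually changes is not described on this page, so it is represented here only as a distinct outcome.

```java
// Hypothetical, dependency-free sketch of the documented merge rules.
// None of these names come from Nutch; they exist only for illustration.
class InjectMergeSketch {

    /** Which record would be emitted for one URL. */
    public enum Outcome { INJECTED, EXISTING, UPDATED_EXISTING }

    /**
     * Applies the documented rules to one URL.
     *
     * @param hasInjected true if a newly injected record is present
     * @param hasExisting true if an old (existing) record is present
     * @param overwrite   the 'overwrite' setting
     * @param update      the 'update' setting
     */
    public static Outcome decide(boolean hasInjected, boolean hasExisting,
                                 boolean overwrite, boolean update) {
        if (!hasInjected && !hasExisting) {
            // Assumption: a reducer call always carries at least one record.
            throw new IllegalArgumentException("no records for this URL");
        }
        if (!hasExisting) {
            return Outcome.INJECTED;          // rule 1: only the new record
        }
        if (!hasInjected) {
            return Outcome.EXISTING;          // rule 2: only the old record
        }
        if (overwrite) {
            return Outcome.INJECTED;          // rule 3(a): overwrite wins
        }
        return update
                ? Outcome.UPDATED_EXISTING    // rule 3(b)(ii)
                : Outcome.EXISTING;           // rule 3(b)(i)
    }

    public static void main(String[] args) {
        // Both records present, overwrite off, update on.
        System.out.println(decide(true, true, false, true));
    }
}
```

Note that under these rules 'overwrite' is checked before 'update', so when both settings are true the injected record is emitted and 'update' has no effect.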
-
-