Package org.apache.nutch.crawl
Class DeduplicationJob.DedupReducer<K extends Writable>
- java.lang.Object
  - org.apache.hadoop.mapreduce.Reducer<K,CrawlDatum,Text,CrawlDatum>
    - org.apache.nutch.crawl.DeduplicationJob.DedupReducer<K>
- Enclosing class: DeduplicationJob
public static class DeduplicationJob.DedupReducer<K extends Writable> extends Reducer<K,CrawlDatum,Text,CrawlDatum>
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Reducer
Reducer.Context
-
-
Field Summary
Fields
Modifier and Type      Field           Description
protected String[]     compareOrder
-
Constructor Summary
Constructors
Constructor        Description
DedupReducer()
-
Method Summary
Modifier and Type       Method                                                                 Description
protected CrawlDatum    getDuplicate(CrawlDatum existingDoc, CrawlDatum newDoc)
void                    reduce(K key, Iterable<CrawlDatum> values, Reducer.Context context)
void                    setup(Reducer.Context context)
protected void          writeOutAsDuplicate(CrawlDatum datum, Reducer.Context context)
-
-
-
Field Detail
-
compareOrder
protected String[] compareOrder
-
-
Method Detail
-
setup
public void setup(Reducer.Context context)
- Overrides:
setup in class Reducer<K extends Writable,CrawlDatum,Text,CrawlDatum>
-
writeOutAsDuplicate
protected void writeOutAsDuplicate(CrawlDatum datum, Reducer.Context context) throws IOException, InterruptedException
- Throws:
IOException
InterruptedException
-
reduce
public void reduce(K key, Iterable<CrawlDatum> values, Reducer.Context context) throws IOException, InterruptedException
- Overrides:
reduce in class Reducer<K extends Writable,CrawlDatum,Text,CrawlDatum>
- Throws:
IOException
InterruptedException
-
getDuplicate
protected CrawlDatum getDuplicate(CrawlDatum existingDoc, CrawlDatum newDoc)
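This page lists getDuplicate(existingDoc, newDoc) and the compareOrder field without describing their behavior. The following is a minimal, self-contained sketch of one way a pairwise duplicate selection driven by an ordered list of criteria could work. It does not use Nutch or Hadoop classes. The Doc class, the criterion names "score", "fetchTime" and "urlLength", and the tie-breaking rules are all assumptions made for illustration, not the documented behavior of DedupReducer.

```java
public class DuplicateSelectionSketch {

    /** Hypothetical stand-in for a crawl record; this is NOT the Nutch CrawlDatum class. */
    static final class Doc {
        final String url;
        final float score;
        final long fetchTime;

        Doc(String url, float score, long fetchTime) {
            this.url = url;
            this.score = score;
            this.fetchTime = fetchTime;
        }
    }

    /**
     * Returns whichever of the two documents should be marked as the duplicate,
     * walking the criteria in order until one of them breaks the tie.
     * Assumed rules: lower score loses, older fetch time loses, longer URL loses.
     */
    static Doc getDuplicate(Doc existingDoc, Doc newDoc, String[] compareOrder) {
        for (String criterion : compareOrder) {
            int cmp;
            switch (criterion) {
                case "score":
                    cmp = Float.compare(existingDoc.score, newDoc.score);
                    break;
                case "fetchTime":
                    cmp = Long.compare(existingDoc.fetchTime, newDoc.fetchTime);
                    break;
                case "urlLength":
                    // A shorter URL is preferred here, so the operands are swapped.
                    cmp = Integer.compare(newDoc.url.length(), existingDoc.url.length());
                    break;
                default:
                    throw new IllegalArgumentException("Unknown criterion: " + criterion);
            }
            if (cmp < 0) {
                return existingDoc; // existing document ranks lower: it is the duplicate
            }
            if (cmp > 0) {
                return newDoc; // new document ranks lower: it is the duplicate
            }
        }
        // Every criterion tied: arbitrarily treat the later arrival as the duplicate.
        return newDoc;
    }

    public static void main(String[] args) {
        String[] order = {"score", "fetchTime", "urlLength"};
        Doc a = new Doc("http://example.com/a", 1.0f, 100L);
        Doc b = new Doc("http://example.com/a?session=1", 1.0f, 100L);
        // Score and fetch time tie, so the longer URL is reported as the duplicate.
        System.out.println(getDuplicate(a, b, order).url); // prints http://example.com/a?session=1
    }
}
```

In this sketch the order of the criteria decides which rule wins when they disagree: placing "urlLength" first would make URL length outrank score. A reducer built along these lines would presumably keep the surviving document and pass the returned one to a method such as writeOutAsDuplicate, though this page does not confirm that flow.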