Package org.apache.nutch.tools.arc
Class ArcSegmentCreator.ArcSegmentCreatorMapper
- java.lang.Object
  - org.apache.hadoop.mapreduce.Mapper<Text,BytesWritable,Text,NutchWritable>
    - org.apache.nutch.tools.arc.ArcSegmentCreator.ArcSegmentCreatorMapper
- Enclosing class:
  ArcSegmentCreator
public static class ArcSegmentCreator.ArcSegmentCreatorMapper extends Mapper<Text,BytesWritable,Text,NutchWritable>
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
- Mapper.Context
Field Summary
- static String URL_VERSION
Constructor Summary
- ArcSegmentCreatorMapper()
Method Summary
- void map(Text key, BytesWritable bytes, Mapper.Context context)
  Translates a single ARC record into output for Nutch segments.
- void setup(Mapper.Context context)
  Configures the job mapper.
Field Detail
URL_VERSION
public static final String URL_VERSION
- See Also:
  Constant Field Values
Method Detail
setup
public void setup(Mapper.Context context)
Configures the job mapper. Sets up the URL filters, scoring filters, URL normalizers, and other relevant data.
- Overrides:
  setup in class Mapper<Text,BytesWritable,Text,NutchWritable>
- Parameters:
  context - The task context.
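Hadoop calls setup once per task before any call to map, so it is the natural place to build objects that every record reuses, such as filter and normalizer chains. The stdlib-only Java sketch that follows models that split under stated assumptions: the class UrlPipelineSketch and its two normalizers and two filters are hypothetical toy stand-ins and do not reproduce Nutch's URLFilters, URLNormalizers, or ScoringFilters APIs.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/**
 * Hypothetical, stdlib-only model of the setup()/map() split.
 * The normalizers and filters are toy examples, not Nutch plugins.
 */
class UrlPipelineSketch {
  private List<UnaryOperator<String>> normalizers;
  private List<Predicate<String>> filters;

  /** Analogue of setup(): build the chains once per task. */
  void setup() {
    normalizers = List.of(
        String::trim,                 // drop surrounding whitespace
        url -> {                      // drop a "#fragment" suffix
          int hash = url.indexOf('#');
          return hash >= 0 ? url.substring(0, hash) : url;
        });
    filters = List.of(
        url -> url.startsWith("http://") || url.startsWith("https://"), // keep web URLs only
        url -> !url.endsWith(".exe"));                                   // reject executables
  }

  /**
   * Analogue of the per-record URL handling in map(): normalize, then filter.
   * Returns empty if any filter rejects the URL. Must be called after setup().
   */
  Optional<String> process(String url) {
    String current = url;
    for (UnaryOperator<String> normalizer : normalizers) {
      current = normalizer.apply(current);
    }
    for (Predicate<String> filter : filters) {
      if (!filter.test(current)) {
        return Optional.empty();
      }
    }
    return Optional.of(current);
  }
}
```

Building the chains in setup rather than inside each map call would avoid repeating that configuration work for every ARC record in the split.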
map
public void map(Text key, BytesWritable bytes, Mapper.Context context) throws IOException, InterruptedException
Translates a single ARC record into output for Nutch segments.
- Overrides:
  map in class Mapper<Text,BytesWritable,Text,NutchWritable>
- Parameters:
  key - The ARC record header.
  bytes - The raw content bytes of the ARC record.
  context - The context of the MapReduce job.
- Throws:
  IOException
  InterruptedException
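The key passed to map is the ARC record header. Assuming it follows the ARC version 1 URL-record layout (five space-separated fields: URL, IP address, archive date as YYYYMMDDhhmmss, content type, and archive length), a header could be split as in this sketch. ArcHeaderSketch is a hypothetical helper, not part of Nutch, and the real mapper may read only some of these fields.

```java
/**
 * Hypothetical parser for an ARC v1 URL-record header such as
 * "http://example.com/ 192.0.2.1 20080101120000 text/html 1234".
 * Assumes exactly five whitespace-separated fields; this is not Nutch code.
 */
final class ArcHeaderSketch {
  final String url;
  final String ipAddress;
  final String archiveDate;  // YYYYMMDDhhmmss
  final String contentType;
  final long archiveLength;  // declared length of the record content in bytes

  private ArcHeaderSketch(String url, String ipAddress, String archiveDate,
                          String contentType, long archiveLength) {
    this.url = url;
    this.ipAddress = ipAddress;
    this.archiveDate = archiveDate;
    this.contentType = contentType;
    this.archiveLength = archiveLength;
  }

  /** Splits the header on runs of whitespace and validates the field count. */
  static ArcHeaderSketch parse(String header) {
    String[] fields = header.trim().split("\\s+");
    if (fields.length != 5) {
      throw new IllegalArgumentException(
          "expected 5 header fields, got " + fields.length + ": " + header);
    }
    // Long.parseLong throws NumberFormatException (an IllegalArgumentException) on bad input.
    return new ArcHeaderSketch(fields[0], fields[1], fields[2], fields[3],
        Long.parseLong(fields[4]));
  }
}
```

In a real mapper the header string would come from key.toString(). Note also that for a Hadoop BytesWritable only the first bytes.getLength() bytes of bytes.getBytes() are valid content, since the backing array can be larger than the record.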