Package org.apache.nutch.tools.arc
Class ArcInputFormat
- java.lang.Object
  - org.apache.hadoop.mapreduce.InputFormat<K,V>
    - org.apache.hadoop.mapreduce.lib.input.FileInputFormat<Text,BytesWritable>
      - org.apache.nutch.tools.arc.ArcInputFormat
-
public class ArcInputFormat extends FileInputFormat<Text,BytesWritable>
An input format that reads arc files.
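To make the Text key and BytesWritable value concrete, here is a minimal, standard-library-only sketch of turning an arc-like byte stream into records. It rests on assumptions that this page does not state: that an arc file is a run of concatenated gzip members, one per record, and that the first line of each decompressed member is a header that serves as the key while the remaining bytes are the value. ArcSketch, Record, gzipMember and readRecords are hypothetical names, the header lines in main are made up, and none of this is the ArcRecordReader implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

public class ArcSketch {

    /** Hypothetical stand-in for one (Text, BytesWritable) pair. */
    public static final class Record {
        public final String header;   // plays the role of the Text key
        public final byte[] content;  // plays the role of the BytesWritable value

        Record(String header, byte[] content) {
            this.header = header;
            this.content = content;
        }
    }

    /** Compresses one header line plus body into a single gzip member. */
    public static byte[] gzipMember(String header, String body) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write((header + "\n" + body).getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    /** Splits concatenated gzip members into records, one record per member. */
    public static List<Record> readRecords(byte[] data) throws IOException {
        List<Record> records = new ArrayList<>();
        int off = 0;
        while (off < data.length) {
            // Assumption: only the plain 10-byte gzip header (magic 1f 8b, deflate, no flags).
            if (data.length - off < 18 || (data[off] & 0xff) != 0x1f
                    || (data[off + 1] & 0xff) != 0x8b
                    || data[off + 2] != 8 || data[off + 3] != 0) {
                throw new IOException("unsupported or truncated gzip member at offset " + off);
            }
            Inflater inflater = new Inflater(true); // raw deflate; header is skipped by hand
            ByteArrayOutputStream raw = new ByteArrayOutputStream();
            int compressed;
            try {
                inflater.setInput(data, off + 10, data.length - off - 10);
                byte[] chunk = new byte[4096];
                while (!inflater.finished()) {
                    int n = inflater.inflate(chunk);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IOException("truncated gzip member at offset " + off);
                    }
                    raw.write(chunk, 0, n);
                }
                compressed = inflater.getTotalIn(); // bytes of deflate data consumed
            } catch (DataFormatException e) {
                throw new IOException("corrupt gzip member at offset " + off, e);
            } finally {
                inflater.end();
            }
            records.add(split(raw.toByteArray()));
            off += 10 + compressed + 8; // header + deflate data + CRC32/size trailer
        }
        return records;
    }

    /** Splits decompressed bytes at the first newline into header and content. */
    private static Record split(byte[] raw) {
        int nl = 0;
        while (nl < raw.length && raw[nl] != '\n') {
            nl++;
        }
        String header = new String(raw, 0, nl, StandardCharsets.UTF_8);
        byte[] content = nl < raw.length
                ? Arrays.copyOfRange(raw, nl + 1, raw.length)
                : new byte[0];
        return new Record(header, content);
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny in-memory "arc" from two made-up records, then read it back.
        ByteArrayOutputStream arc = new ByteArrayOutputStream();
        arc.write(gzipMember("http://example.com/a 192.0.2.1 20240101000000 text/plain 5", "hello"));
        arc.write(gzipMember("http://example.com/b 192.0.2.1 20240101000000 text/plain 3", "bye"));
        for (Record r : readRecords(arc.toByteArray())) {
            System.out.println(r.header + " -> " + r.content.length + " bytes");
        }
    }
}
```

The sketch decodes each member with a raw Inflater rather than GZIPInputStream because GZIPInputStream reads concatenated members as one continuous stream, which would hide the record boundaries. Members whose gzip header carries optional fields are rejected rather than parsed.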
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
FileInputFormat.Counter
-
Field Summary
-
Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
-
Constructor Summary
Constructor        Description
ArcInputFormat()
-
Method Summary
Modifier and Type / Method / Description
- RecordReader<Text,BytesWritable> createRecordReader(InputSplit split, TaskAttemptContext context)
- RecordReader<Text,BytesWritable> getRecordReader(InputSplit split, Job job, Mapper.Context context)
  Get the RecordReader for reading the arc file.
-
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
-
Method Detail
-
createRecordReader
public RecordReader<Text,BytesWritable> createRecordReader(InputSplit split, TaskAttemptContext context)
- Specified by:
createRecordReader in class InputFormat<Text,BytesWritable>
-
getRecordReader
public RecordReader<Text,BytesWritable> getRecordReader(InputSplit split, Job job, Mapper.Context context) throws IOException
Get the RecordReader for reading the arc file.
- Parameters:
split - The InputSplit of the arc file to process.
job - The job configuration.
context - The task context.
- Returns:
A configured ArcRecordReader
- Throws:
IOException - if there is a fatal I/O error reading the InputSplit