Package org.apache.nutch.segment
Class ContentAsTextInputFormat
- java.lang.Object
  - org.apache.hadoop.mapreduce.InputFormat<K,V>
    - org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
      - org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<Text,Text>
        - org.apache.nutch.segment.ContentAsTextInputFormat
public class ContentAsTextInputFormat extends SequenceFileInputFormat<Text,Text>
An input format that reads Nutch Content objects and converts them to text, replacing newlines with spaces. This makes Nutch content usable from Hadoop Streaming jobs written in other languages.
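The conversion described above can be illustrated with a small standalone sketch. This is not the Nutch implementation: the class `ContentLineSketch` and its methods are hypothetical names, UTF-8 decoding is an assumption, and only the `\n` character is replaced. The tab-separated key/value layout reflects how line-oriented Hadoop Streaming consumers usually read records.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Nutch: illustrates the newline-to-space
// conversion using only the Java standard library.
class ContentLineSketch {

    // Decode raw content bytes (UTF-8 is an assumption here) and replace
    // each '\n' with a space so the whole record fits on one line.
    static String toSingleLine(byte[] content) {
        String text = new String(content, StandardCharsets.UTF_8);
        return text.replace('\n', ' ');
    }

    // Format a record as key, tab, single-line value, which is the shape
    // a line-oriented streaming consumer would typically parse.
    static String toRecordLine(String url, byte[] content) {
        return url + "\t" + toSingleLine(content);
    }

    public static void main(String[] args) {
        byte[] page = "<html>\n<body>Hello</body>\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(toRecordLine("http://example.com/", page));
    }
}
```

Because each record occupies exactly one line, a script in any language can read it from standard input and split on the first tab to recover the key and the text.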
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
FileInputFormat.Counter
Field Summary
Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
Constructor Summary
Constructors
ContentAsTextInputFormat()
Method Summary
Instance Methods
RecordReader<Text,Text> getRecordReader(InputSplit split, Job job, Mapper.Context context)
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
createRecordReader, getFormatMinSplitSize, listStatus
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, isSplitable, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
Method Detail
getRecordReader
public RecordReader<Text,Text> getRecordReader(InputSplit split, Job job, Mapper.Context context) throws IOException
Throws:
IOException