Package org.apache.nutch.segment
Class SegmentReader
java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.nutch.segment.SegmentReader

All Implemented Interfaces:
Configurable, Tool
public class SegmentReader extends Configured implements Tool
Dump the content of a segment.
-
Nested Class Summary
Modifier and Type    Class                                 Description
static class         SegmentReader.InputCompatMapper
static class         SegmentReader.InputCompatReducer
static class         SegmentReader.SegmentReaderStats
static class         SegmentReader.TextOutputFormat        Implements a text output format
-
Constructor Summary
Constructor          Description
SegmentReader()
-
Method Summary
Modifier and Type    Method                                                                            Description
void                 dump(Path segment, Path output)
void                 get(Path segment, Text key, Writer writer, Map<String,List<Writable>> results)
static Charset       getCharset(Metadata parseMeta)                                                    Try to get HTML encoding from parse metadata.
void                 getStats(Path segment, SegmentReader.SegmentReaderStats stats)
void                 list(List<Path> dirs, Writer writer)
static void          main(String[] args)
int                  run(String[] args)
-
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Method Detail
-
dump
public void dump(Path segment, Path output) throws IOException, InterruptedException, ClassNotFoundException
-
get
public void get(Path segment, Text key, Writer writer, Map<String,List<Writable>> results) throws Exception
- Throws:
Exception
-
getCharset
public static Charset getCharset(Metadata parseMeta)
Try to get HTML encoding from parse metadata. Tries Nutch.CHAR_ENCODING_FOR_CONVERSION first, then HttpHeaders.CONTENT_ENCODING, and falls back to StandardCharsets.UTF_8.
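The lookup order described above can be sketched in a self-contained way. This is an illustration only, not the Nutch implementation: a plain Map stands in for the Metadata argument, the class and method names (CharsetFallbackSketch, resolveCharset) are hypothetical, and the two key strings are assumed stand-ins for the values of Nutch.CHAR_ENCODING_FOR_CONVERSION and HttpHeaders.CONTENT_ENCODING. The source does not say how an unrecognized charset name is handled; the sketch assumes it is skipped in favor of the next candidate.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CharsetFallbackSketch {
    // Assumed stand-ins for Nutch.CHAR_ENCODING_FOR_CONVERSION and
    // HttpHeaders.CONTENT_ENCODING; the real constant values may differ.
    static final String CONVERSION_KEY = "Char-Encoding-For-Conversion";
    static final String CONTENT_ENCODING_KEY = "Content-Encoding";

    // Hypothetical helper: checks the two metadata keys in order and
    // falls back to UTF-8 when neither yields a usable charset.
    static Charset resolveCharset(Map<String, String> parseMeta) {
        for (String key : new String[] {CONVERSION_KEY, CONTENT_ENCODING_KEY}) {
            String name = parseMeta.get(key);
            if (name == null || name.isEmpty()) {
                continue; // key absent: try the next candidate
            }
            try {
                return Charset.forName(name);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name (assumed behavior):
                // ignore it and try the next candidate.
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        System.out.println(resolveCharset(meta)); // no keys set: fallback

        meta.put(CONTENT_ENCODING_KEY, "ISO-8859-1");
        System.out.println(resolveCharset(meta)); // second key is used

        meta.put(CONVERSION_KEY, "UTF-16");
        System.out.println(resolveCharset(meta)); // first key takes priority
    }
}
```

Because getCharset is static, a caller with real parse metadata would invoke it directly as SegmentReader.getCharset(parseMeta) without constructing a SegmentReader.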
-
getStats
public void getStats(Path segment, SegmentReader.SegmentReaderStats stats) throws Exception
- Throws:
Exception
-