Package org.apache.nutch.indexwriter.csv
Class CSVIndexWriter
- java.lang.Object
-
- org.apache.nutch.indexwriter.csv.CSVIndexWriter
-
- All Implemented Interfaces:
Configurable, IndexWriter, Pluggable
public class CSVIndexWriter extends Object implements IndexWriter
Write Nutch documents to a CSV file (comma-separated values), i.e., dump the index as a CSV or tab-separated plain-text table. The format (encoding, separators, etc.) is configurable through several options; see the output of describe().
Note: works only in local mode and is meant to be used with the index option -noCommit.
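To make the idea of configurable separators and quote characters concrete, here is a minimal sketch of one common CSV quoting convention (wrap every field in the quote character and double any embedded quote). The class CsvQuoteSketch and its methods are hypothetical names used for illustration only; this is an assumption about a typical CSV layout, not the actual logic or defaults of CSVIndexWriter.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration; not part of Nutch.
class CsvQuoteSketch {

  // Wrap a field in the quote character and double any embedded quote.
  static String quoteField(String value, char quote) {
    String q = String.valueOf(quote);
    return q + value.replace(q, q + q) + q;
  }

  // Quote every field and join the fields with the given separator.
  static String toRow(List<String> fields, String separator, char quote) {
    return fields.stream()
        .map(f -> quoteField(f, quote))
        .collect(Collectors.joining(separator));
  }

  public static void main(String[] args) {
    // Comma-separated row with double quotes as quote character.
    System.out.println(toRow(List.of("http://example.org/", "He said \"hi\""), ",", '"'));
    // Tab-separated row using the same quoting rule.
    System.out.println(toRow(List.of("id", "title"), "\t", '"'));
  }
}
```

Changing the separator argument from "," to "\t" switches the sketch from comma-separated to tab-separated output, which mirrors the kind of choice the class exposes through its options.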
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
protected class CSVIndexWriter.Separator
Represents separators (also quote and escape characters) as char(s) and byte(s) in the output encoding for efficiency.
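The efficiency argument can be sketched as follows: encode the separator once into bytes of the output charset, then reuse those bytes for every row instead of re-encoding per write. SeparatorSketch and joinRow are hypothetical names chosen for this example; the real CSVIndexWriter.Separator may be structured differently.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not the actual nested class.
class SeparatorSketch {
  final String chars;  // separator as characters
  final byte[] bytes;  // the same separator, pre-encoded in the output charset

  SeparatorSketch(String separator, Charset encoding) {
    this.chars = separator;
    this.bytes = separator.getBytes(encoding); // encoded once, reused per row
  }

  // Write the fields of one row, inserting the cached separator bytes between them.
  static byte[] joinRow(String[] fields, SeparatorSketch sep, Charset encoding) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < fields.length; i++) {
      if (i > 0) {
        out.write(sep.bytes, 0, sep.bytes.length);
      }
      byte[] field = fields[i].getBytes(encoding);
      out.write(field, 0, field.length);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    Charset utf8 = StandardCharsets.UTF_8;
    SeparatorSketch tab = new SeparatorSketch("\t", utf8);
    byte[] row = joinRow(new String[] {"id", "title"}, tab, utf8);
    System.out.println(row.length + " bytes written");
  }
}
```

The byte length of the cached separator depends on the charset, which is why the sketch stores the bytes alongside the characters rather than assuming one byte per character.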
-
Field Summary
Fields
Modifier and Type, Field, Description
protected FSDataOutputStream csvout
protected Charset encoding
Encoding of the CSV file.
Fields inherited from interface org.apache.nutch.indexer.IndexWriter
X_POINT_ID
-
-
Constructor Summary
Constructors
Constructor, Description
CSVIndexWriter()
-
Method Summary
Modifier and Type, Method, Description
void close()
void commit()
(nothing to commit)
void delete(String key)
(deletion of documents is not supported)
Map<String,Map.Entry<String,Object>> describe()
Returns a Map with the specific parameters the IndexWriter instance can take.
Configuration getConf()
static void main(String[] args)
void open(Configuration conf, String name)
void open(IndexWriterParams parameters)
Initializes the internal variables from a given index writer configuration.
void setConf(Configuration conf)
void update(NutchDocument doc)
void write(NutchDocument doc)
-
-
-
Field Detail
-
encoding
protected Charset encoding
Encoding of the CSV file.
-
csvout
protected FSDataOutputStream csvout
-
-
Method Detail
-
open
public void open(Configuration conf, String name) throws IOException
- Specified by:
open in interface IndexWriter
- Parameters:
conf - Nutch configuration
name - target name of the IndexWriter to be opened
- Throws:
IOException - Some exception thrown by some writer.
-
open
public void open(IndexWriterParams parameters) throws IOException
Initializes the internal variables from a given index writer configuration.
- Specified by:
open in interface IndexWriter
- Parameters:
parameters - Params from the index writer configuration.
- Throws:
IOException - Some exception thrown by writer.
-
write
public void write(NutchDocument doc) throws IOException
- Specified by:
write in interface IndexWriter
- Throws:
IOException
-
delete
public void delete(String key)
(deletion of documents is not supported)
- Specified by:
delete in interface IndexWriter
-
update
public void update(NutchDocument doc) throws IOException
- Specified by:
update in interface IndexWriter
- Throws:
IOException
-
close
public void close() throws IOException
- Specified by:
close in interface IndexWriter
- Throws:
IOException
-
commit
public void commit()
(nothing to commit)
- Specified by:
commit in interface IndexWriter
-
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
-
describe
public Map<String,Map.Entry<String,Object>> describe()
Returns a Map with the specific parameters the IndexWriter instance can take.
- Specified by:
describe in interface IndexWriter
- Returns:
The values of each row. Each entry must have the form <KEY,<DESCRIPTION,VALUE>>, i.e., the parameter name mapped to a pair of its description and its current value.
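The <KEY,<DESCRIPTION,VALUE>> shape of the returned map can be sketched with plain JDK types. The class DescribeShapeSketch and the two parameter names and values below are hypothetical placeholders chosen for illustration; the real keys and values are whatever describe() reports for a configured writer.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the map shape; not Nutch code.
class DescribeShapeSketch {

  // Build a map of parameter name -> (description, current value).
  static Map<String, Map.Entry<String, Object>> describe() {
    Map<String, Map.Entry<String, Object>> properties = new LinkedHashMap<>();
    // Placeholder parameters, assumed for this example only.
    properties.put("separator",
        new AbstractMap.SimpleEntry<String, Object>("Separator between fields", ","));
    properties.put("charset",
        new AbstractMap.SimpleEntry<String, Object>("Encoding of the CSV file", "UTF-8"));
    return properties;
  }

  public static void main(String[] args) {
    // Print one row per parameter: key, description, value.
    for (Map.Entry<String, Map.Entry<String, Object>> row : describe().entrySet()) {
      System.out.println(row.getKey() + "\t"
          + row.getValue().getKey() + "\t"
          + row.getValue().getValue());
    }
  }
}
```

A caller can iterate over the entry set in the same way to render the rows as a table of parameter name, description, and value.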
-
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
-
-