Package org.apache.nutch.metadata
Interface Nutch
- All Known Implementing Classes:
- CaseInsensitiveMetadata, Metadata, SpellCheckedMetadata
public interface Nutch
A collection of Nutch internal metadata constants.
- Author:
- Chris Mattmann, Jérôme Charron
-
-
Field Summary
Modifier and Type | Field | Description
static String | ARG_CRAWLDB | Argument key to specify the location of crawldb for the REST endpoints.
static String | ARG_HOSTDB | Argument key to specify the location of hostdb for the REST endpoints.
static String | ARG_LINKDB | Argument key to specify the location of linkdb for the REST endpoints.
static String | ARG_SEEDDIR | Argument key to specify the location of the seed URL directory for the REST endpoints.
static String | ARG_SEEDNAME | Argument key to specify the name of a seed list for the REST endpoints.
static String | ARG_SEGMENTDIR | Argument key to specify the location of a directory of segments for the REST endpoints.
static String | ARG_SEGMENTS | Argument key to specify the location of an individual segment or a list of segments for the REST endpoints.
static String | CACHING_FORBIDDEN_ALL | Don't show either original forbidden content or summaries.
static String | CACHING_FORBIDDEN_CONTENT | Don't show original forbidden content, but show summaries.
static String | CACHING_FORBIDDEN_KEY | Sites may request that search engines don't provide access to cached documents.
static String | CACHING_FORBIDDEN_NONE | Show both original forbidden content and summaries (default).
static String | CHAR_ENCODING_FOR_CONVERSION |
static String | CRAWL_ID_KEY | Used by the Nutch REST service.
static String | FETCH_EVENT_CONTENTLANG | Content-language key in the Pub/Sub event metadata for the content-language of the parsed page.
static String | FETCH_EVENT_CONTENTTYPE | Content-type key in the Pub/Sub event metadata for the content-type of the parsed page.
static String | FETCH_EVENT_FETCHTIME | Fetch time key in the Pub/Sub event metadata for the fetch time of the parsed page.
static String | FETCH_EVENT_SCORE | Score key in the Pub/Sub event metadata for the score of the parsed page.
static String | FETCH_EVENT_TITLE | Title key in the Pub/Sub event metadata for the title of the parsed page.
static String | FETCH_STATUS_KEY |
static String | FETCH_TIME_KEY |
static String | FIXED_INTERVAL_KEY | Used by AdaptiveFetchSchedule to maintain a custom fetch interval.
static String | GENERATE_TIME_KEY |
static String | ORIGINAL_CHAR_ENCODING |
static String | PROTO_STATUS_KEY |
static Text | PROTOCOL_STATUS_CODE_KEY |
static String | REPR_URL_KEY |
static String | ROBOTS_METATAG | Name to store the robots metatag in ParseData's metadata.
static String | SCORE_KEY |
static String | SEGMENT_NAME_KEY |
static String | SIGNATURE_KEY |
static String | STAT_PROGRESS | For progress of job.
static String | VAL_RESULT | Name of the key used in the Result Map sent back by the REST endpoint.
static Text | WRITABLE_FIXED_INTERVAL_KEY |
static Text | WRITABLE_GENERATE_TIME_KEY |
static Text | WRITABLE_PROTO_STATUS_KEY |
static Text | WRITABLE_REPR_URL_KEY |
-
-
-
Field Detail
-
ORIGINAL_CHAR_ENCODING
static final String ORIGINAL_CHAR_ENCODING
- See Also:
- Constant Field Values
-
CHAR_ENCODING_FOR_CONVERSION
static final String CHAR_ENCODING_FOR_CONVERSION
- See Also:
- Constant Field Values
-
SIGNATURE_KEY
static final String SIGNATURE_KEY
- See Also:
- Constant Field Values
-
SEGMENT_NAME_KEY
static final String SEGMENT_NAME_KEY
- See Also:
- Constant Field Values
-
SCORE_KEY
static final String SCORE_KEY
- See Also:
- Constant Field Values
-
GENERATE_TIME_KEY
static final String GENERATE_TIME_KEY
- See Also:
- Constant Field Values
-
WRITABLE_GENERATE_TIME_KEY
static final Text WRITABLE_GENERATE_TIME_KEY
-
PROTOCOL_STATUS_CODE_KEY
static final Text PROTOCOL_STATUS_CODE_KEY
-
PROTO_STATUS_KEY
static final String PROTO_STATUS_KEY
- See Also:
- Constant Field Values
-
WRITABLE_PROTO_STATUS_KEY
static final Text WRITABLE_PROTO_STATUS_KEY
-
FETCH_TIME_KEY
static final String FETCH_TIME_KEY
- See Also:
- Constant Field Values
-
FETCH_STATUS_KEY
static final String FETCH_STATUS_KEY
- See Also:
- Constant Field Values
-
ROBOTS_METATAG
static final String ROBOTS_METATAG
Name to store the robots metatag in ParseData's metadata.
- See Also:
- Constant Field Values
-
CACHING_FORBIDDEN_KEY
static final String CACHING_FORBIDDEN_KEY
Sites may request that search engines don't provide access to cached documents.
- See Also:
- Constant Field Values
-
CACHING_FORBIDDEN_NONE
static final String CACHING_FORBIDDEN_NONE
Show both original forbidden content and summaries (default).
- See Also:
- Constant Field Values
-
CACHING_FORBIDDEN_ALL
static final String CACHING_FORBIDDEN_ALL
Don't show either original forbidden content or summaries.
- See Also:
- Constant Field Values
-
CACHING_FORBIDDEN_CONTENT
static final String CACHING_FORBIDDEN_CONTENT
Don't show original forbidden content, but show summaries.
- See Also:
- Constant Field Values
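The four CACHING_FORBIDDEN constants above describe a small policy: one metadata key whose value selects what may be shown. A minimal sketch of how a caller might interpret it follows. The class `CachingPolicySketch`, its method names, and the use of a plain `Map` in place of Nutch's metadata classes are hypothetical, and the string values are assumptions; the real values are on the Constant Field Values page.

```java
import java.util.Map;

// Illustration only: local stand-ins so the sketch compiles without Nutch on the classpath.
final class CachingPolicySketch {
    // Assumed values; verify against the Constant Field Values page.
    static final String CACHING_FORBIDDEN_KEY = "caching.forbidden";
    static final String CACHING_FORBIDDEN_NONE = "none";
    static final String CACHING_FORBIDDEN_ALL = "all";
    static final String CACHING_FORBIDDEN_CONTENT = "content";

    // A missing key falls back to NONE, which the documentation names as the default.
    private static String policy(Map<String, String> metadata) {
        return metadata.getOrDefault(CACHING_FORBIDDEN_KEY, CACHING_FORBIDDEN_NONE);
    }

    // Original content is shown only under NONE.
    static boolean mayShowCachedContent(Map<String, String> metadata) {
        return CACHING_FORBIDDEN_NONE.equals(policy(metadata));
    }

    // Summaries are shown under NONE and CONTENT, but not under ALL.
    static boolean mayShowSummary(Map<String, String> metadata) {
        return !CACHING_FORBIDDEN_ALL.equals(policy(metadata));
    }
}
```

In this sketch an unrecognized value hides the original content but still allows summaries; that choice is an assumption, not documented behavior.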
-
REPR_URL_KEY
static final String REPR_URL_KEY
- See Also:
- Constant Field Values
-
WRITABLE_REPR_URL_KEY
static final Text WRITABLE_REPR_URL_KEY
-
FIXED_INTERVAL_KEY
static final String FIXED_INTERVAL_KEY
Used by AdaptiveFetchSchedule to maintain a custom fetch interval.
- See Also:
- Constant Field Values
-
WRITABLE_FIXED_INTERVAL_KEY
static final Text WRITABLE_FIXED_INTERVAL_KEY
-
STAT_PROGRESS
static final String STAT_PROGRESS
For progress of job. Used by the Nutch REST service.
- See Also:
- Constant Field Values
-
CRAWL_ID_KEY
static final String CRAWL_ID_KEY
Used by the Nutch REST service.
- See Also:
- Constant Field Values
-
ARG_SEEDDIR
static final String ARG_SEEDDIR
Argument key to specify the location of the seed URL directory for the REST endpoints.
- See Also:
- Constant Field Values
-
ARG_SEEDNAME
static final String ARG_SEEDNAME
Argument key to specify the name of a seed list for the REST endpoints.
- See Also:
- Constant Field Values
-
ARG_CRAWLDB
static final String ARG_CRAWLDB
Argument key to specify the location of crawldb for the REST endpoints.
- See Also:
- Constant Field Values
-
ARG_LINKDB
static final String ARG_LINKDB
Argument key to specify the location of linkdb for the REST endpoints.
- See Also:
- Constant Field Values
-
VAL_RESULT
static final String VAL_RESULT
Name of the key used in the Result Map sent back by the REST endpoint.
- See Also:
- Constant Field Values
-
ARG_SEGMENTDIR
static final String ARG_SEGMENTDIR
Argument key to specify the location of a directory of segments for the REST endpoints. Similar to the -dir option in the bin/nutch script.
- See Also:
- Constant Field Values
-
ARG_SEGMENTS
static final String ARG_SEGMENTS
Argument key to specify the location of an individual segment or a list of segments for the REST endpoints. The behavior differs between endpoints: the CrawlDb, LinkDb and Indexing jobs take a list of segments, while the Fetcher and Parse Segment jobs take one segment.
- See Also:
- Constant Field Values
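Since ARG_SEGMENTS carries either a list of segments or a single segment depending on the endpoint, a caller has to build the argument map differently per job. The sketch below illustrates that distinction. The class `RestArgsSketch`, its method names, the key strings, and the choice of `List<String>` versus `String` as the value type are all assumptions made for illustration; the source does not specify how the REST service expects the values to be encoded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: local stand-ins so the sketch compiles without Nutch on the classpath.
final class RestArgsSketch {
    // Assumed key strings; verify against the Constant Field Values page.
    static final String ARG_CRAWLDB = "crawldb";
    static final String ARG_SEGMENTS = "segment";

    // For jobs documented as taking a list of segments (CrawlDb, LinkDb, Indexing).
    static Map<String, Object> multiSegmentArgs(String crawlDb, List<String> segments) {
        Map<String, Object> args = new HashMap<>();
        args.put(ARG_CRAWLDB, crawlDb);
        args.put(ARG_SEGMENTS, new ArrayList<>(segments)); // defensive copy
        return args;
    }

    // For jobs documented as taking one segment (Fetcher, Parse Segment).
    static Map<String, Object> singleSegmentArgs(String segment) {
        Map<String, Object> args = new HashMap<>();
        args.put(ARG_SEGMENTS, segment);
        return args;
    }
}
```

Referring to the keys through the constants rather than string literals keeps callers aligned with the service if a key string ever changes.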
-
ARG_HOSTDB
static final String ARG_HOSTDB
Argument key to specify the location of hostdb for the REST endpoints.
- See Also:
- Constant Field Values
-
FETCH_EVENT_TITLE
static final String FETCH_EVENT_TITLE
Title key in the Pub/Sub event metadata for the title of the parsed page.
- See Also:
- Constant Field Values
-
FETCH_EVENT_CONTENTTYPE
static final String FETCH_EVENT_CONTENTTYPE
Content-type key in the Pub/Sub event metadata for the content-type of the parsed page.
- See Also:
- Constant Field Values
-
FETCH_EVENT_SCORE
static final String FETCH_EVENT_SCORE
Score key in the Pub/Sub event metadata for the score of the parsed page.
- See Also:
- Constant Field Values
-
FETCH_EVENT_FETCHTIME
static final String FETCH_EVENT_FETCHTIME
Fetch time key in the Pub/Sub event metadata for the fetch time of the parsed page.
- See Also:
- Constant Field Values
-
FETCH_EVENT_CONTENTLANG
static final String FETCH_EVENT_CONTENTLANG
Content-language key in the Pub/Sub event metadata for the content-language of the parsed page.
- See Also:
- Constant Field Values
-
-