Class BasicURLNormalizer
- java.lang.Object
  - org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
- All Implemented Interfaces: Configurable, URLNormalizer
public class BasicURLNormalizer extends Object implements URLNormalizer
Converts URLs to a normal form:
- remove dot segments in the path: /./ or /../
- remove default ports, e.g. 80 for protocol http://
- normalize percent-encoding in URL paths
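The first two normalization steps listed above can be illustrated with a minimal, standalone sketch built only on java.net.URI. This is not the BasicURLNormalizer implementation: the class name NormalizeSketch, the helper defaultPort, and the small table of default ports are hypothetical and chosen for illustration. The sketch assumes an absolute URL with a host, does not touch percent-encoding, and ignores any fragment.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration; not part of Apache Nutch.
public class NormalizeSketch {

    // Returns the well-known default port for a scheme, or -1 if unknown.
    // The set of schemes here is an assumption made for the example.
    static int defaultPort(String scheme) {
        switch (scheme) {
            case "http":  return 80;
            case "https": return 443;
            case "ftp":   return 21;
            default:      return -1;
        }
    }

    // Removes dot segments and a redundant default port from an absolute URL.
    static String normalize(String url) {
        // URI.normalize() removes "." segments and resolves ".." segments.
        URI uri = URI.create(url).normalize();
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("expected an absolute URL with a host: " + url);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);

        // Drop the port when it matches the scheme's default.
        int port = uri.getPort();
        if (port == defaultPort(scheme)) {
            port = -1;
        }

        // An empty path is written as "/".
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }

        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://").append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com:80/a/./b/../c"));
    }
}
```

With this sketch, http://example.com:80/a/./b/../c becomes http://example.com/a/c, while a non-default port such as 8443 on https is kept.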
Field Summary
Fields (modifier and type, field):
- static String NORM_HOST_IDN
- static String NORM_HOST_TRIM_TRAILING_DOT
Fields inherited from interface org.apache.nutch.net.URLNormalizer:
- X_POINT_ID
Constructor Summary
- BasicURLNormalizer()
Method Summary
Methods (modifier and type, method):
- Configuration getConf()
- static void main(String[] args)
- String normalize(String urlString, String scope)
- void setConf(Configuration conf)
Field Detail
NORM_HOST_IDN
public static final String NORM_HOST_IDN
- See Also: Constant Field Values
NORM_HOST_TRIM_TRAILING_DOT
public static final String NORM_HOST_TRIM_TRAILING_DOT
- See Also: Constant Field Values
Method Detail
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
normalize
public String normalize(String urlString, String scope) throws MalformedURLException
- Specified by: normalize in interface URLNormalizer
- Throws: MalformedURLException
main
public static void main(String[] args) throws IOException
- Throws: IOException