Class HostURLNormalizer
- java.lang.Object
  - org.apache.nutch.net.urlnormalizer.host.HostURLNormalizer
- All Implemented Interfaces: Configurable, URLNormalizer
public class HostURLNormalizer extends Object implements URLNormalizer
URL normalizer for mapping hosts to their desired form. It takes a simple text file as source in the format:

    example.org www.example.org

mapping all URLs of example.org to the www sub-domain. It also allows wildcards to be used to map all sub-domains to another host:

    *.example.org www.example.org
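The mapping format described above can be illustrated with a minimal, self-contained sketch. It is not the Nutch implementation: the class name HostMappingSketch, the parseRules helper, the in-memory rule map, and the matching order (exact host first, then the nearest wildcard parent) are all assumptions made for illustration. It also reports an unparsable URL with an unchecked exception, whereas the documented normalize method declares MalformedURLException.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of Apache Nutch.
final class HostMappingSketch {

  // Parses lines of the form "<source-host> <target-host>".
  // Blank or malformed lines are skipped (an assumption of this sketch).
  static Map<String, String> parseRules(String text) {
    Map<String, String> rules = new LinkedHashMap<>();
    for (String line : text.split("\\R")) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length == 2) {
        rules.put(parts[0].toLowerCase(Locale.ROOT), parts[1]);
      }
    }
    return rules;
  }

  // Returns the URL with its host replaced when a rule applies,
  // otherwise returns the input unchanged.
  static String normalize(String urlString, Map<String, String> rules) {
    // URI.create throws IllegalArgumentException on a malformed URL.
    String host = URI.create(urlString).getHost();
    if (host == null) {
      return urlString;
    }
    host = host.toLowerCase(Locale.ROOT);

    // Assumed order: an exact host rule wins over a wildcard rule.
    String target = rules.get(host);

    // Wildcard lookup: strip leading labels and try "*.<remaining suffix>".
    String rest = host;
    int dot;
    while (target == null && (dot = rest.indexOf('.')) >= 0) {
      rest = rest.substring(dot + 1);
      target = rules.get("*." + rest);
    }

    if (target == null || target.equals(host)) {
      return urlString;
    }
    return replaceHost(urlString, host, target);
  }

  // Replaces the first occurrence of the host after "://" with the target.
  static String replaceHost(String urlString, String host, String target) {
    String lower = urlString.toLowerCase(Locale.ROOT);
    int scheme = lower.indexOf("://");
    int start = lower.indexOf(host, scheme < 0 ? 0 : scheme + 3);
    if (start < 0) {
      return urlString;
    }
    return urlString.substring(0, start) + target
        + urlString.substring(start + host.length());
  }

  public static void main(String[] args) {
    Map<String, String> rules = parseRules(
        "example.org www.example.org\n*.example.net www.example.net\n");
    System.out.println(normalize("http://example.org/page?x=1", rules));
    System.out.println(normalize("https://blog.example.net/a", rules));
  }
}
```

In this sketch a host that already equals its target is left untouched, so applying the rules twice gives the same result as applying them once.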
Field Summary
-
Fields inherited from interface org.apache.nutch.net.URLNormalizer
X_POINT_ID
Constructor Summary
Constructor            Description
HostURLNormalizer()
-
Method Summary
Modifier and Type    Method                                                       Description
Configuration        getConf()
String               normalize(String urlString, String scope)
protected String     replaceHost(String urlString, String host, String target)
void                 setConf(Configuration conf)
Method Detail
-
getConf
public Configuration getConf()
- Specified by: getConf in interface Configurable
-
setConf
public void setConf(Configuration conf)
- Specified by: setConf in interface Configurable
-
normalize
public String normalize(String urlString, String scope) throws MalformedURLException
- Specified by: normalize in interface URLNormalizer
- Throws: MalformedURLException