Class AjaxURLNormalizer
java.lang.Object
  org.apache.nutch.net.urlnormalizer.ajax.AjaxURLNormalizer
- All Implemented Interfaces:
Configurable, URLNormalizer
public class AjaxURLNormalizer extends Object implements URLNormalizer
URLNormalizer capable of dealing with AJAX URLs. Use the following regex filter to prevent escaped fragments from being fetched:
^(.*)\?.*_escaped_fragment_
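To illustrate which URLs the regex above catches, here is a minimal, self-contained sketch using `java.util.regex`. The class name, method name, and example URLs are hypothetical, and the sketch assumes the filter applies the pattern with an unanchored search (`find()`). In a Nutch `regex-urlfilter.txt`, an exclusion rule is conventionally written with a leading `-` in front of the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Nutch.
class EscapedFragmentFilterDemo {
    // The regex quoted in the class description.
    static final Pattern ESCAPED =
        Pattern.compile("^(.*)\\?.*_escaped_fragment_");

    // Returns true if the URL contains an _escaped_fragment_ marker
    // somewhere after a '?', i.e. the regex finds a match.
    static boolean isEscapedFragmentUrl(String url) {
        return ESCAPED.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://example.org/page?_escaped_fragment_=key=value",
            "http://example.org/page?a=b&_escaped_fragment_=key=value",
            "http://example.org/page#!key=value"
        };
        for (String url : samples) {
            System.out.println(isEscapedFragmentUrl(url) + "  " + url);
        }
    }
}
```

The first two sample URLs match and would be excluded by such a rule; the `#!` form does not match and is left for the normalizer.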
-
-
Field Summary
Fields
Modifier and Type, Field, Description
static String AJAX_URL_PART
static String ESCAPED_URL_PART
-
Fields inherited from interface org.apache.nutch.net.URLNormalizer
X_POINT_ID
-
-
Constructor Summary
Constructors
Constructor, Description
AjaxURLNormalizer()
Default constructor.
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method, Description
protected String escape(String fragmentPart)
Escape some exotic characters in the fragment part.
Configuration getConf()
String normalize(String urlString, String scope)
Attempts to normalize the input URL string.
protected String normalizeEscapedFragment(String urlString)
Returns a normalized input URL.
protected String normalizeHashedFragment(String urlString)
Returns a normalized input URL.
void setConf(Configuration conf)
protected String unescape(String fragmentPart)
Unescape some exotic characters in the fragment part.
-
-
-
Method Detail
-
normalize
public String normalize(String urlString, String scope) throws MalformedURLException
Attempts to normalize the input URL string.
- Specified by:
normalize in interface URLNormalizer
- Parameters:
urlString - a String to process
scope - used when indexing URLs
- Returns:
String
- Throws:
MalformedURLException - if the urlString is malformed
-
normalizeHashedFragment
protected String normalizeHashedFragment(String urlString) throws MalformedURLException
Returns the normalized form of the input URL. A #! fragment is transformed to the _escaped_fragment_ query-string form.
- Parameters:
urlString - a String to process
- Returns:
String
- Throws:
MalformedURLException - if the urlString is malformed
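The `#!` to `_escaped_fragment_` direction can be sketched as below. This is an illustration, not the Nutch source: the class and method names are hypothetical, the values of the two constants are assumptions (this page lists the fields `AJAX_URL_PART` and `ESCAPED_URL_PART` but not their values), the presence of a query string is detected with a plain string search instead of URL parsing, and escaping of the fragment is left out.

```java
// Hypothetical sketch; not the Nutch implementation.
class HashedFragmentSketch {
    static final String AJAX_URL_PART = "#!";                     // assumed value
    static final String ESCAPED_URL_PART = "_escaped_fragment_="; // assumed value

    // Rewrites "base#!fragment" as "base?_escaped_fragment_=fragment",
    // using '&' instead of '?' when the base already has a query string.
    static String toEscapedFragment(String urlString) {
        int pos = urlString.indexOf(AJAX_URL_PART);
        if (pos < 0) {
            return urlString; // no hashbang, nothing to rewrite
        }
        String base = urlString.substring(0, pos);
        String fragment = urlString.substring(pos + AJAX_URL_PART.length());
        char separator = base.indexOf('?') >= 0 ? '&' : '?';
        return base + separator + ESCAPED_URL_PART + fragment;
    }

    public static void main(String[] args) {
        System.out.println(toEscapedFragment("http://example.org/page#!key=value"));
        System.out.println(toEscapedFragment("http://example.org/page?a=b#!key=value"));
    }
}
```

A URL without `#!` is returned unchanged in this sketch; whether the real method behaves the same way is not stated on this page.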
-
normalizeEscapedFragment
protected String normalizeEscapedFragment(String urlString) throws MalformedURLException
Returns the normalized form of the input URL. An _escaped_fragment_ query string is transformed to the #! form.
- Parameters:
urlString - a String to process
- Returns:
String
- Throws:
MalformedURLException - if the urlString is malformed
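The reverse direction, from `_escaped_fragment_` back to `#!`, might look like the following. Again this is only a sketch under assumptions: the class and method names are hypothetical, the marker strings are assumed values, the `_escaped_fragment_` parameter is assumed to be the last parameter in the URL, and unescaping of the fragment is omitted.

```java
// Hypothetical sketch; not the Nutch implementation.
class EscapedFragmentSketch {
    static final String AJAX_URL_PART = "#!";                     // assumed value
    static final String ESCAPED_URL_PART = "_escaped_fragment_="; // assumed value

    // Rewrites "base?_escaped_fragment_=fragment" (or "base?q&_escaped_fragment_=fragment")
    // as "base#!fragment" (or "base?q#!fragment").
    static String toHashedFragment(String urlString) {
        int pos = urlString.indexOf(ESCAPED_URL_PART);
        if (pos < 1) {
            return urlString; // marker absent, nothing to rewrite
        }
        // Drop the '?' or '&' that precedes the parameter name.
        String base = urlString.substring(0, pos - 1);
        String fragment = urlString.substring(pos + ESCAPED_URL_PART.length());
        return base + AJAX_URL_PART + fragment;
    }

    public static void main(String[] args) {
        System.out.println(toHashedFragment("http://example.org/page?_escaped_fragment_=key=value"));
        System.out.println(toHashedFragment("http://example.org/page?a=b&_escaped_fragment_=key=value"));
    }
}
```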
-
unescape
protected String unescape(String fragmentPart)
Unescape some exotic characters in the fragment part.
- Parameters:
fragmentPart - a String to process
- Returns:
String
-
escape
protected String escape(String fragmentPart)
Escape some exotic characters in the fragment part.
- Parameters:
fragmentPart - a String to process
- Returns:
String
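This page does not say which characters count as "exotic". As an illustration only, the sketch below assumes the set named in Google's (since deprecated) AJAX crawling scheme: bytes %00 to %20, %23 (`#`), %25 (`%`), %26 (`&`), %2B (`+`), and %7F to %FF, applied to the UTF-8 encoding of the fragment. The class name is hypothetical and the character set actually used by AjaxURLNormalizer may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the escaped character set is an assumption.
class FragmentEscapeSketch {
    // Percent-encodes the assumed "exotic" bytes of the UTF-8 encoded fragment.
    static String escape(String fragmentPart) {
        StringBuilder sb = new StringBuilder();
        for (byte b : fragmentPart.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean exotic = c <= 0x20 || c >= 0x7F
                || c == 0x23 || c == 0x25 || c == 0x26 || c == 0x2B;
            if (exotic) {
                sb.append('%').append(String.format("%02X", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Decodes every well-formed %XX sequence and reads the result as UTF-8.
    // Malformed sequences are copied through unchanged.
    static String unescape(String fragmentPart) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < fragmentPart.length()) {
            if (fragmentPart.charAt(i) == '%' && i + 2 < fragmentPart.length()) {
                int hi = Character.digit(fragmentPart.charAt(i + 1), 16);
                int lo = Character.digit(fragmentPart.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write(hi * 16 + lo);
                    i += 3;
                    continue;
                }
            }
            // Copy the literal code point as UTF-8 bytes.
            int cp = fragmentPart.codePointAt(i);
            byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            out.write(raw, 0, raw.length);
            i += Character.charCount(cp);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "key=value&a b";
        String escaped = escape(original);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original));
    }
}
```

Under these assumptions, `escape` and `unescape` round-trip any fragment, which is the property the two normalization directions rely on.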
-
setConf
public void setConf(Configuration conf)
- Specified by:
setConf in interface Configurable
- Parameters:
conf - a populated Configuration
-
getConf
public Configuration getConf()
- Specified by:
getConf in interface Configurable
- Returns:
Configuration
-
-