Class Model
- java.lang.Object
  - org.apache.nutch.scoring.similarity.cosine.Model
public class Model extends Object
This class creates a model that stores the document vector representations of the corpus.
-
-
Field Summary
Modifier and Type              Field
static ArrayList<DocVector>    docVectors
static boolean                 isModelCreated
-
Constructor Summary
Constructor
Model()
-
Method Summary
Modifier and Type    Method and Description
static float         computeCosineSimilarity(DocVector docVector)
static DocVector     createDocVector(String content, int mingram, int maxgram)
                     Used to create a DocVector from given String text.
static void          createModel(Configuration conf)
static int[]         retrieveNgrams(Configuration conf)
                     Retrieves mingram and maxgram from configuration.
-
-
-
Method Detail
-
createModel
public static void createModel(Configuration conf) throws IOException
- Throws:
  IOException
-
createDocVector
public static DocVector createDocVector(String content, int mingram, int maxgram)
Creates a DocVector from the given String text. Used during the parse stage of the crawl cycle to create a DocVector of the currently parsed page from the parseText attribute value.
- Parameters:
  content - The text to tokenize
  mingram - Value of mingram for tokenizing
  maxgram - Value of maxgram for tokenizing
- Returns:
  The created DocVector
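To illustrate what tokenizing with a mingram and maxgram could produce, here is a minimal, self-contained sketch. It is not the Nutch implementation: the class NgramSketch and the method termFrequencies are hypothetical, and the sketch assumes word-level n-grams split on whitespace and lowercased, none of which this page states about the real tokenizer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: counts word n-grams in a text.
class NgramSketch {

    // Returns a term-frequency map over every word n-gram whose length
    // lies between mingram and maxgram (inclusive).
    // Assumption: words are whitespace-separated and lowercased.
    static Map<String, Integer> termFrequencies(String content, int mingram, int maxgram) {
        Map<String, Integer> counts = new HashMap<>();
        String trimmed = content.trim().toLowerCase();
        if (trimmed.isEmpty()) {
            return counts;
        }
        String[] words = trimmed.split("\\s+");
        // Guard against n-gram sizes below 1, which would yield empty terms.
        for (int n = Math.max(1, mingram); n <= maxgram; n++) {
            for (int start = 0; start + n <= words.length; start++) {
                String gram = String.join(" ", Arrays.copyOfRange(words, start, start + n));
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints the unigram and bigram counts for a short sentence.
        System.out.println(termFrequencies("to be or not to be", 1, 2));
    }
}
```

With mingram = 1 and maxgram = 2, the sample sentence yields both single words and adjacent word pairs as terms, so "to be" is counted alongside "to" and "be".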
-
computeCosineSimilarity
public static float computeCosineSimilarity(DocVector docVector)
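This page gives no description for computeCosineSimilarity, so the following is only a sketch of the standard cosine similarity formula between two term-frequency vectors. The class CosineSketch and its methods are hypothetical, vectors are represented as plain maps rather than DocVector objects, and how the real method combines the input with the stored docVectors is not stated here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: cosine similarity of two
// term-frequency maps.
class CosineSketch {

    // cosine(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector is empty.
    static float cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0f;
        }
        return (float) (dot / (normA * normB));
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        Map<String, Integer> page = new HashMap<>();
        page.put("nutch", 2);
        page.put("crawler", 1);
        Map<String, Integer> gold = new HashMap<>();
        gold.put("nutch", 1);
        gold.put("search", 1);
        System.out.println(cosine(page, gold));
    }
}
```

Identical vectors score 1, vectors with no shared terms score 0, and partial overlap falls in between.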
-
retrieveNgrams
public static int[] retrieveNgrams(Configuration conf)
Retrieves mingram and maxgram from the configuration.
- Parameters:
  conf - Configuration from which to retrieve mingram and maxgram
- Returns:
  An int array with mingram at the first index and maxgram at the second index
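The returned array layout (mingram at index 0, maxgram at index 1) can be sketched as follows. Everything in this example is an assumption made for illustration: it uses java.util.Properties as a stand-in for Configuration, the key "example.ngrams" is hypothetical, and the comma-separated "min,max" format and the "1,1" default are guesses that this page does not confirm.

```java
import java.util.Properties;

// Hypothetical helper, not part of Nutch: reads an n-gram range from
// a Properties object standing in for the real Configuration.
class NgramConfigSketch {

    // Hypothetical key; the real property name is not given on this page.
    static final String KEY = "example.ngrams";

    // Returns {mingram, maxgram}. Assumes a "min,max" string; a single
    // number is treated as both the minimum and the maximum.
    static int[] readNgrams(Properties props) {
        String[] parts = props.getProperty(KEY, "1,1").split(",");
        int mingram = Integer.parseInt(parts[0].trim());
        int maxgram = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : mingram;
        return new int[] { mingram, maxgram };
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "1, 3");
        int[] ngrams = readNgrams(props);
        System.out.println("mingram=" + ngrams[0] + ", maxgram=" + ngrams[1]);
    }
}
```

A caller would then pass the two array elements on as the mingram and maxgram arguments of createDocVector.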
-
-