Class Model


  • public class Model
    extends Object
    This class builds a model that stores the document vector (DocVector) representations of the corpus.
    • Field Detail

      • isModelCreated

        public static boolean isModelCreated
    • Constructor Detail

      • Model

        public Model()
    • Method Detail

      • createDocVector

        public static DocVector createDocVector(String content,
                                                int mingram,
                                                int maxgram)
        Creates a DocVector from the given text. Used during the parse stage of the crawl cycle to build a DocVector of the page currently being parsed, from its parseText attribute value.
        Parameters:
        content - The text to tokenize
        mingram - Minimum n-gram size used when tokenizing
        maxgram - Maximum n-gram size used when tokenizing
        Returns:
        The created DocVector
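        To illustrate what the mingram and maxgram bounds control, here is a minimal sketch that counts every word n-gram of length n, with mingram <= n <= maxgram, into a term-frequency map. It is not Nutch's implementation: TermVector is a hypothetical stand-in for DocVector, and the tokenizer (lowercase, split on runs of non-letter/non-digit characters) is an assumption; the real class may tokenize, filter stopwords, or weight terms differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: TermVector stands in for Nutch's DocVector.
class DocVectorSketch {

    /** Sparse term-frequency vector keyed by n-gram text (stand-in, not the real DocVector). */
    static final class TermVector {
        final Map<String, Integer> termFreq = new HashMap<>();

        void add(String term) {
            termFreq.merge(term, 1, Integer::sum);
        }
    }

    /**
     * Counts every word n-gram with mingram <= n <= maxgram.
     * The tokenization (lowercase, split on non-letter/non-digit runs) is an assumption.
     */
    static TermVector createDocVector(String content, int mingram, int maxgram) {
        if (mingram < 1 || maxgram < mingram) {
            throw new IllegalArgumentException("require 1 <= mingram <= maxgram");
        }
        // Collect non-empty tokens; split() can yield a leading empty string.
        List<String> tokens = new ArrayList<>();
        for (String t : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        TermVector vector = new TermVector();
        // Slide a window of each size n over the token list and count each n-gram.
        for (int n = mingram; n <= maxgram; n++) {
            for (int start = 0; start + n <= tokens.size(); start++) {
                vector.add(String.join(" ", tokens.subList(start, start + n)));
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        TermVector v = createDocVector("The cat sat. The cat ran.", 1, 2);
        System.out.println(v.termFreq.get("the cat")); // prints 2
    }
}
```

        With mingram = 1 and maxgram = 2, the sample sentence yields four distinct unigrams and four distinct bigrams; raising maxgram adds longer phrases to the vector at the cost of a larger, sparser vocabulary.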
      • computeCosineSimilarity

        public static float computeCosineSimilarity(DocVector docVector)
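        This entry has no description. Since the method takes a single DocVector, it presumably scores that vector against the vectors already stored in the model, but how several model vectors are combined is not stated here. The sketch below computes standard cosine similarity, cos(a, b) = (a . b) / (|a| |b|), over plain term-frequency maps and, as an assumption, aggregates by taking the best score against any model vector. It uses double where the original returns float; cosine, norm, and maxSimilarity are hypothetical names.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: vectors are plain term -> frequency maps, not Nutch's DocVector.
class CosineSketch {

    /** cos(a, b) = (a . b) / (|a| * |b|); returns 0 if either vector has zero length. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        // Only terms present in both vectors contribute to the dot product.
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean (L2) norm of a term-frequency vector. */
    static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    /** Assumed aggregation: the best score against any vector in the model. */
    static double maxSimilarity(Map<String, Integer> doc, List<Map<String, Integer>> model) {
        double best = 0;
        for (Map<String, Integer> m : model) {
            best = Math.max(best, cosine(doc, m));
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> page = Map.of("apache", 1, "nutch", 2);
        List<Map<String, Integer>> model = List.of(Map.of("nutch", 1), Map.of("hadoop", 3));
        System.out.printf(Locale.ROOT, "%.3f%n", maxSimilarity(page, model)); // prints 0.894
    }
}
```

        Because term frequencies are non-negative, every score falls in [0, 1]: 1 for vectors pointing in the same direction, 0 for vectors sharing no terms.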
      • retrieveNgrams

        public static int[] retrieveNgrams(Configuration conf)
        Retrieves mingram and maxgram from the configuration.
        Parameters:
        conf - Configuration from which to retrieve mingram and maxgram
        Returns:
        A two-element array with mingram at index 0 and maxgram at index 1
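        The page does not name the configuration property, so the sketch below assumes a single comma-separated "min,max" value under the hypothetical key scoring.similarity.ngrams, with a lone value n used for both bounds. To stay self-contained it reads from java.util.Properties instead of Hadoop's Configuration; with a real Configuration the same lookup could use conf.get(key, defaultValue). The key, default, and validation are assumptions, not the documented behavior.

```java
import java.util.Arrays;
import java.util.Properties;

// Hypothetical sketch: java.util.Properties stands in for Hadoop's Configuration.
class NgramConfigSketch {

    // Assumed property name and default value ("min,max"); not confirmed by this page.
    static final String NGRAM_KEY = "scoring.similarity.ngrams";
    static final String NGRAM_DEFAULT = "1,1";

    /** Returns {mingram, maxgram}; a single value n is used for both bounds. */
    static int[] retrieveNgrams(Properties conf) {
        String[] parts = conf.getProperty(NGRAM_KEY, NGRAM_DEFAULT).split(",");
        int mingram = Integer.parseInt(parts[0].trim());
        int maxgram = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : mingram;
        if (mingram < 1 || maxgram < mingram) {
            throw new IllegalArgumentException("require 1 <= mingram <= maxgram");
        }
        return new int[] {mingram, maxgram};
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(NGRAM_KEY, "1,3");
        System.out.println(Arrays.toString(retrieveNgrams(conf))); // prints [1, 3]
    }
}
```

        Returning both bounds in one int array lets callers pass result[0] and result[1] straight to createDocVector as mingram and maxgram.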