The bag-of-words model is the earliest text vectorization method with words as the basic processing unit. Which of the following options is not the bag of words model.
Answer : C
The LDA algorithm assumes that the prior distribution of topics in documents and the prior distribution of words in topics follow the Dirichlet distribution.
Answer : A
The TF algorithm is a statistical word that appears in how many documents in the document set. The basic idea is that if a word appears in fewer documents, the stronger the ability to distinguish documents.
Answer : B
TF-IDF is a calculation method based on statistics, commonly used to evaluate the importance of a word in a document set to all documents.
Answer : B
What is incorrect about the following analysis?
Answer : B
Word2vce uses the cosine value to calculate the similarity, the distance range is between 0-1, the larger the value, the higher the correlation between the two words.
Answer : A
Word2vce is an open source tool for goodle, which can calculate the distance between words based on the set of input words.
Answer : B