Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
103 views
in Technique[技术] by (71.8m points)

r - How to calculate similarity in pairwise_similarity function?

Document similarity was calculated using tidytext package and widyr package. like this..

library(janeaustenr)
library(dplyr)
library(tidytext)

# Comparing Jane Austen novels
austen_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(book, word) %>%
  ungroup()

# closest books to each other
closest <- austen_words %>%
  pairwise_similarity(book, word, n) %>%
  arrange(desc(similarity))

closest

closest %>%
  filter(item1 == "Emma")

How is the similarity calculated in pairwise_similarity function?

Some words may not appear in common in the two documents. Are these words counted?

Or is it ignoring these words and counting only those words that are common to both documents?

If a word has similar tf-idf scores in both documents, is it considered similar?

question from:https://stackoverflow.com/questions/65947803/how-to-calculate-similarity-in-pairwise-similarity-function

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...