r - How to calculate similarity in pairwise_similarity function?

I calculated document similarity using the tidytext and widyr packages, like this:

library(janeaustenr)
library(dplyr)
library(tidytext)
library(widyr)  # provides pairwise_similarity()

# Comparing Jane Austen novels
austen_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(book, word) %>%
  ungroup()

# closest books to each other
closest <- austen_words %>%
  pairwise_similarity(book, word, n) %>%
  arrange(desc(similarity))

closest

closest %>%
  filter(item1 == "Emma")

How is the similarity calculated in the pairwise_similarity function?

Some words appear in only one of the two documents. Are these words counted?

Or does it ignore them and count only the words that are common to both documents?

If a word has similar tf-idf scores in both documents, does that make the documents more similar?
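For context on what is being asked: the widyr documentation describes pairwise_similarity as cosine similarity, computed on whatever value column is passed in (raw counts n in the code above, not tf-idf). The base R sketch below works through that formula by hand. The two documents and their counts are made up for illustration, and it assumes that a word missing from a document is treated as a count of zero. It is a sketch of the formula, not widyr's actual source.

# Toy document-term counts: rows are documents, columns are words.
# A zero means the word does not occur in that document.
counts <- matrix(
  c(2, 1, 0,
    1, 0, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b"), c("love", "sister", "captain"))
)

# Scale each row to unit length, then take dot products between rows.
normed <- counts / sqrt(rowSums(counts^2))
cosine <- normed %*% t(normed)

# Similarity between the two toy documents
cosine["doc_a", "doc_b"]

Under this formula, only "love" (the one shared word) adds to the dot product, but "sister" and "captain" still enlarge each document's vector length, so words that are not shared lower the similarity instead of being ignored.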

Question from: https://stackoverflow.com/questions/65947803/how-to-calculate-similarity-in-pairwise-similarity-function
