Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
157 views
in Technique[技术] by (71.8m points)

dataframe - How can I transform my data frame from wide to long in R?

I have issues with transforming my data frame from wide to long. I'm well aware that there are plenty of excellent vignettes out there, which explain gather() or pivot_longer() very precisely (e.g. https://www.storybench.org/pivoting-data-from-columns-to-rows-and-back-in-the-tidyverse/). Nevertheless, I'm still stuck for days now and this drives me crazy. Thus, I dediced to ask the internet. You.

I have a data frame that looks like this:

id     <- c(1,2,3)
year   <- c(2018,2003,2011)
lvl    <- c("A","B","C")
item.1 <- factor(c("A","A","C"),levels = lvl)
item.2 <- factor(c("C","B","A"),levels = lvl)
item.3 <- factor(c("B","B","C"),levels = lvl)
df     <- data.frame(id,year,item.1,item.2,item.3)

So we have an id variable for each observation (e.g. movies). We have a year variable, indicating when the observation took place (e.g. when the movie was released). And we have three factor variables that assessed different characteristics of the observation (e.g. cast, storyline and film music). Those three factor variables share the same factor levels "A","B" or "C" (e.g. cast of the movie was "excellent", "okay" or "shitty").

But in my wildest dreams, the data more look like this:

id.II     <- c(rep(1, 9), rep(2, 9), rep(3,9))
year.II   <- c(rep(2018, 9), rep(2003, 9), rep(2011,9))
item.II   <- rep(c(c(1,1,1),c(2,2,2),c(3,3,3)),3)
rating.II <- rep(c("A", "B", "C"), 9)
number.II  <- c(1,0,0,0,0,1,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,0,0,1)
df.II     <- data.frame(id.II,year.II,item.II,rating.II,number.II)

So now the data frame would be way more useable for further analysis. For example, the next step would be to calculate for each year the number (or even better percentage) of movies that were rated as "excellent".

year.III   <- factor(c(rep(2018, 3), rep(2003, 3), rep(2011,3)))
item.III   <- factor(rep(c(1, 2, 3), 3))
number.A.III <- c(1,0,0,1,0,0,0,1,0)
df.III     <- data.frame(year.III,item.III,number.A.III)

ggplot(data=df.III, aes(x=year.III, y=number.A.III, group=item.III)) +
  geom_line(aes(color=item.III))+
  geom_point(aes(color=item.III))+
  theme(panel.background = element_blank(),
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        legend.position = "bottom")+
  labs(colour="Item")

Or even more important to me, show for each item (cast, storytelling, film music) the percentage of being rated as "excellent", "okay" and "shitty".

item.IV   <- factor(rep(c(c(1,1,1),c(2,2,2),c(3,3,3)),3))
rating.IV <- factor(rep(c("A", "B", "C"), 9))
number.IV <- c(2,0,1,1,1,1,0,2,1)
df.IV     <- data.frame(item.IV,rating.IV,number.IV)
df.IV

ggplot(df.IV,aes(fill=rating.IV,y=number.IV,x=item.IV))+
  geom_bar(position= position_fill(reverse = TRUE), stat="identity")+
  theme(axis.title.y = element_text(size = rel(1.2), angle = 0),
        axis.title.x = element_blank(),
        panel.background = element_blank(),
        legend.title = element_blank(),
        legend.position = "bottom")+
  labs(x = "Item")+
  coord_flip()+
  scale_x_discrete(limits = rev(levels(df.IV$item.IV)))+
  scale_y_continuous(labels = scales::percent)

My primary question is: How do I transform the data frame df into df.II? That would make my day. Wrong. My weekend.

And if you could then also give a hint how to proceed from df.II to df.III and df.IV that would be absolutely mindblowing. However, I don't want to burden you too much with my problems.

Best wishes Jascha

question from:https://stackoverflow.com/questions/66066224/how-can-i-transform-my-data-frame-from-wide-to-long-in-r

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Does this achieve what you need?

library(tidyverse)

df_long <- df %>%
  pivot_longer(cols = item.1:item.3, names_to = "item", values_to = "rating") %>%
  mutate(
    item = str_remove(item, "item.")
  )


df2 <- crossing(
  df_long,
  rating_all = unique(df_long$rating)
) %>%
  mutate(n = rating_all == rating) %>%
  group_by(id, year, item, rating_all) %>%
  summarise(n = sum(n))

df3 <- df2 %>%
  filter(item == "3")

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

56.8k users

...