Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
166 views
in Technique[技术] by (71.8m points)

r - Separating list-col values into singular values relative to a condition

Simplified explanation

Convert from long to wide, whilst filling missing values as 17 for 2019 and 16 for 2010, whilst those values in 2010 that match with 2019, then minus their pland value (i.e. 2019-2010). If there are no values for 2019 and its filled in with 17, give that pland value a negative value. Whilst, if 16 is filled for the missing value in 2010 leave the pland value as is, positive.

This should look like table 2.

Table 1: Example of the dataframe in long format

# A tibble: 10 x 4
   year  locality_id landcover  pland
   <chr> <chr>           <int>  <dbl>
 1 2010  L452817             8 0.0968
 2 2010  L452817             9 0.0323
 3 2010  L452817            12 0.613 
 4 2010  L452817            13 0.194 
 5 2010  L452817            14 0.0645
 6 2019  L452817             8 0.0645
 7 2019  L452817             9 0.0645
 8 2019  L452817            12 0.516 
 9 2019  L452817            13 0.194 
10 2019  L452817            14 0.161 

Table 2: Expected format of table 2

   locality_id X2010 X2019       pland
1      L452817     8     8 -0.03225806
2      L452817     9     9  0.03225807
3      L452817    12    12 -0.09677420
4      L452817    13    13  0.00000000
5      L452817    14    14  0.09677419
6      L910180     0    17 -0.43750000
7      L910180     8    17 -0.34375000
8      L910180     9    17 -0.03125000
9      L910180    10    17 -0.03125000
10     L910180    11    17 -0.09375000
11     L910180    13    17 -0.06250000

What I have tried:

#set the values of t inot another variable
y <- t
#remove pland from the new variable
y <- y[, -4]

#set from long to wide providing the pland differences from t as another column
y %>%
    group_by(year) %>%
    mutate(row = row_number()) %>%
    tidyr::pivot_wider(names_from = year, values_from = landcover) %>%
    select(-row) %>% mutate(across(`2010`:`2019`, ~if(cur_column() == '2019') 
        replace_na(.x, 17) else replace_na(.x, 16))) %>% mutate(t[t$year %in% 2019,]$pland - t[t$year %in% 2010,]$pland)

# A tibble: 11 x 4
   locality_id `2010` `2019` `t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland`
   <chr>        <dbl>  <dbl>                                                       <dbl>
 1 L452817          8      8                                                    -0.0323 
 2 L452817          9      9                                                     0.0323 
 3 L452817         12     12                                                    -0.0968 
 4 L452817         13     13                                                     0      
 5 L452817         14     14                                                     0.0968 
 6 L910180          0     17                                                    -0.373  
 7 L910180          8     17                                                    -0.279  
 8 L910180          9     17                                                     0.485  
 9 L910180         10     17                                                     0.162  
10 L910180         11     17                                                     0.0675 
11 L910180         13     17                                                     0.00202

The problem with my code above, is that it always calculate the differences, it shouldn't calculate differences for those values that have been introduced because of missing values, so when there is either 16 or 17 on either side.

Resources I have tried: One, and two.

reproducible code:

structure(list(year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 
2010L, 2010L, 2010L, 2010L, 2010L, 2019L, 2019L, 2019L, 2019L, 
2019L), locality_id = c("L452817", "L452817", "L452817", "L452817", 
"L452817", "L910180", "L910180", "L910180", "L910180", "L910180", 
"L910180", "L452817", "L452817", "L452817", "L452817", "L452817"
), landcover = c(8L, 9L, 12L, 13L, 14L, 0L, 8L, 9L, 10L, 11L, 
13L, 8L, 9L, 12L, 13L, 14L), pland = c(0.0967741935483871, 0.032258064516129, 
0.612903225806452, 0.193548387096774, 0.0645161290322581, 0.4375, 
0.34375, 0.03125, 0.03125, 0.09375, 0.0625, 0.0645161290322581, 
0.0645161290322581, 0.516129032258065, 0.193548387096774, 0.161290322580645
)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))
question from:https://stackoverflow.com/questions/65872847/separating-list-col-values-into-singular-values-relative-to-a-condition

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Instead of using dummy variables to identify missing I use a different approach with complete with df is your original data structure.

df %>%
  # fill in the data with missing year so we can compute while data in long format
  complete(year, nesting(locality_id, landcover), fill = list(pland = 0)) %>%
  arrange(desc(year)) %>%
  group_by(locality_id, landcover) %>%
  summarize(
    X2010 = if_else(pland[year == 2010] == 0 , 16L, first(landcover)),
    X2019 = if_else(pland[year == 2019] == 0 , 17L, first(landcover)),
    pland  = pland[year == 2019] - pland[year == 2010]) %>%
  arrange(locality_id, landcover)

Here is the output

   locality_id landcover X2010 X2019   pland
   <chr>           <int> <int> <int>   <dbl>
 1 L452817             8     8     8 -0.0323
 2 L452817             9     9     9  0.0323
 3 L452817            12    12    12 -0.0968
 4 L452817            13    13    13  0     
 5 L452817            14    14    14  0.0968
 6 L910180             0     0    17 -0.438 
 7 L910180             8     8    17 -0.344 
 8 L910180             9     9    17 -0.0312
 9 L910180            10    10    17 -0.0312
10 L910180            11    11    17 -0.0938
11 L910180            13    13    17 -0.0625

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...