Simplified explanation
Convert from long to wide, whilst filling missing values as 17
for 2019
and 16
for 2010
, whilst those values in 2010
that match with 2019
, then minus their pland value (i.e. 2019-2010). If there are no values for 2019 and its filled in with 17
, give that pland value a negative
value. Whilst, if 16
is filled for the missing value in 2010
leave the pland value as is, positive
.
This should look like table 2.
Table 1: Example of the dataframe in long format
# A tibble: 10 x 4
year locality_id landcover pland
<chr> <chr> <int> <dbl>
1 2010 L452817 8 0.0968
2 2010 L452817 9 0.0323
3 2010 L452817 12 0.613
4 2010 L452817 13 0.194
5 2010 L452817 14 0.0645
6 2019 L452817 8 0.0645
7 2019 L452817 9 0.0645
8 2019 L452817 12 0.516
9 2019 L452817 13 0.194
10 2019 L452817 14 0.161
Table 2: Expected format of table 2
locality_id X2010 X2019 pland
1 L452817 8 8 -0.03225806
2 L452817 9 9 0.03225807
3 L452817 12 12 -0.09677420
4 L452817 13 13 0.00000000
5 L452817 14 14 0.09677419
6 L910180 0 17 -0.43750000
7 L910180 8 17 -0.34375000
8 L910180 9 17 -0.03125000
9 L910180 10 17 -0.03125000
10 L910180 11 17 -0.09375000
11 L910180 13 17 -0.06250000
What I have tried:
#set the values of t inot another variable
y <- t
#remove pland from the new variable
y <- y[, -4]
#set from long to wide providing the pland differences from t as another column
y %>%
group_by(year) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = year, values_from = landcover) %>%
select(-row) %>% mutate(across(`2010`:`2019`, ~if(cur_column() == '2019')
replace_na(.x, 17) else replace_na(.x, 16))) %>% mutate(t[t$year %in% 2019,]$pland - t[t$year %in% 2010,]$pland)
# A tibble: 11 x 4
locality_id `2010` `2019` `t[t$year %in% 2019, ]$pland - t[t$year %in% 2010, ]$pland`
<chr> <dbl> <dbl> <dbl>
1 L452817 8 8 -0.0323
2 L452817 9 9 0.0323
3 L452817 12 12 -0.0968
4 L452817 13 13 0
5 L452817 14 14 0.0968
6 L910180 0 17 -0.373
7 L910180 8 17 -0.279
8 L910180 9 17 0.485
9 L910180 10 17 0.162
10 L910180 11 17 0.0675
11 L910180 13 17 0.00202
The problem with my code above, is that it always calculate the differences, it shouldn't calculate differences for those values that have been introduced because of missing values, so when there is either 16
or 17
on either side.
Resources I have tried: One, and two.
reproducible code:
structure(list(year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2010L, 2010L, 2010L, 2010L, 2010L, 2019L, 2019L, 2019L, 2019L,
2019L), locality_id = c("L452817", "L452817", "L452817", "L452817",
"L452817", "L910180", "L910180", "L910180", "L910180", "L910180",
"L910180", "L452817", "L452817", "L452817", "L452817", "L452817"
), landcover = c(8L, 9L, 12L, 13L, 14L, 0L, 8L, 9L, 10L, 11L,
13L, 8L, 9L, 12L, 13L, 14L), pland = c(0.0967741935483871, 0.032258064516129,
0.612903225806452, 0.193548387096774, 0.0645161290322581, 0.4375,
0.34375, 0.03125, 0.03125, 0.09375, 0.0625, 0.0645161290322581,
0.0645161290322581, 0.516129032258065, 0.193548387096774, 0.161290322580645
)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))
question from:
https://stackoverflow.com/questions/65872847/separating-list-col-values-into-singular-values-relative-to-a-condition