Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Normality test in a dataframe with multiple factors

I have a data frame with six columns:

> str(testco)
'data.frame':   78 obs. of  6 variables:
$ id          : chr  "J09-M1" "J09-M2" "J09-M3" "J10-M1" ...
$ group       : Factor w/ 2 levels "Control","CUS": 1 1 1 1 1 1 1 1 1 1 ...
$ sex         : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 1 1 1 2 ...
$ freq.light  : int  5 2 14 6 8 9 5 10 8 7 ...
$ duration    : num  60.1 18.5 151 71.5 118.6 ...
$ latency.dark: num  14 6.12 3.46 33.25 25.67 ...

I want to run a normality test (shapiro.test()) on each of the last three columns, separately for every combination of the two factors in the second and third columns (group and sex). That gives four Shapiro-Wilk tests per numeric column.

I have tried putting the data in a list and using lapply(), but I only get error messages. Is there a function or package that can run the normality test on a data frame grouped by several factor columns?

The data frame is:

> testco
       id   group    sex freq.light duration latency.dark
1  J09-M1 Control   Male          5  60.0832    14.000000
2  J09-M2 Control   Male          2  18.4583     6.124990
3  J09-M3 Control   Male         14 150.9580     3.458330
4  J10-M1 Control   Male          6  71.4999    33.249900
5  J10-M2 Control   Male          8 118.5830    25.666600
6  J10-M3 Control   Male          9 143.2080     5.958320
7  J11-F1 Control Female          5  82.8332    21.541600
8  J11-F2 Control Female         10 112.3750     8.749990
9  J11-F3 Control Female          8  92.7499     6.749990
10 J22-M2 Control   Male          7  63.2499     8.916650
11 J22-M3 Control   Male          5  71.9166     5.499990
12 J22-M4 Control   Male          7  83.3332    19.333300
13 J27-F1 Control Female          8 108.2080     6.374990
14 J27-F2 Control Female          9 116.5830     5.666660
15 J28-M1 Control   Male          9 141.6660    14.875000
16 J28-M2 Control   Male          7 134.9580     6.708320
17 J28-M3 Control   Male          8 104.3750     2.083330
18 J29-F1 Control Female          5  84.9999    11.000000
19 J29-F2 Control Female          7  74.2082    30.749900
20 J29-F3 Control Female          7  88.9165     1.375000
21 J12-F1     CUS Female         12  93.0832    15.500000
22 J12-F2     CUS Female          9  59.6249     4.749990
23 J12-F3     CUS Female         12 151.2500    12.458300
24 J12-F4     CUS Female          7  83.9165     1.208330
25 J13-M1     CUS   Male         12 114.0410     4.291660
26 J13-M2     CUS   Male         15 138.2500     1.333330
27 J13-M3     CUS   Male         12 118.7910     1.750000
28 J13-M4     CUS   Male         15  95.4582     2.458330
29 J14-M1     CUS   Male         11 114.3750     7.124990
30 J14-M2     CUS   Male         20 147.7910     4.791660
31 J14-M3     CUS   Male         14 135.5410    12.750000
32 J14-M4     CUS   Male         11  89.5415     5.124990
33 J14-M5     CUS   Male         20 135.2080     0.749998
34 J15-M1     CUS   Male          9 104.6660    12.916600
35 J15-M2     CUS   Male         10 105.4580     4.333330
36 J15-M3     CUS   Male         12 113.0000     2.416660
37 J15-M4     CUS   Male         11 104.7500    11.250000
38 J16-M1     CUS   Male         14 134.8750     5.916660
39 J16-M2     CUS   Male         19 153.0410     6.041660
40 J16-M3     CUS   Male         14 112.9580     7.958320
41 J16-M4     CUS   Male         15 141.2910     6.458320
42 J16-M5     CUS   Male         12 162.0000     1.958330
43 J17-M1     CUS   Male         12 149.1250    23.416600
44 J17-M2     CUS   Male          9 110.4160     2.250000
45 J17-M3     CUS   Male          9 116.2500    11.333300
46 J17-M4     CUS   Male         13 130.1250     5.958320
47 J18-M1     CUS   Male          9  82.3749     9.458320
48 J18-M2     CUS   Male          7  54.7916     6.333320
49 J18-M3     CUS   Male         17 172.9580     2.958330
50 J19-M1     CUS   Male          8  96.2498     3.666660
51 J19-M2     CUS   Male         10  71.3332     9.333320
52 J19-M3     CUS   Male          9  65.7499     5.166660
53 J19-M4     CUS   Male         15 125.5410     5.791660
54 J20-M1     CUS   Male          6 100.6670     7.499990
55 J20-M2     CUS   Male         10 129.8330     4.499990
56 J20-M3     CUS   Male         14 166.4160     3.416660
57 J20-M4     CUS   Male         10 116.6660     9.999980
58 J21-M1     CUS   Male         10  61.0832    14.208300
59 J21-M2     CUS   Male          9  67.9999    11.375000
60 J21-M3     CUS   Male         11 161.6250     4.291660
61 J21-M4     CUS   Male         10 110.4580     3.083330
62 J23-F1     CUS Female         13 129.8750     8.166650
63 J23-F2     CUS Female         17 137.0830     5.916660
64 J23-F3     CUS Female         12 139.6250    15.666600
65 J24-F1     CUS Female         12 103.7080     5.416660
66 J24-F2     CUS Female         13 109.3750     3.124990
67 J24-F3     CUS Female         16 152.8330     7.583320
68 J24-F4     CUS Female         16 138.7500     2.708330
69 J25-F1     CUS Female          4  37.9166    16.750000
70 J25-F2     CUS Female         12 153.9160     7.333320
71 J25-F3     CUS Female         13 122.7080     4.708320
72 J25-F4     CUS Female         10  76.0415     4.791660
73 J25-F5     CUS Female          9 117.7080    12.416600
74 J26-F1     CUS Female         11  87.4999     3.291660
75 J26-F2     CUS Female          7  92.8332     4.791660
76 J26-F3     CUS Female         10  81.2499     4.249990
77 J26-F4     CUS Female         11  84.2499     6.041660
78 J26-F5     CUS Female          8  65.6666     4.166660

I would like a table, or something similar, that looks like this:

freq.light
Group    sex    p.value
Control  Male   0.02
Control  Female 0.06
CUS      Male   0.05
CUS      Female 0.01
Question from: https://stackoverflow.com/questions/65877285/normality-test-in-a-dataframe-with-multiple-factors

1 Answer

You can reshape the data frame to long format, group it by group, sex and variable name, apply shapiro.test() within each group, and extract the p-value from each result.

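library(dplyr)
library(tidyr)

result <- testco %>%
  select(-id) %>%
  # stack the three numeric columns into name/value pairs
  pivot_longer(cols = freq.light:latency.dark) %>%
  group_by(group, sex, name) %>%
  # one Shapiro-Wilk p-value per group/sex/variable combination
  summarise(val = shapiro.test(value)$p.value, .groups = "drop")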

result

#   group   sex    name          val
#   <chr>   <chr>  <chr>         <dbl>
# 1 Control Female duration     0.482   
# 2 Control Female freq.light   0.561   
# 3 Control Female latency.dark 0.0877  
# 4 Control Male   duration     0.409   
# 5 Control Male   freq.light   0.350   
# 6 Control Male   latency.dark 0.0681  
# 7 CUS     Female duration     0.449   
# 8 CUS     Female freq.light   0.662   
# 9 CUS     Female latency.dark 0.00586 
#10 CUS     Male   duration     0.622   
#11 CUS     Male   freq.light   0.0360  
#12 CUS     Male   latency.dark 0.000744
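
If you would rather stay in base R, or want one row per group/sex combination with a p-value column for each variable (closer to the table sketched in the question), something along these lines may work. This is only a sketch: `toy` is a made-up stand-in for `testco` with the same column names, and `p_shapiro` is a helper name introduced here for illustration.

```r
# Toy stand-in for testco: made-up values, same column layout as in the question
set.seed(1)
toy <- data.frame(
  group        = factor(rep(c("Control", "CUS"), each = 20)),
  sex          = factor(rep(c("Female", "Male"), times = 20)),
  freq.light   = rpois(40, lambda = 10),
  duration     = rnorm(40, mean = 100, sd = 30),
  latency.dark = rexp(40, rate = 0.1)
)

# Hypothetical helper: keep only the p-value of the Shapiro-Wilk test.
# shapiro.test() needs between 3 and 5000 non-missing values per cell.
p_shapiro <- function(x) shapiro.test(x)$p.value

# cbind() on the left-hand side applies the helper to each numeric column
# within every group x sex combination, giving one row per combination
p_wide <- aggregate(
  cbind(freq.light, duration, latency.dark) ~ group + sex,
  data = toy,
  FUN  = p_shapiro
)

p_wide
```

With the real data, replacing `toy` with `testco` should give a four-row data frame; selecting `group`, `sex` and a single variable column then yields the per-variable table asked for.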
