STATISTICS - GROUP COMPARISONS

The aim of this repository is to show the statistical workflow for comparing groups of data in order to test whether there are significant differences between them. The data are borrowed from my master's thesis (in Spanish), which was a first approach to establishing background tissue concentrations of metals in macroinvertebrates from rivers in mining areas of northern Spain. Here we only perform a statistical analysis comparing three taxa to determine whether their metal levels really differ. The three taxa are: Heptageniidae (scraper), Hydropsychidae (collector-filterer) and Rhyacophilidae (predator).


WORKFLOW

[Figure: workflow diagram]

DESCRIPTIVE ANALYSIS

The first step is always to take a quick look at the basic features of the data by performing some descriptive analysis.

> summary(mydata) # quick first look at global data

 Taxa                As               Se               Cd              Hg        
 Heptageniidae :27   Min.   :0.1100   Min.   : 0.500   Min.   :0.010   Min.   :0.0300  
 Hydropsychidae:23   1st Qu.:0.6325   1st Qu.: 1.930   1st Qu.:0.080   1st Qu.:0.0700  
 Rhyacophilidae:20   Median :1.0900   Median : 3.605   Median :0.255   Median :0.1000  
                     Mean   :1.9741   Mean   : 4.961   Mean   :0.622   Mean   :0.1359  
                     3rd Qu.:2.9775   3rd Qu.: 6.478   3rd Qu.:0.865   3rd Qu.:0.1500  
                     Max.   :7.2300   Max.   :20.280   Max.   :2.820   Max.   :0.5200  

[Figure: metal concentrations by taxon]

At first glance, there may be significant differences in the following cases:

  • As -> the three taxa
  • Se -> Hepta vs Hydro and Rhya
  • Cd -> Hepta vs Hydro and Rhya
  • Hg -> none

In the next step we will test whether these hypotheses hold.
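
Since `summary(mydata)` pools the three taxa, a per-taxon summary helps when forming hypotheses like the ones listed above. The following is a minimal sketch on simulated data: the `sim` data frame, its values and the `by_taxon` name are assumptions for illustration, not the thesis data.

```r
# Simulated stand-in for the thesis data (values are NOT the real measurements)
set.seed(42)
group_sizes <- c(Heptageniidae = 27, Hydropsychidae = 23, Rhyacophilidae = 20)
sim <- data.frame(
  Taxa = factor(rep(names(group_sizes), times = group_sizes)),
  As   = rlnorm(70, meanlog = rep(c(1.2, 0.2, -0.4), times = group_sizes), sdlog = 0.5)
)

# Median and interquartile range of As for each taxon
by_taxon <- aggregate(As ~ Taxa, data = sim,
                      FUN = function(x) c(median = median(x), iqr = IQR(x)))
print(by_taxon)
```

The same call can be repeated for Se, Cd and Hg, or the formula can be extended with `cbind(As, Se, Cd, Hg) ~ Taxa`.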

STATISTICAL ANALYSIS

NORMAL DISTRIBUTION

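To choose the right statistical test we need to know whether our data follow a normal distribution. For this we'll use three checks: the Shapiro-Wilk test and skewness for each taxon and metal, and Levene's test for homogeneity of variance across taxa for each metal.

These are the results: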

--------------------------------------------------
      Taxa        Metal   Shapiro.Wilk   Skewness 
---------------- ------- -------------- ----------
 Heptageniidae     As        0.292         0.23   

 Heptageniidae     Se        0.216         0.77   

 Heptageniidae     Cd        0.456         0.3    

 Heptageniidae     Hg       *0.009*        1.54   

 Hydropsychidae    As       *0.004*        0.68   

 Hydropsychidae    Se        0.334         0.76   

 Hydropsychidae    Cd        0.106       **2.6**  

 Hydropsychidae    Hg         *0*          1.74   

 Rhyacophilidae    As       *0.001*        0.28   

 Rhyacophilidae    Se         *0*          0.33   

 Rhyacophilidae    Cd         *0*          1.81   

 Rhyacophilidae    Hg       *0.001*        1.53   
--------------------------------------------------

Table: *p<0.01; **skewness outside the range -2 to 2
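
A table like the one above can be built by running `shapiro.test()` and a skewness calculation on each taxon separately. This is only a sketch on simulated Cd values: the `skewness()` helper uses a simple moment-based definition, which is an assumption and may differ slightly from the estimator used in the original analysis.

```r
# Simulated Cd values (NOT the real measurements): one roughly normal group,
# two right-skewed groups
set.seed(1)
sim <- data.frame(
  Taxa = rep(c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae"),
             times = c(27, 23, 20)),
  Cd   = c(rnorm(27, mean = 1.5, sd = 0.4), rlnorm(23, -2, 0.9), rlnorm(20, -2, 0.9))
)

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 1.5
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# One row per taxon with the Shapiro-Wilk p-value and the skewness
normality <- do.call(rbind, lapply(split(sim$Cd, sim$Taxa), function(x) {
  data.frame(Shapiro.Wilk = round(shapiro.test(x)$p.value, 3),
             Skewness     = round(skewness(x), 2))
}))
print(normality)
```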


---------------------
 Metal   Levene.Test 
------- -------------
  As         *0*     

  Se         *0*     

  Cd         *0*     

  Hg        0.154    
---------------------

Table: *p<0.01
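
Levene's test can be sketched in base R as a one-way ANOVA on the absolute deviations of each observation from its group centre. The hypothetical `levene_p()` helper below uses the group median as the centre (the Brown-Forsythe variant, which is also the default of `car::leveneTest`); whether the original analysis used the median or the mean is not stated, so treat that as an assumption. The data are simulated.

```r
# Levene-type test: one-way ANOVA on absolute deviations from each group's median
levene_p <- function(y, g) {
  g <- factor(g)
  abs_dev <- abs(y - ave(y, g, FUN = median))
  anova(lm(abs_dev ~ g))[["Pr(>F)"]][1]
}

# Simulated Se values (NOT the real measurements): the first group is far more variable
set.seed(2)
taxa   <- rep(c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae"),
              times = c(27, 23, 20))
se_sim <- c(rnorm(27, 9, 5), rnorm(23, 3, 0.5), rnorm(20, 4, 0.5))

p_unequal <- levene_p(se_sim, taxa)
print(round(p_unequal, 3))
```

A small p-value here means the group variances differ, which is the situation reported for As, Se and Cd in the table above.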

As we can see, some of the variables fit a normal distribution and others don't. To better understand the differences we'll create a visualization for each of the checks. For Shapiro-Wilk we can draw a QQ plot, which is commonly used to detect deviations from the normal distribution.

[Figure: QQ plots]

Cd in Heptageniidae follows the reference line of the QQ plot; Cd in Rhyacophilidae does not.
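
A QQ plot of this kind can be drawn with `qqnorm()` and `qqline()` from base R. The two samples below are simulated to mimic one roughly normal and one right-skewed case; they are not the real Cd values.

```r
set.seed(3)
cd_normal <- rnorm(27, mean = 1.5, sd = 0.4)       # roughly normal sample (simulated)
cd_skewed <- rlnorm(20, meanlog = -2, sdlog = 1)   # right-skewed sample (simulated)

op <- par(mfrow = c(1, 2))                          # two panels side by side
qq_normal <- qqnorm(cd_normal, main = "Roughly normal (simulated)")
qqline(cd_normal)                                   # reference line through the quartiles
qq_skewed <- qqnorm(cd_skewed, main = "Right-skewed (simulated)")
qqline(cd_skewed)
par(op)                                             # restore the previous layout
```

Points that bend away from the reference line at the upper end are the typical signature of a right-skewed sample.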

We can visualize the skewness with histograms.

[Figure: histograms with density lines]

In the case of Hydropsychidae the density is concentrated on the left with a long tail to the right, which is what a positive skewness value indicates.
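
A histogram with a density line can be produced in base R as in the sketch below. The sample is simulated from a log-normal distribution to imitate a right-skewed variable; it is not the real Hydropsychidae data.

```r
set.seed(4)
cd_hydro <- rlnorm(23, meanlog = -1.5, sdlog = 1)  # simulated right-skewed sample

# freq = FALSE puts the histogram on the density scale so the line is comparable
h <- hist(cd_hydro, freq = FALSE, main = "Cd (simulated)", xlab = "Concentration")
dens <- density(cd_hydro)                           # kernel density estimate
lines(dens)

# In right-skewed data the mean typically sits above the median
print(c(mean = mean(cd_hydro), median = median(cd_hydro)))
```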

Finally, to better understand Levene's test we can build some boxplots and pay attention to the whiskers.

[Figure: boxplots by taxon and metal]

Heptageniidae shows a lot of variance in all metals except Hg, which is why Levene's test rejects homogeneity of variance for the first three metals.
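
The boxplots can be drawn with `boxplot()` and a formula. In this sketch the Se values are simulated so that the first taxon is much more variable than the others, which is the pattern Levene's test picks up; the numbers are not the real measurements.

```r
set.seed(5)
sim <- data.frame(
  Taxa = factor(rep(c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae"),
                    times = c(27, 23, 20))),
  Se   = c(rnorm(27, 9, 4), rnorm(23, 3, 1), rnorm(20, 4, 1))
)

bp <- boxplot(Se ~ Taxa, data = sim, ylab = "Se (simulated)")

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
# Box height and whisker span give a rough picture of the spread in each taxon.
spread <- rbind(box_height   = bp$stats[4, ] - bp$stats[2, ],
                whisker_span = bp$stats[5, ] - bp$stats[1, ])
colnames(spread) <- bp$names
print(round(spread, 2))
```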

NOTE: One important thing to consider is the number of samples we are working with: fewer than 30 for each metal and taxon. The smaller the sample, the less likely it is to look normally distributed, and outliers can have a big impact.

DATA TRANSFORMATION AND SECOND ROUND CHECKING DISTRIBUTION

None of the metals passes all three checks for all taxa. Transforming the data may bring it closer to a normal distribution, so that's the next step. Note that Cd in Hydropsychidae showed strong skewness (2.6); a log transformation usually reduces right skew but does not guarantee normality, so Cd is already a likely candidate for non-parametric tests.

After a logarithmic transformation these are the results for Shapiro-Wilk and Levene:

---------------------------------------
      Taxa        Metal   Shapiro.Wilk 
---------------- ------- --------------
 Heptageniidae     As         *0*      

 Heptageniidae     Se        0.912     

 Heptageniidae     Cd        0.449     

 Heptageniidae     Hg         0.18     

 Hydropsychidae    As        0.243     

 Hydropsychidae    Se        0.383     

 Hydropsychidae    Cd       *0.002*    

 Hydropsychidae    Hg        0.162     

 Rhyacophilidae    As        0.959     

 Rhyacophilidae    Se        0.315     

 Rhyacophilidae    Cd        0.815     

 Rhyacophilidae    Hg        0.365     
---------------------------------------

Table: *p<0.01

---------------------
 Metal   Levene.Test 
------- -------------
  As        0.609    

  Se        0.597    

  Cd        0.551    

  Hg        0.919    
---------------------

Table: *p<0.01

So now all cases pass Levene's test, and only two cases fail Shapiro-Wilk: As (in Heptageniidae) and Cd (in Hydropsychidae). This means we can run parametric tests on the transformed data for Se and Hg, and non-parametric tests on the original data for As and Cd.
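
The transform-and-recheck step can be sketched as follows. The base of the logarithm is not stated in the text, so the natural log used here is an assumption, and the sample is simulated rather than taken from the thesis data.

```r
set.seed(6)
as_raw <- rlnorm(23, meanlog = 0, sdlog = 0.8)  # simulated, log-normal by construction
as_log <- log(as_raw)                           # requires strictly positive values

# Shapiro-Wilk p-values before and after the transformation
sw <- c(raw = shapiro.test(as_raw)$p.value,
        log = shapiro.test(as_log)$p.value)
print(round(sw, 3))
```

Because the transformation is monotonic it keeps the ordering of the observations, so rank-based tests such as Kruskal-Wallis give the same result on raw and log-transformed values.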

SIGNIFICANT DIFFERENCES?

PARAMETRIC TESTS

First we run a global test to see whether there are significant differences between groups, in this case an ANOVA. If the p-value is <0.05 there are differences, and we follow up with pairwise comparisons as a post-hoc analysis, in this case with the Bonferroni correction.

Results:

---------------
 Metal   Anova 
------- -------
  Se      *0*  

  Hg     0.057 
---------------

Table: *p<0.05

--------------------------------------------------
          Comparisons               Se       Hg   
-------------------------------- --------- -------
 Heptageniidae - Hydropsychidae     *0*     0.068 

 Heptageniidae - Rhyacophilidae   *0.004*   0.28  

 Hydropsychidae - Rhyacophilidae  *0.001*     1                            
--------------------------------------------------

Table: *p<0.05
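
The parametric route can be sketched with `aov()` for the global test and `pairwise.t.test()` with `p.adjust.method = "bonferroni"` for the post-hoc comparisons. The `logSe` values are simulated, and using pooled-variance pairwise t-tests as the Bonferroni post-hoc is an assumption about how the original comparison was run.

```r
# Simulated log-transformed Se values (NOT the real measurements)
set.seed(7)
sim <- data.frame(
  Taxa  = factor(rep(c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae"),
                     times = c(27, 23, 20))),
  logSe = c(rnorm(27, 2.2, 0.5), rnorm(23, 0.8, 0.5), rnorm(20, 1.4, 0.5))
)

# Global test: one-way ANOVA
fit     <- aov(logSe ~ Taxa, data = sim)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]
print(anova_p)

# Post-hoc: pairwise t-tests with Bonferroni-adjusted p-values
posthoc <- pairwise.t.test(sim$logSe, sim$Taxa, p.adjust.method = "bonferroni")
print(round(posthoc$p.value, 3))
```

The post-hoc result is a lower-triangular matrix with one adjusted p-value per pair of taxa.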

NON-PARAMETRIC TESTS

As non-parametric tests we'll use Kruskal-Wallis as the global comparison and Dunn's test as the post-hoc pairwise comparison.

Results:

------------------------
 Metal   Kruskal.Wallis 
------- ----------------
  As          *0*       

  Cd          *0*       
------------------------

Table: *p<0.05

--------------------------------------------------
          Comparisons               As       Cd   
-------------------------------- --------- -------
 Heptageniidae - Hydropsychidae     *0*      *0*  

 Heptageniidae - Rhyacophilidae     *0*      *0*  

 Hydropsychidae - Rhyacophilidae  *0.001*   0.079 
--------------------------------------------------

Table: *p<0.05
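
Kruskal-Wallis is available in base R as `kruskal.test()`. Dunn's test is not, so the sketch below implements it by hand in a hypothetical `dunn_bonferroni()` helper: z statistics from differences in mean ranks with a tie correction, two-sided p-values, and Bonferroni adjustment. Packages such as `dunn.test` or `FSA` offer ready-made versions whose defaults (for example one-sided versus two-sided p-values) may differ from this sketch. The data are simulated.

```r
dunn_bonferroni <- function(y, g) {
  g <- factor(g)
  n_total   <- length(y)
  r         <- rank(y)                       # mid-ranks are used for ties
  mean_rank <- tapply(r, g, mean)
  n_group   <- tapply(r, g, length)
  ties      <- as.numeric(table(y))
  # Variance term of the rank differences, corrected for ties
  sigma2 <- n_total * (n_total + 1) / 12 -
    sum(ties^3 - ties) / (12 * (n_total - 1))
  pairs <- combn(levels(g), 2)               # one column per pair of groups
  z <- vapply(seq_len(ncol(pairs)), function(k) {
    a <- pairs[1, k]; b <- pairs[2, k]
    as.numeric((mean_rank[a] - mean_rank[b]) /
                 sqrt(sigma2 * (1 / n_group[a] + 1 / n_group[b])))
  }, numeric(1))
  p_raw <- 2 * pnorm(-abs(z))                # two-sided p-values
  data.frame(Comparison = paste(pairs[1, ], pairs[2, ], sep = " - "),
             Z = z, p.adj = pmin(1, p_raw * ncol(pairs)))
}

# Simulated As values (NOT the real measurements)
set.seed(8)
taxa   <- rep(c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae"),
              times = c(27, 23, 20))
as_sim <- rlnorm(70, meanlog = rep(c(1.2, 0.2, -0.6), times = c(27, 23, 20)), sdlog = 0.5)

kw_p <- kruskal.test(as_sim, factor(taxa))$p.value   # global test
dunn <- dunn_bonferroni(as_sim, taxa)                # post-hoc comparisons
print(kw_p)
print(dunn)
```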

Now that we know which cases have significant differences, we can revisit the earlier boxplots and annotate them with these results.

[Figure: boxplots annotated with significant differences]

PCA

As a final step we can check the relationship between the metals and the taxa with a PCA.

[Figure: PCA plot]

So the most influential metals in taxa differentiation are As and Se; Cd has some power to differentiate Heptageniidae; Hg shows the same accumulation levels in the three taxa.
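
A PCA of this kind can be sketched with `prcomp()`. The metal values are simulated, and log-transforming and scaling the variables before the PCA is an assumption: the text does not say how the data were prepared for this step.

```r
# Simulated metal concentrations (NOT the real measurements)
set.seed(9)
n    <- c(27, 23, 20)
taxa <- factor(rep(c("Heptageniidae", "Hydropsychidae", "Rhyacophilidae"), times = n))
metals <- data.frame(
  As = rlnorm(70, rep(c(1.2, 0.2, -0.4), times = n), 0.5),
  Se = rlnorm(70, rep(c(2.2, 0.8, 1.4), times = n), 0.5),
  Cd = rlnorm(70, rep(c(0.3, -2.0, -2.0), times = n), 0.7),
  Hg = rlnorm(70, -2.2, 0.6)                 # same distribution in all taxa
)

# PCA on log-transformed, centred and scaled variables
pca <- prcomp(log(metals), center = TRUE, scale. = TRUE)

# Share of variance per component and loadings of each metal on PC1 and PC2
print(round(summary(pca)$importance["Proportion of Variance", ], 3))
print(round(pca$rotation[, 1:2], 2))

# Scores on the first two components, coloured by taxon
plot(pca$x[, 1:2], col = as.integer(taxa), pch = 19)
legend("topright", legend = levels(taxa), col = 1:3, pch = 19)
```

Metals with large absolute loadings on the first components are the ones that separate the taxa most in this simulated example.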

THE END!!!