changes pdf
jeitziner committed Oct 18, 2022
1 parent fd8b4d1 commit 0ba8c84
Showing 29 changed files with 34 additions and 121 deletions.
Binary file not shown.
Binary file removed docs/assets/pdf/Beyond linearity.pdf
Binary file not shown.
Binary file added docs/assets/pdf/Clustering.pdf
Binary file not shown.
Binary file added docs/assets/pdf/Data analysis with R - 080221.pdf
Binary file not shown.
Binary file added docs/assets/pdf/Day3_corr_and_reg.pdf
Binary file not shown.
Binary file not shown.
Binary file removed docs/assets/pdf/GAM - 190821 (2) (2).pptx
Binary file not shown.
Binary file removed docs/assets/pdf/GAM.pdf
Binary file not shown.
Binary file removed docs/assets/pdf/Intro and linear models - 160821.pdf
Binary file not shown.
Binary file not shown.
Binary file removed docs/assets/pdf/Logistic regression and GLM.pdf
Binary file not shown.
Binary file not shown.
Binary file removed docs/assets/pdf/Longitudinal data.pdf
Binary file not shown.
Binary file removed docs/assets/pdf/Longitudinal.pdf
Binary file not shown.
Binary file not shown.
Binary file removed docs/assets/pdf/Regularization.pdf
Binary file not shown.
Binary file not shown.
Binary file removed docs/assets/pdf/day1b-Beyond linearity - 220822.pdf
Binary file not shown.
Binary file not shown.
68 changes: 0 additions & 68 deletions docs/bonus_code.md
Original file line number Diff line number Diff line change
@@ -1,69 +1 @@
## **Bonus code** :champagne_glass:

The following code was added in response to questions from participants in past sessions. It might be useful for you too.

[Download slides](assets/pdf/Regularization.pdf){: .md-button }

## Linear Regression using model selection

### Linear regression

```r
library(MASS)

data(birthwt)
summary(birthwt)

help(birthwt)

colnames(birthwt)
colnames(birthwt) <- c("birthwt.below.2500", "mother.age","mother.weight", "race",
"smoking.status", "nb.previous.prem.labor", "hypertension",
"uterine.irrit","nb.physician.visits", "birthwt.grams")

str(birthwt)
summary(birthwt)
birthwt$race <- as.factor(birthwt$race)
str(birthwt)
summary(birthwt)
```



### Model selection

```r
library(leaps)

best_subset <- regsubsets(birthwt.grams ~ . - birthwt.below.2500, data = birthwt, nvmax = 8)
results <- summary(best_subset)

# Adjusted R-squared
plot(results$adjr2, xlab = "Number of Variables", ylab = "Adjusted R-squared", type = "l")
# Residual sum of squares for each model
plot(results$rss, xlab = "Number of Variables", ylab = "RSS", type = "l")
# R-squared
plot(results$rsq, xlab = "Number of Variables", ylab = "R-squared", type = "l")

which.max(results$adjr2)
```
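
`which.max(results$adjr2)` only returns a model size. As a minimal sketch, assuming the `leaps` package is installed, the coefficients of the best model of that size can be read with `coef()`. To stay self-contained, the block below keeps the original `birthwt` column names (`bwt`, `low`) rather than the renamed ones used above; the names `bw`, `best_size` and `best_coefs` are illustrative.

```r
library(MASS)   # provides the birthwt data
library(leaps)  # provides regsubsets()

data(birthwt)
bw <- birthwt                  # original column names: bwt = birth weight in grams, low = indicator for bwt < 2.5 kg
bw$race <- as.factor(bw$race)

# Best subset selection, excluding the binary indicator derived from the response
best_subset <- regsubsets(bwt ~ . - low, data = bw, nvmax = 8)
results <- summary(best_subset)

# Model size with the highest adjusted R-squared
best_size <- which.max(results$adjr2)

# Intercept and coefficients of the best model of that size
best_coefs <- coef(best_subset, id = best_size)
best_coefs
```

The same `id` argument can be used with any size between 1 and `nvmax`, for example to compare the variables retained at neighbouring sizes.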


### Model selection using the validation set approach

```r
set.seed(1)
# Randomly assign each row to the training (TRUE) or validation (FALSE) set
train <- sample(c(TRUE, FALSE), size = nrow(birthwt), replace = TRUE)
test <- !train

best_subset_train <- regsubsets(birthwt.grams ~ . - birthwt.below.2500, data = birthwt[train, ], nvmax = 8)
test_mat <- model.matrix(birthwt.grams ~ . - birthwt.below.2500, data = birthwt[test, ])

# Validation MSE for the best model of each size
val_errors <- rep(NA, 8)
for (i in 1:8) {
  coefi <- coef(best_subset_train, id = i)
  pred <- test_mat[, names(coefi)] %*% coefi
  val_errors[i] <- mean((birthwt$birthwt.grams[test] - pred)^2)
}
which.min(val_errors)
```
8 changes: 5 additions & 3 deletions docs/day1.md
@@ -5,9 +5,11 @@ In this section, you will find the R code that we will use during the course. We

Slides of lectures:

[Download slides Morning Lecture](assets/pdf/day1a-Intro and linear models - 220822.pdf){: .md-button }
[Download slides Morning Lecture](assets/pdf/Exploratory data analysis - 080221.pdf){: .md-button }

[Download slides Afternoon Lecture](assets/pdf/day1b-Beyond linearity - 220822.pdf){: .md-button }
[Download slides Morning Lecture 2](assets/pdf/Data analysis with R - 080221.pdf){: .md-button }

[Download slides Afternoon Lecture](assets/pdf/Introduction to hypothesis testing - 080221.pdf){: .md-button }

Data for exercises:

@@ -37,7 +39,7 @@ help("mean")

You can also use the alternative syntax
```r
?mean.
?mean
```

If you don't know the exact command name, use
15 changes: 8 additions & 7 deletions docs/day2.md
@@ -4,7 +4,8 @@ In this section, you will find the R code that we will use during the course. We

Slides of lectures:

[Download slides](assets/pdf/day2-Logistic regression and GLM - 230822.pdf){: .md-button }
[Download slides Morning](assets/pdf/Parametric and non parametric tests - 090221.pdf){: .md-button }
[Download slides Afternoon](assets/pdf/ANOVA and confidence intervals - 090221.pdf){: .md-button }


The purpose of this exercise is to help you interpret p-values better by introducing some simple hypothesis testing functions in R. As usual, be sure to read the help documentation for any new functions.
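
One way to build intuition for p-values is to simulate many experiments in which the null hypothesis is true and count how often a test is "significant" anyway. The block below is a minimal sketch using base R only; the seed, the number of simulations and the group size are arbitrary assumptions, not values taken from the course material.

```r
# Simulate many two-sample tests where both groups share the same distribution
set.seed(42)          # arbitrary seed, for reproducibility only
n_tests <- 1000       # hypothetical number of simulated experiments
n_per_group <- 20     # hypothetical sample size per group

p_values <- replicate(n_tests, {
  x <- rnorm(n_per_group)   # both groups are drawn from N(0, 1),
  y <- rnorm(n_per_group)   # so there is no true difference
  t.test(x, y)$p.value      # t.test() performs a Welch test by default
})

# Proportion of tests declared significant at the 5% level
mean(p_values < 0.05)

# Multiple testing corrections
adj_bonf <- p.adjust(p_values, method = "bonferroni")
adj_bh <- p.adjust(p_values, method = "BH")
sum(adj_bonf < 0.05)
sum(adj_bh < 0.05)
```

Changing the mean of one group, for example `rnorm(n_per_group, mean = 1)`, shows how the distribution of p-values shifts when a real difference exists.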
@@ -72,7 +73,7 @@ energy
```
What are the assumptions you need to check before carrying out a test?

??? "done" Answer
??? done "Answer"
```r
# assumption 1: data in each group are normally distributed.

@@ -103,7 +104,7 @@ Paired tests are used when there are two measurements (a 'pair') on the same ind

Any assumptions to be tested ?

??? "done" Answer
??? done "Answer"
```r
# assumption 1: Each of the paired measurements must be obtained from the same subject
# check your sampling design !
@@ -183,7 +184,7 @@ How many tests are significant ? What if you apply a Bonferroni correction ? Wha

Change the parameters of the simulations and see how the p-values are affected.

??? "done" Answer
??? done "Answer"
```r
adj.bonf <- p.adjust(sim.p.welch.test, method="bonf")
sum(adj.bonf < 0.05)
@@ -204,7 +205,7 @@ The dataset comes from Faraway (2002) and comprises a set of 24 blood coagulatio

1. Load the data and explore the dataset

??? "done" Answer
??? done "Answer"
```r
data(coagulation)

@@ -222,7 +223,7 @@ The dataset comes from Faraway (2002) and comprises a set of 24 blood coagulatio


2. Fit an ANOVA model, this also means checking assumptions!
??? "done" Answer
??? done "Answer"
```r
# check normality

@@ -248,7 +249,7 @@ The dataset comes from Faraway (2002) and comprises a set of 24 blood coagulatio
summary(anova_diet)
```
3. Are there differences between the groups? If so, which group(s) differ?
??? "done" Answer
??? done "Answer"
```r
# check pairwise

2 changes: 1 addition & 1 deletion docs/day3.md
@@ -6,7 +6,7 @@ The purpose of these exercises is to introduce you to using R for regression mod

Slides of lectures:

[Download slides](assets/pdf/Longitudinal data.pdf){: .md-button }
[Download slides](assets/pdf/Day3_corr_and_reg.pdf){: .md-button }

## Exercise class

2 changes: 1 addition & 1 deletion docs/day4.md
@@ -4,7 +4,7 @@ In this section, you will find the R code that we will use during the course. We

Slides of lectures:

[Download slides](assets/pdf/GAM.pdf){: .md-button }
[Download slides](assets/pdf/Clustering.pdf){: .md-button }



7 changes: 3 additions & 4 deletions docs/exam.md
@@ -1,15 +1,14 @@
## **EXAM** :scream:

Participants who need credits must answer the following questions and send the results as a commented R script to [email protected] by Friday, 2 September 2022 at the latest.
Participants who need credits must answer the following questions and send the results as a commented R script to [email protected] by February 2023 at the latest.

Data: A set of data collected by Heinz et al.(* Heinz G, Peterson LJ, Johnson RW, Kerk CJ Journal of Statistics Education Volume 11, Number 2 (2003)
jse.amstat.org/v11n2/datasets.heinz.html
Copyright © 2003 by Grete Heinz, Louis J. Peterson, Roger W. Johnson, and Carter J. Kerk, all rights reserved) is available in the file IS_23_exam.csv
jse.amstat.org/v11n2/datasets.heinz.html, by Grete Heinz, Louis J. Peterson, Roger W. Johnson, and Carter J. Kerk, all rights reserved) is available in the file IS_23_exam.csv


Goals: Get to know the overall structure of the data. Summarize variables numerically and graphically. Model relationships between variables.

[Download exercise material](assets/Exercises_IS/IS_23_exam.csv){: .md-button }
[Download exercise material](assets/exercises/IS_23_exam.csv){: .md-button }

## Observations
1. Have a look at the file in a text editor to get familiar with it
Empty file removed docs/exercises.md
Empty file.
22 changes: 12 additions & 10 deletions docs/index.md
@@ -1,32 +1,34 @@
# Advanced statistics: Statistical modelling

# Introduction to statistics with R
## Teachers
Rachel Marcone

Mauro Delorenzi
Joao Lourenco

## Material

* Google doc (through mail)
* [Slack channel](https://slack.com)


### General learning outcomes

At the end of this course, participants will be able to:

- identify the appropriate model to analyze a dataset;
- fit the chosen model using R;
- assess the fit of the model, as well as its limitations.

- have experience in the application of basic statistics techniques
- know how to summarize data with numerical and graphical summaries
- plot data
- do hypothesis testing and multiple testing correction
- linear models
- correlation and regression
- principal component analysis and other topics

### Learning outcomes explained

To reach the general learning outcomes above, we have set a number of smaller learning outcomes. Each chapter starts with these smaller learning outcomes. Use these at the start of a chapter to get an idea what you will learn. Use them also at the end of a chapter to evaluate whether you have learned what you were expected to learn.

## Learning experiences

To reach the learning outcomes we will use lectures, exercises and group work. During exercises, you are free to discuss with other participants. During lectures, focus on the lecture only.
The course will combine lectures on statistics and practical exercises, during which the participants will learn how to work with the widely used "R" language and environment for statistical computing and graphics.

Participants will also have the opportunity to ask questions about the analysis of their own data.

### Exercises

29 changes: 3 additions & 26 deletions docs/precourse.md
@@ -3,37 +3,14 @@

### Previous knowledge

As is stated in the course prerequisites on the [announcement web page](https://www.sib.swiss/component/courses/20220822_ASSM?view=courses_item), this course is intended for people already familiar with basic statistics and R. Participants must be comfortable with topics such as hypothesis testing, correlation and linear models, and must have a prior knowledge of the "R" language and environment for statistical computing and graphics. Participants who have already followed the SIB course ["Introduction to statistics with R"](https://www.sib.swiss/training/course/20220207_STATR) or an equivalent course, and have used its content in practice should fit this prerequisite.

Before applying to this course, please self-assess your knowledge in stats and R to make sure this course is right for you. Here are 2 quizzes:
### Knowledge:

[https://gohighbrow.com/quiz-introduction-to-statistics/](https://gohighbrow.com/quiz-introduction-to-statistics/)

[https://docs.google.com/forms/d/e/1FAIpQLSfXCnmLha0Ks4ZZZ42G_5MyIbGi-JhPayuHZ_P2jdXZEtXdqg/viewform](https://docs.google.com/forms/d/e/1FAIpQLSfXCnmLha0Ks4ZZZ42G_5MyIbGi-JhPayuHZ_P2jdXZEtXdqg/viewform)
No prior statistical knowledge is required in order to attend the course
Participants do not need any experience in R before the course

### Technical

To do the exercises, you are required to have your own computer with at least 4 GB of RAM and an internet connection, as well as the latest version of [R](https://cran.r-project.org/)
and the free version of [RStudio](https://www.rstudio.com/products/rstudio/download/) installed.

Please, install the necessary packages using:

```r
install.packages("akima")
install.packages("car")
install.packages("DAAG")
install.packages("faraway")
install.packages("gam")
install.packages("ggplot2")
install.packages("ISwR")
install.packages("lattice")
install.packages("lme4")
install.packages("nlme")
install.packages("SemiPar")
install.packages("splines")
install.packages("statmod")
install.packages("WWGbook")


```

2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -1,4 +1,4 @@
site_name: Advanced statistics
site_name: Introduction to statistics with R
nav:
- Home: index.md
- Precourse preparation: precourse.md
