
T109 - Data analysis

Biology Master, ENS
Year : 1 (M1)
Semester : 1 (S1)

Course code : BIO-M1-T109-S1

Course name : Data analysis

Schedule : 2019 provisional schedule

Coordinators :
Emeline Perthame (Institut Pasteur)
Lucie Zinger (ENS)

ECTS : 3

*** Registration required : course with a limited number of students***
To enrol, contact the course coordinators before September.

Keywords : Statistical inference (estimation, hypothesis testing), linear regression, analysis of variance, multivariate analyses (Principal Component Analysis, clustering)

Prerequisites for the course :
Basics in R/Markdown programming
Basics in statistics (e.g. sampling, random variables, discrete and continuous distributions, quantiles, etc.).
Newly arrived M1/M2 students wishing to enrol in this course must also enrol in and attend the course “BIO-M2-E01-S1 Training in mathematics and computer science”.

Course objectives and description :
Biological data are often complex and challenging to analyze due to non-normal distributions, nonlinear relationships, spatial/temporal structures, and high dimensionality, in particular in the era of Big Data.
This course will introduce students to key concepts and statistical tools for the experimental design and analysis of biological data. More specifically, students will become familiar with hypothesis testing, univariate statistical tests (e.g. ANOVA), linear models, and descriptive multivariate analyses such as Principal Component Analysis (PCA) and clustering. These methods will be illustrated with current questions and data types in biology (e.g. “omics” data), and their associated analytical challenges will be introduced.
The course will alternate between theoretical aspects and computer exercises on small datasets using RStudio. Students will be assigned a small project involving the concepts and tools covered in the course.
Because of the 1st November holiday, please note that this course will also take place on the afternoon of Thursday 31st October.

Assessment / evaluation :
- Continuous evaluation (daily MCQs)
- A short written report on the project, to be sent to the coordinators before 12:00 pm on 4th November.

Course material (hand-outs, online presentation available, …) : Course presentations will be made available on the ENS owncloud.
The course will take place in the computer room. Training exercises will be conducted on the computers provided there, which have all the necessary software (R and RStudio) and R packages installed. Requests to install software or packages on personal laptops will not be considered during the course. Students who need this are invited to contact the computer service afterwards.

Suggested readings in relationship with the module content (textbook chapters, reviews, articles) :
French :
Poinsot, D. (2005). Statistiques pour statophobes. Université de Rennes
Millot, G. (2018). Comprendre et réaliser les tests statistiques à l’aide de R : manuel de biostatistique. De Boeck Supérieur.
Pagès, J. (2010). Statistiques générales pour utilisateurs. Presses Universitaires de Rennes.
English equivalents :
Van Emden, H. (2012). Statistics for terrified biologists. John Wiley & Sons.
Crawley, M. J. (2005). Statistics : An Introduction using R.
Holmes, S., Huber, W., & Martin, T. (2017). Modern statistics for modern biology.
Online course in English :
http://rafalab.github.io/pages/harvardx.html