
T109 - Data analysis

M1-T109_Data Analysis Planning 2022

Master in Life Science, ENS
Bio-M2_ M1-T109-S1 | Data analysis
Year and Semester: M1 | S1
Where: Biology department, ENS
Duration: 30 hours
First and last day of class: November 21st-25th, 2022
Hours: 09:00-12:00 | 14:00-17:00
Maximum class size: 18 students
This course is open to external students.
*** Registration required: course with a limited number of students ***
To enrol, contact the course coordinators before September.
Contact: Benoît Perez-Lamarque

Coordination

Emeline Perthame, Institut Pasteur
Benoît Perez-Lamarque, ENS

Credits

3 ECTS

Keywords

Statistical inference (estimation, hypothesis testing) | Linear regression | Analysis of variance | Multivariate analyses (Principal Components Analysis, clustering)

Course prerequisites

Basics of R/Markdown programming
Basics of statistics (e.g. sampling, random variables, discrete and continuous distributions, quantiles, etc.).
Newly arrived M1/M2 students wishing to enrol in this course must also enrol in, and attend, the course “BIO-M2-E01-S1 Training in mathematics and computer science”.

Course objectives and description

Aims: Biological data are often complex and challenging to analyze due to non-normal distributions, nonlinear relationships, spatial/temporal structures, and high dimensionality, in particular in the era of Big Data.
This course will introduce the students to key concepts and statistical tools for the experimental design and analysis of biological data.
Themes: More specifically, the students will become familiar with hypothesis testing, univariate statistical tests (e.g. ANOVA), linear models, and descriptive multivariate analyses such as Principal Component Analysis (PCA) and clustering. These methods will be illustrated with current questions and data types in biology (e.g. “omics” data), and their associated analytical challenges will be introduced.
Organisation: The course will alternate between theoretical sessions and computer exercises on small datasets using RStudio. The students will be assigned a small project involving the concepts and tools covered in the course.
Note that Thursday 24th November afternoon will be an open session to answer your questions.

Assessment

A short written report on the project must be sent to the coordinators before 4 December 2022.

Course material

The course will be given on site. Students will need a login from the Biology Department of the ENS to view the course material and/or participate in the practical sessions. More information will be sent to attendees by email.

Suggested readings related to the module content

French:
Poinsot, D. (2005). Statistiques pour statophobes. Université de Rennes.
Millot, G. (2018). Comprendre et réaliser les tests statistiques à l’aide de R : manuel de biostatistique. De Boeck Supérieur.
Pagès, J. (2010). Statistiques générales pour utilisateurs. Presses Universitaires de Rennes.

English equivalents:
Van Emden, H. (2012). Statistics for terrified biologists. John Wiley & Sons.
Crawley, M. J. (2005). Statistics: An introduction using R.
Holmes, S., Huber, W., & Martin, T. (2017). Modern statistics for modern biology. https://web.stanford.edu/class/bios221/book/

Online course in English
http://rafalab.github.io/pages/harvardx.html