T109 - Data analysis

Master in Life Science, ENS
Bio-M2-M1-T109-S1 | Data analysis
Year and semester: M1 | S1
Where: Biology department, ENS
Duration: 30 hours
Maximum class size: 23 students
This course is open to external students.
*** Registration required: course with a limited number of students ***
To enrol, contact the course coordinators before September.
Contact: Benoît Perez-Lamarque

Coordination

Emeline Perthame, Institut Pasteur
Benoît Perez-Lamarque, ENS

Credits

3 ECTS

Keywords

Statistical inference (estimation, hypothesis testing) | Linear regression | Analysis of variance | Multivariate analyses (Principal Component Analysis, clustering)

Course prerequisites

Basics of R and R Markdown programming.
Experience importing and manipulating data frames with the dplyr package and making plots with the ggplot2 package is useful but not required.

Basics of statistics (e.g. sampling, random variables, discrete and continuous distributions, quantiles).
Newly arrived M1/M2 students wishing to enrol in this course must also enrol in, and attend, the course “BIO-M2-E01-S1 Mathematics and programming training”.

Course objectives and description

Aims: Biological data are often complex and challenging to analyze because of non-normal distributions, nonlinear relationships, spatial and temporal structure, and high dimensionality, particularly in the era of Big Data.
This course introduces students to key concepts and statistical tools for experimental design and the analysis of biological data.
Themes: More specifically, students will become familiar with hypothesis testing, univariate statistical tests (e.g. ANOVA), linear models, and descriptive multivariate analyses such as Principal Component Analysis (PCA) and clustering. These methods will be illustrated with current questions and data types in biology (e.g. “omics” data), and their associated analytical challenges will be introduced.
Organisation: The course alternates between theory and computer exercises on small datasets using RStudio. Students will be assigned a small project involving the concepts and tools covered in the course.
Note that one morning (November 9) will be an open session to answer your questions.

Assessment

A short written report on the project must be sent to the coordinators before December 31, 2023.

Course material

The course will be given on site. Students will need a login from the Biology Department of the ENS to view the course material and/or participate in the practical sessions. More information will be sent to attendees by email.

Suggested readings related to the module content

French:
Poinsot, D. (2005). Statistiques pour statophobes. Université de Rennes.
Millot, G. (2018). Comprendre et réaliser les tests statistiques à l’aide de R : manuel de biostatistique. De Boeck Supérieur.
Pagès, J. (2010). Statistiques générales pour utilisateurs. Presses Universitaires de Rennes.

English equivalents:
Van Emden, H. (2012). Statistics for terrified biologists. John Wiley & Sons.
Crawley, M. J. (2005). Statistics: An Introduction using R.
Holmes, S., Huber, W., & Martin, T. (2017). Modern statistics for modern biology.

Online course in English

Useful websites for R programming:

French ggplot2 tutorial

English ggplot2 tutorial proposed by the tidyverse community

Both pages also include the cheatsheets for these packages. An R Markdown cheatsheet is available separately.