Topics in Statistical Genetics
 
Semester 2 2024
Wednesday, 13-16, Kaplun 118
Home page: http://www.tau.ac.il/~saharon/StatsGenetics.html

 

Lecturer: Saharon Rosset
Schreiber 203
saharon@tauex.tau.ac.il
Office hrs: By appointment.

Final signup form (deadline 1 August 2024)

Announcements and handouts

(29 May 2024)
In class we have a quick introduction to genetics, on the board and using the following presentation: Class 1 presentation.
We then start discussing the problem of time estimation under molecular clock assumptions, covered in this class note.
Some general interest reading: This New Yorker article on using genetics to catch criminals.

(5 June 2024)
We analyze the molecular clock example we started last week, using this Class note.
In our effort to understand mtDNA evolution and estimate the distribution of rates, we use this African mtDNA paper and analyze its data.

(19 June 2024)
We will finish the molecular clock analysis we started, then move on to a more fundamental discussion of nucleotide substitution models using this class note.
Review on nucleotide substitution models and ML fitting by Huelsenbeck and Crandall.
Whittaker et al. (2003) paper estimating mutation models for STRs.
 
Homework 1 due 3 July in class. Resources for this homework:
mtDNA mutation counts for problem 1.
mtDNA loci list for problem 1.
The paper by Whittaker et al. (2003) for problem 3 is available in pdf or html.

(26 June 2024)
In the first part we will discuss STR mutation models using last week’s class note and the Whittaker et al. (2003) paper estimating mutation models for STRs.
Then we will switch to discuss phylogenetic tree reconstruction using this class note.
Reading materials on phylogenetic reconstruction:
Review by Huelsenbeck and Crandall
Inferring Phylogenies book by Felsenstein.

(3 July 2024)
We will complete the discussion of phylogenetic tree reconstruction using this class note.
Reading materials on phylogenetic reconstruction: Review by Huelsenbeck and Crandall
We will then switch to discussing Genotype-Phenotype modeling in Genome Wide Association Studies (GWAS), using this class note.
This presentation gives an accessible, popular-level introduction to this area.

(6 July 2024)
Homework 2 due 24 July before class. Resources for this homework:
The program PHYLIP
14-species primates+mammals mtDNA database, with documentation.
dnamlk help page.
For problem 2: HapMap Yoruban haplotype data on Chromosome 22. Note that individuals are in columns and SNPs are in rows; each entry is a genotype, written as two letters separated by a space, and entries are separated by tabs.
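
A minimal sketch of how one row of a file in this layout could be read, written in Python purely for illustration (R works equally well). It assumes each line holds only genotype entries, with no header row or SNP label column, which should be checked against the actual file. The function names and the example row are hypothetical and are not taken from the course data.

```python
def parse_genotype_line(line):
    """Split one SNP row into (allele1, allele2) tuples, one per individual.

    Assumes entries are tab-separated and each entry is two letters
    separated by a single space, as described for the HapMap file.
    """
    entries = line.rstrip("\n").split("\t")
    genotypes = []
    for entry in entries:
        alleles = entry.split(" ")
        if len(alleles) != 2:
            raise ValueError("unexpected genotype entry: %r" % entry)
        genotypes.append((alleles[0], alleles[1]))
    return genotypes


def allele_counts(genotypes):
    """Count how often each allele letter appears in one SNP row."""
    counts = {}
    for first, second in genotypes:
        counts[first] = counts.get(first, 0) + 1
        counts[second] = counts.get(second, 0) + 1
    return counts


# Hypothetical row for one SNP and three individuals (not real data).
example_row = "A G\tA A\tG G\n"
example_genotypes = parse_genotype_line(example_row)
example_counts = allele_counts(example_genotypes)
```

Splitting on tabs first and on the space second keeps the two letters of each genotype together, so the space inside an entry is never mistaken for a column separator.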

(10 July 2024)
We will solve HW1 in class and discuss it.
We will then discuss Genotype-Phenotype modeling in Genome Wide Association Studies (GWAS), using this class note.
This presentation gives an accessible, popular-level introduction to this area.

(17 July 2024)
Zoom link.
We discussed LD last time. In this week’s note we focus on testing and accounting for multiplicity and, as time permits, will start talking about stratification / ancestry estimation.
R code for analyzing the kidney disease data.

Syllabus

The goal of this course is to introduce some of the major topics in Genetics and to gain a statistical perspective on them.
We will start with a brief introduction to Genetics concepts, and gradually start elaborating on statistical aspects of the questions that come up. As needed, we will introduce relevant areas of statistics in some detail.
In the latter part of the course we will pick a hot current research topic and concentrate on it for a few weeks.
The final grade will be based on a combination of homework (3-4 assignments), a final take-home exam, and possibly a class presentation.
Tentative topics list (each topic 1-2 weeks):

Prerequisites

Basic knowledge of mathematical foundations: Calculus; Linear Algebra
Undergraduate courses in: Probability; Theoretical Statistics
Statistical programming experience in R is an advantage
Prior basic knowledge in Biology and Genetics is an advantage

Grading

There will be three or four homework assignments, which will count for about 30% of the final grade, and a final take-home project. Both the homework and the project will combine theoretical analysis with hands-on data analysis.

Some recommended books

Human Evolutionary Genetics by Jobling, Hurles and Tyler-Smith
An excellent introduction to Human Genetics, with a quantitative flavor
 
Principles of Population Genetics by Hartl and Clark
Comprehensive overview of computational methods in Genetics
 
Statistical Methods in Molecular Evolution edited by R. Nielsen
Collection of tutorials and reviews on major topics in Statistical Genetics

Computing

The course will require some use of statistical modeling software. It is strongly recommended to use R (freely available for PC/Unix/Mac).
The R Project website also contains extensive documentation.
A basic “getting you started in R” tutorial. Uses the Boston Housing Data (thanks to Giles Hooker).
Modern Applied Statistics with S-PLUS by Venables and Ripley is an excellent source for statistical computing help for R/S-PLUS.