Thursday 16:00-17:00 or by appointment (please coordinate in advance in either case).
Announcements and handouts
(8 March) R code from class to analyze the prostate data using least squares regression and nearest neighbors.
(15 March) R code from class to analyze the advertising data using least squares regression.
(18 March) Homework 1, due on 5 April in class. k-NN R code to be used for problem 1.
(22 March) Nature paper from 2009 introducing Google Flu Trends (GFT).
(29 March) R code from class to analyze the Default data using logistic regression and LDA.
(5 April) R code for Chapter 4 from the book website to demonstrate classification methods.
(19 April) Homework 2, due on 3 May in class.
(3 May) R code from class verifying the theoretical result on the optimism of the training error.
(8 May) Homework 3, due on 24 May in class.
(25 May) Homework 4, due on 9 June in my mailbox on floor 1 of Schreiber (or on 7 June in class). It uses the code in boost.r.
(7 June) Bootstrap presentation from class
Syllabus
The goal of this course is to introduce the basic ideas of
"modern" statistical learning and predictive modeling, from a
statistical, theoretical and computational perspective, together
with applications in big data.
The topics we will cover include:
Introduction: some examples of problems in regression and classification; focus on Google Flu Trends (GFT)
Basic methods for regression: linear regression and local (neighbor-based) methods
Basic methods for classification: logistic regression and discriminant analysis
Resampling methods: cross-validation and the bootstrap
Model selection and regularization
Modern methods and their applications: trees, support vector machines
Both the class material and the homework will combine theory with practical implementation and demonstrations on data.
The grade will be a combination of homework problem sets (about six overall, worth about 30% of the final grade) and a final in-class exam (about 70% of the final grade).
Prerequisites
Basic knowledge of mathematical foundations: Calculus; Linear Algebra
Undergraduate courses in: Probability; Regression; Theoretical Statistics (possibly in parallel)
Statistical programming experience in R is an advantage
The book labs, all class demonstrations and any code given in the homework will be in R (freely available for PC/Unix/Mac). There is no requirement to use it, but it is highly recommended. The R Project website also contains extensive documentation. A basic "getting you started in R" tutorial is also available; it uses the Boston Housing data (thanks to Giles Hooker). Modern Applied Statistics with S-PLUS by Venables and Ripley is an excellent source of statistical computing help for R/S-PLUS.
File translated from TeX by TTH, version 4.08. On 13 Jun 2016, 17:02.