Introduction to Statistical Learning

Course information

course number: STAT 596
semester: Fall 2019
meeting times: T/Th 11:00am–12:15pm
room: GMCS-421
prerequisites: STAT 551A, or concurrent enrollment; foundational knowledge of R

Instructor information

name: Henry Scharf
office: GMCS-518
email: hscharf@sdsu.edu
office hours: T/Th 10:00–10:50am or by appt.

Overview

“Statistical learning…is about discovering structure in data and making predictions about unknown quantities. Practically speaking, this means (i) find[ing] relationships between a group of explanatory and response variables that provides good predictive performance, (ii) reduc[ing] the size of the group of variables for scientific, statistical, or computational purposes and, perhaps most importantly, (iii) knowing the techniques, how they work, when they apply, and how to implement them.” –Darren Homrighausen

Learning outcomes

Students who succeed in this course will, in short, fulfill (iii) from the Overview: they will have a collection of potentially useful statistical tools at their disposal that they can appropriately apply to a wide variety of problems. Just as important, they will also be able to determine when certain statistical tools are not appropriate. The course will focus on both a holistic, general understanding of methodology and specific implementation in the R statistical programming language.

Evaluation

  • In-class quizzes (15%): A very short quiz will be given during the first 10 minutes of each class meeting. The lowest two scores will be dropped.

  • Homework (25%): Homework assignments will typically be continuations of in-class Rlabs (see the course calendar).

  • Independent research presentation (IRP) (25%): A short presentation/report prepared individually about a topic you find particularly interesting. Pair-based IRPs may be allowed; talk to me first.

  • Final project (35%): A presentation + report based on the analyses of a data set of your choosing. This will be a group effort, and your grade will be based on several deliverables building up to the final presentation + report.

Textbooks

Course calendar

Week 1
  Tue 8/27  review syllabus; bias/variance trade-off
  Thu 8/29  short history of statistical learning; Stein’s paradox

Week 2
  Tue 9/3  linear regression; basis expansion
  Thu 9/5  splines

Week 3
  Tue 9/10  IRP: Henry; regression trees
  Thu 9/12  Rlab: regression

Week 4
  Tue 9/17  IRP: Henry; scoring statistical models; cross validation
  Thu 9/19  IRP: __________; regularization; ridge and LASSO penalties

Week 5
  Tue 9/24  Rlab: regularized regression
  Thu 9/26  IRP: __________; classification; latent variables

Week 6
  Tue 10/1  IRP: __________; LDA; kNN
  Thu 10/3  IRP: __________; classification trees

Week 7
  Tue 10/8  IRP: __________; scoring binary response models
  Thu 10/10  Rlab: classification

Week 8
  Tue 10/15  IRP: __________; neural networks 1; begin looking for project data
  Thu 10/17  IRP: __________; neural networks 2

Week 9
  Tue 10/22  IRP: __________; neural networks 3
  Thu 10/24  Rlab: neural networks

Week 10
  Tue 10/29  random forests 1; due: EDA for selected dataset
  Thu 10/31  random forests 2

Week 11
  Tue 11/5  random forests 3
  Thu 11/7  Rlab: random forests; due: proposed analysis, goals, division of duties

Week 12
  Tue 11/12  IRP: __________; unsupervised learning
  Thu 11/14  IRP: __________; PCA, SVD

Week 13
  Tue 11/19  IRP: __________; k-means clustering; project progress checkin meetings this week
  Thu 11/21  Rlab: PCA, SVD, k-means clustering; project progress checkin meetings this week

Week 14
  Tue 11/26  IRP: __________; prototype methods
  Thu 11/28  Rlab: prototype methods

Week 15
  Tue 12/3  presentations
  Thu 12/5  presentations

Week 16
  Tue 12/10  presentations
  Thu FINAL  no class; due: final project reports

Student motivation

Motivation to participate in this class needs to come primarily from within. Some of the assessment structures provide minimal external nudging to keep students going (e.g., quizzes), but for the most part your success will be a product of your own internal desire to actually learn this stuff. For my part as the instructor, this means I have planned zero in-class exams, and will try to keep things as immediately relevant for you, the students, as possible. I will try to be responsive to all your requests throughout the semester. If you find something boring/useless, I’ll try to take it out. If you want me to go into more depth on a particular topic, I’ll try to make time to do that.

For your part as a student, this means you will have to manage your own time carefully, and do whatever you must to make assignments/projects relevant for you. When there is an opportunity, find data sets you care about and want to analyze. Focus on methods you want to be able to take with you throughout your career. A fully engaged student will probably find that she is frequently searching online for more information about something we discussed in class. He may find himself listening to unassigned podcasts and reading blogs written by experts. From time to time, they may bump up against a problem to which the collective response of humanity is “We don’t know how to do that…yet.”

Access

I am committed to ensuring each student’s access to all course materials, to time and attention from me as the instructor both in and out of class, and to fair opportunities to demonstrate mastery of the course content. Please contact me if you require any special assistance or accommodations. To avoid any delay in the receipt of your accommodations, you should contact the Student Ability Success Center as soon as possible. Please note that accommodations are not retroactive, and that I cannot provide accommodations based upon disability until I have received an accommodation letter from the Student Ability Success Center.

Communication with instructor

I encourage students to reach out by email anytime they need help or have a question. I endeavor to respond to all emails within 24 hours during the work week. Generally I will not be able to respond during the weekend. For questions that require a longer response than a few sentences, please visit me during office hours or schedule a meeting with me. For questions that can be easily answered through a straightforward search online, you may receive a terse reply inviting you to find the answer on your own. (For example: STUDENT: When are your office hours again? ME: You can figure that out without my help. I believe in you!)

Additional Resources