course number: STAT 596 | semester: Fall 2019 |
meeting times: T/Th 11:00am–12:15pm | room: GMCS-421 |
prerequisites: STAT 551A, or concurrent enrollment; foundational knowledge of R |
name: Henry Scharf | office: GMCS-518 |
email: hscharf@sdsu.edu | office hours: T/Th 10:00-10:50am or by appt. |
“Statistical learning…is about discovering structure in data and making predictions about unknown quantities. Practically speaking, this means (i) find[ing] relationships between a group of explanatory and response variables that provides good predictive performance, (ii) reduc[ing] the size of the group of variables for scientific, statistical, or computational purposes and, perhaps most importantly, (iii) knowing the techniques, how they work, when they apply, and how to implement them.” –Darren Homrighausen
Students who succeed in this course will, in short, fulfill (iii) from the Overview: they will have a collection of potentially useful statistical tools at their disposal that they can appropriately apply to a wide variety of problems. Just as important, they will also be able to determine when certain statistical tools are not appropriate. The course will focus on both a holistic, general understanding of methodology and specific implementation using the R statistical programming language.
In-class quizzes (15%): A very short quiz will be given during the first 10 minutes of each class meeting. The lowest two scores will be dropped.
Homework (25%): There will be homework assignments. These will typically be continuations of in-class Rlabs (see schedule).
Independent research presentation (IRP) (25%): A short presentation/report prepared individually about a topic you find particularly interesting. Pair-based IRPs may be allowed; talk to me first.
Final project (35%): A presentation + report based on the analyses of a data set of your choosing. This will be a group effort, and your grade will be based on several deliverables building up to the final presentation + report.
An Introduction to Statistical Learning (ISL) by Gareth James, Daniela Witten, Trevor Hastie, & Robert Tibshirani. The book's website also has links to videos/tutorials prepared by the authors and other experts.
The Elements of Statistical Learning (ESL) by Trevor Hastie, Robert Tibshirani, & Jerome Friedman
Week | Tuesday | Topics | Thursday | Topics |
---|---|---|---|---|
1 | 8/27 | review syllabus; bias/variance trade-off | 8/29 | short history of statistical learning; Stein’s paradox |
2 | 9/3 | linear regression; basis expansion | 9/5 | splines |
3 | 9/10 | IRP: Henry; regression trees | 9/12 | Rlab: regression |
4 | 9/17 | IRP: Henry; scoring statistical models; cross validation | 9/19 | IRP: __________; regularization; ridge and LASSO penalties |
5 | 9/24 | Rlab: regularized regression | 9/26 | IRP: __________; classification; latent variables |
6 | 10/1 | IRP: __________; LDA; kNN | 10/3 | IRP: __________; classification trees |
7 | 10/8 | IRP: __________; scoring binary response models | 10/10 | Rlab: classification |
8 | 10/15 | IRP: __________; neural networks 1; begin looking for project data | 10/17 | IRP: __________; neural networks 2 |
9 | 10/22 | IRP: __________; neural networks 3 | 10/24 | Rlab: neural networks |
10 | 10/29 | random forests 1; due: EDA for selected dataset | 10/31 | random forests 2 |
11 | 11/5 | random forests 3 | 11/7 | Rlab: random forests; due: proposed analysis, goals, division of duties |
12 | 11/12 | IRP: __________; unsupervised learning | 11/14 | IRP: __________; PCA, SVD |
13 | 11/19 | IRP: __________; k-means clustering; project progress checkin meetings this week | 11/21 | Rlab: PCA, SVD, k-means clustering; project progress checkin meetings this week |
14 | 11/26 | IRP: __________; prototype methods | 11/28 | Rlab: prototype methods |
15 | 12/3 | presentations | 12/5 | presentations |
16 | 12/10 | presentations | FINAL | no class; due: final project reports |
Motivation to participate in this class needs to come primarily from within. Some of the assessment structures provide minimal external nudging to keep students going (e.g., quizzes), but for the most part your success will be a product of your own internal desire to actually learn this stuff. For my part as the instructor, this means I have planned zero in-class exams, and will try to keep things as immediately relevant for you, the students, as possible. I will try to be responsive to all your requests throughout the semester. If you find something boring/useless, I’ll try to take it out. If you want me to go into more depth on a particular topic, I’ll try to make time to do that.
For your part as a student, this means you will have to manage your own time carefully, and do whatever you must to make assignments/projects relevant for you. When there is an opportunity, find data sets you care about and want to analyze. Focus on methods you want to be able to take with you throughout your career. A fully engaged student will probably find that she is frequently searching online for more information about something we discussed in class. He may find himself listening to unassigned podcasts and reading blogs written by experts. From time to time, they may bump up against a problem to which the collective response of humanity is “We don’t know how to do that…yet.”
I am committed to ensuring each student’s access to: all course materials, time and attention from me as the instructor both in and out of class, and fair opportunities to demonstrate mastery of the course content. Please contact me if you require any special assistance or accommodations. To avoid any delay in the receipt of your accommodations, you should contact the Student Ability Success Center as soon as possible. Please note that accommodations are not retroactive, and that I cannot provide accommodations based upon disability until I have received an accommodation letter from the Student Ability Success Center.
I encourage students to reach out by email anytime they need help or have a question. I endeavor to respond to all emails within 24 hours during the work week. Generally I will not be able to respond during the weekend. For questions that require a longer response than a few sentences, please visit me during office hours or schedule a meeting with me. For questions that can be easily answered through a straightforward search online, you may receive a terse reply inviting you to find the answer on your own. (For example: STUDENT: When are your office hours again? ME: You can figure that out without my help. I believe in you!)
Self-paced statistical learning course from Stanford based on ISL.
There are many… many… MANY… other useful references online. If you find a particularly good one you think others would appreciate, please let us know!