Local-copy-cs445
This page contains general information for the class:
CPSC445/CPSC545/MBB334/MBB545/CBB545 Spring 2008
Introduction to Data Mining
Course websites
- Course wiki
- Yale classes server
- Previous years' pages:
Homework
- Homework 1
- assignment (http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_hw1_2008-2.doc)
- Due: Feb 7, 2008.
- submitted summaries
- Homework 2
- assignment (http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_hw2_2008-3.doc)
- Due: Feb 21, 2008
- Homework 3
- assignment (http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_hw3_2008-2.doc)
- hw3_data.csv (http://zoo.cs.yale.edu/classes/cs445/hw/hw3_data.csv)
- hints (http://zoo.cs.yale.edu/classes/cs445/hw/hw3_hints.doc)
- Due: Mar 6, 2008
Final Project
- CPSC445b/545b (2008) Term Projects
- description (http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_term_projects_2008-v2.doc)
- One paragraph description due: March 27, 2008 (send to martin.schultz@yale.edu AND jiang.du@yale.edu)
- Project reports due: April 28, 2008 (send to jiang.du@yale.edu)
- 15-20 minute project presentations: during class on April 1, 10, 15, 17, 22, 24 (to reserve a speaking time, send your preferences to jiang.du@yale.edu)
- presentation schedule (http://spreadsheets.google.com/pub?key=pXgR9Xs-YQoHGGHqva0_5Lw)
- project/presentation details
Slides
Week 1
- Tu Jan 15, Martin Schultz (http://www.cs.yale.edu/people/schultz.html) (MS): Introduction to Data Mining
- slides (http://zoo.cs.yale.edu/classes/cs445/slides/kumar_1.ppt)
- Thur Jan 17, Jiang Du (http://homes.gersteinlab.org/people/jiangdu/) (JD): Introduction to R
Week 2
- Tu Jan 22, MS: Introduction to Data Mining and data storage/cleaning
- Thur Jan 24, MS: OLAP, Regression
Week 3
- Tu Jan 29, MS: Multilinear regression, cross-validation
- slides (http://www.gersteinlab.org/courses/545/07-spr/slides/DM_multiple_regression.ppt)
- Thur Jan 31, MS: discriminant analysis, perceptrons, SVM
- slides: SVM (http://www.gersteinlab.org/courses/545/07-spr/slides/DM_SVM.ppt)
- additional slides: SVM (http://www.gersteinlab.org/courses/545/07-spr/slides/DM_SVM-law.ppt)
- slides: perceptron models (http://www.gersteinlab.org/courses/545/07-spr/slides/DM_perceptron.ppt)
Week 4
- Tu Feb 5, Mark Gerstein (http://bioinfo.mbb.yale.edu/about) (MG): Bayesian classification
- slides: Predicting Networks through Bayesian Integration #1 - Theory (http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo3-bayes1.ppt)
- slides: Predicting Networks through Bayesian Integration #2 - Application (http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo4-bayes2.ppt)
- Thur Feb 7, JD: Decision trees
Week 5
- Tu Feb 12, MS: Logistic regression
- Thur Feb 14, MG: PCA
- slides: Theory (http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo5-svd1.ppt)
- slides: Application (http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo6-svd2.ppt)
Week 6
- Tu Feb 19, MS: k-nearest neighbors, neural networks
- link: Hans Rosling's presentation (http://www.gapminder.org/video/talks/ted-2007---the-seemingly-impossible-is-possible.html)
- Thur Feb 21, MS: Clustering
Week 7
- Tu Feb 26, MS: Mining Time series
- Thur Feb 28, Michael Krauthammer (http://www.yalepath.org/faculty.lasso?id=KrauthammerM) (MK): Text Mining
Week 8
- Tu Mar 4, Songhua Xu (http://www.cs.hku.hk/~songhua/) (SX): Web, image mining
- Thur Mar 6, MS: Association Analyses
- Thur Mar 6, JD: Data Mining Packages in R: logistic regression & SVM
- slides (http://zoo.cs.yale.edu/classes/cs445/slides/r_pkgs-jdu.ppt)
Week 9, 10
- Spring Break
Week 11
- Tu Mar 25: Max and Kjell: DM Lab (AKW 400)
- slides (http://zoo.cs.yale.edu/classes/cs445/slides/Pfizer_Yale_Version.ppt)
- Model Building: General Strategies, Data Pre-processing, and Partial Least Squares (http://zoo.cs.yale.edu/classes/cs445/slides/Yale1.pdf)
- Thur Mar 27: MS
Week 12
- Tu Apr 1: Student presentations
- Thur Apr 3: Max and Kjell: DM Lab
- Model Building: Ensemble Methods (http://zoo.cs.yale.edu/classes/cs445/slides/Yale2.pdf)
Week 13
- Tu Apr 8: Max and Kjell: DM Lab
- Thur Apr 10: student presentations
- An Introduction to caret (http://zoo.cs.yale.edu/classes/cs445/slides/caret.pdf)
Week 14
- Tu Apr 15: student presentations
- Thur Apr 17: student presentations
Week 15
- Tu Apr 22: student presentations
- Thur Apr 24: student presentations
Suggested Readings
Week 1
- Super Crunchers (http://www.randomhouse.com/bantamdell/supercrunchers/)
- Chapters 1-4, pp. 1-102.
- Keep your eyes open for potential application-oriented term projects
- Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
- Chapter 1, pp. 1-40.
Week 2
- Super Crunchers (http://www.randomhouse.com/bantamdell/supercrunchers/)
- Chapters 5-6, pp. 103-155.
- Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
- Chapter 2, pp. 41-60.
- Chapter 4, Section 4.6, pp. 119-127.
Week 3
- Super Crunchers (http://www.randomhouse.com/bantamdell/supercrunchers/)
- Chapters 7-8, pp. 156-218.
- Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
- Chapter 4, Section 4.6, pp. 119-127.
- Chapter 6, Section 6.3, pp. 214-235.
Week 4
- Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
- Bayesian Methods: Chapter 4, Section 4, pg. 141, Section 6, pp. 271-283.
- Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.
Week 5
- Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
- Logistic Regression: Chapter 4, Section 4.6, pp. 121-125.
- PCA, SVD, and LSI
- pdf (http://www.cs.pitt.edu/~milos/courses/cs3750/Readings/Berry_etal-1999.pdf)
Week 6
- Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
- K-nearest neighbors (instance-based learning): Chapter 4, Section 4.7, pp. 128-136, Chapter 6, Section 6, pp. 235-243.
- Neural Networks: Chapter 6, Section 6.3, pp. 223-226, 233.
- Clustering: Chapter 4, Section 4.8, pp. 136-139, Chapter 6, Section 6.6, pp. 254-271.
Week 8
- Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
- Bayesian Methods: Chapter 4, Section 4, pg. 141, Section 6, pp. 271-283.
- Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.
Other Online Materials
- Introduction to Data Mining (by Tan et al.)
- Chapters 4, 6, and 8 (http://www-users.cs.umn.edu/~kumar/dmbook/index.php)
- Chapter 5 (http://zoo.cs.yale.edu/classes/cs445/misc/chap5_other_classification.pdf)
- Data Mining (by Graham Williams)
- Draft Book (http://zoo.cs.yale.edu/classes/cs445/misc/mar13lae08.pdf) for use only in this course
- Intro to R and Data Mining (by Luis Torgo)
- R Documentation
