Local-copy-cs445

This page contains general information for the class:

CPSC445/CPSC545/MBB334/MBB545/CBB545 Spring 2008

Introduction to Data Mining

Course websites

Homework

  • Homework 2
    • assignment (http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_hw2_2008-3.doc)
    • Due: Feb 21, 2008
  • Homework 3
    • assignment (http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_hw3_2008-2.doc)
    • hw3_data.csv (http://zoo.cs.yale.edu/classes/cs445/hw/hw3_data.csv)
    • hints (http://zoo.cs.yale.edu/classes/cs445/hw/hw3_hints.doc)
    • Due: Mar 6, 2008

Final Project

  • CPSC445b/545b (2008) Term Projects
    • description (http://zoo.cs.yale.edu/classes/cs445/hw/CPSC445_term_projects_2008-v2.doc)
    • One-paragraph description due: March 27, 2008 (send to martin.schultz@yale.edu AND jiang.du@yale.edu)
    • Project reports due: April 28, 2008 (send to jiang.du@yale.edu)
    • 15-20 minute project presentations: during class on April 1, 10, 15, 17, 22, 24 (to reserve a speaking time, please send your preferences to jiang.du@yale.edu)
    • presentation schedule (http://spreadsheets.google.com/pub?key=pXgR9Xs-YQoHGGHqva0_5Lw)
    • project/presentation details

Slides

Week 1

  • Tu Jan 15, Martin Schultz (http://www.cs.yale.edu/people/schultz.html) (MS): Introduction to Data Mining
    • slides (http://zoo.cs.yale.edu/classes/cs445/slides/kumar_1.ppt)
  • Thur Jan 17, Jiang Du (http://homes.gersteinlab.org/people/jiangdu/) (JD): Introduction to R
    • slides (http://zoo.cs.yale.edu/classes/cs445/slides/data_mining-08spring-Intro2R.ppt)
    • wine.data (http://zoo.cs.yale.edu/classes/cs445/misc/wine.data) wine.r (http://zoo.cs.yale.edu/classes/cs445/misc/wine.r)

Week 2

  • Tu Jan 22, MS: Introduction to Data Mining and Data storage/cleaning
  • Thur Jan 24, MS: OLAP, Regression

Week 3

  • Tu Jan 29, MS: Multilinear Regression, cross validation
    • slides (http://www.gersteinlab.org/courses/545/07-spr/slides/DM_multiple_regression.ppt)
  • Thur Jan 31, MS: discriminant analysis, perceptrons, SVM

Week 4

  • Thur Feb 7, JD: Decision trees
    • slides (http://zoo.cs.yale.edu/classes/cs445/slides/DM_DecisionTree-mod_jdu.ppt)
    • example (http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm)

Week 5

  • Tu Feb 12, MS: Logistic regression
  • Thur Feb 14, MG: PCA
    • slides: Theory (http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo5-svd1.ppt)
    • slides: Application (http://www.gersteinlab.org/courses/545/07-spr/slides/cbb545b-spr07-bioinfo6-svd2.ppt)

Week 6

  • Tu Feb 19, MS: k-nearest neighbors, neural networks
  • Thur Feb 21, MS: Clustering

Week 7

  • Tu Feb 26, MS: Mining Time series
  • Thur Feb 28, Michael Krauthammer (http://www.yalepath.org/faculty.lasso?id=KrauthammerM) (MK): Text Mining

Week 8

  • Tu Mar 4, Songhua Xu (http://www.cs.hku.hk/~songhua/) (SX): Web, image mining
  • Thur Mar 6, MS: Association Analyses
  • Thur Mar 6, JD: Data Mining Packages in R: logistic regression & SVM
    • slides (http://zoo.cs.yale.edu/classes/cs445/slides/r_pkgs-jdu.ppt)

Week 9, 10

  • Spring Break

Week 11

  • Thur Mar 27: MS

Week 12

  • Tu Apr 1: Student presentations

Week 13

  • Tu Apr 8: Max and Kjell: DM lab

Week 14

  • Tu Apr 15: Student presentations
  • Thur Apr 17: Student presentations

Week 15

  • Tu Apr 22: Student presentations
  • Thur Apr 24: Student presentations

Suggested Readings

Week 1

  • Super Crunchers (http://www.randomhouse.com/bantamdell/supercrunchers/)
    • Chapters 1-4, pp. 1-102.
    • Keep your eyes open for potential application-oriented term projects
  • Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
    • Chapter 1, pp. 1-40.

Week 2

  • Super Crunchers (http://www.randomhouse.com/bantamdell/supercrunchers/)
    • Chapters 5-6, pp. 103-155.
  • Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
    • Chapter 2, pp. 41-60.
    • Chapter 4, Section 4.6, pp. 119-127.

Week 3

  • Super Crunchers (http://www.randomhouse.com/bantamdell/supercrunchers/)
    • Chapters 7-8, pp. 156-218.
  • Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
    • Chapter 4, Section 4.6, pp. 119-127.
    • Chapter 6, Section 6.3, pp. 214-235.

Week 4

  • Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
    • Bayesian Methods: Chapter 4, Section 4, p. 141, Section 6, pp. 271-283.
    • Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.

Week 5

  • Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
    • Logistic Regression: Chapter 4, Section 4.6, pp. 121-125.
  • PCA, SVD, and LSI
    • pdf (http://www.cs.pitt.edu/~milos/courses/cs3750/Readings/Berry_etal-1999.pdf)

Week 6

  • Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
    • K-nearest neighbors (instance-based learning): Chapter 4, Section 4.7, pp. 128-136, Chapter 6, Section 6, pp. 235-243.
    • Neural Networks: Chapter 6, Section 6.3, pp. 223-226, 233.
    • Clustering: Chapter 4, Section 4.8, pp. 136-139, Chapter 6, Section 6.6, pp. 254-271.

Week 8

  • Weka Book (http://www.cs.waikato.ac.nz/~ml/weka/book.html) (Witten and Frank)
    • Bayesian Methods: Chapter 4, Section 4, p. 141, Section 6, pp. 271-283.
    • Decision Trees: Chapter 3, Section 3.3, pp. 62-65, Chapter 4, Section 4.3, pp. 97-105.

Other Online Materials

  • Introduction to Data Mining (by Tan et al.)
    • Chapters 4, 6, and 8 (http://www-users.cs.umn.edu/~kumar/dmbook/index.php)
    • Chapter 5 (http://zoo.cs.yale.edu/classes/cs445/misc/chap5_other_classification.pdf)
  • Data Mining (by Graham Williams)
    • Draft Book (http://zoo.cs.yale.edu/classes/cs445/misc/mar13lae08.pdf) for use only in this course