INFO411 2012 FY – Machine Learning and Data Mining

Lecture notes

  1. Lecture 1 – Introduction
  2. Lecture 2 – Data Clustering
    • Reading: P-N Tan et al., ch.8
  3. Lecture 3 – Online Clustering, Alpaydin ch.12
  4. Lecture 4 – Dimension Reduction, ch.7
  5. Lecture 5 – Classification, ch.4/8/9
  6. Lecture 6 – Feature Selection
  7. Lecture 7 – Regression & Model Selection, ref. Alpaydin ch.14
  8. Lecture 8 – Combining Multiple Learners, ref. Alpaydin ch.15
  9. Presentations
  10. Performance Evaluation
  11. Lecture 9 – Hidden Markov Models
  12. Chaotic information processing

Labs

  1. Lab 1. SciPy/Clustering
  2. Lab 2. Clustering II / Image Segmentation
  3. Lab 3. PCA
  4. Lab 4. Classification

Assignment 1 – Due 11am 18/5.

Presentations
Guidelines: Each presentation should be about 30 minutes long, with at least 20 slides. Remember to include an overview (what the paper is about, relevant background, and its main conclusions, contributions or findings), an introduction to the technical approaches, the results (feel free to include figures or diagrams from the electronic copy of the paper), and a conclusion or summary. If the paper's coverage is too broad (e.g. a survey paper), you don't have to present every algorithm; choose a few representative ones instead. Presentations will be marked on understanding, technical accuracy, and communication skills. Other students' participation, by asking questions or joining the discussion, will also contribute to their own final marks.

Date   Talk 1                 Talk 2                      Talk 3
9/5    GNG (Damien)           Distributed PCA (Walter)    Data stream clustering (Ethan)
16/5   K-means plus (Joyce)   Peer-to-Peer (Lu)           Anomaly (Abdullah)

Presentation Readings

Use your University proxy to access the full-text papers when necessary.

  1. Choose two of these recent k-means papers: Learning the k, 2003; k-means++, 2007; Web-scale K-Means Clustering, 2010.
  2. Qin et al., Robust growing neural gas algorithm with application in cluster analysis, 2004.
  3. V. Chandola et al., Anomaly detection: a survey, ACM Computing Surveys (CSUR) 41, 2009.
  4. M. M. Breunig et al., LOF: Identifying density-based local outliers.
  5. Huang et al., Distributed PCA and network anomaly detection, 2006
  6. G. Cormode et al., Conquering the divide: continuous clustering of distributed data streams, ICDE’07
  7. B-H Park and H. Kargupta, Distributed Data Mining in Peer-to-Peer Networks, IEEE Internet Computing 10, 2006.

Other Readings

  1. R. Chellappa et al., Face Recognition by Computers and Humans, IEEE Computer Feb. 2010
  2. GESCONDA: An intelligent data analysis system for knowledge discovery and management in environmental databases
  3. Wu et al., Top 10 algorithms in data mining, Knowledge and Information Systems, 2007.

Datasets

  1. UCI Machine Learning Repository
  2. UCI KDD Archive
  3. Caltech 101
  4. Cambridge Traffic Classification Datasets
