INFO411 2012 FY – Machine Learning and Data Mining
Lecture notes
- Lecture 1 – Introduction
- Reading: Alpaydin, Chapter 1
- Lecture 2 – Data Clustering
- Reading: P-N Tan et al., ch.8
- Lecture 3 – Online Clustering, Alpaydin ch.12
- Lecture 4 – Dimension Reduction, ch.7
- Lecture 5 – Classification, ch.4/8/9
- Lecture 6 – Feature Selection
- Lecture 7 – Regression & Model Selection, ref. Alpaydin ch.14
- Lecture 8 – Combining Multiple Learners, ref. Alpaydin ch.15
- Presentations
- Performance Evaluation
- Lecture 9 – Hidden Markov Models
- Chaotic information processing
Labs
- Lab 1. SciPy/Clustering
- Lab 2. Clustering II / Image Segmentation
- Lab 3. PCA
- Lab 4. Classification
Assignment 1 – Due 11am 18/5.
Presentations
Guidelines: Each presentation should be about 30 minutes long, with at least 20 slides. Present an overview (what the paper is about, relevant background, and the main conclusions, contributions, or findings), an introduction to the technical approaches, the results (feel free to include figures or diagrams from the e-copy), and a conclusion or summary. If the paper’s coverage is too broad (e.g. a survey paper), you don’t have to present every algorithm; choose a few representative ones instead. Presentations will be marked on understanding, technical accuracy, and communication skills. Other students’ participation in asking questions and joining the discussion will also contribute to their final marks.
| Date | Talk 1 | Talk 2 | Talk 3 |
|---|---|---|---|
| 9/5 | GNG (Damien) | Distributed PCA (Walter) | Data stream clustering (Ethan) |
| 16/5 | K-means plus (Joyce) | Peer-to-Peer (Lu) | Anomaly (Abdullah) |
Presentation Readings
Use your University proxy to access the full-text papers when necessary.
- Choose two of these recent k-means papers: Learning the k, 2003; k-means++, 2007; and Web-scale K-Means Clustering, 2010.
- Qin et al., Robust growing neural gas algorithm with application in cluster analysis, 2004.
- V. Chandola et al., Anomaly detection: a survey, ACM Computing Surveys 41, 2009.
- MM Breunig et al., LOF: Identifying density-based local outliers
- Huang et al., Distributed PCA and network anomaly detection, 2006
- G. Cormode et al., Conquering the divide: continuous clustering of distributed data streams, ICDE’07
- B-H Park and H. Kargupta, Distributed Data Mining in Peer-to-Peer Networks, IEEE Internet Computing 10, 2006.
Other Readings
- R. Chellappa et al., Face Recognition by Computers and Humans, IEEE Computer Feb. 2010
- GESCONDA: An intelligent data analysis system for knowledge discovery and management in environmental databases
- Wu et al., Top 10 algorithms in data mining, Knowledge and Information Systems, 2007.
Datasets