
I am pretty sure everybody agrees mining is dirty and risky. So congratulations to everyone for surviving the first semester of our Data Mining paper.
Actually I think we did better than surviving. I’ve seen quality work in the labs, and really enjoyed the presentations – I found a lot of gold there.
To sum up our first semester: our lab and assignment work (t.b.p., p for published) will be a good place for you to try things out in Weka and apply them to a small problem-solving task.
It is of course more than appropriate for me to post my comments on our presentations. I follow the presentation order.
Presentation 1. Face Recognition.
This relates to a specific application, but Maheshwar managed to present a good background and timeline of face processing: detection, verification and recognition. The Haar-like features were explained. The introduction to the AdaBoost algorithm was concise and clear, but I feel some more details could have been helpful. I hope you’ll find this presentation makes more sense after Lecture 8.
The class basically ranked the presentation as ‘Good’ (3)/‘Excellent’ (1), and suggested using more diagrams in the explanation, and also giving information about the state of the art – how effectively can computers do face recognition now?
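For those who wanted a few more details on AdaBoost, here is a minimal sketch in plain Python. It is not the code from the presented work: the weak learners are simple threshold rules ("stumps") on a single feature, which is roughly how boosted face detectors use one Haar-like feature per weak classifier, and the toy data and the function names (train_stump, adaboost, predict) are my own assumptions for illustration.

```python
import math

def train_stump(X, y, w):
    """Pick the (feature, threshold, sign) rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost(X, y, rounds=3):
    """Labels must be +1/-1. Returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)         # avoid division by zero on a perfect stump
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified examples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data that no single threshold can separate.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
print([predict(ensemble, row) for row in X])
```

The point of the toy data is that every single stump gets at least two examples wrong, yet the weighted vote of three stumps classifies all six correctly. That reweighting step is the heart of the algorithm.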
Presentation 2. GESCONDA.
This presentation gives us a new perspective on building a domain-specific system for intelligent data analysis. Apart from a very smooth introduction to the system, what I really liked about this presentation is Max’s critique, especially of the authors’ (lack of) justification for the inclusion and treatment of various algorithms for the domain of environmental data analysis. How the agents interact is not clear either.
GESCONDA is an old system that no longer seems to be supported, although it is still cited favourably. So it is hard for us to get more information or even download it to have a try. However, I would have hoped that Google could help us find some other pointers – e.g., this conference article by the same group actually contains more examples. Take a look if you are interested in building a similar system for a particular domain, say, in the 2nd semester.
The class, feeling more examples would be helpful, enjoyed the presentation and ranked it Good(3)/Excellent(1).
Presentation 3. Distributed Data Mining in P2P networks
This is an interesting topic and gives us a complementary picture of large-scale, distributed data mining problems in real-world scenarios. The background on P2P networking was presented and the requirements for distributed data mining were outlined well. It was a good effort by Ethem to drill down into specific algorithms and present diagrams etc. to show us the ideas behind the algorithms. The approximate k-means was explained well. Is anybody interested in implementing such a local/collective algorithm?
The class voted Good(3)/Fair(1) on this presentation, but given the complexity of the algorithms I feel it is a good talk.
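If you want to play with the local/collective idea, here is a rough sketch in plain Python. Please note this is my own simplification and not the algorithm from the presented paper: I assume every peer starts from the same initial centroids, computes cluster sums and counts on its own data only, and then averages with its direct neighbours only. The peer names, the data and the function names are made up for illustration.

```python
def local_step(data, centroids):
    """Assign local 1-D points to the nearest centroid; return sums and counts."""
    k = len(centroids)
    sums, counts = [0.0] * k, [0] * k
    for x in data:
        c = min(range(k), key=lambda i: abs(x - centroids[i]))
        sums[c] += x
        counts[c] += 1
    return sums, counts

def p2p_kmeans(peer_data, neighbours, init, rounds=10):
    """Each peer keeps its own centroids and only talks to its neighbours."""
    centroids = {p: list(init) for p in peer_data}
    for _ in range(rounds):
        # Local phase: every peer summarises its own data.
        stats = {p: local_step(peer_data[p], centroids[p]) for p in peer_data}
        # Collective phase: combine summaries from the peer and its neighbours.
        new = {}
        for p in peer_data:
            group = [p] + neighbours[p]
            updated = []
            for c in range(len(init)):
                total = sum(stats[q][0][c] for q in group)
                count = sum(stats[q][1][c] for q in group)
                updated.append(total / count if count else centroids[p][c])
            new[p] = updated
        centroids = new
    return centroids

# Three hypothetical peers on a line: a - b - c.
peer_data = {"a": [0.0, 1.0, 10.0], "b": [2.0, 11.0], "c": [9.0, 12.0, 1.0]}
neighbours = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = p2p_kmeans(peer_data, neighbours, init=[0.0, 12.0])
for peer, cents in result.items():
    print(peer, cents)
```

Only peer b sees everybody's summaries here, so the peers end up with similar but not identical centroids. That is the "approximate" part: no peer ever needs the whole data set, at the price of some disagreement.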
Presentation 4. Anomaly Detection
This is a very interesting topic and I think Rui made a wise decision to focus on the big picture and give very good coverage of the problem and the major approaches. I liked the parts on contextual anomalies and the three types of learning. What the class felt was lacking was some specific techniques as examples.
The votes were Good(2)/Excellent(1)/Fair(1).
Presentation 5. LOF
This aligns well with the last presentation but focuses on a specific outlier detection algorithm. Thanks to Shuang’s great effort in understanding and presenting the algorithm, we all feel we know what it is about. It would be great if we could implement an (improved) version of this popular algorithm and conduct a comparison study. The paper has an experiment looking for odd records of Bundesliga football players, and there are other interesting applications too, e.g. video shot boundary detection.
If you are interested in implementing a LOF-like algorithm, read this follow-up paper too.
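To get you started, here is a minimal brute-force sketch of the LOF score in plain Python. It follows the usual definitions (k-distance, reachability distance, local reachability density), but with two simplifications of my own: each point gets exactly k neighbours even when there are distance ties, and duplicate points are assumed not to occur. The function names and the toy points are made up for illustration.

```python
import math

def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of point i."""
    dists = sorted((math.dist(points[i], p), j)
                   for j, p in enumerate(points) if j != i)
    return dists[:k]

def lof_scores(points, k=2):
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neigh]
    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_dist[j], d(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: average density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]

# Four points on a unit square plus one point far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
for p, s in zip(points, lof_scores(points, k=2)):
    print(p, round(s, 2))
```

Scores close to 1 mean a point is about as dense as its neighbours, while the far-away point gets a score well above 1. A real implementation would replace the brute-force neighbour search with an index, which is also where an improved version could start.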
Presentation 6. Soccer video analysis
The title says it all – this is a very interesting and also complicated topic. Every single step requires considerable machine learning techniques, from image segmentation to recognition and tracking. Thankfully we have someone like Munir who is never afraid of details. The tracking part of the paper seems to be a weakness, since without a frame-by-frame tracking approach using e.g. particle filters, the ambiguity of player identities may become a problem when players move near each other or even occlude one another (unfortunately this happens very often). It is still a challenging task, but it promises to be a lot of fun – any volunteers to work with Munir?