2023

Data Mining

Name: Data Mining
Code: INF13273M
6 ECTS
Duration: 15 weeks/156 hours
Scientific Area: Informatics

Teaching languages: Portuguese
Languages of tutoring support: Portuguese
Attendance mode: In person

Sustainable Development Goals

Learning Goals

At the end of the course unit the student should demonstrate:
Understanding of key data mining approaches and techniques, with a focus on problem types and data preparation, including the challenges posed by big data;
Ability to use data mining tools and apply them to data sets, demonstrating an in-depth understanding of key data mining topics;
Ability to develop/deepen design and programming techniques for the construction of intelligent and adaptable systems;
Ability to develop/deepen the basic techniques needed to perform big data mining research.

Contents

The Data Mining process
Types of problems: pattern association, clustering, outlier detection, classification
Data preparation: extraction, cleaning, selection, reduction and transformation of attributes, sampling and subsampling
Mining of: streams, text, time series, discrete sequences, spatial data, graphs, web data
Measures of similarity and distances
Problems, approaches and algorithms
Association of patterns
Analysis of clusters
Algorithms: K-means, EM, PCA, SOM, ...
Performance evaluation
Classification
Ensemble methods. Problems with unbalanced classes
Performance metrics: precision, recall, F-measure, ROC curve, Log loss and others (cost function, Cohen's kappa, G-score)
Regression
Linear and nonlinear models
Performance evaluation: quadratic errors, absolute errors, correlation coefficient
Analysis of outliers (supervised and unsupervised)
Measures of complexity/simplicity
Mixed performance criteria
Preservation of privacy

Teaching Methods

The teaching methodology includes:
Availability of all resources through an online learning platform (e.g. Moodle), with the relevant materials made available before each face-to-face session;
Presentation of examples, demonstrations and problem solving for each concept presented;
Presentation and submission of exercises via the online learning platform;
Orientation of the presentation of concepts around the applications and projects to be developed.

Assessment

Continuous assessment:
40% Exam
40% Final project
20% Lab reports throughout the semester (at least 5 reports)
Final assessment:
50% Exam
50% Final project
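
As an illustration of how the weights above combine, the following is a minimal sketch in Python. The weights come from the two assessment schemes listed; the component names, the function `weighted_grade` and the example marks are hypothetical, and any rounding or minimum-mark rules are not stated here and are therefore not modelled.

```python
# Weights taken from the assessment schemes above.
# The dictionary keys are illustrative names, not official component names.
CONTINUOUS = {"exam": 0.40, "project": 0.40, "labs": 0.20}
FINAL = {"exam": 0.50, "project": 0.50}


def weighted_grade(marks, weights):
    """Return the weighted sum of component marks.

    All marks are assumed to be on the same scale; the result is on that scale.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * marks[name] for name in weights)


# Hypothetical marks, for illustration only.
continuous = weighted_grade({"exam": 14.0, "project": 16.0, "labs": 15.0}, CONTINUOUS)
final = weighted_grade({"exam": 14.0, "project": 16.0}, FINAL)
```

With these example marks both schemes give the same result, because the lab mark happens to equal the average of the exam and project marks.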

Teaching Staff