2025
Data Mining
Name: Data Mining
Code: INF13273M
6 ECTS
Duration: 15 weeks/156 hours
Scientific Area: Informatics
Teaching languages: Portuguese
Languages of tutoring support: Portuguese
Learning Goals
At the end of the course unit the student should demonstrate:
Understanding of key data mining approaches and techniques, focusing on the types of problems and on data preparation, including the challenges posed by big data;
Ability to use data mining tools and apply them to data sets, revealing an in-depth understanding of key data mining topics;
Ability to develop/deepen design and programming techniques for building intelligent and adaptable systems;
Ability to develop/deepen the basic techniques needed to carry out research in big data mining.
Contents
The Data Mining process
Types of problems: pattern association, clustering, outlier detection, classification
Data preparation: extraction, cleaning, selection, reduction and transformation of attributes, sampling and subsampling
Mining of: streams, text, time series, discrete sequences, spatial data, graphs, web data
Similarity and distance measures
Problems, approaches and algorithms
Pattern association
Cluster analysis
Algorithms: K-means, EM, PCA, SOM, ...
Performance evaluation
Classification
Ensemble methods; problems with imbalanced classes
Performance metrics: precision, recall, F-measure, ROC curve, log loss and others (cost function, Cohen's kappa, G-score)
Regression
Linear and nonlinear models
Performance evaluation: squared errors, absolute errors, correlation coefficient
Outlier analysis (supervised and unsupervised)
Measures of complexity/simplicity
Mixed performance criteria
Privacy preservation
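To illustrate some of the performance metrics listed above, the following minimal Python sketch computes them from scratch for a binary classification problem and for a regression problem. It is not course material, and the function names are hypothetical. Two definitions are assumptions: "G-score" is taken as the geometric mean of precision and recall, and the squared and absolute errors are taken as the mean squared error and mean absolute error.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: precision, recall, F1, G-score and Cohen's kappa."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    n = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumption: G-score as the geometric mean of precision and recall.
    g_score = math.sqrt(precision * recall)
    # Cohen's kappa: observed agreement corrected for agreement by chance.
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    # Kappa is undefined when p_chance == 1; this sketch returns 0.0 then.
    kappa = (p_observed - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "g_score": g_score, "kappa": kappa}


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: mean squared error, mean absolute error, Pearson r."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    r = cov / (sd_t * sd_p)  # Pearson correlation coefficient
    return {"mse": mse, "mae": mae, "r": r}


if __name__ == "__main__":
    print(classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                                 [1, 1, 0, 0, 0, 1, 1, 0]))
    print(regression_metrics([1, 2, 3], [1, 2, 4]))
```

For the sample classification call, there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so precision, recall and F1 are all 0.75, and Cohen's kappa is 0.5 (observed agreement 0.75, chance agreement 0.5).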
Teaching Methods
The teaching methodology includes:
- making all resources available through an e-learning platform (e.g. Moodle), with the relevant materials available before each face-to-face session;
- presentation of examples, demonstrations and problem solving for each concept presented;
- presentation and submission of exercises via the e-learning platform;
- presenting the concepts with a focus on the applications and projects to be developed.
Assessment consists of:
- practical assignments (70%);
- written mini-tests or, alternatively, a final exam (30%).
Assessment
Continuous assessment:
40% exam,
40% final project,
20% lab reports throughout the semester (at least 5 reports)
Final assessment:
50% exam,
50% final project
Teaching Staff (2024/2025)
- Luís Miguel de Mendonça Rato [responsible]