2024

Data Transformation and Analysis

Name: Data Transformation and Analysis
Code: INF14387L
6 ECTS
Duration: 15 weeks/156 hours
Scientific Area: Informatics

Teaching languages: Portuguese
Languages of tutoring support: Portuguese

Learning Goals

This course aims to provide students with the theoretical knowledge and tools to extract, select, and transform information so that it can be used efficiently by analysis and machine learning algorithms.

At the end of the semester, the student should be able to:
- analyze the quality of available information and apply techniques to treat missing values
- apply techniques for detection and treatment of outliers and data imbalance
- extract attributes and convert data (e.g., normalization, discretization, time and frequency transforms)
- design and implement techniques for attribute selection and reduction

Contents

Data processing
- Data types: numerical, categorical, text, images, time series, spatial, audio, graphs, etc.
- Data acquisition and annotation strategies
- Data quality assessment
- Detection and treatment of outliers and missing values
- Discretization and conversion of variables
- Normalization
- Handling of imbalanced data
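
As an illustration of three of the topics above (treatment of missing values, outlier detection, and normalization), here is a minimal sketch in standard-library Python. It assumes median imputation, the 1.5 x IQR rule, and min-max scaling as the chosen techniques; these are only one option for each topic, and the function names and sample data are hypothetical rather than taken from the course material.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no scale information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample: one missing value and one extreme value.
raw = [2.0, None, 3.0, 4.0, 100.0, 3.0]
filled = impute_median(raw)
outliers = iqr_outliers(filled)
normalized = min_max_normalize([v for v in filled if v not in outliers])
```

The order of the steps is a design choice: here outliers are removed before scaling, because a single extreme value would otherwise compress all other values toward 0.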

Analysis, selection, and reduction of attributes
- Feature engineering
- Exploratory visualization
- Methods based on classification/regression performance
- Supervised and unsupervised methods

Processing of text data
- Bag-of-words, n-grams, use of morphological and syntactic information, convolution kernels
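
The bag-of-words and n-gram representations can be sketched as follows. This is a minimal standard-library example that assumes lowercase whitespace tokenization and word-level n-grams up to length 2; the helper names (`tokenize`, `bag_of_ngrams`, `vectorize`) and the two sample documents are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous word n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_max=2):
    """Count every n-gram of length 1..n_max in a single document."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, n_max + 1):
        counts.update(ngrams(tokens, n))
    return counts


def vectorize(docs, n_max=2):
    """Build a shared sorted vocabulary and one count vector per document."""
    bags = [bag_of_ngrams(d, n_max) for d in docs]
    vocab = sorted(set().union(*bags))
    return vocab, [[bag[term] for term in vocab] for bag in bags]


docs = ["the cat sat", "the cat ate the fish"]
vocab, matrix = vectorize(docs)
```

Each row of `matrix` is a fixed-length count vector over `vocab`, which is the form most classification and regression algorithms expect as input.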

Processing of image data
- Noise types; linear and non-linear filtering; convolution and cross-correlation
- Feature detection
- Segmentation
- Geometric transforms
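
The difference between convolution and cross-correlation, listed above under image data, can be made concrete with a small sketch. It assumes images and kernels stored as nested Python lists and "valid" output (no padding); the function names and the sample image are hypothetical.

```python
def cross_correlate2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def convolve2d(image, kernel):
    """2-D convolution: cross-correlation with the kernel flipped on both axes."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return cross_correlate2d(image, flipped)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

corr = cross_correlate2d(image, kernel)
conv = convolve2d(image, kernel)
```

For a kernel that is symmetric under a 180-degree rotation the two operations give the same result; the asymmetric kernel used here makes the sign difference visible.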

Teaching Methods

Teaching methodologies:
* Theoretical classes in which concepts are introduced, exercises are solved, and students' questions are answered.
* Practical laboratory classes in which problems accompanying the theoretical material are proposed and questions are answered as students work through them. Exercises of increasing difficulty cover the topics taught so that students can practice the material.

Assessment

Continuous evaluation
* Theoretical component (50%): two written tests (25% each)
* Practical component (50%): development of a project

Final evaluation
* Theoretical component (50%): final written exam
* Practical component (50%): development of a project