2024
Data Transformation and Analysis
Name: Data Transformation and Analysis
Code: INF14387L
6 ECTS
Duration: 15 weeks/156 hours
Scientific Area: Informatics
Teaching languages: Portuguese
Languages of tutoring support: Portuguese
Learning Goals
This course aims to provide students with the theoretical knowledge and tools needed to extract, select, and transform information so that it can be used efficiently by analysis and machine learning algorithms.
At the end of the semester, the student should be able to:
- analyze the quality of available information and apply techniques to treat missing values
- apply techniques for detection and treatment of outliers and data imbalance
- extract attributes and convert data (e.g., normalization, discretization, time and frequency transforms)
- design and implement techniques for attribute selection and reduction
Contents
Data processing
- Data types: numerical, categorical, text, images, time series, spatial, audio, graphs, etc.
- Data acquisition and annotation strategies
- Data quality assessment
- Detection and treatment of outliers and missing values
- Discretization and conversion of variables
- Normalization
- Handling of imbalanced data
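As an illustration of three of the topics above (treatment of missing values, outlier detection, and normalization), the following is a minimal sketch in Python using only the standard library. The function names and the choice of mean imputation, the 1.5 x IQR rule, and min-max scaling are assumptions made for illustration; they are not taken from the course material, which may cover other techniques.

```python
import statistics


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

In practice the order of these steps matters: for example, an extreme outlier left in the data will compress all other values after min-max scaling.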
Analysis, selection, and reduction of attributes
- Feature engineering
- Exploratory visualization
- Methods based on classification/regression performance
- Supervised and unsupervised methods
Processing of text data
- Bag-of-words, n-grams, use of morphological and syntactic information, convolution kernels
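To make the first two text representations concrete, here is a minimal sketch of bag-of-words counts and word n-grams in plain Python. The whitespace tokenizer and the function names are simplifying assumptions for illustration only; real pipelines usually need proper tokenization and normalization.

```python
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text):
    """Count token occurrences, discarding word order."""
    return Counter(tokenize(text))


bow = bag_of_words("the cat saw the dog")
bigrams = ngrams(tokenize("the cat saw the dog"), 2)
```

The bag-of-words representation discards word order entirely, while n-grams recover some local order at the cost of a larger vocabulary.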
Processing of image data
- Noise types; Linear and non-linear filtering; Convolution and cross-correlation
- Feature detection
- Segmentation
- Geometric transforms
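The difference between convolution and cross-correlation mentioned above can be sketched in one dimension. This is an illustrative simplification (images are two-dimensional, and the function names are assumptions): cross-correlation slides the kernel over the signal as it is, while convolution flips the kernel first.

```python
def cross_correlate(signal, kernel):
    """'Valid' cross-correlation: slide the kernel over the signal without flipping it."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def convolve(signal, kernel):
    """'Valid' convolution: cross-correlation with the kernel reversed."""
    return cross_correlate(signal, kernel[::-1])


signal = [1, 2, 3, 4]
edge = [1, 0, -1]    # antisymmetric kernel: the two operations differ in sign
smooth = [1, 2, 1]   # symmetric kernel: the two operations coincide
```

For symmetric kernels, such as many smoothing filters, the two operations give the same result, which is why the terms are often used interchangeably in practice.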
Teaching Methods
Teaching methodologies:
* Theoretical classes that introduce concepts, work through exercises, and answer students' questions.
* Practical laboratory classes in which students solve problems that accompany the theoretical material, with questions answered as they work. Exercises of increasing difficulty cover the topics taught so that students can practice the subject.
Assessment
Continuous evaluation
* theoretical component (50%): two written tests (25% each)
* practical component (50%): development of a project
Final evaluation
* theoretical component (50%): final written exam
* practical component (50%): development of a project