Invitation to the Dissertation Defense of the Graduate Program in Computer Science

The Coordination of the Graduate Program in Computer Science is pleased to invite you to the Dissertation Defense:

 

Extended Pre-Processing Pipeline For Text Classification: On the Role of Meta-Features, Sparsification and Selective Sampling

 Washington Luiz Miranda da Cunha

 

A text classification pipeline is the sequence of tasks that must be performed to classify documents. The pre-processing phase of such a pipeline involves different ways of manipulating the documents for the learning phase. We introduce three new steps into the pre-processing phase: (1) Meta-Features (MF), to reduce the dimensionality of the original term-document (TF-IDF) matrix; (2) a Sparsification step, to make the MF representation less dense; and (3) a Selective Sampling step, to select the "best" documents for the learning phase. Our experiments show that the proposed extended pre-processing pipeline can improve effectiveness while reducing the associated costs: it achieves significant gains in effectiveness (up to 52%) compared to the TF-IDF representation, at a much lower cost (up to 9.7x faster in some cases). Another main contribution is a thorough evaluation of the trade-offs associated with introducing these new steps into the pipeline.

 

Examining Committee:

 

Prof. Marcos André Gonçalves - Advisor (DCC - UFMG)

Prof. Leonardo Chaves Dutra da Rocha - Co-advisor (DCOMP - UFSJ)

Prof. Jussara Marques de Almeida Gonçalves (DCC - UFMG)

Prof. Anisio Mendes Lacerda (DCC - UFMG)

 

November 8, 2019

13:00

 

Room 6321, ICEX

 

Last modified on Wednesday, 06 November 2019 21:36