Creación de corpus de palabras embebidas de tweets generados en Argentina

María Lorena Talamé; Agustina Monge; Matías Nicolás Amor; Alejandra Carolina Cardoso

doi:10.53794/ci.v13iXIII.357

María Lorena Talamé Facultad de Ingeniería. Universidad Católica de Salta https://orcid.org/0000-0003-3224-0124
Agustina Monge Facultad de Ingeniería. Universidad Católica de Salta
Matías Nicolás Amor Universidad Católica de Salta https://orcid.org/0000-0003-0561-1815
Alejandra Carolina Cardoso Facultad de Ingeniería. Universidad Católica de Salta https://orcid.org/0000-0003-3218-1072

Keywords: emotions, Twitter, natural language processing, machine learning, word embedding

Abstract

Text processing is a task of great interest to the scientific community. Twitter is one of the social networks where people frequently express themselves freely, which makes it one of the main sources of textual data. Before any analysis can be performed, the texts must first be represented in a form that an algorithm can use. This paper describes the creation of a corpus of word representations (word embeddings) obtained from tweets using Word2Vec. Although the sets of tweets used are not massive, they are considered sufficient as a first step toward building such a corpus. An important contribution of this work is a trained model that captures the idioms and colloquial expressions of Argentina and includes emojis and hashtags in the vector space.
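
As an illustration of what keeping emojis and hashtags in the vector space implies for preprocessing, the following is a minimal sketch of a tweet tokenizer. It is not the authors' pipeline: the token pattern, the `tokenize` helper, and the two sample tweets are assumptions made up for this example. The resulting token lists have the shape that a Word2Vec implementation such as gensim's `Word2Vec` accepts as its `sentences` input.

```python
import re

# Hypothetical token pattern (an assumption, not the paper's preprocessing).
TOKEN_RE = re.compile(
    r"[#@]\w+"                                # hashtags and mentions, kept whole
    r"|[\U0001F300-\U0001FAFF\u2600-\u27BF]"  # common emoji ranges, one emoji per token
    r"|\w+"                                   # ordinary words, including accented letters
)

URL_RE = re.compile(r"https?://\S+")


def tokenize(tweet: str) -> list:
    """Lowercase a tweet, drop URLs, and split it into word, hashtag, and emoji tokens."""
    text = URL_RE.sub(" ", tweet.lower())
    return TOKEN_RE.findall(text)


# Invented sample tweets, used only to show the shape of the output.
tweets = [
    "Che, qué frío hace hoy 🥶 #Salta",
    "Alto partido 😂😂 #Vamos https://t.co/x",
]

# One token list per tweet: the usual input format for Word2Vec training.
sentences = [tokenize(t) for t in tweets]
print(sentences)
```

Because hashtags and emojis survive as individual tokens, a model trained on `sentences` would learn a vector for each of them alongside ordinary words.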

Author Biography

Matías Nicolás Amor, Universidad Católica de Salta

Computer Engineer.

Professor of "Database I" in the Computer Engineering program at the Faculty of Engineering, Catholic University of Salta.

Participates in research projects on text mining and digital forensics.

Coordinator of Grupo Ideas, an incubation group for student research work (https://ideas.ucasal.edu.ar/).

Creation of a corpus of embedded words from tweets generated in Argentina

Publicaciones Cientificas - Universidad Católica de Salta