A multi-label, semi-supervised classification approach applied to personality prediction in social media

Type
Article
Publication date
2014
Journal
Neural Networks
Citations (Scopus)
89
Authors
Lima A.C.E.S.
de Castro L.N.
Abstract
Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media has attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user's behaviour within social media. Traditional personality prediction relies on users' profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature, in that it works with groups of texts, instead of single texts, and does not take users' profiles into account. Also, the proposed approach extracts meta-attributes from texts and does not work directly with the content of the messages. The set of possible personality traits is taken from the Big Five model and allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, due to the difficulty in annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms, namely a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict the personality of Tweets taken from three datasets available in the literature, and resulted in an approximately 83% accurate prediction, with some of the personality traits presenting better individual classification rates than others. © 2014 Elsevier Ltd.
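The transformation described in the abstract (one multi-label problem split into five binary problems, each trained with the help of unlabeled data) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say which semi-supervised scheme was used, so plain self-training is swapped in here, and a toy nearest-centroid classifier stands in for the Naïve Bayes, SVM and MLP learners named above. The feature vectors, the `margin` threshold and all function names are hypothetical.

```python
from statistics import mean

# The five traits of the Big Five model.
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]


def to_binary_problems(label_sets):
    """Binary relevance: one 0/1 target vector per trait."""
    return {t: [1 if t in labels else 0 for labels in label_sets]
            for t in TRAITS}


class NearestCentroid:
    """Toy stand-in classifier (assumption, not the paper's learner)."""

    def fit(self, X, y):
        self.centroids = {}
        for c in set(y):
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids[c] = [mean(col) for col in zip(*rows)]
        return self

    def predict_with_margin(self, x):
        # Squared distance to each class centroid, nearest first.
        d = sorted((sum((p - q) ** 2 for p, q in zip(x, c)), label)
                   for label, c in self.centroids.items())
        if len(d) == 1:  # only one class was seen during training
            return d[0][1], float("inf")
        return d[0][1], d[1][0] - d[0][0]


def self_train(X_lab, y_lab, X_unlab, margin=1.0, rounds=3):
    """Self-training: pseudo-label confident unlabeled points, then refit."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = NearestCentroid().fit(X, y)
    for _ in range(rounds):
        remaining = []
        for x in pool:
            label, m = model.predict_with_margin(x)
            if m >= margin:
                X.append(x)
                y.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing new was pseudo-labeled
            break
        pool = remaining
        model = NearestCentroid().fit(X, y)
    return model


def predict_traits(models, x):
    """Union of the traits whose binary model predicts 1."""
    return {t for t, m in models.items() if m.predict_with_margin(x)[0] == 1}


# Hypothetical meta-attribute vectors for four labeled groups of texts.
X_lab = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
label_sets = [{"openness"}, {"openness", "extraversion"},
              {"neuroticism"}, {"neuroticism", "extraversion"}]
X_unlab = [[0.1, 0.1], [1.1, 0.9], [0.5, 0.5]]  # unlabeled groups

targets = to_binary_problems(label_sets)
models = {t: self_train(X_lab, targets[t], X_unlab) for t in TRAITS}
print(predict_traits(models, [0.05, 0.0]))
```

Training one independent model per trait keeps each problem binary, at the cost of ignoring any correlation between traits; that trade-off is inherent to this kind of decomposition, not something the sketch resolves.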
Scopus subjects
Big five, Multi-label classifications, Personality, Semi-supervised learning, Social media, Twitter, Algorithms, Artificial Intelligence, Bayes Theorem, Humans, Neural Networks (Computer), Personality, Social Media, Support Vector Machines