Detecting topics in documents by clustering word vectors

dc.contributor.author: de Miranda G.R.
dc.contributor.author: Pasti R.
dc.contributor.author: de Castro L.N.
dc.date.accessioned: 2024-03-12T23:50:39Z
dc.date.available: 2024-03-12T23:50:39Z
dc.date.issued: 2020
dc.description.abstract: © Springer Nature Switzerland AG 2020. The automatic detection of topics in a set of documents is one of the most challenging and useful tasks in Natural Language Processing. Word2Vec has proven to be an effective tool for the distributed representation of words (word embeddings), usually applied to find their linguistic context. This paper proposes the use of a Self-Organizing Map (SOM) to cluster the word vectors generated by Word2Vec so as to find topics in the texts. After running SOM, a k-means algorithm is applied to separate the SOM output grid neurons into k clusters, such that the words mapped into each centroid represent the topics of that cluster. Our approach was tested on a benchmark text dataset with 19,997 texts and 20 groups. The results showed that the method is capable of finding the expected groups, sometimes merging some of them that deal with similar topics.
dc.description.firstpage: 235
dc.description.lastpage: 243
dc.description.volume: 1003
dc.identifier.doi: 10.1007/978-3-030-23887-2_27
dc.identifier.issn: 2194-5365
dc.identifier.uri: https://dspace.mackenzie.br/handle/10899/35132
dc.relation.ispartof: Advances in Intelligent Systems and Computing
dc.rights: Restricted access
dc.subject.otherlanguage: K-Means
dc.subject.otherlanguage: Self-Organizing Maps
dc.subject.otherlanguage: Topic detection
dc.subject.otherlanguage: Word embeddings
dc.subject.otherlanguage: Word2Vec
dc.title: Detecting topics in documents by clustering word vectors
dc.type: Conference paper
local.scopus.citations: 9
local.scopus.eid: 2-s2.0-85068620429
local.scopus.subject: Automatic Detection
local.scopus.subject: Distributed representation
local.scopus.subject: Effective tool
local.scopus.subject: K-means
local.scopus.subject: Natural language processing
local.scopus.subject: Topic detection
local.scopus.subject: Word vectors
local.scopus.subject: Word2Vec
local.scopus.updated: 2024-05-01
local.scopus.url: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85068620429&origin=inward
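
The pipeline summarized in the abstract (word vectors, then a SOM, then k-means over the SOM neurons, then reading off the words attached to each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-made 2-D vectors in `TOY_VECTORS` stand in for real Word2Vec output, the grid size, learning-rate and neighbourhood schedules are arbitrary choices, all function names are hypothetical, and mapping each word to its best-matching neuron is one plausible reading of "the words mapped into each centroid".

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    # Initialise every neuron with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(vectors))
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                 # linearly decaying learning rate
        sigma = sigma0 * decay + 0.1     # neighbourhood width, kept positive
        for v in rng.sample(vectors, len(vectors)):  # one shuffled pass
            # Best-matching unit: the neuron whose weights are closest to v.
            bmu = min(weights, key=lambda n: sq_dist(weights[n], v))
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i, x in enumerate(v):
                    w[i] += lr * h * (x - w[i])
    return weights


def kmeans_labels(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def detect_topics(word_vectors, rows=2, cols=2, k=2):
    """Cluster SOM neurons with k-means and group words by their neuron's cluster."""
    words = list(word_vectors)
    vectors = [word_vectors[w] for w in words]
    som = train_som(vectors, rows, cols)
    neurons = list(som)
    labels = kmeans_labels([som[n] for n in neurons], k)
    neuron_cluster = dict(zip(neurons, labels))
    topics = {}
    for word, vec in zip(words, vectors):
        bmu = min(neurons, key=lambda n: sq_dist(som[n], vec))
        topics.setdefault(neuron_cluster[bmu], []).append(word)
    return topics


# Assumption: toy 2-D vectors standing in for Word2Vec embeddings.
TOY_VECTORS = {
    "goal": [0.1, 0.0], "team": [0.0, 0.2], "match": [0.2, 0.1], "coach": [0.1, 0.3],
    "orbit": [10.0, 9.8], "rocket": [9.9, 10.1], "moon": [10.2, 10.0], "launch": [9.8, 9.9],
}

topics = detect_topics(TOY_VECTORS)
for cluster_id, members in sorted(topics.items()):
    print(cluster_id, members)
```

In a realistic setting the toy dictionary would be replaced by embeddings trained on the document collection, and the grid size and k would be tuned to the number of expected groups (20 in the benchmark mentioned in the abstract).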