An Analysis of Different Text Representation Schemes for an Immune Clustering Algorithm

dc.contributor.author: Ferraria M.A.
dc.contributor.author: Balbi P.P.
dc.contributor.author: de Castro L.N.
dc.date.accessioned: 2025-04-01T06:19:17Z
dc.date.available: 2025-04-01T06:19:17Z
dc.date.issued: 2025
dc.description.abstract: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025. This research investigates the challenges and effectiveness of three families of text representation methods (standard vector, grammar-based, and distributed) when applied to clustering short texts. The study uses Bag-of-Words as the standard vector representation; Linguistic Inquiry and Word Count (LIWC), Part-of-Speech Tagging (POS-Tagging), and the Medical Research Council Psycholinguistic Database (MRC) as grammar-based representations; and Word2Vec, fastText, Doc2Vec, and SentenceBERT as distributed representations. Using the bio-inspired aiNet clustering algorithm, the experiments yield a surprising result: grammar-based representations perform competitively despite their simplicity, while standard vectors exhibit known problems such as high dimensionality. The study contributes insights into the properties of different text representations and provides a foundation for optimizing their use in clustering tasks involving short, informal texts.
dc.description.firstpage: 250
dc.description.lastpage: 260
dc.description.volume: 1259
dc.identifier.doi: 10.1007/978-3-031-82073-1_25
dc.identifier.issn: None
dc.identifier.uri: https://dspace.mackenzie.br/handle/10899/40364
dc.relation.ispartof: Lecture Notes in Networks and Systems
dc.rights: Restricted Access
dc.subject.otherlanguage: Clustering
dc.subject.otherlanguage: Information retrieval
dc.subject.otherlanguage: Natural Computing
dc.subject.otherlanguage: NLP
dc.subject.otherlanguage: Text Mining
dc.title: An Analysis of Different Text Representation Schemes for an Immune Clustering Algorithm
dc.type: Conference paper
local.scopus.citations: 0
local.scopus.eid: 2-s2.0-85218952027
local.scopus.subject: Bag of words
local.scopus.subject: Clusterings
local.scopus.subject: Immune clustering algorithms
local.scopus.subject: Natural Computing
local.scopus.subject: Parts-of-speech tagging
local.scopus.subject: Representation method
local.scopus.subject: Representation schemes
local.scopus.subject: Short texts
local.scopus.subject: Text representation
local.scopus.subject: Text-mining
local.scopus.updated: 2025-04-01
local.scopus.url: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85218952027&origin=inward