An Analysis of Different Text Representation Schemes for an Immune Clustering Algorithm
dc.contributor.author | Ferraria M.A. | |
dc.contributor.author | Balbi P.P. | |
dc.contributor.author | de Castro L.N. | |
dc.date.accessioned | 2025-04-01T06:19:17Z | |
dc.date.available | 2025-04-01T06:19:17Z | |
dc.date.issued | 2025 | |
dc.description.abstract | © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025. This research investigates the challenges and effectiveness of various text representation methods (standard vector, grammar-based, and distributed) when applied to clustering short texts. The study explores Bag-of-Words for the standard vector representation; Linguistic Inquiry and Word Count (LIWC), Part-of-Speech Tagging (POS-Tagging), and the Medical Research Council Psycholinguistic Database (MRC) for grammar-based representations; and Word2Vec, fastText, Doc2Vec, and SentenceBERT for distributed representations. With the aiNet bio-inspired clustering algorithm, the results are surprising: grammar-based representations perform competitively despite their simplicity, whereas standard vector representations exhibit known challenges such as high dimensionality. The study contributes insights into the properties of different text representations, providing a foundation for optimizing their application in clustering tasks with short and informal texts. | |
dc.description.firstpage | 250 | |
dc.description.lastpage | 260 | |
dc.description.volume | 1259 | |
dc.identifier.doi | 10.1007/978-3-031-82073-1_25 | |
dc.identifier.uri | https://dspace.mackenzie.br/handle/10899/40364 | |
dc.relation.ispartof | Lecture Notes in Networks and Systems | |
dc.rights | Restricted Access | |
dc.subject.otherlanguage | Clustering | |
dc.subject.otherlanguage | Information retrieval | |
dc.subject.otherlanguage | Natural Computing | |
dc.subject.otherlanguage | NLP | |
dc.subject.otherlanguage | Text Mining | |
dc.title | An Analysis of Different Text Representation Schemes for an Immune Clustering Algorithm | |
dc.type | Conference paper | |
local.scopus.citations | 0 | |
local.scopus.eid | 2-s2.0-85218952027 | |
local.scopus.subject | Bag of words | |
local.scopus.subject | Clusterings | |
local.scopus.subject | Immune clustering algorithms | |
local.scopus.subject | Natural Computing | |
local.scopus.subject | Parts-of-speech tagging | |
local.scopus.subject | Representation method | |
local.scopus.subject | Representation schemes | |
local.scopus.subject | Short texts | |
local.scopus.subject | Text representation | |
local.scopus.subject | Text-mining | |
local.scopus.updated | 2025-04-01 | |
local.scopus.url | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85218952027&origin=inward |