Modeling the combined influence of complexity and quality in supervised learning

dc.contributor.authorDe Avila Mendes R.
dc.contributor.authorDa Silva L.A.
dc.date.accessioned2024-03-12T19:16:56Z
dc.date.available2024-03-12T19:16:56Z
dc.date.issued2022
dc.description.abstract© 2022 - IOS Press. All rights reserved. Data classification is a data mining task in which an algorithm, adjusted on a training dataset, is used to predict the class of an unclassified object under analysis. A significant part of the classification algorithm's performance depends on the dataset's complexity and quality. Data Complexity involves investigating the effects of dimensionality, the overlap of descriptive attributes, and the separability of the classes. Data Quality focuses on aspects such as noisy data (outliers) and missing values. Both factors are fundamental to classification performance. However, the literature has very few studies that examine the relationship between these factors and highlight their significance. This paper applies Structural Equation Modeling and the Partial Least Squares Structural Equation Modeling (PLS-SEM) algorithm and, in an innovative manner, associates the contributions of Data Complexity and Data Quality with Classification Quality. Experimental analysis with 178 datasets obtained from the OpenML repository showed that controlling complexity improves classification results more than data quality does. Additionally, the paper presents a visual tool for analyzing datasets from the classification performance perspective, in the dimensions proposed to represent the structural model.
dc.description.firstpage1247
dc.description.issuenumber5
dc.description.lastpage1274
dc.description.volume26
dc.identifier.doi10.3233/IDA-215962
dc.identifier.issn1571-4128
dc.identifier.urihttps://dspace.mackenzie.br/handle/10899/34470
dc.relation.ispartofIntelligent Data Analysis
dc.rightsRestricted Access
dc.subject.otherlanguageData Complexity
dc.subject.otherlanguageData Quality
dc.subject.otherlanguageStructural Equation Modeling
dc.subject.otherlanguagesupervised learning
dc.titleModeling the combined influence of complexity and quality in supervised learning
dc.typeArticle
local.scopus.citations1
local.scopus.eid2-s2.0-85138829067
local.scopus.subjectClass separability
local.scopus.subjectClassification algorithm
local.scopus.subjectData classification
local.scopus.subjectData complexity
local.scopus.subjectData mining tasks
local.scopus.subjectData quality
local.scopus.subjectObject class
local.scopus.subjectPerformance
local.scopus.subjectStructural equation models
local.scopus.subjectTraining dataset
local.scopus.updated2024-12-01
local.scopus.urlhttps://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85138829067&origin=inward
Files