A preprocessing Shapley value-based approach to detect relevant and disparity prone features in machine learning

Type
Conference paper
Publication date
2024
Venue
2024 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2024
Citations (Scopus)
0
Authors
Pelegrina G.D.
Couceiro M.
Duarte L.T.
Abstract
© 2024 ACM. Decision support systems have become ubiquitous in every aspect of human life. Their reliance on increasingly complex and opaque machine learning models raises transparency and fairness concerns with respect to unprivileged groups of people. This has motivated several efforts to estimate the importance of features for a model's performance and to detect unfair/disparate decisions. The latter is often addressed by means of fairness metrics that rely on performance metrics with respect to predefined features considered protected (salient features such as age, gender, ethnicity, etc.) and/or sensitive (such as education, occupation, or banking information). However, such an approach is subjective (fairness metrics depend on the choice of features), and there may be other features that lead to unfair (disparate) decisions and that call for suitable interpretation. In this paper we focus on the latter issues and propose a statistical preprocessing approach, inspired by both the Hilbert-Schmidt independence criterion and Shapley values, to estimate feature importance and to detect disparity-prone features. Unlike traditional Shapley value-based approaches, ours does not require trained models to measure feature importance or to detect disparate results; instead, it relies on data and statistical criteria to measure the dependence between feature distributions. Our empirical results show that the features with the highest degrees of dependence on the label vector are also the ones with the highest impact on model performance, and indicate that this relation enables the detection of disparity-prone features.
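The abstract describes scoring features by their statistical dependence on the label without training a model. A minimal sketch of that core idea, assuming the biased empirical Hilbert-Schmidt Independence Criterion (HSIC) estimator with RBF kernels (this is an illustration, not the authors' implementation; the paper additionally combines such dependence measures with Shapley values):

```python
# Sketch: rank features by an HSIC estimate of dependence on the label.
# Assumptions (not from the paper): RBF kernels with a fixed bandwidth,
# the biased estimator (1/n^2) * tr(K H L H), and toy synthetic data.
import numpy as np

def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for a 1-D sample vector."""
    d = x[:, None] - x[None, :]
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimate between two 1-D samples."""
    n = len(x)
    K = rbf_kernel(x, sigma)          # kernel matrix on the feature
    L = rbf_kernel(y, sigma)          # kernel matrix on the label
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / n ** 2

# Toy data: feature 0 determines the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)

scores = [hsic(X[:, j], y) for j in range(X.shape[1])]
```

On this toy data the label-driving feature receives a clearly higher dependence score than the noise feature, matching the abstract's observation that high-dependence features are the ones that matter for model performance.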
Scopus subjects
Disparity detection, Fairness concerns, Fairness measures, Hilbert-Schmidt independence criterion, Human lives, Machine learning models, Machine learning, Modeling performance, Shapley value, Value-based approach