Augmented Data Low Confidence (ADLC): A Confidence-Driven Data Augmentation Framework With Ensemble Optimization for Enhanced Machine Learning Performance

Sunaryono, Dwi and Sarno, Riyanarto and Sabilla, Shoffi Izza and Putri, Rizqy Ahsana and Amri, Taufiq Choirul and Mirda, Irfan and Siswantoro, Joko (2025) Augmented Data Low Confidence (ADLC): A Confidence-Driven Data Augmentation Framework With Ensemble Optimization for Enhanced Machine Learning Performance. IEEE Access, 13. pp. 201439-201459.

Full text: Published Version (PDF, 4MB)
Other file: IEEE Access "About Journal" information from IEEE Xplore (PDF, 10MB)
Official URL / DOI: https://doi.org/10.1109/ACCESS.2025.3636729

Abstract

The application of machine learning in data-driven solutions has matured, yet efforts to improve predictive accuracy continue. This study presents a comprehensive approach that begins with data preprocessing (removal of invalid values and duplicate entries, feature selection, and dimensionality reduction), followed by model optimization through hyperparameter tuning. A novel method, Augmented Data Low Confidence (ADLC), is introduced to enhance model performance by augmenting samples with low prediction confidence. The k-nearest neighbors (k-NN) method is used to estimate prediction probabilities, and samples whose confidence falls below a defined threshold are selected for augmentation. New data points are generated by randomly sampling feature values between the lower and upper bounds of these low-confidence instances. The augmented dataset is then optimized using the Gray Wolf Optimization (GWO) algorithm, which adjusts model parameters according to an accuracy-driven fitness function. Experiments on ten public datasets and two proprietary datasets show that this feedback-based augmentation consistently improves the accuracy of various machine learning models. The results demonstrate the effectiveness of incorporating uncertain predictions into the learning process, leading to improved generalization and classification performance.
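The confidence-driven augmentation step summarized in the abstract can be sketched in Python as below. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation. The function names knn_proba and adlc_augment, the leave-one-out neighbor search, the use of the k-NN probability of each sample's own label as its "confidence", the threshold value, the per-class bounds, and the number of synthetic points per class (n_new) are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def knn_proba(X, y, k=5):
    """Leave-one-out k-NN class probabilities for every sample in X.

    A sample is never counted among its own neighbors, so its confidence
    reflects how well the *other* samples support its label.
    """
    classes = sorted(set(y))
    probs = []
    for i, xi in enumerate(X):
        # (distance, label) pairs to every other sample, nearest first
        neighbors = sorted(
            ((math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i),
            key=lambda pair: pair[0],
        )
        n = min(k, len(neighbors))
        votes = Counter(label for _, label in neighbors[:n])
        probs.append({c: votes[c] / n for c in classes})
    return probs


def adlc_augment(X, y, k=5, threshold=0.6, n_new=10, seed=0):
    """Add synthetic samples inside the bounds of low-confidence samples.

    Returns (augmented X, augmented y, indices of low-confidence samples).
    """
    rng = random.Random(seed)
    probs = knn_proba(X, y, k)
    # Assumed confidence: k-NN probability assigned to the sample's own label
    low = [i for i, p in enumerate(probs) if p[y[i]] < threshold]
    X_new, y_new = [], []
    for c in sorted({y[i] for i in low}):
        members = [X[i] for i in low if y[i] == c]
        # Per-feature lower and upper bounds of this class's low-confidence samples
        lower = [min(col) for col in zip(*members)]
        upper = [max(col) for col in zip(*members)]
        for _ in range(n_new):
            # Each feature is drawn uniformly within its [lower, upper] bound
            X_new.append([rng.uniform(lo, hi) for lo, hi in zip(lower, upper)])
            y_new.append(c)
    return X + X_new, y + y_new, low


if __name__ == "__main__":
    # Tiny two-class toy set with a few points near the class boundary
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0],
         [1.1, 0.9], [0.9, 1.1], [0.5, 0.5], [0.55, 0.45]]
    y = [0, 0, 0, 1, 1, 1, 0, 1]
    X_aug, y_aug, low = adlc_augment(X, y, k=3, threshold=0.7, n_new=5)
    print("low-confidence indices:", low)
    print("augmented size:", len(X_aug))
```

Computing bounds per class keeps every synthetic point labeled with a class whose uncertain samples actually span that region. The abstract does not say whether bounds are pooled per class or over the whole low-confidence set, so this grouping is one plausible reading rather than the paper's definition.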
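The abstract states that the augmented dataset is then optimized with Gray Wolf Optimization using an accuracy-driven fitness function. The Python sketch below shows the standard GWO update (alpha, beta, and delta leaders guide every wolf; the coefficient a decreases linearly from 2 toward 0) for maximizing an arbitrary fitness function over a bounded search space. The function name gwo_maximize, the population size, the iteration count, and the toy quadratic fitness in the usage example are assumptions for illustration only; they are not taken from the paper.

```python
import random


def gwo_maximize(fitness, bounds, n_wolves=20, n_iter=100, seed=0):
    """Maximize `fitness` over a box with the Grey Wolf Optimizer.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best position, best fitness) found.
    """
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best-so-far (fitness, position) pairs: alpha, beta, delta

    for t in range(n_iter):
        # Rank current wolves together with previous leaders (keeps the best ever)
        scored = [(fitness(w), w) for w in wolves] + leaders
        scored.sort(key=lambda s: s[0], reverse=True)
        leaders = [(f, list(p)) for f, p in scored[:3]]

        a = 2 - 2 * t / n_iter  # exploration coefficient, shrinks linearly toward 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d, (lo, hi) in enumerate(bounds):
                x = 0.0
                for _, leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    x += leader[d] - A * D  # candidate step guided by this leader
                # Average of the leader-guided positions, clipped to the bounds
                pos.append(min(max(x / len(leaders), lo), hi))
            new_wolves.append(pos)
        wolves = new_wolves

    scored = [(fitness(w), w) for w in wolves] + leaders
    best_fit, best_pos = max(scored, key=lambda s: s[0])
    return best_pos, best_fit


if __name__ == "__main__":
    # Toy stand-in for an accuracy score: peak value 0 at (3, -1)
    best, score = gwo_maximize(
        lambda p: -((p[0] - 3) ** 2 + (p[1] + 1) ** 2),
        bounds=[(-5, 5), (-5, 5)],
    )
    print(best, score)
```

To tune model parameters as the abstract describes, a wolf's position could be decoded into hyperparameters (for example, rounding one coordinate to an integer number of neighbors), and fitness could return validation accuracy on the augmented data. That decoding is a hypothetical illustration; the paper's actual parameter encoding is not given in the abstract.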

Item Type: Article
Uncontrolled Keywords: Augmentation, machine learning, performance improvement
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Engineering > Department of Informatics
Depositing User: Joko Siswantoro
Date Deposited: 10 Dec 2025 05:15
Last Modified: 10 Dec 2025 05:15
URI: http://repository.ubaya.ac.id/id/eprint/49940
