T. Nguemdjom, Darren Kevin and Mbayandjambe, Alidor M. and Nkwimi, Grevi B. and Oshasha, Fiston and Muluba, Célestin and Mbengandji, Héritier I. and BAZIE, Ibsen G. and Kpoghomou, Raphael and Kuyunsa, Alain M. (2025) Enhancing the Robustness of Computer Vision Models to Adversarial Perturbations Using Multi-Scale Attention Mechanisms. International Journal of Innovative Science and Research Technology, 10 (4): 25apr2118. pp. 3565-3578. ISSN 2456-2165

IJISRT25APR2118.pdf - Published Version


Abstract

This study evaluates the effect of integrating a multi-scale attention mechanism, the Bottleneck Attention Module (BAM), into the deep learning architectures ResNet18 and SqueezeNet, using the CIFAR-10 dataset. BAM combines spatial and channel attention, capturing local and global dependencies simultaneously and thereby improving the models’ ability to handle visual disruptions and adversarial attacks. A comparison with existing mechanisms such as ECA-Net and CBAM shows that BAM outperforms them; the authors attribute this to its parallel design, which optimizes the spatial and channel dimensions together while remaining computationally efficient. Potential applications include critical domains such as medical imaging and surveillance, where precision and robustness are essential, particularly in dynamic environments or under adversarial constraints. The study also highlights avenues for integrating BAM with emerging architectures such as Transformers, to combine long-range relationships with multi-scale dependencies. Experimental results support BAM’s effectiveness: on clean data, ResNet18’s accuracy improves from 74.83% to 90.58%, and SqueezeNet’s from 75.50% to 86.70%. Under PGD attacks, BAM raises ResNet18’s accuracy from 59.2% to 70.4%, while the hybrid model achieves a maximum accuracy of 75.8%. Activation analysis indicates that BAM strengthens model interpretability by focusing attention on regions of interest, reducing false activations and improving overall reliability. These findings position BAM as a strong candidate for modern embedded vision systems that must balance performance, robustness, and efficiency.
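
The parallel combination of channel and spatial attention described in the abstract can be illustrated with a minimal sketch. It assumes the residual gating form from the original BAM publication, F' = F + F * sigmoid(M_c + M_s), where M_c is one value per channel and M_s is one value per spatial position. This record does not give the authors' implementation details, so this is not their code. The function name `bam_fuse` and the hand-picked logits are hypothetical; in a real module both branches are learned sub-networks.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bam_fuse(feature, channel_logits, spatial_logits):
    """Fuse two attention branches in parallel (illustrative only).

    feature:        nested list of shape [C][H][W]
    channel_logits: list of length C (stand-in for the channel branch output)
    spatial_logits: nested list of shape [H][W] (stand-in for the spatial branch output)

    Both logit maps are broadcast to [C][H][W], summed, squashed with a
    sigmoid, and applied as a residual gate: out = x + x * gate.
    """
    refined = []
    for c, plane in enumerate(feature):
        new_plane = []
        for h, row in enumerate(plane):
            new_row = []
            for w, value in enumerate(row):
                gate = sigmoid(channel_logits[c] + spatial_logits[h][w])
                new_row.append(value + value * gate)
            new_plane.append(new_row)
        refined.append(new_plane)
    return refined


# Toy input: 2 channels, 2x2 spatial grid. All numbers are made up.
feature = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
channel_logits = [0.0, 2.0]                  # hypothetical channel branch output
spatial_logits = [[0.0, -1.0], [1.0, 0.0]]   # hypothetical spatial branch output

refined = bam_fuse(feature, channel_logits, spatial_logits)
```

Because the gate always lies strictly between 0 and 1, each positive activation is scaled by a factor between 1 and 2: the residual term keeps the original signal, and the attention map only decides how much to amplify it.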

Item Type: Article
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Engineering, Science and Mathematics > School of Engineering Sciences
Depositing User: Editor IJISRT Publication
Date Deposited: 15 May 2025 10:36
Last Modified: 15 May 2025 10:36
URI: https://eprint.ijisrt.org/id/eprint/886
