T. Nguemdjom, Darren Kevin and Mbayandjambe, Alidor M. and Nkwimi, Grevi B. and Oshasha, Fiston and Muluba, Célestin and Mbengandji, Héritier I. and BAZIE, Ibsen G. and Kpoghomou, Raphael and Kuyunsa, Alain M. (2025) Enhancing the Robustness of Computer Vision Models to Adversarial Perturbations Using Multi-Scale Attention Mechanisms. International Journal of Innovative Science and Research Technology, 10 (4): 25apr2118. pp. 3565-3578. ISSN 2456-2165
IJISRT25APR2118.pdf - Published Version
Abstract
This study evaluates the effectiveness of integrating multi-scale attention mechanisms, specifically the Bottleneck Attention Module (BAM), into deep learning architectures such as ResNet18 and SqueezeNet, using the CIFAR-10 dataset. BAM combines spatial and channel attention, enabling the simultaneous capture of local and global dependencies and thereby improving the models’ ability to handle visual disruptions and adversarial attacks. A comparison with existing mechanisms, such as ECA-Net and CBAM, indicates that BAM outperforms them through its parallel approach, which optimizes the spatial and channel dimensions together while maintaining computational efficiency. Potential applications include critical domains such as medical imaging and surveillance, where precision and robustness are essential, particularly in dynamic environments or under adversarial constraints. The study also highlights avenues for integrating BAM with emerging architectures such as Transformers to combine the advantages of long-range relationships and multi-scale dependencies. Experimental results confirm BAM’s effectiveness: on clean data, ResNet18’s accuracy improves from 74.83% to 90.58%, and SqueezeNet’s from 75.50% to 86.70%. Under PGD attacks, BAM raises ResNet18’s accuracy from 59.2% to 70.4%, while the hybrid model achieves a maximum accuracy of 75.8%. Activation analysis shows that BAM strengthens model interpretability by focusing attention on regions of interest, reducing false activations and improving overall reliability. These findings position BAM as an ideal solution for modern embedded vision systems that require an optimal balance between performance, robustness, and efficiency.
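To put the reported figures on a common footing, the following minimal sketch computes the absolute gain in percentage points and the relative reduction in error rate for each before/after pair quoted in the abstract. The accuracy values come from the abstract; the dictionary labels, the `summarize` helper, and the choice of relative error reduction as a metric are illustrative assumptions and are not taken from the paper.

```python
# Accuracy figures (percent) copied from the abstract. The labels and the
# helper below are hypothetical names chosen for this illustration only.
REPORTED = {
    "ResNet18, clean": (74.83, 90.58),
    "SqueezeNet, clean": (75.50, 86.70),
    "ResNet18, PGD": (59.2, 70.4),
}


def summarize(before, after):
    """Return (gain in percentage points, relative error reduction in %)."""
    gain = after - before
    error_before = 100.0 - before  # error rate without BAM
    reduction = 100.0 * gain / error_before
    return round(gain, 2), round(reduction, 1)


SUMMARY = {name: summarize(b, a) for name, (b, a) in REPORTED.items()}

if __name__ == "__main__":
    for name, (gain, reduction) in SUMMARY.items():
        print(f"{name}: +{gain:.2f} points, {reduction:.1f}% fewer errors")
```

Read this way, the gain under PGD is similar in absolute terms to the SqueezeNet gain on clean data, but it removes a smaller share of the remaining errors because the adversarial baseline error is higher.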
Item Type: | Article |
---|---|
Subjects: | T Technology > T Technology (General) |
Divisions: | Faculty of Engineering, Science and Mathematics > School of Engineering Sciences |
Depositing User: | Editor IJISRT Publication |
Date Deposited: | 15 May 2025 10:36 |
Last Modified: | 15 May 2025 10:36 |
URI: | https://eprint.ijisrt.org/id/eprint/886 |