International Journal of Electronic Devices and Networking
2024, Vol. 5, Issue 2, Part A
Efficient CNN model optimization via structured pruning and sparse tensor core acceleration on NVIDIA A100 GPUs: A hardware-aware approach with fine-tuning and sparse matrix computation techniques
Author(s): Li Zhang, Hailin Wei and Ming Chen
Abstract: The increasing computational demands of Convolutional Neural Networks (CNNs) in real-world applications necessitate efficient optimization strategies, particularly for latency-sensitive and resource-constrained environments. This study optimizes CNN architectures using structured pruning and sparse tensor core acceleration on NVIDIA A100 GPUs, leveraging hardware-aware techniques to improve latency, throughput, and accuracy retention. The objectives were threefold: (1) implement structured pruning methodologies tailored for sparse tensor core compatibility, (2) fine-tune pruned CNN models to recover lost accuracy, and (3) evaluate performance improvements in latency, throughput, and accuracy across diverse CNN architectures, including ResNet-50, MobileNetV2, and EfficientNet-B0. Using the ImageNet and CIFAR-10 datasets, models were pruned at the filter and channel levels, fine-tuned with adaptive learning rate schedules, and deployed on sparse tensor cores via the cuSPARSE and CUTLASS libraries. Results demonstrated significant performance improvements across all models: latency decreased by up to 30%, throughput increased by up to 50%, and accuracy loss remained below 1.5% after optimization. Paired t-tests confirmed the statistical significance of these improvements (p < 0.05). Compared with traditional pruning and sparsity-aware frameworks, our approach integrates hardware capabilities effectively, unlocking the full potential of sparse matrix computation on modern GPUs. Practical recommendations include prioritizing structured pruning, integrating dynamic fine-tuning strategies, and using hardware-specific libraries in deployment workflows. Future research should explore adaptive pruning strategies, hybrid sparsity patterns, and energy-efficiency metrics to further optimize CNN models. This study underscores the importance of hardware-aware optimization frameworks in bridging the gap between theoretical advances and practical deployment, setting the stage for scalable and efficient AI applications across diverse industries.
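As a concrete illustration of the pipeline the abstract summarizes, the sketch below applies filter-level magnitude pruning to a torchvision ResNet-50 and sets up an adaptive fine-tuning schedule. It is a minimal sketch assuming PyTorch: the L1 criterion, the 30% pruning ratio, and the helper name prune_conv_filters are illustrative assumptions, not the authors' implementation, and the cuSPARSE/CUTLASS deployment step is out of scope here.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def prune_conv_filters(conv: nn.Conv2d, ratio: float = 0.3) -> None:
    """Zero out the filters (output channels) with the smallest L1 norms."""
    with torch.no_grad():
        # Per-filter L1 norm over (in_channels, kH, kW): shape (out_channels,)
        norms = conv.weight.abs().sum(dim=(1, 2, 3))
        n_prune = int(ratio * conv.out_channels)
        if n_prune == 0:
            return
        # Indices of the weakest filters by magnitude
        _, idx = torch.topk(norms, n_prune, largest=False)
        conv.weight[idx] = 0.0
        if conv.bias is not None:
            conv.bias[idx] = 0.0

model = resnet50(weights="IMAGENET1K_V1")
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        prune_conv_filters(m, ratio=0.3)

# Fine-tuning setup with an adaptive (cosine-annealed) learning rate,
# as the abstract describes, to recover the accuracy lost to pruning.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
```

Note that to engage the A100's sparse tensor cores, the pruned weights must additionally satisfy the hardware's 2:4 fine-grained structured sparsity pattern (at most two nonzeros per group of four) before the cuSPARSE/CUTLASS kernels can exploit them.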
DOI: 10.22271/27084477.2024.v5.i2a.60
Pages: 12-17
How to cite this article:
Li Zhang, Hailin Wei, Ming Chen. Efficient CNN model optimization via structured pruning and sparse tensor core acceleration on NVIDIA A100 GPUs: A hardware-aware approach with fine-tuning and sparse matrix computation techniques. Int J Electron Devices Networking 2024;5(2):12-17. DOI: 10.22271/27084477.2024.v5.i2a.60