A GPU Accelerated DCT Implementation for Image Compression

Cardone, Angelamaria; Di Pascale, Gerardo

doi:10.1007/978-3-032-30530-5_12

The Discrete Cosine Transform (DCT) is a cornerstone of the JPEG standard; nevertheless, its direct implementation is computationally expensive for full-image processing because of the intensive matrix operations involved. Building on the methodology proposed by Haweel et al. in 2016, which uses a specific transformation matrix to streamline block-based DCT through matrix multiplication, this work proposes an optimized parallel version designed for GPUs. The implementation relies on memory coalescing, reduced thread divergence, and efficient management of the GPU memory hierarchy. Experimental results show that the optimized algorithm achieves a remarkable speed-up over traditional CPU-based approaches. The implementation also significantly outperforms some existing parallel GPU solutions, reducing execution time relative to both an efficient CUDA-based algorithm and the standard cuBLAS library. These performance gains are especially pronounced for high-resolution images, highlighting the scalability and computational efficiency of the proposed approach for large-scale visual data.
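To make the idea of a block-based DCT expressed as matrix multiplication concrete, the following is a minimal CPU sketch in NumPy. It is an illustration under stated assumptions, not the authors' method: it uses the standard orthonormal DCT-II matrix rather than the specific transformation matrix of Haweel et al., which this record does not give, and the names `dct_matrix` and `blockwise_dct` are hypothetical. Each n x n block X is transformed as T X Tᵀ.

```python
import numpy as np


def dct_matrix(n: int = 8) -> np.ndarray:
    """Standard orthonormal DCT-II matrix T of size n x n (an assumption:
    not the specific matrix used in the paper)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    j = np.arange(n).reshape(1, -1)  # sample index (columns)
    t = np.sqrt(2.0 / n) * np.cos((2 * j + 1) * k * np.pi / (2 * n))
    t[0, :] = np.sqrt(1.0 / n)       # the DC row has its own normalization
    return t


def blockwise_dct(image: np.ndarray, n: int = 8) -> np.ndarray:
    """Apply the 2D DCT to every n x n block as T @ X @ T.T."""
    h, w = image.shape
    if h % n or w % n:
        raise ValueError("image dimensions must be multiples of the block size")
    t = dct_matrix(n)
    # Reshape into a grid of blocks with shape (h/n, w/n, n, n).
    blocks = image.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    # matmul broadcasts over the two leading block-grid axes.
    coeffs = t @ blocks @ t.T
    # Reassemble the coefficient blocks into the original image layout.
    return coeffs.transpose(0, 2, 1, 3).reshape(h, w)
```

Because every block is transformed independently by the same two small matrix products, the computation maps naturally onto one GPU thread block per image block, which is the kind of parallelism the abstract refers to.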

Year: 2026

Publication type: 4.1.1 Proceedings with DOI

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4953215

UniSa - IRIS Institutional Research Information System
