Self-Adapting Reconfigurable Multiply-Accumulator for Real-Time Image Processing in Embedded Systems

Fasolino A.; Vitolo P.; Di Benedetto L.; Liguori R.; Rubino A.; Licciardo G. D.
2024-01-01

Abstract

The Multiply-Accumulate (MAC) operation is widely used in real-time image processing tasks, ranging from Convolutional Neural Networks to digital filtering, and significantly affects overall system performance. This work proposes the Self-Adapting Reconfigurable Multiply-Accumulate (SR-MAC) as a new instrument for finding the optimal trade-off between operation throughput, power consumption, and physical resource utilization in real-time image processing applications. The proposed system dynamically reconfigures its hardware resources according to the current computational requirements. It does so by monitoring overflow and over-representation occurrences at each accumulation cycle and selecting the relevant portion of the accumulation result. A custom architecture of the proposed algorithm has been designed in Verilog, implemented on an AMD Xilinx Artix-7 FPGA, and compared with the AMD Xilinx fixed-point macro and, in parentheses below, with the floating-point fused multiply-accumulate. The SR-MAC achieves reductions of 83% (82%), 79% (93%), and 87.2% (94%) in the number of LUTs, the number of FFs, and the power dissipation, PdynN, respectively. The SR-MAC has also been used to replace arithmetic units in typical real-time image processing applications. In these cases, it reduces FFs by up to 6% and PdynN by up to 14%, while increasing fMax by up to 14%. These results highlight the significant performance enhancement achieved for both single operators and entire systems, making the SR-MAC an excellent design choice for real-time image processing applications.
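
The abstract does not detail the SR-MAC architecture, so the following is only a behavioral sketch of the general idea of monitoring overflow and over-representation at each accumulation cycle. It assumes, purely for illustration, an accumulator split into fixed-size segments that are enabled when the running sum overflows the active width and disabled when the top segment carries only sign extension. The names `fits_signed`, `adaptive_mac`, `segment_bits`, and `max_segments` are hypothetical and do not come from the paper.

```python
def fits_signed(value: int, bits: int) -> bool:
    """Return True if value is representable in two's complement on `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def adaptive_mac(pairs, segment_bits=8, max_segments=4):
    """Accumulate a*b over `pairs` and record the active segment count per cycle.

    Illustrative model only (assumption): the accumulator is made of
    `max_segments` segments of `segment_bits` bits, and only the segments
    needed to hold the current result are kept enabled.
    """
    active = 1      # number of accumulator segments currently enabled
    acc = 0         # exact running sum
    history = []    # active segment count after each accumulation cycle
    for a, b in pairs:
        acc += a * b
        # Overflow: the result needs more bits than are currently enabled.
        while not fits_signed(acc, active * segment_bits):
            if active == max_segments:
                raise OverflowError("result exceeds the maximum accumulator width")
            active += 1
        # Over-representation: the top segment holds only sign-extension bits.
        while active > 1 and fits_signed(acc, (active - 1) * segment_bits):
            active -= 1
        history.append(active)
    return acc, history


if __name__ == "__main__":
    total, widths = adaptive_mac([(3, 4), (100, 100), (-100, 100), (-1, 2)])
    print(total, widths)  # prints: 10 [1, 2, 1, 1]
```

In this toy run the second product forces a second segment to be enabled, and the third product brings the sum back into a single segment. This is the kind of per-cycle adaptation the abstract describes, although the actual hardware mechanism may differ.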

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4874752