Overflow-Driven Dynamic Precision Scaling Fixed-Point Multiply-Accumulator Unit

Fasolino A.; Liguori R.; Di Benedetto L.; Rubino A.; Licciardo G. D.
2026

Abstract

This work introduces a fully hardware multiply-accumulate (MAC) unit with dynamic precision scaling (DPS), designed specifically for embedded applications. The MAC autonomously detects and mitigates overflow and overrepresentation on chip, eliminating the need for an external software controller. A compact run-time monitoring unit (RMU) inside each MAC tracks transitions of the carry-out and sign bits and adjusts the operand representation at the bit level so that any rounding error remains bounded by $2^{-(N_{in}-1)}$. Bit-sliced input partitioning makes operand width and accumulation depth reconfigurable at run time without altering the logic topology. Prototyped on a Xilinx Artix-7 FPGA, the proposed unit achieves up to 14% lower dynamic power and a 15% higher maximum clock frequency than a conventional fixed-width MAC of the same precision; in SkyWater 130 nm CMOS technology, it occupies $3.9 \times 10^{3}$ µm², reaches a critical-path delay of 2.68 ns, and consumes 6.07 µW/MHz.
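
The general idea of overflow-driven precision scaling can be illustrated with a small software model. The sketch below is not the authors' circuit: the function name `dps_mac`, the `acc_bits` parameter, and the choice to halve the scale by one bit per overflow are all assumptions made for illustration, and a simple range check stands in for the carry-out and sign-bit monitoring performed by the RMU. It makes no claim about meeting the error bound stated in the abstract.

```python
def dps_mac(pairs, acc_bits=16):
    """Hypothetical model of an overflow-driven, rescaling fixed-point MAC.

    pairs    -- iterable of (a, b) signed integer operands
    acc_bits -- assumed width of the two's-complement accumulator

    Returns (acc, shift); the accumulated sum is approximately acc * 2**shift.
    """
    lo = -(1 << (acc_bits - 1))        # most negative representable value
    hi = (1 << (acc_bits - 1)) - 1     # most positive representable value
    acc, shift = 0, 0
    for a, b in pairs:
        product = (a * b) >> shift     # align the product to the current scale
        total = acc + product          # exact sum before the width check
        while not lo <= total <= hi:   # overflow: the sum does not fit
            total >>= 1                # drop one LSB (arithmetic shift)
            shift += 1                 # record the coarser scale
        acc = total
    return acc, shift


if __name__ == "__main__":
    acc, shift = dps_mac([(100, 100)] * 5, acc_bits=16)
    print(acc, shift, acc << shift)
```

In this model the accumulator only ever loses precision; scaling back up when the value is overrepresented, which the abstract also mentions, is left out to keep the example short.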


Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4944257