Overflow-Driven Dynamic Precision Scaling Fixed-Point Multiply-Accumulator Unit
Fasolino A.; Liguori R.; Di Benedetto L.; Rubino A.; Licciardo G. D.
2026
Abstract
A novel full-hardware multiply-accumulate (MAC) unit with dynamic precision scaling (DPS), designed specifically for embedded applications, is introduced. The MAC autonomously detects and mitigates on-chip overflow and overrepresentation, eliminating the need for any external software controller. A compact run-time monitoring unit (RMU) within each MAC monitors the transitions of the carry-out and sign bits and adjusts the operand representation at the bit level so that any rounding error remains bounded by $2^{-(N_{in}-1)}$. Bit-sliced input partitioning enables run-time reconfigurability of operand width and accumulation depth without altering the logic topology. Prototyped on a Xilinx Artix-7 FPGA, the proposed unit achieves up to 14% lower dynamic power and 15% higher maximum clock frequency than a conventional fixed-width MAC with the same precision; in SkyWater 130 nm CMOS technology, it occupies $3.9 \times 10^{3}$ µm², reaches a critical-path delay of 2.68 ns, and consumes 6.07 µW/MHz.
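The general idea of overflow-driven precision scaling can be illustrated with a small software model. The sketch below is not the authors' RMU design: the function name `dps_mac`, the `acc_bits` parameter, and the one-bit-at-a-time rescaling policy are all assumptions made for illustration, and a plain range comparison stands in for the carry-out and sign-bit monitoring done in hardware. It makes no attempt to reproduce the rounding-error bound stated in the abstract.

```python
def dps_mac(xs, ws, acc_bits=16):
    """Behavioural model of a MAC that rescales itself on overflow.

    Hypothetical illustration only. Returns (acc, shift), where the
    accumulated sum of x*w is approximated by acc * 2**shift.
    """
    lo = -(1 << (acc_bits - 1))          # most negative register value
    hi = (1 << (acc_bits - 1)) - 1       # most positive register value
    acc, shift = 0, 0
    for x, w in zip(xs, ws):
        # Align the new product to the accumulator's current scale.
        term = (x * w) >> shift
        total = acc + term
        # Overflow check: stand-in for carry-out / sign-bit monitoring.
        while total < lo or total > hi:
            total >>= 1                  # drop one least-significant bit
            shift += 1                   # and record the coarser scale
        acc = total
    return acc, shift


if __name__ == "__main__":
    acc, shift = dps_mac([100] * 10, [100] * 10, acc_bits=16)
    print(acc, shift, acc << shift)
```

In this model the accumulator never leaves its signed `acc_bits` range; precision is traded away one bit at a time only when an overflow would otherwise occur, which is the behaviour the abstract attributes to the RMU at a much finer, bit-sliced level.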


