Full Precision Hardware Implementation of the AlphaEvolve 4×4 Complex Valued Matrix Multiplication Algorithm
Napoli, Ettore
2026
Abstract
Matrix multiplication is a fundamental operation across many domains. For 4×4 matrices, the naive algorithm requires 64 multiplications, while Strassen's method reduces this to 49. Recently the AlphaEvolve framework, an AI-driven approach for discovering optimized algorithms, introduced a groundbreaking 4×4 complex-valued multiplication method requiring 48 multiplications, representing the current minimum. This work presents the first full-precision, VLSI-optimized hardware implementation of the AlphaEvolve algorithm for 4×4 complex matrix multiplication. The design achieves exact results without loss of numerical accuracy and is benchmarked against a naive and a Strassen-based hardware implementation while varying the input bit width. The proposed circuits, synthesized in a 14 nm FinFET standard cell library, demonstrate that, for operand widths up to 24 bits, the AlphaEvolve implementation generally incurs higher power, delay, and area costs than the naive implementation. When a precision of 32 bits or higher is required, the proposed implementation of the AlphaEvolve algorithm achieves an increasing area reduction with respect to the naive implementation, reaching 31% in the 64-bit case. Compared with the Strassen-based implementation, the AlphaEvolve circuit achieves an area reduction for precisions of 48 bits or higher, with a 9% reduction at 64 bits.
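The multiplication counts quoted for the two baseline algorithms can be checked with a short software sketch. The Python code below is only an illustration and is not the hardware design described in this work: it multiplies two 4×4 matrices of small complex integers with the naive triple loop and with Strassen's 2×2 scheme applied recursively, and counts the products performed. All function names and the `counter` argument are assumptions introduced here. Each counted product is one complex multiplication; how such a product maps onto real multipliers in a circuit is outside the scope of the sketch, and the 48-multiplication AlphaEvolve scheme is not reproduced.

```python
import random

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def split(M):
    # Split a square matrix into its four quadrants.
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def join(C11, C12, C21, C22):
    # Reassemble four quadrants into one matrix.
    top = [a + b for a, b in zip(C11, C12)]
    bottom = [a + b for a, b in zip(C21, C22)]
    return top + bottom

def naive(A, B, counter):
    # Triple loop: one product per (i, j, k) triple.
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                counter[0] += 1
    return C

def strassen(A, B, counter):
    # Strassen's seven-product scheme, applied recursively down to 1x1 blocks.
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), counter)
    M2 = strassen(add(A21, A22), B11, counter)
    M3 = strassen(A11, sub(B12, B22), counter)
    M4 = strassen(A22, sub(B21, B11), counter)
    M5 = strassen(add(A11, A12), B22, counter)
    M6 = strassen(sub(A21, A11), add(B11, B12), counter)
    M7 = strassen(sub(A12, A22), add(B21, B22), counter)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return join(C11, C12, C21, C22)

def random_matrix(rng, n=4, bound=8):
    # Small integer real and imaginary parts keep float arithmetic exact.
    return [[complex(rng.randint(-bound, bound), rng.randint(-bound, bound))
             for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
A, B = random_matrix(rng), random_matrix(rng)
naive_count, strassen_count = [0], [0]
C_naive = naive(A, B, naive_count)
C_strassen = strassen(A, B, strassen_count)
print(naive_count[0], strassen_count[0], C_naive == C_strassen)
```

The counts follow from the structure of the algorithms: the naive loop performs 4³ = 64 products, while two levels of Strassen recursion perform 7² = 49, at the price of additional additions and subtractions on the operands and partial results.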


