Full Precision Hardware Implementation of the AlphaEvolve 4×4 Complex Valued Matrix Multiplication Algorithm

Napoli, Ettore
2026

Abstract

Matrix multiplication is a fundamental operation across many domains. For 4×4 matrices, the naive algorithm requires 64 multiplications, while Strassen's method reduces this to 49. Recently, the AlphaEvolve framework, an AI-driven approach for discovering optimized algorithms, introduced a groundbreaking 4×4 complex-valued multiplication method requiring 48 multiplications, the current minimum. This work presents the first full-precision, VLSI-optimized hardware implementation of the AlphaEvolve algorithm for 4×4 complex matrix multiplication. The design achieves exact results without loss of numerical accuracy and is benchmarked against naive and Strassen-based hardware implementations while varying the input bit width. The proposed circuits, synthesized in a 14 nm FinFET standard cell library, demonstrate that, for operand widths up to 24 bits, the AlphaEvolve implementation generally incurs higher power, delay, and area costs than the naive implementation. When 32 bits or higher precision is required, the proposed implementation of the AlphaEvolve algorithm achieves an increasing area reduction with respect to the naive implementation, reaching 31% in the 64-bit case. Compared with the Strassen-based implementation, the AlphaEvolve circuit achieves an area reduction for 48 bits or higher precision, with a 9% reduction at 64 bits.
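The multiplication counts quoted for the two baseline algorithms can be checked with a short software sketch. The Python code below is only an illustration and is not taken from the paper: the function names (`naive_mul`, `strassen_mul`, `random_matrix`) are hypothetical, and it uses small integer-valued complex entries so that floating-point arithmetic stays exact. It counts scalar complex multiplications for the naive method and for Strassen's method applied recursively to 2×2 blocks. It does not reproduce the 48-multiplication AlphaEvolve scheme or any aspect of the hardware design.

```python
import random

def naive_mul(A, B, count):
    """Schoolbook product; count[0] accumulates scalar multiplications."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
                count[0] += 1
    return C

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M):
    """Split a square matrix of even size into four equal blocks."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen_mul(A, B, count):
    """Recursive Strassen product for sizes that are powers of two."""
    if len(A) == 1:
        count[0] += 1
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven block products instead of eight.
    M1 = strassen_mul(add(A11, A22), add(B11, B22), count)
    M2 = strassen_mul(add(A21, A22), B11, count)
    M3 = strassen_mul(A11, sub(B12, B22), count)
    M4 = strassen_mul(A22, sub(B21, B11), count)
    M5 = strassen_mul(add(A11, A12), B22, count)
    M6 = strassen_mul(sub(A21, A11), add(B11, B12), count)
    M7 = strassen_mul(sub(A12, A22), add(B21, B22), count)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom

def random_matrix(rng, n=4):
    """n x n matrix with small integer real and imaginary parts."""
    return [[complex(rng.randint(-9, 9), rng.randint(-9, 9))
             for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
A, B = random_matrix(rng), random_matrix(rng)
naive_count, strassen_count = [0], [0]
assert naive_mul(A, B, naive_count) == strassen_mul(A, B, strassen_count)
print(naive_count[0], strassen_count[0])  # 64 49
```

The count of 49 arises because each of the 7 block products on 2×2 blocks itself costs 7 scalar multiplications, whereas the naive method performs 4 × 4 × 4 = 64.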

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4950855