Matching centralized learning performance via compressed decentralized learning with error feedback
Carpentiero M.; Matta V.
2024
Abstract
The DEF-ATC (Differential Error Feedback - Adapt Then Combine) approach is a novel strategy for decentralized learning and optimization problems under communication constraints. The strategy blends differential quantization and error feedback to mitigate the negative impact of exchanging compressed updates between neighboring agents. While differential quantization leverages correlations between subsequent iterates, error feedback (which reinjects the compression error into subsequent steps) compensates for the bias caused by compression. In this work, we examine the steady-state mean-square-error performance of the DEF-ATC approach in order to uncover the influence of several factors on the network performance, including the gradient noise, the network topology, the learning step-size, and the compression schemes. The theoretical findings indicate that, under some general conditions on the compression error, and in the small step-size regime, it is possible to achieve performance levels comparable to those obtained without compression. This implies that, despite using compression techniques to reduce communication overheads, the performance of the decentralized compressed approach can still match that of its uncompressed counterpart, which in turn can match that of centralized learning, where all data is aggregated and processed in a single location.
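To make the two ingredients named in the abstract concrete, the sketch below illustrates, in a toy Python form, how differential quantization and error feedback can be combined in a single communication step. This is a hedged illustration under assumed choices (a simple uniform quantizer, a scalar step size of 0.5, and the function names `quantize` and `def_compress` are invented here), not the paper's exact DEF-ATC recursion.

```python
import numpy as np

def quantize(x, step=0.5):
    # Uniform deterministic quantizer: a stand-in for any compression operator.
    return step * np.round(x / step)

def def_compress(w_new, w_prev_hat, error):
    """One toy communication step blending differential quantization and
    error feedback (illustrative sketch, not the authors' DEF-ATC algorithm).

    w_new      : current local iterate the agent wants to share
    w_prev_hat : the neighbors' reconstruction of the previous iterate
    error      : accumulated compression error fed back from earlier steps
    """
    # Differential quantization: compress the innovation (the difference),
    # which is small when subsequent iterates are correlated.
    # Error feedback: reinject the past compression error before quantizing.
    diff = w_new - w_prev_hat + error
    q = quantize(diff)            # this compressed quantity is what gets transmitted
    new_error = diff - q          # residual carried over to the next step
    w_hat = w_prev_hat + q        # receiver-side reconstruction of the iterate
    return w_hat, new_error
```

Because the quantization residual is bounded and fed back at every step, the reconstruction error stays bounded rather than accumulating as a bias, which is the intuition behind matching the uncompressed performance in the small step-size regime.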


