UniSa - IRIS Institutional Research Information System

An Analysis of Performance Variability on Dragonfly+ Topology

Salimibeni, Majid; Cosenza, Biagio

2022

Abstract

Large-scale compute clusters are highly affected by performance variability that originates from different sources. Among these sources, the network plays an essential role as a resource shared between users and their jobs in a supercomputer. In this paper, we analyze the effect of several network-related sources on the performance variability of a modern compute cluster equipped with a Dragonfly+ interconnect. Specifically, we focus on the impact of job placement, communication patterns, routing strategy, and network background traffic on the performance variability of communication-intensive workloads. To quantify the effect of network congestion (background traffic) on performance variability, we propose a heuristic that can successfully estimate the amount of communication on the network produced by other jobs running on the cluster simultaneously. Then, we show how this network congestion contributes to the performance variability of different communication patterns and real-world communication-intensive applications.

Year: 2022

ISBN: 978-1-6654-9856-2

Appears in types: 4.1.1 Proceedings with DOI

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4835651

Warning: the data displayed have not been validated by the university.
