Providing Fault Tolerance in Interconnection Networks for PC Clusters

Efficient Mechanisms

LAP Lambert Academic Publishing (2010-10-21)

€ 79,00

Currently, clusters of PCs are considered a cost-effective alternative to large parallel computers. In these systems, thousands of components are connected through high-performance interconnection networks. Among the high-performance network technologies available for building clusters, InfiniBand (IBA) has emerged as a new standard interconnect. Indeed, it has been adopted by many of the most powerful systems currently built (TOP500 list). As the number of nodes in these systems increases, the interconnection network grows accordingly. With more components, the probability of faults increases dramatically, so fault tolerance becomes a necessity in the system in general and in the interconnection network in particular. Unfortunately, most of the fault-tolerant routing strategies proposed for massively parallel computers cannot be applied, because routing and virtual channel transitions are deterministic in IBA, which prevents packets from avoiding faults. This book focuses on methodologies for providing adequate levels of fault tolerance in PC clusters, specifically tailored to IBA networks.

Book Details:

ISBN-13: 978-3-8383-1890-5

ISBN-10: 3838318900

EAN: 9783838318905

Book language: English

By (author): José Miguel Montañana Aliaga

Number of pages: 208

Published on: 2010-10-21

Category: Data communication, networks