Fault tolerance in P2P-grid environments

Huan Wang*, Nakazato Hidenori

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

P2P-Grid system provides a framework for converging Grid and peer-to-peer network to deploy large-scale distributed applications. However, working nodes with heterogeneous properties can freely join and leave in the middle of their computation. The nodes dynamic participation arbitrarily at any time according to user's decision can keep changing the topology of the network and also causing more common execution failures than in other systems. To this end, failure detection mechanisms and fault tolerance function typically as an integral part of P2P-Grid system have been well-studied. Our research aims to address the highly dynamic nature that arises in P2P-Grid systems by understanding nodes life time statistics in previous research. We are proposing a Check pointing-and-Recovery architecture for applications restarting as soon as possible on P2P-Grid systems. And failure-detection mechanism is a necessary prerequisite to fault tolerance and fault recovery in P2P-Grid system. We also investigate how the design of various failure detection algorithms affects their performance in node average failure detection time. The evaluation shows our check pointing and restart paradigm and failure detection algorithm enables high reliability and performance with high node departure.

Original languageEnglish
Title of host publicationProceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Pages2482-2485
Number of pages4
DOIs
Publication statusPublished - 2012
Event2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012 - Shanghai, China
Duration: 2012 May 212012 May 25

Publication series

NameProceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012

Conference

Conference2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012
Country/TerritoryChina
CityShanghai
Period12/5/2112/5/25

Keywords

  • P2P-Grid
  • failure detection
  • failure recovery
  • fault tolerance

ASJC Scopus subject areas

  • Software

Fingerprint

Dive into the research topics of 'Fault tolerance in P2P-grid environments'. Together they form a unique fingerprint.

Cite this