TY - JOUR

T1 - Construction of fault‐tolerant mesh‐connected highly parallel computer and its performance analysis

AU - Takanami, Itsuo

AU - Inoue, Katsushi

AU - Watanabe, Takahiro

AU - Oka, Minoru

PY - 1993

Y1 - 1993

N2 - A reconfiguration scheme is proposed in which a mesh‐connected highly parallel computer is divided into groups of PEs with small mesh structures, a spare row is added to each group (in what follows, such a group with a spare row is called a plane), these planes are connected successively upward and downward, and finally the top and bottom groups are connected. The scheme has two features: (1) although switching for reconfiguration is done locally, compensation is done globally, taking into account the distribution of faults over all the planes; and (2) the switching algorithm and circuits are simple, so the scheme is suitable for dynamic reconfiguration. First, a method for repairing faults is described, and the necessary and sufficient condition for repairability is given. Next, formulas for the reliabilities of the systems are given. Using these formulas, an example of computing the degree of improvement in MTTF is presented, and the result is compared with those in the literature. The probabilities of system survival as a function of the number of faulty PEs are also analyzed, and the results are compared with those in the literature. Finally, logic circuits for the reconfiguration are shown, and the correctness of their behavior is proved.

AB - A reconfiguration scheme is proposed in which a mesh‐connected highly parallel computer is divided into groups of PEs with small mesh structures, a spare row is added to each group (in what follows, such a group with a spare row is called a plane), these planes are connected successively upward and downward, and finally the top and bottom groups are connected. The scheme has two features: (1) although switching for reconfiguration is done locally, compensation is done globally, taking into account the distribution of faults over all the planes; and (2) the switching algorithm and circuits are simple, so the scheme is suitable for dynamic reconfiguration. First, a method for repairing faults is described, and the necessary and sufficient condition for repairability is given. Next, formulas for the reliabilities of the systems are given. Using these formulas, an example of computing the degree of improvement in MTTF is presented, and the result is compared with those in the literature. The probabilities of system survival as a function of the number of faulty PEs are also analyzed, and the results are compared with those in the literature. Finally, logic circuits for the reconfiguration are shown, and the correctness of their behavior is proved.

KW - Fault tolerance

KW - dynamic reconfiguration

KW - mesh‐connected

KW - parallel computer

KW - ring structure

UR - http://www.scopus.com/inward/record.url?scp=0027886160&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027886160&partnerID=8YFLogxK

U2 - 10.1002/scj.4690240802

DO - 10.1002/scj.4690240802

M3 - Article

AN - SCOPUS:0027886160

SN - 0882-1666

VL - 24

SP - 11

EP - 24

JO - Systems and Computers in Japan

JF - Systems and Computers in Japan

IS - 8

ER -