Affiliations: Department of Computer Science & Engineering, Asia
Pacific Institute of Information Technology, Panipat (Haryana), India. Tel.:
+91 0180 2620043; E-mail: [email protected]
Abstract: Mobile distributed systems raise new issues such as mobility, low
bandwidth of wireless channels, disconnections, limited battery power and lack
of reliable stable storage on mobile nodes. In minimum-process coordinated
checkpointing, some processes may not checkpoint for several checkpoint
initiations. In the case of a recovery after a fault, such processes may roll
back to a far earlier checkpointed state and thus may cause greater loss of
computation. In all-process coordinated checkpointing, the recovery line is
advanced for all processes but the checkpointing overhead may be exceedingly
high. To optimize both metrics, the checkpointing overhead and the loss of
computation on recovery, we propose a hybrid checkpointing algorithm, wherein
an all-process coordinated checkpoint is taken after the minimum-process
coordinated checkpointing algorithm has executed a fixed number of times.
Thus, mobile nodes with low activity or in doze mode operation need not be
disturbed during minimum-process checkpointing, and the recovery line is
advanced for each process after an all-process checkpoint.
Additionally, we try to minimize the information piggybacked onto each
computation message. For minimum-process checkpointing, we design a blocking
algorithm, where no useless checkpoints are taken and an effort has been made
to optimize the blocking of processes. We propose to delay selected messages
at the receiver end. By doing so, processes are allowed to perform their normal
computation, send messages and partially receive them during their blocking
period. The proposed minimum-process blocking algorithm incurs no useless
checkpoints at the cost of a very short blocking period.
Keywords: Fault tolerance, consistent global state, coordinated checkpointing, mobile systems
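
The hybrid schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the names `HybridScheduler`, `minimum_set`, `k` and `depends_on` are hypothetical, the dependency map is assumed to be given, and the blocking, message-delay and piggybacking mechanisms are not modelled. The sketch only shows how an all-process checkpoint follows every `k` minimum-process checkpoints, and how a minimum-process round might involve only the initiator and the processes it transitively depends on.

```python
from typing import Dict, Iterable, Set, Tuple


def minimum_set(initiator: int, depends_on: Dict[int, Set[int]]) -> Set[int]:
    """Return the initiator plus every process it transitively depends on.

    Assumption: depends_on[p] holds the processes whose messages p has
    received since its last checkpoint.
    """
    seen = {initiator}
    stack = [initiator]
    while stack:
        p = stack.pop()
        for q in depends_on.get(p, set()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


class HybridScheduler:
    """Hypothetical scheduler: one all-process round after k minimum rounds."""

    def __init__(self, processes: Iterable[int], k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.processes = set(processes)
        self.k = k
        self.min_rounds = 0  # minimum-process rounds since the last all-process round

    def next_checkpoint(
        self, initiator: int, depends_on: Dict[int, Set[int]]
    ) -> Tuple[str, Set[int]]:
        """Return the kind of the next checkpoint and the processes involved."""
        if self.min_rounds == self.k:
            # The recovery line is advanced for every process.
            self.min_rounds = 0
            return "all", set(self.processes)
        # Processes outside the dependency set are left undisturbed.
        self.min_rounds += 1
        return "min", minimum_set(initiator, depends_on)


scheduler = HybridScheduler(processes=[0, 1, 2, 3], k=2)
deps = {0: {1}, 1: {2}}  # process 3 is idle, e.g. in doze mode
rounds = [scheduler.next_checkpoint(0, deps) for _ in range(4)]
for kind, members in rounds:
    print(kind, sorted(members))
```

With `k = 2`, the idle process 3 is skipped in the two minimum-process rounds and is checkpointed only in the third, all-process round, after which the cycle restarts.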