Abstract: In this paper, we present the design and implementation of a lightweight fault tolerance framework for Web services. With our framework, a Web service can be rendered fault tolerant by replicating it across several nodes. A consensus-based algorithm is used to ensure total ordering of incoming application requests to the replicated Web service, and to ensure consistent membership view among the replicas. The framework is built by extending an open-source implementation of the WS-ReliableMessaging specification, and all reliable message exchanges in our framework conform to the specification. As such, our framework does not depend on any proprietary messaging and transport protocols, which is consistent with the Web services design principles. Our performance evaluation shows that our implementation is nearly optimal and the framework incurs only moderate runtime overhead.
Keywords: Fault tolerance, Web services, distributed consensus, reliable messaging, replication