Tutorial on Architecting Fault Tolerant Systems

 


 

Main events related to this tutorial

International Workshop on Software Engineering for Resilient Systems (SERENE 2008)


Software Engineering of Fault Tolerant Systems book

Tutorial at ISSRE 2007

Tutorial at WICSA 2007
EFTS 2007 Workshop

EFTS 2006 Workshop





ISSRE 2007 tutorial

The 18th IEEE International Symposium on Software Reliability Engineering
5-9 November 2007, Trollhättan, Sweden
H. Muccini, P. Pelliccione, A. Romanovsky


Tutorial overview: Fault tolerance, one of the four means of attaining dependability, is intended to ensure the delivery of correct service in the presence of active faults. It is implemented by error detection followed by system recovery. Typical solutions address fault tolerance (and specifically exception handling) during the design and implementation phases of the software life cycle (e.g., Java and Windows NT exception handling). More recently, some researchers have advocated explicit exception handling solutions throughout the entire life cycle, and several solutions have been proposed for fault tolerance via exception handling at the software architecture and component levels. This tutorial describes how the two concepts of fault tolerance and software architecture have been integrated so far. It is structured in five parts (Overview on Software Architecture, Overview on Fault Tolerance and Exception Handling, Integrating Fault Tolerance into Software Architecture, Coordinated Atomic Actions, Examples and Case Studies) and is based on a survey of architecting fault tolerant systems in which more than fifteen approaches were analyzed and classified. The tutorial concludes by identifying the issues that remain open and require deeper investigation.

Target Audience and supporting materials (tutorial level: intermediate)

The tutorial is structured to appeal to a wide range of software engineers. It does not require any specific knowledge of fault tolerance or exception handling. To establish a common understanding of software architecture, the talk begins with an informal definition of the term. Attendees will receive the tutorial slides in advance, along with bibliographies and links for further reading.


WICSA 2007 tutorial

Sixth Working IEEE/IFIP Conference on Software Architecture, WICSA 2007
Mumbai, India, January 7, 2007
H. Muccini, P. Pelliccione, A. Romanovsky


Tutorial overview: Fault tolerance, one of the four means of attaining dependability, is intended to ensure the delivery of correct service in the presence of active faults. It is implemented by error detection followed by system recovery. Error detection identifies an erroneous system state. System recovery then transforms a state that contains one or more errors and (possibly) faults into a state without detected errors and faults (fault handling). Exceptions and exception handling provide a general framework for structuring the fault tolerance activities in a system: by focusing on the concept of exceptional or abnormal behaviour (as opposed to normal behaviour), exception handling makes it possible to specify the actions to be undertaken in the presence of abnormal events. Typical solutions address fault tolerance (and specifically exception handling) during the design and implementation phases of the software life cycle (e.g., Java and Windows NT exception handling). More recently, some researchers have advocated explicit exception handling solutions throughout the entire life cycle, and several solutions have been proposed for fault tolerance via exception handling at the software architecture and component levels. This tutorial describes how the two concepts of fault tolerance and software architecture have been integrated so far. It is structured in two parts (Overview on Fault Tolerance and Exception Handling, and Integrating Fault Tolerance into Software Architecture) and is based on a survey of architecting fault tolerant systems in which more than fifteen approaches were analyzed and classified. The tutorial concludes by identifying the issues that remain open and require deeper investigation.
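
The relationship between error detection, recovery and exception handling described in this overview can be illustrated with a small sketch. It is not taken from the tutorial materials: the `Account` component, its invariant, `ErroneousStateError` and `tolerant_withdraw` are hypothetical names chosen for illustration, and rollback to a checkpoint (backward recovery) is only one possible recovery strategy.

```python
import copy


class ErroneousStateError(Exception):
    """Raised when error detection finds an erroneous state (hypothetical name)."""


class Account:
    """Hypothetical component whose state must satisfy balance >= 0."""

    def __init__(self, balance):
        self.state = {"balance": balance}

    def detect_errors(self):
        # Error detection: check the current state against an invariant.
        if self.state["balance"] < 0:
            raise ErroneousStateError(f"negative balance: {self.state['balance']}")

    def withdraw(self, amount):
        # Normal behaviour; it may leave the state erroneous.
        self.state["balance"] -= amount
        self.detect_errors()


def tolerant_withdraw(account, amount):
    """Run a withdrawal; on a detected error, roll back to the checkpoint."""
    checkpoint = copy.deepcopy(account.state)  # saved before the normal activity
    try:
        account.withdraw(amount)
        return True
    except ErroneousStateError:
        # The handler performs recovery: restore a state without detected errors.
        account.state = checkpoint
        return False


account = Account(100)
print(tolerant_withdraw(account, 30), account.state)   # True {'balance': 70}
print(tolerant_withdraw(account, 500), account.state)  # False {'balance': 70}
```

The normal behaviour (`withdraw`) and the abnormal behaviour (the `except` branch) are kept separate, which is the structuring role that the overview attributes to exception handling.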

© Copyright 2007. AFTS Tutorial.