Reliable and Dynamically Reconfigurable Distributed Systems
The purpose of this research is
to create operating system support and a programming
environment for developing and maintaining
long-running parallel and distributed applications
that are continually evolving.
Many distributed and parallel applications,
such as automated manufacturing, computer-aided control systems,
and scientific computation,
execute for a long time and are developed incrementally,
so changes may need to be incorporated
while the application is running.
Supporting changes at runtime requires
efficient dynamic reconfiguration facilities
and an extensible programming environment.
The facilities consist of toolkits and runtime mechanisms
for generating correct reconfiguration plans and
for maintaining consistency during normal execution as well as
during exceptional activities.
The toolkits support generalized techniques
for concurrency control, recovery and reconfiguration that utilize
partial-order application semantics.
The toolkits are implemented on top of
a library that supplies
common routines for behavior analysis, conflict analysis,
and consistency restoration.
The programming environment is based on a
scalable software architecture for specifying and verifying
complex application behavior that can also be easily
analyzed by the toolkits.
The scope of the research includes building toolkits at the system level,
an end-user environment for developing applications that use them,
algorithm design, prototyping, and evaluation of the efficiency
and usefulness of the facilities.
In this system, a hierarchical state machine model
is used as the underlying formalism because of its utility for
analyzing dependencies among interacting operations and
computing plans for maintaining consistency during
failure recovery and reconfiguration.
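
As a rough illustration of why a hierarchical model helps with dependency analysis, the following minimal sketch treats each operation as a transition that reads and writes named parts of a nested state. It is not the project's implementation: the `Operation` class, the slash-separated state paths, and the rule that an enclosing state overlaps its substates are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """A transition in a hierarchical state machine (hypothetical model)."""
    name: str
    reads: frozenset   # state paths the operation observes, e.g. "cell/robot"
    writes: frozenset  # state paths the operation modifies


def overlaps(path_a, path_b):
    """True if the paths are equal or one state encloses the other."""
    a, b = path_a.split("/"), path_b.split("/")
    shorter = min(len(a), len(b))
    return a[:shorter] == b[:shorter]


def depends_on(later, earlier):
    """True if `later` touches state `earlier` wrote, or overwrites state it read."""
    touched = later.reads | later.writes
    if any(overlaps(w, t) for w in earlier.writes for t in touched):
        return True
    return any(overlaps(r, w) for r in earlier.reads for w in later.writes)


# Hypothetical operations in a manufacturing cell.
move_arm = Operation("move_arm", frozenset(), frozenset({"cell/robot/arm"}))
reset_robot = Operation("reset_robot", frozenset(), frozenset({"cell/robot"}))
log_belt = Operation("log_belt", frozenset({"cell/belt"}), frozenset())

print(depends_on(reset_robot, move_arm))  # True: "cell/robot" encloses the arm
print(depends_on(log_belt, move_arm))     # False: disjoint substates
```

Because the state is nested, an operation on an enclosing state is found to interact with operations on its substates, while operations on sibling substates are recognized as independent.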
This research will enhance our understanding of the fundamental
principles for maintaining consistency in distributed and parallel systems
that are more general than existing techniques, such as transactions
(including semantic-based transactions).
Transaction-based approaches require
failure atomicity and serializability to be preserved,
which restricts the interaction of concurrent tasks.
In contrast, our approach analyzes dependencies automatically and restores
applications to correct intermediate states.
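
The difference from an all-or-nothing abort can be sketched as follows. This is only a simplified illustration under stated assumptions, not the algorithm used in this research: the dependency map, the operation names, and the helper functions `affected_by` and `undo_order` are hypothetical. Given a failed operation, only the operations that transitively depend on it are undone, and independent work is kept.

```python
def affected_by(failed, depends):
    """Return the failed operation plus everything that transitively depends on it.

    `depends` maps each operation name to the set of operations it depends on.
    """
    affected = {failed}
    changed = True
    while changed:
        changed = False
        for op, preds in depends.items():
            if op not in affected and preds & affected:
                affected.add(op)
                changed = True
    return affected


def undo_order(failed, depends, log):
    """Operations to undo, most recent first; `log` is the execution order."""
    affected = affected_by(failed, depends)
    return [op for op in reversed(log) if op in affected]


# Hypothetical execution log and dependencies for a manufacturing task.
log = ["load_part", "drill", "paint", "update_inventory"]
depends = {
    "load_part": set(),
    "drill": {"load_part"},
    "paint": {"drill"},
    "update_inventory": {"load_part"},
}

print(undo_order("drill", depends, log))  # ['paint', 'drill']
```

In this example a failure in `drill` undoes `paint` and `drill` only; `load_part` and `update_inventory` remain in place, leaving the application in an intermediate state rather than rolling everything back.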
Reconfiguration facilities based on this approach
are the underlying mechanisms for
building tools for other purposes,
including (1) interactive parallel program
steering and control,
(2) adaptive transaction processing,
(3) performance tuning through dynamic selection
of efficient implementations,
(4) mobile distributed systems,
(5) fault-tolerant computing, and (6) load balancing.
Publications
- "Advanced Techniques for
Maintaining Reliability of Complex Computer Systems"
by Alvin Lim,
30th Hawaii International Conference on System Sciences, Hawaii,
January 1997.
- "Multilevel Master-Slave
Parallel Programming Models"
by Hsin-Chu Chen, Alvin Lim and Nazir A. Warsi,
Asian Computing Science Conference,
Singapore, December 1996.
- "Abstraction and Composition
Techniques for Reconfiguration of Large-Scale Complex Applications"
by Alvin Lim,
IEEE International Conference on Configurable Distributed Systems,
Annapolis, Maryland, May 1996.
- "Automatic Analytical Tools
for Reliability and Dynamic Adaptation of Complex Distributed Systems"
by Alvin Lim,
IEEE International Conference on Engineering of Complex Computer
Systems, Florida, November 6-10, 1995.
- "A Uniform Software Architecture
for Cooperation, Reliability and Reconfiguration of Autonomous Decentralized
Systems"
by Alvin Lim,
Second International Symposium on Decentralized Systems, IEEE,
Phoenix, Arizona, April 25-27, 1995.
- "A State Machine Approach
to Reliable Distributed Systems"
by Alvin Lim and Stuart A. Friedberg,
11th IEEE Symposium on Reliable Distributed
Systems, Houston, October 1992.
Acknowledgements
- National Science Foundation (NSF) CAREER Award.
PI on Operating System Support
and Programming Environment for Evolutionary Parallel and Distributed Applications,
May 1995 -- April 1998.
- ACEIS.
Co-PI of award from the Army Research Laboratory
(95-98) and Army Research Office (92-95) for
the Army Center of Excellence in Information Sciences, July 1992 -- June
1998, with Nazir Warsi (PI), et al.