Xiao Qin > Research

Refad: Real-time and Fault-tolerant Distributed Systems

Research is supported in part by NSF under Grant EPS-0091900 and the University of Nebraska-Lincoln under Grant 26-0511-0019

In real-time and distributed systems, tasks must be guaranteed a priori to meet its timing constraint. Mission critical real-time tasks must tolerate faults. In other words, mission critical tasks must be completed before their deadlines even the real-time system encounter temporary and permanent faults. Scheduling multiple copies of a task on different processors is a promising approach to support fault-tolerance. If the primary copy of a task cannot be completed due to a fault, its backup copy executes and complete task before its deadline. We propose a series of new algorithms for fault-tolerant scheduling on a real-time distributed system. The algorithms guarantee the completion of a scheduled task before its deadline in the presence of a processor's permanent failures.

Firstly, we attempted to investigate two real-time fault-tolerant scheduling algorithms for independent tasks, which have no precedence constraints. The description of the algorithms and the experiment results can be found in Journal of Software, No5. 2000. The novel idea associate with these two algorithms is that real-time tasks without fault-tolerant requirement are also considered. This is because complex real-time systems consist of both fault-tolerant task and non-fault-tolerant tasks.

Heterogeneous systems involve multiple heterogeneous modules that interact with one another via high-speed network, which also can be heterogeneous. For each real-time task, the execution time on different computing modules are various. A fault-tolerant real-time scheduling is studied to provide a fault-tolerant support in the heterogeneous real-time systems. More information about this algorithm, including its description and simulation results, can be found in the Proceedings of 2000 International Conference on Parallel and Distributed Processing Techniques and Applications.

Above three algorithms are offline scheduling algorithms, which cannot directly be applied in dynamic real-time computing environment. Therefore, a heuristic dynamic scheduling scheme for parallel real-time jobs in a heterogeneous system is presented. The parallel real-time jobs studied in this paper are modelled by directed acyclic graphs (DAG). In the scheduling model, parallel real-time jobs arrive at a heterogeneous system following a Poisson process. The scheduling algorithms developed in this paper take the reliability measure into account, in order to enhance the reliability of the heterogeneous system without any additional hardware cost. In addition, scheduling time and dispatch time are both incorporated into our scheduling scheme so as to make the scheduling result more realistic and precise. Admission control is in place so that a parallel real-time job whose deadline cannot be guaranteed is rejected by the system. The performance of the proposed scheme is evaluated via extensive simulations. The simulation results show that the heuristic algorithm performs significantly better than two other algorithms that do not consider reliability cost. Furthermore, results suggest that shortening the scheduling time results in a higher guarantee ratio. Hence, if parallel scheduling algorithm is devised and employed to shorten the scheduling time, the performance of the heterogeneous system will be further enhanced. The detailed discussion about this algorithm is given in the proceedings of the 30th International Conference on Parallel Processing (ICPP 2001).

Publications

Xiao Qin, Zongfen Han and Liping Pang, "Real-time Scheduling with Fault-tolerance in Heterogeneous Distributed Systems," Chinese Journal of Computers, vol.25, no. 1, 2002. [ Abstract | PDF ] (In Chinese 2.3M)
Xiao Qin, Zongfen Han, Hai Jin, Liping Pang, and Shengli Li. "Real-time Fault-tolerant Scheduling in Heterogeneous Distributed Systems," in Proceedings of Cluster Computing Technologies, Environments, and Applications, the 2000 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'2000), Las Vegas, USA, June 26-29, 2000. Vol. I, pp.421-427. [ Abstract | PDF ]
Xiao Qin, Liping Pang, Zongfen Han, and Shengli Li, "Design and Performance Analysis of a Hybrid Real-time Scheduling Algorithm with Fault-Tolerance," Journal of Software, Vol.11, No.5, May 2000, Beijing, P.R.China.
Xiao Qin, Zongfen Han, Liping Pang and Shengli Li, "A Static Scheduling Algorithm for Real-time and Fault-tolerant tasks in Multiprocessor Systems," in Proceedings of the Fifth International Conference for Young Computer Scientists (ICYCS'99), pp.765, International Academic Publishers, 21-24 August, 1999, Southeast University, Nanjing, P.R.China. [ Abstract | PDF ]
Xiao Qin, Liping Pang, Shengli Li and Zongfen Han, "Efficient Scheduling Algorithm with Fault-tolerance for Real-time Tasks in Distributed Systems," in Proceedings of the Fifth International Conference for Young Computer Scientists (ICYCS'99), pp.721-725, International Academic Publishers, 21-24 August, 1999, Southeast University, Nanjing, P.R.China. [ Abstract | PDF ]
M.S. Thesis,"Study of Real-time Scheduling Model and Real-time Scheduling Algorithms with Fault-Tolerance", (In Chinese). Huazhong University of Science and Technology, China, April 1999. Advisor: Professor Liping Pang

Updated on 4/27/2007

Xiao Qin's Research