Dynamic Load Balancing for I/O-Intensive Applications on Clusters
Over the last ten years, clusters have become the fastest-growing platforms in high-performance computing. Because the performance gap between processors and disks in modern cluster systems is widening rapidly, storage tends to become a performance bottleneck, causing pronounced performance degradation for I/O-intensive applications such as long-running simulations, remote-sensing database systems, and biological sequence analysis. To alleviate the I/O bottleneck, our research investigates load-balancing policies that achieve high utilization of disks in addition to CPU and memory resources.
A New I/O-aware Load-balancing Scheme. In the first phase of this study, a new I/O-aware load-balancing scheme was developed to noticeably improve the overall performance of clusters under a broad range of workload conditions [7]. The scheme relies on a technique that allows a job's I/O operations to be carried out by a node other than the one to which the job's computation is assigned, thereby permitting the job to access data remotely. The scheme dynamically detects I/O load imbalance among the nodes of a system and decides whether to migrate the I/O requests of some jobs from overloaded nodes to less-loaded ones, weighing the data migration cost against the remote I/O access overhead. Besides balancing I/O load, the scheme judiciously takes both CPU and memory load sharing into account, thereby maintaining the same level of performance as existing schemes when the I/O load is low or well balanced. The proposed scheme significantly reduces the mean slowdown compared with existing schemes that consider only CPU and memory, and it also outperforms, in mean slowdown, existing approaches that consider only I/O. More importantly, our study shows that the new scheme improves on a very recent algorithm in the literature that considers all three of these resources.
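The core decision can be outlined in a few lines. The sketch below is a minimal illustration, not the implementation from [7]: the node attributes, the benefit estimate, and the threshold are hypothetical stand-ins for the scheme's actual cost model.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_load: float   # e.g., run-queue length
    mem_load: float   # e.g., fraction of memory in use
    io_load: float    # e.g., outstanding I/O requests per second

def pick_io_node(nodes, job_io_demand, migration_cost, remote_overhead,
                 imbalance_threshold=1.5):
    """Return the node that should serve a job's I/O requests.

    The job's I/O stays on the most I/O-loaded ("local") node unless the
    load is imbalanced and the expected benefit of serving the I/O on the
    least-loaded node outweighs the migration and remote-access costs.
    """
    local = max(nodes, key=lambda n: n.io_load)
    candidate = min(nodes, key=lambda n: n.io_load)

    # No migration when the I/O load is already reasonably balanced.
    if local.io_load < imbalance_threshold * candidate.io_load:
        return local

    # Crude benefit estimate: queueing relief scaled by the job's demand.
    expected_benefit = job_io_demand * (local.io_load - candidate.io_load)
    if expected_benefit > migration_cost + remote_overhead:
        return candidate
    return local

# Example: node "a" is I/O-overloaded, so the job's I/O moves to "b".
nodes = [Node("a", 0.5, 0.4, 9.0), Node("b", 0.6, 0.3, 2.0)]
print(pick_io_node(nodes, job_io_demand=1.0,
                   migration_cost=0.5, remote_overhead=1.0).name)  # -> "b"
```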
Leverage Preemptive Job Migration to Improve System Performance. A second load-balancing scheme addresses the question of whether preemptive job migration can improve the performance of a cluster over non-preemptive schemes [6]. In this approach, a running job is eligible for migration only if the migration is expected to improve overall performance. Unlike existing I/O-aware load-balancing schemes, this scheme considers both explicit I/O invoked by application programs and implicit I/O induced by page faults. Results from a trace-driven simulation show that, compared with existing schemes that consider I/O but do not use preemptive job migration, the proposed approach improves the mean slowdown by up to a factor of 10.
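The migration test can be sketched as follows, assuming a simple cost model in which a job's I/O time on a node grows with both its explicit I/O rate and the page-fault I/O caused by any memory shortage. All names and the cost model itself are illustrative assumptions, not the exact formulation in [6].

```python
from dataclasses import dataclass

@dataclass
class RunningJob:
    remaining_time: float     # estimated seconds of work left
    explicit_io_rate: float   # I/O requests per second issued by the program
    mem_demand: float         # MB of memory the job needs

@dataclass
class NodeState:
    io_service_time: float    # seconds per I/O request on this node's disk
    free_mem: float           # MB of memory currently available
    fault_rate_per_mb: float  # implicit I/O requests per second per MB short

def implicit_io_rate(job: RunningJob, node: NodeState) -> float:
    """Page-fault ("implicit") I/O grows with the job's memory shortage."""
    shortage = max(0.0, job.mem_demand - node.free_mem)
    return shortage * node.fault_rate_per_mb

def expected_io_time(job: RunningJob, node: NodeState) -> float:
    """Time the job is expected to spend on explicit plus implicit I/O."""
    total_rate = job.explicit_io_rate + implicit_io_rate(job, node)
    return job.remaining_time * total_rate * node.io_service_time

def should_migrate(job: RunningJob, src: NodeState, dst: NodeState,
                   migration_cost: float) -> bool:
    # Preempt and move the running job only when the expected saving in
    # I/O time exceeds the one-time cost of moving it.
    return expected_io_time(job, dst) + migration_cost < expected_io_time(job, src)
```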
Consider the Heterogeneity of Resources. Although it is reasonable to assume that a new, stand-alone cluster is configured with a set of homogeneous nodes, upgraded or networked clusters are very likely to become heterogeneous in a variety of resources such as CPU, memory, and disk storage. This is because, to improve performance and support more users, new nodes with divergent behaviors and properties may be added to a system, or several smaller clusters with different characteristics may be connected via a high-speed network to form a larger one. Since heterogeneity in disks inevitably imposes significant performance degradation when coupled with an imbalanced load on I/O and memory resources, an approach for hiding the heterogeneity of resources was devised that judiciously balances I/O work across all the nodes of a cluster [4]. Extensive simulations provide empirical evidence that the performance of the proposed policy is more sensitive to changes in CPU and memory heterogeneity than the existing policies are; conversely, the results show that our approach is less sensitive to changes in disk I/O heterogeneity than non-I/O-aware load-balancing policies.
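One way to express this idea is to normalize each node's I/O demand by its disk bandwidth, so that a slow disk appears "fuller" than a fast one under the same request rate. The sketch below, with assumed field names and units, illustrates that normalization; it is not the policy from [4] itself.

```python
from dataclasses import dataclass

@dataclass
class HeteroNode:
    name: str
    disk_bandwidth: float   # MB/s; differs from node to node
    io_demand: float        # MB/s of I/O currently directed at the node

    @property
    def effective_io_load(self) -> float:
        # Normalizing by bandwidth makes a slow disk look "fuller" than
        # a fast one under the same raw request rate.
        return self.io_demand / self.disk_bandwidth

def least_loaded(nodes):
    """Direct new I/O to the node with the lowest normalized disk load."""
    return min(nodes, key=lambda n: n.effective_io_load)

# Example: the faster disk wins even though its raw demand is higher.
fast = HeteroNode("fast", disk_bandwidth=80.0, io_demand=20.0)  # load 0.25
slow = HeteroNode("slow", disk_bandwidth=20.0, io_demand=10.0)  # load 0.50
print(least_loaded([fast, slow]).name)  # -> "fast"
```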
Support for I/O-intensive Parallel Applications. Most recently, the previous work on load balancing has been extended with two simple yet effective I/O-aware load-balancing schemes for parallel jobs running on clusters [5]. Each parallel job consists of a number of tasks, which are assumed to synchronize with one another, and each workstation serves several tasks in a time-sharing fashion so that the tasks dynamically share the cluster resources. Extensive simulations show that the proposed schemes balance the load of a cluster in such a way that the CPU, memory, and I/O resources of each node are simultaneously well utilized under a wide spectrum of workload conditions.
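Because synchronizing tasks proceed at the pace of the slowest one, a natural placement heuristic is to judge each node by its most utilized resource and place each task on the node whose bottleneck is smallest. The sketch below, with hypothetical per-task demands, illustrates that intuition; it does not reproduce the two schemes in [5].

```python
from dataclasses import dataclass

@dataclass
class WorkerNode:
    name: str
    cpu_load: float  # each load is normalized to [0, 1]
    mem_load: float
    io_load: float

def bottleneck(node: WorkerNode) -> float:
    # Synchronizing tasks run at the pace of the slowest one, so a node
    # is judged by its most utilized resource.
    return max(node.cpu_load, node.mem_load, node.io_load)

def place_tasks(num_tasks, nodes, per_task=(0.10, 0.05, 0.15)):
    """Greedily assign each task to the node with the smallest bottleneck."""
    placement = []
    for _ in range(num_tasks):
        target = min(nodes, key=bottleneck)
        placement.append(target.name)
        # Charge the new task's (assumed) demands before placing the next.
        target.cpu_load += per_task[0]
        target.mem_load += per_task[1]
        target.io_load += per_task[2]
    return placement
```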
Improve Buffer Utilization. In addition to the research on load balancing, a feedback control mechanism has been developed to improve the performance of a cluster by adaptively adjusting the I/O buffer size [8]. Results from a trace-driven simulation show that this mechanism is effective in enhancing the performance of a number of existing load-balancing schemes.
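A minimal sketch of such a controller is given below, assuming the buffer hit rate is the measured signal; the target, gain, and size bounds are illustrative values, not parameters taken from [8].

```python
def adjust_buffer_size(current_size_mb, hit_rate, target_hit_rate=0.8,
                       gain=0.5, min_mb=16, max_mb=4096):
    """Proportional feedback controller on the I/O buffer hit rate.

    A hit rate below the target grows the buffer; a rate above it
    shrinks the buffer, releasing memory for computation.
    """
    error = target_hit_rate - hit_rate
    new_size = current_size_mb * (1.0 + gain * error)
    return max(min_mb, min(max_mb, new_size))

# Example: a 0.6 hit rate grows a 256 MB buffer to 281.6 MB.
print(adjust_buffer_size(256, 0.6))
```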
Current Work
Ongoing studies in this work include:
(1) Implementing the proposed I/O-aware load-balancing policies on Prairiefire, a computing cluster consisting of 128 nodes connected by Myrinet.
(2) Conducting rigorous testing experiments in which the performance of a cluster with more than 1,000 workstations will be evaluated.
(3) Developing a parallel simulator to efficiently investigate the performance of the I/O-aware load-balancing schemes on large-scale clusters.
Publications
Design and Analysis of a Load Balancing Strategy in Data Grids. [Abstract | PDF]
X. Qin, Future Generation Computer Systems: The Int'l Journal of Grid Computing, vol. 23, no. 1, pp. 132-137, Jan. 2007.
Improving the Performance of I/O-Intensive Applications on Clusters of Workstations. [Abstract | PDF]
X. Qin, H. Jiang, Y. Zhu, and D. R. Swanson, Cluster Computing: The Journal of Networks, Software Tools and Applications, vol. 9, no. 3, pp. 297-311, July 2006.
A Feedback Control Mechanism for Balancing I/O- and Memory-Intensive Applications on Clusters. [Abstract | PDF]
X. Qin, H. Jiang, Y. Zhu, and D. R. Swanson, Scalable Computing: Practice and Experience, ISSN 1895-1767, vol. 6, no. 4, pp. 95-107, 2005.
Dynamic Load Balancing for I/O-Intensive Tasks on Heterogeneous Clusters.
X. Qin, H. Jiang, Y. Zhu, and D. R. Swanson, Cluster Computing: The Journal of Networks, Software Tools and Applications, Special Issue on Parallel I/O in Computational Grids and Cluster Computing Systems.
Performance Analysis of an Admission Controller for CPU- and I/O-Intensive Applications in Self-Managing Computer Systems. [Abstract | PDF]
M. Nijim, T. Xie, and X. Qin, ACM Operating Systems Review, vol. 39, no. 4, pp. 37-45, Oct. 2005.
Integrating a Performance Model in Self-Managing Computer Systems under Mixed Workload Conditions.
M. Nijim, T. Xie, and X. Qin, Proc. IEEE International Conference on Information Reuse and Integration, Aug. 2005.
Improving Effective Bandwidth of Networks on Clusters Using Load Balancing for Communication-Intensive Applications. [Abstract | PDF]
X. Qin and H. Jiang, Proc. 24th IEEE International Performance, Computing, and Communications Conference (IPCCC'05), pp. 27-34, April 2005.
Improving Network Performance through Task Duplication for Parallel Applications on Clusters. [Abstract | PDF]
X. Qin, Proc. 24th IEEE International Performance, Computing, and Communications Conference (IPCCC'05), pp. 35-42, April 2005.
Benchmarking the CLI for I/O Intensive Computing. [Abstract | PDF]
X. Qin, T. Xie, A. Nathan, and V. K. Tadepalli, Proc. 19th International Parallel and Distributed Processing Symposium (IPDPS), 6th Int'l Workshop on Parallel and Distributed Scientific and Engineering Computing, April 2005.
Improving the Performance of Communication-Intensive Parallel Applications Executing on Clusters.
X. Qin and H. Jiang, 6th IEEE International Conference on Cluster Computing (Cluster'04, poster session), p. 493, Sept. 2004.
Dynamic Load Balancing for I/O-Intensive Tasks on Heterogeneous Clusters. [Abstract | PDF | PS]
X. Qin, H. Jiang, Y. Zhu, and D. R. Swanson, Proc. 10th International Conference on High Performance Computing (HiPC), pp. 300-309, Dec. 2003.
Towards Load Balancing Support for I/O-Intensive Parallel Jobs in a Cluster of Workstations. [Abstract | PDF]
X. Qin, H. Jiang, Y. Zhu, and D. R. Swanson, Proc. 5th IEEE International Conference on Cluster Computing (Cluster'03), pp. 100-107, Dec. 2003.
Boosting Performance for I/O-Intensive Workload by Preemptive Job Migrations in a Cluster System. [Abstract | PDF | PS]
X. Qin, H. Jiang, Y. Zhu, and D. R. Swanson, Proc. 15th Symp. Computer Architecture and High Performance Computing, pp. 235-245, Nov. 2003.
A Dynamic Load Balancing Scheme for I/O-Intensive Applications in Distributed Systems. [Abstract | PDF | PS]
X. Qin, H. Jiang, Y. Zhu, and D. R. Swanson, Proc. 32nd Int'l Conference on Parallel Processing Workshops (ICPP Workshops), pp. 79-86, Oct. 2003.
Dynamic Load Balancing for I/O- and Memory-Intensive Workload in Clusters Using a Feedback Control Mechanism. [Abstract | PDF]
X. Qin, H. Jiang, Y. Zhu, and D. R. Swanson, Proc. 9th Int'l Euro-Par Conference on Parallel Processing (Euro-Par), pp. 224-229, Aug. 2003.
Dynamic Distributed Load Balancing Support for I/O-Intensive Parallel Applications on Clusters.
H. Jiang, D. Swanson, X. Qin, and Y. Zhu, Poster Session at CSE Research Facility Open House, University of Nebraska-Lincoln, September 5, 2003.
Boosting Performance for I/O-Intensive Workload by Preemptive Job Migrations in a Cluster System.
X. Qin and H. Jiang, Technical Report TR02-10-04, Department of Computer Sci. and Eng., Univ. of Nebraska-Lincoln, Oct. 2002.
A Dynamic Load Balancing Scheme for I/O-Intensive Applications in Distributed Systems.
X. Qin and H. Jiang, Technical Report TR02-10-02, Department of Computer Sci. and Eng., Univ. of Nebraska-Lincoln, Oct. 2002.
Updated on 9/17/2006