Adaptive Parallelism for Computing on Heterogeneous Clusters
Abstract
Until recent years, most parallel machines were made up of closely coupled microprocessor-based computers. With the advent of high-performance workstations and high-speed networking, the aggregate computational power and memory capacity of workstation clusters have become attractive and indispensable resources for parallel computing. Harnessing this power, however, requires practical methods for controlling heterogeneous resources dynamically. This dissertation proposes an integrated framework that comprises two related parts. The first part of the framework is a software structure that enables parallel applications to adapt to workload imbalances at runtime. To realize the adaptation, applications are partitioned into small components called tasks. The tasks are then grouped into grains; each grain is an object that facilitates execution of tasks on a workstation. An application can therefore optimize its performance by reconfiguring the task-to-grain and grain-to-workstation mappings. Based on the software structure, the implementation and evaluation of workload distribution schemes for data-parallel and task-parallel applications are presented. The second part of the framework is a resource management system that allocates resources to parallel applications through competition. The applications respond to allocation decisions by dynamic reconfiguration. The objectives of the system are to maximize the speedup of the parallel applications and, at the same time, to allocate workstations fairly and efficiently to the applications. A prototype implementation is constructed that provides a testbed for studying the dynamics of competition. In addition, a new structure for organizing replicated parallel applications is developed, and an architecture for a multi-user, multi-parallel program environment based on the proposed framework is suggested.
The effectiveness of the concept and the framework is demonstrated by the results of experiments conducted on the testbed. The parallel applications used in the experiments are block-matrix multiplication, cycle searching of a non-linear iterated cryptographic function, and simulators of an ATM network.
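
The two levels of mapping described in the abstract (tasks to grains, grains to workstations) can be pictured with a small sketch. It is an illustration only, not the dissertation's implementation: the names Grain, Mapping, and rebalance, the per-task cost estimates, the relative workstation speeds, and the greedy rebalancing heuristic are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Grain:
    """Hypothetical grain: a container of tasks hosted on one workstation."""
    name: str
    tasks: dict = field(default_factory=dict)  # task id -> estimated cost (assumed)

    def load(self):
        return sum(self.tasks.values())


class Mapping:
    """Holds the task-to-grain and grain-to-workstation mappings."""

    def __init__(self, speeds):
        self.speeds = dict(speeds)  # workstation -> relative speed (assumed metric)
        self.grains = {}            # grain name -> Grain
        self.host = {}              # grain name -> workstation

    def add_grain(self, grain, workstation):
        self.grains[grain.name] = grain
        self.host[grain.name] = workstation

    def workstation_time(self, ws):
        # Estimated time for a workstation to finish all grains it hosts.
        work = sum(g.load() for n, g in self.grains.items() if self.host[n] == ws)
        return work / self.speeds[ws]

    def makespan(self):
        return max(self.workstation_time(ws) for ws in self.speeds)

    def move_task(self, task, src, dst):
        # Reconfigure the task-to-grain mapping.
        self.grains[dst].tasks[task] = self.grains[src].tasks.pop(task)

    def move_grain(self, grain_name, workstation):
        # Reconfigure the grain-to-workstation mapping.
        self.host[grain_name] = workstation


def rebalance(m):
    """Greedy heuristic (an assumption): move the cheapest task from the
    slowest-finishing workstation to the fastest-finishing one while the
    estimated makespan strictly improves."""
    while True:
        before = m.makespan()
        slow = max(m.speeds, key=m.workstation_time)
        fast = min(m.speeds, key=m.workstation_time)
        src = next((n for n in m.grains
                    if m.host[n] == slow and m.grains[n].tasks), None)
        dst = next((n for n in m.grains if m.host[n] == fast), None)
        if src is None or dst is None or src == dst:
            return
        task = min(m.grains[src].tasks, key=m.grains[src].tasks.get)
        m.move_task(task, src, dst)
        if m.makespan() >= before:
            m.move_task(task, dst, src)  # undo the move that did not help
            return


# Example: six unit-cost tasks start on a slow workstation.
m = Mapping({"ws_a": 1.0, "ws_b": 2.0})
m.add_grain(Grain("g0", {f"t{i}": 1.0 for i in range(6)}), "ws_a")
m.add_grain(Grain("g1"), "ws_b")
rebalance(m)
print(m.makespan(), len(m.grains["g0"].tasks), len(m.grains["g1"].tasks))
```

In this sketch, moving a task changes which grain executes it, while moving a grain relocates all of its tasks to another workstation at once; a real system would also have to account for migration cost and measured, rather than estimated, load.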