TY - GEN
T1 - Dynamic scheduling of parallel jobs with QoS demands in multiclusters and grids
AU - He, Ligang
AU - Jarvis, Stephen A.
AU - Spooner, Daniel P.
AU - Chen, Xinuo
AU - Nudd, Graham R.
PY - 2004
Y1 - 2004
AB - This paper addresses the dynamic scheduling of parallel jobs with QoS demands (soft deadlines) in multiclusters and grids. Three metrics (over-deadline, makespan and idle-time) are combined with variable weights to evaluate the scheduling performance. These three metrics are used to measure the extent of the jobs' QoS demand compliance, the resource throughput and the resource utilization. Two levels of performance optimisation are applied in the multicluster. At the multicluster level, a scheduler (which we call MUSCLE) allocates parallel jobs with high packing potential to the same cluster; it also takes the jobs' QoS requirements into account and employs a heuristic to allocate suitable workloads to each cluster to balance the overall system performance. At the single-cluster level, an existing workload manager, called TITAN, utilizes a genetic algorithm to further improve the scheduling performance of the jobs previously allocated by MUSCLE. Extensive experimental studies are conducted to verify the effectiveness of the scheduling mechanism as well as the effect of the prediction accuracy on the scheduling performance. The results show that, compared with traditional distributed workload allocation policies, the comprehensive scheduling performance of parallel jobs is significantly improved across the multicluster, and the presence of prediction errors does not dramatically weaken the performance advantage.
UR - http://www.scopus.com/inward/record.url?scp=19944379474&partnerID=8YFLogxK
U2 - 10.1109/GRID.2004.27
DO - 10.1109/GRID.2004.27
M3 - Conference contribution
AN - SCOPUS:19944379474
SN - 0769522564
T3 - Proceedings - IEEE/ACM International Workshop on Grid Computing
SP - 402
EP - 409
BT - Proceedings - Fifth IEEE/ACM International Workshop on Grid Computing
A2 - Buyya, R.
T2 - Proceedings - Fifth IEEE/ACM International Workshop on Grid Computing
Y2 - 8 November 2004
ER -