Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
5-2017
Abstract
Many cluster management systems (CMSs) have been proposed to let multiple distributed computing systems share a single cluster. However, none of the existing approaches satisfies all of the following criteria for distributed machine learning (ML) workloads: high resource utilization, fair resource allocation, and low sharing overhead. To solve this problem, we propose a new CMS named Dorm, which incorporates a dynamically-partitioned cluster management mechanism and a utilization-fairness optimizer. Specifically, Dorm uses container-based virtualization to partition a cluster, runs one application per partition, and can dynamically resize each partition at application runtime for resource efficiency and fairness. Each application launches its tasks directly on its assigned partition without repeatedly requesting resources, so Dorm imposes a flat sharing overhead. Extensive performance evaluations showed that, compared to existing approaches, Dorm could simultaneously increase resource utilization by a factor of up to 2.32, reduce fairness loss by a factor of up to 1.52, and speed up popular distributed ML applications by a factor of up to 2.72. Dorm's sharing overhead is less than 5% in most cases.
Index Terms: Cluster Resource Management, Distributed Machine Learning, Fairness
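The abstract says Dorm resizes each application's partition for both utilization and fairness, but it does not give the optimizer's formulation. Purely as an illustration of the idea, the sketch below substitutes a standard max-min fair (progressive-filling) split of identical container slots among applications. The function name `max_min_fair_partition`, the single-resource model, and the integer slot demands are all assumptions; this is not Dorm's utilization-fairness optimizer.

```python
def max_min_fair_partition(capacity, demands):
    """Split `capacity` identical container slots among applications.

    Hypothetical stand-in for a fairness-aware partitioner, not Dorm's
    optimizer. Uses progressive filling: repeatedly give one slot to the
    unsatisfied application that currently holds the fewest slots.
    """
    alloc = [0] * len(demands)
    remaining = capacity
    while remaining > 0:
        # Applications whose demand is not yet met.
        candidates = [i for i, d in enumerate(demands) if alloc[i] < d]
        if not candidates:
            break  # every demand is met; leftover slots stay idle
        # Smallest current allocation first; ties go to the lower index.
        i = min(candidates, key=lambda j: (alloc[j], j))
        alloc[i] += 1
        remaining -= 1
    return alloc


if __name__ == "__main__":
    # Three running applications share 10 slots.
    print(max_min_fair_partition(10, [2, 8, 8]))     # [2, 4, 4]
    # A fourth application arrives; recomputing shrinks existing partitions.
    print(max_min_fair_partition(10, [2, 8, 8, 3]))  # [2, 3, 3, 2]
```

Rerunning the split whenever an application arrives or departs mimics, in a very simplified way, resizing partitions at runtime: each application keeps launching tasks inside its own partition and only the partition boundaries change. A real optimizer would also weigh utilization against fairness and the cost of resizing, which this sketch ignores.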
Discipline
Artificial Intelligence and Robotics | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of the 2017 IEEE International Conference on Smart Computing (SMARTCOMP), May 29-31
First Page
1
Last Page
6
Identifier
10.1109/SMARTCOMP.2017.7947053
Publisher
IEEE
City or Country
Hong Kong
Citation
SUN, Peng; WEN, Yonggang; TA, Nguyen Binh Duong; and YAN, Shengen.
Towards distributed machine learning in shared clusters: A dynamically-partitioned approach. (2017). Proceedings of the 2017 IEEE International Conference on Smart Computing (SMARTCOMP), May 29-31. 1-6.
Available at: https://ink.library.smu.edu.sg/sis_research/4766
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/SMARTCOMP.2017.7947053