Distributed machine learning on IAAS clouds

Publication Type

Conference Proceeding Article

Publication Date

11-2018

Abstract

Training complex machine learning (ML) models with large datasets requires powerful computing infrastructure, which is costly to acquire and maintain. As a result, ML researchers turn to the cloud for its on-demand and elastic resource provisioning capabilities. Two issues arise from this trend: 1) if not configured properly, training ML models on the cloud can incur significant cost and time, and 2) many ML researchers focus primarily on model and algorithm development and may lack the time or skills to handle system setup, resource selection, and configuration. In this work, we propose and implement FC2: a web service for fast, convenient, and cost-effective distributed ML model training over public cloud resources. Central to the effectiveness of FC2 is its ability to recommend an appropriate resource configuration, in terms of cost and execution time, for a given ML training task. Extensive experiments with real-world deep neural network models and datasets demonstrate the effectiveness of our solution.
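To illustrate the kind of cost/time trade-off such a recommendation service must resolve, the sketch below enumerates candidate cluster configurations, estimates training time from a simple throughput model, and returns the cheapest option that meets a deadline. This is not FC2's actual algorithm (which is described in the paper); the instance names, prices, throughput figures, and scaling-efficiency factor are all hypothetical.

```python
# Minimal sketch of a cost/time-aware configuration recommender.
# All instance types, prices, and throughput numbers below are hypothetical.
from dataclasses import dataclass


@dataclass
class Config:
    instance_type: str      # hypothetical cloud instance name
    num_workers: int        # number of instances in the training cluster
    price_per_hour: float   # assumed USD per instance-hour
    images_per_sec: float   # assumed per-instance training throughput


def estimate(cfg: Config, total_images: int, epochs: int, scaling_eff: float = 0.9):
    """Estimate wall-clock hours and dollar cost for one training job,
    assuming near-linear scaling discounted by a fixed efficiency factor."""
    cluster_throughput = cfg.images_per_sec * cfg.num_workers * scaling_eff
    hours = (total_images * epochs) / cluster_throughput / 3600.0
    cost = hours * cfg.price_per_hour * cfg.num_workers
    return hours, cost


def recommend(configs, total_images, epochs, deadline_hours):
    """Return the cheapest (cost, hours, config) meeting the deadline, or None."""
    feasible = []
    for cfg in configs:
        hours, cost = estimate(cfg, total_images, epochs)
        if hours <= deadline_hours:
            feasible.append((cost, hours, cfg))
    return min(feasible, key=lambda t: t[0], default=None)


if __name__ == "__main__":
    candidates = [
        Config("gpu.small", 1, 0.90, 300.0),
        Config("gpu.small", 4, 0.90, 300.0),
        Config("gpu.large", 2, 3.10, 1100.0),
    ]
    best = recommend(candidates, total_images=1_200_000, epochs=90, deadline_hours=24)
    if best:
        cost, hours, cfg = best
        print(f"{cfg.num_workers} x {cfg.instance_type}: ~{hours:.1f} h, ~${cost:.0f}")
    else:
        print("No configuration meets the deadline.")
```

In practice a recommender of this kind would replace the fixed scaling-efficiency constant with profiled or predicted per-model throughput, which is the harder part of the problem the paper addresses.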

Discipline

Artificial Intelligence and Robotics | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Proceedings of the 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS), Nanjing, China, 2018 November 23-25

Identifier

10.1109/CCIS.2018.8691150

Publisher

IEEE

City or Country

Nanjing, China

Additional URL

https://doi.org/10.1109/CCIS.2018.8691150
