Publication Type

Journal Article

Version

acceptedVersion

Publication Date

7-2024

Abstract

Recent advances in deep learning across mobile and Internet-of-Things applications, coupled with the emergence of edge computing, have driven a strong trend toward performing deep learning inference on edge servers located physically close to end devices. This trend raises the challenge of meeting the quality-of-service requirements of inference tasks at the resource-constrained network edge, especially under variable or even bursty inference workloads. Solutions to this challenge have not yet been reported in the related literature. In this paper, we tackle the challenge by means of workload-adaptive inference request scheduling: in different workload states, adaptive inference request scheduling policies allow models of diverse sizes to play different roles in maintaining high-quality inference services. To implement this idea, we propose a request scheduling framework for general-purpose edge inference serving systems. Theoretically, we prove that, within our framework, the problem of optimizing the inference request scheduling policies can be formulated as a Markov decision process (MDP). To solve this MDP, we use reinforcement learning and propose a policy optimization approach. Through extensive experiments, we empirically demonstrate the effectiveness of our framework in the challenging practical case where the MDP is partially observable.

Keywords

Edge computing, deep learning inference serving systems, efficient deep learning inference, reinforcement learning

Discipline

Artificial Intelligence and Robotics | Numerical Analysis and Scientific Computing

Publication

IEEE Transactions on Mobile Computing

First Page

1

Last Page

18

ISSN

1536-1233

Identifier

10.1109/TMC.2024.3429571

Publisher

Institute of Electrical and Electronics Engineers

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1109/TMC.2024.3429571
