Byzantine-resilient decentralized stochastic gradient descent

Publication Type

Journal Article

Publication Date

6-2022

Abstract

Decentralized learning has gained great popularity as a way to improve learning efficiency and preserve data privacy. Each computing node makes an equal contribution to collaboratively learning a Deep Learning model. Eliminating centralized Parameter Servers (PS) can effectively address many issues, such as privacy, performance bottlenecks, and single points of failure. However, how to achieve Byzantine Fault Tolerance in decentralized learning systems is rarely explored, although this problem has been extensively studied in centralized systems. In this paper, we present an in-depth study of the Byzantine resilience of decentralized learning systems, with two contributions. First, from the adversarial perspective, we show theoretically that Byzantine attacks are more dangerous and more feasible in decentralized learning systems: even one malicious participant can arbitrarily alter the models of other participants by sending carefully crafted updates to its neighbors. Second, from the defense perspective, we propose Ubar, a novel algorithm that enhances decentralized learning with Byzantine Fault Tolerance. Specifically, Ubar provides a Uniform Byzantine-resilient Aggregation Rule that lets benign nodes select the useful parameter updates and filter out the malicious ones in each training iteration. It guarantees that each benign node in a decentralized system can train a correct model under very strong Byzantine attacks with an arbitrary number of faulty nodes. We conduct extensive experiments on standard image classification tasks, and the results indicate that Ubar can effectively defeat both simple and sophisticated Byzantine attacks with higher performance efficiency than existing solutions.
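
The abstract's first claim, that a single malicious participant can steer a neighbor's model, can be illustrated with a minimal sketch. It assumes the benign node aggregates by plain averaging and that the attacker knows the honest updates; both are illustrative assumptions, not details taken from the paper. The function names are hypothetical, and the coordinate-wise median is a generic robust aggregate shown only for contrast. It is not the Ubar rule.

```python
from statistics import median


def average(updates):
    """Coordinate-wise mean of equal-length parameter vectors."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]


def crafted_update(honest_updates, target):
    """Update one attacker sends so that the plain average equals `target`.

    Assumption: the attacker knows (or can estimate) the honest updates.
    """
    n = len(honest_updates) + 1  # honest neighbors plus the attacker
    honest_sum = [sum(col) for col in zip(*honest_updates)]
    return [n * t - s for t, s in zip(target, honest_sum)]


def coordinate_median(updates):
    """Generic robust aggregate for contrast; NOT the Ubar rule."""
    return [median(col) for col in zip(*updates)]


# Three honest parameter vectors and an arbitrary target chosen by the attacker.
honest = [[1.0, 2.0], [1.5, 2.5], [0.5, 1.5]]
target = [-100.0, 100.0]

malicious = crafted_update(honest, target)
received = honest + [malicious]

poisoned = average(received)          # lands exactly on the attacker's target
robust = coordinate_median(received)  # stays near the honest values

print(poisoned)
print(robust)
```

Under plain averaging, the aggregate equals the attacker's target regardless of what the honest nodes sent, which is why a filtering step of some kind is needed before aggregation.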

Keywords

Training, servers, learning systems, distance learning, computer aided instruction, security, fault tolerant systems, decentralized learning, stochastic gradient descent, Byzantine attack, Byzantine fault tolerance

Discipline

Artificial Intelligence and Robotics | Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

IEEE Transactions on Circuits and Systems for Video Technology

Volume

32

Issue

6

First Page

4096

Last Page

4106

ISSN

1051-8215

Identifier

10.1109/TCSVT.2021.3116976

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

http://doi.org/10.1109/TCSVT.2021.3116976
