Publication Type

Conference Proceeding Article

Version

submittedVersion

Publication Date

2-2021

Abstract

Many fundamental machine learning tasks can be formulated as learning with vector-valued functions, where multiple scalar-valued functions are learned together. Although generalization analyses exist for specific algorithms under the empirical risk minimization principle, a unifying analysis of vector-valued learning under a regularization framework is still lacking. In this paper, we initiate the generalization analysis of regularized vector-valued learning algorithms by presenting bounds with a mild dependency on the output dimension and a fast rate on the sample size. Our analysis relaxes existing assumptions, including the restrictive constraint on hypothesis spaces, the smoothness of loss functions, and the low-noise condition. To understand the interaction between optimization and learning, we further use our results to derive the first generalization bounds for stochastic gradient descent with vector-valued functions. We apply our general results to multi-class classification and multi-label classification, yielding the first bounds with a logarithmic dependency on the output dimension for extreme multi-label classification with Frobenius regularization. As a byproduct, we derive a Rademacher complexity bound for loss function classes defined in terms of a general strongly convex function.
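For orientation, the regularized vector-valued learning setup referred to in the abstract can be sketched in a generic form (the notation here is illustrative and not taken verbatim from the paper): given training examples (x_i, y_i), i = 1, ..., n, a vector-valued predictor f_W : X -> R^c with output dimension c, a loss \ell on vector-valued predictions, and a regularization parameter \lambda > 0, one minimizes

\min_{W} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_W(x_i), y_i\big) + \lambda \, \|W\|_F^2,

where \|W\|_F denotes the Frobenius norm of the parameter matrix. Under this kind of formulation, the paper's bounds describe how the generalization gap of such regularized solutions, and of stochastic gradient descent applied to the objective, scales with the output dimension c and the sample size n.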

Keywords

Statistical Learning Theory, Multi-label Learning, Stochastic Gradient Descent

Discipline

Artificial Intelligence and Robotics | Theory and Algorithms

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 35th AAAI Conference on Artificial Intelligence: February 2-9, 2021, Virtual

Volume

12

First Page

10338

Last Page

10346

ISBN

9781577358664

Publisher

AAAI Press

City or Country

Palo Alto, CA

Additional URL

https://ojs.aaai.org/index.php/AAAI/article/view/17238
