Duties for datasets
Publication Type
Book Chapter
Publication Date
12-2023
Abstract
Machine learning (ML) systems are increasingly being deployed in contexts, such as law, medicine and finance, where system errors present serious and foreseeable risks. As the behaviour of ML systems is largely determined by their training inputs, should dataset providers owe duties of care to victims? Using the ImageNet dataset and the Generative Pre-trained Transformer (GPT) models as case studies, this chapter argues that the conventional approach of centralising duties on system providers alone yields insufficient safeguards. Dataset-specific duties should also be considered to incentivise precaution in the preparation of crucial ML input. The chapter analyses how dataset duties may be encompassed in existing tort law, surfacing situations where duties are more appropriate: for instance, where a dataset is intended to be used in a risky context, the dataset provider actively influences system outputs, and the dataset is published without safety restrictions or warnings.
Keywords
Datasets, machine learning, tort law
Discipline
Artificial Intelligence and Robotics | Science and Technology Law
Research Areas
Innovation, Technology and the Law
Publication
Data and Private Law
Editor
Damian Clifford, Lau Kwan Ho & Jeannie Marie Paterson
First Page
207
Last Page
224
ISBN
9781509966059
Identifier
10.5040/9781509966059.ch-013
Publisher
Hart Publishing
City or Country
Oxford
Citation
SOH, Jerrold Tsin Howe. Duties for datasets. (2023). Data and Private Law. 207-224.
Available at: https://ink.library.smu.edu.sg/sol_research/4443
Additional URL
https://doi.org/10.5040/9781509966059.ch-013