Publication Type
Working Paper
Version
acceptedVersion
Publication Date
11-2021
Abstract
Data-centric AI calls for better, not just bigger, datasets. As data protection laws with extra-territorial reach proliferate worldwide, ensuring datasets are legal is an increasingly crucial yet overlooked component of “better”. To help dataset builders become more willing and able to navigate this complex legal space, this paper reviews key legal obligations surrounding ML datasets, examines the practical impact of data laws on ML pipelines, and offers a framework for building legal datasets.
Keywords
Legal datasets, machine learning, data laws, data protection laws
Discipline
Computer Law | Databases and Information Systems | Internet Law
Research Areas
Innovation, Technology and the Law
First Page
1
Last Page
7
Citation
SOH, Jerrold.
Building legal datasets. (2021). 1-7.
Available at: https://ink.library.smu.edu.sg/sol_research/3442
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://arxiv.org/abs/2111.02034
Comments
Accepted at NeurIPS 2021 Data-Centric AI Workshop