Research Collection School Of Economics

Evaluating human versus machine learning performance in classifying research abstracts

Yeow Chong GOH, Nanyang Technological University
Xin Qing CAI, Nanyang Technological University
Walter THESEIRA, Singapore University of Social Sciences
Giovanni KO, Singapore Management University
Khiam Aik KHOR, Nanyang Technological University

Publication Type

Journal Article

Version

publishedVersion

Publication Date

7-2020

Abstract

We study whether humans or machine learning (ML) classification models are better at classifying scientific research abstracts according to a fixed set of discipline groups. We recruit undergraduate and postgraduate assistants for this task in separate stages, and compare their performance against a support vector machine (SVM) algorithm in assigning European Research Council Starting Grant project abstracts to their actual evaluation panels, which are organised by discipline group. On average, ML is more accurate than human classifiers across a variety of training and test datasets and across evaluation panels. ML classifiers trained on different training sets are also more reliable than human classifiers: different ML classifiers assign the same classification to a given abstract more consistently than different human classifiers do. While the top five percent of human classifiers can outperform ML in limited cases, selecting and training such classifiers is likely to be costly and difficult compared to training ML models. Our results suggest that ML models are a cost-effective and highly accurate method for addressing problems in comparative bibliometric analysis, such as harmonising the discipline classifications of research from different funding agencies or countries.
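The two comparisons described in the abstract, accuracy against the actual evaluation panel and consistency between different classifiers, can be illustrated with a minimal sketch. The function names, the panel labels, and the toy data below are hypothetical. Mean pairwise agreement is used here only as one simple stand-in for reliability and is an assumption, not necessarily the measure used in the paper.

```python
from itertools import combinations


def accuracy(predicted, actual):
    """Share of abstracts whose predicted panel equals the actual panel."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_pairwise_agreement(label_sets):
    """Average, over all pairs of classifiers, of the share of abstracts
    that both classifiers in the pair assign to the same panel."""
    pairs = list(combinations(label_sets, 2))
    if not pairs:
        raise ValueError("need at least two classifiers")
    # Agreement between two classifiers is computed like accuracy,
    # with one classifier's labels standing in for the reference.
    return sum(accuracy(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: four abstracts and their actual panels.
actual = ["PE1", "LS2", "SH1", "PE1"]

# Hypothetical predictions from two ML models trained on different sets.
ml_runs = [
    ["PE1", "LS2", "SH1", "PE1"],
    ["PE1", "LS2", "SH1", "LS2"],
]

# Hypothetical labels from two human classifiers.
humans = [
    ["PE1", "SH1", "SH1", "PE1"],
    ["LS2", "LS2", "SH1", "LS2"],
]

if __name__ == "__main__":
    print("ML accuracy:", [accuracy(run, actual) for run in ml_runs])
    print("Human accuracy:", [accuracy(h, actual) for h in humans])
    print("ML agreement:", mean_pairwise_agreement(ml_runs))
    print("Human agreement:", mean_pairwise_agreement(humans))
```

In this toy setup a higher agreement score means that different classifiers more often give the same abstract the same panel, which is the sense of "reliable" used in the abstract. The numbers carry no empirical meaning.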

Keywords

Discipline classification, Text classification, Supervised classification

Discipline

Artificial Intelligence and Robotics | Economics

Research Areas

Applied Microeconomics

Publication

Scientometrics

Volume

125

Issue

2

First Page

1197

Last Page

1212

ISSN

0138-9130

Identifier

10.1007/s11192-020-03614-2

Publisher

Springer

Citation

GOH, Yeow Chong; CAI, Xin Qing; THESEIRA, Walter; KO, Giovanni; and KHOR, Khiam Aik. Evaluating human versus machine learning performance in classifying research abstracts. (2020). Scientometrics. 125, (2), 1197-1212.
Available at: https://ink.library.smu.edu.sg/soe_research/2446

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional URL

https://doi.org/10.1007/s11192-020-03614-2

Included in

Artificial Intelligence and Robotics Commons, Economics Commons
