Research Collection School Of Computing and Information Systems

A BERT-based two-stage model for Chinese Chengyu recommendation

Minghuan TAN, Singapore Management University
Jing JIANG, Singapore Management University
Bingtian DAI, Singapore Management University

Publication Type

Journal Article

Version

acceptedVersion

Publication Date

11-2021

Abstract

In Chinese, Chengyu are fixed phrases consisting of four characters. As a type of idiom, their meanings usually cannot be derived from their component characters. In this paper, we study the task of recommending a Chengyu given a textual context. Observing some limitations of existing work, we propose a two-stage model. In the first stage, we re-train a Chinese BERT model by masking out Chengyu from a large Chinese corpus with wide coverage of Chengyu. In the second stage, we fine-tune the re-trained, Chengyu-oriented BERT on a specific Chengyu recommendation dataset. We evaluate this method on the ChID and CCT datasets and find that it achieves state-of-the-art results on both. Ablation studies show that both stages of training are critical for the performance gain.
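
The first-stage data preparation described in the abstract (masking out Chengyu from a corpus) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_chengyu`, the toy vocabulary, the character-level tokenization, and the choice to replace each Chengyu with a single `[MASK]` placeholder are all assumptions made for illustration.

```python
from __future__ import annotations

MASK = "[MASK]"  # assumed placeholder token, following BERT's convention


def mask_chengyu(text: str, vocabulary: set[str], length: int = 4) -> list[tuple[list[str], str]]:
    """Build one (masked_tokens, answer) pair per Chengyu found in `text`.

    Hypothetical helper: scans every window of `length` characters and,
    when the window is in `vocabulary`, replaces it with a single MASK token.
    """
    chars = list(text)  # character-level tokens (an assumption)
    examples = []
    for start in range(len(chars) - length + 1):
        candidate = text[start:start + length]
        if candidate in vocabulary:
            masked = chars[:start] + [MASK] + chars[start + length:]
            examples.append((masked, candidate))
    return examples


# Toy usage: two Chengyu in one sentence yield two training examples.
vocab = {"画蛇添足", "适得其反"}
for tokens, answer in mask_chengyu("他做事总是画蛇添足，结果适得其反。", vocab):
    print("".join(tokens), "->", answer)
```

Each pair would then serve as a context and target for re-training, and the same format could be reused when fine-tuning on a recommendation dataset in the second stage.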

Keywords

natural language processing, chengyu recommendation, idiom understanding, question answering

Discipline

Databases and Information Systems | East Asian Languages and Societies | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

ACM Transactions on Asian and Low-Resource Language Information Processing

Volume

20

Issue

6

First Page

1

Last Page

18

ISSN

2375-4699

Identifier

10.1145/3453185

Publisher

ACM

Embargo Period

3-11-2021

Citation

TAN, Minghuan; JIANG, Jing; and DAI, Bingtian. A BERT-based two-stage model for Chinese Chengyu recommendation. (2021). ACM Transactions on Asian and Low-Resource Language Information Processing. 20, (6), 1-18.
Available at: https://ink.library.smu.edu.sg/sis_research/5821

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.1145/3453185

Included in

Databases and Information Systems Commons, East Asian Languages and Societies Commons, Numerical Analysis and Scientific Computing Commons
