Publication Type

Journal Article

Version

publishedVersion

Publication Date

9-2024

Abstract

The semantic understanding of numbers requires association with context. However, powerful neural networks can overfit spurious correlations between context and numbers in the training corpus, leading to contextual bias that may impair the network's estimation of number magnitude when making inferences on real-world data. To investigate the resilience of current methodologies against contextual bias, we introduce a novel out-of-distribution (OOD) numerical question-answering (QA) dataset in which the training data features specific correlations between context and numbers that are not present in the OOD test data. We evaluate the robustness of different numerical encoding and decoding methods when confronted with contextual bias on this dataset. Our findings indicate that encoding methods incorporating more detailed digit information exhibit greater resilience against contextual bias. Inspired by this finding, we propose a digit-aware position embedding strategy, and the experimental results demonstrate that this strategy is highly effective in improving the robustness of neural networks against contextual bias.

Keywords

Natural language processing, Question answering, Out of distribution, Contextual bias, Number magnitude estimation

Discipline

Databases and Information Systems

Publication

KSII Transactions on Internet and Information Systems

Volume

18

Issue

9

First Page

2464

Last Page

2482

ISSN

1976-7277

Identifier

10.3837/tiis.2024.09.001

Copyright Owner and License

Publisher

Additional URL

https://doi.org/10.3837/tiis.2024.09.001
