Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

4-2023

Abstract

We propose SubText, a compression mechanism based on vocabulary reduction. The crux is to judiciously select a subset of word embeddings that supports the reconstruction of the remaining word embeddings from their form alone. The proposed algorithm considers the preservation of the original embeddings, as well as a word’s relationship to other words that are morphologically or semantically similar. Comprehensive evaluation of the compressed vocabulary reveals SubText’s advantage over traditional vocabulary reduction techniques on diverse tasks, as validated on English and on a collection of inflected languages.

Keywords

Word embeddings, compression, vocabulary reduction

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing | Theory and Algorithms

Research Areas

Data Science and Engineering

Publication

WI-IAT '22: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology 2022, Niagara Falls, Canada, November 17-20

First Page

56

Last Page

63

ISBN

9781665494021

Identifier

10.1109/WI-IAT55865.2022.00018

Publisher

ACM

City or Country

New York

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1109/WI-IAT55865.2022.00018
