Publication Type

Working Paper

Version

publishedVersion

Publication Date

12-2022

Abstract

We examine whether empirical results using text-based sentiment of U.S. annual reports depend on the underlying context, within documents, from which sentiment is measured. We construct a clause-level measure of context, showing that sentiment is driven by many different contexts and that positive and negative sentiment are driven by different contexts. We then construct context-level sentiment measures and examine whether sentiment works as expected at the context level across four prediction problems. Our results demonstrate that document-level sentiment exhibits significant noise in prediction and suggest that document-level aggregation of sentiment leads to missed empirical nuances. The contexts driving sentiment results vary substantially by outcome, suggesting lower empirical internal validity for document-level sentiment. Using three additional sentiment measures, we document the same inferences, concluding that document-level aggregation likely leads to lower internal validity. Sentiment is thus best applied at the level of specific contexts rather than across whole documents.

Keywords

Sentiment analysis, context, machine learning, aggregation, lasso regression, text analysis

Discipline

Accounting | Finance and Financial Management | Numerical Analysis and Scientific Computing

Research Areas

Corporate Reporting and Disclosure

Copyright Owner and License

Authors
