Publication Type

Journal Article

Version

publishedVersion

Publication Date

10-2012

Abstract

A rapidly growing area in political science has focused on perfecting techniques to treat political text as ‘data’, usually for the purposes of estimating latent traits such as left–right political policy positions. More traditional approaches have applied classical content analysis to categorize sub-units of political text, such as sentences in manifestos. Prominent examples of this latter approach include the thirty-year old Comparative Manifestos Project and the Policy Agendas Project. ‘Text as data’ approaches use machines to convert text to quantitative information and use statistical tools to make inferences about characteristics of the author of the text. Content analysis schemes use humans to read textual sub-units and assign these to pre-defined categories. Both methods require the prior identification of a textual unit of analysis – a highly consequential, yet often unquestioned, feature of research design.

Our objective in this Research Note is to question the dominant approach to unitizing political texts prior to human coding. This is to parse texts into quasi-sentences (QSs), where a QS is defined as part or all of a natural sentence that states a distinct policy proposition. The use of the QS rather than a natural language unit (such as a sentence defined by punctuation) is motivated by the desire to capture all relevant political information, regardless of the stylistic decisions made by the author, for example, to use long or short natural sentences. The identification of QSs by human coders, however, is highly unreliable. If, comparing codings of the same texts using quasi-sentences and natural sentences, there is no appreciable difference in measured political content, then there is a strong case for replacing ‘endogenous’ human unitization with ‘exogenous’ unitization based on natural sentences that can be identified with perfect reliability by machines using pre-specified punctuation delimiters.

Discipline

Models and Methods | Political Science

Research Areas

Political Science

Publication

British Journal of Political Science

Volume

42

First Page

937

Last Page

951

ISSN

0007-1234

Identifier

10.1017/S0007123412000105

Publisher

Cambridge University Press

Copyright Owner and License

Publisher

Additional URL

https://doi.org/10.1017/S0007123412000105
