Publication Type
Journal Article
Version
publishedVersion
Publication Date
10-2012
Abstract
A rapidly growing area in political science has focused on perfecting techniques to treat political text as ‘data’, usually for the purposes of estimating latent traits such as left–right political policy positions. More traditional approaches have applied classical content analysis to categorize sub-units of political text, such as sentences in manifestos. Prominent examples of this latter approach include the thirty-year old Comparative Manifestos Project and the Policy Agendas Project. ‘Text as data’ approaches use machines to convert text to quantitative information and use statistical tools to make inferences about characteristics of the author of the text. Content analysis schemes use humans to read textual sub-units and assign these to pre-defined categories. Both methods require the prior identification of a textual unit of analysis – a highly consequential, yet often unquestioned, feature of research design. Our objective in this Research Note is to question the dominant approach to unitizing political texts prior to human coding. This is to parse texts into quasi-sentences (QSs), where a QS is defined as part or all of a natural sentence that states a distinct policy proposition. The use of the QS rather than a natural language unit (such as a sentence defined by punctuation) is motivated by the desire to capture all relevant political information, regardless of the stylistic decisions made by the author, for example, to use long or short natural sentences. The identification of QSs by human coders, however, is highly unreliable. If, comparing codings of the same texts using quasi-sentences and natural sentences, there is no appreciable difference in measured political content, then there is a strong case for replacing ‘endogenous’ human unitization with ‘exogenous’ unitization based on natural sentences that can be identified with perfect reliability by machines using pre-specified punctuation delimiters.
Discipline
Models and Methods | Political Science
Research Areas
Political Science
Publication
British Journal of Political Science
Volume
42
First Page
937
Last Page
951
ISSN
0007-1234
Identifier
10.1017/S0007123412000105
Publisher
Cambridge University Press
Citation
DAUBLER, Thomas, BENOIT, Kenneth, MIKHAYLOV, Slava, & LAVER, Michael. (2012). Natural sentences as valid units for coded political texts. British Journal of Political Science, 42, 937-951.
Available at: https://ink.library.smu.edu.sg/soss_research/3972
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1017/S0007123412000105