Publication Type
Working Paper
Version
publishedVersion
Publication Date
9-2016
Abstract
It is costly to collect the household- and individual-level data that underlies official estimates of poverty and health. For this reason, developing countries often do not have the budget to update their estimates of poverty and health regularly, even though these estimates are most needed there. One way to reduce the financial burden is to substitute some of the real data with predicted data. An approach referred to as double sampling collects the expensive outcome variable for a sub-sample only while collecting the covariates used for prediction for the full sample. The objective of this study is to determine if this would indeed allow for realizing meaningful reductions in financial costs while preserving statistical precision. The study does this using analytical calculations that allow for considering a wide range of parameter values that are plausible to real applications. The benefits of using double sampling are found to be modest. There are circumstances for which the gains can be more substantial, but the study conjectures that these denote the exceptions rather than the rule. The recommendation is to rely on real data whenever there is a need for new data, and use the prediction estimator to leverage existing data.
Keywords
Prediction, Double sampling, Survey costs, Poverty
Discipline
Income Distribution | Public Economics
Research Areas
Applied Microeconomics
First Page
1
Last Page
43
Publisher
World Bank Policy Research Working Paper 7841
City or Country
Washington, DC
Citation
FUJII, Tomoki and VAN DER WEIDE, Roy.
Is predicted data a viable alternative to real data? (2016). 1-43.
Available at: https://ink.library.smu.edu.sg/soe_research/2296
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Comments
Published in World Bank Economic Review, forthcoming (2019)