Impact Evaluation on a Budget: World Bank Data and R


This entry is the first in a series that works through all of the Stata exercises in the World Bank’s excellent and free Handbook on Impact Evaluation: Quantitative Methods and Practices, written by S. Khandker, G. Koolwal, and H. Samad in 2009. The book can be downloaded for free here. Chapters 11-16 (all of Part 2) contain Stata exercises designed to prepare the reader to conduct impact evaluations. To make this learning process more affordable, this series will take you from installing R to estimating impacts using a fuzzy regression discontinuity design. See Part 1 of the book for the theory and motivation behind the techniques we will use in this series. The data files we will use can be downloaded from here. Go ahead and extract them to a folder you will remember.

Install R

To install R, go to the R Project website (CRAN) and follow the instructions for your OS.

Install RStudio

I highly recommend this environment for working with R. While R can be run completely from the command line, RStudio is much more user friendly and provides an easier transition for users coming from Stata. Go to RStudio and download this free R development environment.


Each of the following sections will follow chapters in the book. I will leave out the exposition and instead focus on the commands. I will present Stata commands first followed by the equivalent expression in R.

File Structure

The book assumes you are using a PC; I’m using a Mac. The authors create several folders, and I recommend creating them as well, except for the do and log folders.



OSX or Linux:

 ~/eval
 ~/eval/data
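If you would rather create these folders from within R than from a file manager, base R's dir.create() can do it. The sketch below is only an illustration and not part of the book's exercises: it builds the layout under a temporary directory so that it runs anywhere, and the names root and data_dir are my own. Point root at "~/eval" to create the real folders.

```r
# Hypothetical root folder; replace with "~/eval" to match this post's layout.
root <- file.path(tempdir(), "eval")
data_dir <- file.path(root, "data")

# recursive = TRUE creates the parent folder (eval) and the child (data) in one call.
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Confirm that the folder now exists.
dir.exists(data_dir)
```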

To avoid confusion, I will present commands using *NIX-style paths. In fact, we could make all of our path statements shorter in R by setting the working directory to the data folder with setwd().
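A minimal sketch of that idea, assuming the data files were extracted to ~/eval/data as described above. The names data_dir and short_path are hypothetical, and the dir.exists() check is only there so the block does nothing harmful if the folder is missing.

```r
# Assumed location of the extracted data files (the path used in this post).
data_dir <- path.expand("~/eval/data")

if (dir.exists(data_dir)) {
  # After setwd(), relative paths resolve inside the data folder.
  setwd(data_dir)
  short_path <- "hh_98.dta"
} else {
  # Fall back to the full path if the folder has not been created yet.
  short_path <- file.path(data_dir, "hh_98.dta")
}

# Show the directory R is currently working in.
getwd()
```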


I will use the full path in the following code, but if you set your working directory, you can use the shorter versions. If you are using Windows, use the folder structure above instead.

Opening a Data Set


use ~/eval/data/hh_98.dta


library("foreign")
hh_98 <- read.dta("~/eval/data/hh_98.dta")
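If you want to check that the foreign package works before pointing it at the book's files, a round trip through a temporary .dta file is one way to do it. This is a sketch under my own assumptions: the toy data frame and its column names (hhid, exptot) are made up for illustration and are not taken from hh_98.dta. Note that read.dta() reads Stata binary files in versions 5 through 12.

```r
library(foreign)

# Hypothetical toy data standing in for a household survey.
toy <- data.frame(hhid = 1:3, exptot = c(10.5, 20.25, 30))

# Write the data frame out in Stata format to a temporary file.
tmp <- tempfile(fileext = ".dta")
write.dta(toy, tmp)

# Read it back the same way we read hh_98.dta.
toy_back <- read.dta(tmp)
str(toy_back)
```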

Save a Data Set


save hh_98, replace


save(hh_98, file="~/eval/data/hh_98.RData")
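One difference from Stata worth seeing in action: save() stores the object together with its name, and load() restores it under that same name rather than returning it. The sketch below uses a temporary file and a made-up data frame (hh_demo is a hypothetical name) so that it runs without the book's data.

```r
# Hypothetical stand-in for hh_98.
hh_demo <- data.frame(hhid = c(1, 2), exptot = c(100, 250))

# Save the object to an .RData file, as in the command above.
rdata_path <- tempfile(fileext = ".RData")
save(hh_demo, file = rdata_path)

# Remove it from the workspace, then restore it from disk.
rm(hh_demo)
loaded_names <- load(rdata_path)

# load() returns the names of the objects it restored.
loaded_names
```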

Exit the Program

With prompt to save





Reckless (or confident) version


exit, clear



Even shorter R command:
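In R, the function involved is quit(), and q() is its shorter alias. The save argument controls whether the workspace is written to disk on the way out. A minimal sketch, wrapped in a hypothetical helper (quit_without_saving is my own name) so that running the block does not end your session:

```r
# Calling quit() or q() with no arguments at the console asks
# whether to save the workspace before exiting.

# save = "no" skips the prompt and discards the workspace.
quit_without_saving <- function() {
  q(save = "no")
}

# Inspect the arguments that q() accepts.
names(formals(q))
```

Calling quit_without_saving() would close R immediately, so only run it when you are done.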



Help: strict lookup that requires the exact command or keyword


help memory



Even shorter R command:
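For reference, R's strict lookup is help(), with ? as the console shorthand. Topic names are case-sensitive; the sketch below assumes the base R topic "Memory" as the closest counterpart to Stata's help memory, which is my choice of example.

```r
# help() needs the exact topic name; "Memory" is a help topic in base R.
h <- help("Memory")

# At the console, ?Memory performs the same lookup.
# Printing h (or calling help() interactively) opens the help page.
topic_found <- length(h) > 0
topic_found
```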


Help search


search mem
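On the R side, help.search() roughly corresponds to this kind of fuzzy search, with ?? as the console shorthand, and apropos() searches object names instead of help pages. A minimal sketch; restricting the search to the base package is my own choice to keep it quick.

```r
# Search the help pages of the base package for "mem".
# At the console, ??mem runs the same search across all installed packages.
hits <- help.search("mem", package = "base")

# apropos() looks for objects whose names contain the pattern.
name_hits <- apropos("mem")
head(name_hits)
```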




Next time we will continue with Chapter 11 and begin Working with Data Files.

Economic Data Resources: South Africa

If, like me, you become interested in analyzing post-Apartheid South Africa, you will need to find as much good-quality data as you can get your hands on. Here are links to sources that I found useful: (Please add any resources you think are lacking in the comments below. I will update this post as I become aware of more resources.)

Country Profiles

These country profiles provide a handy overview and are useful for brainstorming regional research topics.

General Sources

These general sources are collected and maintained by South African institutions.

Household Surveys

  • KIDS dataset: this survey includes three waves (1993, 1998, and 2004) covering the KwaZulu-Natal province.
  • NIDS dataset: wave 1 of this national survey covers 2008 while the second wave includes data from 2010 and 2011.
  • IPUMS – click on “Change Samples” to select the South African surveys
  • World Bank LSMS
  • World Bank Microdata

Other Projects on South Africa

Trade Statistics