## R for Impact Evaluation: R and Stata Side-by-side

This tutorial follows Chapter 11 of the Handbook on Impact Evaluation: Quantitative Methods and Practices. The data files we will use can be downloaded from here. The first part of Chapter 11 is covered in the previous tutorial, Impact Evaluation on a Budget: World Bank Data and R.

# Notes on Commands

• Stata commands are typed in lowercase; R commands are functions, called with parentheses (e.g., `ls()`), and R is case-sensitive.
• In Stata, you can type abbreviated forms of commands and variable names provided there is no ambiguity. In R, you must use the full function or variable name.
• In Stata, use the Page-Up and Page-Down keys to cycle through previously entered commands. In R, use the Up and Down Arrow keys to do this.

# Working with Data Files: Looking at the Content

## Open the Dataset

Here I assume you saved the file (from the previous tutorial) to the `~/eval/data` folder. Stata:

`use ~/eval/data/hh_98.dta`

R:

`library(foreign)`

`hh_98 <- read.dta('~/eval/data/hh_98.dta')`

(If you don’t already have the `foreign` package installed, you can install it with `install.packages("foreign")`.)
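If you want to check that `read.dta()` works before downloading the data, the following minimal sketch writes a small made-up data frame (hypothetical values, not taken from `hh_98.dta`) to a temporary Stata file with `write.dta()` and reads it back. Note that `foreign` only reads Stata 5 to 12 formats; for files saved by newer Stata versions, the `haven` package's `read_dta()` is the usual alternative.

```r
library(foreign)

# Hypothetical toy data standing in for hh_98 (values are made up)
toy <- data.frame(
  nh      = c(1, 2, 3),
  famsize = c(4, 6, 5),
  sexhead = c(1, 0, 1)
)

# Write to a temporary .dta file, then read it back the same way as hh_98
path <- tempfile(fileext = ".dta")
write.dta(toy, path)
back <- read.dta(path)

dim(back)    # rows, then columns
names(back)  # variable names survive the round trip
```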

## Listing the Variables

Stata:

`describe`

R:

`ls(hh_98)`

`dim(hh_98)`

`sapply(hh_98, class)`

The function `ls(x)` displays the names of the objects within `x`, sorted alphabetically; for a data.frame these are its column names. If you just enter `ls()`, R shows the names of the objects in your current environment (remember that you can use `?ls` to see the R documentation for the `ls()` function). The function `dim(x)` returns the dimensions of object `x`; for a data.frame like `hh_98`, it returns the number of rows followed by the number of columns. The function `sapply(x, FUN)` applies the function `FUN` to each element of `x` (here, each column) and simplifies the result. The function `class(x)` returns the class of object `x`, so `sapply(hh_98, class)` lists the type of every variable.
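A quick way to see what these functions return is to try them on a small made-up data frame (hypothetical values; the column names only mimic `hh_98`). In this sketch, `names()` is shown alongside `ls()` because it keeps the original column order, and `str()` gives a compact per-variable overview that is closer in spirit to Stata's `describe`.

```r
# Hypothetical toy data frame; values are made up
toy <- data.frame(
  sexhead = c(1, 0, 1),
  agehead = c(52, 38, 61),
  famsize = c(4L, 6L, 5L)
)

ls(toy)             # column names, sorted alphabetically
names(toy)          # column names, in their original order
dim(toy)            # 3 rows, 3 columns
sapply(toy, class)  # class of each column: numeric, numeric, integer
str(toy)            # one line per variable: class and first values
```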

## Wildcards and Abbreviations

Stata:

`describe exp*`

R:

`summary(hh_98[grep("exp", colnames(hh_98))])`

In R, you can select variables even if you don’t know their exact names. Working from the innermost function outward, `colnames(hh_98)` returns a vector whose elements are the names of the columns of `hh_98`. `grep("exp", x)` returns the indices of the elements of `x` that contain “exp” (the pattern is a regular expression). Placing the resulting vector of indices inside `hh_98[]` returns the matching columns. Finally, `summary()` produces the following summary of those columns:

| | expfd | expnfd | exptot |
|---|---:|---:|---:|
| Min. | 945.3 | 89.55 | 1193 |
| 1st Qu. | 2602.1 | 514.37 | 3254 |
| Median | 3373.7 | 865.31 | 4432 |
| Mean | 3660.2 | 1813.08 | 5473 |
| 3rd Qu. | 4232.5 | 1710.24 | 6039 |
| Max. | 15270.7 | 43411.15 | 47981 |
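Because the pattern is a regular expression, `"exp"` matches the string anywhere in a name, whereas Stata's `exp*` only matches names that begin with `exp`. Anchoring the pattern with `^` reproduces the Stata behavior. In this sketch, `lexptot` is a hypothetical column added only to show the difference, and all values are made up.

```r
# Hypothetical toy data; lexptot is included only to show pattern matching
toy <- data.frame(
  expfd   = c(2600, 3400),
  expnfd  = c(510, 870),
  exptot  = c(3110, 4270),
  lexptot = log(c(3110, 4270))
)

grep("exp", colnames(toy))                 # 1 2 3 4: "exp" anywhere in the name
grep("^exp", colnames(toy))                # 1 2 3: names starting with "exp"
grep("^exp", colnames(toy), value = TRUE)  # the matching names themselves

# Closer equivalent of Stata's `describe exp*`, followed by summary statistics
summary(toy[grep("^exp", colnames(toy))])
```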

## Listing Data

List the first three entries in hh_98: Stata:

`list in 1/3`

R:

`hh_98[1:3,]`

In R, you can access records in a data.frame using matrix notation, `[rows, columns]`. The colon (`:`) builds a sequence from its start value to its end value, so `1:3` selects rows 1, 2 and 3. Leaving the part after the comma blank tells R to show all columns.

List household size and head’s education for households headed by a female who is younger than 45: Stata:

`list famsize educhead if (sexhead==0 & agehead<45)`

R:

`subset(hh_98, sexhead==0 & agehead<45, c(famsize,educhead))`

The `subset()` function is another method of selecting elements. Here’s the matrix form of the same subset: R:

`hh_98[hh_98$sexhead==0 & hh_98$agehead<45, c("famsize","educhead")]`
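The two forms are not quite interchangeable when the condition involves missing values: `subset()` drops rows where the condition evaluates to `NA`, while bracket indexing returns them as rows of `NA`s. Wrapping the condition in `which()` gives the `subset()` behavior with brackets. The sketch also shows `head(x, 3)` as a shorthand for `x[1:3, ]`. The toy data frame is hypothetical.

```r
# Hypothetical toy data; the third household's age is missing
toy <- data.frame(
  sexhead  = c(0, 0, 0, 1),
  agehead  = c(30, 50, NA, 40),
  famsize  = c(3, 5, 4, 6),
  educhead = c(2, 0, 5, 8)
)

head(toy, 3)   # the same rows as toy[1:3, ]

keep <- toy$sexhead == 0 & toy$agehead < 45   # TRUE FALSE NA FALSE

# subset() treats NA in the condition as FALSE and drops that row
via_subset <- subset(toy, sexhead == 0 & agehead < 45,
                     select = c(famsize, educhead))

# Bracket indexing returns a row of NAs for the NA in the condition
via_brackets <- toy[keep, c("famsize", "educhead")]

# which() keeps only the positions that are TRUE, so it matches subset()
via_which <- toy[which(keep), c("famsize", "educhead")]

nrow(via_subset)    # 1
nrow(via_brackets)  # 2
nrow(via_which)     # 1
```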

Browse or Edit the data: Stata:

`browse`

`edit`

R:

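`View(hh_98)`

`hh_98 <- edit(hh_98)`

(`edit()` returns an edited copy rather than changing `hh_98` in place, so assign the result back to keep your changes.)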

## Summarizing Data

Display summary statistics for a few variables: Stata:

`sum famsize educhead`

`sum famsize educhead, d`

R:

`summary(hh_98[,c("famsize","educhead")])`

`library(psych)`

`describe(hh_98[,c("famsize","educhead")])`

(If you don’t already have the `psych` package installed, you can use the command `install.packages("psych")`.)

Using survey weights: Stata:

`sum famsize educhead [aw=weight]`

R:

`library(survey)`

`design <- svydesign(ids=~nh, weights=~weight, data=hh_98)`

`svymean(~famsize + educhead, design)`

(If you don’t already have the `survey` package installed, you can use the command `install.packages("survey")`.)

Summarize by groups: Stata:

`sort dfmfd`

`by dfmfd: sum famsize educhead [aw=weight]`

`tabstat famsize educhead, statistics(mean sd) by(dfmfd)`

R:

`library(survey)`

`svyby(~famsize + educhead, ~dfmfd, design, svymean)`

(You only need to call `library(survey)` once per session.)
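For the point estimates alone, base R's `weighted.mean()` computes the same weighted mean, sum(w·x)/sum(w), that Stata's `[aw=weight]` and `svymean()` report; the survey package is still what you need for design-based standard errors. A minimal sketch of the overall and by-group versions, using a made-up data frame with hypothetical weights:

```r
# Hypothetical toy data with made-up survey weights
toy <- data.frame(
  dfmfd   = c(0, 0, 1, 1, 1),
  famsize = c(4, 6, 5, 3, 7),
  weight  = c(1, 3, 2, 2, 1)
)

# Overall weighted mean: (4*1 + 6*3 + 5*2 + 3*2 + 7*1) / 9 = 5
weighted.mean(toy$famsize, toy$weight)

# Weighted mean within each dfmfd group, like `by dfmfd: sum famsize [aw=weight]`
grp <- sapply(split(toy, toy$dfmfd),
              function(g) weighted.mean(g$famsize, g$weight))
grp   # group "0": 5.5, group "1": 4.6
```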

## Frequency Distributions (Tabulations)

Stata:

`tab dfmfd `

R:

`table(hh_98$dfmfd)`

In R, the `table()` function produces a table similar to Stata’s `tabulate`, but it only shows the count for each value. To see both counts and percentages, as Stata does, we can divide the counts by the total number of observations (i.e., the `length()`); multiplying the result by 100 turns these proportions into percentages. I group the counts and percentages in a `list()` so they are displayed together. R:

`list(count=table(hh_98$dfmfd), percent=100*table(hh_98$dfmfd)/length(hh_98$dfmfd))`
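Dividing by `length()` counts missing values in the denominator even though `table()` leaves them out, so the shares no longer sum to 100 when the variable has `NA`s; `prop.table()` divides by the table total instead. Stata's `tab` also reports cumulative percentages, which `cumsum()` can supply. A sketch on a hypothetical vector:

```r
# Hypothetical treatment indicator with one missing value
dfmfd <- c(1, 0, 1, 1, NA)

freq    <- table(dfmfd)            # counts per value; NA is left out
percent <- 100 * prop.table(freq)  # shares of the non-missing total: 25, 75
# (dividing by length(dfmfd) instead would give 20 and 60)

tab_out <- data.frame(
  freq      = as.vector(freq),
  percent   = as.vector(percent),
  cum       = cumsum(as.vector(percent)),
  row.names = names(freq)
)
tab_out

table(dfmfd, useNA = "ifany")      # adds a count for NA
```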

Frequency tables over subsets and for multiple variables: Stata:

`tab sexhead if dfmfd==1`

`tab educhead sexhead`

R:

`table(hh_98[hh_98$dfmfd==1,]$sexhead)`

`table(hh_98$educhead, hh_98$sexhead)`
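Because a column such as `sexhead` is just a vector, the subset can also be taken on the vector directly, and `xtabs()` builds the same two-way table from a formula and a data frame, labeling the dimensions with the variable names. A sketch with hypothetical data:

```r
# Hypothetical toy data; values are made up
toy <- data.frame(
  dfmfd    = c(1, 1, 0, 0, 1),
  sexhead  = c(1, 0, 1, 1, 1),
  educhead = c(0, 5, 5, 0, 0)
)

# One-way table over a subset: index the vector instead of the data frame
t1 <- table(toy$sexhead[toy$dfmfd == 1])
t1   # "0": 1, "1": 2

# Two-way table from a formula; rows are educhead, columns are sexhead
x2 <- xtabs(~ educhead + sexhead, data = toy)
x2
```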

Column and row percentages: Stata:

`tab dfmfd sexhead, col row`

R:

`mytable <- table(hh_98$dfmfd, hh_98$sexhead)`

`list(counts = mytable, percent.row = prop.table(mytable,1), percent.col = prop.table(mytable,2), count.row = margin.table(mytable,1), count.col = margin.table(mytable,2))`
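`prop.table()` returns proportions rather than percentages; multiplying by 100 and rounding gets closer to Stata's display, and `addmargins()` appends the row and column totals to the table itself (in a row and column labeled `Sum`). A sketch with hypothetical data:

```r
# Hypothetical toy data; values are made up
dfmfd   <- c(1, 1, 0, 0, 1)
sexhead <- c(1, 0, 1, 1, 1)

mytable <- table(dfmfd, sexhead)   # dimnames are named after the variables

rowp <- round(100 * prop.table(mytable, 1), 1)  # row percentages
colp <- round(100 * prop.table(mytable, 2), 1)  # column percentages
m    <- addmargins(mytable)                     # counts plus "Sum" margins

rowp   # dfmfd 0: 0, 100; dfmfd 1: 33.3, 66.7
colp   # sexhead 0: 0, 100; sexhead 1: 50, 50
m      # grand total 5
```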

## Distributions of Table Statistics

Stata:

`table dfmfd, c(mean famsize mean educhead)`

R:

`by(hh_98[c("famsize","educhead")], hh_98$dfmfd, colMeans)`

Breakdown by two factors: Stata:

`table dfmfd sexhead, c(mean famsize mean educhead)`

R:

`by(hh_98[c("famsize","educhead")], hh_98[c("dfmfd","sexhead")], colMeans)`
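`by()` prints one block per group. When you want the group means collected in a single data frame, `aggregate()` with a formula is a common alternative; note that its formula method drops rows with a missing value in any of the listed variables by default. A sketch with hypothetical data:

```r
# Hypothetical toy data; values are made up
toy <- data.frame(
  dfmfd    = c(0, 0, 1, 1),
  sexhead  = c(1, 1, 1, 0),
  famsize  = c(4, 6, 5, 3),
  educhead = c(2, 4, 0, 6)
)

# One factor: like `table dfmfd, c(mean famsize mean educhead)`
a1 <- aggregate(cbind(famsize, educhead) ~ dfmfd, data = toy, FUN = mean)
a1

# Two factors: like `table dfmfd sexhead, c(mean famsize mean educhead)`
a2 <- aggregate(cbind(famsize, educhead) ~ dfmfd + sexhead, data = toy, FUN = mean)
a2   # one row per observed (dfmfd, sexhead) combination
```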

## Missing Values

In Stata, missing values are represented by `.`; in R, they are represented by `NA`.
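Two practical consequences in R: you test for missing values with `is.na()` rather than `== NA`, and comparisons involving `NA` return `NA`. This differs from Stata, where `.` is treated as larger than any number, so `agehead > 50` is true for missing ages there. A sketch on hypothetical values:

```r
age <- c(40, 55, NA, 61)   # hypothetical ages; one missing

is.na(age)        # FALSE FALSE TRUE FALSE
sum(is.na(age))   # number of missing values: 1
age == NA         # NA NA NA NA, so never use == to test for NA
age > 50          # FALSE TRUE NA TRUE

# Missing values per column of a data frame
toy <- data.frame(age = age, famsize = c(4, NA, NA, 5))
colSums(is.na(toy))   # age: 1, famsize: 2
```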

## Counting Observations

Stata:

`count`

`count if agehead>50`

R:

`dim(hh_98)`

`dim(hh_98[hh_98$agehead>50,])`

(The first number returned by `dim()` is the number of observations.)

-or-

`length(hh_98[,1])`

`length(hh_98[hh_98$agehead>50,1])`
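`nrow()` returns just the row count, and summing a logical condition counts the rows where it is `TRUE`. Because indexing with a condition that contains `NA` keeps those positions as `NA` rows, the bracket-based counts above can be larger than expected when `agehead` has missing values. In Stata, `count if agehead>50 & !missing(agehead)` is the matching way to leave missing ages out. A sketch with hypothetical data:

```r
toy <- data.frame(agehead = c(40, 55, NA, 61))   # hypothetical; one missing

nrow(toy)                                # like `count`: 4
sum(toy$agehead > 50, na.rm = TRUE)      # non-missing ages above 50: 2

# drop = FALSE keeps a one-column data frame from collapsing to a vector
nrow(toy[toy$agehead > 50, , drop = FALSE])         # 3: the NA row is kept
nrow(toy[which(toy$agehead > 50), , drop = FALSE])  # 2: which() drops NA
```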

## Using Weights

For information on using weights in R, take a look at the homepage for the `survey` package: http://r-survey.r-forge.r-project.org/survey/

# Other Resources

The following websites are useful for searching for R:

Remember to use `?` to look up a function (e.g., `?by`) and `??` to search R’s help pages.


## Impact Evaluation on a Budget: World Bank Data and R

# Introduction

This entry will be the first in a series in which we go through all of the Stata exercises in the World Bank’s excellent and free Handbook on Impact Evaluation: Quantitative Methods and Practices, written by S. Khandker, G. Koolwal and H. Samad in 2009. The book can be downloaded for free here. The book has a series of chapters (11 to 16, in fact all of Part 2) of Stata exercises designed to prepare the reader to conduct impact evaluations. To make this learning process more affordable, this series will take you from installing R to estimating impacts using fuzzy regression discontinuity design. Go to Part 1 of the book to read up on the theory and motivation for the techniques we will use in this series.

The data files we will use can be downloaded from here. Go ahead and extract these to a folder you will remember.

# Install R

To install R, go to cran.us.r-project and follow the instructions for your OS.

# Install RStudio

I highly recommend this environment for working with R. While R can be run completely from the command line, RStudio is much more user-friendly and provides an easier transition for users coming from Stata. Go to RStudio and download this free R development environment.

# Overview

Each of the following sections will follow chapters in the book. I will leave out the exposition and instead focus on the commands, presenting the Stata command first, followed by the equivalent expression in R.

# File Structure

The book assumes you are using a PC; I’m using a Mac. The book creates several folders; I recommend creating these as well, except for the do and log folders.

PC:

`c:\eval`

`c:\eval\data`

OSX or Linux:

`~/eval`

`~/eval/data`

To avoid confusion, I will present commands using Unix-style paths. In fact, we could make all of our path statements shorter in R by setting the working directory to the data folder:

`setwd('~/eval/data')`

I will use the full path in the following code, but if you set your working directory you can use the shorter versions. If you are using Windows, use the PC folder structure above instead.

# Opening a Data Set

Stata:

`use ~/eval/data/hh_98.dta`

R:

`library("foreign")`

`hh_98 <- read.dta('~/eval/data/hh_98.dta')`

# Save a Data Set

Stata:

`save hh_98, replace`

R:

`save(hh_98, file="~/eval/data/hh_98.RData")`

# Exit the Program

With a prompt to save: Stata:

`exit`

R:

`quit()`

Reckless (or confident) version: Stata:

`exit, clear`

R:

`quit(save="no")`

Even shorter R command:

`q("no")`

# Help

Strict help, requiring the correct command or keyword: Stata:

`help memory`

R:

`help(Memory)`

Even shorter R command:

`?Memory`

Help search: Stata:

`search mem`

R:

`??mem`

# Conclusion

Next time we will continue with Chapter 11 and begin Working with Data Files.