Creating Your First PDF with LaTeX and Atom

This tutorial will walk you through the steps of creating your first PDF with LaTeX and Atom. This guide focuses on installing LaTeX and Atom on a Mac, but since Atom is a cross-platform editor, most of the instructions should work on Windows and Linux as well. You will need about an hour to download everything and to produce your first PDF.

See our previous tutorials on creating PDFs with LaTeX and Sublime Text:

Making your first PDF with LaTeX and Sublime Text 2

Making your first PDF with LaTeX and Sublime Text 2 for Mac

Install MacTeX

Download MacTeX. MacTeX installs everything you need to compile tex files into PDFs. This will take a while, so grab a coffee.

Install Atom

If you haven’t already, download the Atom text editor. Atom is awesome because it is open source and supported by GitHub.

On my MacBook Pro running Yosemite, I clicked on the “Download For Mac” button, then opened the downloaded atom-mac.zip. In Finder, just drag “Atom” to your Applications folder. You can then find Atom in your Applications folder or launch it from Spotlight. The first time you open Atom, press the “Open” button to trust Atom if prompted.

Install Skim (for previewing PDFs)

LatexTools makes use of Skim for previewing works-in-progress. Download and install Skim. On OS X Yosemite, I installed version 1.4.17.

To make Skim trusted so that the preview will work, open Skim by holding down the control key while clicking on the Skim icon in the “Applications” folder in Finder. Click “Open” at the prompt.

Install LatexTools

Open the “Settings” tab by pressing Command+, (Command and the comma key) or using the menu “Atom > Preferences…”.

Click on the “Install” tab on the left. Type in language-latex and click the “Install” button in the language-latex package box. I installed version 0.6.1. This package provides syntax highlighting that will make working with TeX much more enjoyable.

Next, type in latextools and install the latextools package.

Create a tex source file

Create a new file if you don’t already have one up (you should see a tab titled “untitled” if you already have a new file open). To create a new file go to “File > New File” in the menu or use the keyboard shortcut Command+N.

In the new file paste the following TeX sample:

\documentclass{article}
\title{Title}
\author{Your Name}
\begin{document}
\maketitle{}
\section{Introduction}
This is where you will write your content.
\end{document}

Save this file as sample.tex. You should see that the content is now recognized by the syntax highlighter (see all the pretty colors?).

Build and view your PDF

To build this PDF, use the following keyboard shortcut: Command+Alt+B (i.e., all three of those keys at the same time). If that doesn’t work, check your keybindings in the “Settings” tab, in the “Keybindings” tab on the left. Type in latextools:build to see what the command for your system is. On a Mac (i.e., “Darwin”) the keybinding should read alt-cmd-b; for Windows and Linux the default is probably ctrl-alt-b.

Conclusion

Hopefully now you have your first PDF ready to show off to all your neighbors. If not, let me know in the comments below so I can update the tutorial.

R for Impact Evaluation: R and Stata Side-by-side

This tutorial follows the Handbook on Impact Evaluation: Quantitative Methods and Practices, chapter 11. The data files we will use can be downloaded from here. The first part of Chapter 11 is covered in Impact Evaluation on a Budget: World Bank Data and R.

Notes on Commands

  • Stata commands are typed in lowercase, R commands are functions (e.g., ls())
  • In Stata, you can type abbreviated forms of functions and variables provided there is no ambiguity. In R, you must use the full function or variable name.
  • In Stata, use the Page-Up and Page-Down keys to cycle through previously entered commands. In R, use the Up and Down Arrow keys to do this.

Working with Data Files: Looking at the Content

Open the Dataset

Here I assume you saved the file (from the previous tutorial) to the ~/eval/data folder. Stata:

use ~/eval/data/hh_98.dta

R:

library(foreign)
hh_98 = read.dta('~/eval/data/hh_98.dta')

(If you don’t already have the foreign library installed, you can use the command install.packages("foreign").)

Listing the Variables

Stata:

describe

R:

ls(hh_98)
dim(hh_98)
sapply(hh_98,class)

The function ls(x) displays the names of the objects within x. If you just enter ls(), R will show you the names of the objects open in your current environment (remember you can use ?ls to see the R documentation for the ls() function). The function dim(x) returns the dimensions of object x. When measuring a data.frame, like hh_98, dim() returns the number of rows first followed by the number of columns. The function sapply(x,FUN) returns a simplified result from applying the function FUN to each object in x. The function class(x) returns the class of object x.

Wildcards and Abbreviations

Stata:

describe exp*

R:

summary(hh_98[grep("exp", colnames(hh_98))])

In R, it is possible to do things even if we don’t know the exact name of the object we want to analyze. Starting from the innermost function and working our way out, colnames(hh_98) returns a vector where each element is the name of a column of hh_98. grep("exp", x) returns the indices of the elements that contain “exp” (you can also use regexp here) within x. Placing the resulting vector of indices into hh_98[] returns the matching columns. Finally, summary() returns the following summary of the returned columns:

     expfd             expnfd             exptot
 Min.   :  945.3   Min.   :   89.55   Min.   : 1193
 1st Qu.: 2602.1   1st Qu.:  514.37   1st Qu.: 3254
 Median : 3373.7   Median :  865.31   Median : 4432
 Mean   : 3660.2   Mean   : 1813.08   Mean   : 5473
 3rd Qu.: 4232.5   3rd Qu.: 1710.24   3rd Qu.: 6039
 Max.   :15270.7   Max.   :43411.15   Max.   :47981

Listing Data

List the first three entries in hh_98:

Stata:

list in 1/3

R:

hh_98[1:3,]

In R, you can access records in a data.frame using matrix notation. The colon (:) separates the beginning and ending of a sequence. By leaving the portion following the comma blank, we tell R to show all columns.

List household size and head’s education for households headed by a female who is younger than 45:

Stata:

list famsize educhead if (sexhead==0 & agehead<45)

R:

subset(hh_98,sexhead==0 & agehead<45,c(famsize,educhead))

The subset() function is another method of selecting elements. Here’s the matrix form of the same subset:

R:

hh_98[hh_98$sexhead==0 & hh_98$agehead<45,c("famsize","educhead")]

Browse or edit the data:

Stata:

browse
edit

R:

View(hh_98)
edit(hh_98)

Summarizing Data

Display summary statistics for a few variables:

Stata:

sum famsize educhead
sum famsize educhead, d

R:

summary(hh_98[,c("famsize","educhead")])
library(psych)
describe(hh_98[,c("famsize","educhead")])

(If you don’t already have the psych library installed, you can use the command install.packages("psych").)

Using survey weights:

Stata:

sum famsize educhead [aw=weight]

R:

library(survey)
design <- svydesign(id=~nh,weights=~weight,data=hh_98)
svymean(~famsize + educhead,design)

(If you don’t already have the survey library installed, you can use the command install.packages("survey").)

Summarize by groups:

Stata:

sort dfmfd
by dfmfd: sum famsize educhead [aw=weight]
tabstat famsize educhead, statistics(mean sd) by(dfmfd)

R:

library(survey)
svyby(~famsize + educhead, ~dfmfd, design, svymean)

(you only need to call library(survey) once per session).

Frequency Distributions (Tabulations)

Stata:

tab dfmfd 

R:

table(hh_98$dfmfd)

In R, the table() function presents a table similar to the tabulate function in Stata, but only shows the counts grouped by factor. To see both the counts and percentages, as in the Stata program, we can divide by the total count (i.e., the length()). I group the counts and percentages using a list() so they are displayed together. R:

list(count=table(hh_98$dfmfd),percent=table(hh_98$dfmfd)/length(hh_98$dfmfd))

Frequency tables over subsets and for multiple variables:

Stata:

tab sexhead if dfmfd==1
tab educhead sexhead

R:

table(hh_98[hh_98$dfmfd==1,]$sexhead)
table(hh_98$educhead, hh_98$sexhead)

Column and row percentages:

Stata:

tab dfmfd sexhead, col row

R:

mytable <- table(hh_98$dfmfd, hh_98$sexhead)
list(counts = mytable,
     percent.row = prop.table(mytable,1),
     percent.col = prop.table(mytable,2),
     count.row = margin.table(mytable,1),
     count.col = margin.table(mytable,2))

Distributions of Table Statistics

Stata:

table dfmfd, c(mean famsize mean educhead)

R:

by(hh_98[c("famsize","educhead")], hh_98$dfmfd, colMeans)

Breakdown by two factors:

Stata:

table dfmfd sexhead, c(mean famsize mean educhead)

R:

by(hh_98[c("famsize","educhead")], hh_98[c("dfmfd","sexhead")], colMeans)

Missing Values

In Stata, missing values are represented by “.”. In R, missing values are represented by “NA”.

Counting Observations

Stata:

count
count if agehead>50

R:

dim(hh_98)[1]
dim(hh_98[hh_98$agehead>50,])[1]

-or-

length(hh_98[,1])
length(hh_98[hh_98$agehead>50,1])

Using Weights

For information on using weights in R, take a look at the homepage for the survey package: http://r-survey.r-forge.r-project.org/survey/

Other Resources

The following websites are useful for searching for R:

Remember to use ? to look up functions and ?? to search for help within R (e.g., "?by").

Impact Evaluation on a Budget: World Bank Data and R

Introduction

This entry will be the first in a series where we go through all of the Stata exercises in the World Bank’s excellent and free Handbook on Impact Evaluation: Quantitative Methods and Practices written by S. Khandker, G. Koolwal and H. Samad in 2009. The book can be downloaded for free here. The book has a series of chapters (11-16, in fact all of part 2) on Stata exercises designed to prepare the reader to conduct impact evaluations. To make this learning process more affordable this series will take you from installing R to estimating impacts using fuzzy regression discontinuity design. Go to part 1 of the book to read up on the theory and motivation for the techniques we will use in this series. The data files we will use can be downloaded from here. Go ahead and extract these to a folder you will remember.

Install R

To install R go to cran.us.r-project and follow the instructions for your OS.

Install RStudio

I highly recommend this environment for working with R. While R can be run completely from the command line, RStudio is much more user friendly and provides an easier transition for users coming from Stata. Go to RStudio and download this free R development environment.

Overview

Each of the following sections will follow chapters in the book. I will leave out the exposition and instead focus on the commands. I will present Stata commands first followed by the equivalent expression in R.

File Structure

The book assumes you are using a PC; I’m using a Mac. The authors create several folders, and I recommend creating these as well, except for the do and log folders.

PC:

 c:\eval
c:\eval\data

OSX or Linux:

 ~/eval
~/eval/data

To avoid confusion I will present commands using *NIX style paths. In fact, we could make all of our path statements shorter in R by setting the working directory to the data folder:

setwd('~/eval/data')

I will use the full path in the following code, but if you set your working directory you can use the shorter versions. If you are using Windows use the folder structure above instead.

Opening a Data Set

Stata:

use ~/eval/data/hh_98.dta

R:

library("foreign")
hh_98 = read.dta('~/eval/data/hh_98.dta')

Save a Data Set

Stata:

save hh_98, replace

R:

save(hh_98, file="~/eval/data/hh_98.RData")

Exit the Program

With prompt to save

Stata:

exit

R:

quit()

Reckless (or confident) version

Stata:

exit, clear

R:

quit(save="no")

Even shorter R command:

q("no")

Help

Strict command requiring the correct command or keyword to be used

Stata:

help memory

R:

help(Memory)

Even shorter R command:

?Memory

Help search

Stata:

search mem

R:

??mem

Conclusion

Next time we will continue with Chapter 11 and begin Working with Data Files.

Google Charts and CSV Part 3: Side-by-Side Bubble Charts

Introduction

If you haven’t already, go ahead and take a look at the previous two installments of this series (Easy Data Visualization with Google Charts and a CSV and More Google Charts with a CSV: Bubble Charts). Today we’re going to take the bubble chart from More Google Charts with a CSV: Bubble Charts and add another chart to the same page. It’s not exactly as simple as duplicating all the code we created last time, but it nearly is.

Setup

Begin by downloading the finished product from last time here. Also, I’ve pasted the HTML below in case that’s easier for you:

<!DOCTYPE html>
<html>
<head>
  <title>Google Chart Example</title>
  <style>
    ul {list-style-type: none;}
  </style>
  <script src="https://www.google.com/jsapi"></script>
  <script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
  <script src="jquery.csv-0.71.js"></script>
  <script>
    // load the visualization library from Google and set a listener
    google.load("visualization", "1", {packages:["corechart"]});
    google.setOnLoadCallback(drawChart);
    function drawChart() {
      // grab the CSV
      $.get("kzn1993.csv", function(csvString) {
        // transform the CSV string into a 2-dimensional array
        var arrayData = $.csv.toArrays(csvString, {onParseValue: $.csv.hooks.castToScalar});
        // use arrayData to load the select elements with the appropriate options
        for (var i = 0; i < arrayData[0].length; i++) {
          // this adds the given option to both select elements
          $("select").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
        }
        // set the default selection
        $("#domain option[value='0']").attr("selected","selected");
        $("#range option[value='1']").attr("selected","selected");
        // this new DataTable object holds all the data
        var data = new google.visualization.arrayToDataTable(arrayData);
        // this view can select a subset of the data at a time
        var view = new google.visualization.DataView(data);
        view.setColumns([{calc:stringID, type: "string"},1,2,3]);
        // this function returns the first column values as strings (by row)
        function stringID(dataTable, rowNum){
          // return dataTable.getValue(rowNum, 0).toString();
          // return an empty string instead to avoid the bubble labels
          return "";
        }
        var options = {
          title: "KwaZulu-Natal Household Survey (1993)",
          hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
          vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
          bubble: {stroke: "transparent", opacity: 0.2},
          colorAxis: {colors:['red','blue']},
        };
        var chart = new google.visualization.BubbleChart(document.getElementById('chart'));
        chart.draw(view, options);
        // set listener for the update button
        $("select").change(function(){
          // determine selected domain and range
          var domain = +$("#domain option:selected").val();
          var range = +$("#range option:selected").val();
          var color = +$("#color option:selected").val();
          var size = +$("#size option:selected").val();
          // update the view
          view.setColumns([{calc:stringID, type: "string", label: "Household ID"},domain,range,color,size]);
          // update the options
          options.hAxis.title = data.getColumnLabel(domain);
          options.hAxis.minValue = data.getColumnRange(domain).min;
          options.hAxis.maxValue = data.getColumnRange(domain).max;
          options.vAxis.title = data.getColumnLabel(range);
          options.vAxis.minValue = data.getColumnRange(range).min;
          options.vAxis.maxValue = data.getColumnRange(range).max;
          options.bubble = {stroke: "transparent", opacity: 0.2};
          options.colorAxis = {colors:['red','blue']};
          // update the chart
          chart.draw(view, options);
        });
      });
    }
  </script>
</head>
<body>
  <div id="chart" style="width:800px; height:500px;"> </div>
  <ul>
    <li> Y-Axis <select id="range"></select> </li>
    <li> X-Axis <select id="domain"></select> </li>
    <li> Color <select id="color"></select> </li>
    <li> Size <select id="size"></select> </li>
  </ul>
</body>
</html>

Data

For this tutorial I’ve split the South African data by province in 1993. File kz1993.csv contains only the households in the former KwaZulu bantustan. File n1993.csv contains only the black households in the Natal province of what was at the time “white” South Africa. For more details on the data, please see the first tutorial in this series (Easy Data Visualization with Google Charts and a CSV).

HTML

Let’s begin by modifying the HTML. First we’ll encapsulate both the chart and the list in a <div> that is floated to the left. We also need to add a class (I’m going to choose chart) to each of the <select> elements to distinguish the chart on the left from the one on the right:

<div style="float:left;">
  <div id="chart" style="width:600px; height:500px;"> </div>
  <ul>
    <li> Y-Axis <select class="chart" id="range"></select> </li>
    <li> X-Axis <select class="chart" id="domain"></select> </li>
    <li> Color <select class="chart" id="color"></select> </li>
    <li> Size <select class="chart" id="size"></select> </li>
  </ul>
</div>

So far so good. Now copy this entire <div> and paste a duplicate below. Change the ids and classes for this <div> by adding a 2. Also change the float direction to “right”:

<div style="float:right">
  <div id="chart2" style="width:600px; height:500px;"> </div>
  <ul>
    <li> Y-Axis <select class="chart2" id="range2"></select> </li>
    <li> X-Axis <select class="chart2" id="domain2"></select> </li>
    <li> Color <select class="chart2" id="color2"></select> </li>
    <li> Size <select class="chart2" id="size2"></select> </li>
  </ul>
</div>
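Since the two blocks differ only in the suffix and the float direction, the duplication can also be expressed as a small function. The following is only a sketch of that idea: the helper name chartBlock and its parameters are hypothetical and are not part of the tutorial's code, which keeps the HTML written out by hand.

```javascript
// Hypothetical helper: build one chart-plus-controls block for a given id suffix.
// suffix is "" for the left chart and "2" for the right chart.
function chartBlock(suffix, floatSide) {
  // label shown to the user, and the base id used in the tutorial's HTML
  var controls = [["Y-Axis", "range"], ["X-Axis", "domain"], ["Color", "color"], ["Size", "size"]];
  var items = controls.map(function (c) {
    return '<li> ' + c[0] + ' <select class="chart' + suffix + '" id="' + c[1] + suffix + '"></select> </li>';
  }).join("\n");
  return '<div style="float:' + floatSide + ';">\n' +
    '<div id="chart' + suffix + '" style="width:600px; height:500px;"> </div>\n' +
    '<ul>\n' + items + '\n</ul>\n</div>';
}

var leftHtml = chartBlock("", "left");
var rightHtml = chartBlock("2", "right");
```

Every id and class in rightHtml carries the 2 suffix, which is exactly the manual change described above.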

JavaScript

Chart 1

Now we need to make some adjustments to our drawChart() callback function. First we’ll change the CSV file reference from kzn1993.csv to kz1993.csv.

function drawChart() {
  // grab the first CSV
  $.get("kz1993.csv", function(csvString) {

In the for loop we need to change the jQuery selection of all select elements to only those with the chart class:

// use arrayData to load the select elements with the appropriate options
for (var i = 0; i < arrayData[0].length; i++) {
  // this adds the given option to each select element with the chart class
  $("select.chart").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
}
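To see what markup this loop produces without loading jQuery, here is a minimal sketch in plain JavaScript. The helper name buildOptions and the sample column names are assumptions made for illustration; the tutorial itself appends each option inside the loop.

```javascript
// Hypothetical helper: build the <option> markup for every column in the header row.
// The option value is the column index, which the change listener later reads back as a number.
function buildOptions(headerRow) {
  return headerRow
    .map(function (label, i) {
      return "<option value='" + i + "'>" + label + "</option>";
    })
    .join("");
}

// Made-up header row standing in for arrayData[0]
var optionHtml = buildOptions(["hhid", "income"]);
```

With jQuery loaded, a string like optionHtml could be passed to a single append() call on the selected elements instead of appending once per column.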

The last change we need to make for the chart on the left is to modify the title:

var options = {
  title: "KwaZulu-Natal Household Survey (1993) - KwaZulu",
  hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
  vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
  bubble: {stroke: "transparent", opacity: 0.2},
  colorAxis: {colors:['red','blue']},
};

Chart 2

Now, for the chart on the right we can start by copying the $.get(); call of the first chart. We just need to change the referenced CSV, the referenced HTML ids and classes, and the title. First, change the referenced CSV:

// grab the second CSV (this one covers Natal Province)
$.get("n1993.csv", function(csvString) {

Next, change the referenced ids and classes here:

// use arrayData to load the select elements with the appropriate options
for (var i = 0; i < arrayData[0].length; i++) {
  // this adds the given option to each select element with the chart2 class
  $("select.chart2").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
}
// set the default selection
$("#domain2 option[value='0']").attr("selected","selected");
$("#range2 option[value='1']").attr("selected","selected");

and here:

var chart = new google.visualization.BubbleChart(document.getElementById('chart2'));
chart.draw(view, options);
// set listener for the update button
$("select.chart2").change(function(){
  // determine selected domain and range
  var domain = +$("#domain2 option:selected").val();
  var range = +$("#range2 option:selected").val();
  var color = +$("#color2 option:selected").val();
  var size = +$("#size2 option:selected").val();

The last thing to do is change the title:

var options = {
  title: "KwaZulu-Natal Household Survey (1993) - Natal",
  hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
  vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
  bubble: {stroke: "transparent", opacity: 0.2},
  colorAxis: {colors:['red', 'blue']},
};

Conclusion

The end result looks pretty nice (if your charts stack vertically, you need to make your charts smaller or your screen wider).

[Figure: side_by_side]

To play around with a live version go here. If any of this was confusing, please check the previous two tutorials (Easy Data Visualization with Google Charts and a CSV and More Google Charts with a CSV: Bubble Charts) or leave a comment below.

More Google Charts with a CSV: Bubble Charts

Last time we built an interactive scatter plot. This time we’re going to turn that scatter plot into a bubble chart (see a preview of the finished product here). Start by opening up the HTML document we created last time. You can see the source here or expand the section below:

<!DOCTYPE html>
<html>
<head>
  <title>Google Chart Example</title>
  <script src="https://www.google.com/jsapi"></script>
  <script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
  <script src="jquery.csv-0.71.js"></script>
  <script>
    // load the visualization library from Google and set a listener
    google.load("visualization", "1", {packages:["corechart"]});
    google.setOnLoadCallback(drawChart);
    function drawChart() {
      // grab the CSV
      $.get("kzn1993.csv", function(csvString) {
        // transform the CSV string into a 2-dimensional array
        var arrayData = $.csv.toArrays(csvString, {onParseValue: $.csv.hooks.castToScalar});
        // use arrayData to load the select elements with the appropriate options
        for (var i = 0; i < arrayData[0].length; i++) {
          // this adds the given option to both select elements
          $("select").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
        }
        // set the default selection
        $("#domain option[value='0']").attr("selected","selected");
        $("#range option[value='1']").attr("selected","selected");
        // this new DataTable object holds all the data
        var data = new google.visualization.arrayToDataTable(arrayData);
        // this view can select a subset of the data at a time
        var view = new google.visualization.DataView(data);
        view.setColumns([0,1]);
        var options = {
          title: "KwaZulu-Natal Household Survey (1993)",
          hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
          vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
          legend: 'none'
        };
        var chart = new google.visualization.ScatterChart(document.getElementById('chart'));
        chart.draw(view, options);
        // set listener for the update button
        $("select").change(function(){
          // determine selected domain and range
          var domain = +$("#domain option:selected").val();
          var range = +$("#range option:selected").val();
          // update the view
          view.setColumns([domain,range]);
          // update the options
          options.hAxis.title = data.getColumnLabel(domain);
          options.hAxis.minValue = data.getColumnRange(domain).min;
          options.hAxis.maxValue = data.getColumnRange(domain).max;
          options.vAxis.title = data.getColumnLabel(range);
          options.vAxis.minValue = data.getColumnRange(range).min;
          options.vAxis.maxValue = data.getColumnRange(range).max;
          // update the chart
          chart.draw(view, options);
        });
      });
    }
  </script>
</head>
<body>
  <div id="chart" style="width:800px; height:500px;"> </div>
  <select id="range"></select>
  <select id="domain"></select>
</body>
</html>

Add Controls for Size and Color

Bubble charts add two dimensions, size and color, to the standard scatter plot (here I’m using Google’s terminology; several other graphics libraries simply add this functionality to their scatter plot functions). To keep the nice interactivity we built into our last chart, let’s start by adding controls for the color and size. We’ll nest everything in an unordered list and add labels to the controls. Just change the section with two <select> tags to match the following:

<ul>
  <li> Y-Axis <select id="range"></select> </li>
  <li> X-Axis <select id="domain"></select> </li>
  <li> Color <select id="color"></select> </li>
  <li> Size <select id="size"></select> </li>
</ul>

Next we want to get rid of the bullets in our unordered list. Add the following <style> tag inside your <head> tag.

<style>
  ul {list-style-type: none; }
</style>

Changing the Chart Type

Change the line that loads the chart object from this:

 var chart = new google.visualization.ScatterChart(document.getElementById('chart')); 

to this:

 var chart = new google.visualization.BubbleChart(document.getElementById('chart')); 

Feeding the Data to the Chart

The data table for Google’s bubble chart requires the first column to be a string which can be used to identify the bubbles. When we loaded the CSV into an array in the last tutorial, we parsed all values as scalars. We need to update our DataView call to change the values in the first column, the household ids (hhid), to strings. This requires us to add a function to retrieve these strings from the DataTable.

var view = new google.visualization.DataView(data);
view.setColumns([{calc:stringID, type: "string"},1,2,3]);
// this function returns the first column values as strings (by row)
function stringID(dataTable, rowNum){
  return dataTable.getValue(rowNum, 0).toString();
}
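The calc function is called once per row with the table and the row number. The sketch below imitates that with a tiny stand-in object so the behavior can be checked outside the browser. The makeTable helper and the sample values are assumptions for illustration; only getValue() and getNumberOfRows() mirror methods of the real DataTable, and the loop is my approximation of how the view evaluates the column.

```javascript
// Minimal stand-in for a DataTable: only the two methods used below are modeled.
function makeTable(rows) {
  return {
    getNumberOfRows: function () { return rows.length; },
    getValue: function (rowNum, colNum) { return rows[rowNum][colNum]; }
  };
}

// Same shape as the tutorial's calc function: (dataTable, rowNum) -> string
function stringID(dataTable, rowNum) {
  return dataTable.getValue(rowNum, 0).toString();
}

// Made-up rows: a numeric household id followed by one numeric value
var table = makeTable([[101, 2.5], [102, 4.0]]);
var ids = [];
for (var r = 0; r < table.getNumberOfRows(); r++) {
  ids.push(stringID(table, r));
}
// ids now holds the first column converted to strings, one entry per row
```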

Updating the Chart

Now we need to modify the code that updates the chart when a user changes the selected variables. First we’ll add local variables for color and size to the <select> listener function. These variables need to be assigned the value of the respective <select> tag. After we have column indices for color and size, we will set these as the fourth and fifth columns (after the id, domain, and range columns) in our bubble chart view. See the color and size lines below:

$("select").change(function(){
  // determine selected domain and range
  var domain = +$("#domain option:selected").val();
  var range = +$("#range option:selected").val();
  var color = +$("#color option:selected").val();
  var size = +$("#size option:selected").val();
  // update the view
  view.setColumns([{calc:stringID, type: "string"},domain,range,color,size]);
  // update the options
  options.hAxis.title = data.getColumnLabel(domain);
  options.hAxis.minValue = data.getColumnRange(domain).min;
  options.hAxis.maxValue = data.getColumnRange(domain).max;
  options.vAxis.title = data.getColumnLabel(range);
  options.vAxis.minValue = data.getColumnRange(range).min;
  options.vAxis.maxValue = data.getColumnRange(range).max;
  // update the chart
  chart.draw(view, options);
});
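The axis updates in the listener all follow one pattern: look up the selected column's label and its minimum and maximum, then copy them into the options object. Here is a minimal sketch of that pattern on plain arrays. The helpers columnRange and updateAxes and the sample data are hypothetical stand-ins, not Google Charts functions; in the tutorial the label and range come from data.getColumnLabel() and data.getColumnRange().

```javascript
// Hypothetical stand-in for data.getColumnRange(): min and max of one column.
function columnRange(rows, colNum) {
  var values = rows.map(function (row) { return row[colNum]; });
  return { min: Math.min.apply(null, values), max: Math.max.apply(null, values) };
}

// Hypothetical helper that applies the same axis updates as the listener.
function updateAxes(options, header, rows, domain, range) {
  var x = columnRange(rows, domain);
  var y = columnRange(rows, range);
  options.hAxis = { title: header[domain], minValue: x.min, maxValue: x.max };
  options.vAxis = { title: header[range], minValue: y.min, maxValue: y.max };
  return options;
}

// Made-up data: a header row and three households
var sampleHeader = ["hhid", "income", "children"];
var sampleRows = [[1, 250.5, 3], [2, 100, 0], [3, 400, 5]];
var updated = updateAxes({ title: "Example" }, sampleHeader, sampleRows, 1, 2);
```

Other keys on the options object, such as the title, are left untouched, which is why the listener only needs to reassign the axis settings.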

Unfortunately, when I test this and select a few variables of interest, I get the following chart. This is not very useful. The id values obscure all the information.

[Figure: bubble1]

Improving Upon the Defaults

Removing the Bubble Label

The bubble labels would work fine if we had only a few data points and being able to quickly identify them was important. In this case, we are more interested in the general relationships between the variables and not the specific position of any one household. Let’s start by removing the bubble label. Go to our stringID function and return an empty string instead of the household id (be sure to comment out the old return statement):

function stringID(dataTable, rowNum){
  // return dataTable.getValue(rowNum, 0).toString();
  // return an empty string instead to avoid the bubble labels
  return "";
}

Now let’s check our chart:

[Figure: bubble2]

Removing the Bubble Border and Adjusting Bubble Opacity

Okay. This is a lot nicer, but we can do better by removing the bubble borders and lowering the bubble opacity, since both cause issues with occlusion (i.e., there is data we are not seeing due to overly opaque data in the foreground). To remove the bubble’s border we’ll set its stroke color to “transparent”. Let’s change the opacity from the default of 0.8 to 0.2. To implement this we need to add an element to our initial options object

var options = {
  title: "KwaZulu-Natal Household Survey (1993)",
  hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
  vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
  bubble: {stroke: "transparent", opacity: 0.2},
};

and reset it in our <select> listener function:

// update the options
options.hAxis.title = data.getColumnLabel(domain);
options.hAxis.minValue = data.getColumnRange(domain).min;
options.hAxis.maxValue = data.getColumnRange(domain).max;
options.vAxis.title = data.getColumnLabel(range);
options.vAxis.minValue = data.getColumnRange(range).min;
options.vAxis.maxValue = data.getColumnRange(range).max;
options.bubble = {stroke: "transparent", opacity: 0.2};
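A small worked example may help show why the lower opacity reduces occlusion. The sketch assumes standard "over" alpha compositing, where each additional bubble of opacity a hides a fraction a of whatever still shows through, so n stacked bubbles have a combined opacity of 1 - (1 - a)^n. The helper name stackedOpacity is hypothetical, and how the browser actually blends the chart's bubbles is an assumption here, not something taken from the Google Charts documentation.

```javascript
// Hypothetical helper: combined opacity of n overlapping bubbles, each with opacity a,
// assuming standard "over" alpha compositing: 1 - (1 - a)^n
function stackedOpacity(a, n) {
  return 1 - Math.pow(1 - a, n);
}

var twoAtDefault = stackedOpacity(0.8, 2); // about 0.96: two bubbles already hide almost everything
var twoAtLow = stackedOpacity(0.2, 2);     // about 0.36: a third bubble underneath still shows
var fiveAtLow = stackedOpacity(0.2, 5);    // about 0.67: dense clusters read as darker regions
```

Under that assumption, an opacity of 0.2 lets overlapping households build up gradually into darker areas instead of the top bubble hiding the rest.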

Let’s take a look:

[Figure: bubble3]

Changing the Color Gradient

This is starting to look great. One issue I have with the default color choice, besides being ugly, is that gray with an opacity of 0.2 is hard to see. Let’s make the color gradient change from red to blue. We do this again by adding an element to the initial options object

var options = {
  title: "KwaZulu-Natal Household Survey (1993)",
  hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
  vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
  bubble: {stroke: "transparent", opacity: 0.2},
  colorAxis: {colors:['red','blue']},
};

and resetting it in our <select> listener function:

// update the options
options.hAxis.title = data.getColumnLabel(domain);
options.hAxis.minValue = data.getColumnRange(domain).min;
options.hAxis.maxValue = data.getColumnRange(domain).max;
options.vAxis.title = data.getColumnLabel(range);
options.vAxis.minValue = data.getColumnRange(range).min;
options.vAxis.maxValue = data.getColumnRange(range).max;
options.bubble = {stroke: "transparent", opacity: 0.2};
options.colorAxis = {colors:['red','blue']};

Bam! And here’s our finished product:

[Figure: bubble4]

These changes have made it easier to explore the dataset and added a little style in the process. You can find an interactive version here (check the source if you are having problems with your chart).

Conclusion

While this ramped up the complexity of our figure (compared to the chart from the previous tutorial), being able to change which variables control the color and size of the bubbles will make your data that much more engaging. Take the source, change the reference to your CSV, and remember to download a copy of the jquery-csv script. With just a few steps you can have your own interactive chart to encourage your site’s viewers to explore your data. Check out the next tutorial in this series: Google Charts and CSV Part 3: Side-by-Side Bubble Charts. For more information on Google’s bubble chart, check the documentation here.

Easy Data Visualization with Google Charts and a CSV

Static figures work fine for a print publication. However, when you want to present your research or collected data online, static is stale and dynamic is alive. Today we’re going to take a CSV and create a simple, but interactive, scatter plot. This tutorial assumes some basic familiarity with HTML and JavaScript. If you don’t currently possess these skills, head on over to Codecademy and follow the Web Fundamentals track and the JavaScript track.

Setting Up

To begin, we need to make sure we have the CSV we want to load and the JavaScript library jquery-csv in the same folder as our HTML.

Preview and Data

Here’s the end result of this tutorial:

[Figure: Finished Chart]

The data I’ll be using is from the three-wave KwaZulu-Natal Income Dynamics Study (KIDS). In this example I will be using the first round of the survey (1993). Children are household members listed as younger than 16 and pensioners are defined as males over 65 and females over 60. I use an adult equivalent measure of household income used by Carter and May (1999) and many others in the South African context. The cleaned CSV can be downloaded here. I recommend you download this CSV to work along with this tutorial, but feel free to use your own (just be careful to make the relevant changes to the example code). Add the CSV to the same folder as the HTML we will be creating.

jQuery-CSV

The jQuery-CSV library allows us to easily take a string of CSV data and transform it into the appropriate format for Google’s visualization library. Download either jquery.csv-0.71.js or jquery.csv-0.71.min.js from that page and add it to the folder where your HTML will go.
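To make the "string of CSV data" to "array of rows" step concrete, here is a minimal plain-JavaScript sketch. It is not the jQuery-CSV implementation: the helper names `toScalar` and `parseSimpleCsv` and the sample columns are made up for illustration, and the sketch assumes a CSV with no quoted fields or embedded commas.

```javascript
// Simplified stand-in for a CSV-to-array step.
// Assumption: no quoted fields and no commas inside values.
function toScalar(value) {
  const trimmed = value.trim();
  // Leave empty or non-numeric text as a string; convert numeric text to a number.
  if (trimmed === "" || Number.isNaN(Number(trimmed))) {
    return value;
  }
  return Number(trimmed);
}

function parseSimpleCsv(csvString) {
  return csvString
    .split(/\r?\n/)                       // one entry per line
    .filter((line) => line.length > 0)    // drop blank lines
    .map((line) => line.split(",").map(toScalar));
}

// Hypothetical sample; these column names are not from the KIDS data.
const sample = "income,children\n1200.5,3\n800,0";
const rows = parseSimpleCsv(sample);
console.log(rows);
```

The first row holds the headers as strings and the remaining rows hold numbers, which is the shape the charting code later in this tutorial expects.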

Accessing the CSV

To begin with, create the HTML document, load the Google JS API, jQuery, and the jQuery-CSV library, and display the contents of the CSV to confirm the CSV is where it’s supposed to be and that we can access all the JavaScript we need:

    <!DOCTYPE html>
    <html>
    <head>
      <title>Google Chart Example</title>
      <script src="https://www.google.com/jsapi"></script>
      <script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
      <script src="jquery.csv-0.71.js"></script>
      <script>
        // wait till the DOM is loaded
        $(function() {
          // grab the CSV
          $.get("kzn1993.csv", function(csvString) {
            // display the contents of the CSV
            $("#chart").html(csvString);
          });
        });
      </script>
    </head>
    <body>
      <div id="chart">
      </div>
    </body>
    </html>

Load your newly created HTML to confirm your code outputs the contents of the CSV.

A Simple Scatter Plot

Clear the script tag we used to display the CSV; in this section we will focus on the JavaScript necessary to create a scatter plot with our CSV. Start by loading the visualization library and setting a callback function:

    // load the visualization library from Google and set a listener
    google.load("visualization", "1", {packages:["corechart"]});
    google.setOnLoadCallback(drawChart);

Next, we need to create the callback function we referenced in the previous step. We’ll begin by grabbing the CSV as we did previously:

    function drawChart() {
      // grab the CSV
      $.get("kzn1993.csv", function(csvString) {

We need to transform the CSV into a format suitable for Google’s visualization library:

        // transform the CSV string into a 2-dimensional array
        var arrayData = $.csv.toArrays(csvString, {onParseValue: $.csv.hooks.castToScalar});

Next, we’ll transform this array into a DataTable object:

        // this new DataTable object holds all the data
        // (arrayToDataTable is a plain function, so no "new" is needed)
        var data = google.visualization.arrayToDataTable(arrayData);

Since we have more columns of data than are needed for our visualization, let’s create a view on this table of just the first two columns:

        // this view can select a subset of the data at a time
        var view = new google.visualization.DataView(data);
        view.setColumns([0,1]);

Now let’s set some basic options for our chart:

        var options = {
          title: "KwaZulu-Natal Household Survey (1993)",
          hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
          vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
          legend: 'none'
        };

Now we need to bind a chart to our <div> and tell the chart to draw the current view with the options we selected:

        var chart = new google.visualization.ScatterChart(document.getElementById('chart'));
        chart.draw(view, options);

All that’s left for this stage is to close our function blocks:

      });
    }

If you load our current progress you should see the following (relatively meaningless) chart: Basic Chart
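For readers who want to see what "a view of just the first two columns" amounts to, here is a rough plain-array analogue. The helper `selectColumns` and the sample table are hypothetical and are not part of Google's visualization API; a real DataView stays linked to its DataTable, whereas this sketch simply copies the chosen columns.

```javascript
// Plain-array analogue of restricting a table to chosen columns.
// selectColumns is a hypothetical helper, not a Google Charts function.
function selectColumns(rows, columnIndices) {
  // For every row, keep only the cells at the requested indices, in that order.
  return rows.map((row) => columnIndices.map((index) => row[index]));
}

const table = [
  ["income", "children", "pensioners"], // header row (made-up names)
  [1200.5, 3, 0],
  [800, 0, 1],
];

const firstTwo = selectColumns(table, [0, 1]);
console.log(firstTwo);
```

Passing `[1, 0]` instead would swap the two columns, which is the same idea the chart relies on later when the plotted variables change.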

Adding Interaction

This chart already features interactivity in the form of rollover states for the plotted points. What we really need is to be able to change the variables we are plotting on the fly. Add the following tags after the </div> tag. I place the range first so that it lines up with the y-axis title:

    <select id="range">
    </select>
    <select id="domain">
    </select>
    <button type="button">Update Chart</button>

Now we need to update our script to first load the <select> tags with the CSV headers, and also to respond to a click on our button.

Adding <option> elements to the <select> elements

Immediately following the assignment of arrayData, add the CSV headers to the <select> elements:

        // use arrayData to load the select elements with the appropriate options
        for (var i = 0; i < arrayData[0].length; i++) {
          // this adds the given option to both select elements
          $("select").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
        }

Make sure the <select> elements show the starting options:

        // set the default selection
        $("#domain option[value='0']").attr("selected","selected");
        $("#range option[value='1']").attr("selected","selected");
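The loop above builds option tags by string concatenation. As a hedged alternative sketch, the same markup can be produced by a small pure function that also escapes header text, which matters if your own CSV has headers containing characters such as `<` or `&`. The names `escapeHtml` and `buildOptions` and the sample headers are assumptions for illustration only.

```javascript
// Escape characters that would otherwise be interpreted as HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Build one <option> per header; the option value is the column index.
function buildOptions(headers, selectedIndex) {
  return headers
    .map((header, index) => {
      const selected = index === selectedIndex ? " selected" : "";
      return `<option value="${index}"${selected}>${escapeHtml(header)}</option>`;
    })
    .join("");
}

const headers = ["income", "children <16"]; // made-up header names
const domainOptions = buildOptions(headers, 0);
const rangeOptions = buildOptions(headers, 1);
console.log(domainOptions);
console.log(rangeOptions);
```

In the page, one way to use this would be `$("#domain").html(buildOptions(arrayData[0], 0))` and `$("#range").html(buildOptions(arrayData[0], 1))`, which sets the default selections in the same step.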

Updating the Chart

Now we need to assign a function to the button we created. Add the following after chart.draw(view, options);:

        // set listener for the update button
        $("button").click(function(){

Assign the selected column indices to local variables:

          // determine selected domain and range
          var domain = +$("#domain option:selected").val();
          var range = +$("#range option:selected").val();

Update the view to reflect the selected columns:

          // update the view
          view.setColumns([domain,range]);

Update the axis titles and the axis ranges:

          // update the options
          options.hAxis.title = data.getColumnLabel(domain);
          options.hAxis.minValue = data.getColumnRange(domain).min;
          options.hAxis.maxValue = data.getColumnRange(domain).max;
          options.vAxis.title = data.getColumnLabel(range);
          options.vAxis.minValue = data.getColumnRange(range).min;
          options.vAxis.maxValue = data.getColumnRange(range).max;

Update the chart and close the function block:

          // update the chart
          chart.draw(view, options);
        });

Cool! Now we can do more interesting comparisons like plotting cm_16_exp and mean_educ. Here’s what our current chart looks like: Interactive Chart
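The axis bookkeeping in the update handler can also be written as one function of the selected column indices. The sketch below works on a plain array of rows rather than a DataTable, so `columnRange`, `axisOptionsFor`, and the sample data are all hypothetical stand-ins for illustration.

```javascript
// Smallest and largest numeric value in one column, skipping the header row.
function columnRange(rows, columnIndex) {
  const values = rows
    .slice(1)
    .map((row) => row[columnIndex])
    .filter((value) => typeof value === "number" && Number.isFinite(value));
  return { min: Math.min(...values), max: Math.max(...values) };
}

// Axis settings for the chosen domain (x) and range (y) columns.
function axisOptionsFor(rows, domain, range) {
  const axis = (index) => {
    const { min, max } = columnRange(rows, index);
    return { title: rows[0][index], minValue: min, maxValue: max };
  };
  return { hAxis: axis(domain), vAxis: axis(range) };
}

// Made-up sample rows; the first row holds the column labels.
const sampleRows = [
  ["income", "children", "pensioners"],
  [1200.5, 3, 0],
  [800, 0, 1],
  [950, 2, 2],
];

const updated = axisOptionsFor(sampleRows, 2, 0);
console.log(updated);
```

Keeping the calculation in one place means both axes are always derived from the same selection, so they cannot drift out of step with the view.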

Even Better UX

UX = User experience. User experience design is an important consideration. We want visitors to our site/blog to enjoy exploring our data. To make our chart more enjoyable, let’s remove the annoying step of having to click the button to update. Simply change this:

 $("button").click(function(){ 

to this:

 $("select").change(function(){ 

and remove the <button> tag. Now your chart should look like this (view the source and compare to yours if your chart is not working).

Conclusion

I hope you enjoyed this tutorial, and especially the end project. Now, to use your own CSV, all you need to do is change the file string “kzn1993.csv” to the name of your CSV and change the title in the chart options. In the next tutorial, we’ll use the Google visualization library to make a bubble chart. (Check out the third tutorial in this series: Google Charts and CSV Part 3: Side-by-Side Bubble Charts.) As always, place any questions or comments in the section below. Thanks!

First Beamer Presentation with LaTeX and Sublime Text 2

This tutorial will walk you through the creation of your first beamer presentation using LaTeX and Sublime Text 2. I will assume you have at least made your first PDF with LaTeX in Sublime Text 2 (Mac-specific setup instructions).

Create beamer_test.tex

Start by opening Sublime Text 2 and saving a new document as “beamer_test.tex”. On the first line we’ll set the document class to “beamer”:

\documentclass{beamer}

On the next line, type “begin”, then press the TAB key. This should insert the following snippet and place cursors selecting “env” in both the begin and end tags:

    \begin{env}
    \end{env}

Now type “document” so that the environment tags look as follows:

    \begin{document}
    \end{document}

Adding slides (um,… I mean frames)

Frames in beamer presentations are (for our purposes) the equivalent of slides in PowerPoint presentations. To add a frame inside of our document environment, simply type “frame” and press the TAB key (this and the previous snippet assume you have LaTeXTools installed). Go ahead and change “title” to “My First Slide”. Add some content inside the frame (I’m going to add a bulleted list). Your new slide should look similar to the following:

    \begin{frame}[t]\frametitle{My First Slide}
      \begin{itemize}
        \item My first point
        \item My second point
        \item My third point
      \end{itemize}
    \end{frame}

Here I added an itemized list, but inside of these frames you can place figures, tables, equations and anything else defined in LaTeX. OK, now that we have one of the most basic presentations known to man, let’s hit CTRL+B (or COMMAND+B on OS X) to build this presentation (if you get a few font warnings, don’t worry, fixing these is not important). Your finished slide should look like this: [Figure: first_slide]

Now, I admit, this is a little underwhelming. So, let’s add a title page, make it so the frame content is not top-aligned, and play around with some themes while we’re at it.

Adding a title page

What presentation would be complete without a title page? First we need to define the elements of the title page. Paste the following commands between the document class statement and the beginning of the document environment.

    \title[Short Presentation]{The shortest presentation in \LaTeX}
    \subtitle[title edition]{Now with a title}
    \author[F. Lastname]{Firstname Lastname}
    \institute[UIR]{
      The University of Irreproducible Results
    }

Making the title page is pretty easy. Just paste the following frame above the first one we made earlier.

    \begin{frame}[plain]
      \titlepage
    \end{frame}

Here we replace the frame title and the t (top-align) option with the plain option. Go ahead and build the PDF. Here’s what the first slide should look like: [Figure: beamer_title]

Changing alignment

So let’s say you don’t want the frame content vertically aligned to the top. Simply change the “[t]” to “[c]” (or “[b]” if you want it bottom-aligned). You can also remove “[t]” entirely to use the default, which is centered.

Adding themes

Time to spice up our rather bland presentation. Hop on over to the Beamer Theme Matrix and pick out a theme. The city names along the side are beamer themes, which go inside a \usetheme command, and the animal names along the top are color themes, which go inside a \usecolortheme command. I’ve chosen beamer theme “Szeged” and color theme “dove”. Add the next commands between the document class command and the title info we inserted earlier (replace Szeged and dove with the themes you chose).

    \usetheme{Szeged}
    \usecolortheme{dove}

The finished presentation

Here’s what the slides for our completed presentation look like: [Figures: title_dove, slide_dove]

Here’s the complete source:

    \documentclass{beamer}
    \usetheme{Szeged}
    \usecolortheme{dove}
    \title[Short Presentation]{The shortest presentation in \LaTeX}
    \subtitle[title edition]{Now with a title}
    \author[F. Lastname]{Firstname Lastname}
    \institute[UIR]{The University of Irreproducible Results}
    \begin{document}
    \begin{frame}[plain]
      \titlepage
    \end{frame}
    \begin{frame}\frametitle{My First Slide}
      \begin{itemize}
        \item My first point
        \item My second point
        \item My third point
      \end{itemize}
    \end{frame}
    \end{document}

Conclusion

While this is a presentation short on finesse and content, I hope it helps get you started. Be sure to come back for a follow-up tutorial taking your skills with ST2 and beamer to the next level. In the meantime here are some resources I have found useful:

And here are a few of our tutorials on Sublime Text 2 and LaTeX in general:

Using SourceTree and Git for Research (Part 2): Bitbucket

In the first part of this tutorial we created a local git repository with SourceTree, committed a change, and reviewed the commit history. Make sure you have either already completed the previous part of this tutorial, or that you already have a repository on SourceTree you want to link up with Bitbucket.org.

Making a remote version on Bitbucket

While you can create the remote repository on Bitbucket.org itself, here we will do it from within SourceTree.

  1. Open SourceTree and click the “Settings” button.
  2. Click “Add”.
  3. Click the button with a globe icon.
  4. Click the “Create New Repository …” button.
  5. Fill out the new repository dialog: set “Name” to “itn_project”, add a description if desired, and make sure to uncheck the “Publicly Visible” option. Then click “Create Repository”.
  6. Notice that you now have an entry in your list for “itn_project” that is hosted by Bitbucket. Click “OK”.
  7. Set “Remote Name” to “itn_remote” and click “OK”.
  8. Click “OK” one last time.
  9. Click the “Push” button to push the repository you have been working on to the remote one you just created.
  10. Make sure the “Local Branch” “master” is checked, then click “OK”. Enter your password if requested to push the repository.

Inviting a collaborator

  1. Go to Bitbucket.org and log in.
  2. Click on the “Repositories” menu at the top and choose your newly created repository. You should now see the overview page for the repository, which shows that it has been set up on Bitbucket.org.
  3. Click the “Share” button.
  4. Enter the email address (or Bitbucket username if you know it) of each collaborator you want to add to this repository (i.e. your coauthors) and then click the “Add” button. Here, I’m going to add Joseph Page (full disclosure: he’s my brother).
  5. Before clicking “Share”, set the permissions of this new user of your repository. I set Joseph’s permissions to “WRITE” because I want him to be able to push commits to the repository (see the documentation for more details on permissions).
  6. Click “Share” to invite your new collaborator. This will send a link which your collaborator can use to access the repository. Once they log in to Bitbucket they will be able to access your repository.

Viewing a collaborator’s commit

Once your collaborator has pushed a commit (i.e. made changes and updated the repository) you will see this on the main page for your repository. You can see Joseph left the following commit message:

made a few edits and added author name

Click on the link to the commit. This pulls up the summary of the latest commit. Scroll down to view the changes that were made.

Conclusion

This tutorial (and the previous one) merely scratched the surface of how leveraging DVCSs, such as git and hg, can enhance your research productivity. Using SourceTree, integrating these tools is easier than ever. Have fun with Bitbucket and let me know if you have any questions in the comments below!

Using SourceTree and Git for Research (Part 1)

A version control system (VCS) helps you manage changes to documents and programs. This goes beyond using Track Changes in Microsoft Word. For example, you can revert to older versions of a LaTeX document or program written in Stata, SAS, or R. With a distributed version control system (DVCS), you can track changes to all your documents and programs while collaborating with coauthors.

Bitbucket offers an unlimited number of free private repositories with up to 5 collaborators. If you authorize an academic (*.edu) email account you get unlimited contributors! A popular alternative is GitHub, but since GitHub does not offer free private repositories (and keeping your research private is important!) we will use Bitbucket. Bitbucket makes use of two DVCSs: Mercurial (Hg) and Git. We’ll be using Git for this tutorial, but you could use Mercurial instead if you prefer (intro to working with Mercurial). To make using git (and hg) a breeze, we will be using SourceTree, the free tool by Atlassian (makers of Bitbucket).

Using a DVCS allows you to link a repository to the documents on your local machine. This repository will allow you to track changes to your documents and keep a record of your document history. To see how this system works, this tutorial will trace the following steps:

  1. Set up a project folder with a basic LaTeX file.
  2. Set up a Bitbucket account.
  3. Install SourceTree.
  4. Create a git repository using SourceTree.
  5. Make a change to our LaTeX file.
  6. Summarize basic features for reviewing changes.

Part 2 of this tutorial will cover connecting this repository to a Bitbucket.org repository to put the D in DVCS and take a look at collaborating with coauthors.

Step 1: Set up a LaTeX file

  1. Create a new LaTeX file named “itn.tex”.
  2. Place this file in a folder that will only be used for this project.
  3. Place the following sample text in the file and save it.
      \documentclass{article}
      \title{International Trade Network}
      \author{Jonathan Page}
      \begin{document}
      \maketitle{}
      \section{Introduction}
      Careful analysis of the topology of the international trade network (ITN) is necessary in order to identify stylized facts which a theoretic network model of international trade should be able to replicate. Properties of networks are tightly related to the relevant network formation process. Determining the most appropriate network formation process can provide depth to related empirical analysis.
      \end{document}

Step 2: Set up a Bitbucket account

Go to bitbucket.org and sign up for a new account (if you don’t already have one).

Step 3: Install SourceTree

  1. Go to http://sourcetreeapp.com/ and click the large “Download SourceTree Free” button in the middle of the page. Your button may appear different if you are using Windows.
  2. Click the downloaded dmg file and drag the SourceTree application to your Applications folder.
  3. Open the SourceTree application. You will see a setup form. Fill it out with your full name and the email address you used to set up your Bitbucket account. Make sure both check boxes are checked to allow SourceTree to manage your Mercurial and Git configurations and to agree to the license agreement. Click “Next”.
  4. Enter your Bitbucket account information and click “Next”.
  5. Click “Finish” to complete the initial setup process.

Step 4: Set up your repository

Open Finder and find the folder that contains your LaTeX file. Drag this folder onto the SourceTree application window. This will open a dialog for creating the repository. Change the “Repository Type” from “Mercurial” to “Git” (you can also change the default bookmark name if you like). Click “Create”.

Add “itn.tex” to the staging area

Double-click the project to open the project view. Click “Add” to add all files in the folder to the repository. Notice that “itn.tex” has been moved from “Files in the working directory” to “Files staged in the index”. This means that when we commit, the changes to “itn.tex” will be recorded in the repository.

Commit to initialize the repository

Press the “Commit” button. This will bring up the commit dialog. Add a meaningful message (always a good idea). Click “Commit”.

Step 5: Making a change to the LaTeX file

Now we want to see how this version control system deals with changes. Add the following paragraph to “itn.tex”:

In the analysis of international trade as a network phenomenon, we must answer the question of how the network structure is determined. More precisely, if we assume the network structure is given to us exogenously, our analysis will focus on the game played on the given network. If, however, the formation of the network structure is endogenous, our analysis must broaden to consider the formation process. This survey focuses on what information the network itself can provide regarding the formation process.

Save “itn.tex”. Open the SourceTree application. Notice that “itn.tex” now has a new icon beside it to indicate it has been changed. Click the file. This will show the changes you have made on the right. Click “Stage File” or “Add”. Click “Commit”, choose an appropriate message and click “Commit” to commit these changes to your repository.

Step 6: Viewing your history

Click the clock symbol to view the log. Click the two log entries to see the changes that were added at each commit. Click “External Diff” to open another view of the differences between the selected commit and the one preceding it. Close the “External Diff” window.

Next Steps

We’ve covered quite a bit here, but there is much more to learn. Though our example worked with a LaTeX source file, you could follow the same process with any filetype. In fact, version control systems were developed with programmers in mind. As a result this is the perfect way to manage your source files for Stata, SAS, R, Python, HTML, etc. Here are a few git resources to get you started:

Be sure to check out the next part of this tutorial as we connect our repository to the Bitbucket site and collaborate with a coauthor. Any tips or nagging questions? I would love to hear them!

Run z-Tree on a Mac:
OSX, MacPorts, and Wine

As the late, great Agatha Christie wrote:

I don’t think necessity is the mother of invention — invention, in my opinion, arises directly from idleness, possibly also from laziness. To save oneself trouble.
