Creating Your First PDF with LaTeX and Atom

This tutorial will walk you through the steps of creating your first PDF with LaTeX and Atom. This guide focuses on installing LaTeX and Atom on a Mac, but since Atom is a cross-platform editor, most of the instructions should work on Windows and Linux as well. You will need about an hour to download everything and to produce your first PDF.

See our previous tutorials on creating PDFs with LaTeX and Sublime Text:

Making your first PDF with LaTeX and Sublime Text 2

Making your first PDF with LaTeX and Sublime Text 2 for Mac

Install MacTeX

Download MacTeX. MacTeX installs everything you need to compile tex files into PDFs. This will take a while, so grab a coffee.

Install Atom

If you haven’t already, download the Atom text editor. Atom is awesome because it is open source and supported by GitHub.

On my MacBook Pro running Yosemite, I clicked on the “Download For Mac” button, then opened the downloaded atom-mac.zip. In Finder, just drag “Atom” to your Applications folder. You can then find Atom in your Applications folder or launch it from Spotlight. The first time you open Atom, press the “Open” button to trust Atom if prompted.

Install Skim (for previewing PDFs)

LatexTools makes use of Skim for previewing works-in-progress. Download and install Skim. On OS X Yosemite, I installed version 1.4.17.

To make Skim trusted so that the preview will work, open Skim by holding down the control key while clicking on the Skim icon in the “Applications” folder in Finder. Click “Open” at the prompt.

Install LatexTools

Open the “Settings” tab by pressing Command+, or using the menu “Atom > Preferences…”.

Click on the “Install” tab on the left. Type in language-latex and click the “Install” button in the language-latex package box. I installed version 0.6.1. This package provides syntax highlighting that will make working with TeX much more enjoyable.

Next, type in latextools and install the latextools package.

Create a tex source file

Create a new file if you don’t already have one up (you should see a tab titled “untitled” if you already have a new file open). To create a new file go to “File > New File” in the menu or use the keyboard shortcut Command+N.

In the new file paste the following TeX sample:

\documentclass{article}
\title{Title}
\author{Your Name}
\begin{document}
\maketitle{}
\section{Introduction}
This is where you will write your content.
\end{document}

Save this file as sample.tex. You should now see that the content is recognized by the syntax highlighter (see all the pretty colors?).

Build and view your PDF

To build this PDF, use the following keyboard shortcut: Command+Alt+B (i.e., all three of those keys at the same time). If that doesn’t work, check your keybindings in the “Settings” tab, under “Keybindings” on the left. Type in latextools:build to see what the binding for your system is. On a Mac (i.e., “Darwin”) the keybinding should read alt-cmd-b; for Windows and Linux the default is probably ctrl-alt-b.

Conclusion

Hopefully now you have your first PDF ready to show off to all your neighbors. If not, let me know in the comments below so I can update the tutorial.

R for Impact Evaluation: R and Stata Side-by-side

This tutorial follows the Handbook on Impact Evaluation: Quantitative Methods and Practices, chapter 11. The data files we will use can be downloaded from here. The first part of Chapter 11 is covered in Impact Evaluation on a Budget: World Bank Data and R.

Notes on Commands

  • Stata commands are typed in lowercase; R commands are function calls (e.g., ls())
  • In Stata, you can type abbreviated forms of functions and variables provided there is no ambiguity. In R, you must use the full function or variable name.
  • In Stata, use the Page-Up and Page-Down keys to cycle through previously entered commands. In R, use the Up and Down Arrow keys to do this.

Working with Data Files: Looking at the Content

Open the Dataset

Here I assume you saved the file (from the previous tutorial) to the ~/eval/data folder. Stata:

use ~/eval/data/hh_98.dta

R:

library(foreign)
hh_98 = read.dta('~/eval/data/hh_98.dta')

(If you don’t already have the foreign library installed, you can use the command install.packages("foreign").)

Listing the Variables

Stata:

describe

R:

ls(hh_98)
dim(hh_98)
sapply(hh_98,class)

The function ls(x) displays the names of the objects within x. If you just enter ls(), R will show you the names of the objects open in your current environment (remember you can use ?ls to see the R documentation for the ls() function). The function dim(x) returns the dimensions of object x. When measuring a data.frame, like hh_98, dim() returns the number of rows first followed by the number of columns. The function sapply(x,FUN) returns a simplified result from applying the function FUN to each object in x. The function class(x) returns the class of object x.

Wildcards and Abbreviations

Stata:

describe exp*

R:

summary(hh_98[grep("exp", colnames(hh_98))])

In R, it is possible to do things even if we don’t know the exact name of the object we want to analyze. Starting from the innermost function and working our way out, colnames(hh_98) returns a vector where each element is the name of a column of hh_98. grep("exp", x) returns the indices of the elements that contain “exp” (you can also use regexp here) within x. Placing the resulting vector of indices into hh_98[] returns the matching columns. Finally, summary() returns the following summary of the returned columns:

     expfd             expnfd             exptot
 Min.   :  945.3   Min.   :   89.55   Min.   : 1193
 1st Qu.: 2602.1   1st Qu.:  514.37   1st Qu.: 3254
 Median : 3373.7   Median :  865.31   Median : 4432
 Mean   : 3660.2   Mean   : 1813.08   Mean   : 5473
 3rd Qu.: 4232.5   3rd Qu.: 1710.24   3rd Qu.: 6039
 Max.   :15270.7   Max.   :43411.15   Max.   :47981

Listing Data

List the first three entries in hh_98: Stata:

list in 1/3

R:

hh_98[1:3,]

In R, you can access records in a data.frame using matrix notation. The colon (:) separates the beginning and ending of a sequence. By leaving the portion following the comma blank, we tell R to show all columns.

List household size and head’s education for households headed by a female who is younger than 45:

Stata:

list famsize educhead if (sexhead==0 & agehead<45)

R:

subset(hh_98,sexhead==0 & agehead<45,c(famsize,educhead))

The subset() function is another method of selecting elements. Here’s the matrix form of the same subset: R:

hh_98[hh_98$sexhead==0 & hh_98$agehead<45,c("famsize","educhead")]

Browse or Edit the data: Stata:

browse
edit

R:

View(hh_98)
edit(hh_98)

Summarizing Data

Display summary statistics for a few variables: Stata:

sum famsize educhead
sum famsize educhead, d

R:

summary(hh_98[,c("famsize","educhead")])
library(psych)
describe(hh_98[,c("famsize","educhead")])

(If you don’t already have the psych library installed, you can use the command install.packages("psych").)

Using survey weights:

Stata:

sum famsize educhead [aw=weight]

R:

library(survey)
design <- svydesign(id=~nh,weights=~weight,data=hh_98)
svymean(~famsize + educhead,design)

(If you don’t already have the survey library installed, you can use the command install.packages("survey").)

Summarize by groups:

Stata:

sort dfmfd
by dfmfd: sum famsize educhead [aw=weight]
tabstat famsize educhead, statistics(mean sd) by(dfmfd)

R:

library(survey)
svyby(~famsize + educhead, ~dfmfd, design, svymean)

(you only need to call library(survey) once per session).

Frequency Distributions (Tabulations)

Stata:

tab dfmfd 

R:

table(hh_98$dfmfd)

In R, the table() function presents a table similar to the tabulate function in Stata, but only shows the counts grouped by factor. To see both the counts and percentages, as in the Stata program, we can divide by the total count (i.e., the length()). I group the counts and percentages using a list() so they are displayed together. R:

list(count=table(hh_98$dfmfd),percent=table(hh_98$dfmfd)/length(hh_98$dfmfd))

Frequency tables over subsets and for multiple variables: Stata:

tab sexhead if dfmfd==1
tab educhead sexhead

R:

table(hh_98[hh_98$dfmfd==1,]$sexhead)
table(hh_98$educhead, hh_98$sexhead)

Column and row percentages: Stata:

tab dfmfd sexhead, col row

R:

mytable <- table(hh_98$dfmfd, hh_98$sexhead)
list(counts = mytable,
     percent.row = prop.table(mytable,1),
     percent.col = prop.table(mytable,2),
     count.row = margin.table(mytable,1),
     count.col = margin.table(mytable,2))

Distributions of Table Statistics

Stata:

table dfmfd, c(mean famsize mean educhead)

R:

by(hh_98[c("famsize","educhead")], hh_98$dfmfd, colMeans)

Breakdown by two factors: Stata:

table dfmfd sexhead, c(mean famsize mean educhead)

R:

by(hh_98[c("famsize","educhead")], hh_98[c("dfmfd","sexhead")], colMeans)

Missing Values

In Stata, missing values are represented by “.”. In R, missing values are represented by “NA”.

Counting Observations

Stata:

count
count if agehead>50

R:

dim(hh_98)[1]
dim(hh_98[hh_98$agehead>50,])[1]

-or-

length(hh_98[,1])
length(hh_98[hh_98$agehead>50,1])

Using Weights

For information on using weights in R, take a look at the homepage for the survey package: http://r-survey.r-forge.r-project.org/survey/

Other Resources

The following websites are useful for searching for R:

Remember to use ? to look up functions and ?? to search for help within R (e.g., "?by").

Impact Evaluation on a Budget: World Bank Data and R

Introduction

This entry will be the first in a series where we go through all of the Stata exercises in the World Bank’s excellent and free Handbook on Impact Evaluation: Quantitative Methods and Practices written by S. Khandker, G. Koolwal and H. Samad in 2009. The book can be downloaded for free here. The book has a series of chapters (11-16, in fact all of part 2) on Stata exercises designed to prepare the reader to conduct impact evaluations. To make this learning process more affordable this series will take you from installing R to estimating impacts using fuzzy regression discontinuity design. Go to part 1 of the book to read up on the theory and motivation for the techniques we will use in this series. The data files we will use can be downloaded from here. Go ahead and extract these to a folder you will remember.

Install R

To install R go to cran.us.r-project and follow the instructions for your OS.

Install RStudio

I highly recommend this environment for working with R. While R can be run completely from the command line, RStudio is much more user friendly and provides an easier transition for users coming from Stata. Go to RStudio and download this free R development environment.

Overview

Each of the following sections will follow chapters in the book. I will leave out the exposition and instead focus on the commands. I will present Stata commands first followed by the equivalent expression in R.

File Structure

The book assumes you are using a PC; I’m using a Mac. The authors create several folders, and I recommend creating these as well, except for the do and log folders.

PC:

 c:\eval
c:\eval\data

OSX or Linux:

~/eval
~/eval/data

To avoid confusion I will present commands using *NIX style paths. In fact, we could make all of our path statements shorter in R by setting the working directory to the data folder:

setwd('~/eval/data')

I will use the full path in the following code, but if you set your working directory you can use the shorter versions. If you are using Windows use the folder structure above instead.

Opening a Data Set

Stata:

use ~/eval/data/hh_98.dta

R:

library("foreign")
hh_98 = read.dta('~/eval/data/hh_98.dta')

Save a Data Set

Stata:

save hh_98, replace

R:

save(hh_98, file="~/eval/data/hh_98.RData")

Exit the Program

With prompt to save

Stata:

exit

R:

quit()

Reckless (or confident) version

Stata:

exit, clear

R:

quit(save="no")

Even shorter R command:

q("no")

Help

Strict command requiring the correct command or keyword to be used

Stata:

help memory

R:

help(Memory)

Even shorter R command:

?Memory

Help search

Stata:

search mem

R:

??mem

Conclusion

Next time we will continue with Chapter 11 and begin Working with Data Files.

The Economist Illustrated: China

[Illustration: china]

Illustrated by: Joel Hopler


The Inspiration

Economic growth: Missing the mat

Illustrator’s Notes

I found the initial metaphor of China’s growth targets being “more like the bar of a high jump” interesting. I thought the high jump imagery, used later in the article in the context of China giving its own economy a boost, would be illustrated well by a high jumper leaping over a sickle, instead of the standard straight bar.

Google Charts and CSV Part 3: Side-by-Side Bubble Charts

Introduction

If you haven’t already, go ahead and take a look at the previous two installments of this series (Easy Data Visualization with Google Charts and a CSV and More Google Charts with a CSV: Bubble Charts). Today we’re going to take the bubble chart from More Google Charts with a CSV: Bubble Charts and add another chart to the same page. It’s not exactly as simple as duplicating all the code we created last time, but it nearly is.

Setup

Begin by downloading the finished product from last time here. Also, I’ve pasted the HTML below in case that’s easier for you:

<!DOCTYPE html>
<html>
<head>
  <title>Google Chart Example</title>
  <style>
    ul {list-style-type: none;}
  </style>
  <script src="https://www.google.com/jsapi"></script>
  <script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
  <script src="jquery.csv-0.71.js"></script>
  <script>
    // load the visualization library from Google and set a listener
    google.load("visualization", "1", {packages:["corechart"]});
    google.setOnLoadCallback(drawChart);

    function drawChart() {
      // grab the CSV
      $.get("kzn1993.csv", function(csvString) {
        // transform the CSV string into a 2-dimensional array
        var arrayData = $.csv.toArrays(csvString, {onParseValue: $.csv.hooks.castToScalar});

        // use arrayData to load the select elements with the appropriate options
        for (var i = 0; i < arrayData[0].length; i++) {
          // this adds the given option to every select element
          $("select").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
        }

        // set the default selection
        $("#domain option[value='0']").attr("selected","selected");
        $("#range option[value='1']").attr("selected","selected");

        // this new DataTable object holds all the data
        var data = new google.visualization.arrayToDataTable(arrayData);

        // this view can select a subset of the data at a time
        var view = new google.visualization.DataView(data);
        view.setColumns([{calc:stringID, type: "string"},1,2,3]);

        // this function returns the first column values as strings (by row)
        function stringID(dataTable, rowNum){
          // return dataTable.getValue(rowNum, 0).toString();
          // return an empty string instead to avoid the bubble labels
          return "";
        }

        var options = {
          title: "KwaZulu-Natal Household Survey (1993)",
          hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
          vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
          bubble: {stroke: "transparent", opacity: 0.2},
          colorAxis: {colors:['red','blue']},
        };

        var chart = new google.visualization.BubbleChart(document.getElementById('chart'));
        chart.draw(view, options);

        // set listener for the update button
        $("select").change(function(){
          // determine selected domain and range
          var domain = +$("#domain option:selected").val();
          var range = +$("#range option:selected").val();
          var color = +$("#color option:selected").val();
          var size = +$("#size option:selected").val();

          // update the view
          view.setColumns([{calc:stringID, type: "string", label: "Household ID"},domain,range,color,size]);

          // update the options
          options.hAxis.title = data.getColumnLabel(domain);
          options.hAxis.minValue = data.getColumnRange(domain).min;
          options.hAxis.maxValue = data.getColumnRange(domain).max;
          options.vAxis.title = data.getColumnLabel(range);
          options.vAxis.minValue = data.getColumnRange(range).min;
          options.vAxis.maxValue = data.getColumnRange(range).max;
          options.bubble = {stroke: "transparent", opacity: 0.2};
          options.colorAxis = {colors:['red','blue']};

          // update the chart
          chart.draw(view, options);
        });
      });
    }
  </script>
</head>
<body>
  <div id="chart" style="width:800px; height:500px;">
  </div>
  <ul>
    <li> Y-Axis <select id="range"></select> </li>
    <li> X-Axis <select id="domain"></select> </li>
    <li> Color <select id="color"></select> </li>
    <li> Size <select id="size"></select> </li>
  </ul>
</body>
</html>

Data

For this tutorial I’ve split the South African data by province in 1993. File kz1993.csv contains only the households in the former KwaZulu bantustan. File n1993.csv contains only the black households in the Natal province of what was at the time “white” South Africa. For more details on the data, please see the first tutorial in this series (Easy Data Visualization with Google Charts and a CSV).

HTML

Let’s begin by modifying the HTML. First we’ll encapsulate both the chart and the list in a <div> that is floated to the left. We also need to add a class (I’m going to choose chart) to each of the <select> elements to distinguish the chart on the left from the one on the right:

<div style="float:left;">
  <div id="chart" style="width:600px; height:500px;">
  </div>
  <ul>
    <li> Y-Axis <select class="chart" id="range"></select> </li>
    <li> X-Axis <select class="chart" id="domain"></select> </li>
    <li> Color <select class="chart" id="color"></select> </li>
    <li> Size <select class="chart" id="size"></select> </li>
  </ul>
</div>

So far so good. Now copy this entire <div> and paste a duplicate below. Change the ids and classes for this <div> by adding a 2. Also change the float direction to “right”:

<div style="float:right">
  <div id="chart2" style="width:600px; height:500px;">
  </div>
  <ul>
    <li> Y-Axis <select class="chart2" id="range2"></select> </li>
    <li> X-Axis <select class="chart2" id="domain2"></select> </li>
    <li> Color <select class="chart2" id="color2"></select> </li>
    <li> Size <select class="chart2" id="size2"></select> </li>
  </ul>
</div>

JavaScript

Chart 1

Now we need to make some adjustments to our drawChart() callback function. First we’ll change the CSV file reference from the kzn1993.csv to kz1993.csv.

function drawChart() {
  // grab the first CSV
  $.get("kz1993.csv", function(csvString) {

In the for loop we need to change the jQuery selection of all select elements to only those with the chart class:

// use arrayData to load the select elements with the appropriate options
for (var i = 0; i < arrayData[0].length; i++) {
  // this adds the given option to each select element with the chart class
  $("select.chart").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
}
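Because the option markup is assembled by string concatenation, a small slip in a tag is easy to overlook. As a sketch, the string-building step can be pulled out into a function and checked on its own, without a browser. The helper name optionTags and the sample header names are made up for illustration; they are not part of the tutorial's code or of jQuery.

```javascript
// Build one <option> tag per column name; the value is the column index.
// optionTags is a hypothetical helper introduced for this sketch.
function optionTags(headerRow) {
  return headerRow.map(function (name, i) {
    return "<option value='" + i + "'>" + name + "</option>";
  });
}

// Made-up header row standing in for arrayData[0].
var tags = optionTags(["hhid", "children", "income"]);
console.log(tags.join("\n"));
```

In the page, the loop could then be replaced with something like $("select.chart").append(optionTags(arrayData[0]).join("")), since jQuery's append accepts an HTML string.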

The last change we need to make for the chart on the left is to modify the title:

var options = {
  title: "KwaZulu-Natal Household Survey (1993) - KwaZulu",
  hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
  vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
  bubble: {stroke: "transparent", opacity: 0.2},
  colorAxis: {colors:['red','blue']},
};

Chart 2

Now, for the chart on the right we can start by copying the $.get(); call of the first chart. We just need to change the referenced CSV, the referenced HTML ids and classes, and the title. First, change the referenced CSV:

// grab the second CSV (this one covers Natal Province)
$.get("n1993.csv", function(csvString) {

Next, change the referenced ids and classes here:

// use arrayData to load the select elements with the appropriate options
for (var i = 0; i < arrayData[0].length; i++) {
  // this adds the given option to each select element with the chart2 class
  $("select.chart2").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
}

// set the default selection
$("#domain2 option[value='0']").attr("selected","selected");
$("#range2 option[value='1']").attr("selected","selected");

and here:

var chart = new google.visualization.BubbleChart(document.getElementById('chart2'));
chart.draw(view, options);

// set listener for the update button
$("select.chart2").change(function(){
  // determine selected domain and range
  var domain = +$("#domain2 option:selected").val();
  var range = +$("#range2 option:selected").val();
  var color = +$("#color2 option:selected").val();
  var size = +$("#size2 option:selected").val();

The last thing to do is change the title:

var options = {
  title: "KwaZulu-Natal Household Survey (1993) - Natal",
  hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
  vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
  bubble: {stroke: "transparent", opacity: 0.2},
  colorAxis: {colors:['red', 'blue']},
};
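The second chart differs from the first only in its CSV file, its title, and the "2" appended to each id and class. One way to keep those differences in a single place is to derive them from the suffix. The sketch below is an optional refactoring idea, not the approach this tutorial takes; chartConfig and its property names are hypothetical.

```javascript
// Derive every id, selector, and file name for one chart from a suffix.
// suffix is "" for the left chart and "2" for the right one, as in the HTML above.
// chartConfig is a hypothetical helper, not part of Google Charts or jQuery.
function chartConfig(suffix, csvFile, region) {
  return {
    csv: csvFile,
    chartId: "chart" + suffix,
    selectClass: "select.chart" + suffix,
    domain: "#domain" + suffix,
    range: "#range" + suffix,
    color: "#color" + suffix,
    size: "#size" + suffix,
    title: "KwaZulu-Natal Household Survey (1993) - " + region
  };
}

var left = chartConfig("", "kz1993.csv", "KwaZulu");
var right = chartConfig("2", "n1993.csv", "Natal");
console.log(left.selectClass, right.selectClass);
```

With a configuration object like this, the body of the $.get callback could be written once and called for each chart, at the cost of a little indirection.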

Conclusion

The end result looks pretty nice (if your charts stack vertically, you need to make your charts smaller or your screen wider).

[Figure: side_by_side]

To play around with a live version go here. If any of this was confusing, please check the previous two tutorials (Easy Data Visualization with Google Charts and a CSV and More Google Charts with a CSV: Bubble Charts) or leave a comment below.

The Economist Illustrated: Kazakhstan

[Illustration: kazakhstan]

Illustrated by: Joel Hopler


The Inspiration

Kazakhstan’s capital: Laying the golden egg

Illustrator’s Notes

I initially thought it would be ironic to show a golden egg of happiness being held up by a beautiful piece of architecture to symbolize how the president of Kazakhstan is hoarding the people’s happiness. Then I re-read the part of the article that mentions the egg and realized that this imagery literally exists in the Bayterek tower. I chose to create an image that more explicitly shows a powerful fist holding the egg away from the tent of nomads.

More Google Charts with a CSV: Bubble Charts

Last time we built an interactive scatter plot. This time we’re going to turn that scatter plot into a bubble chart (see a preview of the finished product here). Start by opening up the HTML document we created last time. You can see the source here or expand the section below:

<!DOCTYPE html>
<html>
<head>
  <title>Google Chart Example</title>
  <script src="https://www.google.com/jsapi"></script>
  <script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
  <script src="jquery.csv-0.71.js"></script>
  <script>
    // load the visualization library from Google and set a listener
    google.load("visualization", "1", {packages:["corechart"]});
    google.setOnLoadCallback(drawChart);

    function drawChart() {
      // grab the CSV
      $.get("kzn1993.csv", function(csvString) {
        // transform the CSV string into a 2-dimensional array
        var arrayData = $.csv.toArrays(csvString, {onParseValue: $.csv.hooks.castToScalar});

        // use arrayData to load the select elements with the appropriate options
        for (var i = 0; i < arrayData[0].length; i++) {
          // this adds the given option to both select elements
          $("select").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
        }

        // set the default selection
        $("#domain option[value='0']").attr("selected","selected");
        $("#range option[value='1']").attr("selected","selected");

        // this new DataTable object holds all the data
        var data = new google.visualization.arrayToDataTable(arrayData);

        // this view can select a subset of the data at a time
        var view = new google.visualization.DataView(data);
        view.setColumns([0,1]);

        var options = {
          title: "KwaZulu-Natal Household Survey (1993)",
          hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
          vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
          legend: 'none'
        };

        var chart = new google.visualization.ScatterChart(document.getElementById('chart'));
        chart.draw(view, options);

        // set listener for the update button
        $("select").change(function(){
          // determine selected domain and range
          var domain = +$("#domain option:selected").val();
          var range = +$("#range option:selected").val();

          // update the view
          view.setColumns([domain,range]);

          // update the options
          options.hAxis.title = data.getColumnLabel(domain);
          options.hAxis.minValue = data.getColumnRange(domain).min;
          options.hAxis.maxValue = data.getColumnRange(domain).max;
          options.vAxis.title = data.getColumnLabel(range);
          options.vAxis.minValue = data.getColumnRange(range).min;
          options.vAxis.maxValue = data.getColumnRange(range).max;

          // update the chart
          chart.draw(view, options);
        });
      });
    }
  </script>
</head>
<body>
  <div id="chart" style="width:800px; height:500px;">
  </div>
  <select id="range"></select>
  <select id="domain"></select>
</body>
</html>

Add Controls for Size and Color

Bubble charts add two dimensions, size and color, to the standard scatter plot (here I’m using Google’s terminology; several other graphics libraries simply add this functionality to their scatter plot functions). To keep the nice interactivity we built into our last chart, let’s start by adding controls for the color and size. We’ll nest everything in an unordered list and add labels to the controls. Just change the section with two <select> tags to match the following:

<ul>
  <li> Y-Axis <select id="range"></select> </li>
  <li> X-Axis <select id="domain"></select> </li>
  <li> Color <select id="color"></select> </li>
  <li> Size <select id="size"></select> </li>
</ul>

Next we want to get rid of the bullets in our unordered list. Add the following <style> tag inside your <head> tag.

<style>
  ul {list-style-type: none; }
</style>

Changing the Chart Type

Change the line that loads the chart object from this:

 var chart = new google.visualization.ScatterChart(document.getElementById('chart')); 

to this:

 var chart = new google.visualization.BubbleChart(document.getElementById('chart')); 

Feeding the Data to the Chart

The data table for Google’s bubble chart requires the first column to be a string which can be used to identify the bubbles. When we loaded the CSV into an array in the last tutorial, we parsed all values as scalars. We need to update our DataView call to change the values in the first column, the household ids (hhid), to strings. This requires us to add a function to retrieve these strings from the DataTable.

var view = new google.visualization.DataView(data);
view.setColumns([{calc:stringID, type: "string"},1,2,3]);

// this function returns the first column values as strings (by row)
function stringID(dataTable, rowNum){
  return dataTable.getValue(rowNum, 0).toString();
}
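A calculated column is just a function that the view calls once per row. To see what stringID returns without loading Google's library, the sketch below runs it against a tiny stand-in object that offers only a getValue(row, column) method. The names fakeRows and fakeTable and the sample values are hypothetical and exist only for this illustration.

```javascript
// Same function as in the tutorial: read column 0 and return it as a string.
function stringID(dataTable, rowNum) {
  return dataTable.getValue(rowNum, 0).toString();
}

// Hypothetical stand-in for a DataTable, backed by a plain 2-D array.
var fakeRows = [
  [101, 3, 250.5],
  [102, 0, 410]
];
var fakeTable = {
  getValue: function (rowNum, colNum) {
    return fakeRows[rowNum][colNum];
  }
};

// Call the calc function once per row, as the view would.
var ids = fakeRows.map(function (_, rowNum) {
  return stringID(fakeTable, rowNum);
});
console.log(ids); // the numeric ids, now as strings
```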

Updating the Chart

Now we need to modify the code that updates the chart when a user changes the selected variables. First we’ll add local variables for color and size to the <select> listener function. These variables need to be assigned the value of the respective <select> tag. After we have column indices for color and size, we will set these as the fourth and fifth columns (after the id, domain, and range columns) in our bubble chart view. The new or changed lines are the two assignments for color and size and the setColumns call:

$("select").change(function(){
  // determine selected domain and range
  var domain = +$("#domain option:selected").val();
  var range = +$("#range option:selected").val();
  var color = +$("#color option:selected").val();
  var size = +$("#size option:selected").val();

  // update the view
  view.setColumns([{calc:stringID, type: "string"},domain,range,color,size]);

  // update the options
  options.hAxis.title = data.getColumnLabel(domain);
  options.hAxis.minValue = data.getColumnRange(domain).min;
  options.hAxis.maxValue = data.getColumnRange(domain).max;
  options.vAxis.title = data.getColumnLabel(range);
  options.vAxis.minValue = data.getColumnRange(range).min;
  options.vAxis.maxValue = data.getColumnRange(range).max;

  // update the chart
  chart.draw(view, options);
});
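The column list is the only part of the listener that is specific to the bubble chart, so it can help to build it in one place. The helper below is a sketch of that idea; bubbleColumns is a name introduced here and is not part of Google's API. It returns the array handed to view.setColumns, in the order used above: id, x, y, color, size.

```javascript
// In the page this is the tutorial's stringID; repeated here so the sketch is self-contained.
function stringID(dataTable, rowNum) {
  return dataTable.getValue(rowNum, 0).toString();
}

// Build the column specification for the bubble chart view.
// bubbleColumns is a hypothetical helper, not a Google Charts function.
function bubbleColumns(domain, range, color, size) {
  return [{ calc: stringID, type: "string" }, domain, range, color, size];
}

// Example: x = column 2, y = column 5, color = column 1, size = column 3.
var columns = bubbleColumns(2, 5, 1, 3);
console.log(columns.slice(1)); // the four numeric column indices
```

Inside the listener this would be used as view.setColumns(bubbleColumns(domain, range, color, size)).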

Unfortunately, when I test this and select a few variables of interest I get the following chart. This is not very useful. The id values obscure all the information.

[Figure: bubble1]

Improving Upon the Defaults

Removing the Bubble Label

The bubble labels would work fine if we had only a few data points and being able to quickly identify them was important. In this case, we are more interested in the general relationships between the variables and not the specific position of any one household. Let’s start by removing the bubble label. Go to our stringID function and return an empty string instead of the household id (be sure to comment out the old return statement):

function stringID(dataTable, rowNum){
  // return dataTable.getValue(rowNum, 0).toString();
  // return an empty string instead to avoid the bubble labels
  return "";
}

Now let’s check our chart:

[Figure: bubble2]

Removing the Bubble Border and Adjusting Bubble Opacity

Okay. This is a lot nicer, but we can do better by removing the bubble borders and lowering the bubble opacity, since both cause issues with occlusion (i.e., there is data we are not seeing due to overly opaque data in the foreground). To remove the bubble’s border we’ll set its stroke color to “transparent”. Let’s change the opacity from the default of 0.8 to 0.2. To implement this we need to add an element to our initial options object:

var options = {
  title: "KwaZulu-Natal Household Survey (1993)",
  hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
  vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
  bubble: {stroke: "transparent", opacity: 0.2},
};

and reset it in our <select> listener function:

// update the options
options.hAxis.title = data.getColumnLabel(domain);
options.hAxis.minValue = data.getColumnRange(domain).min;
options.hAxis.maxValue = data.getColumnRange(domain).max;
options.vAxis.title = data.getColumnLabel(range);
options.vAxis.minValue = data.getColumnRange(range).min;
options.vAxis.maxValue = data.getColumnRange(range).max;
options.bubble = {stroke: "transparent", opacity: 0.2};

Let’s take a look:

[Figure: bubble3]

Changing the Color Gradient

This is starting to look great. One issue I have with the default color choice, besides it being ugly, is that gray with an opacity of 0.2 is hard to see. Let’s make the color gradient change from red to blue. We do this again by adding an element to the initial options object:

var options = {
  title: "KwaZulu-Natal Household Survey (1993)",
  hAxis: {title: data.getColumnLabel(0), minValue: data.getColumnRange(0).min, maxValue: data.getColumnRange(0).max},
  vAxis: {title: data.getColumnLabel(1), minValue: data.getColumnRange(1).min, maxValue: data.getColumnRange(1).max},
  bubble: {stroke: "transparent", opacity: 0.2},
  colorAxis: {colors:['red','blue']},
};

and resetting it in our <select> listener function:

// update the options
options.hAxis.title = data.getColumnLabel(domain);
options.hAxis.minValue = data.getColumnRange(domain).min;
options.hAxis.maxValue = data.getColumnRange(domain).max;
options.vAxis.title = data.getColumnLabel(range);
options.vAxis.minValue = data.getColumnRange(range).min;
options.vAxis.maxValue = data.getColumnRange(range).max;
options.bubble = {stroke: "transparent", opacity: 0.2};
options.colorAxis = {colors:['red','blue']};

Bam! And here’s our finished product:

[Figure: bubble4]

These changes have made it easier to explore the dataset and added a little style in the process. You can find an interactive version here (check the source if you are having problems with your chart).
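The listener repeats the same three assignments for each axis. As an optional tidy-up, those can be folded into one small function. This is only a sketch: setAxis is a hypothetical helper, and fakeData is a made-up stand-in that mimics just the two DataTable methods used in this tutorial (getColumnLabel and getColumnRange) so the example runs without the library.

```javascript
// Update one axis of an options object from a column of the data.
// `data` only needs getColumnLabel and getColumnRange, as used in the tutorial.
function setAxis(axisOptions, data, column) {
  var range = data.getColumnRange(column);
  axisOptions.title = data.getColumnLabel(column);
  axisOptions.minValue = range.min;
  axisOptions.maxValue = range.max;
}

// Hypothetical stand-in for a DataTable so the sketch runs outside the browser.
var labels = ["hhid", "children", "income"];
var rows = [[1, 3, 250.5], [2, 0, 410], [3, 5, 120]];
var fakeData = {
  getColumnLabel: function (c) { return labels[c]; },
  getColumnRange: function (c) {
    var values = rows.map(function (r) { return r[c]; });
    return { min: Math.min.apply(null, values), max: Math.max.apply(null, values) };
  }
};

var options = { hAxis: {}, vAxis: {} };
setAxis(options.hAxis, fakeData, 1); // x-axis from the "children" column
setAxis(options.vAxis, fakeData, 2); // y-axis from the "income" column
console.log(options);
```

In the page, the six axis lines in the listener would become setAxis(options.hAxis, data, domain) and setAxis(options.vAxis, data, range).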

Conclusion

While this ramped up the complexity of our figure (compared to the chart from the previous tutorial), being able to change which variables control the color and size of the bubbles will make your data that much more engaging. Take the source, change the reference to your CSV, and remember to download a copy of the jquery-csv script. With just a few steps you can have your own interactive chart to encourage your site’s viewers to explore your data. Check out the next tutorial in this series: Google Charts and CSV Part 3: Side-by-Side Bubble Charts. For more information on Google’s bubble chart, check the documentation here.

The Economist Illustrated: China

china_sea_turtle Illustrated by: Joel Hopler


The Inspiration

Returning students: Plight of the sea turtles

Illustrator’s Notes

The article made it clear that the sea turtle concept is no longer working in its intended way, so I thought a skeleton of a turtle would illustrate that well. I pointed the turtle westward and labeled it with its old and new names, “hai gui” and “hai dai”.

Easy Data Visualization with Google Charts and a CSV

Static figures work fine for a print publication. However, when you want to present your research or collected data online, static is stale and dynamic is alive. Today we’re going to take a CSV and create a simple, but interactive, scatter plot. This tutorial assumes some basic familiarity with HTML and JavaScript. If you don’t currently possess these skills, head on over to Codecademy and follow the Web Fundamentals track and the JavaScript track.

Setting Up

To begin, we need to make sure we have the CSV we want to load and the JavaScript library jquery-csv in the same folder as our HTML.

Preview and Data

Here’s the end result of this tutorial: [image: Finished Chart]

The data I’ll be using is from the three-wave KwaZulu-Natal Income Dynamics Study (KIDS). In this example I will be using the first round of the survey (1993). Children are household members listed as younger than 16, and pensioners are defined as males over 65 and females over 60. I use an adult equivalent measure of household income used by Carter and May (1999) and many others in the South African context. The cleaned CSV can be downloaded here. I recommend you download this CSV to work along with this tutorial, but feel free to use your own (just be careful to make the relevant changes to the example code). Add the CSV to the same folder as the HTML we will be creating.

jQuery-CSV

The jQuery-CSV library allows us to easily take a string of CSV data and transform it into the appropriate format for Google’s visualization library. Download either jquery.csv-0.71.js or jquery.csv-0.71.min.js from that page and add it to the folder where your HTML will go.

Accessing the CSV

To begin with, create the HTML document, load the Google JS API, jQuery, and the jQuery-CSV library, and display the contents of the CSV. This confirms that the CSV is where it’s supposed to be and that we can access all the JavaScript we need:

    <!DOCTYPE html>
    <html>
      <head>
        <title>Google Chart Example</title>
        <script src="https://www.google.com/jsapi"></script>
        <script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
        <script src="jquery.csv-0.71.js"></script>
        <script>
          // wait till the DOM is loaded
          $(function() {
            // grab the CSV
            $.get("kzn1993.csv", function(csvString) {
              // display the contents of the CSV
              $("#chart").html(csvString);
            });
          });
        </script>
      </head>
      <body>
        <div id="chart">
        </div>
      </body>
    </html>

Load your newly created HTML to confirm your code outputs the contents of the CSV.

A Simple Scatter Plot

Clear the script tag we used to display the CSV; in this section we will focus on the JavaScript necessary to create a scatter plot with our CSV. Start by loading the visualization library and setting a callback function:

    // load the visualization library from Google and set a listener
    google.load("visualization", "1", {packages:["corechart"]});
    google.setOnLoadCallback(drawChart);

Next, we need to create the callback function we referenced in the previous step. We’ll begin by grabbing the CSV as we did previously:

    function drawChart() {
      // grab the CSV
      $.get("kzn1993.csv", function(csvString) {

We need to transform the CSV into a format suitable for Google’s visualization library:

        // transform the CSV string into a 2-dimensional array
        var arrayData = $.csv.toArrays(csvString, {onParseValue: $.csv.hooks.castToScalar});
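To make the shape of arrayData concrete, here is a rough stand-in for what this step produces: one array per CSV row, with numeric-looking cells cast to numbers and everything else (such as the header row) left as strings. This is not jquery-csv's implementation. toArraysSketch is a hypothetical name, the sample string is made up, and the sketch assumes a simple CSV with no quoted fields or embedded commas.

```javascript
// Simplified stand-in for $.csv.toArrays with the castToScalar hook.
// Assumption: no quoted fields, embedded commas, or escaped characters.
function toArraysSketch(csvString) {
  return csvString
    .split(/\r?\n/)
    // Drop empty lines, such as the one after a trailing newline.
    .filter(function (line) { return line.length > 0; })
    .map(function (line) {
      return line.split(",").map(function (cell) {
        var trimmed = cell.trim();
        // Cast numeric-looking cells; keep other text as strings.
        var isNumeric = trimmed !== "" && !isNaN(Number(trimmed));
        return isNumeric ? Number(trimmed) : trimmed;
      });
    });
}

// Made-up sample; the real kzn1993.csv has different columns and values.
var sample = "hhsize,income\n4,1250.5\n7,310\n";
console.log(JSON.stringify(toArraysSketch(sample)));
// [["hhsize","income"],[4,1250.5],[7,310]]
```

The cast matters because a scatter plot needs numeric columns. Without it, every cell would arrive as a string.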

Next, we’ll transform this array into a DataTable object:

        // this new DataTable object holds all the data
        // (arrayToDataTable is a plain function, so no "new" is needed)
        var data = google.visualization.arrayToDataTable(arrayData);

Since we have more columns of data than are needed for our visualization, let’s create a view on this table of just the first two columns:

        // this view can select a subset of the data at a time
        var view = new google.visualization.DataView(data);
        view.setColumns([0,1]);
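If the idea of a view is unfamiliar, it may help to think of it as a projection: for every row, keep only the listed columns, in the listed order. The sketch below does this on a plain 2-dimensional array. selectColumns is a hypothetical helper and the rows are made up. Unlike this sketch, a real DataView does not copy the data; it is a read-only window onto the underlying DataTable.

```javascript
// Conceptual stand-in for DataView.setColumns: keep only the listed
// column indices, in the listed order, for every row.
function selectColumns(rows, indices) {
  return rows.map(function (row) {
    return indices.map(function (i) { return row[i]; });
  });
}

// Made-up rows for illustration only.
var rows = [
  ["hhsize", "income", "children"],
  [4, 1250.5, 2],
  [7, 310, 5]
];

console.log(JSON.stringify(selectColumns(rows, [0, 1])));
// [["hhsize","income"],[4,1250.5],[7,310]]
console.log(JSON.stringify(selectColumns(rows, [2, 0])));
// [["children","hhsize"],[2,4],[5,7]]
```

The order of the indices matters: for a scatter chart the first selected column supplies the x values and the second supplies the y values.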

Now let’s set some basic options for our chart:

        var options = {
          title: "KwaZulu-Natal Household Survey (1993)",
          hAxis: {title: data.getColumnLabel(0),
                  minValue: data.getColumnRange(0).min,
                  maxValue: data.getColumnRange(0).max},
          vAxis: {title: data.getColumnLabel(1),
                  minValue: data.getColumnRange(1).min,
                  maxValue: data.getColumnRange(1).max},
          legend: 'none'
        };
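The minValue and maxValue settings come from getColumnRange, which reports the smallest and largest value in a column as an object with min and max properties. As a rough illustration of that calculation, here is the same idea applied to the plain array from the CSV step. columnRange is a hypothetical helper and the sample data is made up; it is not how the Google library computes the range internally.

```javascript
// Hypothetical helper: smallest and largest numeric value in one column
// of a 2-dimensional array whose first row holds the headers.
function columnRange(arrayData, column) {
  var values = arrayData
    .slice(1) // skip the header row
    .map(function (row) { return row[column]; })
    // Ignore anything that is not a number (for example, empty cells).
    .filter(function (value) { return typeof value === "number"; });
  // Note: with no numeric values this yields {min: Infinity, max: -Infinity}.
  return {
    min: Math.min.apply(null, values),
    max: Math.max.apply(null, values)
  };
}

// Made-up sample data for illustration only.
var sampleData = [
  ["hhsize", "income"],
  [4, 1250.5],
  [7, 310],
  [2, 980]
];

console.log(JSON.stringify(columnRange(sampleData, 0))); // {"min":2,"max":7}
console.log(JSON.stringify(columnRange(sampleData, 1))); // {"min":310,"max":1250.5}
```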

Now we need to bind a chart to our <div> and tell the chart to draw the current view with the options we selected:

        var chart = new google.visualization.ScatterChart(document.getElementById('chart'));
        chart.draw(view, options);

All that’s left for this stage is to close our function blocks:

      });
    }

If you load our current progress you should see the following (relatively meaningless) chart: Basic Chart

Adding Interaction

This chart already features interactivity in the form of rollover states for the plotted points. What we really need is to be able to change the variables we are plotting on the fly. Add the following tags after the </div> tag. I place the range first so that it lines up with the y-axis title:

    <select id="range">
    </select>
    <select id="domain">
    </select>
    <button type="button">Update Chart</button>

Now we need to update our script to first load the <select> tags with the CSV headers, and also to respond to a click on our button.

Adding <option> elements to the <select> elements

Immediately following the assignment of arrayData, add the CSV headers to both <select> elements:

        // use arrayData to load the select elements with the appropriate options
        for (var i = 0; i < arrayData[0].length; i++) {
          // this adds the given option to both select elements
          $("select").append("<option value='" + i + "'>" + arrayData[0][i] + "</option>");
        }
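If you would like to check the generated markup without opening a browser, the string building can be pulled into a plain function. This is a sketch rather than the tutorial's code: buildOptionTags and escapeHtml are hypothetical helpers. The escaping is my own addition, on the assumption that a header in your own CSV might contain characters such as < or &.

```javascript
// Hypothetical helper: escape characters that have special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: one <option> per header, with the column index
// as its value.
function buildOptionTags(headers) {
  return headers.map(function (header, i) {
    return "<option value='" + i + "'>" + escapeHtml(header) + "</option>";
  }).join("");
}

console.log(buildOptionTags(["cm_16_exp", "mean_educ"]));
// <option value='0'>cm_16_exp</option><option value='1'>mean_educ</option>
```

In the page itself, the loop could then shrink to a single call such as $("select").append(buildOptionTags(arrayData[0]));.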

Make sure the <select> elements show the starting options:

        // set the default selection
        $("#domain option[value='0']").attr("selected","selected");
        $("#range option[value='1']").attr("selected","selected");

Updating the Chart

Now we need to assign a function to the button we created. Add the following after chart.draw(view, options);:

        // set listener for the update button
        $("button").click(function(){

Assign the selected column indices to local variables:

          // determine selected domain and range
          var domain = +$("#domain option:selected").val();
          var range = +$("#range option:selected").val();
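The leading + in those two lines is easy to miss. Form values always come back as strings, and the unary plus converts them to numbers, which is what we want for column indices. Here is a small sketch of that conversion with some added validation. toColumnIndex is a hypothetical helper of mine, and the checks are an assumption about what you might want, not something the tutorial requires.

```javascript
// Hypothetical helper: turn a form value such as "1" into a column index,
// rejecting anything that is not a whole number within range.
function toColumnIndex(value, columnCount) {
  // An empty string would otherwise convert to 0, so reject it up front.
  if (value === "") {
    throw new RangeError("No column selected");
  }
  var index = +value; // unary plus: "1" becomes the number 1
  if (!Number.isInteger(index) || index < 0 || index >= columnCount) {
    throw new RangeError("Not a valid column index: " + value);
  }
  return index;
}

console.log(typeof "1");  // string
console.log(typeof +"1"); // number
console.log(toColumnIndex("1", 5)); // 1
```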

Update the view to reflect the selected columns:

          // update the view
          view.setColumns([domain,range]);

Update the axis titles and the axis ranges:

          // update the options
          options.hAxis.title = data.getColumnLabel(domain);
          options.hAxis.minValue = data.getColumnRange(domain).min;
          options.hAxis.maxValue = data.getColumnRange(domain).max;
          options.vAxis.title = data.getColumnLabel(range);
          options.vAxis.minValue = data.getColumnRange(range).min;
          options.vAxis.maxValue = data.getColumnRange(range).max;

Update the chart and close the function block:

          // update the chart
          chart.draw(view, options);
        });

Cool! Now we can do more interesting comparisons like plotting cm_16_exp and mean_educ. Here’s what our current chart looks like: Interactive Chart

Even Better UX

UX stands for user experience, and it is an important design consideration: we want visitors to our site or blog to enjoy exploring our data. To make our chart more enjoyable, let’s remove the annoying step of having to click the button to update. Simply change this:

 $("button").click(function(){ 

to this:

 $("select").change(function(){ 

and remove the <button> tag. Now your chart should look like this (view the source and compare to yours if your chart is not working).

Conclusion

I hope you enjoyed this tutorial, and especially the end project. Now, to use your own CSV, all you need to do is change the file string “kzn1993.csv” to the name of your CSV and change the title in the chart options. In the next tutorial, we’ll use the Google visualization library to make a bubble chart. (Check out the third tutorial in this series: Google Charts and CSV Part 3: Side-by-Side Bubble Charts.) As always, place any questions or comments in the section below. Thanks!

The Economist Illustrated: China

china_bull Illustrated by: Joel Hopler


The Inspiration

China’s cash crunch: Bear in the China shop

Illustrator’s Notes

This article left me with the impression that China has the potential to rebalance its economy. While the article largely focuses on bearish Chinese lending, the point is made that the Chinese government has effective controls to bring back the bull. To represent this point I show a tamed bull, drinking tea in a china shop.