How to scrape data from Google Sheets in R
Google Forms offers a convenient way to collect data online. It is particularly
useful because you can embed the form in a webpage, link the responses to a
spreadsheet, and publish the results online. This post shows how to scrape the
data from the spreadsheet behind a Google Form in R using the RCurl package. You
should be able to follow along by copying and pasting the code into an R
session.

Ideally you would use this method once you have collected data with a Google Form. For our purposes I simply created a Google Sheet and will scrape the data from there.
Get some data
To show how this works, I simulated some data with the following code:
```r
# create fake data to save in a Google Sheet
set.seed(1)
df <- data.frame(
  subj  = 1:30,
  group = gl(2, 15, labels = c("mono", "bi")),
  score = c(rnorm(15, 87, 8), rnorm(15, 94, 3))
)
```
I then copied and pasted the data frame into a Google Sheet. To do this, open Google Drive and create a new sheet.
Once you have some data in a sheet you need to do a few things before you are ready to fire up R.
First, you need to publish your sheet to the web (File > Publish to the web…):
Publish the sheet and copy the public link from the window.
As you can see, my link is:

https://docs.google.com/spreadsheets/d/1AqS_DAThPUJuS2L2E-S5X7fM1kpIdhXQdBDZUyt-bWM/pubhtml
Copy your link and save it somewhere. We will need it in just a second.
Now we’re ready for R. Here are the packages I used:
```r
# load libraries
library(dplyr)
library(tidyr)
library(RCurl)
library(ggplot2)
library(DT)
library(pander)
```
We will use the RCurl package to scrape the data. The command we need is
getForm(). The first argument is the URI to which the form is posted; you can
just use the one shown below for a Google Sheet. The important part here is the
key argument, which you need to copy from the link you saved above. The key can
be found in the last part of the link. Here is my link again:

https://docs.google.com/spreadsheets/d/1AqS_DAThPUJuS2L2E-S5X7fM1kpIdhXQdBDZUyt-bWM/pubhtml
Specifically we want:

1AqS_DAThPUJuS2L2E-S5X7fM1kpIdhXQdBDZUyt-bWM

Therefore we can delete https://docs.google.com/spreadsheets/d/ from the
beginning, as well as /pubhtml from the end. Check the key argument below.
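If you prefer not to trim the link by hand, the two deletions above can be sketched with a pair of sub() calls (a small helper I wrote for this post, not part of any package; the URL is the one shown above):

```r
# Extract the sheet key from a published-sheet link by stripping
# the fixed prefix and the "/pubhtml" suffix
link <- "https://docs.google.com/spreadsheets/d/1AqS_DAThPUJuS2L2E-S5X7fM1kpIdhXQdBDZUyt-bWM/pubhtml"
key  <- sub("^https://docs\\.google\\.com/spreadsheets/d/", "", link)
key  <- sub("/pubhtml$", "", key)
key
```

The result is the string we pass to the key argument below.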
Finally, we use the
read.csv() command to import the data.
```r
# scrape data
sheet <- getForm(
  "https://spreadsheets.google.com/spreadsheet/pub",
  hl = "en_US",
  key = "1AqS_DAThPUJuS2L2E-S5X7fM1kpIdhXQdBDZUyt-bWM",
  output = "csv",
  .opts = list(followlocation = TRUE, verbose = TRUE, ssl.verifypeer = FALSE)
)
df <- read.csv(textConnection(sheet))
```
Let’s see if it worked…
```r
pandoc.table(df, style = "rmarkdown", round = 2)
```
Looks good. Now we can visualize and analyze the data.
```r
df %>%
  ggplot(aes(x = as.numeric(group), y = score)) +
  # group is coded 1 = mono, 2 = bi, so label the numeric axis accordingly
  # (a continuous scale is needed here because x is numeric)
  scale_x_continuous(breaks = c(1, 2), labels = c("Monolingual", "Bilingual")) +
  geom_jitter(width = 0.1) +
  geom_smooth(method = "lm") +
  labs(x = "Group", y = "Score")
```
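Besides plotting, a quick numerical check is useful. Here is a minimal sketch of a by-group summary with dplyr; it recreates the simulated data from the top of the post so the snippet runs on its own (with scraped data you would skip that step and use the df returned by read.csv()):

```r
library(dplyr)

# Recreate the simulated data so this snippet is self-contained
set.seed(1)
df <- data.frame(
  subj  = 1:30,
  group = gl(2, 15, labels = c("mono", "bi")),
  score = c(rnorm(15, 87, 8), rnorm(15, 94, 3))
)

# Sample size, mean, and standard deviation per group
grp_summary <- df %>%
  group_by(group) %>%
  summarise(
    n          = n(),
    mean_score = mean(score),
    sd_score   = sd(score)
  )
grp_summary
```

With the simulation parameters above, the bilingual group mean should sit a few points above the monolingual mean.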
And that’s it.