untidydata

An R package of untidy datasets made for the purpose of teaching the tidyverse.

Last update: 2021-01-27

Overview

This package stores the untidy datasets I have been creating for teaching purposes in a version-controlled environment. The datasets vary in difficulty and present different problems that commonly arise when tidying data.

Installation

You can install the development version from GitHub with:

install.packages("devtools")
devtools::install_github("jvcasillas/untidydata")

Datasets

language_diversity

  • Difficulty: easy
  • A long-format dataset that is more useful in wide format.
  • Data taken from Appendix 1 in:
    Nettle, D. (1998). Explaining Global Patterns of Language Diversity. Journal of Anthropological Archaeology, 17, 354–374.
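The long-to-wide reshaping this dataset is meant to practice can be sketched roughly as follows. This is a minimal illustration on a toy data frame, not on language_diversity itself: the column names (country, measurement, value) and all values are hypothetical, and tidyr is assumed to be installed.

```r
library(tidyr)

# Hypothetical toy data in long format: one row per country/measurement pair.
# Column names and values are made up for illustration only.
long_df <- data.frame(
  country     = c("A", "A", "B", "B"),
  measurement = c("langs", "area", "langs", "area"),
  value       = c(10, 200, 5, 50)
)

# Spread the measurement labels into their own columns,
# giving one row per country.
wide_df <- pivot_wider(
  long_df,
  names_from  = measurement,
  values_from = value
)

print(wide_df)
```

The same call should carry over to the real dataset once the toy column names are swapped for the ones it actually uses.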

pre_post

  • Difficulty: easy
  • A typical pre-test, post-test dataset in wide format.
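A wide pre-test/post-test layout is usually tidied by pivoting the test columns into rows. The sketch below shows one way to do that on a toy data frame. The column names (id, pre, post) and scores are hypothetical and may not match pre_post, and tidyr is assumed to be installed.

```r
library(tidyr)

# Hypothetical toy data in wide format: one row per participant,
# one column per test. Names and values are made up for illustration.
wide_scores <- data.frame(
  id   = c(1, 2, 3),
  pre  = c(10, 12, 9),
  post = c(14, 15, 13)
)

# Gather the two test columns into a single "test" column,
# giving one row per participant/test combination.
long_scores <- pivot_longer(
  wide_scores,
  cols      = c(pre, post),
  names_to  = "test",
  values_to = "score"
)

print(long_scores)
```

In the long result, each participant contributes one row per test, which is the shape most tidyverse plotting and modeling functions expect.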

spanish_vowels

  • Difficulty: easy
  • Simulated Spanish vowel formant measurements from male and female speakers.

spirantization

  • Difficulty: easy
  • Simulated intensity measurements of CV sequences in word-initial and word-medial position from L2 learners and native speakers.

vot

  • Difficulty: medium
  • A voice-onset time dataset. Includes coronal stop data from English and Spanish monolinguals, as well as English/Spanish bilinguals.