class: center, middle, inverse, title-slide # Data Wrangling 1 ### Bora Jin --- layout: true <div class="my-footer"> <span> <a href="https://introds.org" target="_blank">introds.org</a> </span> </div> --- ## Material 🎥 Watch [Tidy Data](https://youtu.be/Ux85eR3h9hw) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d05-tidy-data/u2-d05-tidy-data.html#1) 🎥 Watch [Grammar of Data Wrangling](https://youtu.be/ZCaYBES_VEk) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d06-grammar-wrangle/u2-d06-grammar-wrangle.html#1) 🎥 Watch [Working With a Single Data Frame](https://youtu.be/0229Uq2hkJo) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d07-single-df/u2-d07-single-df.html#1) --- ## Today's Goal - Understand how data are organized according to a consistent set of "tidy" principles - Use **seven key verbs** to wrangle data and extract meaning "Happy families are all alike; every unhappy family is unhappy in its own way" - Leo Tolstoy --- ## Quiz Tidy data has three related characteristics 1. Each variable forms a column. 2. Each observation forms a row. 3. Each value has its own cell. -- **Q - A tidy data set is tidy for any analyses (T/F).** -- F --- ## Quiz **Q - What makes this data not tidy?** <div class="figure" style="text-align: left"> <img src="img/05/untidy_data.png" alt="source: https://gist.github.com/Kimmirikwa/b69d0ea134820ea52f8481991ffae93e#file-student_results-csv" width="100%" /> <p class="caption">source: https://gist.github.com/Kimmirikwa/b69d0ea134820ea52f8481991ffae93e#file-student_results-csv</p> </div> --- ## Quiz **Q - What makes this data not tidy?** 1. `sex and age`: multiple variables in one column 2. `test number`: not a value for each observation, `test 1` and `test 2` should be a variable 3. `term 1-3`: can be aggregated |id | name | phone | sex | age | test1 | test2 | term | |:--|:-----|:------|:----|:----|:------|:------|:-----| | 1 | Mike | 134 | m | 12 | 76 | 85 | term1| | 1 | Mike | 134 | m | 12 | 84 | 80 | term2| | 1 | Mike | 134 | m | 12 | 87 | 90 | term3| --- ## Quiz **Q - Correct the following sentences about** `dplyr` **functions:** a. Its input can be a matrix. b. Its output is in various forms. c. Functions are nouns. d. Replace the input data frame. --- ## Quiz **Q - Correct the following sentences about** `dplyr` **functions:** a. Its input should be a **data frame**. b. Its output is always a **data frame**. c. Functions are **verbs**. d. **Do not modify** the input data frame. --- ## Quiz **Q - Whic of these is the most appropriate translation of ** `%>%` **in English?** a. Before b. Which c. Then d. Do --- ## Quiz **Q - What of these is the most appropriate translation of ** `%>%` **in English?** a. Before b. Which **c. Then** d. Do --- ## Quiz **Q - Connect each function to its effect.** .pull-left[ `filter` - `arrange` - `select` - `distinct` - `mutate` - `summarize` - `group_by` - ] .pull-right[ - pick columns by variable names - reorder rows by values of variables - pick rows matching criteria - enable grouped operations - compute summary statistics - extract unique rows - create new variables ] --- ## Quiz **Q - Connect each function to its effect.** <br> <img src="img/05/answer.png" width="100%" style="display: block; margin: auto;" /> --- class: middle, center # Questions? --- ## Let's Practice Together! Go to [AE 05: Data Wrangling 1](https://sta199-summer22.netlify.app/appex/ae05_BJ.html) --- ## Bulletin - Watch videos for [Prepare: May 18](https://sta199-summer22.netlify.app/prepare/week02_may18_BJ.html) - Make sure you commit **at least 3 times** for AEs - Cannot give full credit for fewer than 3 commits from `ae03` - Submit your `ae04` - Complete up to `filter()` in `ae05` - HW 01 released --- ## Updated Bulletin on May 18 - No prepare for May 19 - Submit your `ae05`