class: center, middle, inverse, title-slide # Data Wrangling 2 ### Bora Jin --- layout: true <div class="my-footer"> <span> <a href="https://introds.org" target="_blank">introds.org</a> </span> </div> --- ## Material 🎥 Watch [Working With Multiple Data Frames](https://youtu.be/VdV5ABsaf5Y) - [Slides](https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d08-multi-df/u2-d08-multi-df.html#1) --- ## Today's Goal - Combine data from different sources using a well-chosen join function in the format `something_join(x, y)`: - `left_join()`: return all rows from `x` for all columns in `x` and `y` - `right_join()`: return all rows from `y` for all columns in `x` and `y` - `full_join()`: return all rows from both `x` and `y` for all columns in `x` and `y` - `inner_join()`: return all rows from `x` where there are matching values in `y`, returning all combination of multiple matches in the case of multiple matches for all columns in `x` and `y` - `semi_join()`: return all rows from `x` where there are matching values in `y`, keeping just columns from `x` - `anti_join()`: return all rows from `x` where there are not matching values in `y`, keeping just columns from `x` - `\(\cdots\)` --- ## Quiz **Q - Which join function might you use when you want to expand `x` by appending all variables in `y` for all the rows in `x`?** -- <img src="img/06/left-join.gif" width="40%" style="display:block; margin:auto; background-color: #FDF6E3" style="display: block; margin: auto;" /> --- ## Quiz **Q - Which join function might you use when you want a dataset with all the rows and the columns in `x` and `y`?** -- <img src="img/06/full-join.gif" width="40%" style="display:block; margin:auto; background-color: #FDF6E3" style="display: block; margin: auto;" /> --- ## Quiz **Q - Which join function might you use when you want rows in `x` that are not in `y`?** -- <img src="img/06/anti-join.gif" width="40%" style="display:block; margin:auto; background-color: #FDF6E3" style="display: block; margin: auto;" /> --- ## Quiz **Q - Which join function might you use when you want a dataset with all the columns from `x` and `y` and only the rows existing in both data frames?** -- <img src="img/06/inner-join.gif" width="40%" style="display:block; margin:auto; background-color: #FDF6E3" style="display: block; margin: auto;" /> --- ## Quiz **Q - Which join function might you use when you want rows in `x` that exist also in `y`?** -- <img src="img/06/semi-join.gif" width="40%" style="display:block; margin:auto; background-color: #FDF6E3" style="display: block; margin: auto;" /> --- class: middle, center # Questions? --- ## Let's Practice Together! Go to [AE 06: Data Wrangling 2](https://sta199-summer22.netlify.app/appex/ae06_BJ.html) --- ## Bulletin - Watch videos for [Prepare: May 20](https://sta199-summer22.netlify.app/prepare/week02_may20_BJ.html) - Submit your `ae06` - Lab 02 due Friday, May 20 at 11:59pm - HW 01 due Tuesday, May 24 at 11:59pm