
Data Wrangling 1

Bora Jin

1 / 16

Today's Goal

  • Understand how data are organized according to a consistent set of "tidy" principles
  • Use seven key verbs to wrangle data and extract meaning

"Happy families are all alike; every unhappy family is unhappy in its own way" - Leo Tolstoy

3 / 16

Quiz

Tidy data has three related characteristics

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each value has its own cell.

Q - A tidy data set is tidy for any analysis (T/F).

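F

What counts as a variable or an observation depends on the question being asked, so data that are tidy for one analysis may need reshaping for another.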

4 / 16

Quiz

Q - What makes this data not tidy?

source: https://gist.github.com/Kimmirikwa/b69d0ea134820ea52f8481991ffae93e#file-student_results-csv


5 / 16

Quiz

Q - What makes this data not tidy?

  1. sex and age: two variables stored in one column
  2. test number: "test 1" and "test 2" are stored as values, but each test should be its own variable (column)
  3. term 1-3: three columns hold values of a single variable (term) and should be combined into one term column

A tidy version of Mike's records:

id  name  phone  sex  age  test1  test2  term
1   Mike  134    m    12   76     85     term1
1   Mike  134    m    12   84     80     term2
1   Mike  134    m    12   87     90     term3

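A minimal tidyr sketch of the three fixes. It assumes the untidy layout stored sex and age as one "m_12"-style string and had one column per term; the column names (`sex_age`, `test_number`, `term1` to `term3`) and the "_" separator are hypothetical, reconstructed from Mike's tidy rows above rather than taken from the original file.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical untidy layout, reconstructed from Mike's tidy rows
untidy <- tibble(
  id          = 1,
  name        = "Mike",
  phone       = 134,
  sex_age     = "m_12",              # assumed: two variables in one column
  test_number = c("test1", "test2"), # test names stored as values
  term1       = c(76, 85),           # one column per term
  term2       = c(84, 80),
  term3       = c(87, 90)
)

tidy <- untidy %>%
  # Fix 1: split the combined column into sex and age (convert makes age numeric)
  separate(sex_age, into = c("sex", "age"), sep = "_", convert = TRUE) %>%
  # Fix 3: gather the three term columns into a single term variable
  pivot_longer(term1:term3, names_to = "term", values_to = "score") %>%
  # Fix 2: spread test number so each test becomes its own column
  pivot_wider(names_from = test_number, values_from = score) %>%
  # Match the column order of the tidy table
  select(id, name, phone, sex, age, test1, test2, term)

tidy
```

The reshaping order is a design choice: gathering the terms first gives one score per (test, term) pair, which `pivot_wider()` can then spread into `test1` and `test2`.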
6 / 16

Quiz

Q - Correct the following sentences about dplyr functions:

a. Its input can be a matrix.

b. Its output is in various forms.

c. Functions are nouns.

d. It replaces the input data frame.

7 / 16

Quiz

Q - Correct the following sentences about dplyr functions:

a. Its input should be a data frame.

b. Its output is always a data frame.

c. Functions are verbs.

d. It does not modify the input data frame.
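To see points a, b and d in action, here is a minimal dplyr sketch; the `scores` data frame is a hypothetical toy example.

```r
library(dplyr)

# Hypothetical toy data: a plain data frame is a valid input (point a)
scores <- data.frame(name = c("Ann", "Ben", "Cal"), score = c(90, 72, 85))

# The verb returns a new data frame (point b)...
high <- filter(scores, score > 80)
is.data.frame(high)   # TRUE

# ...and the input is left unchanged (point d); assign the result to keep it
nrow(scores)          # 3
nrow(high)            # 2
```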

8 / 16

Quiz

Q - Which of these is the most appropriate translation of %>% in English?

a. Before

b. Which

c. Then

d. Do

9 / 16

Quiz

Q - Which of these is the most appropriate translation of %>% in English?

a. Before

b. Which

c. Then

d. Do

c
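A sketch of why "then" fits, using a hypothetical `scores` data frame: the nested call has to be read from the inside out, while the piped version reads left to right as "take scores, then filter, then arrange".

```r
library(dplyr)

# Hypothetical toy data
scores <- data.frame(name = c("Ann", "Ben", "Cal"), score = c(90, 72, 85))

# Nested calls: read from the inside out
nested <- arrange(filter(scores, score > 80), desc(score))

# Pipe: "take scores, then filter, then arrange"
piped <- scores %>%
  filter(score > 80) %>%
  arrange(desc(score))

identical(nested, piped)  # TRUE
```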

10 / 16

Quiz

Q - Connect each function to its effect.

filter -

arrange -

select -

distinct -

mutate -

summarize -

group_by -

  • pick columns by variable names

  • reorder rows by values of variables

  • pick rows matching criteria

  • enable grouped operations

  • compute summary statistics

  • extract unique rows

  • create new variables
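A minimal sketch of the four row and column verbs on a hypothetical `scores` data frame (names and values are made up for illustration; the last row deliberately repeats the first).

```r
library(dplyr)

# Hypothetical toy data with one duplicated row
scores <- data.frame(
  name  = c("Ann", "Ben", "Cal", "Ann"),
  class = c("A", "B", "A", "A"),
  score = c(90, 72, 85, 90)
)

scores %>% filter(score > 80)     # pick rows matching criteria
scores %>% arrange(desc(score))   # reorder rows by values of a variable
scores %>% select(name, score)    # pick columns by variable name
scores %>% distinct()             # extract unique rows (drops the repeated Ann row)
```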

11 / 16

Quiz

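Q - Connect each function to its effect.

filter - pick rows matching criteria

arrange - reorder rows by values of variables

select - pick columns by variable names

distinct - extract unique rows

mutate - create new variables

summarize - compute summary statistics

group_by - enable grouped operations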
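The remaining three verbs are usually chained together. A minimal sketch on hypothetical data, where the `passed` column and the 70-point cutoff are illustrative assumptions:

```r
library(dplyr)

# Hypothetical toy data: two classes, two students each
scores <- data.frame(
  class = c("A", "B", "A", "B"),
  score = c(90, 72, 85, 64)
)

by_class <- scores %>%
  mutate(passed = score >= 70) %>%      # create a new variable
  group_by(class) %>%                   # enable grouped operations
  summarize(mean_score = mean(score),   # compute summary statistics per group
            n_passed   = sum(passed))

by_class
```

Without `group_by()`, the same `summarize()` call would return a single row for the whole data frame instead of one row per class.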

12 / 16

Questions?

13 / 16

Let's Practice Together!

Go to AE 05: Data Wrangling 1

14 / 16

Bulletin

  • Watch videos for Prepare: May 18

  • Make sure you commit at least 3 times for AEs

    • Starting with ae03, submissions with fewer than 3 commits cannot receive full credit
  • Submit your ae04

  • Complete up to filter() in ae05

  • HW 01 released

15 / 16

Updated Bulletin on May 18

  • No Prepare for May 19

  • Submit your ae05

16 / 16