Data Visualization 1

Bora Jin

Today's Goal

  • Build up an effective visualization systematically layer by layer with ggplot2

"The simple graph has brought more information to the data analyst's mind than any other device" - John Tukey

Quiz

Q - What does each row of a dataset represent?

Observation

Q - What does each column of a dataset represent?

Variable

Q - All variables in a dataset must be of the same type (character, integer, etc.). (T/F)

F

Q - In this class, we mainly focus on variables having a single entry per observation. (T/F)

T

Quiz

Q - Which function spits out the dimension of a dataset?

dim() (row, column)

Q - How to check the number of rows of a dataset?

nrow()

Q - How to check the number of columns of a dataset?

ncol()
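
As a quick check of the three functions above, here is a minimal sketch using `mtcars`, a data frame built into R. The choice of dataset is an assumption made for illustration and is not from the slides.

```r
# mtcars ships with base R; it is used here only as an illustrative dataset.
dims   <- dim(mtcars)    # integer vector: c(number of rows, number of columns)
n_rows <- nrow(mtcars)   # number of observations (rows)
n_cols <- ncol(mtcars)   # number of variables (columns)

print(dims)
print(n_rows)
print(n_cols)
```

`dim()` returns both numbers at once, so `dims[1]` equals `nrow(mtcars)` and `dims[2]` equals `ncol(mtcars)`.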

Quiz

Q - What is the information that you cannot get using glimpse()?

a. number of rows

b. number of cols

c. list of variables

d. variable types

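e. summaries of variables

Answer: e. glimpse() shows the dimensions, variable names, types, and first few values, but no summary statistics.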
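
To see what `glimpse()` reports, the sketch below runs it next to base R's `summary()`. It assumes the dplyr package is installed (it is loaded by `library(tidyverse)`) and again uses the built-in `mtcars` data purely for illustration.

```r
library(dplyr)  # glimpse() is available once dplyr (or the tidyverse) is loaded

# Rows, columns, variable names, variable types, and the first few values
glimpse(mtcars)

# Per-variable summaries (min, quartiles, mean, max) come from summary() instead
var_summaries <- summary(mtcars)
print(var_summaries)
```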

Quiz

Q - What is EDA?

  • Approach to analysing datasets to summarize and describe their main characteristics
  • Often visual

Q - Why is visualization important?

It can reveal a new story that summary statistics could miss!
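
One common way to illustrate this point is Anscombe's quartet, available in R as the built-in `anscombe` data frame. The example is an addition for illustration and is not from the slides. It is a sketch showing that the four x/y pairs share nearly identical means and correlations, yet look very different once plotted.

```r
library(ggplot2)

# anscombe has columns x1..x4 and y1..y4: four small x/y datasets
sets   <- 1:4
x_cols <- paste0("x", sets)
y_cols <- paste0("y", sets)

# Summary statistics for each of the four pairs
stats <- data.frame(
  set    = sets,
  mean_x = sapply(sets, function(i) mean(anscombe[[x_cols[i]]])),
  mean_y = sapply(sets, function(i) mean(anscombe[[y_cols[i]]])),
  cor_xy = sapply(sets, function(i) cor(anscombe[[x_cols[i]]], anscombe[[y_cols[i]]]))
)
print(round(stats, 2))

# Plotting one pair shows a pattern the summaries hide: set 2 is curved, not linear
p_set2 <- ggplot(anscombe, aes(x = x2, y = y2)) +
  geom_point()
p_set2
```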

Q - Before plotting, what package do we load?

library(tidyverse) (ggplot2 is part of the tidyverse)

Quiz

Q - When plotting, what are the three fundamental things to define and the relevant code?

  • Data in ggplot(data = )
  • Aesthetics in ggplot(mapping = aes())
  • Geometries in geom_xxxx()

layers added by +

ggplot(data = [dataset],
       mapping = aes(x = [x-variable], y = [y-variable])) +
  geom_xxxx() +
  [other options]
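
A filled-in version of the template might look like the sketch below. The `mpg` dataset (shipped with ggplot2), the chosen variables, and the label text are assumptions made for illustration.

```r
library(ggplot2)  # also loaded by library(tidyverse)

# Data:       mpg (fuel economy data bundled with ggplot2)
# Aesthetics: displ (engine displacement) on x, hwy (highway mpg) on y
# Geometry:   points, giving a scatterplot
p <- ggplot(data = mpg,
            mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  labs(x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       title = "Fuel economy vs. engine size")
p
```

Each `+` adds one layer or option, so the plot can be built up and inspected one line at a time.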

Quiz

Q - What do other options include?

  • Labels and legends in labs()
  • More aesthetics options (color, shape, size, alpha)
  • Faceting
  • Many more!

Q - For aesthetics options, can you explain mapping vs. setting?

  • Mapping: tie a plotting characteristic (color, shape, size, alpha) to a variable in the data, inside aes()
  • Setting: fix it to a constant value, regardless of the data, as an argument of geom_xxxx() outside aes()
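
The difference can be sketched with two versions of the same scatterplot. The `mpg` dataset and the chosen colour values are assumptions for illustration.

```r
library(ggplot2)

# Mapping: colour depends on the variable `class`, so it goes inside aes()
# and ggplot2 adds a legend automatically
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Setting: colour and transparency are constants, so they go in geom_point()
# outside aes(); every point looks the same and no legend is drawn
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue", alpha = 0.5)

p_mapped
p_set
```

A frequent slip is writing `aes(color = "blue")`, which maps colour to a one-value constant called "blue" instead of setting the colour.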

Quiz

Q - Why faceting?

  • To explore conditional relationships
  • Multiple smaller plots in one plot

Q - Two functions for faceting?

  • facet_grid([row-variable] ~ [column-variable])
  • facet_wrap(~ [variable])
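
Both faceting functions can be tried on the same base plot. This is a minimal sketch, again assuming the `mpg` dataset from ggplot2; the faceting variables are chosen only for illustration.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# facet_grid: rows by drive train (drv), columns by number of cylinders (cyl)
p_grid <- base_plot + facet_grid(drv ~ cyl)

# facet_wrap: one panel per vehicle class, wrapped into rows and columns
p_wrap <- base_plot + facet_wrap(~ class)

p_grid
p_wrap
```

`facet_grid()` suits two conditioning variables laid out as a matrix, while `facet_wrap()` suits one variable with many levels.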

Questions?

Bulletin

  • Watch videos for Prepare: May 16

  • Submit your ae02

  • Complete Parts 1-3 in ae03
