Data Science Ethics

Bora Jin

Material

🎥 Watch Misrepresentation

🎥 Watch Data Privacy

🎥 Watch Algorithmic Bias

Today's Goal

  • Understand data misrepresentation, data ethics, and algorithmic bias
  • Critique visualizations and interpretations that misrepresent data or results of analysis
  • Improve data visualizations to better convey the right message

Quiz

Q - What confusion was there in this news article about interpreting the research results?

  • Original research

Moore, Steven C., et al. "Association of leisure-time physical activity with risk of 26 types of cancer in 1.44 million adults." JAMA Internal Medicine 176.6 (2016): 816-825.

  • Volunteers were asked about their physical activity level over the preceding year.
  • Compared to the bottom 10% of exercisers, the top 10% had lower rates of esophageal, liver, lung, endometrial, colon, and breast cancer.
  • No association was found between exercising and 13 other cancers (e.g., pancreatic, ovarian, and brain).

Answer: causality vs. association. The study reports an association, not a causal effect.

Quiz

Q - What is wrong with this picture? How would you correct it?

  • Put the x-axis in chronological order
  • Keep the order of counties consistent across dates
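
The two corrections above can be sketched in code. This is a minimal illustration and not part of the original slides: the county labels, dates, and counts are made up, and `COUNTY_ORDER` and `plot_order` are hypothetical names. It parses the date strings into real dates (sorting the raw strings would not be reliably chronological) and reuses one fixed county order for every date.

```python
from datetime import datetime

# Hypothetical fixed order, reused for every date on the x-axis.
COUNTY_ORDER = ["A", "B"]

def plot_order(records):
    """Sort rows chronologically, then by the fixed county order."""
    return sorted(
        records,
        key=lambda r: (
            datetime.strptime(r["date"], "%m/%d/%Y"),  # parse, do not sort strings
            COUNTY_ORDER.index(r["county"]),
        ),
    )

# Made-up counts, deliberately listed out of order.
records = [
    {"date": "05/02/2020", "county": "B", "cases": 12},
    {"date": "04/28/2020", "county": "A", "cases": 20},
    {"date": "05/02/2020", "county": "A", "cases": 9},
    {"date": "04/28/2020", "county": "B", "cases": 15},
]

for row in plot_order(records):
    print(row["date"], row["county"], row["cases"])
```

Feeding the rows to a plotting library in this order keeps time running left to right and keeps each county in the same position within every date group.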

Quiz

Q - Which of these is a way to help visualize uncertainty in the results of a public opinion survey?

a. Resizing based upon population

b. Including margin of error bounds

c. Including a causality discussion

d. Eliminating responses where the person surveyed has no opinion

Quiz

Q - The OkCupid data breach didn’t release the real names and pictures of users. Was it ethical for researchers to release the users’ data? Why?

a. No, their identities could still be easily uncovered from the details provided such as usernames.

b. No, OkCupid users didn’t willingly give their data.

c. Yes, if you don’t know the names it isn’t a problem.

d. Yes, the data were already public.

Quiz

Q - All algorithms are neutral and we should not worry about algorithmic bias. (T/F)

F

  • Gender bias in Google Translate and Amazon's hiring algorithm
  • Racial bias in facial recognition and criminal sentencing

Guidelines for Discussion

  • Listen respectfully. Listen actively and with an ear to understanding others’ views.

  • Criticize ideas, not individuals.

  • Commit to learning, not debating. Comment in order to share information, not to persuade.

  • Avoid blame, speculation, and inflammatory language.

  • Avoid assumptions about any member of the class or generalizations about social groups.

Discussion

What are the questions you should consider to avoid misrepresentation of data or results of data analysis?

  • Is it association or causality?
  • Can I generalize the result to a bigger group?
  • Are the scales correct and not distorted?
  • Are the axes in an intuitive, natural order?
  • Do sizes and areas align with the values they represent?
  • For estimates, how uncertain are they?
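
For the last question, one common way to express the uncertainty of a survey proportion is a margin of error. The sketch below is not from the slides: it assumes a simple random sample and the normal approximation with z = 1.96 (roughly 95% confidence), and the function name `margin_of_error` and the survey numbers are hypothetical.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

# Hypothetical survey: 52% support among 1,000 respondents.
p_hat, n = 0.52, 1000
moe = margin_of_error(p_hat, n)
lower, upper = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe:.1%} (about {lower:.1%} to {upper:.1%})")
```

In this made-up example the interval extends below 50%, so reporting only "52% support" would overstate how certain the majority is; showing the bounds on the chart makes that visible.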

Discussion

What is data ethics?

  • Ethics related to data practices, including, but not limited to, generation, collection, analysis, and dissemination of data
  • Keep asking yourself the following so that you do not violate reasonable expectations of privacy or ethical practice as a researcher:
    • "Should I?"
    • "Is it necessary / reasonable?"
    • "Would it be okay if somebody did the same to me and the data about me?"

Discussion

As you start working on data analyses for the STA199 project, internships, research, etc., what are one or two things you can do to ensure you’re doing the analysis in an ethical way?

  • Be familiar with research ethics, compliance, protocols, etc.

Questions?

Let's Practice Together!

Go to AE 08: Data Science Ethics

Bulletin

  • Watch videos for Prepare: May 24

  • Lab 03 due today at 11:59pm

  • HW 01 due Tuesday, May 24 at 11:59pm

  • Submit your ae08
