AE 02: Part 2, Bechdel Test + R Markdown

Your task is to fill in the blanks denoted by ___.

Introduction

In this mini analysis we work with the data used in the FiveThirtyEight story titled “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”. The data contains information about whether movies pass the Bechdel test, which checks the following three criteria: the movie (1) has to have at least two women in it, who (2) talk to each other, about (3) something besides a man.

Data and Packages

We start with loading the packages we’ll use.

library(fivethirtyeight) # for data
library(tidyverse)

The dataset contains information on 1794 movies released between 1970 and 2013. However, we’ll focus our analysis on movies released between 2000 and 2013. We first create a new dataset containing only these movies, using the pipe operator %>% and filter(). The filter() function keeps the observations that meet the given condition, and you can think of the pipe operator %>% as “then”. Hence, the following code translates to “take the bechdel data, then keep the rows with year between 2000 and 2013 (inclusive)”.

bechdel00_13 <- bechdel %>% 
  filter(between(year, 2000, 2013))
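
To make the “then” reading concrete, here is a minimal sketch on a small made-up table (the `toy` data and its values are hypothetical, not part of the bechdel dataset). It suggests that the piped form, the nested form, and an explicit pair of comparisons all keep the same rows, since between() includes both endpoints.

```r
library(dplyr)

# Hypothetical toy data, not the bechdel dataset
toy <- tibble(
  title = c("A", "B", "C", "D"),
  year  = c(1999, 2000, 2013, 2014)
)

# Piped form: "take toy, then filter"
piped <- toy %>%
  filter(between(year, 2000, 2013))

# Equivalent nested form without the pipe
nested <- filter(toy, between(year, 2000, 2013))

# between() is inclusive, so it matches year >= 2000 & year <= 2013
manual <- filter(toy, year >= 2000, year <= 2013)

piped  # keeps movies "B" and "C"
```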

Let’s preview our data with the glimpse() function:

glimpse(bechdel00_13)

## Rows: 1,278
## Columns: 15
## $ year          <int> 2013, 2012, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 20…
## $ imdb          <chr> "tt1711425", "tt1343727", "tt2024544", "tt1272878", "tt0…
## $ title         <chr> "21 & Over", "Dredd 3D", "12 Years a Slave", "2 Guns", "…
## $ test          <chr> "notalk", "ok-disagree", "notalk-disagree", "notalk", "m…
## $ clean_test    <ord> notalk, ok, notalk, notalk, men, men, notalk, ok, ok, no…
## $ binary        <chr> "FAIL", "PASS", "FAIL", "FAIL", "FAIL", "FAIL", "FAIL", …
## $ budget        <int> 13000000, 45000000, 20000000, 61000000, 40000000, 225000…
## $ domgross      <dbl> 25682380, 13414714, 53107035, 75612460, 95020213, 383624…
## $ intgross      <dbl> 42195766, 40868994, 158607035, 132493015, 95020213, 1458…
## $ code          <chr> "2013FAIL", "2012PASS", "2013FAIL", "2013FAIL", "2013FAI…
## $ budget_2013   <int> 13000000, 45658735, 20000000, 61000000, 40000000, 225000…
## $ domgross_2013 <dbl> 25682380, 13611086, 53107035, 75612460, 95020213, 383624…
## $ intgross_2013 <dbl> 42195766, 41467257, 158607035, 132493015, 95020213, 1458…
## $ period_code   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ decade_code   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

There are ___ such movies.

The financial variables we’ll focus on are the following:

budget_2013: Budget in 2013 inflation-adjusted dollars
domgross_2013: Domestic (US) gross in 2013 inflation-adjusted dollars
intgross_2013: Total international (i.e., worldwide) gross in 2013 inflation-adjusted dollars

And we’ll also use the binary and clean_test variables for grouping.

Analysis

Part 1

Let’s take a look at how median budget and gross vary by whether the movie passed the Bechdel test, which is stored in the binary variable.

bechdel00_13 %>%
  group_by(binary) %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))

## # A tibble: 2 × 4
##   binary med_budget med_domgross med_intgross
##   <chr>       <dbl>        <dbl>        <dbl>
## 1 FAIL     46231854    56199150.    106399210
## 2 PASS     29955246    41913460      75682603
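
The na.rm = TRUE argument matters whenever a group contains missing values. The sketch below uses a small made-up table (the `toy` data and the `gross` column are hypothetical) to show what median() returns per group with and without it.

```r
library(dplyr)

# Hypothetical toy data with one missing gross value in the FAIL group
toy <- tibble(
  binary = c("FAIL", "FAIL", "FAIL", "PASS", "PASS"),
  gross  = c(10, NA, 30, 5, 15)
)

toy_medians <- toy %>%
  group_by(binary) %>%
  summarise(
    med_default = median(gross),                # NA if any value in the group is missing
    med_na_rm   = median(gross, na.rm = TRUE)   # missing values are dropped first
  )

toy_medians
```

In this toy case the FAIL group gets NA for med_default and 20 for med_na_rm, while the PASS group, which has no missing values, gets 10 in both columns.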

Part 2

Next, let’s take a look at how median budget and gross vary by a more detailed indicator of the Bechdel test result. This information is stored in the clean_test variable, which takes on the following values:

ok = passes test
dubious
men = women only talk about men
notalk = women don’t talk to each other
nowomen = fewer than two women

bechdel00_13 %>%
  # group_by(___) %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))

## # A tibble: 1 × 3
##   med_budget med_domgross med_intgross
##        <dbl>        <dbl>        <dbl>
## 1  36145608.     48120751     92571961

Part 3

In order to evaluate how return on investment varies among movies that pass and fail the Bechdel test, we’ll first create a new variable called roi, defined as the ratio of gross (the sum of international and domestic gross) to budget.

bechdel00_13 <- bechdel00_13 %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)

Let’s see which movies have the highest return on investment.

bechdel00_13 %>%
  arrange(desc(roi)) %>% 
  select(title, roi, year)

## # A tibble: 1,278 × 3
##    title                      roi  year
##    <chr>                    <dbl> <int>
##  1 Paranormal Activity      671.   2007
##  2 Napoleon Dynamite        227.   2004
##  3 Once                     190.   2006
##  4 The Devil Inside         155.   2012
##  5 Primer                   142.   2004
##  6 Fireproof                134.   2008
##  7 Saw                      132.   2004
##  8 My Big Fat Greek Wedding 119.   2002
##  9 Insidious                103.   2010
## 10 Paranormal Activity 2     87.4  2010
## # … with 1,268 more rows
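
As a rough illustration of the two steps above, the following sketch applies the same roi formula and the same descending sort to a tiny made-up table. The titles, column names, and amounts are hypothetical and chosen only to make the arithmetic easy to check by hand.

```r
library(dplyr)

# Hypothetical toy data; titles and amounts are made up
toy <- tibble(
  title    = c("Flop", "Low budget hit", "Blockbuster"),
  budget   = c(50, 1, 200),
  domgross = c(10, 40, 300),
  intgross = c(15, 60, 700)
)

toy_roi <- toy %>%
  mutate(roi = (intgross + domgross) / budget) %>%  # same formula as in the exercise
  arrange(desc(roi)) %>%                            # largest roi first
  select(title, roi)

toy_roi
```

Here the low-budget movie comes first with an roi of 100, followed by the blockbuster (5) and the flop (0.5), which mirrors why small-budget films dominate the top of the real ranking.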

Part 4

Below is a visualization of the return on investment by test result. However, it’s difficult to see the distributions due to a few extreme observations.

ggplot(data = bechdel00_13, 
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(title = "Return on investment vs. Bechdel test result",
       x = "Detailed Bechdel result",
       y = "___",
       color = "Binary Bechdel result")

What are those movies with very high returns on investment?

bechdel00_13 %>%
  filter(roi > 150) %>%
  select(title, budget_2013, domgross_2013, year)

## # A tibble: 4 × 4
##   title               budget_2013 domgross_2013  year
##   <chr>                     <int>         <dbl> <int>
## 1 The Devil Inside        1014639      54041622  2012
## 2 Paranormal Activity      505595     121251476  2007
## 3 Once                     173369      10917487  2006
## 4 Napoleon Dynamite        493277      54927590  2004

Zooming in on the movies with roi < ___ provides a better view of how the medians across the categories compare:

ggplot(data = bechdel00_13, mapping = aes(x = clean_test, y = roi, fill = binary)) +
  geom_boxplot() +
  labs(title = "___",
       subtitle = "___", # Something about zooming in to a certain level
       x = "___",
       y = "___",
       fill = "___") + 
  coord_cartesian(ylim = c(0, 15)) # zooming in without dropping data
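
The comment about zooming without dropping data can be checked on a small made-up example. The sketch below (the `toy` data is hypothetical) compares coord_cartesian(), which only changes the visible window, with ylim(), which sets scale limits and removes out-of-range values before the boxplot statistics are computed. It uses layer_data() from ggplot2 to read the computed median of each plot.

```r
library(ggplot2)

# Hypothetical toy data with one extreme value
toy <- data.frame(group = "a", roi = c(1, 2, 3, 4, 100))

base <- ggplot(toy, aes(x = group, y = roi)) +
  geom_boxplot()

# Zoom only: statistics are still computed from all five values
zoomed <- base + coord_cartesian(ylim = c(0, 15))

# Scale limits: 100 is out of bounds and dropped (with a warning)
# before the median is computed
clipped <- base + ylim(0, 15)

layer_data(zoomed)$middle   # 3, the median of all five values
layer_data(clipped)$middle  # 2.5, the median of the four remaining values
```

This is why coord_cartesian() is the safer choice when the goal is to compare medians across categories: the boxes drawn in the zoomed view still describe the full data.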

Submitting Application Exercises

Once you have completed the activity, push your final changes to your GitHub repo.
Make sure you have committed at least three times.
Check that your repo is updated on GitHub, and that’s all you need to do to submit application exercises for participation.

References

This assignment was adapted from the Bechdel exercise.