Introduction
I live in Orlando, Florida, which is less than an hour away from Cape Canaveral. As you might imagine, visiting the space center and watching launches is a “thing” we Orlando folks do fairly often.

I’ve also been getting into R and data science recently via Garrett Grolemund and Hadley Wickham’s excellent R for Data Science. To apply the things I’m learning, I thought it’d be fun to analyze this week’s Tidy Tuesday astronauts dataset.
I’ll follow the analysis process suggested by R for Data Science:
- Import/Tidy
- Explore (via Transforming, Visualizing, and Modeling the data)
- Repeat exploration loop.
- Communicate results.
If you’re not interested in the journey, you can skip to the results. The graphs are cleaner and there’s no code to clutter things.
Load and Tidy
library(tidyverse)
tuesdata <- tidytuesdayR::tt_load('2020-07-14')
##
## Downloading file 1 of 1: `astronauts.csv`
Let’s glimpse our data:
astronauts <- tuesdata$astronauts
glimpse(astronauts)
## Rows: 1,277
## Columns: 24
## $ id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ number <dbl> 1, 2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9,…
## $ nationwide_number <dbl> 1, 2, 1, 1, 2, 2, 2, 4, 4, 3, 3, 3, 4, 4, 5,…
## $ name <chr> "Gagarin, Yuri", "Titov, Gherman", "Glenn, J…
## $ original_name <chr> "ГАГАРИН Юрий Алексеевич", "ТИТОВ Герман Сте…
## $ sex <chr> "male", "male", "male", "male", "male", "mal…
## $ year_of_birth <dbl> 1934, 1935, 1921, 1921, 1925, 1929, 1929, 19…
## $ nationality <chr> "U.S.S.R/Russia", "U.S.S.R/Russia", "U.S.", …
## $ military_civilian <chr> "military", "military", "military", "militar…
## $ selection <chr> "TsPK-1", "TsPK-1", "NASA Astronaut Group 1"…
## $ year_of_selection <dbl> 1960, 1960, 1959, 1959, 1959, 1960, 1960, 19…
## $ mission_number <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 3, 1, 2, 1,…
## $ total_number_of_missions <dbl> 1, 1, 2, 2, 1, 2, 2, 2, 2, 3, 3, 3, 2, 2, 3,…
## $ occupation <chr> "pilot", "pilot", "pilot", "PSP", "Pilot", "…
## $ year_of_mission <dbl> 1961, 1961, 1962, 1998, 1962, 1962, 1970, 19…
## $ mission_title <chr> "Vostok 1", "Vostok 2", "MA-6", "STS-95", "M…
## $ ascend_shuttle <chr> "Vostok 1", "Vostok 2", "MA-6", "STS-95", "M…
## $ in_orbit <chr> "Vostok 2", "Vostok 2", "MA-6", "STS-95", "M…
## $ descend_shuttle <chr> "Vostok 3", "Vostok 2", "MA-6", "STS-95", "M…
## $ hours_mission <dbl> 1.77, 25.00, 5.00, 213.00, 5.00, 94.00, 424.…
## $ total_hrs_sum <dbl> 1.77, 25.30, 218.00, 218.00, 5.00, 519.33, 5…
## $ field21 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ eva_hrs_mission <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.…
## $ total_eva_hrs <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.…
Each row is an astronaut and the mission they accomplished. Columns are variables whose meaning is fairly clear from the name, with the exception of field21.
Let’s rename it. The docs say that it represents “Instances of EVA by mission”:
astronauts <- astronauts %>%
  rename(evas_by_mission = field21)
Exploration Loop 1
I’m curious what the spread of astronauts is by sex.
astronauts %>%
  ggplot(aes(sex)) +
  geom_bar()

Unfortunately, this isn’t surprising. I wonder if the ratio of male to female astronauts has become more equal over time.[1] Let’s see:
astronauts %>%
  ggplot(aes(year_of_mission, fill = sex)) +
  geom_bar()

It’s not crystal clear from here whether the ratio has improved over time. Let’s confirm explicitly by creating, plotting, and fitting a line to a ratio variable.
astronauts %>%
  group_by(year_of_mission) %>%
  summarise(ratio = sum(sex == "female") / sum(sex == "male")) %>%
  ggplot(aes(year_of_mission, ratio)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `summarise()` ungrouping output (override with `.groups` argument)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Looks like there has been more equality since the 60s, but there may be some tapering off starting in the 2000s.
What the heck happened in the early 1960s? That’s an unusually high ratio.
astronauts %>%
  filter(between(year_of_mission, 1960, 1970)) %>%
  group_by(year_of_mission) %>%
  count(sex)
## # A tibble: 11 x 3
## # Groups: year_of_mission [10]
## year_of_mission sex n
## <dbl> <chr> <int>
## 1 1961 male 2
## 2 1962 male 5
## 3 1963 female 1
## 4 1963 male 2
## 5 1964 male 3
## 6 1965 male 12
## 7 1966 male 10
## 8 1967 male 1
## 9 1968 male 7
## 10 1969 male 23
## 11 1970 male 5
Ah. Only three astronauts went on missions in 1963 and one of them was female. Makes sense now.
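A ratio built from a handful of rows is easy to over-read, so one option is to keep each year's headcount next to it. The sketch below is only illustrative: it retypes the 1961 to 1964 counts from the table above by hand rather than using the real dataset, and the names `early_years`, `yearly`, `astronauts`, and `share_female` are mine. It also computes the female share of all astronauts, which (unlike female divided by male) stays defined in a year with no men.

```r
library(dplyr)

# Hand-typed copy of the 1961-1964 counts shown above (not the real dataset)
early_years <- tibble(
  year_of_mission = c(1961, 1962, 1963, 1963, 1964),
  sex             = c("male", "male", "female", "male", "male"),
  n               = c(2, 5, 1, 2, 3)
)

yearly <- early_years %>%
  group_by(year_of_mission) %>%
  summarise(
    astronauts   = sum(n),                                       # headcount that year
    ratio        = sum(n[sex == "female"]) / sum(n[sex == "male"]),
    share_female = sum(n[sex == "female"]) / sum(n),
    .groups      = "drop"
  )

print(yearly)
```

With the headcount in view, the 1963 ratio of 0.5 is clearly resting on just three astronauts.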
Exploration Loop 2
I’m curious what the spread of astronauts is by nationality.
astronauts %>%
  ggplot(aes(nationality)) +
  geom_bar()

That’s not useful. Let’s drop nationalities that appear 10 times or fewer in the dataset, flip the axis, and sort.
astronauts %>%
  add_count(nationality) %>%
  filter(n > 10) %>%
  ggplot(aes(x = fct_reorder(nationality, n))) +
  geom_bar() +
  coord_flip()

Better. Looks like the US dominates missions overall.
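Dropping the rare nationalities throws those rows away entirely. A possible alternative, sketched here on made-up data, is to fold them into a single "Other" bar with `fct_lump_min()` from forcats (I'm assuming a reasonably recent forcats; the `toy` tibble and the threshold of 3 are invented for the example).

```r
library(dplyr)
library(forcats)

# Made-up nationalities, just to show the mechanics (not the real dataset)
toy <- tibble(
  nationality = c(rep("U.S.", 5), rep("U.S.S.R/Russia", 3), "Japan", "Canada")
)

lumped <- toy %>%
  # Levels seen fewer than 3 times are merged into "Other"
  mutate(nationality = fct_lump_min(nationality, min = 3)) %>%
  count(nationality, sort = TRUE)

print(lumped)
```

Whether an "Other" bar helps or just adds noise depends on the question; for a "who dominates" chart, dropping is probably fine.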
Let’s try looking at the ratio of US astronauts on missions over time:
astronauts %>%
  group_by(year_of_mission) %>%
  summarise(ratio = sum(nationality == "U.S.") / n()) %>%
  ggplot(aes(year_of_mission, ratio)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `summarise()` ungrouping output (override with `.groups` argument)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Interesting. I didn’t realize the U.S. peaked in terms of share of astronauts sent to space in the mid-90s. This makes me wonder how the number of U.S. astronauts on missions has changed over time.
astronauts %>%
  count(year_of_mission, wt = sum(nationality == "U.S.")) %>%
  ggplot(aes(year_of_mission, n)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Super interesting! I remember thinking that the end of the shuttle program under Obama would be an inflection point in NASA’s activity, but this suggests that the inflection point came before Obama was even elected: ~1994.
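The share and the raw count answer different questions, so it can help to compute them side by side. This is a minimal sketch on made-up rows, not the real dataset; `toy`, `us_by_year`, `us_count`, and `us_share` are names I invented for the example.

```r
library(dplyr)

# Made-up rows, just to show the mechanics (not the real dataset)
toy <- tibble(
  year_of_mission = c(1994, 1994, 1994, 1994, 2010, 2010),
  nationality     = c("U.S.", "U.S.", "U.S.", "Japan", "U.S.", "U.S.S.R/Russia")
)

us_by_year <- toy %>%
  group_by(year_of_mission) %>%
  summarise(
    us_count = sum(nationality == "U.S."),   # raw number of U.S. rows that year
    us_share = mean(nationality == "U.S."),  # same value as us_count / n()
    .groups  = "drop"
  )

print(us_by_year)
```

Taking the `mean()` of a logical vector gives the proportion of `TRUE` values, so it is a compact stand-in for `sum(...) / n()`.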
Results
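This data set suggests three interesting conclusions:
1. The ratio of female to male astronauts on missions has risen since the 60s, though it may have tapered off starting in the 2000s.
2. The U.S. share of astronauts sent to space peaked in the mid-90s.
3. The raw number of U.S. astronauts on missions has been in decline since the late 90s, long before Obama cancelled the Constellation Program.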

[1] I’m going to have a daughter soon, and if she wants to be an astronaut, I sure hope she doesn’t have to deal with any bias.


