The purpose of this project is to analyze any possible relationship between birthdays and holidays. We’ll use the R libraries dplyr, ggplot2, lubridate, as well as the Birthdays data set from the mosaicData library.
First we will draw a plot measuring the number of births for each day in our time frame.
DailyBirths <- Birthdays %>%
group_by(date) %>%
summarise(total = sum(births))
DailyBirths %>%
ggplot(aes(x=date, y = total)) + geom_point()
Next we draw a similar plot, only this time measured by week of the year.
WeeklyBirths <- Birthdays %>%
mutate(week = week(date)) %>%
group_by(week) %>%
summarise(total = sum(births))
WeeklyBirths %>%
ggplot(aes(x=week, y = total)) + geom_point()
Now by month of the year.
MonthlyBirths <- Birthdays %>%
mutate(month = month(date)) %>%
group_by(month) %>%
summarise(total = sum(births))
MonthlyBirths %>%
ggplot(aes(x=month, y = total)) + geom_point()
Here we plot by day of the year.
YDailyBirths <- Birthdays %>%
mutate(yday = yday(date)) %>%
group_by(yday) %>%
summarise(total = sum(births))
head(YDailyBirths)
## # A tibble: 6 x 2
## yday total
## <dbl> <int>
## 1 1 160369
## 2 2 169896
## 3 3 180036
## 4 4 182854
## 5 5 184145
## 6 6 186726
YDailyBirths %>%
ggplot(aes(x=yday, y = total)) + geom_point()
Next by day of the week.
WDailyBirths <- Birthdays %>%
mutate(wday = wday(date, label = TRUE)) %>%
group_by(wday) %>%
summarise(total = sum(births))
head(WDailyBirths)
## # A tibble: 6 x 2
## wday total
## <ord> <int>
## 1 Sun 8647150
## 2 Mon 10372019
## 3 Tues 10813928
## 4 Wed 10533539
## 5 Thurs 10434966
## 6 Fri 10593324
WDailyBirths %>%
ggplot(aes(x=wday, y = total)) + geom_point()
For the remainder of this document, we restrict to the year 1980. First we graph by day of the week for this period.
MyYear <- DailyBirths %>%
filter(year(date) == 1980) %>%
mutate(week_day = wday(date, label = TRUE))
MyYear %>%
ggplot(aes(x=date, y = total, color=week_day)) + geom_line()
Finally we read in a list of holidays and overlay that information with our plot.
Holidays <- read.csv("http://tiny.cc/dcf/US-Holidays.csv", stringsAsFactors = FALSE) %>%
mutate(date = as.POSIXct(dmy(date))) %>%
mutate(week_day = wday(date, label = TRUE)) %>%
filter(year(date) == 1980)
MyYear %>%
ggplot(aes(x=date, y = total, color=week_day)) +
geom_line() +
geom_vline(data = Holidays, aes(xintercept = as.numeric(date), color = week_day)) +
geom_text(data = Holidays, mapping=aes(x = date, y = 11000, label = holiday), angle = 45, size = 3, color = "black")