Data Analysis

Overview

The goal of this analysis is to understand how dependency ratios in Washington State are evolving temporally and spatially. How are the statewide ratios evolving? Which counties have the highest ratios? Are there regional differences in dependency ratios? If so, are there any patterns to the distribution of high dependency ratios?

Statewide dependency ratios

As the chart and table below show, the total dependency ratio for the State is increasing. Child and aged dependency ratios are moving in opposite directions. Child dependency is decreasing slowly while the aged dependency ratio is increasing. The aged dependency ratio is driving the trend in total dependency over the 10 years from 2011 to 2021.

df_counties |> 
  filter(geography == "Washington State") |> 
  select(-(age_65:age_1)) |> 
  pivot_longer(total_dep_ratio:aged_dep_ratio, names_to = "dep_ratio", values_to = "ratio_val") |> 
  mutate(
    dep_ratio = fct_rev(fct_recode(as_factor(dep_ratio), 
                             "Total" = "total_dep_ratio",
                             "Child" = "child_dep_ratio",
                             "Aged" = "aged_dep_ratio"))
  ) |> 
  ggplot(aes(x=year, y=ratio_val, fill=dep_ratio)) +
  geom_col(position = "dodge", colour="white") +
  labs(
    x = element_blank(),
    y = "Value",
    fill = "Dependency ratio",
    caption="Data source: Washington State County Demography Dashboard",
    title = paste("Total dependency increases from", start_year, "to", last_year),
    subtitle = "Child and aged dependencies move in opposite directions"
  ) +
  theme(
    legend.position = "bottom"
  )

df_counties |> 
  filter(geography == "Washington State") |> 
  select( "Year"=year, "Child"=child_dep_ratio, "Aged"=aged_dep_ratio, "Total"=total_dep_ratio) |> 
  mutate(
    pct_change_total = round((Total - lag(Total, n=1))/lag(Total, n=1)*100, 2)
  ) |> 
  gt() |> 
  tab_header(
    title = paste("Changes in dependency ratios in Washington State,", start_year, "-", last_year)
  ) |> 
  cols_label(
    pct_change_total = "Percent Change Total"
  ) |> 
  tab_options(
    table.width = pct(100)
  )

Changes in dependency ratios in Washington State, 2011 - 2021
Year	Child	Aged	Total	Percent Change Total
2011	28.29	18.40	46.69	NA
2012	28.22	19.22	47.44	1.61
2013	28.17	20.01	48.18	1.56
2014	28.11	20.69	48.81	1.31
2015	28.07	21.38	49.45	1.31
2016	28.08	21.99	50.06	1.23
2017	28.14	22.62	50.76	1.40
2018	28.15	23.37	51.52	1.50
2019	28.07	24.09	52.16	1.24
2020	27.94	24.86	52.80	1.23
2021	27.73	25.71	53.45	1.23
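
As a quick check on the last column, the percent change can be recomputed from the rounded totals shown in the table. This is a minimal base R sketch, not the code that produced the table; the vector name `total` is introduced here for illustration, and because it starts from rounded values the last digit could in principle differ from a calculation on the underlying data.

```r
# Total dependency ratios for 2011-2021, copied from the table above (rounded).
total <- c(46.69, 47.44, 48.18, 48.81, 49.45, 50.06,
           50.76, 51.52, 52.16, 52.80, 53.45)
names(total) <- 2011:2021

# Percent change relative to the previous year:
# (current - previous) / previous * 100
pct_change <- round(diff(total) / head(total, -1) * 100, 2)
pct_change
```

The first year has no predecessor, which is why the table shows NA for 2011 and the vector here starts at 2012.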

Understanding the total dependency ratio trend

To quantify how total dependency is changing in Washington, a simple linear model was fitted with the year coded from 1 (2011) to 11 (2021). The model indicates that the total dependency ratio increases by about 0.67 points each year. Because year 0 corresponds to 2010, the intercept can be interpreted as the modelled value of the total dependency ratio in 2010.

lm_fit <- df_counties |> 
  filter(geography == "Washington State") |>
  mutate(
    year = as.numeric(year)
  ) |> 
  lm(total_dep_ratio ~ year, data = _)

model_estimates <- broom::tidy(lm_fit)

model_estimates |> 
  gt() |> 
  tab_header(
    title = "Simple linear model of total dependency ratio"
  ) |> 
  fmt_number(
    decimals = 2
  )

Simple linear model of total dependency ratio
term	estimate	std.error	statistic	p.value
(Intercept)	46.09	0.03	1,468.50	0.00
year	0.67	0.00	145.20	0.00
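
The estimates can be approximately reproduced from the rounded statewide totals in the earlier table. The sketch below is self-contained base R and assumes the year is coded 1 to 11, as in the model above; the names `wa_total`, `fit`, `slope_by_hand` and `intercept_by_hand` are introduced here for illustration only.

```r
# Statewide total dependency ratios, 2011-2021 (rounded values from the table).
wa_total <- data.frame(
  year  = 1:11,  # assumed coding: 1 = 2011, ..., 11 = 2021
  ratio = c(46.69, 47.44, 48.18, 48.81, 49.45, 50.06,
            50.76, 51.52, 52.16, 52.80, 53.45)
)

fit <- lm(ratio ~ year, data = wa_total)
round(coef(fit), 2)

# For a simple linear regression the slope is cov(x, y) / var(x),
# and the intercept places the line through the point of means.
slope_by_hand     <- cov(wa_total$year, wa_total$ratio) / var(wa_total$year)
intercept_by_hand <- mean(wa_total$ratio) - slope_by_hand * mean(wa_total$year)
c(intercept = intercept_by_hand, slope = slope_by_hand)
```

Both routes give the same coefficients, which makes the interpretation concrete: the slope is the average yearly increase, and the intercept is where the fitted line sits at year 0.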

The plot below shows the model fit and confidence interval.

df_counties |> 
  filter(geography == "Washington State") |>
  ggplot(aes(x=as.numeric(year), y=total_dep_ratio)) +
  geom_point() +
  geom_smooth(method="lm", se=TRUE) +
  scale_x_continuous(
    breaks = 1:11,
    minor_breaks = NULL,
    labels = as.character(2011:2021)
  ) +
  labs(
    title = paste("Washington State total dependency ratio trend",start_year,"-",last_year),
    x = "Year",
    y = "Total dependency ratio",
    caption="Data source: Washington State County Demography Dashboard"
  )

The confidence interval is very tight around the trend line.

Predicting future total dependency ratios

We can project this model forward a few years to estimate total dependency ratios in subsequent years. When the 2022-23 demographic data become available, they can be compared with model predictions and the model can be updated.

# 2022 and 2023 are the 12th and 13th years in the series.
TDR_preds <- predict(lm_fit, newdata = data.frame(year = c(12,13)), type="response")

In this case, the model predicts total dependency ratios of 54.15 and 54.82 for 2022 and 2023, respectively.
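
Point predictions are easier to compare with future data when they come with an interval. The sketch below refits the model from the rounded statewide totals in the earlier table (so it is an approximation of the fit above, not the original object) and asks `predict()` for 95% prediction intervals. The names `wa_total`, `fit`, `new_years` and `preds` are introduced here for illustration.

```r
# Statewide total dependency ratios, 2011-2021 (rounded values from the table).
wa_total <- data.frame(
  year  = 1:11,  # assumed coding: 1 = 2011, ..., 11 = 2021
  ratio = c(46.69, 47.44, 48.18, 48.81, 49.45, 50.06,
            50.76, 51.52, 52.16, 52.80, 53.45)
)
fit <- lm(ratio ~ year, data = wa_total)

# 2022 and 2023 are the 12th and 13th years in the series.
new_years <- data.frame(year = c(12, 13))
preds <- predict(fit, newdata = new_years, interval = "prediction", level = 0.95)
rownames(preds) <- c("2022", "2023")
round(preds, 2)
```

The `fit` column holds the point predictions, and `lwr` and `upr` bound the range in which a new observation would be expected to fall if the linear trend continues.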

Counties with highest total dependency ratios

The table below shows the counties with the highest total dependency ratios in 2021. The counties in this list all have much higher aged dependency ratios than the statewide value (25.71) while generally having typical child dependency ratios. Jefferson County is the notable exception, with a child dependency ratio of 18.58.

df_high_ratio <- df_counties |> 
  filter(geography != "Washington State" & year == 2021) |> 
  arrange(desc(total_dep_ratio)) |> 
  select("County"= geography, "Child"=child_dep_ratio, "Aged"=aged_dep_ratio, "Total"=total_dep_ratio) |>
  head(10)

df_high_ratio |> 
  gt() |> 
  tab_header(
    title = paste("Counties with highest dependency ratios in",last_year)
  ) |> 
  tab_options(
    table.width = pct(100)
  )

Counties with highest dependency ratios in 2021
County	Child	Aged	Total
Jefferson	18.58	79.39	97.97
Garfield	31.60	59.65	91.26
Pacific	24.40	66.51	90.92
Wahkiakum	24.47	64.54	89.01
Clallam	24.55	62.54	87.10
San Juan	19.69	67.15	86.84
Lincoln	30.61	53.93	84.55
Pend Oreille	27.34	51.14	78.48
Columbia	25.46	52.23	77.69
Ferry	27.13	50.42	77.55
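
To put these numbers next to the statewide values, each county's ratio can be expressed as a multiple of the 2021 statewide ratio. This is a small base R sketch using the values from the table above and the statewide 2021 figures (27.73 child, 25.71 aged); the data frame `top10` and its column names are introduced here for illustration.

```r
# Top 10 counties by total dependency ratio in 2021 (values from the table above).
top10 <- data.frame(
  county = c("Jefferson", "Garfield", "Pacific", "Wahkiakum", "Clallam",
             "San Juan", "Lincoln", "Pend Oreille", "Columbia", "Ferry"),
  child  = c(18.58, 31.60, 24.40, 24.47, 24.55, 19.69, 30.61, 27.34, 25.46, 27.13),
  aged   = c(79.39, 59.65, 66.51, 64.54, 62.54, 67.15, 53.93, 51.14, 52.23, 50.42)
)

# Statewide 2021 values from the statewide table.
state_child <- 27.73
state_aged  <- 25.71

# County ratio divided by the statewide ratio; 1 means "same as the state".
top10$child_vs_state <- round(top10$child / state_child, 2)
top10$aged_vs_state  <- round(top10$aged / state_aged, 2)

top10[order(-top10$aged_vs_state), ]
```

Multiples close to 1 in the child column and well above 1 in the aged column correspond to the pattern described above.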

Where these counties are located

In order to understand how dependency ratios vary across the state, shape files for the counties are imported and the dependency ratios are overlaid on a map of the state.

wa_county <- counties(state = "WA", cb = TRUE, class = "sf", progress_bar = FALSE)

label_size <- 3.5

wa_county |> 
  left_join(df_high_ratio, by=c("NAME"="County")) |>
  mutate(
    NAME = if_else(is.na(Total), "", NAME)
  ) |>
  ggplot() +
  geom_sf(aes(fill=Total), colour="white") +
  geom_label_repel(aes(label = NAME, geometry = geometry),
                  stat = "sf_coordinates", size = label_size) +
  # for the same colour scale for all dependency maps with limits
  scale_fill_continuous(type = "viridis",limits=c(0, 100)) +
  map_theme +
  labs(
    title="Counties with highest total dependency ratios in 2021",
    caption="Data source: Washington State County Demography Dashboard",
    fill="Total dependency ratio"
  )

The map suggests that the high-dependency counties are not randomly distributed: they sit at the eastern and western extremes of the state. Five of the top six are on the west coast, while the other five are in the eastern quarter of the state.

Child dependency comparisons

These counties have roughly similar rates of child dependency, with Jefferson and San Juan Counties being notable exceptions.

wa_county |> 
  left_join(df_high_ratio, by=c("NAME"="County")) |>
  mutate(
    NAME = if_else(is.na(Child), "", NAME)
  ) |>
  ggplot() +
  geom_sf(aes(fill=Child), colour="white") +
  geom_label_repel(aes(label = NAME, geometry = geometry),
                  stat = "sf_coordinates", size = label_size) +
  # for the same colour scale for all dependency maps with limits
  scale_fill_continuous(type = "viridis",limits=c(0, 100)) +
  map_theme +
  labs(
    title="Child dependency ratios for counties with highest\ntotal dependency ratios in 2021",
    caption="Data source: Washington State County Demography Dashboard",
    fill="Child dependency ratio"
  )

That said, there is a statistically significant difference, with \(\alpha=0.05\), between the mean child dependency ratios of the west coast and eastern counties.

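# West coast counties: Jefferson, Pacific, Wahkiakum, Clallam, San Juan.
# Note: Jefferson is entered as 18.85 here, while the 2021 table lists 18.58;
# the output below was produced with 18.85.
west_coast_CDR <- c(18.85, 24.40, 24.47, 24.55, 19.69)
# Eastern counties: Garfield, Lincoln, Pend Oreille, Columbia, Ferry.
eastern_CDR <- c(31.6, 30.61, 27.34, 25.46, 27.13)
# Statewide child dependency ratio in 2021.
state_CDR_mean <- 27.73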

t.test(
  west_coast_CDR, # west coast counties
  eastern_CDR  # eastern counties
)


    Welch Two Sample t-test

data:  west_coast_CDR and eastern_CDR
t = -3.5038, df = 7.9094, p-value = 0.00818
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.016492  -2.055508
sample estimates:
mean of x mean of y 
   22.392    28.428
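
To make the test less of a black box, the Welch statistic and its degrees of freedom can be computed by hand. The sketch below uses the same two vectors exactly as entered in the code above and only base R; the intermediate names `v_west`, `v_east`, `t_stat`, `df_welch` and `p_value` are introduced here for illustration.

```r
# Child dependency ratios as entered in the t-test above.
west_coast_CDR <- c(18.85, 24.40, 24.47, 24.55, 19.69)
eastern_CDR    <- c(31.60, 30.61, 27.34, 25.46, 27.13)

n_west <- length(west_coast_CDR)
n_east <- length(eastern_CDR)

# Variance of each group mean: s^2 / n.
v_west <- var(west_coast_CDR) / n_west
v_east <- var(eastern_CDR) / n_east

# Welch t statistic: difference in means over its (unpooled) standard error.
t_stat <- (mean(west_coast_CDR) - mean(eastern_CDR)) / sqrt(v_west + v_east)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v_west + v_east)^2 /
  (v_west^2 / (n_west - 1) + v_east^2 / (n_east - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

c(t = t_stat, df = df_welch, p = p_value)
```

Unlike the classic pooled test, this version does not assume the two groups share a variance, which is why the degrees of freedom are not a whole number.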

If we compare the west coast counties to the statewide value (27.73), their mean child dependency ratio (22.39) is significantly lower at the \(\alpha=0.05\) significance level.

t.test(
  x=west_coast_CDR,
  mu = state_CDR_mean
)


    One Sample t-test

data:  west_coast_CDR
t = -4.1649, df = 4, p-value = 0.01409
alternative hypothesis: true mean is not equal to 27.73
95 percent confidence interval:
 18.83351 25.95049
sample estimates:
mean of x 
   22.392

If we compare the eastern counties to the statewide value, their mean child dependency ratio (28.43) does not differ significantly at the \(\alpha=0.05\) significance level.

t.test(
  x=eastern_CDR,
  mu = state_CDR_mean
)


    One Sample t-test

data:  eastern_CDR
t = 0.60638, df = 4, p-value = 0.577
alternative hypothesis: true mean is not equal to 27.73
95 percent confidence interval:
 25.23205 31.62395
sample estimates:
mean of x 
   28.428
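
The one-sample test follows the same logic with a fixed reference value in place of a second group. The sketch below recomputes the statistic, p-value and confidence interval for the eastern counties by hand in base R; `state_CDR`, `se`, `t_stat`, `p_value` and `ci` are names introduced here for illustration.

```r
# Child dependency ratios of the five eastern counties, as entered above.
eastern_CDR <- c(31.60, 30.61, 27.34, 25.46, 27.13)
state_CDR   <- 27.73  # statewide child dependency ratio in 2021

n  <- length(eastern_CDR)
se <- sd(eastern_CDR) / sqrt(n)  # standard error of the sample mean

# t statistic: distance of the sample mean from the reference value, in SE units.
t_stat  <- (mean(eastern_CDR) - state_CDR) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the mean of the eastern counties.
ci <- mean(eastern_CDR) + c(-1, 1) * qt(0.975, df = n - 1) * se

list(t = t_stat, p = p_value, ci = ci)
```

With only five counties per group the interval is wide, so a non-significant result here says the data are compatible with the statewide value, not that the two are identical.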

Aged dependency comparisons

The map below shows aged dependency ratios for these counties. The west coast counties have visibly higher ratios than the eastern counties.

wa_county |> 
  left_join(df_high_ratio, by=c("NAME"="County")) |>
  mutate(
    NAME = if_else(is.na(Aged), "", NAME)
  ) |>
  ggplot() +
  geom_sf(aes(fill=Aged), colour="white") +
  geom_label_repel(aes(label = NAME, geometry = geometry),
                  stat = "sf_coordinates", size = label_size) +
  # for the same colour scale for all dependency maps with limits
  scale_fill_continuous(type = "viridis",limits=c(0, 100)) +
  map_theme +
  labs(
    title="Aged dependency ratios for counties with highest\ntotal dependency ratios in 2021",
    caption="Data source: Washington State County Demography Dashboard",
    fill="Aged dependency ratio"
  )

With \(\alpha = 0.05\), there is a statistically significant difference between the means of the two regions’ counties.

t.test(
  c(79.39, 66.51, 64.54, 62.54, 67.15), # west coast counties
  c(59.65, 53.93, 51.14, 52.23, 50.42)  # eastern counties
)


    Welch Two Sample t-test

data:  c(79.39, 66.51, 64.54, 62.54, 67.15) and c(59.65, 53.93, 51.14, 52.23, 50.42)
t = 4.2993, df = 6.2829, p-value = 0.004584
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  6.359444 22.744556
sample estimates:
mean of x mean of y 
   68.026    53.474

The aged dependency ratios for both county groups (means of 68.03 and 53.47) are so much higher than the statewide value (25.71) that performing a statistical test is unnecessary.

Key takeaway

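Among the counties with the highest total dependency ratios, aged dependency is far above the statewide value in every case. Furthermore, there are differences between the regional subgroups in these 10 counties: the west coast counties have significantly higher aged dependency and significantly lower child dependency than the eastern counties.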

Modeling changes in total dependency

Predicting future changes to the total dependency ratio for counties can help county planners understand what budgetary pressures they might expect to face in the coming years. While attempting to forecast far into the future comes with risk, we can use simple linear models to project into the near future for the counties with the highest total dependency ratios.

The trend lines for all 10 counties show a clear upward movement.

high_ratio_counties <- df_high_ratio |> 
  select(County) |> 
  pull()

df_county_models <- df_counties |> 
  filter(geography %in% high_ratio_counties) |> 
  select(-c(age_65:age_1, child_dep_ratio, aged_dep_ratio)) |>
  mutate(
    year = as.numeric(year)
  ) |> 
  group_by(geography) |> 
  nest() |> 
  mutate(
    model = map(.x=data, ~lm(total_dep_ratio ~ year, data=.x) |> tidy())
  )

df_county_models |> 
  unnest(data) |>
  ggplot(aes(x=year, y=total_dep_ratio)) +
  geom_point() +
  geom_smooth(method = "lm", se=TRUE) +
  facet_wrap(~geography, ncol=2) +
  scale_x_continuous(
    breaks = c(1:11),
    labels = as.character(2011:2021),
    minor_breaks = NULL
  ) +
  labs(
    x=element_blank(),
    y="Total dependency ratio",
    title="Simple regression models of total dependency ratios",
    caption="Data source: Washington State County Demography Dashboard"
  ) +
  theme(
    axis.text.x = element_text(
      angle = 45
    )
  )

The following table shows the model slopes for each of the ten counties. Each slope is the expected change in the total dependency ratio per year. For Jefferson County, the county with the highest total dependency ratio, we can expect the total dependency ratio to rise by 3.31 points each year. This is almost five times Washington State’s model slope of 0.67. Garfield County has the steepest slope but, as discussed below, the integrity of the Garfield County data is in question. The standard error for the Garfield County slope reflects this.

df_county_models |> 
  select(-data) |> 
  unnest(model) |>
  select(-statistic) |> 
  pivot_wider(names_from = term, values_from = estimate) |> 
  rename(Intercept = "(Intercept)", "Slope"=year) |> 
  # After pivoting, the slope rows have NA in the Intercept column; keep only those.
  filter(is.na(Intercept)) |> 
  select(-Intercept, County = "geography", Slope, "Std Error" = "std.error", "P Value"="p.value") |> 
  relocate(Slope, .before = "Std Error") |> 
  arrange(desc(Slope)) |> 
  ungroup() |> 
  gt() |> 
  tab_header(
    title = "Rates of change for total dependency ratio by county"
  ) |>
  fmt_number(
    decimals = 2
  ) |> 
  cols_align(
    align = "left",
    columns = County
  )

Rates of change for total dependency ratio by county
County	Slope	Std Error	P Value
Garfield	3.44	0.31	0.00
Jefferson	3.31	0.06	0.00
San Juan	2.72	0.07	0.00
Ferry	2.46	0.04	0.00
Pacific	2.44	0.07	0.00
Clallam	2.26	0.03	0.00
Pend Oreille	2.07	0.05	0.00
Wahkiakum	2.06	0.04	0.00
Lincoln	1.99	0.07	0.00
Columbia	1.49	0.11	0.00
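
Two quick calculations help when reading this table: how each county slope compares with the statewide slope of 0.67, and how wide an approximate 95% confidence interval around each slope is. The sketch below works from the rounded table values in base R. It assumes each county model was fitted to 11 annual observations (2011 to 2021), giving 9 residual degrees of freedom; the data frame `county_slopes` and its column names are introduced here for illustration.

```r
# Slopes and standard errors copied from the table above (rounded).
county_slopes <- data.frame(
  county = c("Garfield", "Jefferson", "San Juan", "Ferry", "Pacific",
             "Clallam", "Pend Oreille", "Wahkiakum", "Lincoln", "Columbia"),
  slope  = c(3.44, 3.31, 2.72, 2.46, 2.44, 2.26, 2.07, 2.06, 1.99, 1.49),
  se     = c(0.31, 0.06, 0.07, 0.04, 0.07, 0.03, 0.05, 0.04, 0.07, 0.11)
)
state_slope <- 0.67

# County slope as a multiple of the statewide slope.
county_slopes$times_state <- round(county_slopes$slope / state_slope, 2)

# Approximate 95% confidence interval: slope +/- t * SE,
# with 11 - 2 = 9 degrees of freedom (assumed, see above).
t_crit <- qt(0.975, df = 11 - 2)
county_slopes$ci_low  <- round(county_slopes$slope - t_crit * county_slopes$se, 2)
county_slopes$ci_high <- round(county_slopes$slope + t_crit * county_slopes$se, 2)

county_slopes
```

Garfield County's interval is by far the widest in this list, which is the numerical counterpart of the larger standard error noted above.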

Potential problems with Garfield County data

The Garfield County data has an interesting bend in it that occurred in 2016. This behaviour isn’t seen in the other counties. The Garfield County data for 2015-2016 shows the following:

df_counties |> 
  filter(geography == "Garfield" & year %in% c(2015,2016)) |>
  select(-(age_65:age_1)) |>
  select("Child"=child_dep_ratio, "Aged"=aged_dep_ratio, "Total"=total_dep_ratio) |> 
  gt() |> 
  tab_header(
    title = paste("Garfield County dependency data, 2015-2016")
  ) |> 
  tab_options(
    table.width = pct(100)
  )

Garfield County dependency data, 2015-2016
Child	Aged	Total
25.72	44.72	70.43
30.44	51.62	82.06
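
The size of the jump is easier to judge as a year-over-year difference. The sketch below takes the two rows of the table above, assuming the first row is 2015 and the second is 2016 (the table has no year column), and compares the change in the total ratio with Garfield County's fitted slope of 3.44 from the county models. The names `garfield`, `jump` and `garfield_slope` are introduced here for illustration.

```r
# Garfield County ratios; row order assumed to be 2015 then 2016.
garfield <- data.frame(
  year  = c(2015, 2016),
  child = c(25.72, 30.44),
  aged  = c(44.72, 51.62),
  total = c(70.43, 82.06)
)

# Change from 2015 to 2016 for each ratio.
jump <- round(sapply(garfield[, c("child", "aged", "total")], diff), 2)
jump

# How many "typical" years of change does the one-year jump in the total represent?
garfield_slope <- 3.44  # fitted slope for Garfield County from the table of county slopes
round(jump[["total"]] / garfield_slope, 1)
```

A single-year change several times larger than the fitted yearly trend is what produces the visible bend in the Garfield County series.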

There is a noticeable jump in both the child and aged dependency ratios from 2015 to 2016. The total population of the county, however, did not change significantly over the decade from 2011 to 2021, as shown in the following table:

df_counties |> 
  filter(geography=="Garfield") |> 
  select("Year" = year, age_65:age_1) |> 
  pivot_longer(cols=age_65:age_1, names_to = "brackets", values_to = "value") |> 
  group_by(Year) |> 
  summarise(`Total Population` = ceiling(sum(value))) |> 
  gt()

Year	Total Population
2011	2264
2012	2271
2013	2271
2014	2267
2015	2296
2016	2228
2017	2233
2018	2249
2019	2271
2020	2287
2021	2301
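
Year-over-year changes make the population series easier to scan. The sketch below uses the totals from the table above in base R; the data frame `garfield_pop` and its column names are introduced here for illustration.

```r
# Garfield County total population by year (values from the table above).
garfield_pop <- data.frame(
  year       = 2011:2021,
  population = c(2264, 2271, 2271, 2267, 2296, 2228,
                 2233, 2249, 2271, 2287, 2301)
)

# Absolute and percent change from the previous year (NA for the first year).
previous <- c(NA, head(garfield_pop$population, -1))
garfield_pop$change     <- garfield_pop$population - previous
garfield_pop$pct_change <- round(garfield_pop$change / previous * 100, 2)

# Year with the largest single-year decline.
garfield_pop[which.min(garfield_pop$change), ]
```

In a county of roughly 2,300 people, a change of a few dozen residents in one age bracket can move the dependency ratios noticeably, so the population totals alone do not rule out a real shift in age structure.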

There is a small decline from 2015 to 2016 (2,296 to 2,228). In looking for an explanation for the jump in dependency ratios, I consulted the Washington Regional Economic Analysis Project as well as Garfield County, WA.

More information needed

Additional consultation with Washington State Department of Health is required to understand what caused this pattern in the Garfield County data.
