forked from HimesGroup/BMIN503_Final_Project
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathreport_pt1.Rmd
414 lines (330 loc) · 31.9 KB
/
report_pt1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
---
title: "Measuring the Association of CMS \"Accountable Care\" Contracting Models to Skilled Nursing Facility Staffing Ratios"
author: "David Roberts"
output:
html_document:
theme: default
highlight: tango
---
# Overview
With a rapidly aging population in the United States, skilled nursing facilities (hereafter, SNF) have garnered increased importance in American life. The sector is a magnet for controversy and [jarring headlines](https://www.nytimes.com/2021/03/13/world/in-us-nursing-homes-where-covid-19-killed-scores-even-reports-of-maggots-and-rape-dont-dock-five-star-ratings.html), an understandable fact given the vulnerability of elderly residents and numerous studies tracing the influence of business models in compromising quality of care ([non-profit vs. for-proft](https://medicareadvocacy.org/non-profit-vs-for-profit-nursing-homes-is-there-a-difference-in-care/), [private equity impact](https://www.nber.org/papers/w28474)). Recognizing the asymmetry of information for patients and caregivers choosing a facility, the Centers for Medicare (CMS) introduced [Nursing Home Compare](https://www.medicare.gov/care-compare/?providerType=NursingHome&redirect=true), a public website synthesizing a complex quality reporting program to simple, five point "star" ratings. At a high level, a SNF's overall star rating combines information regarding staffing ratios (per resident per hour), annual inspection results, and claims-based metrics, i.e. readmission rates.
The public image of nursing homes was further exacerbated during the COVID-19 pandemic, which resulted in tragic outbreaks with high mortality rates given the vulnerable population. These outbreaks were largely responsible for a reduction in the nursing home population by [~200k between 2020 and 2021](https://www.kff.org/other/state-indicator/number-of-nursing-facility-residents/?activeTab=graph¤tTimeframe=0&startTimeframe=7&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D). As one might expect, staffing ratios were an [important predictor of the occurrence and severity of outbreaks](https://pubmed.ncbi.nlm.nih.gov/32770832/). Intuitively, staffing ratios are fundamental to safe patient care. Residents will have more bedsores if they are turned less often. Infection control is more challenging with fewer people engaged in the maintenance of a clean facility. Indeed, unsafe staffing ratios are the chief grievance in recent nursing strikes at Temple Health System and elsewhere in the United States.
In this project, I consulted three former colleagues from the Delaware Valley Accountable Care Organization (DVACO), Beth Souder DPT, Liz Todd DPT, and Dr. Imelda Vasquez Flores PHD. Beth and Liz are both experts in post-acute operations, while Imelda is an economist specializing in health services research. For advice in statistical modeling, I consulted Dr. Jesse Hsu, a bio-statistician at UPENN.
## Introduction
In the first part of this project, I sought to expand work assessing the impact of business models on SNF operations. Specifically, I estimate the impact of two Medicare contracting models, Medicare Advantage (MA) and Accountable Care Organizations (ACO), on reported staffing ratios in SNF's. [CMS asserts](https://innovation.cms.gov/strategic-direction-whitepaper) that these models create an "accountable care relationship", in that they:
> ...give all participating providers the incentives and tools to deliver high-quality, coordinated, team-based care that promotes health, thereby reducing fragmentation and costs for people and the health system.
In an "accountable care relationship", CMS puts healthcare organizations (health systems, insurance companies, and provider groups) at risk for total cost of patient care to incentivize efficient care delivery. CMS then implements expansive quality reporting programs to incentivize preventative care and ensure cost reductions are not achieved by drastic service reductions. A multi-decade expansion of such models is set to continue, with CMS setting a goal of [all Medicare enrollees involved in accountable care relationships by 2030](https://innovation.cms.gov/strategic-direction-whitepaper). Given their status as the second largest category of expenditure for CMS after inpatient stays, SNF's are often the target of utilization management programs by MA organizations and ACO's. In addition, SNF's are not typically affiliated with health system, nor owned outright by vertically integrated insurance companies. Hence, they may lack an advocate in negotiations with better-capitalized organizations in a particular market.
I hypothesize that increased penetration of Accountable Care contracting models, as measured by the proportion of Medicare enrollees enrolled in such models, will contribute to a clinically meaningful, statistically statistical decrease in nursing staff ratios. Interestingly, all three of my former colleagues from DVACO expressed doubt that any relationship exists. They indicated that while ACO's and MA health plans exert influence on SNF's via direct contracting (MA) or preferred referral networks (MA and ACO's), their levers to directly influence SNF operations are limited. In addition, SNF's exhibit a water balloon effect, in which lost revenues in one area may be recouped elsewhere. Therefore, SNF's may be resilient to financial shocks which would otherwise result in reduced staffing ratios.
## Methods
### Data Sources
To model the relationship between "accountable care" penetration and SNF staffing ratios, I extracted the following publicly available CMS data-sets between 2017-2021.
* [Number of Accountable Care Organization Assigned Beneficiaries by County](https://data.cms.gov/medicare-shared-savings-program/number-of-accountable-care-organization-assigned-beneficiaries-by-county)
* [Medicare Monthly Enrollment by County, disaggregated by MA and Fee for Services](https://data.cms.gov/summary-statistics-on-beneficiary-enrollment/medicare-and-medicaid-reports/medicare-monthly-enrollment)
* [Nursing Home Compare Facility Level Data](https://data.cms.gov/provider-data/archived-data/nursing-homes)
Python scripts to extract the files, concatenate across time periods, filter irrelevant data, and harmonize disparate data sources are available in the `/transform` directory.
### Proposed Model
To investigate the relationship between staffing ratios and accountable care penetration, I propose a linear regression model as follows:
$$SR{y,c} = \alpha_{y} * \alpha_{s} + \beta_1{MA_{y,c}} + \beta_2{ACO_{y,c}} + \beta_3{CM_{y,c}} + \epsilon_{y,c}$$
where:
* $SR{y,c}$ is the reported nursing staff ratio, per resident per month in a given county and year. This is a weighted average computed from facility level data, with average daily residents as weights.
* $\alpha_{y} * \alpha_{s}$ represent interacted year-state level fixed effects. These control for variation in state regulations across time, as well as the differential impact of the COVID-19 pandemic in different regions.
* ${MA_{y,c}}$ is the percentage of total medicare enrollees which have MA health plans in a given county and year.
* $ACO_{y,c}$ is the percentage of total medicare enrollees which are ["assigned"](https://www.naacos.com/aco-assignment-in-the-medicare-share-savings-program) to ACO's in a given county and year.
* $CM_{y,c}$ is the average resident case-mix of in a given county and year. This is a weighted average computed from facility level data, with average daily residents as the weight. It is included to control for variance in resident populations across counties and time.
I hypothesize that ${MA_{y,c}}$ and $ACO_{y,c}$ is associated with a clinically meaningful reduction in $SR{y,c}$.
### Data Exploration and Cleaning
First, let's bring in the facility level SNF data. Note, this is a condensed version of the R script `analysis/nhc_compare.qmd`
```{r eval = TRUE, message = FALSE, warning = FALSE}
library(dplyr)
library(ggplot2)
library(here)
library(readr)
# Load concatenated output from `transform/nhc_provider_info.py`
provider_info_tmp <- read_delim(here("data", "interim", "provider_info_tmp.csv"),
delim = "|", escape_double = FALSE, trim_ws = TRUE)
provider_info_tmp$processing_date[provider_info_tmp$processing_date == "10/1/2019"] <- "2019-10-01" # correct parsing issue
provider_info_tmp$processing_date <- as.Date(provider_info_tmp$processing_date)
# problems(provider_info_tmp) # data ingestion issue affecting two rows...
```
Remove rows where FIPS code is unknown. Affects 758 of ~1 million rows
```{r eval = TRUE, message = FALSE, warning = FALSE}
count(provider_info_tmp) # 1 million rows pools data from ~15k facilities from 68 reporting dates (processing_date)
count(filter(provider_info_tmp, SSA_Code == "Unknown"))
count(filter(provider_info_tmp, StateAbbrev == "Unknown"))
provider_info_tmp <- filter(provider_info_tmp, StateAbbrev != "Unknown")
```
Compute descriptive stats for relevant dimensions. 1 million total rows pools data from ~15k facilities and 68 monthly reports. Note:
* Total nursing hours is the sum of hours from registered nurses (RN), licensed practical nurses (LPN), and nursing aides (CNA).
* All "ratings" are CMS "Star Ratings", reported on a scale from 1-5.
```{r eval = TRUE, message = FALSE, warning = FALSE}
library(gtsummary)
provider_info_tmp$provider_resides_in_hospital <- recode(
provider_info_tmp$provider_resides_in_hospital, YES = "Y", NO = "N"
)
select(provider_info_tmp,
ownership_type:provider_resides_in_hospital,
provider_changed_ownership_in_last_12_months:overall_rating,
health_inspection_rating, qm_rating, staffing_rating, rn_staffing_rating,
reported_nurse_aide_staffing_hours_per_resident_per_day:adjusted_total_nurse_staffing_hours_per_resident_per_day) %>% tbl_summary()
```
Where staffing or quality ratings are unknown, CMS provides footnotes indicating the reason, listed below. The most common footnote is 12, indicating difficulties in the submission of staffing date. This scenario accounts for ~55% of unknowns.
| Footnote Code | Footnote Description |
| -- | ------------------- |
| 1 | Newly certified nursing home with less than 12-15 months of data available or the nursing opened less than 6 months ago, and there were no data to submit or claims for this measure. |
| 2 | Not enough data available to calculate a star rating. |
| 6 | This facility did not submit staffing data, or submitted data that did not meet the criteria required to calculate a staffing measure. |
| 9 | The number of residents or resident stays is too small to report. Call the facility to discuss this quality measure. |
| 10 | The data for this measure is missing or was not submitted. Call the facility to discuss this quality measure. |
| 12 | This facility either did not submit staffing data, has reported a high number of days without a registered nurse onsite, or submitted data that could not be verified through an audit. |
| 13 | Results are based on a shorter time period than required. |
| 14 | This nursing home is not required to submit data for the Skilled Nursing Facility Quality Reporting Program. |
| 18 | This facility is not rated due to a history of serious quality issues and is included in the special focus facility program.|
| 19 | Scores for individual quarters are not reported for this measure. |
A single time slice, ~15k facilities, has similar distributions.
```{r eval = TRUE, message = FALSE, warning = FALSE}
#filter(provider_info_tmp, processing_date == "2022-04-01") %>%
# select(
# ownership_type:provider_resides_in_hospital,
# provider_changed_ownership_in_last_12_months:overall_rating,
# health_inspection_rating, qm_rating, staffing_rating, rn_staffing_rating,
# reported_nurse_aide_staffing_hours_per_resident_per_day:adjusted_total_nurse_staffing_hours_per_resident_per_day) #%>%
# tbl_summary()
```
Unfortunately, the day the report is cut (processing_date) does not correspond to the data collection time periods. Many measure values stay the same month over month because there is no new data to report. Despite many disruptions in reporting, SNF staffing ratios are reported consistently, with each finalized quarter reported four months post-hoc (more or less). I use the lag function below to identify when data is changing for all facilities. Based on the scatter plot, Q4 results (reported in April) are the best option to create a complete data series from 2017-2021. Ideally, I would use a full years data to account for seasonality in SNF staffing, which varies with flu season. Alas...
```{r eval = TRUE, message = FALSE, warning = FALSE}
library(lubridate)
lagged_data <- select(provider_info_tmp,
c(
"federal_provider_number",
"processing_date",
"reported_total_nurse_staffing_hours_per_resident_per_day"
)) %>%
group_by(federal_provider_number) %>%
dplyr::mutate(lag1 = lag(reported_total_nurse_staffing_hours_per_resident_per_day, n = 1, default = NA)) %>%
as.data.frame()
lagged_data <- lagged_data %>% dplyr::mutate(
change= (ifelse(lag1 != reported_total_nurse_staffing_hours_per_resident_per_day, 1, 0)
))
by_date <- lagged_data %>%
group_by(processing_date) %>%
summarize(pct_with_change = round(mean(change, na.rm=TRUE), 2))
plt <- ggplot(by_date, mapping = aes(x=processing_date, y=pct_with_change)) + geom_point() +
labs(x="Report Processing Date", y="Pct SNF's w/ Changed Value", title = "Identify Multi-Year Staffing Time Series", subtitle = "April Reporting Dates (Blue) Reflect Q4 Data Collection Period of Previous Year") +
geom_point(data=filter(by_date, month(processing_date) == 4), aes(x=processing_date, y=pct_with_change), colour="blue", size=5) +
theme_classic() + theme(plot.title = element_text(hjust = 0.5)) + theme(plot.subtitle = element_text(hjust = 0.5))
plt
#filter(by_date, pct_with_change == 1)$processing_date
```
Filter to observations with Q4 staffing data for 2018-2021. Facility count is reasonable, slowly decreasing each year.
```{r message=F, warning=F}
filter_dates = as.Date(c("2018-04-01", "2019-04-01", "2020-04-01", "2021-04-01", "2022-04-01"))
q4_snf_data <- filter(provider_info_tmp, processing_date %in% filter_dates) %>% mutate(collection_yr=(as.integer(format(processing_date, format="%Y"))) - 1)
ggplot(data = q4_snf_data, aes(x = collection_yr)) + geom_bar() +
labs(x="Data Collection Year (Q4)", y="Facility Count", title = "Count of SNF Facilities By Year") + theme_classic()
```
#### Trending Staffing Ratios
Reported staffing ratios vary each year, noticeably spiking in 2020 during the height of the delta wave. Per advisers, this is due to a concerted effort by health systems to divert lower acuity patients to the home setting in order to minimize infection risk. The data do not appear normally distributed, exhibiting a positive skew. There is also a interesting increase in dispersion during 2021, illustrating a decrease in reported staffing ratios.
```{r message=F, warning=F}
ggplot(data = q4_snf_data, aes(factor(collection_yr), reported_total_nurse_staffing_hours_per_resident_per_day)) + geom_violin() + labs(x = "Collection Year Period (Q4)", y = "Hours Per Resident Per Day", title = "Unadjusted Total Nursing Staffing Hours Per Resident Per Day (2017-2021)") + theme_classic() + theme(plot.title = element_text(hjust = 0.5))
```
The spike in staffing ratios and corresponding drop in average daily residents are visible in below table. Interestingly, the case-mix variable does not appear to do a great job capturing increased acuity...
```{r message=F, warning=F}
select(q4_snf_data,
c("collection_yr", "reported_total_nurse_staffing_hours_per_resident_per_day","adjusted_total_nurse_staffing_hours_per_resident_per_day","case-mix_total_nurse_staffing_hours_per_resident_per_day","average_number_of_residents_per_day")) %>% rename(yr = collection_yr, reported_staff_prpd = reported_total_nurse_staffing_hours_per_resident_per_day, adjusted_staff_prpd = adjusted_total_nurse_staffing_hours_per_resident_per_day, case_mix = `case-mix_total_nurse_staffing_hours_per_resident_per_day`, avg_residents_pd = average_number_of_residents_per_day) %>%
tbl_summary(by=yr)
```
Since my main explanatory variable, accountable care penetration is not observed at the facility level, I aggregate staffing ratios to the county level for symmetry. Dr. Hsu indicated that while variables from high level dimensions are often included as fixed effects, *the primary variable of interest ideally should have the same level as the dependent variable*. Otherwise, you're letting the statistical method do aggregation for you, which is confusing... As an example, census tract level social determinants of health are often included in regressions at the individual level. However, these are typically used as controls, rather than the primary variable of interest.
```{r eval = TRUE, message = FALSE, warning = FALSE}
library(tidyr)
group_cols <- c("FIPS_Code", "StateAbbrev", "StdCountyName", "collection_yr")
q4_snf_data <- q4_snf_data %>% rename(case_mix_total_nurse_prpd = `case-mix_total_nurse_staffing_hours_per_resident_per_day`,
case_mix_cna_prpd = `case-mix_nurse_aide_staffing_hours_per_resident_per_day`,
case_mix_rn_prpd = `case-mix_rn_staffing_hours_per_resident_per_day`,
chow = provider_changed_ownership_in_last_12_months)
county_staffing <- q4_snf_data %>%
group_by(across(all_of(group_cols))) %>%
summarize(
row_cnt = n(),
cerfied_beds = sum(number_of_certified_beds, na.rm=TRUE),
sum_avg_daily_residents = sum(average_number_of_residents_per_day, na.rm=TRUE),
mean_cna_reported_hprd = weighted.mean(reported_nurse_aide_staffing_hours_per_resident_per_day, average_number_of_residents_per_day, na.rm=TRUE),
mean_lpn_reported_hprd = weighted.mean(reported_lpn_staffing_hours_per_resident_per_day, average_number_of_residents_per_day, na.rm=TRUE),
mean_rn_reported_hprd = weighted.mean(reported_rn_staffing_hours_per_resident_per_day, average_number_of_residents_per_day, na.rm=TRUE),
mean_licensed_reported_hprd = weighted.mean(reported_licensed_staffing_hours_per_resident_per_day, average_number_of_residents_per_day, na.rm=TRUE),
mean_total_reported_hprd = weighted.mean(reported_total_nurse_staffing_hours_per_resident_per_day, average_number_of_residents_per_day, na.rm=TRUE),
mean_total_cm_hprd = weighted.mean(case_mix_total_nurse_prpd, average_number_of_residents_per_day, na.rm=TRUE),
mean_cna_cm_hprd = weighted.mean(case_mix_cna_prpd, average_number_of_residents_per_day, na.rm=TRUE),
mean_rn_cm_hprd = weighted.mean(case_mix_rn_prpd, average_number_of_residents_per_day, na.rm=TRUE)
)
# yummy snack for later
save(county_staffing, file=here("data", "interim", "county_level_staffing.Rda"))
save(q4_snf_data, file=here("data", "interim", "snf_provider_info.Rda"))
```
Merge MA and ACO enrollment data to create the explanatory variables.
*One important note on this data-set.* In both the Medicare enrollment by county file and the ACO enrollment file, CMS does not provide a specific value when enrollee count is less than 10 in a given year. This is an effort to protect the privacy of members. Unfortunately, this obfuscation effectively zeros out accountable care penetration for the least populous counties, which tend to be rural. Without a member-level dataset, I cannot address this issue.
```{r eval = TRUE, message = FALSE, warning = FALSE}
library(stringr)
aco_enroll <- read_delim(here("data", "interim", "aco_enrollment.csv"),
delim = "|", escape_double = FALSE, trim_ws = TRUE) # created by `transform/aco_beneficiaries_by_county.py`
med_enroll <- read_delim(here("data", "interim", "medicare_enrollment.csv"),
delim = "|", escape_double = FALSE, trim_ws = TRUE) # created by `transform/medicare_benes_by_county.py`
med_enroll <- rename(med_enroll, Year = YEAR)
```
Some data cleaning required to merge counties. Overall, great correspondence when using FIPS code to merge.
```{r eval = TRUE, message = FALSE}
aco_enroll[is.na(aco_enroll$Tot_AB_Psn_Yrs), "Tot_AB_Psn_Yrs"] <- 0
# Group aco data to the county level
aco_by_county <- aco_enroll %>%
group_by(FIPS_Code, Year) %>%
summarize(ACOBenePsnYrs = sum(Tot_AB_Psn_Yrs))
pene_df <- merge(
med_enroll,
aco_by_county,
by=c("FIPS_Code", "Year"),
all.x = TRUE
)
pene_df <- rename(
pene_df,
AllMedBenePsnYrs = TOT_BENES,
OrigMedBenePsnYrs = ORGNL_MDCR_BENES,
MA_BenePsnYrs = MA_AND_OTH_BENES,
)
pene_df <- select(
pene_df,
c("FIPS_Code","StateAbbrev", "Year", "AllMedBenePsnYrs", "OrigMedBenePsnYrs", "MA_BenePsnYrs", "ACOBenePsnYrs")
)
# filter out obs where total medicare enroll is unknown. This denominator is required to compute penetration figure.
pene_df <- filter(
pene_df,
!is.na(pene_df$AllMedBenePsnYrs)
)
# Assume zero beneficiaries from ACO and MA if no records present
pene_df[is.na(pene_df$ACOBenePsnYrs), "ACOBenePsnYrs"] <- 0
pene_df[is.na(pene_df$MA_BenePsnYrs), "MA_BenePsnYrs"] <- 0
count(pene_df, Year) # ~3200 counties in the US, including territories
```
Calculate "Accountable Care Penetration" figures by county and year
```{r}
pene_df <- pene_df %>%
mutate(ACOPenetration = ACOBenePsnYrs / AllMedBenePsnYrs) %>%
mutate(MAPenetration = MA_BenePsnYrs / AllMedBenePsnYrs) %>%
mutate(AccountableCarePenetration = MAPenetration + ACOPenetration) %>%
mutate(LogAllMedBenePsnYrs = log(AllMedBenePsnYrs))
```
There appears to be a positive relationship between accountable care penetration and the size of the county. This may be introduced by our obfuscated data issue described above. In 2019, there are also penetration values above 1, which *should not be possible*, given that MA and ACO enrollees are drawn from the set of total medicare beneficiaries. This issue appears to affect 63 counties, solely in 2019. MA penetration figures are gradually increasing, as expected. The following table illustrates the median and maximum for each continous value.
```{r eval = TRUE, message = FALSE}
select(pene_df, Year, AllMedBenePsnYrs:AccountableCarePenetration) %>% tbl_summary(by=Year, statistic=list(all_continuous() ~ "{median} ({p100})"))
```
```{r eval = TRUE, message = FALSE, warning=F}
plt <- ggplot(filter(pene_df, Year == 2019), mapping=aes(x=AccountableCarePenetration, y=LogAllMedBenePsnYrs)) +
geom_point() + theme_classic()
plt
unique(filter(pene_df, AccountableCarePenetration > 1)$Year) # 63 observations in 2019
```
Calculated MA penetration figures are corroborated by [KFF Report](https://www.kff.org/medicare/issue-brief/medicare-advantage-in-2022-enrollment-update-and-key-trends/#:~:text=In%202022%2C%20nearly%20half%20of,(19%25%20to%2048%25).), consistently 1-2% below. Not perfect, but good enough.
```{r}
pene_df %>%
group_by(Year) %>%
summarize(MA_BenePsnYrs = sum(MA_BenePsnYrs, na.rm = TRUE), AllMedBenePsnYrs = sum(AllMedBenePsnYrs),
MA_Penetration = sum(MA_BenePsnYrs, na.rm=TRUE) / sum(AllMedBenePsnYrs, na.rm=TRUE))
```
```{r}
county_pene <- pene_df
rm(pene_df)
save(county_pene, file=here("data", "interim", "county_vbc_penetration.Rda")) #yum
```
Merge Accountable Care Penetration data and SNF Staffing Data at County Level
```{r}
load(here("data", "interim", "county_vbc_penetration.Rda"))
load(here("data", "interim", "county_level_staffing.Rda"))
county_staffing <- rename(
county_staffing,
Year = collection_yr
)
merged <- merge(
county_pene,
county_staffing,
by=c("FIPS_Code", "Year"),
all.x = TRUE
)
# multiply pene figures by 100 to ease interpretation
merged$AccountableCarePenetration = merged$AccountableCarePenetration * 100
merged$MAPenetration = merged$MAPenetration * 100
merged$ACOPenetration = merged$ACOPenetration * 100
merged$YearFactor = as.factor(merged$Year)
```
Visualizing all years, the 2019 strangeness for Accountable Care Penetration is apparent. There is significantly higher dispersion for this observation. Could this be due to ACO dropout at high rate in 2019? Skewed small ACO's, unaffliated with health system. [Change in contract structure](https://www.cms.gov/newsroom/press-releases/cms-finalizes-pathways-success-overhaul-medicares-national-aco-program)? [Drop in ACO participation](https://www.naacos.com/press-release--medicare-aco-participation-flat-in-2022 )? Why does it revert back to normal in 2020? Whatever the cause, I will test that any results are insensitive to excluding data from 2019.
```{r eval = TRUE, message = FALSE, warning=F}
plt <- ggplot(filter(merged, Year == 2017), mapping=aes(x = ACOPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6) + ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By ACO Penetration (2017)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
plt <- ggplot(filter(merged, Year == 2018), mapping=aes(x = ACOPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6) + ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By ACO Penetration (2018)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
plt <- ggplot(filter(merged, Year == 2019), mapping=aes(x = ACOPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6) + ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By ACO Penetration (2019)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
plt <- ggplot(filter(merged, Year == 2020), mapping=aes(x = ACOPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6)+ ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By ACO Penetration (2020)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
plt <- ggplot(filter(merged, Year == 2021), mapping=aes(x = ACOPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6) + ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By ACO Penetration (2021)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
```
In scatterplots below, MA penetration has no apparent linear relationship with nursing aid staffing ratios. If there is a relationship, it must be small.
```{r eval = TRUE, message = FALSE, warning=F}
plt <- ggplot(filter(merged, Year == 2017), mapping=aes(x = MAPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6) + ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By MA Penetration (2017)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
plt <- ggplot(filter(merged, Year == 2018), mapping=aes(x = MAPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6) + ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By MA Penetration (2018)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
plt <- ggplot(filter(merged, Year == 2019), mapping=aes(x = MAPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6) + ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By MA Penetration (2019)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
plt <- ggplot(filter(merged, Year == 2020), mapping=aes(x = MAPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6)+ ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By MA Penetration (2020)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
plt <- ggplot(filter(merged, Year == 2021), mapping=aes(x = MAPenetration, y = mean_cna_reported_hprd)) + geom_point() + xlim(0,100) + ylim(0,6) + ggtitle(label="Reported Nursing Aid Hours Per Resident Per Day By MA Penetration (2021)") + ylab("CNA Hours Per Resident Per Day") + theme_classic()
plt
```
Reviewing missingness of the variables of interest, it appears county level nursing staff ratios are the main issue. Distributed similarly from 2017-2021.
```{r eval = TRUE, message = FALSE, warning=F}
merged %>% count(is.na(mean_total_reported_hprd), by=Year)
#merged %>% count(is.na(MAPenetration), by=Year) # cannot be null, populated missing w/ 0
#merged %>% count(is.na(ACOPenetration), by=Year) # cannot be null, populated missing w/ 0
```
Missingness is related to county size. On closer inspection, it's driven by:
1. Low population counties with a single SNF, which failed to produce valid staffing data
2. Small counties without any SNF's present.
In both cases, I'm comfortable dropping the observations from regression analysis.
```{r eval = TRUE, message = FALSE, warning=F}
merged <- merged %>% mutate(staffing_is_null = is.na(mean_total_reported_hprd))
merged %>%
group_by(staffing_is_null) %>%
summarize(median_bene = median(OrigMedBenePsnYrs, na.rm=TRUE), avg_row_cnt=mean(row_cnt, na.rm=TRUE))
# View(filter(merged, staffing_is_null == TRUE))
```
## Results
Without controls (case mix, state-year fixed effects), the penetration coefficients are negative, and very slight. Note that the R-squared value is tiny however, indicating that minimal variation is explained.
```{r eval = TRUE, message = FALSE, warning=F}
model <- lm(mean_total_reported_hprd ~ MAPenetration + ACOPenetration, merged)
print(summary(model))
```
When controls are included, there is *no strong relationship between staffing ratios and accountable care penetration*, as my expert advisers graciously informed me ;) The estimated reduction in staffing ratios is minimal regardless of which specific role is considered (CNA, RN, all nursing staff).
*Original Relationship of interest*
An additional pct point of MA Penetration is associated with a decrease of ~10 seconds per resident per day. Even assuming +10 pct point MA penetration increase, this would imply a 1 min 40 second decrease in total nurse staffing, per resident per day. The estimated association of ACO penetration is even smaller. *These results are insensitive to excluding data from 2019.*
```{r eval = TRUE, message = FALSE, warning=F}
options(max.print=30) # omit distracting fixed effects
model <- lm(mean_total_reported_hprd ~ MAPenetration + ACOPenetration + StateAbbrev.x:YearFactor + mean_total_cm_hprd, merged)
print(summary(model))
print(-2.785e-03 * 60 * 60) # calculate reduction in seconds
#model <- lm(mean_cna_reported_hprd ~ MAPenetration + ACOPenetration + StateAbbrev.x:YearFactor + mean_cna_cm_hprd, merged)
#print(summary(model))
#model <- lm(mean_cna_reported_hprd ~ MAPenetration + ACOPenetration + StateAbbrev.x:YearFactor + mean_cna_cm_hprd, #filter(merged, YearFactor != 2019))
#print(summary(model))
#model <- lm(mean_rn_reported_hprd ~ MAPenetration + ACOPenetration + StateAbbrev.x:YearFactor + mean_rn_cm_hprd, merged)
#print(summary(model))
# county level fixed effects - did not produce any useful output due to singularities
#model <- lm(mean_cna_reported_hprd ~ MAPenetration + ACOPenetration + FIPS_Code:YearFactor + mean_cna_cm_hprd, merged)
#print(summary(model))
```
In addition, I omitted important controls which are unavailable in public datasets. Notably, the [NBER paper on private equity acquisitions in nursing homes](https://www.nber.org/papers/w28474) includes controls for the proportion of residents with Medicaid coverage, as well as metrics to represent market dynamics, including SNF concentration within a given hospital referral region. Given that MA plans tend to attract Medicare beneficiaries seeking lower premiums, I would expect Medicaid coverage in particular to confound the relationship between MA penetration and staffing ratios.
## Conclusion
There are several counterbalancing forces that may contribute to small magnitude:
* The water balloon effect. SNF's receiving pressure in Part A may respond by increasing services in Part B.
* Some SNF's receive financial supports via integration in larger system. For instance, a SNF onsite at a continuing care retirement community, or owned by a hospital may be better equipped in negotiation with MA health plans.
* SNF contracting with MA health plans may be supportive in some respects. Depending on contract parameters, it may bring revenue predictability that a fee for service arrangement does not.
* ACO's exert very little direct influence over SNF operations. They simply do not have many levers to pull in order to affect behavior. This outcome contradicts portrayals of SNF's as the "ACO ATM".
* Accountable care penetration might increase concentration of referrals. This could adversely affect some SNF's, but support others, resulting in a neutral aggregate impact.