Duplicate Files to Download? #677
Replies: 4 comments
-
Most plans' MRFs are cookie-cutter, so they'll contain the same negotiated-rate data; only what I call the header section may vary. You have two choices, both ugly.

What's happening is that normalized data (in the payer's database) is denormalized into individual MRF files. If the goal is analytics, however, the data aggregator must figure out how to re-normalize the now-denormalized data, which is a messy and expensive operation. CMS should have come up with a more workable solution.

Congress doesn't appear to be enthralled with our progress on price transparency. On Monday the House Subcommittee on Health began a mark-up session on a bill that sets specific dates for CMS to deliver progress reports and analysis on transparency: [H.R. 3281], the Transparent PRICE Act (Rep. Cathy McMorris Rodgers).
-
I pulled some counts for these files (limited to the Houston market) so you can see which ones have the same counts and which differ. The short answer is that many of them look the same, but not all. Note that some of these are from Q4 2022 and some from Q1 2023. You might be able to use this to choose the reference files to process.
-
Duplication is a really big problem in this data, and figuring out what is in all those files is a lot of work. There is a lot of duplication within the files themselves, but there are also many files that are duplicated in their entirety. What you want to do is look at the ETag header in the HTTP responses to identify duplicated files. TicToc Health (https://tictoc.health/) has already done that for in-network rate files, but not for allowed-amounts files.

Here is how you could answer that question quickly for in-network rate files: download the May 4, 2023 list of all files for all payers (116 MB compressed), then gzcat the gzipped file into this jq command to search for the term of your choice and pull out the associated URLs and ETags, after which you can find the duplicates:

```shell
gzcat tictoc.may-23.gz | jq '.[] | .in_network_rate_urls | .[] | select(.url|test(".*CHOICE-PLUS")) | [.url,.ETag] | @csv' > choice_plus.csv
```

It will give you a list of entries that look like this:

```
""https://uhc-tic-mrf.azureedge.net/public-mrf/2023-05-01/2023-05-01_UMR--Inc-_Third-Party-Administrator_ARCHDIOCESE-OF-DUBUQUE_UNITEDHEALTHCARE-CHOICE-PLUS_-M_0L_in-network
```

Excel would make quick work of finding and sorting the duplicates. For reference, the data in the file looks like:
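If you'd rather skip Excel, you could group the extracted rows by ETag directly: files served with the same ETag are byte-identical, so only one URL per group needs to be fetched. A minimal sketch (the URLs and ETags below are made up for illustration; in practice you'd read them from the CSV produced by the jq command above):

```python
from collections import defaultdict

def group_by_etag(rows):
    """Group (url, etag) pairs so each ETag maps to every URL sharing it.

    Files served with the same ETag are byte-identical, so only one URL
    per group needs to be downloaded."""
    groups = defaultdict(list)
    for url, etag in rows:
        groups[etag].append(url)
    return groups

# Hypothetical rows standing in for the jq output (made-up values).
rows = [
    ("https://example.com/plan-a_in-network.json.gz", "0x8DB3A1"),
    ("https://example.com/plan-b_in-network.json.gz", "0x8DB3A1"),
    ("https://example.com/plan-c_in-network.json.gz", "0x8DB9F2"),
]
duplicates = {etag: urls
              for etag, urls in group_by_etag(rows).items()
              if len(urls) > 1}
for etag, urls in duplicates.items():
    print(f"{etag}: {len(urls)} identical files; fetch one, alias the rest")
```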
-
@rgiljohann There is a requirement to create an Allowed Amounts file even if it contains "no data" (i.e., no items or services that met the minimum claims threshold needed for reporting purposes). There is an example of what that file could look like here. Without checking, I suspect many of these files are just that, given the "allowed amounts" in the file names. The reason for requiring an allowed-amounts file even when it is empty is to confirm to the public that there were in fact no allowed amounts, versus the skepticism that might arise from a missing file.

@dgolden It appears that you're talking about in-network files (where "negotiated rate data" comes into the picture), and from the screenshot included, that doesn't seem to be the case in this scenario.
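For a sense of what such an "empty" file contains: the required header fields are present but the out-of-network array has no entries. This is a hedged sketch based on my reading of the CMS price-transparency-guide allowed-amounts schema (the entity name is hypothetical; verify field names against the official schema repo):

```python
import json

# Sketch of a "no data" allowed-amounts file: header fields are present,
# but out_of_network is empty because no item/service met the minimum
# claims threshold. Field names follow my reading of the CMS schema.
empty_allowed_amounts = {
    "reporting_entity_name": "Example Payer",            # hypothetical
    "reporting_entity_type": "health insurance issuer",
    "out_of_network": [],    # no reportable allowed amounts
    "last_updated_on": "2023-05-01",
    "version": "1.0.0",
}
print(json.dumps(empty_allowed_amounts, indent=2))
```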
-
Hello,
I am looking at the UHC files. I grabbed a full list of files and locations and threw them into Excel along with their file sizes.
I filtered to Choice-Plus in the file name and found that over 12,000 of 71,000 files have exactly the same file size (all Choice-Plus plans).
Are these duplicate files? If so, does anyone know how to tell whether files are duplicates or not? My goal is to download/store only one of those files and point the rest of the file names at that one copy. I never saw a TOC file anywhere for UHC.
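Matching file sizes alone are only suggestive; one way to confirm without downloading anything is to issue HEAD requests and compare the ETag headers, since two URLs returning the same ETag from the same server are serving identical bytes. A sketch (function names are my own; some CDNs don't answer HEAD or omit ETag, in which case you'd need another approach):

```python
from urllib.request import Request, urlopen

def head_fingerprint(url, timeout=30):
    """Fetch only the response headers and return (etag, content_length)."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        return resp.headers.get("ETag"), resp.headers.get("Content-Length")

def same_file(fp_a, fp_b):
    """True when both fingerprints carry the same non-empty ETag.

    A missing ETag proves nothing, so None never counts as a match."""
    return fp_a[0] is not None and fp_a == fp_b
```

Usage would be something like `same_file(head_fingerprint(url_a), head_fingerprint(url_b))`.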