Versionista stopped including version data in the HTML of a page's initial load. Instead, it began populating the list of a page's versions asynchronously, through a JavaScript request made after the page loaded. To our code, the HTML that now came over the wire looked the same as a page that listed no captured versions. That case is rare, but it can happen if a page has no protected versions and your account goes over its capture limit. We logged a warning, but we did not treat that warning as an error, so no alerts fired in Sentry.
The problem persisted for a week before it was caught.
On the upside, we now have a structured API for getting the list of versions of a page, even if we're not sure how reliable it is. (There is no corresponding API for getting lists of pages or lists of sites.)
NOTE: the PR that fixes this (scraper#196) has not been merged and is still awaiting review, but it is running in production.
All times in PST.
@jjudish posts an issue on the versionista-scraper project describing error messages they haven't seen before ("No versions found for {URL}").
@Mr0grog responds with some possibilities and thoughts for debugging from his phone.
@Mr0grog gets home and checks the logs to see whether we are having similar issues. We are. The issues started on Feb 12th at 3pm PST (2019-02-12T23:00:00Z). Checking the page in a browser demonstrates the issue immediately: the data is no longer delivered in the HTML body of the page.
@Mr0grog writes a hotfix to:
- Throw an exception in this situation instead of treating it the same as a page with no versions.
- Throw exceptions in the equivalent situations for lists of pages in a site and lists of sites in an account, since it seems likely those pages will change in the same way soon.
- Extract the version data using the same API call that the page's own JavaScript now makes.
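The first hotfix change amounts to promoting a warning to an exception. A minimal Python sketch of the difference follows; it is an illustration, not the scraper's actual code, and the names `versions_or_warn`, `versions_or_raise`, and `NoVersionsFoundError` are hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


class NoVersionsFoundError(Exception):
    """Hypothetical error: a page yielded no versions at all."""


def versions_or_warn(url, versions):
    # Old behavior: an empty list looks the same as "this page has no
    # versions", so it is only logged and never reaches error tracking.
    if not versions:
        logger.warning("No versions found for %s", url)
    return versions


def versions_or_raise(url, versions):
    # Hotfix behavior: treat an empty list as a failure, so it surfaces as
    # an exception that error tracking (e.g. Sentry) will report.
    if not versions:
        raise NoVersionsFoundError(f"No versions found for {url}")
    return versions
```

Because pages that legitimately have no versions are rare, raising on every empty list trades an occasional false alarm for never silently missing a format change. The same check would apply to lists of pages and lists of sites.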
After testing locally over a smaller timeframe, he switches production to the hotfix branch and tests it over a larger timeframe reaching back to the first failing request a week earlier.
Once the hotfix seems to be working, @Mr0grog creates a PR from it and asks for @jjudish's review. He then starts incrementally backfilling data from the versionista1 account.
The backfilling process finishes for versionista1. @Mr0grog starts a single, much larger backfill process (rather than incremental ones) for versionista2.
The backfilling process throws a few errors and @Mr0grog adjusts the PR to account for them.
The backfilling process for versionista2 appears to be going smoothly, and @Mr0grog goes to sleep. (Sites in the versionista2 account tend to change more frequently, so the same operation usually takes 1.5x to 2x as long for versionista2, and @Mr0grog expected this to run for a while.)
The backfilling process for versionista2 finishes.
- The issue filed by @jjudish was super useful.
- Logs provided URLs and obvious spots to investigate.
- This didn’t register as an actual error and we had no alerts about it for a week.
- Other people saw the issue before @Mr0grog but did not respond, investigate, or flag it for someone who could.
- Review and merge PR for the fix. (scraper#196)
- We previously discussed a process that checks our DB every so often for new records from Versionista, as a general "is it working and not failing silently?" measure. We should discuss whether to revive it.
- @Mr0grog
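The periodic "is it working?" check discussed above could be as small as comparing the timestamp of the newest record imported from each Versionista account against a maximum age, and raising if it is too old. The Python sketch below is hypothetical: `check_freshness`, `StaleImportError`, and any particular threshold are assumptions, and the database query that supplies the timestamp is not shown.

```python
from datetime import datetime, timedelta, timezone


class StaleImportError(Exception):
    """Hypothetical error: no new Versionista records arrived recently."""


def check_freshness(account, latest_record_time, max_age, now=None):
    # HYPOTHETICAL: latest_record_time would come from querying our DB for the
    # newest version imported from this account (query not shown).
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - latest_record_time
    if age > max_age:
        raise StaleImportError(
            f"No new records from {account} in {age}; expected one within {max_age}"
        )
    return age
```

Run on a schedule with, say, a one-day threshold, a check like this would likely have flagged this incident within a day rather than a week, since the scraper stopped importing new versions entirely.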