Youtube video fails to play #147
Watching the HTTP requests in Production for https://swap.stanford.edu/was/20220921195257/https://nodontdie.com/louie-roots, you can see that this request generates an HTTP 500 response after spinning for 2 minutes, whereas it works fine in Stage. It would be good to see what the actual error is, but it doesn't appear to be logged to the Apache error log. |
I saw a bunch of these (104) in the
|
I'm wondering if the duplicate entries from having ingested both If you look at the underlying CDX request for this you see this response:
The last two lines have identical properties except for |
Should we try to remove the duplicated entries? |
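Before removing anything, it may help to list exactly which index lines are near-duplicates. Here is a minimal sketch, assuming the index lines are in pywb's CDXJ layout (`urlkey timestamp {json}`). The function name `find_near_duplicates` and the default ignored field `filename` are hypothetical; pass whichever property actually differs between the duplicated lines.

```python
import json
from collections import defaultdict


def find_near_duplicates(cdxj_lines, ignore_fields=("filename",)):
    """Group CDXJ lines that are identical once ignore_fields are dropped.

    ignore_fields is an assumption: set it to the property that differs
    between the duplicated entries in your index.
    """
    groups = defaultdict(list)
    for line in cdxj_lines:
        line = line.strip()
        if not line:
            continue
        # CDXJ layout: "<urlkey> <timestamp> <json block>"
        urlkey, timestamp, raw_json = line.split(" ", 2)
        fields = json.loads(raw_json)
        for name in ignore_fields:
            fields.pop(name, None)
        # Serialize with sorted keys so field order does not affect grouping.
        key = (urlkey, timestamp, json.dumps(fields, sort_keys=True))
        groups[key].append(line)
    # Keep only the groups that contain more than one line.
    return [lines for lines in groups.values() if len(lines) > 1]
```

This only reports the groups and never rewrites the index, so it should be safe to run against a copy of the production CDX files.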
Yes, I'm going to try to copy these WARCs out of the production environment and see if I can replicate the problem in my development environment. If that works then I can experiment with removing the duplicates to see if that helps? I get nervous about experimenting with the CDX files in production! |
I've been able to replicate the problem in my development environment by downloading the WARCs for But then I copied the CDX entries from our production In our production environment uwsgi is giving up on this request because it is taking too long, and realistically our users won't wait 7 minutes for a response anyway. Here is what I see in the pywb log when running it manually with the
I'm not sure why the one request shows up as two (a POST and a GET) in the log. I wonder if there is something we can do here to speed things up. I'm confused about why replay is slow when the CDX lookup is quick. |
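To put numbers on the "CDX lookup is quick, replay is not" observation, the two requests can be timed separately. This is a rough sketch using only the Python standard library. `time_request` is a hypothetical helper, not part of pywb, and the 600 second default timeout is an assumption chosen to outlast the roughly 7 minute replay mentioned above.

```python
import time
import urllib.error
import urllib.request


def time_request(url, timeout=600):
    """Return (status_code, elapsed_seconds) for a GET of url.

    HTTP error responses (such as the 500 seen in production) are
    returned as a status code instead of being raised. Timeouts and
    connection failures still raise.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # read the whole body so the timing covers the full replay
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, time.monotonic() - start
```

Calling it once with the CDX query URL and once with the replay URL (for example the `louie-roots` URL above) in both stage and production should show where the time is going.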
What happens if you remove the duplicates in the development environment? Would it work normally again? |
No, the duplicates are actually not negatively impacting the replay. The problem is the amount of time pywb is spending looking through the |
A possibly related pywb issue: webrecorder/pywb#573 |
Another case of the same issue: |
Reported by @peterchanws on 2022-09-29:
The failure to play back can be seen when trying to play this video, which works in stage but not in production:
@andrewjbtw confirmed that the WACZ files for sv558gk6917 and yx015cx1366 in production and stage have the same fixity value. I also checked that the WARC files and CDX entries are identical.
However, @peterchanws also reported that another version was uploaded to SDR as well, which did not finish uploading:
This does appear to be 331 MB instead of 12.3 GB. Is it possible that this truncated WACZ is somehow corrupting playback?
The CDX files as a whole are much larger in production than in stage. There is also a previous Archive-It crawl from 2020 in production, which is evident when comparing the index results: