Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Youtube video fails to play #147

Open
edsu opened this issue Oct 3, 2022 · 11 comments
Open

Youtube video fails to play #147

edsu opened this issue Oct 3, 2022 · 11 comments
Assignees

Comments

@edsu
Copy link
Contributor

edsu commented Oct 3, 2022

Reported by @peterchanws on 2022-09-29:

There was interruption when uploading the wacz file in https://argo.stanford.edu/view/druid:sv558gk6917 However, the upload was marked completed. However, the seed was not render properly (not able to play video) in swap. I uploaded the file (same file name) again https://argo.stanford.edu/view/druid:yx015cx1366. Problem still there. I uploaded the file (different file name) again. The problem is not solved. https://swap.stanford.edu/was/*?url=https%3A%2F%2Fnodontdie.com%2F. I upload the same file to stage https://argo-stage.stanford.edu/view/druid:db175fk4446 and it works fine https://swap-stage.stanford.edu/was/20220921195257/https://nodontdie.com/

The failure to play back can be seen when trying to play this video, which works in stage but not in production:

@andrewjbtw confirmed that the WACZ files for sv558gk6917 and yx015cx1366 in production and stage have the same fixity value. I also checked that the WARC files and CDX entries are identical.

However @peterchanaws also reported that another version was uploaded to SDR as well, which did not finish uploading:

This does appear to be 331 MB instead of 12.3 GB. It is possible that this truncated WACZ is somehow corrupting playback?

The CDX files as a whole are much larger as a whole in production, than in stage. There is also a previous Archive-It crawl from 2020 in Production, which is evident when comparing the index results:

@edsu
Copy link
Contributor Author

edsu commented Oct 4, 2022

Watching the HTTP requests in Production for:

https://swap.stanford.edu/was/20220921195257/https://nodontdie.com/louie-roots

You can see that this request generates a HTTP 500 response after spinning for 2 minutes:

https://swap.stanford.edu/was/20220921195743oe_/https://rr3---sn-o097znsl.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&mh=gi&mm=31%2C29&mn=sn-o097znsl%2Csn-n4v7snls&ms=au%2Crdu&mv=m&mvi=3&pl=20&initcwndbps=227500&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&mt=1663789767&fvip=4&fexp=24001373%2C24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgHI4spdAtiH_OC3nQts77b8KCHteX4wt3SADv23oi3NkCIQDbg4ga_E4nug6E_xhKh-bhX1K7BRlCjIhlIoxIwrr4EQ%3D%3D&cpn=8lLBIK9wGaHrC5jh&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc

whereas it works fine in Stage:

https://was-pywb-stage.stanford.edu/was/20220921195743oe_/https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA

It would be good to see what the actual error but it doesn't appear to be logged to the Apache error log.

@edsu edsu self-assigned this Oct 4, 2022
@edsu
Copy link
Contributor Author

edsu commented Oct 4, 2022

I saw a bunch of these (104) in the /var/log/syslog when the request failed:

Oct  3 18:12:36 was-pywb-prod poetry[993635]: Mon Oct  3 18:12:36 2022 - *** uWSGI listen queue of socket ":8081" (fd: 5) full !!! (101/100) ***

@edsu
Copy link
Contributor Author

edsu commented Oct 6, 2022

I'm wondering if the duplicate entries from having ingested both

If you look at the underlying CDX request for this you see this response:

{"urlkey": "com,googlevideo,rr4---sn-a5mekn6s)/videoplayback?c=web_embedded_player&cm2rm=sn-o09sd7s&cms_redirect=yes&cmsv=e&cnr=14&cpn=bik9wgahrc5jhvyi&cver=1.20220918.00.00&dur=5588.230&ei=umwry6pgdcon2lypre2i2am&expire=1663811866&fexp=24001373,24007246&id=o-adz2f4-l-smqnl8l31k7a7-cg3h0wt_ga8aikxgcxg2t&ip=137.184.177.15&itag=18&lmt=1643054906292951&lsig=ag3c_xawraigf5-rgf2xd0lozxcgzwxgoxqhqzmgwxa-vdytr57h5lccieavokgnaie3zdafqho7fbimphutntsriqa9kdgbdema&lsparams=mh,mm,mn,ms,mv,mvi,pl&mh=gi&mime=video/mp4&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&n=1kzdgoh5orw1ag&ns=hgbb8j72m15xhohixiz33qsi&pl=20&pltype=contentugc&ptk=youtube_none&ratebypass=yes&redirect_counter=1&req_id=7066179c6559a3ee&requiressl=yes&sig=aoq0qj8wrqigi72ue5jztjgbj2jadhg0fmfpndh_ec1gpi7il-a95riciqd4jentwa5oqsixjisc_ijd1dscp3yxazfac6xaxhsw7w==&source=youtube&sparams=expire,ei,ip,id,itag,source,requiressl,spc,vprv,mime,ns,cnr,ratebypass,dur,lmt&spc=yr2vp5oyfrtwxtxwdiqfteeej98xy1c&txp=6218224&vprv=1", "timestamp": "20220921195748", "url": "https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA", "mime": "video/mp4", "status": "200", "digest": "sha1:XWXMGAQPGUJI6FHCAWLI6ABQQ5F5Y6OB", "length": "161861539", "offset": "383819330", "filename": "qg137zm7702/sv/558/gk/6917/don-t-die-peter-chan-rec-20220921195258533131-crawl-manual-20220921195228-385b66de-86c-0.warc.gz", "source": "was:level2.cdxj", "source-coll": "was", "access": "allow"}
{"urlkey": "com,googlevideo,rr4---sn-a5mekn6s)/videoplayback?c=web_embedded_player&cm2rm=sn-o09sd7s&cms_redirect=yes&cmsv=e&cnr=14&cpn=bik9wgahrc5jhvyi&cver=1.20220918.00.00&dur=5588.230&ei=umwry6pgdcon2lypre2i2am&expire=1663811866&fexp=24001373,24007246&id=o-adz2f4-l-smqnl8l31k7a7-cg3h0wt_ga8aikxgcxg2t&ip=137.184.177.15&itag=18&lmt=1643054906292951&lsig=ag3c_xawraigf5-rgf2xd0lozxcgzwxgoxqhqzmgwxa-vdytr57h5lccieavokgnaie3zdafqho7fbimphutntsriqa9kdgbdema&lsparams=mh,mm,mn,ms,mv,mvi,pl&mh=gi&mime=video/mp4&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&n=1kzdgoh5orw1ag&ns=hgbb8j72m15xhohixiz33qsi&pl=20&pltype=contentugc&ptk=youtube_none&ratebypass=yes&redirect_counter=1&req_id=7066179c6559a3ee&requiressl=yes&sig=aoq0qj8wrqigi72ue5jztjgbj2jadhg0fmfpndh_ec1gpi7il-a95riciqd4jentwa5oqsixjisc_ijd1dscp3yxazfac6xaxhsw7w==&source=youtube&sparams=expire,ei,ip,id,itag,source,requiressl,spc,vprv,mime,ns,cnr,ratebypass,dur,lmt&spc=yr2vp5oyfrtwxtxwdiqfteeej98xy1c&txp=6218224&vprv=1", "timestamp": "20220921195748", "url": "https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA", "mime": "video/mp4", "status": "200", "digest": "sha1:XWXMGAQPGUJI6FHCAWLI6ABQQ5F5Y6OB", "length": "161861539", "offset": "383819330", "filename": "qg137zm7702/yh/223/zs/4674/don-t-die-peter-chan-20220921202603400-385b66de-86c-0-rec-20220921195258533131-crawl-manual-20220921195228-385b66de-86c-0.warc.gz", "source": "was:level2.cdxj", "source-coll": "was", "access": "allow"}
{"urlkey": "com,googlevideo,rr4---sn-a5mekn6s)/videoplayback?c=web_embedded_player&cm2rm=sn-o09sd7s&cms_redirect=yes&cmsv=e&cnr=14&cpn=bik9wgahrc5jhvyi&cver=1.20220918.00.00&dur=5588.230&ei=umwry6pgdcon2lypre2i2am&expire=1663811866&fexp=24001373,24007246&id=o-adz2f4-l-smqnl8l31k7a7-cg3h0wt_ga8aikxgcxg2t&ip=137.184.177.15&itag=18&lmt=1643054906292951&lsig=ag3c_xawraigf5-rgf2xd0lozxcgzwxgoxqhqzmgwxa-vdytr57h5lccieavokgnaie3zdafqho7fbimphutntsriqa9kdgbdema&lsparams=mh,mm,mn,ms,mv,mvi,pl&mh=gi&mime=video/mp4&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&n=1kzdgoh5orw1ag&ns=hgbb8j72m15xhohixiz33qsi&pl=20&pltype=contentugc&ptk=youtube_none&ratebypass=yes&redirect_counter=1&req_id=7066179c6559a3ee&requiressl=yes&sig=aoq0qj8wrqigi72ue5jztjgbj2jadhg0fmfpndh_ec1gpi7il-a95riciqd4jentwa5oqsixjisc_ijd1dscp3yxazfac6xaxhsw7w==&source=youtube&sparams=expire,ei,ip,id,itag,source,requiressl,spc,vprv,mime,ns,cnr,ratebypass,dur,lmt&spc=yr2vp5oyfrtwxtxwdiqfteeej98xy1c&txp=6218224&vprv=1", "timestamp": "20220921195748", "url": "https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA", "mime": "video/mp4", "status": "200", "digest": "sha1:XWXMGAQPGUJI6FHCAWLI6ABQQ5F5Y6OB", "length": "161861539", "offset": "383819330", "filename": "qg137zm7702/yx/015/cx/1366/don-t-die-peter-chan-20220921202603400-385b66de-86c-0-rec-20220921195258533131-crawl-manual-20220921195228-385b66de-86c-0.warc.gz", "source": "was:level2.cdxj", "source-coll": "was", "access": "allow"}

The last two lines have identical properties except for filename. I wonder if this could be causing problems for fuzzy matching?

@peterchanws
Copy link
Collaborator

Should we try to remove the duplicated entries?

@edsu
Copy link
Contributor Author

edsu commented Oct 6, 2022

Yes, I'm going to try to copy these WARCs out of the production environment and see if I can replicate the problem in my development environment. If that works then I can experiment with removing the duplicates to see if that helps? I get nervous about experimenting with the CDX files in production!

@edsu
Copy link
Contributor Author

edsu commented Oct 7, 2022

I've been able to replicate the problem in my development environment by downloading the WARCs for sv558gk6917, yh223zs4674 and yx015cx1366 and indexing them with cdxj-indexer. At this point playback works fine, even with the duplicate WARC content.

But then I copied the CDX entries from our production level3.cdxj that matched ^com,googlevideo, and added them to my development index as well (8,323,879 index entries). After this the request above that fails in production but not in stage results in a 7 minute wait for a response.

http://localhost:8080/test/20220921195743oe_/https://rr4---sn-a5mekn6s.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&fexp=24001373,24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&cpn=BIK9wGaHrC5jhvYi&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc&redirect_counter=1&cm2rm=sn-o09sd7s&req_id=7066179c6559a3ee&cms_redirect=yes&cmsv=e&mh=gi&mm=34&mn=sn-a5mekn6s&ms=ltu&mt=1663790218&mv=m&mvi=4&pl=20&lsparams=mh,mm,mn,ms,mv,mvi,pl&lsig=AG3C_xAwRAIgf5-RGF2xd0lozXCgZwxGOxqhQZmGWxa-vDYtr57H5lcCIEAvoKgnaIe3ZDafQHO7fbimPhUTNTsRiqa9KdGBDemA

In our production environment uwsgi is giving up on this request because it is taking too long, and realistically our users won't wait 7 minutes for a response anyway.

Here is what I see in the pywb log when running it manually with the wayback command:

127.0.0.1 - - [2022-10-07 03:11:05] "POST /test/resource/postreq?url=https%3A%2F%2Frr3---sn-o097znsl.googlevideo.com%2Fvideoplayback%3Fexpire%3D1663811866%26ei%3DumwrY6PgDcON2LYPre2I2AM%26ip%3D137.184.177.15%26id%3Do-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t%26itag%3D18%26source%3Dyoutube%26requiressl%3Dyes%26mh%3Dgi%26mm%3D31%252C29%26mn%3Dsn-o097znsl%252Csn-n4v7snls%26ms%3Dau%252Crdu%26mv%3Dm%26mvi%3D3%26pl%3D20%26initcwndbps%3D227500%26spc%3DyR2vp5OyfrTwxTXwDIqftEEeJ98XY1c%26vprv%3D1%26mime%3Dvideo%252Fmp4%26ns%3DHGBB8J72m15xHOhIxIZ33QsI%26cnr%3D14%26ratebypass%3Dyes%26dur%3D5588.230%26lmt%3D1643054906292951%26mt%3D1663789767%26fvip%3D4%26fexp%3D24001373%252C24007246%26c%3DWEB_EMBEDDED_PLAYER%26txp%3D6218224%26n%3D1kZdGOH5ORW1ag%26sparams%3Dexpire%252Cei%252Cip%252Cid%252Citag%252Csource%252Crequiressl%252Cspc%252Cvprv%252Cmime%252Cns%252Ccnr%252Cratebypass%252Cdur%252Clmt%26sig%3DAOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%253D%253D%26lsparams%3Dmh%252Cmm%252Cmn%252Cms%252Cmv%252Cmvi%252Cpl%252Cinitcwndbps%26lsig%3DAG3C_xAwRQIgHI4spdAtiH_OC3nQts77b8KCHteX4wt3SADv23oi3NkCIQDbg4ga_E4nug6E_xhKh-bhX1K7BRlCjIhlIoxIwrr4EQ%253D%253D%26cpn%3D8lLBIK9wGaHrC5jh%26cver%3D1.20220918.00.00%26ptk%3Dyoutube_none%26pltype%3Dcontentugc&closest=20220921195743&matchType=exact HTTP/1.1" 200 7501 420.605267
127.0.0.1 - - [2022-10-07 03:11:05] "GET /test/20220921195743oe_/https://rr3---sn-o097znsl.googlevideo.com/videoplayback?expire=1663811866&ei=umwrY6PgDcON2LYPre2I2AM&ip=137.184.177.15&id=o-ADz2f4-l-SMQNL8L31k7a7-cG3h0WT_gA8AIKxgcXG2t&itag=18&source=youtube&requiressl=yes&mh=gi&mm=31%2C29&mn=sn-o097znsl%2Csn-n4v7snls&ms=au%2Crdu&mv=m&mvi=3&pl=20&initcwndbps=227500&spc=yR2vp5OyfrTwxTXwDIqftEEeJ98XY1c&vprv=1&mime=video%2Fmp4&ns=HGBB8J72m15xHOhIxIZ33QsI&cnr=14&ratebypass=yes&dur=5588.230&lmt=1643054906292951&mt=1663789767&fvip=4&fexp=24001373%2C24007246&c=WEB_EMBEDDED_PLAYER&txp=6218224&n=1kZdGOH5ORW1ag&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cspc%2Cvprv%2Cmime%2Cns%2Ccnr%2Cratebypass%2Cdur%2Clmt&sig=AOq0QJ8wRQIgI72UE5jZTJgBj2jadHg0fMFpnDh_EC1gpi7IL-a95RICIQD4jEnTWA5oqSIxjIsc_IJd1dsCP3yxAzfAC6xaxHSW7w%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgHI4spdAtiH_OC3nQts77b8KCHteX4wt3SADv23oi3NkCIQDbg4ga_E4nug6E_xhKh-bhX1K7BRlCjIhlIoxIwrr4EQ%3D%3D&cpn=8lLBIK9wGaHrC5jh&cver=1.20220918.00.00&ptk=youtube_none&pltype=contentugc HTTP/1.1" 302 7061 420.617071

I'm not sure why the one requests shows up as two (a POST and a GET) in the log. I wonder if there is something we can be doing here to speed things up. I'm kind of confused why if the CDX lookup is quick why replay is not.

@peterchanws
Copy link
Collaborator

What happens if you remove duplicates in the development environment? Would it work normal again?

@edsu
Copy link
Contributor Author

edsu commented Oct 7, 2022

No, the duplicates are actually not negatively impacting the replay. The problem is the amount of time pywb is spending looking through the googlevideo.com CDX entries. The good news that now that we can replicate it outside of the production environment it'll be easier to investigate what is going on, and maybe see if there is an optimization to be made.

@edsu
Copy link
Contributor Author

edsu commented Feb 14, 2023

A possibly related pywb issue: webrecorder/pywb#573

@peterchanws
Copy link
Collaborator

@edsu
Copy link
Contributor Author

edsu commented Feb 23, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants