-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
yt-dlp implementation in /play #115
base: master
Are you sure you want to change the base?
Conversation
Probably "https://www.youtube.com/" should be in the whitelist in url.txt. |
I completely forgot to append it to the url.txt file, I will make sure to do so in the morning. If anyone is willing to edit the README to include ffmpeg instructions to explain how to handle that, that would be ideal. |
server/client_manager.py
Outdated
@@ -431,6 +487,25 @@ def change_music(self, song, cid, showname="", effects=0, loop=True): | |||
"This URL is not allowed." | |||
) | |||
return | |||
|
|||
yt_pattern = re.compile(r"^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]+)(\S+)?$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are more standard ways to parse URLs than regex, like urllib.parse. if we're lucky it may even fix the recursion error below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recursion error is not caused by this line but rather yt-dlp's own URL parsing if memory serves. Still, I will fix this soon!
server/client_manager.py
Outdated
try: | ||
os.remove(yt_song_path) | ||
except PermissionError: | ||
pass #FIXME: unlucky |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least, log something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the best, I left this in while testing since I could not reproduce this bug on Linux, and was only happening on Windows (likely due to NOT closing the file stream. It should never occur now, but I will set up a virtual machine for further testing).
@@ -522,6 +546,60 @@ def send_discord_chat(self, name, message, hub_id=0, area_id=0): | |||
pos=self.config["bridgebot"]["pos"], | |||
) | |||
|
|||
def get_ffmpeg(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the yt_dlp package really require a local installation of ffmpeg? This practice of downloading dependencies at runtime seems unfortunate either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It requires it so that it can extract the audio stream from the video downloaded.
I am not proud of this function myself, but it felt like the ideal solution. Perhaps having the user acquire the binaries themselves would be better than doing this nonsense given the ffmpeg_location key, though I myself am not sure which is the better option of the two.
Very cool concept, added some comments |
Also the match-case keywords were introduced in python 3.10, so we need to bump version in CI to make the linter happy |
OH, so that's what it was. It freaked me out the first time honestly lmao. |
sys.setrecursionlimit(1200) #FIXME: figure out what's causing a RecursionError. Python's regex module should not recurse so far that it causes this to happen. | ||
info = "" | ||
yt_id = yt_pattern.match(song).group(5) | ||
yt_id = parse_qs(yt_parse.query)['v'][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the URL is malformed (eg. does not have a v query string), this will throw a KeyError. How is that exception handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I neglected to consider that, my bad.
Perhaps I should also account for different URL formats (such as v/ and embed/).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably "good enough" to just accept v= for now, just if the link is not how we expect it to be, we should let the client know (eg. unable to play. URL is malformed). iirc if you raise a ClientError, the server will send the exception message to the client. need to check if that's correct though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClientError posts in OOC chat, yeah. I will implement it ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I am mistaken, frustratingly all of yt-dlp's errors seem to raise DownloadError
which makes any error handling a nightmare.
e.g.:
yt_dlp.utils.PostProcessingError: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
During handling of the above exception, another exception occurred:
...
yt_dlp.utils.DownloadError: ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
This seems to be as every function in YoutubeDL.py calls report_error on exception which raises DownloadError later on in the trouble function.
There is an open question whether this constitutes permitted use of litterbox.catbox.moe or not. Let's not merge before we have a clear answer here. I'll come back to it. |
Wouldn't it be better to push the media to the servers WebAO repo? It would avoid all legal question and ensures operation for as long as the server itself is operational. |
I was myself wondering if it's a good idea of using catbox as a CDN at all or not as it might be an abuse of the service, but for the purposes of testing it's done its job. It is also the reason why I've implemented the ability to change how and where the files are passed, though;
I'm unaware of how this works as I've not looked into it. Could you point me to the right resources for that service? |
If the Webserver and the AO2 Server run on the same machine, which is the case in 99.9% of servers, the software can just store the saved file in the directory the Webserver saves data from. |
Ah, I think I see what you mean now. This would still be an issue for server hosts who only want to use KFO-Server without anything else, however? At that rate, it may be simpler to have the server itself open a simple webserver to serve the files off of as this would simply require the host to open an additional port on their router. Unless I am mistaken, but the only webAO repo I have found is AttorneyOnline/webAO and requiring the host to set this up for the purpose of serving music feels redundant and needlessly complicated. |
That repo is for the webAO website - that's not what hosts would need to set up. A webAO asset server (which is what is being referred to) is simply a file server, which can be set up with, say, nginx or Apache. The AO server then sends the URL it during the handshake (iirc?) which is what webAO clients use for assets. I think it's not that big of an ask to set this up, at least for 'big' hosts who are hosting their AO servers 24/7 (and most of them already have this), since it's necessary if they want to pull in webAO users. You also get the advantage that you can choose the filename and thus URL to be something pretty, which means you won't get some gibberish like "Phoenix has played a song: gdjkshdfgjl". If you want this feature to also be available to the 'little guy,' then I think just catbox might be fine in those cases? Since, as far as demand goes, it wouldn't be that much more than users themselves using yt-dlp to download YT vids and uploading them to catbox to use on AO. It just automates the process. And I think it's unlikely small, private, and/or non-permanent servers would end up using it enough to warrant attention. |
Ah, I see. I was completely unaware of this, thank you for letting me know!
That was my (albeit shortsighted) original intention when creating the feature in such a way, as I know a few server owners who only host for the purpose of one or two of their own RPs, which I feel should not be neglected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CW personally requested me to look at this, so here I am
"manjaro": "pacman -S ffmpeg", | ||
"fedora": "dnf install ffmpeg", | ||
"opensuse-leap": "zypper install ffmpeg-4", #This surely won't deprecate. | ||
"opensuse-tumbleweed": "zypper install ffmpeg-4" #Are they fucking serious? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
truly idiot proof design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly code was made just to get shoved out the door so I could fix this issue (though it caused a few others that I should resolve like an idiot).
yt_parse = urlparse(song) | ||
|
||
if "youtube" in yt_parse.netloc: | ||
sys.setrecursionlimit(1200) #FIXME: figure out what's causing a RecursionError. Python's regex module should not recurse so far that it causes this to happen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks sus. Check the stack trace. If this keeps being an issue then the regex that parse_qs or parse_qsl uses is too complicated and you will need to write an iterative version. You do not want people crashing the darn server from writing long links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extremely suspect, I fully agree, though I now believe yt-dlp to be a bit brain-damaged in some ways.
When I was working on the project I considered finding alternative ways of downloading songs from youtube due to the variety of annoyances it caused.
'preferredcodec': 'vorbis', | ||
'preferredquality': '192', | ||
}], | ||
"outtmpl": 'storage/tmp/%(title)s.%(ext)s', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use the Python NamedTemporaryFile object that gives you a file and filename you can use. This way you don't need to worry about this folder having correct perms set by the admin, and also having to clean it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time I was uncertain as to whether I wanted the files to be reuploaded from local storage (if the link somehow expired) rather than having to redownload them again and would have made appropriate configs for that.
Of course it would just be smarter to host a local webserver to serve these files off of once they are pulled, but that does not resolve the core issue of the server waiting for the download function to finish when is called, which I have not looked at at all as I am preoccupied due to personal circumstances.
please for the love of god do not merge this ever |
def mirror_youtube(self, yt_url): | ||
try: | ||
with yt_dlp.YoutubeDL(self.ydl_opts) as ydl: | ||
info = ydl.extract_info(yt_url, download=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just discovered this function locks the entire server until it concludes (both the downloading and conversion). Implementing async into the code might prove problematic for this one function. I will need to think on how to proceed with this in mind.
sys.setrecursionlimit(1200) #FIXME: figure out what's causing a RecursionError. Python's regex module should not recurse so far that it causes this to happen. | ||
info = "" | ||
yt_id = yt_pattern.match(song).group(5) | ||
yt_id = parse_qs(yt_parse.query)['v'][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I am mistaken, frustratingly all of yt-dlp's errors seem to raise DownloadError
which makes any error handling a nightmare.
e.g.:
yt_dlp.utils.PostProcessingError: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
During handling of the above exception, another exception occurred:
...
yt_dlp.utils.DownloadError: ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location
This seems to be as every function in YoutubeDL.py calls report_error on exception which raises DownloadError later on in the trouble function.
"manjaro": "pacman -S ffmpeg", | ||
"fedora": "dnf install ffmpeg", | ||
"opensuse-leap": "zypper install ffmpeg-4", #This surely won't deprecate. | ||
"opensuse-tumbleweed": "zypper install ffmpeg-4" #Are they fucking serious? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly code was made just to get shoved out the door so I could fix this issue (though it caused a few others that I should resolve like an idiot).
'preferredcodec': 'vorbis', | ||
'preferredquality': '192', | ||
}], | ||
"outtmpl": 'storage/tmp/%(title)s.%(ext)s', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the time I was uncertain as to whether I wanted the files to be reuploaded from local storage (if the link somehow expired) rather than having to redownload them again and would have made appropriate configs for that.
Of course it would just be smarter to host a local webserver to serve these files off of once they are pulled, but that does not resolve the core issue of the server waiting for the download function to finish when is called, which I have not looked at at all as I am preoccupied due to personal circumstances.
yt_parse = urlparse(song) | ||
|
||
if "youtube" in yt_parse.netloc: | ||
sys.setrecursionlimit(1200) #FIXME: figure out what's causing a RecursionError. Python's regex module should not recurse so far that it causes this to happen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extremely suspect, I fully agree, though I now believe yt-dlp to be a bit brain-damaged in some ways.
When I was working on the project I considered finding alternative ways of downloading songs from youtube due to the variety of annoyances it caused.
/play will now download YouTube URLs and upload them to the endpoint of a user's choice, then send the resulting URL to the clients in said area to play the song, allowing for more convenient use of music from YouTube.
The
yt_cache
dictionary can be saved as a yaml for a cache which persists throughout server shutdowns as previously discussed, though I have not yet integrated such a feature.