Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yt-dlp implementation in /play #115

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

bazettfraga
Copy link
Contributor

/play will now download YouTube URLs and upload them to the endpoint of a user's choice, then send the resulting URL to the clients in said area to play the song, allowing for more convenient use of music from YouTube.

The yt_cache dictionary can be saved as a yaml for a cache which persists throughout server shutdowns as previously discussed, though I have not yet integrated such a feature.

@EstatoDeviato
Copy link
Contributor

Probably "https://www.youtube.com/" should be in the whitelist in url.txt.
Also I think ffmpeg_location should be added in config.

@bazettfraga
Copy link
Contributor Author

I completely forgot to append it to the url.txt file, I will make sure to do so in the morning.
As for an ffmpeg_location config var, I will also implement it, though personally find it would have little purpose in the config. If the user does not have ffmpeg configured in their PATH environmental variable or any binary directories, then it would be more convenient for them to allow the get_ffmpeg() function to handle download and extraction of the binary.

If anyone is willing to edit the README to include ffmpeg instructions to explain how to handle that, that would be ideal.

@@ -431,6 +487,25 @@ def change_music(self, song, cid, showname="", effects=0, loop=True):
"This URL is not allowed."
)
return

yt_pattern = re.compile(r"^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|v\/)?)([\w\-]+)(\S+)?$")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are more standard ways to parse URLs than regex, like urllib.parse. if we're lucky it may even fix the recursion error below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recursion error is not caused by this line but rather yt-dlp's own URL parsing if memory serves. Still, I will fix this soon!

try:
os.remove(yt_song_path)
except PermissionError:
pass #FIXME: unlucky
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the very least, log something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the best, I left this in while testing since I could not reproduce this bug on Linux, and was only happening on Windows (likely due to NOT closing the file stream. It should never occur now, but I will set up a virtual machine for further testing).

@@ -522,6 +546,60 @@ def send_discord_chat(self, name, message, hub_id=0, area_id=0):
pos=self.config["bridgebot"]["pos"],
)

def get_ffmpeg(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the yt_dlp package really require a local installation of ffmpeg? This practice of downloading dependencies at runtime seems unfortunate either way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It requires it so that it can extract the audio stream from the video downloaded.
I am not proud of this function myself, but it felt like the ideal solution. Perhaps having the user acquire the binaries themselves would be better than doing this nonsense given the ffmpeg_location key, though I myself am not sure which is the better option of the two.

@OmniTroid
Copy link
Contributor

Very cool concept, added some comments

@OmniTroid
Copy link
Contributor

Also the match-case keywords were introduced in python 3.10, so we need to bump version in CI to make the linter happy

@bazettfraga
Copy link
Contributor Author

Also the match-case keywords were introduced in python 3.10, so we need to bump version in CI to make the linter happy

OH, so that's what it was. It freaked me out the first time honestly lmao.

sys.setrecursionlimit(1200) #FIXME: figure out what's causing a RecursionError. Python's regex module should not recurse so far that it causes this to happen.
info = ""
yt_id = yt_pattern.match(song).group(5)
yt_id = parse_qs(yt_parse.query)['v'][0]
Copy link
Contributor

@OmniTroid OmniTroid Feb 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the URL is malformed (eg. does not have a v query string), this will throw a KeyError. How is that exception handled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I neglected to consider that, my bad.
Perhaps I should also account for different URL formats (such as v/ and embed/).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably "good enough" to just accept v= for now, just if the link is not how we expect it to be, we should let the client know (eg. unable to play. URL is malformed). iirc if you raise a ClientError, the server will send the exception message to the client. need to check if that's correct though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClientError posts in OOC chat, yeah. I will implement it ASAP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am mistaken, frustratingly all of yt-dlp's errors seem to raise DownloadError which makes any error handling a nightmare.
e.g.:

yt_dlp.utils.PostProcessingError: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location

During handling of the above exception, another exception occurred:

...

yt_dlp.utils.DownloadError: ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location

This seems to be as every function in YoutubeDL.py calls report_error on exception which raises DownloadError later on in the trouble function.

@OmniTroid
Copy link
Contributor

There is an open question whether this constitutes permitted use of litterbox.catbox.moe or not. Let's not merge before we have a clear answer here. I'll come back to it.

@Salanto
Copy link
Contributor

Salanto commented Feb 26, 2024

Wouldn't it be better to push the media to the servers WebAO repo?
That way it can be used by the client trough media streaming directly without having to even bother with filesharing TOS nightmares.

It would avoid all legal question and ensures operation for as long as the server itself is operational.

@bazettfraga
Copy link
Contributor Author

I was myself wondering if it's a good idea of using catbox as a CDN at all or not as it might be an abuse of the service, but for the purposes of testing it's done its job. It is also the reason why I've implemented the ability to change how and where the files are passed, though;

Wouldn't it be better to push the media to the servers WebAO repo?

I'm unaware of how this works as I've not looked into it. Could you point me to the right resources for that service?
As an aside, the more I've thought on this project the more sane it has felt to implement the YT downloading/streaming functionality into the client itself rather than the server, something I'd not considered in hindsight as I thought this to be a simpler problem than it was.

@Salanto
Copy link
Contributor

Salanto commented Feb 27, 2024

I'm unaware of how this works as I've not looked into it. Could you point me to the right resources for that service?

If the Webserver and the AO2 Server run on the same machine, which is the case in 99.9% of servers, the software can just store the saved file in the directory the Webserver saves data from.

@bazettfraga
Copy link
Contributor Author

If the Webserver and the AO2 Server run on the same machine

Ah, I think I see what you mean now. This would still be an issue for server hosts who only want to use KFO-Server without anything else, however? At that rate, it may be simpler to have the server itself open a simple webserver to serve the files off of as this would simply require the host to open an additional port on their router.

Unless I am mistaken, but the only webAO repo I have found is AttorneyOnline/webAO and requiring the host to set this up for the purpose of serving music feels redundant and needlessly complicated.

@lambdcalculus
Copy link

Unless I am mistaken, but the only webAO repo I have found is AttorneyOnline/webAO and requiring the host to set this up for the purpose of serving music feels redundant and needlessly complicated.

That repo is for the webAO website - that's not what hosts would need to set up. A webAO asset server (which is what is being referred to) is simply a file server, which can be set up with, say, nginx or Apache. The AO server then sends the URL it during the handshake (iirc?) which is what webAO clients use for assets. I think it's not that big of an ask to set this up, at least for 'big' hosts who are hosting their AO servers 24/7 (and most of them already have this), since it's necessary if they want to pull in webAO users. You also get the advantage that you can choose the filename and thus URL to be something pretty, which means you won't get some gibberish like "Phoenix has played a song: gdjkshdfgjl".

If you want this feature to also be available to the 'little guy,' then I think just catbox might be fine in those cases? Since, as far as demand goes, it wouldn't be that much more than users themselves using yt-dlp to download YT vids and uploading them to catbox to use on AO. It just automates the process. And I think it's unlikely small, private, and/or non-permanent servers would end up using it enough to warrant attention.

@bazettfraga
Copy link
Contributor Author

Ah, I see. I was completely unaware of this, thank you for letting me know!
Then yes, if this is something major servers are utilizing already I see no reason not to implement support for it already, and I'll have to look into it over the coming days.

If you want this feature to also be available to the 'little guy,' then I think just catbox might be fine in those cases

That was my (albeit shortsighted) original intention when creating the feature in such a way, as I know a few server owners who only host for the purpose of one or two of their own RPs, which I feel should not be neglected.

Copy link
Contributor

@oldmud0 oldmud0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CW personally requested me to look at this, so here I am

"manjaro": "pacman -S ffmpeg",
"fedora": "dnf install ffmpeg",
"opensuse-leap": "zypper install ffmpeg-4", #This surely won't deprecate.
"opensuse-tumbleweed": "zypper install ffmpeg-4" #Are they fucking serious?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truly idiot proof design

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly code was made just to get shoved out the door so I could fix this issue (though it caused a few others that I should resolve like an idiot).

yt_parse = urlparse(song)

if "youtube" in yt_parse.netloc:
sys.setrecursionlimit(1200) #FIXME: figure out what's causing a RecursionError. Python's regex module should not recurse so far that it causes this to happen.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sus. Check the stack trace. If this keeps being an issue then the regex that parse_qs or parse_qsl uses is too complicated and you will need to write an iterative version. You do not want people crashing the darn server from writing long links.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extremely suspect, I fully agree, though I now believe yt-dlp to be a bit brain-damaged in some ways.
When I was working on the project I considered finding alternative ways of downloading songs from youtube due to the variety of annoyances it caused.

'preferredcodec': 'vorbis',
'preferredquality': '192',
}],
"outtmpl": 'storage/tmp/%(title)s.%(ext)s',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use the Python NamedTemporaryFile object that gives you a file and filename you can use. This way you don't need to worry about this folder having correct perms set by the admin, and also having to clean it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time I was uncertain as to whether I wanted the files to be reuploaded from local storage (if the link somehow expired) rather than having to redownload them again and would have made appropriate configs for that.
Of course it would just be smarter to host a local webserver to serve these files off of once they are pulled, but that does not resolve the core issue of the server waiting for the download function to finish when is called, which I have not looked at at all as I am preoccupied due to personal circumstances.

@mposs00
Copy link
Contributor

mposs00 commented Apr 12, 2024

please for the love of god do not merge this ever

def mirror_youtube(self, yt_url):
try:
with yt_dlp.YoutubeDL(self.ydl_opts) as ydl:
info = ydl.extract_info(yt_url, download=True)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just discovered this function locks the entire server until it concludes (both the downloading and conversion). Implementing async into the code might prove problematic for this one function. I will need to think on how to proceed with this in mind.

sys.setrecursionlimit(1200) #FIXME: figure out what's causing a RecursionError. Python's regex module should not recurse so far that it causes this to happen.
info = ""
yt_id = yt_pattern.match(song).group(5)
yt_id = parse_qs(yt_parse.query)['v'][0]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am mistaken, frustratingly all of yt-dlp's errors seem to raise DownloadError which makes any error handling a nightmare.
e.g.:

yt_dlp.utils.PostProcessingError: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location

During handling of the above exception, another exception occurred:

...

yt_dlp.utils.DownloadError: ERROR: Postprocessing: ffprobe and ffmpeg not found. Please install or provide the path using --ffmpeg-location

This seems to be as every function in YoutubeDL.py calls report_error on exception which raises DownloadError later on in the trouble function.

"manjaro": "pacman -S ffmpeg",
"fedora": "dnf install ffmpeg",
"opensuse-leap": "zypper install ffmpeg-4", #This surely won't deprecate.
"opensuse-tumbleweed": "zypper install ffmpeg-4" #Are they fucking serious?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly code was made just to get shoved out the door so I could fix this issue (though it caused a few others that I should resolve like an idiot).

'preferredcodec': 'vorbis',
'preferredquality': '192',
}],
"outtmpl": 'storage/tmp/%(title)s.%(ext)s',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time I was uncertain as to whether I wanted the files to be reuploaded from local storage (if the link somehow expired) rather than having to redownload them again and would have made appropriate configs for that.
Of course it would just be smarter to host a local webserver to serve these files off of once they are pulled, but that does not resolve the core issue of the server waiting for the download function to finish when is called, which I have not looked at at all as I am preoccupied due to personal circumstances.

yt_parse = urlparse(song)

if "youtube" in yt_parse.netloc:
sys.setrecursionlimit(1200) #FIXME: figure out what's causing a RecursionError. Python's regex module should not recurse so far that it causes this to happen.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extremely suspect, I fully agree, though I now believe yt-dlp to be a bit brain-damaged in some ways.
When I was working on the project I considered finding alternative ways of downloading songs from youtube due to the variety of annoyances it caused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants