Replies: 7 comments
-
It would be good to include examples of what you want to optimize compression-wise, and what the current solutions offer. Then as you experiment with improving on that, there's a good baseline to compare to. I've only had experience with short CLI example recordings, where a 100KB GIF was replaced by a 114KB SVG, but with gzip encoding (since GitHub is the asset delivery source) it was 100KB vs 7KB compressed. Have you compared your larger recordings with SVG animations? There is a discussion issue for supporting that better here.

If you have a larger set of data, you could also see how well it compresses with zstd with a dictionary trained against the input (be that SVG or a different format). That may be simpler to implement/maintain depending on requirements.
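For a quick feel of what a shared dictionary buys on small repetitive inputs, here is a minimal stdlib sketch using zlib's `zdict` as a stand-in for a zstd-trained dictionary (the sample event lines and the hand-picked dictionary are made up for illustration; zstd would build the dictionary automatically from training samples):

```python
import zlib

# Made-up sample: repetitive terminal-ish event lines, the kind of
# content a dictionary helps with.
events = [b'[0.10, "o", "\\u001b[1;32muser@host\\u001b[0m $ "]' for _ in range(50)]
data = b"\n".join(events)

# A shared "dictionary" seeded with a common substring. With zstd this
# would come out of training, not be hand-written like here.
dictionary = b'[0.10, "o", "\\u001b[1;32muser@host\\u001b[0m $ "]'

plain = zlib.compress(data)
comp = zlib.compressobj(zdict=dictionary)
with_dict = comp.compress(data) + comp.flush()

print(len(data), len(plain), len(with_dict))
```

The gain is largest on many small independent inputs (each compressed separately), which is exactly the case for short recordings.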
-
See an example recording here: I wouldn't call it "huge", but after reading through the JSON, a lot is left to be desired. I have specifically used a tiny resolution and have recorded many sessions to select the smallest and shortest one. SVG is out of the question for me, as the whole selling point of asciinema is to also be able to play back the recordings from the terminal ("a tildeverse alternative to YouTube") and to maintain the ability to copy & paste. I know this comparison is not fair, but the
-
You can still copy/paste text content with the SVG, although for the animation you'd need to pause it as it just animates through frames (via CSS).
That's fine too 👍
What are your expectations? I wanted to compare the 4.5MiB recording against an SVG animation, but it seems that tool is very inefficient: it used over 4GB of RAM processing it, at which point it failed.
Alphanumeric characters would not carry much filesize weight on their own, but large documents of text using them would definitely be larger; you're not going to get crazy efficiency, as the data needed to construct the content is not part of the alphanumeric characters themselves.

Likewise for a game: your disk storage is different from RAM. A game only needs memory to handle the current state, not retain its history, but your recording does need to keep enough information to do so. It's a different story with 3D games, where game sizes and memory use can be huge compared to a 2D video (or to the data needed to record a replay internally, which could be even lighter). Point being, the disk/memory requirements of the game itself don't say much about the size of a recorded session.

I am not familiar with asciinema's format. You could optimize it for sure, but I'm not sure how much more efficiently it'll compress.
With fixed width/height, you could also try encoding rows or tiles, to cache/recycle unchanged portions (assuming such content is repeated in the current format), but again this is probably something a compressor with a large enough window may already handle partially, which is similar to the video encoding techniques you referenced earlier.

So 4.5MiB to 310KiB is the current compression result (closer to 400KiB without a trained dictionary). Is that really not sufficient for you? Especially since, AFAIK, zstd can also stream/seek that compressed data (not the seek you'd expect for playback exactly, but it could possibly be adapted). How much more efficiency do you want beyond that, and is it worth the time/effort to attempt it? It definitely looks viable if switching to an SoA format with binary representation instead of text.
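The row caching idea can be sketched roughly like this, assuming full frames are available as grids of row strings (the frame data and function name here are hypothetical):

```python
import hashlib

def encode_frames(frames):
    """Replace repeated rows with back-references into a row cache.
    `frames` is a list of frames, each a list of row strings."""
    cache = {}      # row hash -> index into rows_out
    rows_out = []   # unique row payloads, in first-seen order
    encoded = []    # per frame: list of cache indices, one per row
    for frame in frames:
        ids = []
        for row in frame:
            key = hashlib.sha1(row.encode()).hexdigest()
            if key not in cache:
                cache[key] = len(rows_out)
                rows_out.append(row)
            ids.append(cache[key])
        encoded.append(ids)
    return rows_out, encoded

frames = [
    ["score: 0", "########", "   @    "],
    ["score: 0", "########", "    @   "],   # only the last row changed
]
rows, ids = encode_frames(frames)
```

As noted, a compressor with a large enough window already catches much of this redundancy, so the structure mainly helps if you want random access to frames.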
-
I managed to convert the 4.5MB recording to SVG. The SVG output was about 82MB, and attempting to clean that up with SVGO was using over 3GB of RAM; I chose to terminate that before it'd complete/fail. With zstd compression (level 3) it was down to around 3MB. I didn't continue further, as the SVG size and the memory required to convert to SVG seemed pretty bad. Just thought I'd share that, even though SVG was ruled out earlier for your requirements.
-
### Motivation

I'm trying to represent multiple business requirements with this optimization. Even now that asciinema is not widespread, having few users, storage costs have come up in operations and when enacting limits. Imagine if it reached the scale of YouTube: it surely couldn't be hosted gratis.

If a hobbyist wanted to self-host such a service without recurring cost, they would probably have web hosting of about 50-500MB at their disposal. If you also store other materials, this would barely suffice to store a single capture of the most well-known ASCII games, let alone multiple strategies or multiplayer matches in one or the other. Sharing a number of matches or a whole playthrough would call for application-specific compression, for gains of orders of magnitude. And we haven't even mentioned non-games, or hosting content for your friends.

Such a system also begs to be used by those who are very low on monthly data allowance, to replace watching streaming video tutorials. For them, every megabyte literally counts. Whole countries could gain from this, along with offline-first communities where you preload every educational material during installation and update the set at most annually. If you assume an 80GB HDD, that's not a lot of material.

### General compression

I think you are pushing zstd too far (but then, why not go all the way with PAQ?). However, compression could be improved quite a bit by the compression filters you have hinted at above, separating the delta timing parts from the terminal characters. We could improve it yet again if the terminal characters could be stored by similarly zoning the stream:
Note that as ncurses updates the screen with a minimized diff of commands, it already counts as a kind of poor pre-compression, and pre-compressed data is prone to compress badly in further passes. So it might also improve overall compression if we compressed full frames of the console for applications that internally refresh the whole screen based on a timer or on user input.

### Domain-specific compression

For general compression, my suggestion above was to introduce tunable and adaptive lossiness to the process in the temporal dimension:
### Application-specific compression

What is the state of the art, you ask? The generated output only depends on which key a user presses and at which simulation step, taking ~3 bits: left, right, up, down, build, flag, (remove all flags, remove 50% of flags). As user actions are sparse, the timing could be encoded via a variable-length discrete step delta as well (with arithmetic coding).

Sharing the exact cursor movement is usually not important, as players are usually doodling while they wait for resources or expansion, although the recorder might sometimes want to show population thresholds at certain key locations and times during a match. So, for most use cases, the representation could also be reduced to coordinate (~9 bits) vs. action (1 bit), along with timing.

The example included 2418 simulation steps (why not that many rows in the cast, then?). According to grep, it contained about 33 build commands and at most 326 flags (a loose upper bound, but less than this). I think it would be a reasonable estimate that ~300 total commands were invoked (build, flag and unflag). That would make the average delta time ~10 frames, let's say taking 5 bits on average. That weighs ~15 bits/action, so a replay would take about 600 bytes, along with the executable and its parameters (including the random seed). That's the ballpark of the lower bound of what entropy such a replay should be backed by.
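A quick sanity check of that arithmetic (the per-field bit counts are the estimates from this comment):

```python
# Back-of-envelope replay size: ~300 commands, each carrying a
# coordinate (~9 bits), an action bit, and an average timing delta
# (~5 bits), i.e. ~15 bits per recorded action.
actions = 300
coord_bits, action_bits, delta_bits = 9, 1, 5
total_bits = actions * (coord_bits + action_bits + delta_bits)
total_bytes = total_bits / 8
print(total_bytes)  # 562.5, i.e. the ~600 byte ballpark quoted above
```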
-
For $5 a month you get plenty of storage, and if you need more you can scale that. Additionally, if the increase in latency is acceptable, one can pull from storage on their own local machines (e.g. a cheap Raspberry Pi connected to an external USB SSD or NAS). You could even host the entire server locally instead of just storage, if preferable. Then you have the cost of a domain, which again you can avoid (the annual renewal cost) if you prefer; or you can just go with an IP, which is a non-issue if you have a client app instead of direct browser access. It really just depends on your needs, scale and audience. If you have a large community, you can also leverage torrents.

In my experience managing a community server for users to upload game mods and related features, bandwidth is a bigger concern for us than storage (of which we use 200GB or so over the past 6 years, with an average peak of 1k users online at the same time); we can exceed 10TB traffic a month. That said, our content is considerably larger in size than what you're interested in.
PAQ, if I recall correctly, is much slower to compress/decompress? zstd is not only good at compression ratio (not the best), it's notably better at the speed of doing so. And like I mentioned, there is streaming + seek support for decompression. I don't see how that's pushing it too far: the example you provided of 4.5MiB was reduced down to 310KiB, 7% of the original size already.
Yes, I can understand how that could be beneficial.
Ok, so to clarify:
Is it still intended to use the

I wrote a little program in Rust to process your linked cast. Nothing special beyond restructuring the data a bit and using binary encoding. I naively de-duped the content into unique blocks for lookups (those uniques still have a fair amount of duplicated content across them):

Stats:
A good amount of weight was trimmed off from the escape code prefix

Prior to merging lines into frame timings, there were a few lines that split the ANSI escape codes across two lines. If encoding time/content losslessly, it is about 1.5MiB, or 370KiB with zstd compression. So most of the win, I guess, was from reducing the timestamps to frame numbers with lossy encoding, as zstd could already manage the rest on the JSON lines without my program :/

Let me know if the source code would be of interest to you (it's inefficient/rough, but enough to perhaps play around a bit).
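For what it's worth, the approach described can be re-sketched in a few lines of Python (this is my own guess at a layout, not the actual Rust program): de-dupe output payloads into unique blocks, quantize timestamps to frame numbers, and pack everything with fixed-width binary fields:

```python
import json
import struct

def encode_cast(lines, fps=30):
    """Pack asciicast v2 output events into frame numbers plus a table
    of de-duplicated payload blocks. Lossy in time (quantized to 1/fps)."""
    blocks, index, records = [], {}, []
    for line in lines:
        time, kind, data = json.loads(line)
        if kind != "o":            # only output events carry screen content
            continue
        payload = data.encode()
        if payload not in index:   # de-dupe identical payloads
            index[payload] = len(blocks)
            blocks.append(payload)
        records.append((round(time * fps), index[payload]))
    # Layout: [n_blocks][n_records] [len + bytes]* [frame_no, block_id]*
    out = bytearray(struct.pack("<II", len(blocks), len(records)))
    for block in blocks:
        out += struct.pack("<I", len(block)) + block
    for frame, block_id in records:
        out += struct.pack("<II", frame, block_id)
    return bytes(out)

cast = [
    '[0.034, "o", "hello "]',
    '[0.517, "o", "world\\r\\n"]',
    '[1.033, "o", "hello "]',      # duplicate payload, stored once
]
blob = encode_cast(cast)
```

A real format would use variable-length integers and split fields into SoA streams before compression, but this shows the basic restructuring.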
-
Finally got around to reading this thread. Lots of interesting ideas and background here; I think I'm getting where you're coming from. First, I think you're oversimplifying what writing to a terminal actually does @bkil :) Each individual write recorded by asciinema as

From: https://www.vt100.net/emu/dec_ansi_parser
When printing to a terminal, color attributes are not assigned directly to text; instead, every terminal keeps a "current pen color" that is altered by SGR sequences (commands), which affects text printed in the future but not any specific text. You can't reliably separate those without performing actual terminal emulation yourself, using a virtual screen buffer (this is what asciinema-player does).
Similar to above, you can't merge updates without actually interpreting the data with an emulator if you want to reduce the bandwidth (while reducing the frame rate). Of course you can just concatenate subsequent
This is pretty much how it was in asciicast v1 format: https://github.com/asciinema/asciinema/blob/develop/doc/asciicast-v1.md#frame Full timestamps in v2 were chosen for several reasons (I can dig up the PR that includes the reasoning behind this if you want), and this change might have brought slightly bigger file sizes, but I'd say the increase is minor in practice.
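For reference, converting between v2-style absolute timestamps and v1-style relative delays is just a running difference / running sum (a sketch; the rounding is only there to keep the floats tidy):

```python
def to_relative(ts):
    """v2-style absolute timestamps -> v1-style inter-event delays."""
    prev, out = 0.0, []
    for t in ts:
        out.append(round(t - prev, 6))
        prev = t
    return out

def to_absolute(deltas):
    """v1-style delays -> v2-style absolute timestamps."""
    total, out = 0.0, []
    for d in deltas:
        total = round(total + d, 6)
        out.append(total)
    return out

stamps = [0.1, 0.25, 1.25, 1.3]
```

The deltas are mostly small numbers, which is what makes the v1 shape slightly friendlier to compressors; the absolute form is more robust to a truncated or damaged recording.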
Again, this would require emulation with virtual screen buffer, which currently is not part of the cli. It's hard/impossible to make sense of line/cell (char grid in general) just by "extracting" some data from the output stream - you need to interpret the control sequences which is what terminals do.
I wish! But it's never gonna happen, it's too niche :)
As I mentioned in asciinema/asciinema#516 (comment) - your recordings seem to be on the bigger side, but still, you can store them on disk compressed, and you can serve them in compressed form to the browser. If you choose gzip compression then you can directly tell the webserver, e.g. nginx, to serve those .gz files as-is with
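With nginx this could look roughly like the following (a sketch; the location path is made up, and `gzip_static` requires nginx built with ngx_http_gzip_static_module):

```nginx
# Serve pre-compressed .gz files as-is to clients sending
# Accept-Encoding: gzip, instead of compressing on the fly.
location /casts/ {
    gzip_static on;
}
```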
This is what control sequences are for, and indeed we have ones for scrolling, blinking amongst many others.
In fact, I think the "Application specific compression" description finally clicked for me, and I understood your use case and the ideas you presented. And I believe I have a promising solution for you: a custom recording format parser.

With the above you could create a 600 byte file (but not with the asciinema recorder unfortunately, although you may try creating a script to convert an asciicast file into your own efficient format) and implement a custom parser that interprets those 600 bytes in your application-specific way. You would need to translate this bespoke encoding of yours into regular terminal control sequences for cursor movement and color management, but it shouldn't be too hard. See the Star Wars example in the linked parser doc.
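To make the translation step concrete, here is a hypothetical sketch: decoding made-up 2-byte action records (frame delta, then packed x/y/action bits) into standard CUP (cursor position) control sequences. The record layout, bit packing and glyphs are all invented for illustration, not asciinema's or any game's actual format:

```python
import struct

def replay_to_ansi(blob):
    """Turn compact (delta, packed x/y/action) records into ANSI output."""
    out = []
    for delta, xya in struct.iter_unpack("<BB", blob):
        x, y, action = xya >> 4, (xya >> 1) & 0b111, xya & 1
        glyph = "#" if action else "."          # say, build vs. flag
        # CSI row;col H moves the cursor (1-based), then print the glyph
        out.append(f"\x1b[{y + 1};{x + 1}H{glyph}")
    return "".join(out)

# Two records: build at (2, 1) after 3 frames, flag at (5, 3) after 7.
blob = struct.pack("<BBBB", 3, (2 << 4) | (1 << 1) | 1,
                            7, (5 << 4) | (3 << 1) | 0)
ansi = replay_to_ansi(blob)
```

A real parser would also have to replay the simulation itself (the board state is not in those 600 bytes), which is exactly the "executable and its parameters" part of the estimate above.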
-
The present encoding is good as a YAGNI proof of concept, and brotli will compress it somewhat as well. It is good enough for command-line or short recordings. However, it can consume quite a lot of storage and bandwidth for interactive real-time applications like long multiplayer game sessions.
Hence, some more advanced algorithms could be tested, inspired by the MPEG family of compression methods. For example, we could memorize B/P reference frames for a certain number of seconds and store only the differences based on those.
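A minimal sketch of what such keyframe/delta encoding could look like for a character grid (the frame representation, function name and keyframe interval are assumptions, and only I/P frames are shown):

```python
def encode(frames, interval=30):
    """Store a full keyframe ("I") every `interval` frames; otherwise
    store only the changed cells as a delta ("P") against the previous frame."""
    out, prev = [], None
    for i, frame in enumerate(frames):
        if prev is None or i % interval == 0:
            out.append(("I", frame))                 # full keyframe
        else:
            diff = [(r, c, ch)
                    for r, row in enumerate(frame)
                    for c, ch in enumerate(row)
                    if prev[r][c] != ch]
            out.append(("P", diff))                  # changed cells only
        prev = frame
    return out

frames = [["ab", "cd"], ["ab", "cx"], ["ab", "cx"]]
enc = encode(frames, interval=30)
```

The keyframes bound how far a player has to rewind to seek, at the cost of periodic redundancy, the same trade-off MPEG makes.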
We could compress, and even store, the colors and character styling separately from the text content. This would also give viewers the option of a low-fi, bandwidth-efficient access method.
The recorder could have an FPS-cap option that would allow merging changes in rapid succession into joint updates. The cut points could also be chosen adaptively for extra gains.
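A sketch of how an FPS cap could merge events, assuming events are (timestamp, output) pairs and that simply concatenating outputs within a time bucket is acceptable (as noted earlier in the thread, merging properly in general requires terminal emulation):

```python
def cap_fps(events, fps=10):
    """Round each event to the nearest 1/fps bucket and merge events
    that land in the same bucket by concatenating their output."""
    merged = []
    for t, data in events:
        bucket = round(t * fps)
        if merged and merged[-1][0] == bucket:
            merged[-1] = (bucket, merged[-1][1] + data)
        else:
            merged.append((bucket, data))
    return [(b / fps, d) for b, d in merged]

events = [(0.01, "a"), (0.05, "b"), (0.30, "c")]
```

An adaptive variant could widen the buckets during bursts of output and keep them narrow during sparse typing.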