-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use compression to shorten URLs #36
Use compression to shorten URLs #36
Conversation
@@ -7,6 +7,7 @@ import { Menu } from './Navigation' | |||
import { PreferencesContext } from './Popups/Settings' | |||
import { useWindowDimensions } from './utils/window_width' | |||
import LeanLogo from './assets/logo.svg' | |||
import LZString from 'lz-string'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This library has the MIT license.
I considered using CompressionStream
with gzip but it only became supported in Safari last year. I figured it would be bad if some live URLs (generated by people with modern browsers) didn't work on browsers that have a non-trivial amount of users left.
I came across lz-stream because it's used in sharplab.io, a C# REPL : https://github.com/ashmind/SharpLab/blob/078c4e2c5a790f6b7203401c2ec63de22913c0d1/source/WebApp/app/features/persistent-state/handlers/url.ts#L1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside: sharplab does an extra thing it calls "pre-compression" where it replaces some common substrings with shorter ones (@3
, @7
etc.) before sending it to this library: https://github.com/ashmind/SharpLab/blob/main/source/WebApp/app/features/persistent-state/handlers/url/precompressor.ts
I did some quick checks and it wasn't a big deal. If you're interested I could get some actual data using the benchmark process I described in the PR description. The trade-off is extra complexity/.js size for a much smaller reduction in URL size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to further optimise the algorithm, please feel free to do so, I'd happily merge it :)
@@ -286,7 +288,11 @@ function App() { | |||
if (args.code) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "code" argument is still supported so that old URLs continue to work.
Loading a URL like that will cause the URL to be immediately replaced in the address bar with a codez URL (if the code is compressible -- see below.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the docs after I merged your PR; the code
argument also has to keep working as is, because Zulip needs it for auto generating links from code blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's cool!
history.replaceState(undefined, undefined!, formatArgs(args)) | ||
} else { | ||
let args = {project: _project, url: null, code: fixedEncodeURIComponent(code)} | ||
history.replaceState(undefined, undefined!, formatArgs(args)) | ||
const compressed = LZString.compressToBase64(code).replace(/=*$/, ''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trimming trailing =
because (A) it meses with the args parsing (B) it's not necessary anyway (these are padding bytes from base64; but the decompression doesn't actually care if they are there)
let args = {project: _project, url: null, code: fixedEncodeURIComponent(code)} | ||
history.replaceState(undefined, undefined!, formatArgs(args)) | ||
const compressed = LZString.compressToBase64(code).replace(/=*$/, ''); | ||
if(compressed.length < code.length) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this only use compression if its actually a win (see charts in PR description).
For very small character counts it isn't always a win, although it also doesn't really matter anyway.
If you'd like it to always format URLs with codez I can remove this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked in DM; the check here is biased against compression because code
is smaller than fixedEncodeURIComponent(code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in fa4342d
FYI @joneugster -- I see you've been modifying this repo lately but the repo doesn't have anyone set to get added to the PRs automatically. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is great!
@@ -7,6 +7,7 @@ import { Menu } from './Navigation' | |||
import { PreferencesContext } from './Popups/Settings' | |||
import { useWindowDimensions } from './utils/window_width' | |||
import LeanLogo from './assets/logo.svg' | |||
import LZString from 'lz-string'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to further optimise the algorithm, please feel free to do so, I'd happily merge it :)
@@ -286,7 +288,11 @@ function App() { | |||
if (args.code) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the docs after I merged your PR; the code
argument also has to keep working as is, because Zulip needs it for auto generating links from code blocks.
Currently the code is put in the URL fragment with URL encoding on key press.
This is very nice but results in long URLs. Really long URLs can be harder to share in some contexts (e.g. Discord is limited to 2000 characters.)
This PR adds support to automatically compress the code. It is backwards compatible with all existing URLs (no links will be broken).
Here's a recent live link from Zulip:
https://live.lean-lang.org/#code=import%20Mathlib.CategoryTheory.Equivalence%0A%0Aopen%20CategoryTheory%0A%0Auniverse%20v%E2%82%81%20v%E2%82%82%20v%E2%82%83%20vm%20u%E2%82%81%20u%E2%82%82%20u%E2%82%83%20u%20um%0Avariable%20%7BB%20%3A%20Type%20u%E2%82%82%20%7D%20%5BCategory.%7Bv%E2%82%82%7D%20B%5D%0Avariable%20%7BM%20%3A%20Type%20vm%20%7D%20%5BCategory.%7Bum%7D%20M%5D%0Aset_option%20linter.longLine%20false%0A%0A%40%5Bext%5D%0Astructure%20Simple%20(F%20%3A%20B%20%E2%A5%A4%20M)%20%20where%0A%20%20pt%20%3A%20M%0A%20%20leg%20c%20%3A%20pt%20%E2%9F%B6%20F.obj%20c%0A%0A%40%5Bext%5D%0Astructure%20SimpleMorphism%20%20%7BF%20%3A%20B%20%E2%A5%A4%20M%7D%20(x%20y%20%3A%20Simple%20F)%20where%0A%20%20hom%20%3A%20x.pt%20%E2%9F%B6%20y.pt%0A%0Avariable%20%7BF%20G%20%3A%20B%20%E2%A5%A4%20M%7D%0Avariable%20(i%3A%20F%20%E2%9F%B6%20G)%0A%0Adef%20fctrObj%20(i%3A%20F%20%E2%9F%B6%20G)%20(w%3A%20Simple%20F)%20%3A%20Simple%20G%20%20where%0A%20%20pt%20%3A%3D%20w.pt%0A%20%20leg%20c%20%3A%3D%20w.leg%20c%20%E2%89%AB%20i.app%20c%0A%0A%0Adef%20fctrMap%20(i%3A%20F%20%E2%9F%B6%20G)%20%7Bx%20y%7D%20(m%20%3A%20SimpleMorphism%20x%20y)%20%3A%20SimpleMorphism%20(fctrObj%20i%20x)%20(fctrObj%20i%20y)%20%3A%3D%0A%20%20%7B%20hom%20%3A%3D%20m.hom%20%7D%20--%20FctrMap%20is%20essentially%20the%20identity%0A%0A--%20I%20essentially%20want%20to%20prove%20(fctrMap%20(%F0%9D%9F%99%20F)%20m)%20%3D%20%20m%2C%20but%20this%20is%20ill-typed%0Adef%20lawBadTypeMismatch%20%7Bx%20y%20%7D%20(m%20%3A%20SimpleMorphism%20x%20y%20)%20%3A%20m%20%3D%20(fctrMap%20(%F0%9D%9F%99%20F)%20m)%20%20%3A%3D%20%20%20%20sorry%0A%0A--%20So%20I%20need%20HEq%0Adef%20law%20%7Bx%20y%20%7D%20(m%20%3A%20SimpleMorphism%20x%20y%20)%20%3A%20HEq%20m%20(fctrMap%20(%F0%9D%9F%99%20F)%20m)%20%20%3A%3D%20sorry%0A%0Adef%20lawTry%20%7Bx%20y%20%7D%20(m%20%3A%20SimpleMorphism%20x%20y%20)%20%3A%20HEq%20m%20(fctrMap%20(%F0%9D%9F%99%20F)%20m)%20%20%3A%3D%0A%20%20heq_of_cast_eq%20hcast%20(by%0A%20%20%20%20apply%20SimpleMorphism.ext%0A%20%20%20%20have%20result%20%3A%20(cast%20hcast%20m).hom%20%20%3D%20(fctrMap%20(%F0%9D%9F%99%20F)%20m).hom%20%3A%3D%0A%20%20%20%20%20%20have%20%3A%20(fctrMap%20(%F0%9D%9F%99%20F)%20m).hom%20%3D%20m.hom%20%3A%3D%20rfl%0A%20%20%20%20%20%20have%20%3A%20(cast%20hcast%20m).hom%20%3D%20m.hom%20%20%3A%3D%20sorry%0A%20%20%20%20%20%20by%0A%20%20%20%20%20%20simp_all%20only%0A%20%20%20%20exact%20result)%0A%0A%20%20where%0A%20%20%20%20%20%20hcast%20%20%7Bx%20y%20%7D%20%3A%20SimpleMorphism%20x%20y%20%3D%20SimpleMorphism%20((fctrObj%20(%F0%9D%9F%99%20F))%20x)%20((fctrObj%20(%F0%9D%9F%99%20F))%20y)%20%3A%3D%20by%0A%20%20%20%20%20%20%20%20have%20idobj%20%7Bx%7D%20%3A%20x%20%3D%20fctrObj%20(%F0%9D%9F%99%20F)%20x%20%3A%3D%20%20by%0A%20%20%20%20%20%20%20%20%20%20apply%20Simple.ext%0A%20%20%20%20%20%20%20%20%20%20%C2%B7%20rfl%0A%20%20%20%20%20%20%20%20%20%20%C2%B7%20apply%20heq_iff_eq.mpr%0A%20%20%20%20%20%20%20%20%20%20%20%20funext%20c%0A%20%20%20%20%20%20%20%20%20%20%20%20have%20%3A%20(fctrObj%20(%F0%9D%9F%99%20F)%20x).leg%20c%20%3D%20x.leg%20c%20%E2%89%AB%20(%F0%9D%9F%99%20F%3A%20F%20%E2%9F%B6%20F).app%20c%20%3A%3D%20rfl%0A%20%20%20%20%20%20%20%20%20%20%20%20simp_all%20only%20%5BCategory.comp_id%2C%20NatTrans.id_app%5D%0A%20%20%20%20%20%20%20%20congr%0A%20%20%20%20%20%20%20%20%C2%B7%20exact%20idobj%0A%20%20%20%20%20%20%20%20%C2%B7%20exact%20idobj%0A
After:
http://live.lean-lang.org/#codez=JYWwDg9gTgLgBAWQIYwBYBtgCMB0BhFAUwHNoBPAFVUPJwFEBHAV2ADcl1CA7AY0ICh+EMNzgEYJclRpQygplzaEoAZ0JxWgQIINgIIINgYIINIOE21M9TQ0xMh+7KMCRZOcAN4AhOAC44FMiJM9AF84AG1xSVkcV1YdEPcAXTskBycXVwRvX391VmMQ8KJSKNcmEBCEJLUYAH1hGGAILjhMLgkoHHQm4gAZYC51ADMONUEAAVDCAA8YKpgoJh4YJih1AGVQMBcACgAxLM9AE0pEAEo4OAB3alX+c7B4HwRblpI4Hiz7uEA38jhdnAgsAArN7jSYzOYLJYrdabTgIaBgVDAFTGNz7HxHRAhbZTOBkLIbcAuXZnK7KATnVAQYw+KY4T4/Mj0mCCeyOZzqVz7ADiBzgxwQQWSqQ5cG2wB8+x+3JOggAJoRBnBBksoAB5IFiiW/b5wGVii4+QlbdQkgmw9S8y7XClwT5eAC8l2Zz04xDe3idF06r3egGsiODAHBIMBgEHyxXK1XIMPiyW6/WuXFkbE0uDGuEIpEouDJs5Gi3wqCI5HGbYq+Ya4HAXNncuqquBvH5h3PVxwKk0p0gHCduAhAC0A9+0ZDgZUcEIKjUbUc6HQ+LQ6mACtnMDk/CHcAAkpPp9wGhwF5ckG04DAIHaoBBWOp6/MY2LAKwbgE3935nEBnJ1wEAAGjgWBMPAaDIuOgbzgO64iHK/AKkq6BIBc7hIHKfgiAgpYoDwqBuMm/ZimmGaEEWJY5nh+Y/nATr3lAj7bK+74/mcnrnOcKjQLIghbmsl67gMhBynAAASjCwZGCEXLheL4dshGFlmpa5tJFEiQwlE0XRDFmp+5yOnA7FQJxYnwYhFCyFJ+KpuaRLEQpZHKVkqnqRWtFjvRb7acxjrPNQDB1IMNQ8EgKi1IQamoEFIVilgcisXAIZbPiREkdmPbTCycWoEgt5wKsKhMOgDxipF8ARcF8Cfr21LnNRLmaR5H4nFVXbPJl2XqD4GluVpjXNVRP59XpUCDOgrWsVlOWdSVHbTZVfbdn1LEGZxcXnDFY1sZsNRHnATQLmN0xIEsuVTgVMCys8ZI3KtM3lecSbSSEBY2Slil4U6yV2WWNGNu575nFMdY/Zqf0kmcZAtgBsU3R27WBnKALAkmT1KU6Lm/T1Sl6VDG2rQlx5ETg6W43FADtuUjSTrHk/j+K+TUwCDAFYU4OAUBU3FgwKOlIIw6tE0dWK6Mg5jgM+u67xOnSboegGoPxj8JLBqGHpDZTfNxSoW07Xt+KFBIxRMjw1JgAzcr/gAcigZmnioOArttoZJHzxtcMQ7N8+Th3HSuiNU17UxHfAvtAvwQA
Benchmarks
I benchmarked this by taking mathlib and concatenating everything with
git ls-files | grep '\.lean$' | xargs cat > everything.lean
(45mb lol) and then using this script:and here's what I found:
compression is about the same on average until about 30 characters. At about 60 characters it's always better. For longer snippets the gap of course grows quite large.
(The reason there's noise in the baseline and why its not 1:1 is that things like whitespace need to be URL encoded)