Syntax Highlighting #13

Open
daknob opened this issue Sep 26, 2016 · 8 comments

Comments

@daknob
Owner

daknob commented Sep 26, 2016

Any Paste Service can benefit from Syntax Highlighting. A quick search shows that many syntax highlighters require JavaScript to work. Since we want to avoid JS, we have to find a backend-based syntax highlighter, which, because TorPaste's backend is written in Python, means it should be a Python library. Let's try to find one and see how well it works with the existing code.
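To illustrate what "backend-based" means, here is a hypothetical minimal sketch that uses only Python's standard library (`tokenize`, `keyword`, `html`) to turn Python source into HTML `<span>` elements with CSS classes, so the browser needs only a stylesheet and no JavaScript. It handles Python input only; a real candidate would be a full library such as Pygments, a widely used Python highlighter that renders HTML on the server for many languages. The function name `highlight_python` and the `tok-*` class names are assumptions.

```python
import html
import io
import keyword
import tokenize
from typing import Optional


def _css_class(tok: tokenize.TokenInfo) -> Optional[str]:
    """Map a token to a CSS class name, or None to leave it unstyled."""
    if tok.type == tokenize.COMMENT:
        return "tok-comment"
    if tok.type == tokenize.STRING:
        return "tok-string"
    if tok.type == tokenize.NUMBER:
        return "tok-number"
    if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
        return "tok-keyword"
    return None  # f-strings on Python 3.12+ arrive as separate tokens and stay unstyled


def highlight_python(source: str) -> str:
    """Return escaped HTML for `source` with <span class="tok-..."> wrappers."""
    # Split lines exactly as tokenize reads them, so (row, col) positions
    # can be turned into absolute offsets into `source`.
    lines = io.StringIO(source).readlines()
    offsets = [0]
    for line in lines:
        offsets.append(offsets[-1] + len(line))

    out = []
    pos = 0
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            start = max(pos, offsets[tok.start[0] - 1] + tok.start[1])  # never re-emit text
            end = offsets[tok.end[0] - 1] + tok.end[1]
            out.append(html.escape(source[pos:start]))  # whitespace between tokens
            text = html.escape(source[start:end])
            cls = _css_class(tok)
            out.append(f'<span class="{cls}">{text}</span>' if cls else text)
            pos = max(pos, end)
    except (tokenize.TokenError, SyntaxError):
        # Not tokenizable Python (e.g. an unterminated string): show plain text.
        return "<pre>" + html.escape(source) + "</pre>"
    out.append(html.escape(source[pos:]))
    return "<pre>" + "".join(out) + "</pre>"
```

Escaping every piece keeps paste content such as `<script>` inert, which matters more than the colouring itself.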

@daknob
Owner Author

daknob commented Sep 28, 2016

Something to note about Syntax Highlighting is that if the user is able to select the language of their code, this causes issues with the Advanced Paste Deduplication Mechanism™. If two users paste the same code and one marks it as C while the other marks it as C++, only the last choice would be kept.

To address this issue, TorPaste can either automatically detect the language used or, when multiple languages have been set, let the viewer choose from a dropdown which one to enable highlighting for.
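To make the collision concrete, the sketch below assumes (for illustration only) that deduplication works by deriving the paste ID from a SHA-256 hash of the content, so identical content always maps to the same ID. Storing one language per ID means the last submission wins; storing a set of languages instead keeps every choice and gives the viewer a list for the dropdown. The dictionaries and function names are hypothetical stand-ins for TorPaste's storage.

```python
import hashlib
from typing import Dict, List, Set


def paste_id(content: str) -> str:
    """Assumed content-addressed ID: the same content always gives the same ID."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Option A: a single language per paste; each submission overwrites the last.
single_language: Dict[str, str] = {}

# Option B: every language ever chosen for this content is kept.
language_sets: Dict[str, Set[str]] = {}


def submit(content: str, language: str) -> str:
    pid = paste_id(content)
    single_language[pid] = language                     # last writer wins
    language_sets.setdefault(pid, set()).add(language)  # choices accumulate
    return pid


def dropdown_options(pid: str) -> List[str]:
    """Languages a viewer could pick from for this paste, sorted for display."""
    return sorted(language_sets.get(pid, set()))
```

Submitting the same C snippet as `c` and then as `cpp` yields one ID; option A then only remembers `cpp`, while option B offers both.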

@daknob
Owner Author

daknob commented Sep 28, 2016

Of course, there's also the option to change the Advanced Paste Deduplication Mechanism™ to use random IDs.
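With random IDs, identical content no longer collides: each submission gets its own identifier and therefore its own language setting, at the cost of losing deduplication (identical pastes are stored separately). A minimal sketch using the standard `secrets` module; the in-memory `pastes` store, the `new_paste` helper and the 16-byte length are assumptions.

```python
import secrets
from typing import Dict, Tuple

# Hypothetical in-memory store: paste ID -> (content, language)
pastes: Dict[str, Tuple[str, str]] = {}


def new_paste(content: str, language: str) -> str:
    # 16 random bytes encode to 22 URL-safe characters; with 128 random bits,
    # an accidental collision is negligibly unlikely.
    pid = secrets.token_urlsafe(16)
    pastes[pid] = (content, language)
    return pid
```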

@j11e
Contributor

j11e commented Oct 10, 2016

Another solution is to request Syntax Highlighting when viewing the paste, with a route like `/view/<paste_id>/<syntax>`.

The /new logic would be:

  1. Create a paste and indicate a default language, say Python;
  2. This creates the paste as /pastes/${paste_id} with the non-highlighted content, and redirects the user to /view/${paste_id}/python
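The /new steps above can be sketched with the standard library alone. The `pastes_dir` layout, `DEFAULT_SYNTAX` and `create_paste` are hypothetical, and the paste ID is assumed to be a SHA-256 hash of the content; in the Flask app, the returned path would be handed to Flask's `redirect()`.

```python
import hashlib
import os
import tempfile

DEFAULT_SYNTAX = "python"  # hypothetical default language


def create_paste(content: str, pastes_dir: str, syntax: str = DEFAULT_SYNTAX) -> str:
    """Store the raw paste and return the URL to redirect the author to."""
    paste_id = hashlib.sha256(content.encode("utf-8")).hexdigest()  # assumed ID scheme
    os.makedirs(pastes_dir, exist_ok=True)
    # The stored file holds the plain, non-highlighted content.
    with open(os.path.join(pastes_dir, paste_id), "w", encoding="utf-8") as f:
        f.write(content)
    return f"/view/{paste_id}/{syntax}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as pastes_dir:
        print(create_paste("print('hello')\n", pastes_dir))
```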

The /view/<paste_id>/<syntax> logic would be:

  1. Check whether /pastes/${paste_id}.${syntax} exists;
  2. If it does not, create it using the syntax highlighter;
  3. Display the content of /pastes/${paste_id}.${syntax}.
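The /view/<paste_id>/<syntax> steps amount to a compute-once cache on disk. In this sketch `highlight` is a placeholder (it only escapes the text and tags it with the language) standing in for whichever highlighter is chosen; the `<paste_id>.<syntax>` file naming follows the proposal, while `SUPPORTED` and `view_highlighted` are hypothetical. Checking both inputs against whitelists keeps strings like `../` out of file paths.

```python
import html
import os
import tempfile

SUPPORTED = {"python", "c", "cpp", "php"}  # hypothetical set of highlightable languages


def highlight(content: str, syntax: str) -> str:
    """Placeholder highlighter: escapes the text and records the language."""
    return f'<pre class="lang-{syntax}">{html.escape(content)}</pre>'


def view_highlighted(pastes_dir: str, paste_id: str, syntax: str) -> str:
    # Whitelists keep user input such as "../" out of file paths.
    if not paste_id.isalnum() or syntax not in SUPPORTED:
        raise ValueError("unknown paste or unsupported syntax")
    raw_path = os.path.join(pastes_dir, paste_id)
    cached_path = f"{raw_path}.{syntax}"
    # 1. Check whether <paste_id>.<syntax> already exists.
    if not os.path.exists(cached_path):
        # 2. If not, run the highlighter once and store its output.
        with open(raw_path, encoding="utf-8") as f:
            rendered = highlight(f.read(), syntax)
        with open(cached_path, "w", encoding="utf-8") as f:
            f.write(rendered)
    # 3. Display the stored highlighted version.
    with open(cached_path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "abc123"), "w", encoding="utf-8") as f:
            f.write("x < 1\n")
        print(view_highlighted(d, "abc123", "python"))
        print(sorted(os.listdir(d)))  # the raw paste plus its .python variant
```

Requesting a second language only adds another `<paste_id>.<syntax>` file; earlier variants are left untouched.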

As you can see, in that example, the Python-highlighted file is created immediately, because the paste's author is redirected to it. But the viewing screen would allow another language to be requested, and this would just create a new highlighted version of the same paste (for instance, the Python file would remain and a new C++ one would be created).
Also, with that logic, creating another paste with the same content but another language, say PHP, would just cause the creation of /pastes/${paste_id}.php, but not delete the python-highlighted version.

The /view/<paste_id> with no syntax specified would keep the current behavior (no syntax highlighting).

This would look like:

from flask import Flask
app = Flask(__name__)

@app.route("/view/<pasteid>/<syntax>")
@app.route("/view/<pasteid>")
def view_paste(pasteid, syntax=None):  # syntax is None for plain /view/<pasteid>
    ...  # [...]

@daknob
Owner Author

daknob commented Oct 12, 2016

This seems like a good idea. We can add a /view/<pasteid>/<syntax> route in the torpaste.py app and then show the paste content in the desired syntax. If the uploader uses python and I want to view the content as c, I should be able to do that. However, maybe we should not store the file any differently: just store the paste normally, and also store the user-selected syntax as metadata (or maybe not at all). As long as the uploader distributes the link ending in /python, everyone who reads it will see the python syntax highlighting. If another user makes the same paste and chooses c, they will get a /c link, but they will not be able to affect the /python link at all! Also, when viewing the paste, we can add a dropdown with all currently supported languages and a button that lets the user view the paste in the selected language.

And just to make our lives easier, we can use /view/<pasteid>?syntax=python instead: an HTML form using GET sends its fields as a query string, so the <form> there can be:

<form action="/view/<pasteid>" method="get">
    <select name="syntax">
        <option value="c">C</option>
    </select>
    <button>Submit</button>
</form>
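Submitting that form with C selected makes the browser request /view/<pasteid>?syntax=c. On the server (in Flask the value would come from `request.args.get("syntax")`), the value should be checked against the supported languages, falling back to no highlighting. A stdlib-only sketch; `SUPPORTED`, `form_url` and `choose_syntax` are hypothetical names.

```python
from typing import Iterable, Optional
from urllib.parse import parse_qs, urlencode

SUPPORTED = {"c", "cpp", "python"}  # hypothetical list of supported languages


def form_url(pasteid: str, syntax: str) -> str:
    """The URL a browser builds when the GET form is submitted."""
    return f"/view/{pasteid}?" + urlencode({"syntax": syntax})


def choose_syntax(query_string: str, supported: Iterable[str] = SUPPORTED) -> Optional[str]:
    """Return the requested language, or None (plain paste) if missing or unsupported."""
    values = parse_qs(query_string).get("syntax", [])
    if values and values[0] in set(supported):
        return values[0]
    return None
```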

@daknob
Owner Author

daknob commented Oct 12, 2016

(However, the above proposal means that we need to do the syntax highlighting dynamically every time a paste is requested, and I do not know whether this will be time- or computationally intensive.)

@j11e
Contributor

j11e commented Oct 12, 2016

I agree with what you said, but I do have a question: which of the following do we prefer?

  1. syntax highlighting recomputed at each view, as you propose
  2. syntax highlighting computed the first time a language is requested, then remembered, as I proposed before

Option 1 takes more CPU; option 2 takes more storage. I personally prefer 2, as it costs less in terms of performance, and storage is cheap anyway (after all, this is just text).

@daknob
Owner Author

daknob commented Oct 12, 2016

I don't think this decision is easy to make, because it depends on our syntax highlighter. How long does it take to compute the result for a given input size? How much CPU does it need? If it needs 5 seconds for a 1 kB file, then we have to store the result. If it needs 2 ms and 0.1% of the CPU, then I think we can safely compute it on the fly.
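Those figures can be measured once a candidate highlighter is picked. A sketch using `time.perf_counter`; `highlight` here is a trivial escape-only placeholder so the harness runs, and should be swapped for the highlighter under evaluation. The roughly 1 kB sample and the repeat count are arbitrary choices, and no real measurements are implied.

```python
import html
import time
from typing import Callable


def highlight(content: str) -> str:
    """Placeholder: replace with the highlighter under evaluation."""
    return "<pre>" + html.escape(content) + "</pre>"


def ms_per_call(fn: Callable[[str], str], content: str, repeat: int = 200) -> float:
    """Average wall-clock milliseconds per call of fn(content)."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(content)
    return (time.perf_counter() - start) * 1000 / repeat


if __name__ == "__main__":
    sample = ("x = 1  # comment\n" * 64)[:1024]  # exactly 1024 characters
    print(f"{ms_per_call(highlight, sample):.3f} ms per 1 kB paste")
```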

Here's an attack on the storage option:
A bot views every listed paste in every available language, causing m*n additional stored copies, where m is the number of pastes and n is the number of available languages.
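That worst case is easy to bound: with m listed pastes, n languages and an average highlighted size of s bytes per copy, the bot forces at most m*n extra files and roughly m*n*s extra bytes (fewer if some variants were already cached). The figures below are hypothetical, chosen only to show the scale.

```python
from typing import Tuple


def worst_case_storage(m_pastes: int, n_languages: int, avg_bytes: int) -> Tuple[int, int]:
    """Upper bound on extra (files, bytes) if a bot views every paste in every language."""
    extra_files = m_pastes * n_languages
    return extra_files, extra_files * avg_bytes


# Hypothetical scale: 10,000 listed pastes, 100 languages, ~5 kB of HTML per variant.
files, total_bytes = worst_case_storage(10_000, 100, 5_000)
print(files, total_bytes)  # 1000000 5000000000 (about 5 GB)
```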

@j11e
Contributor

j11e commented Oct 16, 2016

A third option is to simply have a configuration variable that switches between the two modes. After all, the difference is pretty small (save the highlighted paste, or recompute it each time).
