Syntax Highlighting #13

daknob · 2016-09-26T08:29:37Z

Any paste service can benefit from syntax highlighting. A quick search shows that a lot of syntax highlighters require JavaScript to work. Since we want to avoid JS, we have to find a backend-based syntax highlighter, which means it must be written in Python. Let's try to find one and see how well it works with the existing code.

daknob · 2016-09-28T09:45:36Z

One thing to note about Syntax Highlighting: if users can select the language they want highlighting for, this causes issues with the Advanced Paste Deduplication Mechanism™. If two users paste the same code and one marks it as C while the other marks it as C++, currently only the last option will be valid.

To address this issue, TorPaste can either automatically detect the language used, or, when multiple languages are set, allow the viewer to select which language they want to enable highlighting for, in a dropdown fashion.
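To make the collision concrete, here is a minimal sketch. It assumes the deduplication mechanism derives the paste ID from a hash of the content (the thread does not spell this out), and `paste_id_for`, `new_paste`, and `store` are hypothetical names. Instead of letting the last language win, it keeps every language that was ever set, which is what the dropdown option would need.

```python
import hashlib

def paste_id_for(content: str) -> str:
    # Assumption: the ID is a hash of the paste body, so identical
    # bodies map to the same ID on purpose (deduplication).
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

# Hypothetical in-memory store: paste_id -> {"content", "languages"}
store = {}

def new_paste(content: str, language: str) -> str:
    pid = paste_id_for(content)
    entry = store.setdefault(pid, {"content": content, "languages": []})
    # Remember every language instead of overwriting the previous one.
    if language not in entry["languages"]:
        entry["languages"].append(language)
    return pid

a = new_paste("int main(void) { return 0; }", "c")
b = new_paste("int main(void) { return 0; }", "cpp")
```

Both calls return the same ID, and the single stored entry lists both languages for the viewer to choose from.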

daknob · 2016-09-28T09:46:08Z

Of course, there's also the option to change the Advanced Paste Deduplication Mechanism™ to use random IDs.

j11e · 2016-10-10T21:00:54Z

Another solution is to request Syntax Highlighting when viewing the paste, with a route like `/view/<paste_id>/<syntax>`.

The /new logic would be:

1. Create a paste and indicate a default language, say Python.
2. This creates the paste as /pastes/${paste_id} with the non-highlighted content, and redirects the user to /view/${paste_id}/python.

The /view/<paste_id>/<syntax> logic would be:

1. Check the existence of /pastes/${paste_id}.${syntax}.
2. If it does not exist, create it by using the syntax highlighter.
3. Display the content of /pastes/${paste_id}.${syntax}.

As you can see, in that example, the python-highlighted file is created immediately, because the paste's author is redirected to it. But the viewing screen would allow another language to be requested, and this would just create a new highlighted version of the same paste (the python file would remain, and a new C++ one would be created for instance).
Also, with that logic, creating another paste with the same content but another language, say PHP, would just cause the creation of /pastes/${paste_id}.php, but not delete the python-highlighted version.

The /view/<paste_id> with no syntax specified would keep the current behavior (no syntax highlighting).

This would look like:

@app.route("/view/<pasteid>/<syntax>")
@app.route("/view/<pasteid>")
def view_paste(pasteid, syntax=None):
    # [...]
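The check / create / display steps described above could be sketched roughly as follows, using only the standard library. Everything here is an assumption made for illustration: `PASTE_DIR` is a temporary directory standing in for the real pastes directory, and `highlight` is a placeholder that only escapes the text, since no highlighter has been chosen yet.

```python
import html
import tempfile
from pathlib import Path

# Stand-in for the real pastes directory (assumption).
PASTE_DIR = Path(tempfile.mkdtemp())

def highlight(code: str, syntax: str) -> str:
    # Hypothetical placeholder: a real highlighter would emit
    # coloured markup here instead of just escaping the text.
    return '<pre class="syntax-{}">{}</pre>'.format(
        html.escape(syntax), html.escape(code))

def highlighted_paste(paste_id: str, syntax: str) -> str:
    raw = PASTE_DIR / paste_id
    cached = PASTE_DIR / "{}.{}".format(paste_id, syntax)
    # Create the highlighted copy only the first time it is requested.
    if not cached.exists():
        code = raw.read_text(encoding="utf-8")
        cached.write_text(highlight(code, syntax), encoding="utf-8")
    return cached.read_text(encoding="utf-8")

# Usage: store a raw paste, then request it twice as Python.
(PASTE_DIR / "abc123").write_text("print('hi')", encoding="utf-8")
first = highlighted_paste("abc123", "python")
second = highlighted_paste("abc123", "python")
```

The second call reads the stored file instead of highlighting again, and the raw paste is left untouched. Since `paste_id` and `syntax` end up in a file name, both would have to be validated before reaching this function.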

daknob · 2016-10-12T07:32:11Z

This seems like a good idea. We can add a /view/<pasteid>/<syntax> route in the torpaste.py app and then show the paste content in the desired syntax. If the uploader uses python and I want to view the content as c, I should be able to do that. However, maybe we should not store the file any differently: just store the paste normally, and then also store the user-selected syntax as metadata (or maybe not at all). As long as the user distributes the link with the /python suffix, everyone who reads it can see the python syntax highlighting. If a user makes the same paste and chooses c, then they will get a /c link, but they will not be able to cause any issues for the /python link! Also, when viewing the paste, we can add a dropdown with all languages supported so far and a button that lets the user view it in the selected one.

And just to make our lives easier, we can use /view/<pasteid>?syntax=python so that the <form> there can be:

<form action="/view/<pasteid>" method="get">
    <select name="syntax">
        <option value="c">C</option>
    </select>
    <button>Submit</button>
</form>
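On the server side, the `syntax` query parameter submitted by a form like this has to be read and checked. A minimal sketch using only the standard library follows; `SUPPORTED` and `requested_syntax` are hypothetical names, and the list of languages is made up. Checking against a fixed list is my own addition, not something decided in this thread.

```python
from urllib.parse import parse_qs

# Hypothetical list of supported languages: value -> label in the dropdown.
SUPPORTED = {"c": "C", "cpp": "C++", "python": "Python"}

def requested_syntax(query_string: str):
    # parse_qs maps each parameter name to a list of values.
    values = parse_qs(query_string).get("syntax", [])
    # Accept only known languages; anything else means "no highlighting".
    if values and values[0] in SUPPORTED:
        return values[0]
    return None
```

With this, `requested_syntax("syntax=python")` gives `"python"`, while a missing or unknown value gives `None`, which matches the current behavior of viewing a paste without highlighting.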

daknob · 2016-10-12T07:33:28Z

(However, the above proposal means that we need to do the syntax highlighting dynamically every time a paste is requested, and I do not know whether this will be time- or computationally intensive.)

j11e · 2016-10-12T15:06:46Z

I agree with what you said, but I do have a question: what do we prefer between:

1. syntax highlighting recomputed at each view, as you propose
2. syntax highlighting computed the first time a language is requested, then remembered, as I proposed before

1 takes more CPU, 2 takes more storage. I personally prefer 2, as it costs less in terms of performance, and storage is cheap anyway (after all, this is just text).

daknob · 2016-10-12T17:57:47Z

I don't think it's easy to make this decision because it depends on our syntax highlighter. How long does it take to calculate the result for a specific size of input? How much CPU does it need? If it needs 5 seconds for a 1 kB file, then we have to store it. If it needs 2 ms and takes 0.1% of the CPU, then I think we can safely calculate it on the fly.
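One way to answer the timing question is to measure it. The sketch below uses the standard `timeit` module; `highlight` is again a hypothetical placeholder that only escapes the text, so the numbers it produces say nothing about a real highlighter. The point is only the shape of the measurement, which could be pointed at whichever highlighter ends up being chosen.

```python
import html
import timeit

def highlight(code: str, syntax: str) -> str:
    # Hypothetical placeholder for the real highlighter.
    return "<pre>" + html.escape(code) + "</pre>"

def seconds_per_call(func, code: str, syntax: str, runs: int = 100) -> float:
    # timeit.timeit returns the total time for `runs` calls, in seconds.
    total = timeit.timeit(lambda: func(code, syntax), number=runs)
    return total / runs

# Roughly 1 kB of input (171 lines of 6 bytes each).
sample = "x = 1\n" * 171
per_call = seconds_per_call(highlight, sample, "python")
```

Note that this measures wall-clock time per call, not CPU share.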

Here's an attack for the storage option:
View every listed paste in every available language with a bot. This causes m*n additional stored files, where m is the number of pastes and n is the number of available languages.
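The m*n growth is easy to put numbers on. The figures below are invented purely for illustration, not measurements from any TorPaste instance, and `extra_cached_files` and `extra_bytes` are hypothetical helpers.

```python
def extra_cached_files(pastes: int, languages: int) -> int:
    # One highlighted copy per (paste, language) pair.
    return pastes * languages

def extra_bytes(pastes: int, languages: int, avg_highlighted_bytes: int) -> int:
    return extra_cached_files(pastes, languages) * avg_highlighted_bytes

# Made-up example: 1,000 pastes, 50 languages, 4 kB per highlighted copy.
files = extra_cached_files(1000, 50)
size = extra_bytes(1000, 50, 4096)
```

Under these assumed numbers a bot could force 50,000 extra files, a little over 200 MB.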

j11e · 2016-10-16T18:50:27Z

A third option is to simply have a configuration variable that allows switching between the two modes. After all, the difference is pretty small (save the highlighted paste or recompute it each time).

daknob added enhancement help wanted hacktoberfest labels Sep 28, 2016
