Introduce SanLang language - Lexer, Parser and Interpreter #4002

Merged: 14 commits, Dec 7, 2023

@IvanIvanoff (Member) commented Nov 28, 2023

Changes

Note: For syntax and examples check the san_lang_test.exs file.

The san_lang_parser.erl and san_lang_lexer.erl files are autogenerated from the .xrl and .yrl files. They should not be reviewed or included in the repository.

Overview

SanLang is a small interpreted language that can execute one-liners like flat_map(map_keys(@projects), fn slug -> @projects[slug]["github_organizations"] end).

To improve the templating engine capabilities for Queries 2.0 we introduce SanLang -- an interpreted language inspired by the Elixir syntax.

We want to provide the ability for small code snippets (one-liners in most cases) that extract and manipulate some data.

For example, if a user wants to add a text widget with "About the author" information, they don't have to hardcode the email/twitter/telegram/etc. links, but can use code like Email: {{@owner["email"]}}, Twitter: {{@owner.twitter_handle}}

Identifiers starting with @ are provided as environment bindings by the backend, and the user has access to them without doing anything else.
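As a rough illustration of how such a template could be filled in, here is a minimal Python sketch (the PR itself is implemented in Elixir). The names `ENV`, `resolve`, and `render` and the owner data are hypothetical, and the regex-based resolver only handles chained `@name["key"]` access; it stands in for the real SanLang interpreter and is not how the PR evaluates expressions.

```python
import re

# Hypothetical environment bindings; in the PR these are supplied by the backend.
ENV = {"owner": {"email": "owner@example.com", "twitter_handle": "example_handle"}}

# Matches a {{ ... }} placeholder and captures the expression inside it.
PLACEHOLDER = re.compile(r"\{\{(.*?)\}\}")
# Matches @name followed by zero or more ["key"] accesses.
ACCESS = re.compile(r'@(\w+)((?:\["[^"\]]*"\])*)$')
KEY = re.compile(r'\["([^"\]]*)"\]')


def resolve(expr, env):
    """Resolve an expression of the form @name["key"]["key"]... against env."""
    match = ACCESS.match(expr.strip())
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    value = env[match.group(1)]
    for key in KEY.findall(match.group(2)):
        value = value[key]
    return value


def render(template, env):
    """Replace every {{ ... }} placeholder with the value it resolves to."""
    return PLACEHOLDER.sub(lambda m: str(resolve(m.group(1), env)), template)
```

Calling `render('Email: {{@owner["email"]}}', ENV)` substitutes the owner's email into the text. Anything beyond plain access chains (functions, lambdas, arithmetic) is rejected by this sketch.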

Why a language?

We want to allow users to write code that is executed on the backend. To allow this, we need to be very careful about what can be executed and how.

Parsing with String.split/Regex.scan/etc. won't suffice, or it will become much more complicated and hard to maintain and debug.
Allowing users to write Elixir code would force us to analyze all of it for unsafe operations (System.cmd, HTTP calls, etc.), and it can be hard to verify that the code is indeed safe.

Using a separate language like Python/Lua/etc. would require us to add that language's compiler/interpreter as a dependency and to support inter-language compatibility.

Executing Elixir in a safe environment (container/jail/etc.) would also add complexity.

Considering all these trade-offs, developing a new small language does not sound so terrible.

Technologies used

The SanLang language has three main components: lexer, parser, and interpreter.

  • The lexer and parser define how the input is tokenized and parsed -- validating the syntax and building an abstract syntax tree.
  • The lexer and parser are written declaratively in leex and yecc. These are the Erlang equivalents of the lex and yacc tools; yecc generates LALR(1) parsers.
  • The lexer and parser together are ~120 lines of code, which includes support for: named functions, env vars, local vars, lambda functions, chained access operator, arithmetic operations.
  • The interpreter is written in Elixir; it translates the AST to Elixir code and executes it.
  • The interpreter produces Elixir values as result, which makes it trivial to use the result in the backend without any transformations.
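To make the interpreter stage more concrete, the following is a minimal Python sketch of evaluating an AST against environment bindings. It is an illustration only: the PR's interpreter is written in Elixir, and the tuple-based node shapes (`"lit"`, `"env"`, `"access"`, `"binop"`, `"call"`), the `FUNCTIONS` table, and the `evaluate` function are all assumptions made for this example, not the PR's actual AST or API. The point it tries to show is that only explicitly whitelisted functions can be called.

```python
import operator

# Whitelisted named functions: anything not listed here cannot be called.
FUNCTIONS = {
    "pow": lambda base, exp: base ** exp,
    # Floor division; equals truncating integer division for non-negative operands.
    "div": lambda a, b: a // b,
    "map_keys": lambda mapping: list(mapping.keys()),
}

BINARY_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, env):
    """Evaluate a (hypothetical) tuple-based AST node against env bindings."""
    kind = node[0]
    if kind == "lit":        # ("lit", 42)
        return node[1]
    if kind == "env":        # ("env", "projects") stands for @projects
        return env[node[1]]
    if kind == "access":     # ("access", target_node, key_node)
        return evaluate(node[1], env)[evaluate(node[2], env)]
    if kind == "binop":      # ("binop", "+", left_node, right_node)
        return BINARY_OPS[node[1]](evaluate(node[2], env), evaluate(node[3], env))
    if kind == "call":       # ("call", "pow", [arg_node, ...])
        if node[1] not in FUNCTIONS:
            raise NameError(f"unknown function: {node[1]}")
        return FUNCTIONS[node[1]](*(evaluate(arg, env) for arg in node[2]))
    raise ValueError(f"unknown node type: {kind}")


# AST a parser could produce for 1 + 2*3 + 10 (multiplication binds tighter).
ARITHMETIC_AST = (
    "binop", "+",
    ("binop", "+", ("lit", 1), ("binop", "*", ("lit", 2), ("lit", 3))),
    ("lit", 10),
)
```

Because the parser already encodes precedence in the tree shape, the evaluator needs no precedence logic of its own, and the results are ordinary host-language values.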

Language overview

The following are valid SanLang expressions:

  • Literals evaluate to themselves: 1, "string", 3.14;
  • Special boolean literals true and false;
  • Basic arithmetic with proper precedence: 1 + 2*3 + 10 evaluates to 17;
  • Named functions with literal arguments: pow(10,18), div(6,4) (for integer division);
  • Access to environment variables that are provided by the execution environment: @projects
  • Access operator, map function and lambda functions for working with these environment variables. See below for more examples.
  • Access operator that can be chained: @projects["santiment"]["main_contract_address"]["decimals"]
  • Comparison operators: 1 == 1, 1 != 2, 1 > 2, 1 < 2, 1 >= 2, 1 <= 2;
  • Boolean operators and and or: true and false, true or false;
  • Proper precedence of boolean/comparison/arithmetic operators: 5 + 6 < 10, pow(2, 10) - 1 < 1024 and pow(2,10) + 1 > 1024.

Examples:

  • Get the list of all slugs from the @projects map:
    map_keys(@projects)
  • Get the token decimals for santiment:
    @projects["santiment"]["main_contract_address"]["decimals"]
  • Get all github organizations of all projects in a list:
    flat_map(map_keys(@projects), fn slug -> @projects[slug]["github_organizations"] end)
  • Get the email address of the owner of the dashboard:
    @owner["email"]
  • Keep only the elements of @data that are greater than 1 and less than 10:
    filter(@data, fn x -> x > 1 and x < 10 end)
  • See san_lang_test.exs for more examples.
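For readers unfamiliar with map_keys, flat_map, and filter, the Python sketch below mirrors what the examples above compute. The `projects` data is a made-up stand-in for the @projects binding, and the helper definitions are assumptions about the intended semantics, not the PR's implementation.

```python
# Hypothetical stand-in for the @projects binding; real values come from the backend.
projects = {
    "alpha": {"github_organizations": ["alpha-org"]},
    "beta": {"github_organizations": ["beta-org", "beta-labs"]},
}


def map_keys(mapping):
    """Return the keys of a map as a list."""
    return list(mapping.keys())


def flat_map(items, fn):
    """Apply fn to each item and concatenate the resulting lists."""
    return [element for item in items for element in fn(item)]


# map_keys(@projects)
slugs = map_keys(projects)

# flat_map(map_keys(@projects), fn slug -> @projects[slug]["github_organizations"] end)
organizations = flat_map(slugs, lambda slug: projects[slug]["github_organizations"])

# filter(@data, fn x -> x > 1 and x < 10 end), with a made-up @data list
in_range = [x for x in [0, 5, 9, 10, 42] if x > 1 and x < 10]
```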

Ticket

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have tried to find a clearer solution before commenting hard-to-understand parts of code
  • I have added tests that prove my fix is effective or that my feature works

@IvanIvanoff changed the title from "San lang" to "Implement lexer, parser and interpreter for SanLang" Nov 30, 2023
@IvanIvanoff changed the title from "Implement lexer, parser and interpreter for SanLang" to "Introduce SanLang language - Lexer, Parser and Interpreter" Dec 1, 2023
@tzanko-matev (Contributor) commented

Don't forget to add an Academy article about Sanlang

@IvanIvanoff IvanIvanoff requested a review from tspenov December 6, 2023 14:26
@IvanIvanoff IvanIvanoff marked this pull request as ready for review December 6, 2023 14:33
@IvanIvanoff IvanIvanoff merged commit 91a1c97 into master Dec 7, 2023
@delete-merged-branch delete-merged-branch bot deleted the san-lang branch December 7, 2023 09:27