Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Genero language syntax files with official files #6629

Closed
sebflaesch opened this issue Dec 7, 2023 · 35 comments
Closed

Replace Genero language syntax files with official files #6629

sebflaesch opened this issue Dec 7, 2023 · 35 comments

Comments

@sebflaesch
Copy link
Contributor

New official syntax language files provided by FourJs are available here:
https://github.com/FourjsGenero/GeneroFgl.tmbundle

We need to replace the existing files created by hand by Bryan Jackson (@alienriver49) in 2020.

Bryan, we discussed this today and I thank you again for your valued contribution.
Please just confirm here that you accept that we replace your original syntax files with ours.

Cheers!
Seb

@lildude
Copy link
Member

lildude commented Dec 7, 2023

Closing as a dupe of #6628

@lildude lildude closed this as completed Dec 7, 2023
@lildude
Copy link
Member

lildude commented Dec 7, 2023

Oh wait. You mean the grammar files. Re-opening, but note your repo is not accessible.

@lildude lildude reopened this Dec 7, 2023
@sebflaesch
Copy link
Contributor Author

@lildude #6629 and #6628 can be provided as different pull requests so please can we keep both open?

@sebflaesch
Copy link
Contributor Author

@lildude I have just started to work on these and doing github forking and pull requests, so please forgive my mistakes.
I will contact the author of https://github.com/FourjsGenero/GeneroFgl.tmbundle to make it public!

@alienriver49
Copy link
Contributor

I am good with my version of the grammars being replaced with the official grammars maintained by the author(s) of the language. Thank you again @sebflaesch!

@sebflaesch
Copy link
Contributor Author

https://github.com/FourjsGenero/GeneroFgl.tmbundle is now public.
We will work on a pull request.

@sebflaesch
Copy link
Contributor Author

sebflaesch commented Dec 8, 2023

I think I need some help on this:

The new TextMate files (https://github.com/FourjsGenero/GeneroFgl.tmbundle) we would like to be used by Linguist/Github are defining new language names and identifiers (for the two distinct file types .4gl and .per), that do not match the names and ids that are in place.

Consequently, I guess that a simple "--replace" option with script/add-grammar would not work.

In such case, what is the best practice? Keep in mind that the author of the current definition (@alienriver49) is one of our customers and is ok to replace all of the existing definitions by ours.

In fact, we have currently two "language" definitions:

  1. For .4gl files: "Genero" (that would change to "Genero 4gl")
  2. For .per files: "Genero Forms" (that would change to "Genero Forms (per)" or a better name like "Genero per")

It is possible to do a complete drop of the existing definitions, and add again the new languages definitions with the script/add-grammar command? I could not find the instructions to drop a definition, in https://github.com/github-linguist/linguist/blob/master/CONTRIBUTING.md.

Regarding samples (issue #6628), my understanding is that the directory names under TOP/samples must match the language names, correct? If yes, we should provide a single pull request for both #6628 and #6629. And we should certainly find a better name for the .per files, because non of the existing language names I can see use ( ) parentheses...

@sebflaesch
Copy link
Contributor Author

sebflaesch commented Dec 8, 2023

Seems that the support for Genero language was added in a single commit:

commit adc5319

So if this works it would not be a big deal to revert this commit, and add a new language definition for Genero...
Right?

@lildude
Copy link
Member

lildude commented Dec 8, 2023

The new TextMate files (https://github.com/FourjsGenero/GeneroFgl.tmbundle) we would like to be used by Linguist/Github are defining new language names and identifiers (for the two distinct file types .4gl and .per), that do not match the names and ids that are in place.

Consequently, I guess that a simple "--replace" option with script/add-grammar would not work.

It most definitely does work 😁 If you stick with the instructions for replacing a grammar in the CONTRIBUTING.md file as they're written you should be good to go. You shouldn't need to do anything that isn't documented in that section.

Use the path to the current vendor directory (not path) for the --replace argument.

So if you start with the "Genero" grammar, you'd run script/add-grammar --replace genero.tmbundle https://github.com/FourjsGenero/GeneroFgl.tmbundle.

You won't need to do anything for the "Genero Forms" grammar as the script will detect the same grammar is now used for both languages and update the relevant files.

As for dropping the current definitions: don't. You can certainly rename them, but don't change the ID. This is stored and heavily cached in a lot of locations outside of the main GitHub database and is not an easy task to change or correct. Many third parties rely on this ID too. As such, once an ID is set, it is never changed and a language is never removed.

As for the language names: I'd advise against using parentheses (it should work for GitHub, but we can't guarantee elsewhere) and use the names users are most likely to know and use. For most people this is the commerical name of the language.

@sebflaesch
Copy link
Contributor Author

sebflaesch commented Dec 8, 2023

@lildude:

By ID do you mean the "language_id" field in lib/linguist/languages.yml, that is produced by the script/update-ids?

For now, in my local fork, I could manage to revert the original commit, setup the ruby/docker env and add from scratch the new languages "Genero 4gl" (instead of "Genero") and "Genero per" (instead of "Genero Forms"), with the regular script/add-grammar process, and created new IDs with script/update-ids ...

Would it be an issue, if I reuse the IDs ("language_id") that where generated by the first commit, but using the new names "Genero 4gl" / "Genero per" of my new adding?

This would save me some more additional work, as the only change I would have to do is:

diff --git a/lib/linguist/languages.yml b/lib/linguist/languages.yml
index bc63995a..61885902 100644
--- a/lib/linguist/languages.yml
+++ b/lib/linguist/languages.yml
@@ -2278,7 +2278,7 @@ Genero 4gl:
   - ".4gl"
   tm_scope: source.genero-4gl
   ace_mode: text
-  language_id: 241301092
+  language_id: 986054050
 Genero per:
   type: programming
   color: "#d8df39"
@@ -2286,7 +2286,7 @@ Genero per:
   - ".per"
   tm_scope: source.genero-per
   ace_mode: text
-  language_id: 828945790
+  language_id: 902995658
 Genie:
   type: programming
   ace_mode: text

Seb

@lildude
Copy link
Member

lildude commented Dec 8, 2023

By ID do you mean the "language_id" field in lib/linguist/languages.yml, that is produced by the script/update-ids?

Yes.

For now, in my local fork, I could manage to revert the original commit, setup the ruby/docker env and add from scratch the new languages "Genero 4gl" (instead of "Genero") and "Genero per" (instead of "Genero Forms"), with the regular script/add-grammar process, and created new IDs with script/update-ids ...

There's no need to revert the original commit; it's really not necessary. Follow the steps for replacing the grammar, commit those changes, rename the entries as you wish, commit those changes, rename the samples directories to match the new language names, commit your changes, update the scope lines if the grammars use different scope names, push.

Several files will be changed but the only changes in the languages.yml file will be the two name lines and the scope lines if different scope name are used in the grammar.

I will reject the PR if you change the IDs as I said before: these must not change.

@sebflaesch
Copy link
Contributor Author

Ok no worries: I clearly understood that the language IDs must NOT change.
I was wondering if I could just reuse what I have prepared, just by using the existing IDs.
But I will try to use the "replace" method and the renaming as you suggest.

@sebflaesch
Copy link
Contributor Author

sebflaesch commented Dec 8, 2023

There's no need to revert the original commit; it's really not necessary. Follow the steps for replacing the grammar, ...

So I followed: https://github.com/github-linguist/linguist/blob/master/CONTRIBUTING.md#changing-the-source-of-a-syntax-highlighting-grammar:

$ script/add-grammar --replace Genero https://github.com/sebflaesch/GeneroFgl.tmbundle
Checking Docker is installed and running
add-grammar: Submodule '' does not exist. Aborting.

or (using the new name):

$ script/add-grammar --replace "Genero 4gl" https://github.com/sebflaesch/GeneroFgl.tmbundle
Checking Docker is installed and running
add-grammar: Submodule '' does not exist. Aborting.

What's next?

I thought I should patch the tm_scope identifiers, to match the files of the new TM repo, but that did not help (I really wonder how the "replace" process can guess what is to be replaced if these are different!)

sf@toro (issue-6629):~/genero/devel/github/linguist/linguist-genero$ git diff
diff --git a/grammars.yml b/grammars.yml
index 82904434..dd5fdb9e 100644
--- a/grammars.yml
+++ b/grammars.yml
@@ -390,8 +390,8 @@ vendor/grammars/gemfile-lock-tmlanguage:
 vendor/grammars/gemini-vscode:
 - source.gemini
 vendor/grammars/genero.tmbundle:
-- source.genero
-- source.genero-forms
+- source.genero-4gl
+- source.genero-per
 vendor/grammars/gettext.tmbundle:
 - source.po
 vendor/grammars/gnuplot-tmbundle:
diff --git a/lib/linguist/languages.yml b/lib/linguist/languages.yml
index 1becf7ad..15e77ac0 100644
--- a/lib/linguist/languages.yml
+++ b/lib/linguist/languages.yml
@@ -2276,7 +2276,7 @@ Genero:
   color: "#63408e"
   extensions:
   - ".4gl"
-  tm_scope: source.genero
+  tm_scope: source.genero-4gl
   ace_mode: text
   language_id: 986054050
 Genero Forms:
@@ -2284,7 +2284,7 @@ Genero Forms:
   color: "#d8df39"
   extensions:
   - ".per"
-  tm_scope: source.genero-forms
+  tm_scope: source.genero-per
   ace_mode: text
   language_id: 902995658
 Genie:

@lildude
Copy link
Member

lildude commented Dec 8, 2023

I gave you the exact command to run earlier 😉

This won’t update languages.yml but will take care of everything else.

@lildude
Copy link
Member

lildude commented Dec 8, 2023

Start the PR and we can continue this there as it’s best suited in the PR.

@sebflaesch
Copy link
Contributor Author

sebflaesch commented Dec 8, 2023

Oups ok Colin, using

$ script/add-grammar --replace genero.tmbundle ...

worked.

Before doing any PR, I can share the testing fork I am working on, if you like:
https://github.com/sebflaesch/linguist-genero/tree/issue-6629
That's just for testing/preparing the PR.

I need also to sync with the author of https://github.com/FourjsGenero/GeneroFgl.tmbundle
This is the real repo to be eventually used, and the names are not yet correct, so I did also a fork:
https://github.com/sebflaesch/GeneroFgl.tmbundle

Thanks for your assistance!

@sebflaesch
Copy link
Contributor Author

@lildude : I have worked on changes, but this is only for validation! please do not apply!

Would you mind check the changes I have prepared and tell me if something is missing?
Commits are visible in my testing fork:
master...sebflaesch:linguist-genero:issue-6629

Note that there is NO license file (our TM repo has "The Unlicense") - is this an issue? Should it be an MIT license?

@lildude
Copy link
Member

lildude commented Dec 9, 2023

Diff looks good from a quick glance.

Note that there is NO license file (our TM repo has "The Unlicense") - is this an issue? Should it be an MIT license?

We need a license file, even for the The Unlicense. The text from https://choosealicense.com/licenses/unlicense/ in a LICENSE file will do the trick.

@sebflaesch
Copy link
Contributor Author

Sorry I was not clear: We have that "unlicense" text:
https://github.com/FourjsGenero/GeneroFgl.tmbundle/blob/master/LICENSE.txt
PR will come after some renaming in
https://github.com/FourjsGenero/GeneroFgl.tmbundle/tree/master.
Thanks a lot @lildude !

NailCareBAE referenced this issue Dec 10, 2023
* chore: add Glimmer JS to languages

* chore: add vsc-ember-syntax grammar

* chore: add Glimmer JS samples

* chore: run script/update-ids

* fix: run script/sort-submodules

* chore: pull updated vsc-ember-syntax submodule

* chore: run script/add-grammar --replace vsc-ember-syntax https://github.com/lifeart/vsc-ember-syntax

* fix: run script/sort-submodules
@sebflaesch
Copy link
Contributor Author

sebflaesch commented Dec 11, 2023

@lildude : How can I check that my local fork of linguist detects the new "Genero 4gl" and "Genero per" languages replacing the existing languages "Genero" and "Genero Forms"?

When using the github-linguist command, I still see the old languages... What I am missing here?

sf@toro (issue-6629):~/genero/devel/github/linguist/linguist-genero$ bin/github-linguist "samples/Genero 4gl/books1.4gl" 
samples/Genero 4gl/books1.4gl: 183 lines (140 sloc)
  type:      Text
  mime type: text/plain
  language:  Genero
  
sf@toro (issue-6629):~/genero/devel/github/linguist/linguist-genero$ bin/github-linguist "samples/Genero per/books1.per" 
samples/Genero per/books1.per: 38 lines (34 sloc)
  type:      Text
  mime type: text/plain
  language:  Genero Forms

Is this because I am on a branch (issue-6629) of my repo that is a fork of linguist?

sf@toro (issue-6629):~/genero/devel/github/linguist/linguist-genero$ git status
On branch issue-6629
Your branch is up to date with 'origin/issue-6629'.

I have also tried git-linguist command:

sf@toro (issue-6629):~/genero/devel/github/linguist/linguist-genero$ bin/git-linguist --commit=84b3fe0c8584ca64102caf1c621982b7820f358f -f breakdown
{"Dockerfile":[".devcontainer/Dockerfile","Dockerfile","tools/grammars/Dockerfile"],"Shell":[".devcontainer/onCreateCommand.sh","script/add-grammar","script/bootstrap","script/build-grammars-tarball","script/cibuild","script/grammar-compiler","tools/grammars/docker/build"],"Ruby":["Brewfile","Gemfile","Rakefile","bin/git-linguist","bin/github-linguist","ext/linguist/extconf.rb","github-linguist.gemspec","lib/linguist.rb","lib/linguist/blob.rb","lib/linguist/blob_helper.rb","lib/linguist/classifier.rb","lib/linguist/file_blob.rb","lib/linguist/generated.rb","lib/linguist/grammars.rb","lib/linguist/heuristics.rb","lib/linguist/language.rb","lib/linguist/lazy_blob.rb","lib/linguist/repository.rb","lib/linguist/samples.rb","lib/linguist/sha256.rb","lib/linguist/shebang.rb","lib/linguist/strategy/extension.rb","lib/linguist/strategy/filename.rb","lib/linguist/strategy/manpage.rb","lib/linguist/strategy/modeline.rb","lib/linguist/strategy/xml.rb","lib/linguist/tokenizer.rb","lib/linguist/version.rb","script/cross-validation","script/fast-submodule-update","script/list-grammars","script/normalise-url","script/sort-submodules","script/update-ids","test/helper.rb","test/test_blob.rb","test/test_classifier.rb","test/test_file_blob.rb","test/test_generated.rb","test/test_grammars.rb","test/test_heuristics.rb","test/test_instrumentation.rb","test/test_language.rb","test/test_pedantic.rb","test/test_repository.rb","test/test_samples.rb","test/test_sha256.rb","test/test_strategies.rb","test/test_tokenizer.rb"],"C":["ext/linguist/lex.linguist_yy.c","ext/linguist/lex.linguist_yy.h","ext/linguist/linguist.c"],"Lex":["ext/linguist/tokenizer.l"],"Go":["tools/grammars/cmd/grammar-compiler/main.go","tools/grammars/compiler/converter.go","tools/grammars/compiler/cson.go","tools/grammars/compiler/data.go","tools/grammars/compiler/errors.go","tools/grammars/compiler/loader.go","tools/grammars/compiler/loader_fs.go","tools/grammars/compiler/loader_url.go","tools/grammars/compiler/pcre.go","tools/grammars/compiler/pcre_test.go","tools/grammars/compiler/proto.go","tools/grammars/compiler/walker.go","tools/grammars/pcre/pcre.go"]}

@lildude
Copy link
Member

lildude commented Dec 11, 2023

@sebflaesch It looks like you've not updated the names in the languages.yml file.

@sebflaesch
Copy link
Contributor Author

@lildude quite confusing...
Was the script/add-grammar --replace genero.tmbundle ... command not supposed to do that?
Do I have to update the lib/linguist/languages.yml file before/after executing that command?

@lildude
Copy link
Member

lildude commented Dec 11, 2023

Was the script/add-grammar --replace genero.tmbundle ... command not supposed to do that?

Nope. That script only changes the grammar submodule and related vendor/README.md and cached license files.

That's why I said:

Follow the steps for replacing the grammar, commit those changes, rename the entries as you wish, commit those changes, rename the samples directories to match the new language names, commit your changes, update the scope lines if the grammars use different scope names, push.

Do I have to update the lib/linguist/languages.yml file before/after executing that command?

It doesn't matter. It only needs to be done once.

@sebflaesch
Copy link
Contributor Author

sebflaesch commented Dec 11, 2023

OK I did the renaming in languages.yml, I have pushed it to my fortk/branch and the change is visible here:
313bef7

Now when I run the github-linguist command, how can I see my "Genero 4gl" and "Genero per" languages?

sf@toro (issue-6629):~/genero/devel/github/linguist/linguist-genero$ ./bin/github-linguist 
67.14%  280830     Ruby
23.11%  96639      C
6.60%   27621      Go
1.60%   6695       Lex
1.25%   5216       Shell
0.30%   1258       Dockerfile

My guess is that it shows only the top most matching languages, but obviously, in this linguist repo, there are very few .4gl and .per sources so it would be something like 0.01% or less...

I have also tried:

sf@toro (issue-6629):~/genero/devel/github/linguist/linguist-genero$ bin/git-linguist stats -c 8f3d8d86e9fb30c23ede2c5fc83c2925870ac8d6
{"Dockerfile":1258,"Shell":5216,"Ruby":280830,"C":96639,"Lex":6695,"Go":27621}

I have executed the tests:

sf@toro (issue-6629):~/genero/devel/github/linguist/linguist-genero$ bundle exec rake test
...
  7) Error:
TestRepository#test_linguist_override_generated?:
Rugged::OdbError: object not found - no match for id (01d6b9c637a7a6581fe456c600725b68f355b295)
    /home/sf/genero/devel/github/linguist/linguist-genero/lib/linguist/repository.rb:128:in `lookup'
    /home/sf/genero/devel/github/linguist/linguist-genero/lib/linguist/repository.rb:128:in `current_tree'
    /home/sf/genero/devel/github/linguist/linguist-genero/lib/linguist/repository.rb:123:in `read_index'
    /home/sf/genero/devel/github/linguist/linguist-genero/test/test_repository.rb:137:in `test_linguist_override_generated?'

1727 runs, 35032 assertions, 0 failures, 7 errors, 1 skips

@sebflaesch
Copy link
Contributor Author

"Genero 4gl" / "Genero per" does not show up when running:

bundle exec script/cross-validation --test
...
GLSL/recurse1.frag GOOD
GLSL/recurse1.fs GOOD
Genie/Class.gs GOOD
Genie/Hello.gs GOOD
Genie/IDataLoader.gs GOOD
Genie/web.gs BAD (JavaScript)
Gerber Image/LIDARLite.ncl GOOD
Gnuplot/defense_plotter.plt GOOD
...

@lildude
Copy link
Member

lildude commented Dec 11, 2023

Please open a PR. This discussion should be happening in the PR, not the issue. It's also a lot easier to see and also gives us constant confirmation of behaviour via the tests.

@sebflaesch
Copy link
Contributor Author

@lildude here is the PR: #6632
Sorry for all the noise, I am new at this and wanted to make sure that the PR is good enough.
Thanks a lot for you help!

@lildude
Copy link
Member

lildude commented Dec 11, 2023

Now when I run the github-linguist command, how can I see my "Genero 4gl" and "Genero per" languages?

This is because Linguist does not count the samples directory and that's where all your files are in the linguist repo:

# Samples folders
- ^[Ss]amples?/

I have executed the tests:

You need to fetch the test/attributes branch. This is fetched if you use the devcontainer or Codespace.

@lildude
Copy link
Member

lildude commented Dec 11, 2023

Sorry for all the noise, I am new at this and wanted to make sure that the PR is good enough.

No probs. Noisy PRs are fine as this is where all the dev work is taking place. Context switching between issue and code makes this a lot harder to track.

@lildude
Copy link
Member

lildude commented Dec 14, 2023

Changes have shipped. Closing.

@lildude lildude closed this as completed Dec 14, 2023
@sebflaesch
Copy link
Contributor Author

@lildude : Seems that something is not working...
I see "Other" language instead of "Genero 4gl" or "Genero per", in Genero repositories such as:
https://github.com/FourjsGenero/ex_checkbox_tree
Should I open another issue?

@lildude
Copy link
Member

lildude commented Dec 14, 2023

Should I open another issue?

No need. This is because the language has been renamed but the cache not invalidated. We don't invalidate the cache when making changes via Linguist as this would be incredibly resource intensive given the number of repositories involved. Pushing a new change will force a re-analysis and things should reflect the new name.

You can see the result of this in my fork (I'll delete this later) in which I edited the README to force a reanalysis:

New languages

@sebflaesch
Copy link
Contributor Author

ok got it! thanks a lot!

@sebflaesch
Copy link
Contributor Author

Any trick to force a Linguist re-analysis of sources without pushing changes into a given repo?

@lildude
Copy link
Member

lildude commented Dec 14, 2023

Any trick to force a Linguist re-analysis of sources without pushing changes into a given repo?

Nope. This is by design as per the same reasoning for not invalidating the caches.

@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants