Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider replacing javaparser gRPC/subprocess with tree-sitter #318

Open
alexeagle opened this issue Jan 17, 2025 · 6 comments
Open

Consider replacing javaparser gRPC/subprocess with tree-sitter #318

alexeagle opened this issue Jan 17, 2025 · 6 comments

Comments

@alexeagle
Copy link
Collaborator

alexeagle commented Jan 17, 2025

rules_python made this change in bazelbuild/rules_python#1895
because tree-sitter is crazy fast, and it also relieves the need for the gazelle Go binary to spawn subprocesses that require other language runtimes, instead it's statically linked.

I'm seeing an issue just trying to run the tests in a clean clone of this repo:

INFO: From Testing //java/gazelle/testdata:grpc:
==================== Test output for //java/gazelle/testdata:grpc:

        files.go:192: expected gazelle stderr: 
            got: Error: LinkageError occurred while loading main class com.github.bazel_contrib.contrib_rules_jvm.javaparser.generators.Main
                java.lang.UnsupportedClassVersionError: com/github/bazel_contrib/contrib_rules_jvm/javaparser/generators/Main has been compiled by a more recent version of the Java Runtime (class file version 61.0), this version of the Java Runtime only recognizes class file versions up to 55.0
            {"level":"fatal","error":"failed to start / connect to javaparser server: failed to read port from javaparser server - maybe it crashed: timed out waiting for port file to be written by javaparser server","message":"could not start javaparser"}

which I think is because it's not hermetic in locating a JRE to execute this.

The Java/gRPC dependency also makes this repo a lot more complicated than it needs to be.

@jbedard
Copy link

jbedard commented Jan 17, 2025

This would make it significantly easier to compile the rules_jvm gazelle language into custom gazelle_binary or go_library.

Using tree-sitter via go-tree-sitter has been pretty simple in the gazelle extensions I've done, and I think other gazelle extensions are using tree-sitter as well. Compilation is essentially just a fetch of the language definition and a few lines of go. Then you can use the go-tree-sitter mod to parse and query+inspect the AST. The javascript gazelle language then inspects some root nodes as well as running a few queries.

I would definitely recommend it 👍

@illicitonion
Copy link
Collaborator

@stevebarrau I think you were pretty excited about this idea - any thoughts/context/experiments/desire?

If anyone's looking at picking this up, https://github.com/bazel-contrib/rules_jvm/blob/main/java/test/com/github/bazel_contrib/contrib_rules_jvm/javaparser/generators/ClasspathParserTest.java has a pretty thorough set of tests for what we currently parse out with Java code (which is currently only doing AST parsing of single files, nothing fancy like using a classpath, so should be relatively easy for any parser to do), and you should be able to test our a new implementation by replacing this line with a call to a Go implementation.

@gibfahn
Copy link
Collaborator

gibfahn commented Jan 18, 2025

Yeah we started with javaparser because we weren't sure how much we'd need to parse, and figured that was likely to be more accurate than a tree-sitter implementation. But if it passes the test suite it would be a nice improvement to have, I don't think anyone was a fan of the "gRPC → Java" bit.

@alexeagle
Copy link
Collaborator Author

Just FYI since I'm closing the tab, I started reading through some of the logic here
and used the tree-sitter playground to convert to a query

Image

This process is facilitated by gen AI which does a decent job of getting to a 75% working query.

@alexeagle
Copy link
Collaborator Author

@stevebarrau any chance you'd like to put in some effort here? Aspect is happy to help as well since we'd like to be able to statically link this into our CLI. Here's a proposed sequencing:

  1. Introduce the cgo dependency on tree-sitter and fix up the builds, update user docs so they're able to compile
  2. Archeology of the ClasspathParser to list out the queries that need to be made by the extension logic
  3. Convert ClasspathParserTest to test tree-sitter queries. Decide if we will incrementally have a mix of tree-sitter and gRPC, or swap over in a big bang
  4. Maybe some profiling/benchmarking against a big codebase to appreciate difference in runtime, memory high-water-mark
  5. Delete all the gRPC client/server code

@shs96c
Copy link
Collaborator

shs96c commented Jan 27, 2025

It's unlikely that @stevebarrau or myself will have time for this in the immediate future, but we'll happily review and land PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants