-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
formatting the AST while preserving the source location information from the original query #1634
Comments
…implementation) this implements (a tiny portion of) apache#1634 pros: really useful when passing formatted queries to a real database, in order for database error message locations to match the original user's source locations cons: if we want to do it well, we need to track source locations better, and this adds a complexity to the Display imlementations
@lovasoa 👋 I'm interested in this |
I don't understand the usecase for preserving the original formatting after Specifically, to preserve the original SQL for error messages you could also keep the actual original SQL string somewhere and on error construct the error message using the orignal SQL (and the locations in that text that are now in Spans) I think you could take that approach today without any additional work. As you hint in this description, trying to preserve the original formatting would likely be a very large undertaking (and thus I suspect practically infeasible given the amount of engineering we have available) |
I think there is a misunderstanding. My use case is: the user enters a query. I need to parse it, potentially edit the ast, and then stringify it back before feeding it to the database. When the database returns an error, the error contains position information. This position information references the stringified AST we fed to the database, not the original string entered by the user. Currently, there is no way to map that position in the output of Display back to a position in the original string.
That's true. If I'm the only one with that need, then it's probably not worth it. But maybe I'm not ? And the work does not have to be exhaustive to be useful. I've already done it just for respecting newlines and indentation in Regarding resources: @freshtonic , if this is useful to you, do you think this is something cipherstash could dedicate a few engineering hours to ? |
I see the challenge -- you have to potentially edit the AST I wonder if you could do something (crazy) like
|
That could work, but it's not necessarily a lot less work than a Display implementation that respects the Span information, and the result would be wasteful and less generally useful to other users of sqlparser... |
I'm playing around with a different approach for editing parts of the AST that works more like tools like eslint work. The idea is that given accurate source spans, why don't we just edit the original query directly without round-tripping the whole AST? The steps would be: 1. parse query into AST. 2. Visit the AST, looking for nodes we want to replace. 3. Construct and render a new AST node for local replacement. 4. Use the old node's span() to replace the new text right in the source. This way we preserve all the other formatting outside of the node we're replacing, without any work on the AST displaying logic. Of course, this needs accurate and complete spans which we don't have yet, but sounds feasible to me. |
That would be great ! But with the current (incorrect) span information, we would end up with syntactically invalid sql. |
This seems like it is a reason to focus on improving the span accuracy (which is valuable for other reasons too) first before trying to also carry along the original tokens |
In the current sqlparser, parsing
and then fromatting it back to a string results in
Since sqlparser v0.53, we have the source location information inside the AST, which theoretically makes it possible to reconstruct the original query formatting (or at least getting very close to it).
I know this is a huge undertaking, but it would be really useful. For instance, in sqlpage, I would love being able to respect the initial user formatting of queries, in order for database errors that contain source position information to be meaningful and not confusing.
I'm opening this issue to start discussing this with the community (and especially with you @iffyio and @alamb 😉 ).
Am I the only one interested in this ? What would be the best approach to implementing this ?
The text was updated successfully, but these errors were encountered: