-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add include_file statement #74
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Had a few questions/nits and may need some import error tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think you should really be explicit about the notion of what import
is importing. It reads more like include
with guards from a C/C++ context than what I would normally think of as import
from a Java/Python perspective, where you are importing something logical.
I am okay with importing a "module" which as a collection of "type definitions", and having that map to some structure in file names like it does in its Python/Java counterparts. I am also okay with importing a single "type definition" by name which poses other problems (e.g. where is the definition). I am also okay with just saying this clubs a file into the current one, but I might spell that as include
or the like to keep our options open for a logical import
later.
What is the relationship between types imported in one universe versus another? I think the answer is no relationship, right?
/** A minimal faux file system backed by a Map<String, String>. Used only for testing. */ | ||
class StringSource(val sources: Map<String, String>) : InputSource { | ||
override fun openStream(sourceName: String): InputStream { | ||
val text: String = sources[sourceName] ?: throw FileNotFoundException("$sourceName does not exist") | ||
|
||
return ByteArrayInputStream(text.toByteArray(Charsets.UTF_8)) | ||
} | ||
|
||
override fun getCanonicalName(sourceName: String): String = sourceName | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example, was the one that intuited to me that you had the beginnings of a Resolver
style API, but a sub-optimal abstraction. The file names are not interesting, the type names are and I think the import
should be based on that, otherwise you have something more like include
that has this extra layer of "file" that the user shouldn't really care about.
Abstracting in this way does offer a challenge for PIG. What file gets parsed when I say (import Foo)
, but that is something worth considering. Otherwise I would add the concept of module which is a collection of type universes and (import foo)
maps to this module which has a logical file name format if on a file system. This conceptual model is similar to Python modules and Java classes (except for the notion of one class per class file).
I specifically chose to avoid the term
I don't believe PIG yet needs modules or the ability to import a single named named type domain from another file. For that, I would like to take the "wait until it's needed" approach. The main goal here is to give the user the ability to permute a domain contained in another file, and I don't think something as complex as modules are needed for that.
No direct relationship. However, any two domains (imported or otherwise) with a declared transform |
I am fine with reserving the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My major concerns are the following:
-
I think the way imports are resolved is not correct especially with the
cyclic imports
. The current implementation in my opinion doesn't really handle cyclic dependency imports. -
How are you handling the naming clashes (if any) from the multiple imports. (I may have missed this in the review in case this is handled).
README.md
Outdated
stmt ::= <definition> | <transform> | ||
stmt ::= <definition> | <transform> | <include> | ||
|
||
import ::= `(import <path-to-include)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<path-to-import>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
path-to-file
actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have some nits and a few minor suggestions. Also, if we're sticking with the term include
rather than import
, should we also change the usages in the PR title, description, branches, and/or issue?
pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt
Outdated
Show resolved
Hide resolved
pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt
Outdated
Show resolved
Hide resolved
I think we can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we talked about a bunch of this offline so some of the comments below are a little out of date, but here are my top-level suggestions (and I am also okay if you cut issues to address them and follow up unless you disagree).
InputStreamSource
really should encapsulate the resolution of domain files. By that I mean that it probably should model the API that resolves some name relative to some other name. Right now that logic is baked into the parser.
Overall, I am okay with relative import resolution, but I think we need absolute paths too, and I think the concept of root should be made explicit meaning that I should not be able to include a file arbitrarily in the path as that will not be portable (I don't think we ever want (include_file "/usr/share/foo/bar.ion")
referring to the root file system for example). This will be super important for composing different sets of universe files together (e.g. I want to include partiql_ast.ion
). Right now I cannot address partiql_ast.ion
without some relative path that implies I need to copy it into my root to have a chance of referring to it in a sane way. It also makes referring to that file at different levels of nesting require a different import, which we should have the flexibility to avoid.
This root concept could further support multiple roots (which is similar to classpaths and -I
in GCC), this could make it easier to support distributing Pig domains as resource files in JARs and composing them in a more flexible way (again, not suggesting you do that now, just suggesting that the concept will allow us to do so without convoluted extraction and layout-fu in build logic.
Furthermore, I think paths are logical, can never address outside of the root (or roots), and always use forward slash. This simplifies the syntax and avoids portability concerns with file paths in different operating systems. Unlike my comments below, I do think you've convinced me that relative paths make sense, but I think composing different roots will make logical absolute paths make more sense (e.g. (include_file "/partiql_ast.ion")
referring to a top-level file addressable from one of the roots).
README.md
Outdated
Paths are always relative to the directory containing the current file. The working directory at the time `pig` | ||
is executed is irrelevant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting choice and seems needlessly complex--shouldn't it be relative to some import path (which could contain the directory of the file in question)? This could lead to some really confusing include paths with nested folder structures...
/** | ||
* A simple abstraction for opening [InputStream], allows for type domains to be loaded from a file system or an | ||
* in-memory data structure. The latter is useful for unit testing. | ||
*/ | ||
internal interface InputStreamSource { | ||
/** Opens an input stream for the given filename name. */ | ||
fun openInputStream(fileName: String): InputStream | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the path/file name implementation defined? E.g. \
works on Windows, etc.?
Also, if this is relative to a given file, doesn't that imply we need a parameter here to model the "working directory?"
internal val FILE_SYSTEM_INPUT_STREAM_SOURCE = object : InputStreamSource { | ||
override fun openInputStream(fileName: String) = FileInputStream(fileName) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation defines the file name in terms of current working directory of the JVM, does it not? (versus the file being operated on as your docs would imply)
requireArityForTag(sexp, 1) | ||
val relativePath = sexp.tail.single().asString().textValue | ||
|
||
val workingDirectory = File(this.qualifedSourceStack.peek()).parentFile | ||
val workingDirectory = File(this.inputFilePathStack.peek()).parentFile | ||
val qualifiedSourcePath = File(workingDirectory, relativePath).canonicalPath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this seems like overkill to me--and interestingly, you are relying on the parser to resolve paths, which seems like it should be the responsibility of the input stream resolver.
pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt
Outdated
Show resolved
Hide resolved
assertEquals(tc.expectedConsdieredPaths, ex.consideredFilePaths) | ||
} | ||
|
||
// TODO: include tests for InvalidIncludePathException. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still a TODO for this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I missed that. only one test was needed.
Note that paths starting with `/` do not actually refer to the root of any file system, but instead are treated as | ||
relative to the include directories. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prefix /
here is basically a sigil modifying the include.
There are two slightly distinct types of inclusion:
- unsigiled includes
- include the includee relative to the 1) includer or 2) "main" or 3) any
-I
path - these serve to allow to break a model into more manageable bits
- include the includee relative to the 1) includer or 2) "main" or 3) any
- sigiled includes:
- include the includee relative to any
-I
path - these feel more like referencing an external model
- include the includee relative to any
I see from the previous discussions on this issue that a formal idea of an import
has been deferred, but I wonder if at least we should consider a sigil that is not as overloaded as /
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few nits, typos and comments. One major question/concern - How are we dealing with the duplicate domain names across multiple includee files?
|
||
- The directory containing the includer (if the path to the includee does not start with `/`) | ||
- The directory containing the "main" type universe that was passed to `pig` on the command-line. | ||
- Any directories specified with the `--include` or `-I` arguments, in the order they were specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also want to add a point here explicitly specifying the behavior of /
i.e. all other directories are searched except the parent directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was covered on line 428:
- ... (if the path to the includee does not start with
/
)
What can I do to make this clearer?
pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt
Outdated
Show resolved
Hide resolved
Co-authored-by: Alan Cai <[email protected]>
Co-authored-by: Alan Cai <[email protected]> Co-authored-by: Abhilash Kuhikar <[email protected]>
Co-authored-by: Alan Cai <[email protected]>
…ator into imports-phase-1
Co-authored-by: Alan Cai <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bringing back a comment from Abhilash's review and an offline team discussion. Rest of the changes since the last revision LGTM
@Test | ||
fun `include happy case`() { | ||
val universe = parseWithTestRoots("test-domains/main.ion") | ||
val allDomains = universe.statements.filterIsInstance<TypeDomain>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was brought up offline in a team discussion and in Abhilash's review. What happens in the case where an included domain has the same name as a previously defined domain?
For example, if universe_b.ion
had defined domain_a
again and not domain_b
?
// in test-domains/root_b/dir_z/universe_b.ion
// defining `domain_a` (which has already been defined in `universe_a.ion`) rather than `domain_b`
(define domain_a (domain (product foo)))
Would this be an error or overridden? If it's an error, would this be thrown by the IncludeResolver
? I think a test and/or a comment should be added detailing this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently when I do the change (i.e. make domain_b
-> domain_a
in universe_b.ion
), no error is given for a duplicated domain name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently when I do the change (i.e. make
domain_b
->domain_a
inuniverse_b.ion
), no error is given for a duplicated domain name.
FYI that's because the component detects that error (TypeDomainSemanticChecker
) isn't exercised by the tests that load those domains.
The component is tested under that scenario here. Since TypeDomainSemanticChecker
does not see any include_file
statement and it appears to TypeDomainSemanticChecker
as if all of the domains were all part of the same file, there was no need to modify it as part of this feature.
However I do agree there's a small possibility that some refactoring might occur in the future which might break that ability so I'll make sure to add a test to ensure this semantic check works across files.
pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt
Outdated
Show resolved
Hide resolved
assertEquals(tc.expectedConsdieredPaths, ex.consideredFilePaths) | ||
} | ||
|
||
// TODO: include tests for InvalidIncludePathException. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I missed that. only one test was needed.
Implements #57
Why this PR was created: PartiQL customers need the ability to make their own permuted version of the
partiql_ast
type domain. Since until now PIG only supported single-file type universes, the only way to do that is to copy and paste thepartiql_ast
into their own project and permute from there, but this is not a sustainable solution.This is the implementation of the first phase of this task which adds the ability to "import" type universes into another using a new statement:
(include_file <path-to-file>)
. The second phase of this feature will be to avoid duplication of the generated classes for the imported type domains. (Since thepartiql_ast
code will have been previously generated and will be part of the main PartiQL library.)include_file
statement tries to include a file that was previously parsed or is currently being parsed, this does not cause an error--theinclude_file
statement is simply ignored. Similar to#pragma once
from modern C/C++ compilers.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.