Add include_file statement #74

dlurton · 2021-05-21T20:47:30Z

Implements #57

Why this PR was created: PartiQL customers need the ability to make their own permuted version of the partiql_ast type domain. Until now, PIG has supported only single-file type universes, so the only way to do that has been to copy and paste partiql_ast into their own project and permute it from there, which is not a sustainable solution.

This is the implementation of the first phase of this task, which adds the ability to "import" one type universe into another using a new statement: (include_file <path-to-file>). The second phase of this feature will avoid duplicating the generated classes for the imported type domains, since the partiql_ast code will already have been generated and will be part of the main PartiQL library.

Imports are handled by the parser. Statements from the imported type universe are included in the type universe currently being parsed as if they were "pasted" at the location of the import.
Recursive imports are supported.
Cycles are ignored: if an include_file statement tries to include a file that was previously parsed or is currently being parsed, this does not cause an error; the include_file statement is simply ignored. This is similar to #pragma once in modern C/C++ compilers.
Import cycles are detected by using a set containing the "canonical" path of every file that has either been parsed or is currently being parsed, which should prevent confusion when relative paths are used.
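As a rough illustration of the splicing and cycle-skipping behavior just described, here is a minimal Kotlin sketch. It is not the PR's parser: statements are modeled with a hypothetical sealed class (`Stmt`, `IncludeFile`, `Other`), an in-memory map stands in for the file system, and the map keys are assumed to already be canonical paths, so no path canonicalization is shown.

```kotlin
// Hypothetical statement model: either an include_file statement or any other statement.
sealed class Stmt
data class IncludeFile(val path: String) : Stmt()
data class Other(val text: String) : Stmt()

/**
 * Expands include_file statements by splicing the included file's statements in place.
 * `files` maps an (assumed canonical) file name to its parsed statements.
 * `seen` holds every file that has been parsed or is currently being parsed, so a repeated
 * or cyclic include is silently skipped, similar to `#pragma once`.
 */
fun expand(
    name: String,
    files: Map<String, List<Stmt>>,
    seen: MutableSet<String> = mutableSetOf()
): List<Stmt> {
    // Record the file before descending so that cycles back to it are ignored.
    if (!seen.add(name)) return emptyList()
    val statements = files[name] ?: error("$name does not exist")
    return statements.flatMap { stmt ->
        when (stmt) {
            is IncludeFile -> expand(stmt.path, files, seen)
            is Other -> listOf(stmt)
        }
    }
}
```

For example, if `main.ion` includes `a.ion`, `a.ion` includes both `main.ion` and `b.ion`, and `b.ion` includes `a.ion` again, expanding `main.ion` yields each file's ordinary statements exactly once, with the back-references dropped.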

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

alancai98

Overall LGTM. Had a few questions/nits and may need some import error tests.

README.md

pig/src/org/partiql/pig/cmdline/Command.kt

pig/src/org/partiql/pig/domain/model/SemanticErrorContext.kt

pig/src/org/partiql/pig/domain/parser/ImportSourceOpener.kt

pig/src/org/partiql/pig/domain/parser/InputSource.kt

pig/src/org/partiql/pig/domain/parser/ParserErrorContext.kt

pig/test-domains/main.ion

pig/test/org/partiql/pig/domain/TypeDomainParserImportTests.kt

almann

Overall, I think you should really be explicit about the notion of what import is importing. It reads more like include with guards from a C/C++ context than what I would normally think of as import from a Java/Python perspective, where you are importing something logical.

I am okay with importing a "module" which is a collection of "type definitions", and having that map to some structure in file names like it does in its Python/Java counterparts. I am also okay with importing a single "type definition" by name, which poses other problems (e.g. where is the definition?). I am also okay with just saying this clubs a file into the current one, but I might spell that as include or the like to keep our options open for a logical import later.

What is the relationship between types imported in one universe versus another? I think the answer is no relationship, right?

README.md

pig/src/org/partiql/pig/cmdline/Command.kt

pig/src/org/partiql/pig/domain/parser/InputSource.kt

pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt

pig/src/org/partiql/pig/main.kt

pig/test/org/partiql/pig/domain/Util.kt

almann · 2021-05-24T20:53:00Z

pig/test/org/partiql/pig/util/ParseHelpers.kt

+/** A minimal faux file system backed by a Map<String, String>.  Used only for testing. */
+class StringSource(val sources: Map<String, String>) : InputSource {
+    override fun openStream(sourceName: String): InputStream {
+        val text: String = sources[sourceName] ?: throw FileNotFoundException("$sourceName does not exist")
+
+        return ByteArrayInputStream(text.toByteArray(Charsets.UTF_8))
+    }
+
+    override fun getCanonicalName(sourceName: String): String = sourceName
+}


This example is what suggested to me that you have the beginnings of a Resolver-style API, but with a sub-optimal abstraction. The file names are not interesting; the type names are, and I think the import should be based on them. Otherwise you have something more like include, with an extra layer of "file" that the user shouldn't really care about.

Abstracting in this way does pose a challenge for PIG: which file gets parsed when I say (import Foo)? That is worth considering. Alternatively, I would add the concept of a module, which is a collection of type universes, and have (import foo) map to that module, which would have a logical file name format on a file system. This conceptual model is similar to Python modules and Java classes (except for the notion of one class per class file).
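To make the module idea concrete, here is a hypothetical sketch of mapping a logical name to a file location, similar to how Python and Java map module or package names onto directories. The dotted-name syntax, the `.ion` suffix, and the `moduleToRelativePath` function are assumptions for illustration only; PIG does not define them.

```kotlin
/**
 * Hypothetical mapping from a logical module name such as `partiql.ast` to a relative,
 * forward-slash file path such as `partiql/ast.ion`. Neither the dotted syntax nor the
 * `.ion` suffix is part of PIG; both are assumptions of this sketch.
 */
fun moduleToRelativePath(moduleName: String): String {
    val segments = moduleName.split('.')
    // Each segment must look like an identifier, so a module name can never
    // smuggle in path syntax such as `..` or `/`.
    val identifier = Regex("[A-Za-z_][A-Za-z0-9_]*")
    require(segments.all { identifier.matches(it) }) { "Invalid module name: $moduleName" }
    return segments.joinToString("/") + ".ion"
}
```

Under this kind of scheme, a hypothetical `(import partiql.ast)` would name what is being imported, while the question of which root directory to search would remain separate.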

dlurton · 2021-05-24T21:31:38Z

> Overall, I think you should really be explicit about the notion of what import is importing. It reads more like include with guards from a C/C++ context than what I would normally think of as import from a Java/Python perspective, where you are importing something logical.

I specifically chose to avoid the term include here because it is already used as part of the domain permutation feature. The next logical choice is import, IMHO; however, I would probably be okay with include_file to avoid any ambiguity with the include of permute_domain.

> I am okay with importing a "module" which is a collection of "type definitions", and having that map to some structure in file names like it does in its Python/Java counterparts. I am also okay with importing a single "type definition" by name which poses other problems (e.g. where is the definition). I am also okay with just saying this clubs a file into the current one, but I might spell that as include or the like to keep our options open for a logical import later.

I don't believe PIG yet needs modules or the ability to import a single named type domain from another file. For that, I would like to take the "wait until it's needed" approach. The main goal here is to give the user the ability to permute a domain contained in another file, and I don't think something as complex as modules is needed for that.

> What is the relationship between types imported in one universe versus another? I think the answer is no relationship, right?

No direct relationship. However, any two domains (imported or otherwise) with a declared transform (transform <from-type-domain> <to-type-domain>) will get a generated VisitorTransform with abstract functions for the types that are different. That VisitorTransform will contain references to both type domains.

almann · 2021-05-24T21:37:40Z

> I am okay with importing a "module" which is a collection of "type definitions", and having that map to some structure in file names like it does in its Python/Java counterparts. I am also okay with importing a single "type definition" by name which poses other problems (e.g. where is the definition). I am also okay with just saying this clubs a file into the current one, but I might spell that as include or the like to keep our options open for a logical import later.

> I don't believe PIG yet needs modules or the ability to import a single named type domain from another file. For that, I would like to take the "wait until it's needed" approach. The main goal here is to give the user the ability to permute a domain contained in another file, and I don't think something as complex as modules is needed for that.

I am fine with reserving the import term for the logical kind, because I do think that is probably the more interesting thing to do for PIG in the future. include_file or the like is much clearer to me about what is going on: literally macro-expand the file and parse it into the type universe in question. It eliminates the notion of a relationship between the definition and other type universes (which is an oxymoron anyhow) and just says to copy it into our translation unit (where a single translation unit defines a type universe).

abhikuhikar

My major concerns are the following:

I think the way imports are resolved is not correct, especially with cyclic imports. In my opinion, the current implementation doesn't really handle cyclic import dependencies.
How are you handling naming clashes (if any) across multiple imports? (I may have missed this in the review, in case it is handled.)

abhikuhikar · 2021-05-24T22:29:11Z

README.md

-stmt ::=  <definition> | <transform>
+stmt ::=  <definition> | <transform> | <include>
+
+import ::= `(import <path-to-include)`


<path-to-import>?

path-to-file actually.

pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt

alancai98

Have some nits and a few minor suggestions. Also, if we're sticking with the term include rather than import, should we also change the usages in the PR title, description, branches, and/or issue?

README.md

pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt

pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt

pig/test-domains/main.ion

pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt

pig/test/org/partiql/pig/util/ParseHelpers.kt

dlurton · 2021-06-15T23:49:27Z

I think we can s/import/include/g almost everywhere, but I don't think it's necessary to do so in the branch names. Those are temporary.

almann

We talked about a lot of this offline, so some of the comments below are a little out of date, but here are my top-level suggestions (I am also okay if you cut issues to address them in a follow-up, unless you disagree).

InputStreamSource really should encapsulate the resolution of domain files. By that I mean that it probably should model the API that resolves some name relative to some other name. Right now that logic is baked into the parser.

Overall, I am okay with relative import resolution, but I think we need absolute paths too, and I think the concept of a root should be made explicit. That means I should not be able to include an arbitrary file from anywhere in the file system, as that will not be portable (for example, I don't think we ever want (include_file "/usr/share/foo/bar.ion") to refer to the root file system). This will be super important for composing different sets of universe files together (e.g. I want to include partiql_ast.ion). Right now I cannot address partiql_ast.ion without some relative path that implies I need to copy it into my root to have a chance of referring to it in a sane way. It also means that referring to that file from different levels of nesting requires a different include path, which we should have the flexibility to avoid.

This root concept could further support multiple roots (similar to classpaths and -I in GCC), which could make it easier to distribute PIG domains as resource files in JARs and compose them more flexibly (again, I'm not suggesting you do that now, just that the concept will allow us to do so without convoluted extraction and layout-fu in build logic).

Furthermore, I think paths should be logical, should never address anything outside of the root (or roots), and should always use forward slashes. This simplifies the syntax and avoids portability concerns with file paths on different operating systems. Contrary to my comments below, you've convinced me that relative paths make sense, but I think composing different roots will make logical absolute paths make more sense (e.g. (include_file "/partiql_ast.ion") referring to a top-level file addressable from one of the roots).
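A minimal sketch of the "logical path + roots" proposal, assuming Kotlin on the JVM with `java.nio.file`: include paths use forward slashes, a leading `/` means "relative to a root" rather than the file system root, and normalized candidates that climb out of their root via `..` are rejected. `resolveLogicalPath` is hypothetical, not part of PIG. (The merged PR ultimately disallowed include paths starting with '/', per the commit "Disallow include paths starting with '/'".)

```kotlin
import java.nio.file.Path

/**
 * Resolves a forward-slash logical include path against each root, in order, like a
 * classpath or GCC's `-I` list. Hypothetical helper, not PIG's API.
 */
fun resolveLogicalPath(logicalPath: String, roots: List<Path>): List<Path> {
    // A leading `/` means "relative to a root", never the file system root.
    val relative = logicalPath.removePrefix("/")
    require(!relative.contains('\\')) { "Include paths must use '/': $logicalPath" }
    return roots.map { root ->
        val normalizedRoot = root.normalize()
        val candidate = normalizedRoot.resolve(relative).normalize()
        // Reject `..` segments that climb above the root being searched.
        require(candidate.startsWith(normalizedRoot)) { "$logicalPath escapes root $root" }
        candidate
    }
}
```

Returning one candidate per root, in order, mirrors classpath or `-I` semantics: a caller would open the first candidate that exists.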

almann · 2021-06-16T21:32:53Z

README.md

+Paths are always relative to the directory containing the current file.  The working directory at the time `pig` 
+is executed is irrelevant.


This is an interesting choice and seems needlessly complex. Shouldn't it be relative to some import path (which could contain the directory of the file in question)? This could lead to some really confusing include paths with nested folder structures...

almann · 2021-06-16T21:38:39Z

pig/src/org/partiql/pig/domain/parser/InputStreamSource.kt

+/**
+ * A simple abstraction for opening [InputStream], allows for type domains to be loaded from a file system or an
+ * in-memory data structure.  The latter is useful for unit testing.
+ */
+internal interface InputStreamSource {
+    /** Opens an input stream for the given filename name. */
+    fun openInputStream(fileName: String): InputStream
+}


Is the path/file name format implementation-defined? For example, does \ work on Windows?

Also, if this is relative to a given file, doesn't that imply we need a parameter here to model the "working directory"?

almann · 2021-06-16T21:40:58Z

pig/src/org/partiql/pig/domain/parser/InputStreamSource.kt

+internal val FILE_SYSTEM_INPUT_STREAM_SOURCE = object : InputStreamSource {
+    override fun openInputStream(fileName: String) = FileInputStream(fileName)
+}


This implementation defines the file name in terms of the current working directory of the JVM, does it not (as opposed to the directory of the file being operated on, as your docs imply)?
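The distinction raised here can be shown with `java.io.File`: a relative name passed to `FileInputStream` is resolved against the JVM's working directory (the `user.dir` system property), whereas the README describes resolving it against the directory of the including file. Both helper functions below are hypothetical and exist only to contrast the two behaviors.

```kotlin
import java.io.File

// What `FileInputStream(fileName)` effectively does with a relative name:
// resolve it against the JVM's working directory.
fun resolveAgainstWorkingDirectory(fileName: String): File =
    File(System.getProperty("user.dir"), fileName)

// What the README describes: resolve it against the directory containing the includer.
fun resolveAgainstIncluder(includerPath: String, fileName: String): File {
    val includerDirectory = File(includerPath).absoluteFile.parentFile
    return File(includerDirectory, fileName)
}
```

The two only agree when `pig` happens to be launched from the directory containing the including file.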

almann · 2021-06-16T21:50:27Z

pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt

        requireArityForTag(sexp, 1)
        val relativePath = sexp.tail.single().asString().textValue

-        val workingDirectory = File(this.qualifedSourceStack.peek()).parentFile
+        val workingDirectory = File(this.inputFilePathStack.peek()).parentFile
        val qualifiedSourcePath = File(workingDirectory, relativePath).canonicalPath


Yeah, this seems like overkill to me. Interestingly, you are relying on the parser to resolve paths, which seems like it should be the responsibility of the input stream resolver.

pig/src/org/partiql/pig/domain/include/IncludeResolver.kt

README.md

pig/src/org/partiql/pig/cmdline/CommandLineParser.kt

pig/src/org/partiql/pig/domain/include/IncludeCycleHandler.kt

pig/src/org/partiql/pig/domain/include/IncludeResolver.kt

pig/src/org/partiql/pig/domain/parser/TypeDomainParser.kt

pig/src/org/partiql/pig/generator/custom/generator.kt

pig/test/org/partiql/pig/domain/TypeDomainParserErrorsTest.kt

pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt

alancai98 · 2021-07-27T21:55:11Z

pig/test/org/partiql/pig/include/IncludeResolverTests.kt

+        assertEquals(tc.expectedConsdieredPaths, ex.consideredFilePaths)
+    }
+
+    // TODO: include tests for InvalidIncludePathException.


Is this still a TODO for this PR?

Yes, I missed that. Only one test was needed.

jpschorr · 2021-07-29T17:26:29Z

README.md

+Note that paths starting with `/` do not actually refer to the root of any file system, but instead are treated as 
+relative to the include directories.


The prefix / here is basically a sigil modifying the include.

There are two slightly distinct types of inclusion:

- unsigiled includes
  - include the includee relative to 1) the includer, 2) "main", or 3) any -I path
  - these serve to break a model into more manageable pieces
- sigiled includes
  - include the includee relative to any -I path
  - these feel more like referencing an external model

I see from the previous discussions on this issue that a formal idea of an import has been deferred, but I wonder whether we should at least consider a sigil that is not as overloaded as /.

abhikuhikar

A few nits, typos, and comments. One major question/concern: how are we dealing with duplicate domain names across multiple includee files?

README.md

pig/src/org/partiql/pig/cmdline/TargetLanguage.kt

abhikuhikar · 2021-07-29T20:23:27Z

README.md

+
+- The directory containing the includer (if the path to the includee does not start with `/`)
+- The directory containing the "main" type universe that was passed to `pig` on the command-line.
+- Any directories specified with the `--include` or `-I` arguments, in the order they were specified.


You may also want to add a point here explicitly specifying the behavior of /, i.e. that all other directories are searched except the parent directory.

I thought this was covered on line 428:

> ... (if the path to the includee does not start with /)

What can I do to make this clearer?
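The search order in the README excerpt above can be sketched as a pure function that lists candidate paths in priority order. This is an illustration, not the PR's `IncludeResolver`: the function name and parameters are hypothetical, and dropping duplicate directories (for example, when the includer is the main file) is a choice made only for this sketch.

```kotlin
import java.nio.file.Path

/**
 * Candidate locations for an includee, in the README's search order:
 *  1. the directory containing the includer (skipped when the include path starts with `/`),
 *  2. the directory containing the "main" type universe passed to `pig`,
 *  3. each `--include`/`-I` directory, in command-line order.
 */
fun candidatePaths(
    includePath: String,
    includerDirectory: Path,
    mainDirectory: Path,
    includeDirectories: List<Path>
): List<Path> {
    val rootRelative = includePath.startsWith("/")
    val relative = includePath.removePrefix("/")
    val searchDirectories = mutableListOf<Path>()
    if (!rootRelative) searchDirectories.add(includerDirectory)
    searchDirectories.add(mainDirectory)
    searchDirectories.addAll(includeDirectories)
    // distinct() avoids considering the same directory twice.
    return searchDirectories.distinct().map { it.resolve(relative).normalize() }
}
```

A resolver built on this would open the first candidate that exists and could report the whole list when none does, which appears to be the kind of information the `consideredFilePaths` assertion in the tests checks.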

pig/src/org/partiql/pig/cmdline/CommandLineParser.kt

pig/src/org/partiql/pig/domain/parser/SourceLocation.kt

pig/test-domains/main.ion

pig/test/org/partiql/pig/cmdline/CommandLineParserTests.kt

pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt

alancai98

Bringing back a comment from Abhilash's review and an offline team discussion. The rest of the changes since the last revision LGTM.

alancai98 · 2021-08-04T00:31:59Z

pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt

+    @Test
+    fun `include happy case`() {
+        val universe = parseWithTestRoots("test-domains/main.ion")
+        val allDomains = universe.statements.filterIsInstance<TypeDomain>()


This was brought up offline in a team discussion and in Abhilash's review. What happens in the case where an included domain has the same name as a previously defined domain?

For example, what if universe_b.ion had defined domain_a again instead of domain_b?

// in test-domains/root_b/dir_z/universe_b.ion
// defining `domain_a` (which has already been defined in `universe_a.ion`) rather than `domain_b`
(define domain_a (domain (product foo)))

Would this be an error, or would it be overridden? If it's an error, would it be thrown by the IncludeResolver? I think a test and/or a comment should be added detailing this behavior.

Currently when I do the change (i.e. make domain_b -> domain_a in universe_b.ion), no error is given for a duplicated domain name.

dlurton

> Currently when I do the change (i.e. make domain_b -> domain_a in universe_b.ion), no error is given for a duplicated domain name.

FYI, that's because the component that detects that error (TypeDomainSemanticChecker) isn't exercised by the tests that load those domains.

The component is tested under that scenario here. Since TypeDomainSemanticChecker does not see any include_file statements, and it appears to TypeDomainSemanticChecker as if all of the domains were part of the same file, there was no need to modify it as part of this feature.

However, I do agree there's a small possibility that future refactoring might break that ability, so I'll make sure to add a test to ensure this semantic check works across files.
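Since include_file statements are expanded before semantic checking, detecting a duplicate domain name across files reduces to a check over one flattened list of definitions. The sketch below assumes a hypothetical `Definition` type that records each domain's name and source file; it is not `TypeDomainSemanticChecker`'s actual API.

```kotlin
// Hypothetical flattened definition; the source file is kept only for error messages.
data class Definition(val name: String, val sourceFile: String)

// Maps each duplicated domain name to the files that define it, in definition order.
fun findDuplicateNames(definitions: List<Definition>): Map<String, List<String>> =
    definitions
        .groupBy({ it.name }, { it.sourceFile })
        .filterValues { it.size > 1 }
```

Because the check never sees include_file statements, it behaves the same whether the duplicate definitions came from one file or several.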

pig/src/org/partiql/pig/domain/parser/SourceLocation.kt

pig/test/org/partiql/pig/cmdline/CommandLineParserTests.kt

pig/test/org/partiql/pig/domain/TypeDomainParserErrorsTest.kt

pig/test/org/partiql/pig/domain/TypeDomainParserIncludeFileTests.kt

dlurton · 2021-07-30T23:30:16Z

pig/test/org/partiql/pig/include/IncludeResolverTests.kt

Yes, I missed that. Only one test was needed.

dlurton added 3 commits May 14, 2021 01:02

Add import statement (#57)

79ef1e9

Remove stale TODO

214aa8a

Fix a few build errors

e3f773e

dlurton requested review from alancai98, almann and abhikuhikar May 21, 2021 20:47

alancai98 requested changes May 24, 2021

almann requested changes May 24, 2021

abhikuhikar suggested changes May 24, 2021

dlurton added 4 commits June 1, 2021 15:37

import -> include_file

2a15e48

apply feedback wip

dd483e4

Apply feedback from Almann, Abhilash and Alan

497328e

More changes I should have included earlier.

650dedf

dlurton requested review from alancai98, abhikuhikar and almann June 14, 2021 16:51

alancai98 requested changes Jun 14, 2021

dlurton changed the title ~~Add import statement~~ Add import_file statement Jun 15, 2021

dlurton changed the title ~~Add import_file statement~~ Add include_file statement Jun 15, 2021

Apply Alan's feedback

f9d1e6e

dlurton mentioned this pull request Jun 15, 2021

Add include_file #57

Open

dlurton requested a review from alancai98 June 15, 2021 23:54

alancai98 approved these changes Jun 15, 2021

almann reviewed Jun 17, 2021

dlurton added 3 commits July 22, 2021 20:02

Almost completely rewrite this thing... separate concerns, block .., …

53af53c

…test better, etc.

Revise README.md

ed06f64

Revise README.md

8ce7cf6

Update README.md

da4fbee

dlurton commented Jul 27, 2021

pig/src/org/partiql/pig/domain/include/IncludeResolver.kt (outdated, resolved)

alancai98 reviewed Jul 27, 2021

jpschorr reviewed Jul 29, 2021

abhikuhikar reviewed Jul 29, 2021

dlurton and others added 7 commits July 30, 2021 16:08

Apply suggestions from code review

fe63a24

Co-authored-by: Alan Cai <[email protected]>

Apply suggestions from code review

5bd1855

Co-authored-by: Alan Cai <[email protected]> Co-authored-by: Abhilash Kuhikar <[email protected]>

Update pig/src/org/partiql/pig/domain/include/IncludeResolver.kt

393213a

Co-authored-by: Alan Cai <[email protected]>

Merge branch 'imports-phase-1' of github.com:partiql/partiql-ir-gener…

70631c3

…ator into imports-phase-1

rephrase kdoc

39471ee

Apply suggestions from code review

b9d959e

Co-authored-by: Alan Cai <[email protected]>

Apply CR feedback

4735259

dlurton requested review from almann, alancai98 and abhikuhikar July 30, 2021 23:34

alancai98 reviewed Aug 4, 2021

dlurton commented Aug 4, 2021

dlurton added 4 commits August 4, 2021 16:11

Add test for duplicate domains across multiple files

e306632

add test for when files of the same exist in multiple -I search roots.

94cdb51

Remove unused property

2bef7be

Disallow include paths starting with '/'

6b460e9

dlurton merged commit e9f6dbd into imports Aug 13, 2021

dlurton deleted the imports-phase-1 branch August 31, 2021 23:16

rchowell mentioned this pull request Feb 26, 2024

Define extended PIG IDL #153

Open
