From 41be40b80a4bf94452a765e72751a5f2081f6a1f Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 14 Nov 2024 17:54:54 +0100
Subject: [PATCH 01/20] chore: draft of mql-query spec

---
 .../src/docs/md/mql-query/mql-query.md | 122 ++++++++++++++++++
 1 file changed, 122 insertions(+)
 create mode 100644 packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md

diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
new file mode 100644
index 00000000..be5161a4
--- /dev/null
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -0,0 +1,122 @@
+# MQL Query
+-----------
+
+## Abstract
+
+This specification documents the structure of a MongoDB Query from a mixed perspective of both
+the original source code and the target server that might run the query. It is primarily aimed
+to provide developers of dialects and linters a common and flexible structure for code processing.
+
+## META
+
+The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY"
+and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
+
+## Specification
+
+A MongoDB Query (**query** from now on), is a single execution unit, written in any of the supported dialects,
+that MAY be consumed by a valid MongoDB Cluster. A query SHOULD contain all the semantics specific to the source dialect
+so it can be tailored back to the original source code. A query MAY be unsupported by a specific target MongoDB Cluster.
+
+### Query validation and support
+
+A query MAY be valid for a target MongoDB Cluster if the MongoDB Cluster can consume the query once it
+is translated to a consumable dialect by the target Cluster. However, a query MAY be unsupported by the
+cluster if it doesn't have the capabilities to fulfill the query request.
+
+For example, let's consider the following query, in pseudocode:
+
+```java
+collection.aggregate(AtlasSearch(text().eq("baby one more time")))
+```
+
+| Cluster | Is Valid | Is Supported |
+|--------------------------------|------------|----------------|
+| MongoDB Community 7.0 | ✅ | 🔴 |
+| MongoDB Enterprise 8.0 | ✅ | 🔴 |
+| MongoDB Atlas 8.0 w/o Search | ✅ | 🔴 |
+| MongoDB Atlas 8.0 with Search | ✅ | ✅ |
+
+For the purpose of this specification and project, we will only allow the `mongosh` dialect as a
+consumable dialect for a MongoDB Cluster.
+
+### Query equivalence
+
+We will consider two queries equivalent, independently of the query structure, if the following conditions
+apply:
+
+* They MUST be **valid** by the same set of target clusters.
+* They MUST be **supported** by the same set of target clusters.
+* They MUST return the same subset of results for the same input data set.
+* They MAY be sourced from the same dialect.
+* They MAY lead to equivalent **execution plans** for the same target cluster.
+
+We will consider two execution plans equivalent if the cluster query planner leads to the same list
+of operations.
+
+Let's consider a different use case. For the following two queries in the `Java Driver` dialect:
+
+```java
+collection.find(eq("bookName", myBookName))
+collection.aggregate(matches(eq("bookName", myBookName)))
+```
+
+We will test two different target clusters:
+
+* MongoDB Community 8.0, from a development environment, does not have an index on bookName for the target collection.
+* MongoDB Atlas 8.0, production environment, does have an index on bookName for the target collection.
+
+In the development environment, we copy the data from the production environment once every week. For this example,
+we will consider that the data sets are exactly the same on both clusters.
+
+| Cluster Environment | Is Valid | Is Supported | Same Results | Same Dialect | Same Execution Plan |
+|---------------------------------|-------------|----------------|----------------|----------------|-----------------------|
+| MongoDB Development | ✅ | ✅ | ✅ | ✅ | 🔴 |
+| MongoDB Production | ✅ | ✅ | ✅ | ✅ | 🔴 |
+
+**✅ They are equivalent.**
+
+And now, finally, let's assume the same environment, but with the query written in two dialects,
+`mongosh` and `Java Driver`:
+
+```java
+collection.find(eq("bookName", myBookName))
+```
+```js
+collection.find({"bookName": myBookName})
+```
+
+| Cluster Environment | Is Valid | Is Supported | Same Results | Same Dialect | Same Execution Plan |
+|---------------------------------|-------------|----------------|----------------|----------------|-----------------------|
+| MongoDB Development | ✅ | ✅ | ✅ | 🔴 | 🔴 |
+| MongoDB Production | ✅ | ✅ | ✅ | 🔴 | 🔴 |
+
+**✅ They are equivalent.**
+
+
+### Query Nodes
+
+A [MQL Node or a Node for short](/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/Node.kt)
+represents a set of semantic properties of a MongoDB query or a subset of it. Nodes MUST NOT be specific to
+a single source dialect, but MAY contain semantics that are relevant for processing.
+
+Nodes MUST contain a single reference to the original source code, in any of the valid dialects. Multiple
+nodes MAY contain a reference to the same original source code. That reference is called the **source**
+of the Node. For example, let's consider this query, written in the **Java Driver dialect** and how it is referenced by a Node.
+
+```java
+   collection.find(eq("_id", 123456)).first();
+// ^                                        ^
+// +----------------------------------------+
+//                Node(source)
+```
+
+A Node MAY contain parent nodes and children nodes, through specific **components**. A Node that
+doesn't contain any parent node, but contains children nodes is called the **root** node, and
+represents the whole query.
+
+Nodes MAY have additional components that contain metadata for that node. Components MAY have
+references to other Nodes and other components.
+
+Nodes with components MAY build a tree like structure, resembling an Abstract Syntax Tree. Nodes MUST
+NOT refer to themselves either directly or through one of it's childrens, avoiding circular references.
From 86ead3e9e189b498b907b9defd93c622f9520b75 Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 14 Nov 2024 18:06:21 +0100
Subject: [PATCH 02/20] chore: add serialization

---
 .../src/docs/md/mql-query/mql-query.md | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
index be5161a4..09a816a7 100644
--- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -120,3 +120,26 @@ references to other Nodes and other components.
 
 Nodes with components MAY build a tree like structure, resembling an Abstract Syntax Tree. Nodes MUST
 NOT refer to themselves either directly or through one of it's childrens, avoiding circular references.
+
+### MQL Serialization
+
+A query MUST be serializable to readable text. The serialization format is independent of the
+dialects used for parsing it. A serialized query SHOULD look like this:
+
+```kt
+Node(
+    source=collection.find(eq("_id", 123456)).first(),
+    components=[
+        // list of components
+    ]
+)
+```
+
+The serialization format MAY ignore printing the source of the query, but MUST print all the components
+attached to each of the nodes of the query. In that case, a short form on the syntax MAY be used:
+
+```kt
+Node([
+    // list of components
+])
+```
From 23dc67f581d7e909b353e4f4c16e257755230140 Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 14 Nov 2024 18:26:10 +0100
Subject: [PATCH 03/20] Update packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md

Co-authored-by: Anna Henningsen
---
 packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
index 09a816a7..f352e030 100644
--- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -119,7 +119,7 @@ Nodes MAY have additional components that contain metadata for that node. Compon
 references to other Nodes and other components.
 
 Nodes with components MAY build a tree like structure, resembling an Abstract Syntax Tree. Nodes MUST
-NOT refer to themselves either directly or through one of it's childrens, avoiding circular references.
+NOT refer to themselves either directly or through one of it's children, avoiding circular references.
 
 ### MQL Serialization
 
From 2f535e1460003fa62155d6822823ea04f586f31d Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 14 Nov 2024 18:29:24 +0100
Subject: [PATCH 04/20] chore: specify that components are a sorted list

---
 .../mongodb-mql-model/src/docs/md/mql-query/mql-query.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
index f352e030..3fcfdd70 100644
--- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -115,8 +115,9 @@ A Node MAY contain parent nodes and children nodes, through specific **component
 doesn't contain any parent node, but contains children nodes is called the **root** node, and
 represents the whole query.
 
-Nodes MAY have additional components that contain metadata for that node. Components MAY have
-references to other Nodes and other components.
+All components in a node MUST be stored in a sorted list. The sorting criteria is left to the specific
+node and the combination of components. Nodes MAY have additional components that contain metadata for
+that node. Components MAY have references to other Nodes and other components.
 
 Nodes with components MAY build a tree like structure, resembling an Abstract Syntax Tree. Nodes MUST
 NOT refer to themselves either directly or through one of it's children, avoiding circular references.
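
The Node rules touched by the patch above (components kept in a list, and no Node reachable from itself) can be illustrated with a small sketch. This is a minimal Java sketch under stated assumptions, not the plugin's actual `Node.kt`: the class `SketchNode`, its `with` helper, and `hasCycle` are hypothetical names, and a component that is itself a `SketchNode` stands in for "a component that references a child Node".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; not the plugin's Node.kt API.
final class SketchNode {
    final String source;                               // reference to the original source code
    final List<Object> components = new ArrayList<>(); // a list, so insertion order is preserved

    SketchNode(String source) {
        this.source = source;
    }

    // Appends a component and returns this node, to allow chaining.
    SketchNode with(Object component) {
        components.add(component);
        return this;
    }

    // Returns true if some node can reach itself through child references.
    static boolean hasCycle(SketchNode root) {
        Set<SketchNode> onPath =
            Collections.newSetFromMap(new IdentityHashMap<SketchNode, Boolean>());
        return visit(root, onPath);
    }

    private static boolean visit(SketchNode node, Set<SketchNode> onPath) {
        if (!onPath.add(node)) {
            return true; // node is already on the current path: circular reference
        }
        for (Object component : node.components) {
            if (component instanceof SketchNode && visit((SketchNode) component, onPath)) {
                return true;
            }
        }
        onPath.remove(node); // leaving this branch; a shared child is not a cycle
        return false;
    }
}
```

Tracking only the nodes on the current path, rather than every visited node, means a child referenced from two places is accepted, while a node that appears among its own descendants is rejected.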
From d02dc8a954ced7e069f6ca2ac1c129a2b122767d Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 14 Nov 2024 18:30:13 +0100
Subject: [PATCH 05/20] chore: specify that a component can be more than once in a node

---
 packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
index 3fcfdd70..00951b88 100644
--- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -117,7 +117,8 @@ represents the whole query.
 
 All components in a node MUST be stored in a sorted list. The sorting criteria is left to the specific
 node and the combination of components. Nodes MAY have additional components that contain metadata for
-that node. Components MAY have references to other Nodes and other components.
+that node. Components MAY have references to other Nodes and other components. Components in a node MAY
+not be unique: the same component MAY be found in the same node more than once.
 
 Nodes with components MAY build a tree like structure, resembling an Abstract Syntax Tree. Nodes MUST
 NOT refer to themselves either directly or through one of it's children, avoiding circular references.
From 50f04e7845d907e171fd7ac5cb54b9f0ed2f9342 Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 14 Nov 2024 18:35:18 +0100
Subject: [PATCH 06/20] chore: fix typo

---
 packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
index 00951b88..7876d20b 100644
--- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -121,7 +121,7 @@ that node. Components MAY have references to other Nodes and other components. C
 not be unique: the same component MAY be found in the same node more than once.
 
 Nodes with components MAY build a tree like structure, resembling an Abstract Syntax Tree. Nodes MUST
-NOT refer to themselves either directly or through one of it's children, avoiding circular references.
+NOT refer to themselves either directly or through one of its children, avoiding circular references.
 
 ### MQL Serialization
 
From f6f6fda0ba507e08f5a5e8dc61868a29daf26c47 Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 14 Nov 2024 18:44:07 +0100
Subject: [PATCH 07/20] chore: typo

---
 packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
index 7876d20b..c0ec7b3e 100644
--- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -16,7 +16,7 @@ and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119]
 
 A MongoDB Query (**query** from now on), is a single execution unit, written in any of the supported dialects,
 that MAY be consumed by a valid MongoDB Cluster. A query SHOULD contain all the semantics specific to the source dialect
-so it can be tailored back to the original source code. A query MAY be unsupported by a specific target MongoDB Cluster.
+so it can be tailed back to the original source code. A query MAY be unsupported by a specific target MongoDB Cluster.
 
 ### Query validation and support
 
From 655314f92ab431085b5bc2fdd0f452dcc530ad1f Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 14 Nov 2024 18:58:55 +0100
Subject: [PATCH 08/20] chore: add the query to the table

---
 .../src/docs/md/mql-query/mql-query.md | 29 ++++++++++---------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
index c0ec7b3e..641bd0f4 100644
--- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -69,12 +69,14 @@ We will test two different target clusters:
 In the development environment, we copy the data from the production environment once every week. For this example,
 we will consider that the data sets are exactly the same on both clusters.
 
-| Cluster Environment | Is Valid | Is Supported | Same Results | Same Dialect | Same Execution Plan |
-|---------------------------------|-------------|----------------|----------------|----------------|-----------------------|
-| MongoDB Development | ✅ | ✅ | ✅ | ✅ | 🔴 |
-| MongoDB Production | ✅ | ✅ | ✅ | ✅ | 🔴 |
+| Query | Cluster Environment | Is Valid | Is Supported | Results | Dialect | Execution Plan |
+|:------------------------------------------------------------|---------------------------------|:-----------:|:--------------:|:------------:|----------------------|---------------------------|
+| `collection.find(eq("bookName", myBookName))` | Development | ✅ | ✅ | N | Java Driver | COLLSCAN |
+| `collection.find(eq("bookName", myBookName))` | Production | ✅ | ✅ | N | Java Driver | IXSCAN |
+| `collection.aggregate(matches(eq("bookName", myBookName)))` | Development | ✅ | ✅ | N | Java Driver | COLLSCAN |
+| `collection.aggregate(matches(eq("bookName", myBookName)))` | Production | ✅ | ✅ | N | Java Driver | IXSCAN |
 
-**✅ They are equivalent.**
+**✅ They are equivalent because they are valid, supported and return the same result set in all clusters.**
 
 And now, finally, let's assume the same environment, but with the query written in two dialects,
 `mongosh` and `Java Driver`:
@@ -86,12 +88,14 @@ collection.find(eq("bookName", myBookName))
 collection.find({"bookName": myBookName})
 ```
 
-| Cluster Environment | Is Valid | Is Supported | Same Results | Same Dialect | Same Execution Plan |
-|---------------------------------|-------------|----------------|----------------|----------------|-----------------------|
-| MongoDB Development | ✅ | ✅ | ✅ | 🔴 | 🔴 |
-| MongoDB Production | ✅ | ✅ | ✅ | 🔴 | 🔴 |
+| Query | Cluster Environment | Is Valid | Is Supported | Results | Dialect | Execution Plan |
+|:----------------------------------------------|---------------------------------|:-----------:|:--------------:|:------------:|----------------------|---------------------------|
+| `collection.find(eq("bookName", myBookName))` | Development | ✅ | ✅ | N | Java Driver | COLLSCAN |
+| `collection.find(eq("bookName", myBookName))` | Production | ✅ | ✅ | N | Java Driver | IXSCAN |
+| `collection.find({"bookName": myBookName})` | Development | ✅ | ✅ | N | mongosh | COLLSCAN |
+| `collection.find({"bookName": myBookName})` | Production | ✅ | ✅ | N | mongosh | IXSCAN |
 
-**✅ They are equivalent.**
+**✅ They are equivalent because they are valid, supported and return the same result set in all clusters even if the dialect is different.**
 
 
 ### Query Nodes
@@ -115,9 +119,8 @@ A Node MAY contain parent nodes and children nodes, through specific **component
 doesn't contain any parent node, but contains children nodes is called the **root** node, and
 represents the whole query.
 
-All components in a node MUST be stored in a sorted list. The sorting criteria is left to the specific
-node and the combination of components. Nodes MAY have additional components that contain metadata for
-that node. Components MAY have references to other Nodes and other components. Components in a node MAY
+Components MUST be stored in an ordered list inside a Node. Nodes MAY have additional components that contain metadata for
+that Node. Components MAY have references to other Nodes and other components. Components in a node MAY
 not be unique: the same component MAY be found in the same node more than once.
 
 Nodes with components MAY build a tree like structure, resembling an Abstract Syntax Tree. Nodes MUST
From 7d7f45fda60a64b8308d99e32ba69b5b735cbd0d Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Tue, 19 Nov 2024 18:25:51 +0100
Subject: [PATCH 09/20] chore: initial bson-type spec

---
 .../src/docs/md/bson-type/bson-type.md | 86 +++++++++++++++++++
 1 file changed, 86 insertions(+)
 create mode 100644 packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md

diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
new file mode 100644
index 00000000..6e846fab
--- /dev/null
+++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
@@ -0,0 +1,86 @@
+# MQL BSON Type
+-----------
+
+## Abstract
+
+This specification documents the different kinds of BSON types and how they are related to the
+original source code of a [MQL Query](../mql-query/mql-query.md). This document aims to provide
+information about the behaviour of dialects and linters on the computation of the original
+expression BSON type.
+
+## META
+
+The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY"
+and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).
+
+## Specification
+
+[BSON](https://bsonspec.org/spec.html) is a binary format that is used to communicate between the
+MongoDB Client (through a driver) and a MongoDB Cluster. MQL BSON (from now on we will just say BSON)
+is a superset of the original BSON types.
+
+A BSON Type represents the data type inferred from the original source code or from a MongoDB sample
+of documents. A BSON Type MUST be consumable by a MongoDB Cluster and it's serialization MUST be
+BSON 1.1 compliant.
+
+### Primitive BSON Types
+
+#### [BsonString](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L58)
+
+A BsonString defines a sequence of characters independently of its locale or encoding. A BsonString MUST be
+encodable to a UTF-8 String.
+
+#### [BsonBoolean](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L63)
+
+A BsonBoolean represents a disjoint true or false values. The actual internal encoding is left to the
+original BSON 1.1 specification.
+
+#### [BsonDate](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L68)
+
+A BsonDate represents a date and a time, serializable to a UNIX timestamp. This specific type MAY be
+represented differently in some dialects.
+
+In any Java-based dialects, a BsonDate can be represented as:
+
+* [java.util.Date](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/util/Date.html)
+* [java.time.Instant](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/Instant.html)
+* [java.time.LocalDate](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/LocalDate.html)
+* [java.time.LocalDateTime](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/LocalDateTime.html)
+
+#### [BsonObjectId](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L73)
+
+A BsonObjectId represents a 12 bytes unique identifier for an object.
+
+#### [BsonInt32](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L79)
+
+A signed integer of 32 bits precision. In Java it's mapped to an `int` type.
+
+#### [BsonInt64](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L89)
+
+A signed integer of 64 bits precision. In Java it's mapped to both `long` and `BigInteger`.
+
+#### [BsonDouble](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L94)
+
+A 64bit floating point number. In Java it's mapped to both float and double.
+
+#### [BsonDecimal128](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L104)
+
+A 128bit floating point number. In Java it's mapped to BigDecimal.
+
+#### [BsonNull](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L110)
+
+Represents the absence of a value.
+
+#### [BsonAny](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L123)
+
+Represents any possible type. Essentially, all type is a subtype of BsonAny.
+
+#### [BsonAnyOf](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L132)
+
+Represents an intersection of types. For example, BsonAnyOf([BsonString, BsonInt32]).
+
+#### [BsonObject](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L149)
+
+Represents the shape of a BSON document.
+
+#### [BsonArray](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L171)
From 04c22dd576f9ca45991e7e72e34d7e96c04d086d Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 21 Nov 2024 13:41:25 +0100
Subject: [PATCH 10/20] chore: type assignability table

---
 .../src/docs/md/bson-type/bson-type.md | 35 +++++++++++++++++++
 .../src/docs/md/mql-query/mql-query.md | 32 ++++++++---------
 2 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
index 6e846fab..40b50118 100644
--- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
+++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
@@ -84,3 +84,38 @@ Represents an intersection of types. For example, BsonAnyOf([BsonString, BsonInt
 Represents the shape of a BSON document.
 
 #### [BsonArray](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L171)
+
+### Type Assignability
+
+Assignable types MUST not change the semantics of a query when they are swapped. Let's say that
+we have a query $Q$, and two variants, $Q_A$ and $Q_B$, where $Q_A$ and $Q_B$ differ on the specified type
+in either a field or a value reference.
+
+We will say that $A$ is assignable to $B$ if $Q_A$ and $Q_B$ are
+[equivalent queries](/main/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md#query-equivalence).
+
+Type assignability **MAY not be commutative**.
+
+#### Assignability table
+
+| ⬇️ can be assigned to ➡️ | BsonString | BsonBoolean | BsonDate | BsonObjectId | BsonInt32 | BsonInt64 | BsonDouble | BsonDecimal128 | BsonNull | BsonAny | BsonAnyOf | BsonObject | BsonArray |
+|--------------------------|:----------:|:-----------:|:--------:|:------------:|:---------:|:---------:|:----------:|:--------------:|:--------:|:-------:|:---------:|:----------:|:---------:|
+| BsonString | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonBoolean | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonDate | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonObjectId | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonInt32 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonInt64 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonDouble | 🔴 | 🔴 | 🔴 | 🔴 | 🟠$^2$ | 🟠$^2$ | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonDecimal128 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonNull | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonAny | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ |
+| BsonAnyOf | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟢 | 🟠$^1$ | 🟠$^1$ | 🟠$^4$ |
+| BsonObject | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🟠$^3$ | 🟠$^4$ |
+| BsonArray | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^5$ |
+
+* 🟠$^1$: $A$ is assignable to $BsonAnyOf(B)$ only if $A$ is assignable to $B$.
+* 🟠$^2$: It's assignable but there might be a significant loss of precision.
+* 🟠$^3$: $BsonObject A$ is assignable to $B$ if $A$ is a subset of $B$.
+* 🟠$^4$: $A$ is assignable to $BsonArray(B)$ only if $A$ is assignable to $B$.
+* 🟠$^5$: $BsonArray(A)$ is assignable to $BsonArray(B)$ only if $A$ is assignable to $B$.
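
Part of the assignability table above can be expressed as a recursive check. The following is a minimal Java sketch under stated assumptions, not the plugin's `BsonType.kt`: `SketchType` and `isAssignable` are hypothetical names, a `BsonAnyOf` target is treated as satisfied when any member accepts the source type (one reading of note 1), and `BsonObject` as well as `BsonAnyOf` used as the source type are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical type model for illustration only; not the plugin's BsonType.kt.
final class SketchType {
    final String kind;            // "String", "Int32", "Any", "AnyOf", "Array", ...
    final List<SketchType> args;  // members of AnyOf, or the single element type of Array

    private SketchType(String kind, SketchType... args) {
        this.kind = kind;
        this.args = Arrays.asList(args);
    }

    static SketchType of(String primitive) { return new SketchType(primitive); }
    static SketchType anyOf(SketchType... members) { return new SketchType("AnyOf", members); }
    static SketchType array(SketchType element) { return new SketchType("Array", element); }

    // Can a value of type `from` be used where `to` is expected?
    static boolean isAssignable(SketchType from, SketchType to) {
        switch (to.kind) {
            case "Any":   // every type is assignable to BsonAny
                return true;
            case "AnyOf": // note 1, read here as "assignable to at least one member"
                for (SketchType member : to.args) {
                    if (isAssignable(from, member)) return true;
                }
                return false;
            case "Array": { // notes 4 and 5: compare against the element type
                SketchType source = from.kind.equals("Array") ? from.args.get(0) : from;
                return isAssignable(source, to.args.get(0));
            }
            default:
                break;
        }
        if (from.kind.equals(to.kind)) return true; // the diagonal of the table
        switch (from.kind) {
            case "Int32":
                return to.kind.equals("Int64") || to.kind.equals("Double")
                    || to.kind.equals("Decimal128");
            case "Int64":
                return to.kind.equals("Decimal128");
            case "Double": // note 2: Int32 and Int64 are allowed but may lose precision
                return to.kind.equals("Int32") || to.kind.equals("Int64")
                    || to.kind.equals("Decimal128");
            default:
                return false;
        }
    }
}
```

For instance, `isAssignable(of("Int32"), of("Int64"))` holds while the reverse does not, which is the asymmetry the spec refers to when it says assignability may not be commutative.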
diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
index 641bd0f4..0ecdd4c1 100644
--- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
+++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md
@@ -32,10 +32,10 @@ collection.aggregate(AtlasSearch(text().eq("baby one more time")))
 
 | Cluster | Is Valid | Is Supported |
 |--------------------------------|------------|----------------|
-| MongoDB Community 7.0 | ✅ | 🔴 |
-| MongoDB Enterprise 8.0 | ✅ | 🔴 |
-| MongoDB Atlas 8.0 w/o Search | ✅ | 🔴 |
-| MongoDB Atlas 8.0 with Search | ✅ | ✅ |
+| MongoDB Community 7.0 | 🟢 | 🔴 |
+| MongoDB Enterprise 8.0 | 🟢 | 🔴 |
+| MongoDB Atlas 8.0 w/o Search | 🟢 | 🔴 |
+| MongoDB Atlas 8.0 with Search | 🟢 | 🟢 |
 
 For the purpose of this specification and project, we will only allow the `mongosh` dialect as a
 consumable dialect for a MongoDB Cluster.
@@ -71,12 +71,12 @@ we will consider that the data sets are exactly the same on both clusters.
 
 | Query | Cluster Environment | Is Valid | Is Supported | Results | Dialect | Execution Plan |
 |:------------------------------------------------------------|---------------------------------|:-----------:|:--------------:|:------------:|----------------------|---------------------------|
-| `collection.find(eq("bookName", myBookName))` | Development | ✅ | ✅ | N | Java Driver | COLLSCAN |
-| `collection.find(eq("bookName", myBookName))` | Production | ✅ | ✅ | N | Java Driver | IXSCAN |
-| `collection.aggregate(matches(eq("bookName", myBookName)))` | Development | ✅ | ✅ | N | Java Driver | COLLSCAN |
-| `collection.aggregate(matches(eq("bookName", myBookName)))` | Production | ✅ | ✅ | N | Java Driver | IXSCAN |
+| `collection.find(eq("bookName", myBookName))` | Development | 🟢 | 🟢 | N | Java Driver | COLLSCAN |
+| `collection.find(eq("bookName", myBookName))` | Production | 🟢 | 🟢 | N | Java Driver | IXSCAN |
+| `collection.aggregate(matches(eq("bookName", myBookName)))` | Development | 🟢 | 🟢 | N | Java Driver | COLLSCAN |
+| `collection.aggregate(matches(eq("bookName", myBookName)))` | Production | 🟢 | 🟢 | N | Java Driver | IXSCAN |
 
-**✅ They are equivalent because they are valid, supported and return the same result set in all clusters.**
+**🟢 They are equivalent because they are valid, supported and return the same result set in all clusters.**
 
 And now, finally, let's assume the same environment, but with the query written in two dialects,
 `mongosh` and `Java Driver`:
@@ -88,14 +88,14 @@ collection.find(eq("bookName", myBookName))
 collection.find({"bookName": myBookName})
 ```
 
-| Query | Cluster Environment | Is Valid | Is Supported | Results | Dialect | Execution Plan |
-|:----------------------------------------------|---------------------------------|:-----------:|:--------------:|:------------:|----------------------|---------------------------|
-| `collection.find(eq("bookName", myBookName))` | Development | ✅ | ✅ | N | Java Driver | COLLSCAN |
-| `collection.find(eq("bookName", myBookName))` | Production | ✅ | ✅ | N | Java Driver | IXSCAN |
-| `collection.find({"bookName": myBookName})` | Development | ✅ | ✅ | N | mongosh | COLLSCAN |
-| `collection.find({"bookName": myBookName})` | Production | ✅ | ✅ | N | mongosh | IXSCAN |
+| Query | Cluster Environment | Is Valid | Is Supported | Results | Dialect | Execution Plan |
+|:----------------------------------------------|---------------------------------|:------------:|:--------------:|:------------:|----------------------|---------------------------|
+| `collection.find(eq("bookName", myBookName))` | Development | 🟢 | 🟢 | N | Java Driver | COLLSCAN |
+| `collection.find(eq("bookName", myBookName))` | Production | 🟢 | 🟢 | N | Java Driver | IXSCAN |
+| `collection.find({"bookName": myBookName})` | Development | 🟢 | 🟢 | N | mongosh | COLLSCAN |
+| `collection.find({"bookName": myBookName})` | Production | 🟢 | 🟢 | N | mongosh | IXSCAN |
 
-**✅ They are equivalent because they are valid, supported and return the same result set in all clusters even if the dialect is different.**
+**🟢 They are equivalent because they are valid, supported and return the same result set in all clusters even if the dialect is different.**
 
 
 ### Query Nodes
From b9c4b7fdad065cd9a3c44591622ed92d2fa05760 Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 21 Nov 2024 13:44:52 +0100
Subject: [PATCH 11/20] chore: small typos

---
 .../mongodb-mql-model/src/docs/md/bson-type/bson-type.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
index 40b50118..f34a3d50 100644
--- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
+++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
@@ -4,7 +4,7 @@
 ## Abstract
 
 This specification documents the different kinds of BSON types and how they are related to the
-original source code of a [MQL Query](../mql-query/mql-query.md). This document aims to provide
+original source code of an [MQL Query](../mql-query/mql-query.md). This document aims to provide
 information about the behaviour of dialects and linters on the computation of the original
 expression BSON type.
 
@@ -91,10 +91,10 @@ Assignable types MUST not change the semantics of a query when they are swapped.
 we have a query $Q$, and two variants, $Q_A$ and $Q_B$, where $Q_A$ and $Q_B$ differ on the specified type
 in either a field or a value reference.
 
-We will say that $A$ is assignable to $B$ if $Q_A$ and $Q_B$ are
+We will say that type $A$ is assignable to type $B$ if $Q_A$ and $Q_B$ are
 [equivalent queries](/main/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md#query-equivalence).
 
-Type assignability **MAY not be commutative**.
+Type assignability MAY NOT be commutative.
 
 #### Assignability table
 
From 922d0ebf766587956893046c03fbf5bbe039fd5b Mon Sep 17 00:00:00 2001
From: Kevin Mas Ruiz
Date: Thu, 21 Nov 2024 14:00:27 +0100
Subject: [PATCH 12/20] chore: java type mapping table

---
 .../src/docs/md/bson-type/bson-type.md | 30 +++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
index f34a3d50..2a368ab5 100644
--- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
+++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
@@ -119,3 +119,33 @@ Type assignability MAY NOT be commutative.
 * 🟠$^3$: $BsonObject A$ is assignable to $B$ if $A$ is a subset of $B$.
 * 🟠$^4$: $A$ is assignable to $BsonArray(B)$ only if $A$ is assignable to $B$.
 * 🟠$^5$: $BsonArray(A)$ is assignable to $BsonArray(B)$ only if $A$ is assignable to $B$.
+ +### Type mapping + +#### Java + +| Java Type | Bson Type | +|:--------------|:------------------------------------| +| null | BsonNull | +| float | BsonDouble | +| Float | BsonAnyOf(BsonNull, BsonDouble) | +| double | BsonDouble | +| Double | BsonAnyOf(BsonNull, BsonDouble) | +| BigDecimal | BsonAnyOf(BsonNull, BsonDecimal128) | +| boolean | BsonBoolean | +| short | BsonInt32 | +| Short | BsonAnyOf(BsonNull, BsonInt32) | +| int | BsonInt32 | +| Integer | BsonAnyOf(BsonNull, BsonInt32) | +| BigInteger | BsonAnyOf(BsonNull, BsonInt64) | +| long | BsonInt64 | +| Long | BsonAnyOf(BsonNull, BsonInt64) | +| CharSequence | BsonAnyOf(BsonNull, BsonString) | +| String | BsonAnyOf(BsonNull, BsonString) | +| Date | BsonAnyOf(BsonNull, BsonDate) | +| Instant | BsonAnyOf(BsonNull, BsonDate) | +| LocalDate | BsonAnyOf(BsonNull, BsonDate) | +| LocalDateTime | BsonAnyOf(BsonNull, BsonDate) | +| Collection | BsonAnyOf(BsonNull, BsonArray(T)) | +| Map | BsonAnyOf(BsonNull, BsonObject) | +| Object | BsonAnyOf(BsonNull, BsonObject) | From 4beefa3c3947b6966c6b837baa555d865ef4cb77 Mon Sep 17 00:00:00 2001 From: Kevin Mas Ruiz Date: Thu, 21 Nov 2024 14:17:47 +0100 Subject: [PATCH 13/20] chore: fixing typo Co-authored-by: Anna Henningsen --- packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md index 2a368ab5..9cb75e55 100644 --- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md +++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md @@ -20,7 +20,7 @@ MongoDB Client (through a driver) and a MongoDB Cluster. MQL BSON (from now on w is a superset of the original BSON types. A BSON Type represents the data type inferred from the original source code or from a MongoDB sample -of documents. 
A BSON Type MUST be consumable by a MongoDB Cluster and it's serialization MUST be +of documents. A BSON Type MUST be consumable by a MongoDB Cluster and its serialization MUST be BSON 1.1 compliant. ### Primitive BSON Types From aa6c360584b2769a263e94e4b3366a98846e571d Mon Sep 17 00:00:00 2001 From: Kevin Mas Ruiz Date: Thu, 21 Nov 2024 14:21:38 +0100 Subject: [PATCH 14/20] Update packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md Co-authored-by: Anna Henningsen --- packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md index 9cb75e55..7551c39b 100644 --- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md +++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md @@ -77,7 +77,7 @@ Represents any possible type. Essentially, all type is a subtype of BsonAny. #### [BsonAnyOf](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L132) -Represents an intersection of types. For example, BsonAnyOf([BsonString, BsonInt32]). +Represents an union of types. For example, BsonAnyOf([BsonString, BsonInt32]). 
#### [BsonObject](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L149) From b17b90c28b1028fb635c6c469f5ee9f369caefbf Mon Sep 17 00:00:00 2001 From: Kevin Mas Ruiz Date: Thu, 21 Nov 2024 14:37:50 +0100 Subject: [PATCH 15/20] chore: simplify BsonString explanation and drop links to code --- .../src/docs/md/bson-type/bson-type.md | 29 +++++++++---------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md index 7551c39b..7d48b950 100644 --- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md +++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md @@ -25,17 +25,16 @@ BSON 1.1 compliant. ### Primitive BSON Types -#### [BsonString](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L58) +#### BsonString -A BsonString defines a sequence of characters independently of its locale or encoding. A BsonString MUST be -encodable to a UTF-8 String. +A BsonString is a sequence of Unicode characters. -#### [BsonBoolean](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L63) +#### BsonBoolean A BsonBoolean represents a disjoint true or false values. The actual internal encoding is left to the original BSON 1.1 specification. -#### [BsonDate](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L68) +#### BsonDate A BsonDate represents a date and a time, serializable to a UNIX timestamp. This specific type MAY be represented differently in some dialects. 
@@ -47,43 +46,43 @@ In any Java-based dialects, a BsonDate can be represented as: * [java.time.LocalDate](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/LocalDate.html) * [java.time.LocalDateTime](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/LocalDateTime.html) -#### [BsonObjectId](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L73) +#### BsonObjectId A BsonObjectId represents a 12 bytes unique identifier for an object. -#### [BsonInt32](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L79) +#### BsonInt32 A signed integer of 32 bits precision. In Java it's mapped to an `int` type. -#### [BsonInt64](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L89) +#### BsonInt64 A signed integer of 64 bits precision. In Java it's mapped to both `long` and `BigInteger`. -#### [BsonDouble](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L94) +#### BsonDouble A 64bit floating point number. In Java it's mapped to both float and double. -#### [BsonDecimal128](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#104) +#### BsonDecimal128 A 128bit floating point number. In Java it's mapped to BigDecimal. -#### [BsonNull](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L110) +#### BsonNull Represents the absence of a value. -#### [BsonAny](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L123) +#### BsonAny Represents any possible type. Essentially, all type is a subtype of BsonAny. -#### [BsonAnyOf](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L132) +#### BsonAnyOf Represents an union of types. For example, BsonAnyOf([BsonString, BsonInt32]). 
-#### [BsonObject](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L149) +#### BsonObject Represents the shape of a BSON document. -#### [BsonArray](/main/packages/mongodb-mql-model/src/main/kotlin/com/mongodb/jbplugin/mql/BsonType.kt#L171) +#### BsonArray ### Type Assignability From 42f7a90a31a6c0ae6a052b3f1b550f6977f5b54b Mon Sep 17 00:00:00 2001 From: Kevin Mas Ruiz Date: Thu, 21 Nov 2024 15:15:07 +0100 Subject: [PATCH 16/20] chore: add an example for the superset --- packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md index 7d48b950..54ebde75 100644 --- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md +++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md @@ -17,7 +17,8 @@ and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119] [BSON](https://bsonspec.org/spec.html) is a binary format that is used to communicate between the MongoDB Client (through a driver) and a MongoDB Cluster. MQL BSON (from now on we will just say BSON) -is a superset of the original BSON types. +is a superset of the original BSON types. For example some semantics, like BsonAnyOf, are not part +of the original BSON. A BSON Type represents the data type inferred from the original source code or from a MongoDB sample of documents. 
A BSON Type MUST be consumable by a MongoDB Cluster and its serialization MUST be From af3c6b1443bd7c3a23ee2b76eef2711ac2ff53ab Mon Sep 17 00:00:00 2001 From: Kevin Mas Ruiz Date: Fri, 22 Nov 2024 17:46:02 +0100 Subject: [PATCH 17/20] chore: fix typos --- .../mongodb-mql-model/src/docs/md/bson-type/bson-type.md | 4 +++- .../mongodb-mql-model/src/docs/md/mql-query/mql-query.md | 6 +++--- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md index 54ebde75..b57f7c02 100644 --- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md +++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md @@ -73,7 +73,7 @@ Represents the absence of a value. #### BsonAny -Represents any possible type. Essentially, all type is a subtype of BsonAny. +Represents any possible type. Essentially, every type is a subtype of BsonAny. #### BsonAnyOf @@ -85,6 +85,8 @@ Represents the shape of a BSON document. #### BsonArray +Represents a list of elements of a single type. For example: [ 1, 2, 3 ] is a BsonArray. + ### Type Assignability Assignable types MUST not change the semantics of a query when they are swapped. Let's say that diff --git a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md index 0ecdd4c1..f0bef966 100644 --- a/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md +++ b/packages/mongodb-mql-model/src/docs/md/mql-query/mql-query.md @@ -51,7 +51,7 @@ apply: * They MAY be sourced from the same dialect. * They MAY lead to equivalent **execution plans** for the same target cluster. -We will consider two execution plans equivalent if the cluster query planner lead to the same list +We will consider two execution plans equivalent if the cluster query planner leads to the same list of operations. Let's consider a different use case. 
For the following two queries in the `Java Driver` dialect: @@ -115,8 +115,8 @@ of the Node. For example, let's consider this query, written in the **Java Drive // Node(source) ``` -A Node MAY contain parent nodes and children nodes, through specific **components**. A Node that -doesn't contain any parent node, but contains children nodes is called the **root** node, and +A Node MAY contain parent nodes and child nodes, through specific **components**. A Node that +doesn't contain any parent node, but contains child nodes is called the **root** node, and represents the whole query. Components MUST be stored in an ordered list inside a Node. Nodes MAY have additional components that contain metadata for From 3f0ac521ced2ca2da5bf3968f1ec5f1eb6a59657 Mon Sep 17 00:00:00 2001 From: Kevin Mas Ruiz Date: Thu, 28 Nov 2024 13:29:14 +0100 Subject: [PATCH 18/20] chore: definition of the computed types --- .../src/docs/md/bson-type/bson-type.md | 41 ++++++++++++------- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md index b57f7c02..7faa57b2 100644 --- a/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md +++ b/packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md @@ -87,6 +87,15 @@ Represents the shape of a BSON document. Represents a list of elements of a single type. For example: [ 1, 2, 3 ] is a BsonArray. +#### ComputedBsonType + +A ComputedBsonType is a type that represents an expression that happens outside the boundaries +of the user. The typical use case is for expressions defined as MQL expressions (like $expr) that +will run on a valid MongoDB Cluster. + +They contain a `baseType` that is the inferred type of the result of computing the expression. In +case the `baseType` can not be inferred, it MUST be BsonAny. 
+ ### Type Assignability Assignable types MUST not change the semantics of a query when they are swapped. Let's say that @@ -100,27 +109,29 @@ Type assignability MAY NOT be commutative. #### Assignability table -| ⬇️ can be assigned to ➡️ | BsonString | BsonBoolean | BsonDate | BsonObjectId | BsonInt32 | BsonInt64 | BsonDouble | BsonDecimal128 | BsonNull | BsonAny | BsonAnyOf | BsonObject | BsonArray | -|--------------------------|:----------:|:-----------:|:--------:|:------------:|:---------:|:---------:|:----------:|:--------------:|:--------:|:-------:|:---------:|:----------:|:---------:| -| BsonString | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonBoolean | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonDate | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonObjectId | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonInt32 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonInt64 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonDouble | 🔴 | 🔴 | 🔴 | 🔴 | 🟠$^2$ | 🟠$^2$ | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonDecimal128 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonNull | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonAny | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | -| BsonAnyOf | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟢 | 🟠$^1$ | 🟠$^1$ | 🟠$^4$ | -| BsonObject | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🟠$^3$ | 🟠$^4$ | -| BsonArray | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^5$ | +| ⬇️ can be assigned to ➡️ | BsonString | BsonBoolean | BsonDate | BsonObjectId | BsonInt32 | BsonInt64 | BsonDouble | BsonDecimal128 | BsonNull | BsonAny | BsonAnyOf | BsonObject | BsonArray | ComputedBsonType | 
+|--------------------------|:----------:|:-----------:|:--------:|:------------:|:---------:|:---------:|:----------:|:--------------:|:--------:|:-------:|:---------:|:----------:|:---------:|:-----------------| +| BsonString | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonBoolean | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonDate | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonObjectId | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonInt32 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonInt64 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonDouble | 🔴 | 🔴 | 🔴 | 🔴 | 🟠$^2$ | 🟠$^2$ | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonDecimal128 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonNull | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonAny | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ | +| BsonAnyOf | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟢 | 🟠$^1$ | 🟠$^1$ | 🟠$^4$ | 🟠$^6$ | +| BsonObject | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🟠$^3$ | 🟠$^4$ | 🟠$^6$ | +| BsonArray | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^5$ | 🟠$^6$ | +| ComputedBsonType | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | * 🟠$^1$: $A$ is assignable to $BsonAnyOf(B)$ only if $A$ is assignable to $B$. * 🟠$^2$: It's assignable but there might be a significant loss of precision. * 🟠$^3$: $BsonObject A$ is assignable to $B$ if $A$ is a subset of $B$. * 🟠$^4$: $A$ is assignable to $BsonArray(B)$ only if $A$ is assignable to $B$. * 🟠$^5$: $BsonArray(A)$ is assignable to $BsonArray(B)$ only if $A$ is assignable to $B$. +* 🟠$^6$: $A$ is assignable to $ComputedBsonType(BaseType)$ only if $A$ is assignable to $BaseType$. 
### Type mapping From 39cb56947f2c9d3f8af413727f24d755dec62cf3 Mon Sep 17 00:00:00 2001 From: Kevin Mas Ruiz Date: Thu, 28 Nov 2024 15:10:12 +0100 Subject: [PATCH 19/20] chore: components --- .../docs/md/mql-component/mql-component.md | 111 ++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md diff --git a/packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md b/packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md new file mode 100644 index 00000000..f383d07e --- /dev/null +++ b/packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md @@ -0,0 +1,111 @@ +# MQL Component +--------------- + +## Abstract + +This specification documents the structure of an MQL Component from a mixed perspective of both +the original source code and the target server that might run the query. It is primarily aimed +to provide developers of dialects and linters a common and flexible structure for code processing. + +## META + +The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY" +and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt). + +## Specification + +MQL Components (from now on just components) encapsulate units of meaning of an MQL query. Components +MAY be related to how a target MongoDB Cluster can process a query. Components MAY contain other components +or MQL Nodes. + +Components are categorised as: + +* Leaf components: they don't contain other components or nodes. +* Non-leaf components: they contain other components or nodes. + +Components MUST be part of a Node, they are meaningless outside of it. Components MAY be found +more than once in the same node. + +## List of Components + +### HasAccumulatedFields + +Contains a list of Nodes that represent the accumulated fields of a group operation. 
Each +node MUST represent one accumulated field and its accumulator. + +### HasAddedFields + +Contains a list of Nodes that represent fields added to a document. For example, through the +$addFields aggregation stage. Each node MUST represent one added field. + +### HasAggregation + +Contains a list of Nodes, where each node represent one single aggregation stage. + +### HasCollectionReference + +Contains information whether this query or a specific subquery targets a specific collection. There +are three variants: + +* **Unknown**: there is a collection reference, but we don't know on which collection. +* **OnlyCollection**: there is a collection reference, but we only know the collection, not the full namespace. +* **Known**: both the collection and database are known. + +### HasFieldReference + +Contains information of a field. The field can be used for filtering, computing or aggregation. There +are different variants depending on the amount of information we have at the moment of parsing the query. + +* **Unknown**: we couldn't infer any information from the field. +* **FromSchema**: the field MUST be in the schema of the target collection. +* **Inferred**: Refers to a field that is not explicitly specified in the code. For example: +Filters.eq(A) refers to the _id field. +* **Computed**: Refers to a field that is not part of the schema because it's newly computed. + +### HasFilter + +Contains a list of Nodes that represent the filter of a query. + +### HasProjections + +Contains a list of Node that represents the projections of a $project stage. + +### HasSorts + +Contains a list of Node that represent the sorting criteria of a $sort stage. + +### HasSourceDialect + +Identifies the source dialect that parsed this query. + +### HasTargetCluster + +Identifies the version of the cluster that MAY run the query. + +### HasUpdates + +Contains a list of Node representing updates to a document. + +### HasValueReference + +Identifies a value in a query.
Usually a value is the right side of a comparison, +but it can be used in different places, like for computing aggregation expressions. + +There are 5 variants: + +* **Unknown**: We don't have any information of the provided value. +* **Constant**: It's a value that can be resolved without evaluating it. A literal value is a constant. +* **Inferred**: It's a value that could be inferred from other operations. For example, Sort.ascending("field") would have an Inferred(1). +* **Runtime**: It's a value that could not be resolved without evaluating it, but we have enough information +to infer its runtime type. For example, a parameter from a method. +* **Computed**: Refers to a computed expression in the MongoDB Cluster, like a $expr node. + +### IsCommand + +References the command that will be evaluated in the MongoDB cluster. The list of +valid commands can be found in the IsCommand.kt file. + +### Named + +References the name of the operation that is being referenced in the node. The list +of valid names can be found in the Named.kt file. From 52657dcd4af3983b5df7075033d823ab8a4e7234 Mon Sep 17 00:00:00 2001 From: Kevin Mas Ruiz Date: Thu, 28 Nov 2024 15:14:19 +0100 Subject: [PATCH 20/20] chore: iteration 2 --- .../docs/md/mql-component/mql-component.md | 34 ++++++++++++------- 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md b/packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md index f383d07e..42368074 100644 --- a/packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md +++ b/packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md @@ -40,12 +40,12 @@ $addFields aggregation stage. Each node MUST represent one added field. ### HasAggregation -Contains a list of Nodes, where each node represent one single aggregation stage. +Contains a list of Nodes, where each node MUST represent one single aggregation stage. 
### HasCollectionReference -Contains information whether this query or a specific subquery targets a specific collection. There -are three variants: +Contains information about whether this query or a specific subquery targets a specific collection. The +reference MUST be one of the following variants: * **Unknown**: there is a collection reference, but we don't know on which collection. * **OnlyCollection**: there is a collection reference, but we only know the collection, not the full namespace. @@ -53,8 +53,9 @@ are three variants: ### HasFieldReference -Contains information of a field. The field can be used for filtering, computing or aggregation. There -are different variants depending on the amount of information we have at the moment of parsing the query. +Contains information about a field. The field MAY be used for filtering, computing or aggregating data. +There are different variants depending on the amount of information we have at the moment of parsing the query. +The variant MUST be one of the following: * **Unknown**: we couldn't infer any information from the field. * **FromSchema**: the field MUST be in the schema of the target collection. @@ -64,34 +65,43 @@ Filters.eq(A) refers to the _id field. ### HasFilter -Contains a list of Nodes that represent the filter of a query. +Contains a list of Nodes that represent the filter of a query. It MAY be empty +for empty queries. ### HasProjections -Contains a list of Node that represents the projections of a $project stage. +Contains a list of Nodes that represent the projections of a $project stage. It MAY be +empty for empty projections. ### HasSorts -Contains a list of Node that represent the sorting criteria of a $sort stage. +Contains a list of Nodes that represent the sorting criteria of a $sort stage. It MAY be +empty if the sort criteria are not yet defined. ### HasSourceDialect -Identifies the source dialect that parsed this query.
+Identifies the source dialect that parsed this query. It MUST be one of the valid dialects: + +* Java Driver +* Spring Criteria +* Spring @Query ### HasTargetCluster -Identifies the version of the cluster that MAY run the query. +Identifies the version of the cluster that MAY run the query. It MUST be a valid released MongoDB +version. ### HasUpdates -Contains a list of Node representing updates to a document. +Contains a list of Nodes representing updates to a document. It MAY be empty if no updates are +specified yet. ### HasValueReference Identifies a value in a query. Usually a value is the right side of a comparison, but it can be used in different places, like for computing aggregation expressions. -There are 5 variants: +It MUST be one of these variants: * **Unknown**: We don't have any information of the provided value. * **Constant**: It's a value that can be resolved without evaluating it. A literal value is a constant.