chore: mql specs #92

Draft: wants to merge 22 commits into main

164 changes: 164 additions & 0 deletions packages/mongodb-mql-model/src/docs/md/bson-type/bson-type.md
@@ -0,0 +1,164 @@
# MQL BSON Type
-----------

## Abstract

This specification documents the different kinds of BSON types and how they relate to the
original source code of an [MQL Query](../mql-query/mql-query.md). It aims to describe how
dialects and linters behave when computing the BSON type of an expression in the original
source code.

## META

The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY"
and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Specification

[BSON](https://bsonspec.org/spec.html) is a binary format used for communication between a
MongoDB client (through a driver) and a MongoDB cluster. MQL BSON (from now on, simply BSON)
is a superset of the original BSON types: some of its types, like BsonAnyOf, are not part
of the original BSON specification.

A BSON Type represents the data type inferred from the original source code or from a sample
of documents in MongoDB. A BSON Type MUST be consumable by a MongoDB cluster, and its serialization
MUST be BSON 1.1 compliant.

### Primitive BSON Types

#### BsonString

A BsonString is a sequence of Unicode characters.

#### BsonBoolean

A BsonBoolean represents one of two values, `true` or `false`. The actual internal encoding is
defined by the BSON 1.1 specification.

#### BsonDate

A BsonDate represents a date and a time, serializable to a UNIX timestamp. This specific type MAY be
represented differently in some dialects.

In Java-based dialects, a BsonDate can be represented as:

* [java.util.Date](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/util/Date.html)
* [java.time.Instant](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/Instant.html)
* [java.time.LocalDate](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/LocalDate.html)
* [java.time.LocalDateTime](https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/java/time/LocalDateTime.html)
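
The following sketch shows how a Java-based dialect might normalize these four representations into the value a BSON date actually stores: milliseconds since the UNIX epoch. The class and method names are hypothetical. `LocalDate` and `LocalDateTime` carry no time zone, so the sketch assumes UTC; a real dialect may choose a different zone.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Date;

// Hypothetical helper (Java 16+), not part of any MongoDB library.
final class BsonDateConversion {
    private BsonDateConversion() {}

    /** Converts a supported Java date/time value to milliseconds since the UNIX epoch. */
    static long toEpochMillis(Object value) {
        if (value instanceof Date date) {
            return date.getTime();
        }
        if (value instanceof Instant instant) {
            return instant.toEpochMilli();
        }
        if (value instanceof LocalDateTime dateTime) {
            // ASSUMPTION: zone-less date-times are interpreted as UTC.
            return dateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
        }
        if (value instanceof LocalDate localDate) {
            // ASSUMPTION: a date without a time means midnight UTC.
            return localDate.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        }
        throw new IllegalArgumentException("Not a BsonDate representation: " + value);
    }
}
```

For example, `LocalDate.of(1970, 1, 2)` becomes `86400000`, which is one day after the epoch.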

#### BsonObjectId

A BsonObjectId represents a 12-byte unique identifier for an object.

#### BsonInt32

A signed 32-bit integer. In Java it maps to `int` and `short`.

#### BsonInt64

A signed 64-bit integer. In Java it maps to both `long` and `BigInteger`.

#### BsonDouble

A 64-bit binary (IEEE 754) floating-point number. In Java it maps to both `float` and `double`.

#### BsonDecimal128

A 128-bit decimal floating-point number (IEEE 754 decimal128). In Java it maps to `BigDecimal`.
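
BsonDouble and BsonDecimal128 are distinct types. To show why, the sketch below computes the same sum in binary and in decimal arithmetic. It uses Java's `MathContext.DECIMAL128`, which matches decimal128's precision of 34 significant digits with half-even rounding. The class name is hypothetical, and the sketch ignores decimal128's exponent range.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical illustration of binary vs. decimal floating point.
final class DecimalPrecision {
    private DecimalPrecision() {}

    /** 0.1 + 0.2 in 64-bit binary floating point, as stored in a BsonDouble. */
    static double doubleSum() {
        return 0.1 + 0.2;
    }

    /** The same sum in decimal arithmetic at decimal128 precision (BsonDecimal128). */
    static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"), MathContext.DECIMAL128);
    }

    /** True if the value fits in decimal128's 34 significant digits without rounding. */
    static boolean fitsDecimal128Precision(BigDecimal value) {
        return value.precision() <= MathContext.DECIMAL128.getPrecision();
    }
}
```

`doubleSum()` returns a value slightly different from `0.3`, while `decimalSum()` is exactly `0.3`.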

#### BsonNull

Represents the absence of a value.

#### BsonAny

Represents any possible type. Essentially, every type is a subtype of BsonAny.

#### BsonAnyOf

Represents a union of types. For example, BsonAnyOf(BsonString, BsonInt32) represents a value that is either a BsonString or a BsonInt32.

#### BsonObject

Represents the shape of a BSON document: its fields and the BSON type of each field.

#### BsonArray

Represents a list of elements of a single type. For example, `[1, 2, 3]` is a BsonArray(BsonInt32).

#### ComputedBsonType

A ComputedBsonType represents the type of an expression that is evaluated outside the user's
code, by the MongoDB cluster. The typical use case is an MQL expression (such as `$expr`) that
runs on a MongoDB cluster.

A ComputedBsonType contains a `baseType`, which is the inferred type of the expression's result.
If the `baseType` cannot be inferred, it MUST be BsonAny.
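
A Java-based tool could model these types as a small hierarchy of enums and records, as sketched below. All names are hypothetical, and this is not the `mongodb-mql-model` implementation. The sketch assumes Java 16+. Its `anyOf` factory, which flattens nested unions, is a design choice of the sketch and not a rule of this specification.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical Java model of the types above (Java 16+).
interface BsonType {}

/** Primitive types, plus BsonAny, to which every type is assignable. */
enum BsonPrimitive implements BsonType {
    STRING, BOOLEAN, DATE, OBJECT_ID, INT32, INT64, DOUBLE, DECIMAL128, NULL, ANY
}

/** A union of types, e.g. BsonAnyOf(BsonString, BsonInt32). */
record BsonAnyOf(Set<BsonType> types) implements BsonType {
    BsonAnyOf {
        types = Set.copyOf(types); // immutable; member order is irrelevant
    }
}

/** The shape of a document: field name to field type. */
record BsonObject(Map<String, BsonType> fields) implements BsonType {
    BsonObject {
        fields = Map.copyOf(fields);
    }
}

/** A list whose elements share a single type. */
record BsonArray(BsonType elementType) implements BsonType {}

/** An expression evaluated by the cluster; baseType is its inferred result type. */
record ComputedBsonType(BsonType baseType) implements BsonType {}

final class BsonTypes {
    private BsonTypes() {}

    /** Builds a union, flattening nested unions; a single member is returned as-is. */
    static BsonType anyOf(BsonType... types) {
        Set<BsonType> members = new LinkedHashSet<>();
        for (BsonType type : types) {
            if (type instanceof BsonAnyOf union) {
                members.addAll(union.types());
            } else {
                members.add(type);
            }
        }
        return members.size() == 1 ? members.iterator().next() : new BsonAnyOf(members);
    }

    /** A computed expression whose result type cannot be inferred MUST use BsonAny. */
    static ComputedBsonType uninferredComputed() {
        return new ComputedBsonType(BsonPrimitive.ANY);
    }
}
```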

### Type Assignability

Swapping a type for an assignable type MUST NOT change the semantics of a query. Consider a query
$Q$ and two variants of it, $Q_A$ and $Q_B$, that differ only in the type specified for a field
reference or a value reference: type $A$ in $Q_A$ and type $B$ in $Q_B$.

Type $A$ is assignable to type $B$ if $Q_A$ and $Q_B$ are
[equivalent queries](../mql-query/mql-query.md#query-equivalence).

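Type assignability is not necessarily symmetric: $A$ being assignable to $B$ does not imply that $B$
is assignable to $A$. For example, BsonInt32 is assignable to BsonInt64, but BsonInt64 is not
assignable to BsonInt32.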

#### Assignability table

| ⬇️ can be assigned to ➡️ | BsonString | BsonBoolean | BsonDate | BsonObjectId | BsonInt32 | BsonInt64 | BsonDouble | BsonDecimal128 | BsonNull | BsonAny | BsonAnyOf | BsonObject | BsonArray | ComputedBsonType |
|--------------------------|:----------:|:-----------:|:--------:|:------------:|:---------:|:---------:|:----------:|:--------------:|:--------:|:-------:|:---------:|:----------:|:---------:|:----------------:|
| BsonString | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonBoolean | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonDate | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonObjectId | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonInt32 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonInt64 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonDouble | 🔴 | 🔴 | 🔴 | 🔴 | 🟠$^2$ | 🟠$^2$ | 🟢 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonDecimal128 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonNull | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonAny | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^4$ | 🟠$^6$ |
| BsonAnyOf | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟠$^1$ | 🟢 | 🟠$^1$ | 🟠$^1$ | 🟠$^4$ | 🟠$^6$ |
| BsonObject | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🟠$^3$ | 🟠$^4$ | 🟠$^6$ |
| BsonArray | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🔴 | 🟢 | 🟠$^1$ | 🔴 | 🟠$^5$ | 🟠$^6$ |
| ComputedBsonType | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ | 🟠$^6$ |

* 🟠$^1$: $A$ is assignable to BsonAnyOf($B_1, \dots, B_n$) only if $A$ is assignable to at least one $B_i$.
* 🟠$^2$: Assignable, but converting a BsonDouble to an integer type may cause a significant loss of precision.
* 🟠$^3$: A BsonObject $A$ is assignable to a BsonObject $B$ if the fields of $A$ are a subset of the fields of $B$.
* 🟠$^4$: $A$ is assignable to BsonArray($B$) only if $A$ is assignable to $B$.
* 🟠$^5$: BsonArray($A$) is assignable to BsonArray($B$) only if $A$ is assignable to $B$.
* 🟠$^6$: $A$ is assignable to ComputedBsonType(`baseType`) only if $A$ is assignable to `baseType`.
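
The table and its footnotes can be read as a recursive predicate. The sketch below encodes them over a minimal, hypothetical type model; `Obj`, `Arr`, and the other nested names are illustrative only. The table does not spell out the rows for BsonAnyOf and ComputedBsonType on the left-hand side, so the sketch makes two assumptions: a union is assignable only if every member is, and a ComputedBsonType behaves like its `baseType`. For footnote 3, it also assumes that the types of shared fields must be assignable.

```java
import java.util.Map;
import java.util.Set;

// Sketch of the assignability table (Java 16+). The nested type model is
// hypothetical and minimal; it is not the mongodb-mql-model API.
final class Assignability {
    private Assignability() {}

    interface Type {}

    enum Primitive implements Type {
        STRING, BOOLEAN, DATE, OBJECT_ID, INT32, INT64, DOUBLE, DECIMAL128, NULL, ANY
    }

    record AnyOf(Set<Type> types) implements Type {}
    record Obj(Map<String, Type> fields) implements Type {}
    record Arr(Type elementType) implements Type {}
    record Computed(Type baseType) implements Type {}

    // Off-diagonal numeric cells of the table (green and footnote 2).
    private static final Map<Primitive, Set<Primitive>> PRIMITIVE_TARGETS = Map.of(
            Primitive.INT32, Set.of(Primitive.INT64, Primitive.DOUBLE, Primitive.DECIMAL128),
            Primitive.INT64, Set.of(Primitive.DECIMAL128),
            Primitive.DOUBLE, Set.of(Primitive.INT32, Primitive.INT64, Primitive.DECIMAL128));

    /** Returns true if type a is assignable to type b. */
    static boolean isAssignable(Type a, Type b) {
        if (a.equals(b) || b == Primitive.ANY) {
            return true; // diagonal and BsonAny column
        }
        if (a instanceof Computed computed) {
            // ASSUMPTION: the ComputedBsonType row delegates to its base type.
            return isAssignable(computed.baseType(), b);
        }
        if (b instanceof Computed computed) {
            return isAssignable(a, computed.baseType()); // footnote 6
        }
        if (a instanceof AnyOf union) {
            // ASSUMPTION: a union on the left needs every member to be assignable.
            return union.types().stream().allMatch(member -> isAssignable(member, b));
        }
        if (b instanceof AnyOf union) {
            return union.types().stream().anyMatch(member -> isAssignable(a, member)); // footnote 1
        }
        if (a instanceof Arr source && b instanceof Arr target) {
            return isAssignable(source.elementType(), target.elementType()); // footnote 5
        }
        if (b instanceof Arr target) {
            return isAssignable(a, target.elementType()); // footnote 4
        }
        if (a instanceof Obj source && b instanceof Obj target) {
            // Footnote 3: every field of A exists in B.
            // ASSUMPTION: the field types must be assignable as well.
            return source.fields().entrySet().stream().allMatch(field ->
                    target.fields().containsKey(field.getKey())
                            && isAssignable(field.getValue(), target.fields().get(field.getKey())));
        }
        if (a instanceof Primitive source && b instanceof Primitive target) {
            return PRIMITIVE_TARGETS.getOrDefault(source, Set.of()).contains(target);
        }
        return false;
    }

    /** Footnote 2: assignable, but converting a double to an integer may lose precision. */
    static boolean isLossy(Type a, Type b) {
        return a == Primitive.DOUBLE && (b == Primitive.INT32 || b == Primitive.INT64);
    }
}
```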

### Type mapping

#### Java

| Java Type | BSON Type |
|:--------------|:------------------------------------|
| null | BsonNull |
| float | BsonDouble |
| Float | BsonAnyOf(BsonNull, BsonDouble) |
| double | BsonDouble |
| Double | BsonAnyOf(BsonNull, BsonDouble) |
| BigDecimal | BsonAnyOf(BsonNull, BsonDecimal128) |
| boolean | BsonBoolean |
| short | BsonInt32 |
| Short | BsonAnyOf(BsonNull, BsonInt32) |
| int | BsonInt32 |
| Integer | BsonAnyOf(BsonNull, BsonInt32) |
| BigInteger | BsonAnyOf(BsonNull, BsonInt64) |
| long | BsonInt64 |
| Long | BsonAnyOf(BsonNull, BsonInt64) |
| CharSequence | BsonAnyOf(BsonNull, BsonString) |
| String | BsonAnyOf(BsonNull, BsonString) |
| Date | BsonAnyOf(BsonNull, BsonDate) |
| Instant | BsonAnyOf(BsonNull, BsonDate) |
| LocalDate | BsonAnyOf(BsonNull, BsonDate) |
| LocalDateTime | BsonAnyOf(BsonNull, BsonDate) |
| Collection<T> | BsonAnyOf(BsonNull, BsonArray(T)) |
| Map<K, V> | BsonAnyOf(BsonNull, BsonObject) |
| Object | BsonAnyOf(BsonNull, BsonObject) |
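
The table can be encoded as a lookup, sketched below with BSON types rendered as strings for brevity. The helper is hypothetical. It checks reference types with `Class.isAssignableFrom`, so subclasses and implementations map like their parent: for example, `String` maps like `CharSequence`, and `HashMap` maps like `Map`. A raw `Class` carries no generic element type, so `Collection<T>` is handled by a separate `collectionOf` method. Passing `null` stands for the `null` literal.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Collection;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper encoding the Java-to-BSON mapping table; BSON types are strings.
final class JavaBsonTypeMapping {
    private JavaBsonTypeMapping() {}

    // Primitives can never be null, so they map to a single BSON type.
    private static final Map<Class<?>, String> PRIMITIVES = Map.of(
            float.class, "BsonDouble",
            double.class, "BsonDouble",
            boolean.class, "BsonBoolean",
            short.class, "BsonInt32",
            int.class, "BsonInt32",
            long.class, "BsonInt64");

    // Reference types may hold null, so their BSON type is wrapped in a union with BsonNull.
    private static final Map<Class<?>, String> REFERENCES = new LinkedHashMap<>();

    static {
        REFERENCES.put(Float.class, "BsonDouble");
        REFERENCES.put(Double.class, "BsonDouble");
        REFERENCES.put(BigDecimal.class, "BsonDecimal128");
        REFERENCES.put(Short.class, "BsonInt32");
        REFERENCES.put(Integer.class, "BsonInt32");
        REFERENCES.put(BigInteger.class, "BsonInt64");
        REFERENCES.put(Long.class, "BsonInt64");
        REFERENCES.put(CharSequence.class, "BsonString"); // also covers String
        REFERENCES.put(Date.class, "BsonDate");
        REFERENCES.put(Instant.class, "BsonDate");
        REFERENCES.put(LocalDate.class, "BsonDate");
        REFERENCES.put(LocalDateTime.class, "BsonDate");
        REFERENCES.put(Map.class, "BsonObject");
    }

    private static String nullable(String bsonType) {
        return "BsonAnyOf(BsonNull, " + bsonType + ")";
    }

    /** Maps a Java type to its BSON type; a null argument stands for the null literal. */
    static String bsonTypeOf(Class<?> javaType) {
        if (javaType == null) {
            return "BsonNull";
        }
        String primitive = PRIMITIVES.get(javaType);
        if (primitive != null) {
            return primitive;
        }
        for (Map.Entry<Class<?>, String> entry : REFERENCES.entrySet()) {
            if (entry.getKey().isAssignableFrom(javaType)) {
                return nullable(entry.getValue());
            }
        }
        if (Collection.class.isAssignableFrom(javaType)) {
            // ASSUMPTION: a raw Class has no element type, so fall back to BsonAny.
            return nullable("BsonArray(BsonAny)");
        }
        return nullable("BsonObject"); // Object and any type not listed in the table
    }

    /** Maps Collection<T>, given the element type T. */
    static String collectionOf(Class<?> elementType) {
        return nullable("BsonArray(" + bsonTypeOf(elementType) + ")");
    }
}
```

For instance, `collectionOf(long.class)` yields `BsonAnyOf(BsonNull, BsonArray(BsonInt64))`. Java types absent from the table, such as `Boolean`, fall through to `BsonAnyOf(BsonNull, BsonObject)` in this sketch.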

121 changes: 121 additions & 0 deletions packages/mongodb-mql-model/src/docs/md/mql-component/mql-component.md
@@ -0,0 +1,121 @@
# MQL Component
---------------

## Abstract

This specification documents the structure of an MQL Component from the combined perspective of
the original source code and the target server that might run the query. Its primary aim is to
give developers of dialects and linters a common, flexible structure for processing code.

## META

The keywords "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY"
and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.ietf.org/rfc/rfc2119.txt).

## Specification

MQL Components (from now on, simply components) encapsulate units of meaning of an MQL query. Components
MAY relate to how a target MongoDB cluster processes a query. Components MAY contain other components
or MQL Nodes.

Components are categorised as:

* Leaf components: they don't contain other components or nodes.
* Non-leaf components: they contain other components or nodes.

Components MUST be part of a Node; they are meaningless outside of it. A component MAY appear
more than once in the same node.
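
The sketch below models a node as a list of components with a lookup by component kind. It assumes a Java 16+ tool, and all names are hypothetical. Non-leaf components implement a marker interface that exposes their child nodes. As a result, whether a component is a leaf depends on its kind, not on how many children it currently has. This is an assumption, chosen to fit components such as HasFilter, which MAY be empty.

```java
import java.util.List;

// Hypothetical sketch (Java 16+): a Node holds components, and components
// only have meaning inside a Node. Not the mongodb-mql-model classes.
interface Component {}

/** Marker for non-leaf components; the list MAY be empty (e.g. an empty filter). */
interface HasChildren extends Component {
    List<Node> children();
}

record Node(List<Component> components) {
    Node {
        components = List.copyOf(components);
    }

    /** All components of the given kind, in order; a kind MAY appear more than once. */
    <C extends Component> List<C> componentsOf(Class<C> kind) {
        return components.stream().filter(kind::isInstance).map(kind::cast).toList();
    }
}

// Illustrative components; their fields are assumptions, not the specification's.
record Named(String name) implements Component {}               // leaf
record HasFilter(List<Node> children) implements HasChildren {} // non-leaf

final class Components {
    private Components() {}

    static boolean isLeaf(Component component) {
        return !(component instanceof HasChildren);
    }
}
```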

## List of Components

### HasAccumulatedFields

Contains a list of Nodes that represent the accumulated fields of a group operation. Each
node MUST represent one accumulated field and its accumulator.

### HasAddedFields

Contains a list of Nodes that represent fields added to a document, for example through the
`$addFields` aggregation stage. Each node MUST represent one added field.

### HasAggregation

Contains a list of Nodes, where each node MUST represent exactly one aggregation stage.

### HasCollectionReference

Indicates whether this query, or a specific subquery, targets a specific collection. The
reference MUST be one of the following variants:

* **Unknown**: there is a collection reference, but we don't know which collection.
* **OnlyCollection**: there is a collection reference, but we only know the collection, not the full namespace.
* **Known**: both the collection and database are known.
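
The three variants map naturally onto a small set of records. In the hypothetical sketch below, only `Known` can produce a full namespace, in MongoDB's usual `database.collection` form.

```java
import java.util.Optional;

// Hypothetical encoding of the HasCollectionReference variants (Java 16+).
interface CollectionReference {
    /** The collection name, if known. */
    Optional<String> collection();

    /** The full "database.collection" namespace, if both parts are known. */
    Optional<String> namespace();

    /** A collection is referenced, but we don't know which one. */
    record Unknown() implements CollectionReference {
        @Override public Optional<String> collection() { return Optional.empty(); }
        @Override public Optional<String> namespace() { return Optional.empty(); }
    }

    /** Only the collection is known, not the database. */
    record OnlyCollection(String name) implements CollectionReference {
        @Override public Optional<String> collection() { return Optional.of(name); }
        @Override public Optional<String> namespace() { return Optional.empty(); }
    }

    /** Both the database and the collection are known. */
    record Known(String database, String name) implements CollectionReference {
        @Override public Optional<String> collection() { return Optional.of(name); }
        @Override public Optional<String> namespace() { return Optional.of(database + "." + name); }
    }
}
```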

### HasFieldReference

Contains information about a field. The field MAY be used for filtering, computing or aggregating data.
The variant depends on how much information is available when the query is parsed.
The variant MUST be one of the following:

* **Unknown**: we couldn't infer any information about the field.
* **FromSchema**: the field MUST be in the schema of the target collection.
* **Inferred**: a field that is not explicitly specified in the code. For example,
`Filters.eq(A)` refers to the `_id` field.
* **Computed**: a field that is not part of the schema because it is newly computed.
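
The sketch below shows one way a dialect might pick a variant for a filter's field. Everything in it is hypothetical. Two parts are assumptions: the resolver's inputs (the schema fields and the fields computed earlier in the pipeline), and the fallback to `Unknown` for a named field found in neither set. An absent name models a single-argument call such as `Filters.eq(A)`, which targets `_id`.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical encoding of the HasFieldReference variants (Java 16+).
interface FieldReference {
    record Unknown() implements FieldReference {}
    record FromSchema(String name) implements FieldReference {}
    record Inferred(String name) implements FieldReference {}
    record Computed(String name) implements FieldReference {}

    /**
     * Picks a variant for a field used in a filter. An empty explicitName models a
     * single-argument call such as Filters.eq(value), which targets _id.
     */
    static FieldReference resolve(Optional<String> explicitName,
                                  Set<String> schemaFields,
                                  Set<String> computedFields) {
        if (explicitName.isEmpty()) {
            return new Inferred("_id");
        }
        String name = explicitName.get();
        if (computedFields.contains(name)) {
            return new Computed(name); // e.g. added earlier by $addFields
        }
        if (schemaFields.contains(name)) {
            return new FromSchema(name);
        }
        return new Unknown(); // ASSUMPTION: no other information is available
    }
}
```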

### HasFilter

Contains a list of Nodes that represent the filter of a query. It MAY be empty for
empty queries.

### HasProjections

Contains a list of Nodes that represent the projections of a `$project` stage. It MAY be
empty for empty projections.

### HasSorts

Contains a list of Nodes that represent the sorting criteria of a `$sort` stage. It MAY be
empty if the sort criteria are not yet defined.

### HasSourceDialect

Identifies the source dialect that parsed this query. It MUST be one of the valid dialects:

* Java Driver
* Spring Criteria
* Spring @Query

### HasTargetCluster

Identifies the version of the cluster that might run the query. It MUST be a valid, released MongoDB
version.

### HasUpdates

Contains a list of Nodes representing updates to a document. It MAY be empty if no updates are
specified yet.

### HasValueReference

Identifies a value in a query. Usually a value is the right-hand side of a comparison,
but values can also appear elsewhere, for example when computing aggregation expressions.

It MUST be one of these variants:

* **Unknown**: we don't have any information about the provided value.
* **Constant**: a value that can be resolved without evaluating it. A literal value is a constant.
* **Inferred**: a value that could be inferred from other operations. For example, `Sort.ascending("field")` would have an `Inferred(1)` value.
* **Runtime**: a value that cannot be resolved without evaluating it, but for which we have enough information
to infer its runtime type. For example, a method parameter.
* **Computed**: a computed expression in the MongoDB cluster, like a `$expr` node.
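
The hypothetical Java encoding below represents these variants, with BSON types as strings for brevity. Only `Constant` and `Inferred` carry a value known before the query runs. Giving `Unknown` the type BsonAny is an assumption, borrowed from the rule that a `baseType` that cannot be inferred MUST be BsonAny.

```java
// Hypothetical encoding of the HasValueReference variants (Java 16+).
// BSON types are plain strings here; the names are illustrative only.
interface ValueReference {
    /** The BSON type of the value, as far as it is known. */
    String bsonType();

    /** Constant and Inferred values are known before the query runs. */
    default boolean isResolvedStatically() {
        return this instanceof Constant || this instanceof Inferred;
    }

    record Unknown() implements ValueReference {
        @Override
        public String bsonType() {
            return "BsonAny"; // ASSUMPTION: nothing is known, so use the widest type
        }
    }

    /** e.g. a literal such as "active" in a filter. */
    record Constant(Object value, String bsonType) implements ValueReference {}

    /** e.g. the implicit 1 behind an ascending sort. */
    record Inferred(Object value, String bsonType) implements ValueReference {}

    /** e.g. a method parameter whose type is known but whose value is not. */
    record Runtime(String bsonType) implements ValueReference {}

    /** e.g. the result of an expression evaluated by the cluster. */
    record Computed(String bsonType) implements ValueReference {}
}
```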

### IsCommand

References the command that will be evaluated in the MongoDB cluster. The list of
valid commands can be found in the `IsCommand.kt` file.

### Named

References the name of the operation represented by the node. The list
of valid names can be found in the `Named.kt` file.