diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml new file mode 100644 index 000000000..fc84715a4 --- /dev/null +++ b/.github/workflows/docs.yml @@ -0,0 +1,39 @@ +name: Build HTML Docs from Markdown and Push to GitHub Pages Branch +on: + push: + branches: + - main + pull_request: + branches: + - main + +jobs: + build: + name: Build Docs + runs-on: ubuntu-20.04 + steps: + - name: Checkout + uses: actions/checkout@v2 + - name: Prepare Docs + run: | + cp CONTRIBUTING.md doc/development/Contributing.md + sed -ri 's@(\(/doc/[a-zA-Z0-9/]*)(.md)@\1@g' doc/*.md + sed -ri 's@(\(/doc/[a-zA-Z0-9/]*)(.md)@\1@g' doc/**/*.md + sed -ri 's@\(/([a-zA-Z0-9]*.[a-zA-Z0-9]*)@\(https://github.com/daphne-eu/daphne/tree/main/\1@g' doc/*.md + sed -ri 's@\(/([a-zA-Z0-9]*.[a-zA-Z0-9]*)@\(https://github.com/daphne-eu/daphne/tree/main/\1@g' doc/**/*.md + sed -ri 's@]\(/([a-z]+)@]\(https://github.com/daphne-eu/daphne/tree/main/\1@g' doc/*.md + sed -ri 's@]\(/([a-z]+)@]\(https://github.com/daphne-eu/daphne/tree/main/\1@g' doc/**/*.md + sed -i 's@](https://github.com/daphne-eu/daphne/tree/main/doc/@](/daphne/@g' doc/*.md + sed -i 's@](https://github.com/daphne-eu/daphne/tree/main/doc/@](/daphne/@g' doc/**/*.md + sed -ri 's@]\(/issues/([0-9]+)@]\(https://github.com/daphne-eu/daphne/issues/\1@g' doc/*.md + - name: Build + uses: Tiryoh/actions-mkdocs@v0 + with: + mkdocs_version: 'latest' + requirements: 'doc/docs-build-requirements.txt' + configfile: 'mkdocs.yml' # option + - name: Deploy + uses: peaceiris/actions-gh-pages@v3 + with: + github_token: ${{ secrets.GITHUB_TOKEN }} + publish_dir: ./doc_build diff --git a/.gitignore b/.gitignore index f520067d1..58991963c 100644 --- a/.gitignore +++ b/.gitignore @@ -6,6 +6,9 @@ build_*/ /lib /tmp +# documentation build output +doc_build/ + # dependencies thirdparty/* !thirdparty/llvm-project/ @@ -16,7 +19,7 @@ thirdparty/* # Python __pycache__/ -/venv* +/**/*venv* # Jetbrains IDE .idea/ diff --git a/containers/Readme.md b/containers/README.md similarity index 100% rename from containers/Readme.md rename to containers/README.md diff --git a/deploy/README.md b/deploy/README.md index 393af4bde..4e13be7a1 100644 --- a/deploy/README.md +++ b/deploy/README.md @@ -103,7 +103,7 @@ This directory includes a set of **bash scripts** providing support for: ## List of Files in this Directory 1. This short [README](README.md) file to explain directory structure and point to more documentation at [Deploy](/doc/Deploy.md). -2. A [script](build-daphne-singularity-image.sh) that builds the "daphne.sif" singularity image from the [Docker image](../containers/Readme.md) +2. A [script](build-daphne-singularity-image.sh) that builds the "daphne.sif" singularity image from the [Docker image](/containers/README.md) daphneeu/daphne-dev 3. [deploy-distributed-on-slurm](deploy-distributed-on-slurm.sh) script allows the user to deploy DAPHNE with SLURM. 4. [deployDistributed](deployDistributed.sh) script builds and sends DAPHNE to remote machines manually with SSH (no tools like Slurm needed). diff --git a/doc/BinaryFormat.md b/doc/BinaryFormat.md index 92d707393..a91325b61 100644 --- a/doc/BinaryFormat.md +++ b/doc/BinaryFormat.md @@ -14,10 +14,11 @@ See the License for the specific language governing permissions and limitations under the License. --> -# DAPHNE Binary Data Format +# Binary Data Format DAPHNE defines its own binary representation for the serialization of in-memory data objects (matrices/frames). 
This representation is intended to be used by default whenever we need to transfer or persistently store these in-memory objects, e.g., for + - the data transfer in the distributed runtime - a custom binary file format - the eviction of in-memory data to secondary storage @@ -25,29 +26,26 @@ This representation is intended to be used by default whenever we need to transf *Disclaimer:* The current specification is a first draft and will likely be refined as we proceed. At the moment, we focus on the case of a single block per data object. -**Endianess** - -For now, we assume *little endian*. - -**Images** +**Endianess:** For now, we assume *little endian*. -In the images below, all addresses and sizes are specified in bytes (`[B]`). +**Images:** In the images below, all addresses and sizes are specified in bytes (`[B]`). -### Binary Representation of a Whole Data Object +## Binary Representation of a Whole Data Object The binary representation of a data object (matrix/frame) starts with a header containing general and data type-specific information. The data object is partitioned into rectangular blocks (in the extreme case, this can mean a single block). All blocks are represented individually (see binary representation of a single block below) and stored along with their position in the data object. -``` +```text +--------+------+ | header | body | +--------+------+ ``` -**Header** +### Header The header consists of the following information: + - DAPHNE binary format version number (`1` for now) (uint8) - data type `dt` (uint8) - number of rows `#r` (uint64) @@ -81,9 +79,10 @@ We currently support the following **value types**: Depending on the data type, there are more information in the header: *For `DenseMatrix` and `CSRMatrix`*: + - value type `vt` (uint8) -``` +```text addr[B] 0 0 1 1 2 9 10 17 18 18 +---+----+----+-----+-----+ | 1 | dt | #r | #c | vt | @@ -92,10 +91,11 @@ size[B] 1 1 8 8 1 ``` *For `Frame`*: + - value type `vt` (uint8), for each column - length of the label `len` (uint16) and label `lbl` (character string), for each column -``` +```text addr[B] 0 0 1 1 2 9 10 17 18 18+#c-1 18+#c * +---+----+----+-----+-------+ +----------+--------+--------+ +-----------+-----------+ | 1 | dt | #r | #c | vt[0] | ... | vt[#c-1] | len[0] | lbl[0] | ... | len[#c-1] | lbl[#c-1] | @@ -103,9 +103,10 @@ addr[B] 0 0 1 1 2 9 10 17 18 18+#c-1 18+#c size[B] 1 1 8 8 1 1 2 len[0] 2 len[#c-1] ``` -**Body** +### Body The body consists of a sequence of: + - a pair of - row index `rx` (uint64) - column index `cx` (uint64) @@ -113,7 +114,7 @@ The body consists of a sequence of: For the special case of a single block, this looks as follows: -``` +```text addr[B] 0 7 8 15 16 * +---+----+----------+ | 0 | 0 | block[0] | @@ -122,18 +123,19 @@ addr[B] 0 7 8 15 16 * size[B] ``` -### Binary Representation of a Single Block +## Binary Representation of a Single Block A single data block is a rectangular partition of a data object. In the extreme case, a single block can span the entire data object in both dimensions (one block per data object). 
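Before turning to the block-level layout, the header format above can be illustrated with a small Python sketch that packs the whole-object header of a `DenseMatrix` (version, `dt`, `#r`, `#c`, `vt`; little endian), exactly as in the address diagram. This is not an official DAPHNE utility, and the numeric codes used for `dt` and `vt` are placeholders, since the concrete code tables are not part of this excerpt.

```python
import struct

# Hypothetical codes for dt and vt; the real code tables are defined in the
# full specification, not in this excerpt.
DT_DENSEMATRIX = 1
VT_F64 = 6

def pack_dense_matrix_header(num_rows: int, num_cols: int) -> bytes:
    # "<" = little endian, B = uint8, Q = uint64:
    # version (1), dt, #r, #c, vt -- 1 + 1 + 8 + 8 + 1 = 19 bytes
    return struct.pack("<BBQQB", 1, DT_DENSEMATRIX, num_rows, num_cols, VT_F64)

header = pack_dense_matrix_header(3, 2)
assert len(header) == 19  # matches addresses 0..18 in the diagram above
```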
General block header + - number of rows `#r` (uint32) - number of columns `#c` (uint32) - block type `bt` (uint8) - block type-specific information (see below) -``` +```text addr[B] 0 3 4 7 8 8 9 * +----+----+----+--------------------------+ | #r | #c | bt | block type-specific info | @@ -141,7 +143,7 @@ addr[B] 0 3 4 7 8 8 9 * size[B] 4 4 1 * ``` -**Block types** +## Block types We define different block types to allow for a space-efficient representation depending on the data. When serializing a data object, the block types are not required to match the in-memory representation (e.g., the blocks of a `DenseMatrix` could use the *sparse* binary representation). @@ -159,13 +161,13 @@ Most block types store their value type as part of the block type-specific infor Note that the value type used for the binary representation is not required to match the value type of the in-memory object (e.g., `DenseMatrix` may be represented as a *dense* block with value type `uint8_t`, if the value range permits). Furthermore, each block may be represented using its individual value type. -**Empty block** +### Empty block This block type is used to represent blocks that contain only zeros of the respective value type very space-efficiently. Block type-specific information: *none* -``` +```text addr[B] 0 3 4 7 8 8 +----+----+---+ | #r | #c | 0 | @@ -173,15 +175,16 @@ addr[B] 0 3 4 7 8 8 size[B] 4 4 1 ``` -**Dense block** +### Dense block Block type-specific information: + - value type `vt` (uint8) - values `v` in row-major (value type `vt`) Below, `S` denotes the size (in bytes) of a single value of type `vt`. -``` +```text addr[B] 0 3 4 7 8 8 9 9 10 10+#r*#c*S +----+----+---+----+---------+---------+ +---------------+ | #r | #c | 1 | vt | v[0, 0] | v[0, 1] | ... | v[#r-1, #c-1] | @@ -189,9 +192,10 @@ addr[B] 0 3 4 7 8 8 9 9 10 10+#r*#c*S size[B] 4 4 1 1 S S S ``` -**Sparse block (compressed sparse row, CSR)** +### Sparse block (compressed sparse row, CSR) Block type-specific information: + - value type `vt` (uint8) - number of non-zeros in the block `#nzb` (uint64) - for each row @@ -199,13 +203,12 @@ Block type-specific information: - for each non-zero in the row - column index `cx` (uint32) - value `v` (value type `vt`) - + Note that both a row and the entire block might contain no non-zeros. Below, `S` denotes the size (in bytes) of a single value of type `vt`. - -``` +```text 18 + 4*#r + addr[B] 0 3 4 7 8 8 9 9 10 17 18 #nzb*(4+S) +----+----+---+----+------+--------+ +--------+ +-----------+ @@ -230,14 +233,15 @@ size[B] 4 4 1 1 8 4+#nzr[i]*(4+S) 4 S ``` -**Ultra-sparse block (coordinate, COO)** +### Ultra-sparse block (coordinate, COO) Ultra-sparse blocks contain almost no non-zeros, so we want to keep the overhead of the meta data low. Thus, we distinguish blocks with a single column (where we don't need to store the column index) and blocks with more than one column. -*Blocks with a single column* +### Blocks with a single column Block type-specific information: + - value type `vt` (uint8) - number of non-zeros in the block `#nzb` (uint32) - for each non-zero @@ -246,7 +250,7 @@ Block type-specific information: Below, `S` denotes the size (in bytes) of a single value of type `vt`. -``` +```text addr[B] 0 3 4 7 8 8 9 9 10 13 14 14+#nzb*(4+S) +----+----+---+----+------+-------+ +-------+ +------------+ | #r | #c | 3 | vt | #nzb | nz[0] | ... | nz[i] | ... 
| nz[#nzb-1] | @@ -262,9 +266,10 @@ size[B] 4 4 1 1 4 4+S 4+S 4+S 4 S ``` -*Blocks with more than one column* +### Blocks with more than one column Block type-specific information: + - value type `vt` (uint8) - number of non-zeros in the block `#nzb` (uint32) - for each non-zero @@ -274,7 +279,7 @@ Block type-specific information: Below, `S` denotes the size (in bytes) of a single value of type `vt`. -``` +```text addr[B] 0 3 4 7 8 8 9 9 10 13 14 14+#nzb*(8+S) +----+----+---+----+------+-------+ +-------+ +------------+ | #r | #c | 3 | vt | #nzb | nz[0] | ... | nz[i] | ... | nz[#nzb-1] | diff --git a/doc/Config.md b/doc/Config.md index 8b4cfdf0e..a271dcd38 100644 --- a/doc/Config.md +++ b/doc/Config.md @@ -14,7 +14,7 @@ See the License for the specific language governing permissions and limitations under the License. --> -# DAPHNE Configuration: Getting Information from the User +# Configuration - Getting Information from the User The behavior of the DAPHNE system can be influenced by the user by means of a cascading configuration mechanism. There is a set of options that can be passed from the user to the system. @@ -27,17 +27,18 @@ The cascade consists of the following steps: - (In the future, DaphneDSL will also offer means to change the configuration at run-time.) The `DaphneUserConfig` is available to all parts of the code, including: + - The DAPHNE compiler: The `DaphneUserConfig` is passed to the `DaphneIrExecutor` and from there to all passes that need it. - The DAPHNE runtime: The `DaphneUserConfig` is part of the `DaphneContext`, which is passed to all kernels. Hence, information provided by the user can be used to influence both, the compiler and the runtime. *The use of environment variables to pass information into the system is discouraged.* -### How to extend the configuration? +## How to extend the configuration? If you need to add additional information from the user, you must take roughly the following steps: 1. Create a new member in `DaphneUserConfig` and hard-code a reasonable default. 2. Add a command-line argument to the system's CLI API in [src/api/cli/daphne.cpp](/src/api/cli/daphne.cpp). We use LLVM's [CommandLine 2.0 library](https://llvm.org/docs/CommandLine.html) for parsing CLI arguments. Make sure to update the corresponding member the `DaphneUserConfig` with the parsed argument. 3. *For compiler passes*: If necessary, pass on the `DaphneUserConfig` to the compiler pass you are working on in [src/compiler/execution/DaphneIrExecutor.cpp](/src/compiler/execution/DaphneIrExecutor.cpp). *For kernels*: All kernels automatically get the `DaphneUserConfig` via the `DaphneContext` (their last parameter), so no action is required from your side. -4. Access the new member of the `DaphneUserConfig` in your code. \ No newline at end of file +4. Access the new member of the `DaphneUserConfig` in your code. diff --git a/doc/DaphneDSLBuiltins.md b/doc/DaphneDSL/Builtins.md similarity index 56% rename from doc/DaphneDSLBuiltins.md rename to doc/DaphneDSL/Builtins.md index 58da3bdb1..15e83dd58 100644 --- a/doc/DaphneDSLBuiltins.md +++ b/doc/DaphneDSL/Builtins.md @@ -14,15 +14,10 @@ See the License for the specific language governing permissions and limitations under the License. --> - - -# DaphneDSL Built-in Functions +# Built-in Functions DaphneDSL offers numerous built-in functions, which can be used in every DaphneDSL script without requiring any imports. 
-The general syntax for calling a built-in function is `func(param1, param2, ...)` (see the [DaphneDSL Language Reference](/doc/DaphneDSLLanguageRef.md)). +The general syntax for calling a built-in function is `func(param1, param2, ...)` (see the [DaphneDSL Language Reference](/doc/DaphneDSL/LanguageRef.md)). This document provides an overview of the DaphneDSL built-in functions. Note that we are still extending this set of built-in functions. @@ -30,6 +25,7 @@ Furthermore, **we also plan to create a library of higher-level ML primitives** Those library functions will internally be implemented using the built-in functions described in this document. We use the following notation (deviating from the DaphneDSL function syntax): + - square brackets `[]` mean that a parameter is optional - `...` stands for an arbitrary repetition of the previous parameter (including zero). - `/` means alternative options, e.g., `matrix/frame` means the parameter could be a matrix or a frame @@ -58,34 +54,34 @@ DaphneDSL's built-in functions can be categorized as follows: - **`fill`**`(value:scalar, numRows:size, numCols:size)` - Creates a *(`numRows` x `numCols`)* matrix and sets all elements to `value`. + Creates a *(`numRows` x `numCols`)* matrix and sets all elements to `value`. - **`createFrame`**`(column:matrix, ...[, labels:str, ...])` - Creates a frame from an arbitrary number of column matrices. - Optionally, a label can be specified for each column (the number of provided columns and labels must be equal). + Creates a frame from an arbitrary number of column matrices. + Optionally, a label can be specified for each column (the number of provided columns and labels must be equal). - **`diagMatrix`**`(arg:matrix)` - Creates an *(n x n)* diagonal matrix by placing the elements of the given *(n x 1)* column-matrix `arg` on the diagonal of an otherwise empty (zero) square matrix. + Creates an *(n x n)* diagonal matrix by placing the elements of the given *(n x 1)* column-matrix `arg` on the diagonal of an otherwise empty (zero) square matrix. - **`rand`**`(numRows:size, numCols:size, min:scalar, max:scalar, sparsity:double, seed:si64)` - Generates a *(`numRows` x `numCols`)* matrix of random values. - The values are drawn uniformly from the range *[`min`, `max`]* (both inclusive). - The `sparsity` can be chosen between `0.0` (all zeros) and `1.0` (all non-zeros). - The `seed` can be set to `-1` (randomly chooses a seed), or be provided explicitly to enable reproducible random values. + Generates a *(`numRows` x `numCols`)* matrix of random values. + The values are drawn uniformly from the range *[`min`, `max`]* (both inclusive). + The `sparsity` can be chosen between `0.0` (all zeros) and `1.0` (all non-zeros). + The `seed` can be set to `-1` (randomly chooses a seed), or be provided explicitly to enable reproducible random values. - **`sample`**`(range:scalar, size:size, withReplacement:bool, seed:si64)` - Generates a *(`size` x 1)* column-matrix of values drawn from the range *[0, `range` - 1]*. - The parameter `withReplacement` determines if a value can be drawn multiple times (`true`) or not (`false`). - The `seed` can be set to `-1` (randomly chooses a seed), or be provided explicitly to enable reproducible random values. + Generates a *(`size` x 1)* column-matrix of values drawn from the range *[0, `range` - 1]*. + The parameter `withReplacement` determines if a value can be drawn multiple times (`true`) or not (`false`). 
+ The `seed` can be set to `-1` (randomly chooses a seed), or be provided explicitly to enable reproducible random values. - **`seq`**`(from:scalar, to:scalar, inc:scalar)` - Generates a column matrix containing an arithmetic sequence of values starting at `from`, going through `to`, in increments of `inc`. - Note that `from` may be greater than `to`, and `inc` may be negative. + Generates a column matrix containing an arithmetic sequence of values starting at `from`, going through `to`, in increments of `inc`. + Note that `from` may be greater than `to`, and `inc` may be negative. ## Matrix/frame dimensions @@ -93,16 +89,16 @@ The following built-in functions allow to find out the shape/dimensions of matri - **`nrow`**`(arg:matrix/frame)` - Returns the number of rows in `arg`. + Returns the number of rows in `arg`. - **`ncol`**`(arg:matrix/frame)` - Returns the number of columns in `arg`. + Returns the number of columns in `arg`. - **`ncell`**`(arg:matrix/frame)` - Returns the number of cells in `arg`. - This is the product of the number of rows and the number of columns. + Returns the number of cells in `arg`. + This is the product of the number of rows and the number of columns. ## Elementwise unary @@ -110,7 +106,7 @@ The following built-in functions all follow the same scheme: - ***`unaryFunc`***`(arg:scalar/matrix)` - Applies the respective unary function (see table below) to the given scalar `arg` or to each element of the given matrix `arg`. + Applies the respective unary function (see table below) to the given scalar `arg` or to each element of the given matrix `arg`. ### Arithmetic/general math @@ -145,10 +141,10 @@ The built-in functions all follow the same scheme: - ***`binaryFunc`***`(lhs:scalar/matrix, rhs:scalar/matrix)` - Applies the respective binary function (see table below) to the corresponding pairs of a value in the left-hand-side argument `lhs` and the right-hand-side argument `rhs`. - Regarding the combinations of scalars and matrices, the same broadcasting semantics apply as for binary operations like `+`, `*`, etc. (see the [DaphneDSL Language Reference](/doc/DaphneDSLLanguageRef.md)). + Applies the respective binary function (see table below) to the corresponding pairs of a value in the left-hand-side argument `lhs` and the right-hand-side argument `rhs`. + Regarding the combinations of scalars and matrices, the same broadcasting semantics apply as for binary operations like `+`, `*`, etc. (see the [DaphneDSL Language Reference](/doc/DaphneDSL/LanguageRef.md)). -### Arithmetic +Note that DaphneDSL support various other elementwise binary functions via operators in infix notation (see [DaphneDSL](/doc/DaphneDSL/LanguageRef.md)), e.g., `^`, `%`, `*`, `/`, `+`, `-`, `==`, `!=`, `<`, `<=`, `>`, `>=`, `&&`, `||`. | function | operator | meaning | | ----- | ----- | ----- | @@ -172,7 +168,7 @@ The built-in functions all follow the same scheme: | function | operator | meaning | | ----- | ----- | ----- | | | **`&&`** | logical conjunction | -| | **`\|\|`** | logical disjunction | +| | **`||`** | logical disjunction | ### Strings @@ -261,14 +257,14 @@ The following built-in functions all follow the same scheme: - **`agg`**`(arg:matrix)` - Full aggregation over all elements of the matrix `arg` using aggregation function `agg` (see table below). - Returns a scalar. + Full aggregation over all elements of the matrix `arg` using aggregation function `agg` (see table below). + Returns a scalar. 
- **`agg`**`(arg:matrix, axis:si64)` - Row or column aggregation over a *(n x m)* matrix `arg` using aggregation function `agg` (see table below). - - `axis` == 0: calculate one aggregate per row; the result is a *(n x 1)* (column) matrix - - `axis` == 1: calculate one aggregate per column; the result is a *(1 x m)* (row) matrix + Row or column aggregation over a *(n x m)* matrix `arg` using aggregation function `agg` (see table below). + - `axis` == 0: calculate one aggregate per row; the result is a *(n x 1)* (column) matrix + - `axis` == 1: calculate one aggregate per column; the result is a *(1 x m)* (row) matrix | function | meaning | | ----- | ----- | @@ -301,35 +297,34 @@ The following built-in functions all follow the same scheme: - **`reshape`**`(arg:matrix, numRows:size, numCols:size)` - Changes the shape of `arg` to *(`numRows` x `numCols`)*. - Note that the number of cells must be retained, i.e., the product of `numRows` and `numCols` must be equal to the product of the number of rows in `arg` and the number of columns in `arg`. + Changes the shape of `arg` to *(`numRows` x `numCols`)*. + Note that the number of cells must be retained, i.e., the product of `numRows` and `numCols` must be equal to the product of the number of rows in `arg` and the number of columns in `arg`. - **`transpose/t`**`(arg:matrix)` - Transposes the given matrix `arg`. + Transposes the given matrix `arg`. - **`cbind`**`(lhs:matrix/frame, rhs:matrix/frame)` - Concatenates two matrices or two frames horizontally. - The two inputs must have the same number of rows. + Concatenates two matrices or two frames horizontally. + The two inputs must have the same number of rows. - **`rbind`**`(lhs:matrix/frame, rhs:matrix/frame)` - Concatenates two matrices or two frames vertically. - The two inputs must have the same number of columns. + Concatenates two matrices or two frames vertically. + The two inputs must have the same number of columns. - **`reverse`**`(arg:matrix)` - Reverses the rows in the given matrix `arg`. + Reverses the rows in the given matrix `arg`. - **`order`**`(arg:matrix/frame, colIdxs:size, ..., ascs:bool, ..., returnIndexes:bool)` - Sorts the given matrix or frame by an arbitrary number of columns. - The columns are specified in terms of their indexes (counting starts at zero). - Each column can be sorted either in ascending (`true`) or descending (`false`) order (as determined by parameter `ascs`). - The provided number of columns and sort orders must match. - The parameter `returnIndexes` determines whether to return the sorted data (`false`) or a column-matrix of positions representing the permutation applied by the sorting (`true`). - + Sorts the given matrix or frame by an arbitrary number of columns. + The columns are specified in terms of their indexes (counting starts at zero). + Each column can be sorted either in ascending (`true`) or descending (`false`) order (as determined by parameter `ascs`). + The provided number of columns and sort orders must match. + The parameter `returnIndexes` determines whether to return the sorted data (`false`) or a column-matrix of positions representing the permutation applied by the sorting (`true`). 
## Matrix decomposition & co @@ -343,23 +338,23 @@ Note that most of these operations only have a CUDNN-based kernel for GPU execut - **`avg_pool2d`**`(inputData:matrix, numImages:size, numChannels:size, imgHeight:size, imgWidth:size, poolHeight:size, poolWidth:size, strideHeight:size, strideWidth:size, paddingHeight:size, paddingWidth:size)` - Performs average pooling operation. + Performs average pooling operation. - **`max_pool2d`**`(inputData:matrix, numImages:size, numChannels:size, imgHeight:size, imgWidth:size, poolHeight:size, poolWidth:size, strideHeight:size, strideWidth:size, paddingHeight:size, paddingWidth:size)` - Performs max pooling operation. + Performs max pooling operation. - **`batch_norm2d`**`(inputData:matrix, gamma, beta, emaMean, emaVar, eps)` - Performs batch normalization operation. + Performs batch normalization operation. - **`biasAdd`**`(input:matrix, bias:matrix)` - Adds the *(1 x `numChannels`)* row-matrix `bias` to the `input` with the given number of channels. + Adds the *(1 x `numChannels`)* row-matrix `bias` to the `input` with the given number of channels. - **`conv2d`**`(input:matrix, filter:matrix, numImages:size, numChannels:size, imgHeight:size, imgWidth:size, filterHeight:size, filterWidth:size, strideHeight:size, strideWidth:size, paddingHeight:size, paddingWidth:size)` - 2D convolution. + 2D convolution. - **`relu`**`(inputData:matrix)` @@ -369,54 +364,54 @@ Note that most of these operations only have a CUDNN-based kernel for GPU execut - **`diagVector`**`(arg:matrix)` - Extracts the diagonal of the given *(n x n)* matrix `arg` as a *(n x 1)* column-matrix. + Extracts the diagonal of the given *(n x n)* matrix `arg` as a *(n x 1)* column-matrix. - **`lowerTri`**`(arg:matrix, diag:bool, values:bool)` - Extracts the lower triangle of the given square matrix `arg` by setting all elements in the upper triangle to zero. - If `diag` is `true`, the elements on the diagonal are retained; otherwise, they are set to zero, too. - If `values` is `true`, the non-zero elements in the lower triangle are retained; otherwise, they are set to one. + Extracts the lower triangle of the given square matrix `arg` by setting all elements in the upper triangle to zero. + If `diag` is `true`, the elements on the diagonal are retained; otherwise, they are set to zero, too. + If `values` is `true`, the non-zero elements in the lower triangle are retained; otherwise, they are set to one. - **`upperTri`**`(arg:matrix, diag:bool, values:bool)` - Extracts the upper triangle of the given square matrix `arg` by setting all elements in the lower triangle to zero. - If `diag` is `true`, the elements on the diagonal are retained; otherwise, they are set to zero, too. - If `values` is `true`, the non-zero elements in the upper triangle are retained; otherwise, they are set to one. + Extracts the upper triangle of the given square matrix `arg` by setting all elements in the lower triangle to zero. + If `diag` is `true`, the elements on the diagonal are retained; otherwise, they are set to zero, too. + If `values` is `true`, the non-zero elements in the upper triangle are retained; otherwise, they are set to one. - **`solve`**`(A:matrix, b:matrix)` - Solves the system of linear equations given by the *(n x n)* matrix `A` and the *(n x 1)* column-matrix `b` and returns the result as a *(n x 1)* column-matrix. + Solves the system of linear equations given by the *(n x n)* matrix `A` and the *(n x 1)* column-matrix `b` and returns the result as a *(n x 1)* column-matrix. 
- **`replace`**`(arg:matrix, pattern:scalar, replacement:scalar)` - Replaces all occurrences of the element `pattern` in the matrix `arg` by the element `replacement`. + Replaces all occurrences of the element `pattern` in the matrix `arg` by the element `replacement`. - **`ctable`**`(ys:matrix, xs:matrix[, weight:scalar][, numRows:int, numCols:int])` - Returns the contingency table of two *(n x 1)* column-matrices `ys` and `xs`. - The resulting matrix `res` consists of `max(ys) + 1` rows and `max(xs) + 1` columns. - More precisely, *`res[x, y]` = |{ k | `ys[k, 0]` = y and `xs[k, 0]` = x, 0 ≤ k ≤ n-1 }| * `weight`*. + Returns the contingency table of two *(n x 1)* column-matrices `ys` and `xs`. + The resulting matrix `res` consists of `max(ys) + 1` rows and `max(xs) + 1` columns. + More precisely, *`res[x, y] `= |{ k | `ys[k, 0]` = y and `xs[k, 0]` = x, 0 ≤ k ≤ n-1 }| * `weight`*. - In other words, starting with an all-zero result matrix, `ys` and `xs` can be thought of as lists of `y`/`x`-coordinates which indicate the result matrix's cells whose value shall be increased by `weight`. - Note that `ys` and `xs` must not contain negative numbers. + In other words, starting with an all-zero result matrix, `ys` and `xs` can be thought of as lists of `y`/`x`-coordinates which indicate the result matrix's cells whose value shall be increased by `weight`. + Note that `ys` and `xs` must not contain negative numbers. - The scalar weight is an optional argument and defaults to 1.0. - The weight also determines the value type of the result. - - Moreover, optionally, the result shape in terms of the number of rows and columns can be specified. - If omited, it defaults to the smallest numbers required to accommodate all given `y`/`x`-coordinates, as expressed above. - If specified, the result can be either cropped or padded with zeros to the desired shape. - If a value less than zero is provided as the number of rows/columns, the respective dimension will also be determined from the input data. - - This built-in function can be called with 2, 3, 4, or 5 arguments, depending on which optional arguments are given. + The scalar weight is an optional argument and defaults to 1.0. + The weight also determines the value type of the result. + + Moreover, optionally, the result shape in terms of the number of rows and columns can be specified. + If omited, it defaults to the smallest numbers required to accommodate all given `y`/`x`-coordinates, as expressed above. + If specified, the result can be either cropped or padded with zeros to the desired shape. + If a value less than zero is provided as the number of rows/columns, the respective dimension will also be determined from the input data. + + This built-in function can be called with 2, 3, 4, or 5 arguments, depending on which optional arguments are given. - **`syrk`**`(A:martix)` - Calculates `t(A) @ A` by symmetric rank-k update operations. + Calculates `t(A) @ A` by symmetric rank-k update operations. - **`gemv`**`(A:matrix, x:matrix)` - Calcuates `t(A) @ x` for the given *(n x m)* matrix `A` and *(n x 1)* column-matrix `x`. + Calcuates `t(A) @ x` for the given *(n x m)* matrix `A` and *(n x 1)* column-matrix `x`. ## Extended relational algebra @@ -429,13 +424,13 @@ On the other hand, built-in functions for individual operations of extended rela - **`registerView`**`(viewName:str, arg:frame)` - Registers the frame `arg` to be accessible to SQL queries by the name `viewName`. 
+ Registers the frame `arg` to be accessible to SQL queries by the name `viewName`. - **`sql`**`(query:str)` - Executes the SQL query `query` on the frames previously registered with `registerView()` and returns the result as a frame. + Executes the SQL query `query` on the frames previously registered with `registerView()` and returns the result as a frame. -### Set operations +### Set Operations We will support set operations such as **`intersect`**, **`merge`**, and **`except`**. -## How to import functions from other Daphne scripts +# Imports + +How to import functions from other Daphne scripts Example usage: -``` + +```cpp import "bar.daphne"; import "foo.daphne" as "utils"; print(bar.x); print(utils.x); ``` -------------------------------------- + +--- `UserConfig.json` now has a new field `daphnedsl_import_paths`, which maps e.g., library names to a list of paths, see example: -``` + +```json "daphnedsl_import_paths": { "default_dirs": ["test/api/cli/import/sandbox", "some/other/path"], "algorithms": ["test/api/cli/import/sandbox/algos"] } ``` + NOTE: `default_dirs` can hold many paths and it will look for the **one** specified file in each, whereas any other library names have a list consisting of **one** directory, from which **all** files will be imported (can be easily extended to multiple directories). + Example: -``` + +```cpp import "a.daphne"; import "algorithms"; print(a.x); print(algorithms.kmeans.someVar); ``` + The first import will first check if the relative path exists, then it will look for it relative to paths in `default_dirs`. If the specified file exists for more than one relative path, an error will be thrown. The second import goes to `algorithms` directory from `UserConfig` and imports all files from it. Paths from `UserConfig` get to `DaphneDSLVisitor` from `daphne.cpp` via `DaphneUserConfig`. ------------------------------------ +--- + Variable name collision resolution: Whenever we stumble upon equal prefixes (e.g., files with the same name in different directories), a parent directory of the file where conflict is detected is prepended before prefix. + Example: -``` + +```cpp import "somedir/a.daphne"; import "otherdir/a.daphne"; print(a.x); print(otherdir.a.x); ``` + NOTE: the parent directory may be prepended even though you never specified it (e.g., the import script is in the same directory as the original script). + Example: -``` + +```cpp import "somedir/a.daphne"; import "a.daphne"; @@ -71,16 +86,19 @@ print(otherdir.a.x); Libraries and aliases: Currently, the following example is allowed: -``` + +```cpp import "algorithms"; import "sandbox/b.daphne" as "algorithms"; print(algorithms.x); print(algorithms.kmeans1.someVar); ``` + Even though both prefixes will begin with `algorithms.`, the entire library content's prefix is extended with filenames. It is up to user to not confuse yourself. ------------------------------------ +--- + Cascade imports: Any variables/functions imported into the script we are currently importing will be discarded. Example import scheme: `A<-B<-C`. A imports B, B imports C. B uses some vars/functions from C, but A doesn't "see" any of C's content. 
diff --git a/doc/DaphneDSLLanguageRef.md b/doc/DaphneDSL/LanguageRef.md similarity index 95% rename from doc/DaphneDSLLanguageRef.md rename to doc/DaphneDSL/LanguageRef.md index 125130a49..9a1adeccd 100644 --- a/doc/DaphneDSLLanguageRef.md +++ b/doc/DaphneDSL/LanguageRef.md @@ -14,7 +14,7 @@ See the License for the specific language governing permissions and limitations under the License. --> -# DaphneDSL Language Reference +# Language Reference DaphneDSL is DAPHNE's domain-specific language (DSL). DaphneDSL is written in plain text files, typically ending with `.daphne` or `.daph`. @@ -27,13 +27,13 @@ Its syntax is inspired by C/Java-like languages. A simple hello-world script can look as follows: -``` +```csharp print("hello world"); ``` Assuming this script is stored in the file `hello.daphne`, it can be executed by the following command: -``` +```shell bin/daphne hello.daphne ``` @@ -47,8 +47,9 @@ Variables are used to refer to values. The following reserved keywords must not be used as identifiers: `if`, `else`, `while`, `do`, `for`, `in`, `true`, `false`, `as`, `def`, `return`, `import`, `matrix`, `frame`, `scalar`, `f64`, `f32`, `si64`, `si8`, `ui64`, `ui32`, `ui8`, `str`, `nan`, and `inf`. -*Examples* -``` +*Examples:* + +```text X y _hello123 @@ -64,25 +65,29 @@ Variables must have been assigned to before they are used in an expression. DaphneDSL differentiates *data types* and *value types*. Currently, DaphneDSL supports the following *abstract* **data types**: + - `matrix`: homogeneous value type for all cells - `frame`: a table with columns of potentially different value types - `scalar`: a single value **Value types** specify the representation of individual values. We currently support: + - floating-point numbers of various widths: `f64`, `f32` - signed and unsigned integers of various widths: `si64`, `si32`, `si8`, `ui64`, `ui32`, `ui8` - strings `str` *(currently only for scalars, support for matrix elements is still experimental)* - booleans `bool` *(currently only for scalars)* Data types and value types can be combined, e.g.: + - `matrix` is a matrix of double-precision floating point values ## Comments DaphneDSL supports single-line comments (starting with `#` or `//`) and multi-line comments (everything enclosed in `/*` and `*/`). -*Examples* -``` +*Examples:* + +```csharp # this is a comment print("Hello World!"); // this is also a comment /* comments can @@ -115,6 +120,7 @@ Furthermore, the following literals stand for special floating-point values: `na **String literals** are enclosed in quotation marks `"`. Special characters must be escaped using a backslash: + - `\n`: new line - `\t`: tab - `\"`: quotation mark @@ -123,8 +129,9 @@ Special characters must be escaped using a backslash: - `\f`: line feed - `\r`: carriage return -*Examples* -``` +*Examples*: + +```csharp "Hello World!" "line 1\nline 2\nline 3" "This is \"hello.daphne\"." @@ -136,20 +143,22 @@ A matrix literal consists of a comma-separated list of scalar literals, enclosed All scalars specified for the elements must be of the same type. Furthermore, all specified elements must be actual literals, i.e., expressions are not supported yet. The resulting matrix is always a column matrix, i.e., if *n* elements are specified, its shape is *(n x 1)*. -Note that the [built-in function](/doc/DaphneDSLBuiltins.md) `reshape` can be used to modify the shape. +Note that the [built-in function](/doc/DaphneDSL/Builtins.md) `reshape` can be used to modify the shape. 
*Examples:* -``` + +```r [1.0, 0.0, -4.0] # matrix with shape (3 x 1) reshape([1, 2, 3, 4], 1, 4) # matrix with shape (1 x 4) ``` -#### Variables +#### Variable Expressions Variables are referenced by their name. -*Examples* -``` +*Examples:* + +```text x ``` @@ -158,8 +167,9 @@ x Script arguments are named *literals* that can be passed to a DaphneDSL script. They are referenced by a dollar sign `$` followed by the argument's name. -*Examples* -``` +*Examples:* + +```r $x ``` @@ -201,12 +211,13 @@ The following table shows which combinations of inputs are allowed and which res | matrix (n x m) | matrix (1 x m) | matrix (n x m) | broadcasting of row-vector | | matrix (n x m) | matrix (n x 1) | matrix (n x m) | broadcasting of column-vector | -**(*)** *Scalar-`op`-matrix* operations are so far only supported for `+`, `-`, `*`, `/`; for `/` only if the matrix is of a floating-point value type. +**(\*)** *Scalar-`op`-matrix* operations are so far only supported for `+`, `-`, `*`, `/`; for `/` only if the matrix is of a floating-point value type. In the future, we will fully support *scalar-`op`-matrix* operations as well as row/column-matrices as the left-hand-side operands. -*Examples* -``` +*Examples:* + +```r 1.5 * X @ y + 0.001 x == 1 && y < 3.5 ``` @@ -215,8 +226,9 @@ x == 1 && y < 3.5 Parentheses can be used to manually control operator precedence. -*Examples* -``` +*Examples:* + +```r 1 * (2 + 3) ``` @@ -231,8 +243,9 @@ The rows and columns to extract can be specified independently in any of the fol Omitting the specification of rows/columns means extracting all rows/columns. -*Examples* -``` +*Examples:* + +```r X[, ] # same as X (all rows and columns) ``` @@ -243,8 +256,9 @@ This is supported for addressing rows and columns in matrices and frames. - *Single row/column position:* Extracts only the specified row/column. - *Examples* - ``` + *Examples:* + + ```r X[2, 3] # extracts the cell in row 2, column 3 as a 1 x 1 matrix ``` @@ -253,8 +267,9 @@ This is supported for addressing rows and columns in matrices and frames. The lower and upper bounds can be omitted independently of each other. In that case, they are replaced by zero and the number of rows/columns, respectively. - *Examples* - ``` + *Examples:* + + ```r X[2:5, 3] # extracts rows 2, 3, 4 of column 3 X[2, 3:] # extracts row 2 of all columns from column 3 onward X[:5, 3] # extracts rows 0, 1, 2, 3, 4 of column 3 @@ -266,8 +281,9 @@ This is supported for addressing rows and columns in matrices and frames. There are no restrictions on these positions, except that they must be in bounds. In particular, they do *not* need to be contiguous, sorted, or unique. - *Examples* - ``` + *Examples:* + + ```r X[ [5, 1, 3], ] # extracts rows 5, 1, and 3 X[, [2, 2, 2] ] # extracts column 2 three times ``` @@ -275,14 +291,16 @@ This is supported for addressing rows and columns in matrices and frames. Note that, when using matrix literals to specify the positions, a space must be left between the opening/closing bracket `[`/`]` of the indexing and that of the matrix literal, in order to avoid confusion with the indexing by bit vector. A few remarks on positions: + - Counting starts at zero. For instance, a 5 x 3 matrix has row positions 0, 1, 2, 3, and 4, and column positions 0, 1, and 2. - They must be non-negative. - They can be provided as integers or floating-point numbers (the latter are rounded down to integers). - They can be given as literals or as any expression evaluating to a suitable value. 
-*Examples* -``` +*Examples:* + +```r X[1.2, ] # same as X[1, ] X[1.9, ] # same as X[1, ] X[i, (j + 2*sum(Y)):] # expressions @@ -295,8 +313,9 @@ So far, this is only supported for addressing columns of frames. - *Single column label:* Extracts only the column with the given label. - *Examples* - ``` + *Examples:* + + ```r X[, "revenue"] # extracts the column labeled "revenue" X[100:200, "revenue"] # extracts rows 100 through 199 of the column labeled "revenue" ``` @@ -311,8 +330,9 @@ Only the rows/columns with a corresponding 1-value in the bit vector are present Note that double square brackets (`[[...]]`) must be used to distinguish indexing by bit vector from indexing by an arbitrary sequence of positions. -*Examples* -``` +*Examples:* + +```r # Assume X is a 4x3 matrix. X[[[0, 1, 1, 0], ]] # extracts rows 1 and 2 # same as X[[1, 2], ] @@ -328,6 +348,7 @@ Note that, when using a matrix literal to provide the column bit vector, there m Values can be casted to a particular type explicitly. Currently, it is possible to cast: + - between scalars of different types - between matrices of different value types - between matrix and frame @@ -336,8 +357,9 @@ Currently, it is possible to cast: Casts can either fully specify the target data *and* value type, or specify only the target data type *or* the target value type. In the latter case, the unspecified part of the type will be retained from the argument. -*Examples* -``` +*Examples:* + +```r as.scalar(x) # casts x to f64 scalar as.matrix(x) # casts x to a matrix of ui32 @@ -353,11 +375,12 @@ Note that casting to frames does not support changing the value/column type yet, #### Function calls -Function calls can address [*built-in* functions](/doc/DaphneDSLBuiltins.md) as well as [*user-defined* functions](#user-defined-functions-udfs), but the syntax is the same in both cases: +Function calls can address [*built-in* functions](/doc/DaphneDSL/Builtins.md) as well as [*user-defined* functions](#user-defined-functions-udfs), but the syntax is the same in both cases: The name of the function followed by a comma-separated list of positional parameters in parentheses. -*Examples* -``` +*Examples:* + +```r print("hello"); t(myMatrix); seq(0, 10, 2); @@ -366,11 +389,13 @@ seq(0, 10, 2); #### Conditional expression DaphneDSL supports the conditional expression with the general syntax: -``` + +```csharp condition ? then-value : else-value ``` The condition can be either a scalar or a matrix. + - *Condition is a scalar:* If the condition is `true` (when casted to boolean), then the result is the `then-value`. Otherwise, the result is the `else-value`. @@ -382,8 +407,9 @@ The condition can be either a scalar or a matrix. The `then-value` and `else-value` may also be scalars, in which case they are treated like matrices with a constant value. The result is a matrix of the same shape as the condition and the same value type as the `then-value`/`else-value`. -*Examples* -``` +*Examples:* + +```r (i > 5) ? 42.0 : -42.0 # 42.0 if i > 5, -42.0 otherwise [1, 0, 0, 1] ? [1.0, 2.0, 3.0, 4.0] : 99.9 # [1.0, 99.9, 99.9, 4.0] ``` @@ -399,8 +425,9 @@ Every expression followed by a semicolon `;` can be used as a statement. This is useful for expressions (especially function calls) which do not return a value. Nevertheless, it can also be used for expressions with one or more return values, in which case these values are ignored. 
-*Examples* -``` +*Examples:* + +```r print("hello"); # built-in function without return value 1 + 2; # value is ignored, useless but possible doSomething(); # possible return values are ignored, but the execution @@ -413,15 +440,17 @@ The return value(s) of an expression can be assigned to one (or more) variable(s **Single-assignments** are used for expressions with exactly one return value. -*Examples* -``` +*Examples:* + +```r x = 1 + 2; ``` **Multi-assignments** are used for expressions with more than one return value. -*Examples* -``` +*Examples:* + +```r evals, evecs = eigen(A); # eigen() returns two values, the (n x 1)-matrix of # eigen-values and the (n x n)-matrix of eigen-vectors # of the input matrix A. @@ -435,13 +464,15 @@ This is done by (left) indexing, whose syntax is similar to (right) indexing in Currently, left indexing is supported only for matrices. Furthermore, the rows/columns cannot be addressed by arbitrary positions lists or bit vectors (yet). -*Examples* -``` +*Examples:* + +```r X[5, 2] = [123]; # insert (1 x 1)-matrix X[10:20, 2:5] = fill(123, 10, 3); # insert (10 x 3)-matrix ``` The following conditions must be fulfilled: + - The left-hand-side variable must have been initialized. - The left-hand-side variable must be of data type matrix. - The right-hand-side expression must return a matrix. @@ -451,8 +482,9 @@ The following conditions must be fulfilled: Left indexing can be used with both single and multi-assignments. With the latter, it can be used with each variable on the left-hand side individually and independently. -*Examples* -``` +*Examples:* + +```r x, Y[3, :], Z = calculateSomething(); ``` @@ -462,8 +494,9 @@ Left indexing enables the modification of existing data objects, whereby the sem That is, if two different variables represent the same runtime data object, then left indexing on one of these variables does not have any effects on the other one. This is achieved by transparently copying the data as necessary. -*Examples* -``` +*Examples:* + +```r A = ...; # some matrix B = A; # copy-by-reference B[..., ...] = ...; # copy-on-write: changes B, but no effect on A @@ -484,7 +517,8 @@ Within a block, all variables from outside the block can be read and written. However, variables created inside a block are not visible anymore after the block. The syntax of a block statement is: -``` + +```r { statement1 statement2 @@ -492,8 +526,9 @@ The syntax of a block statement is: } ``` -*Examples* -``` +*Examples:* + +```r x = 1; { print(x); # read access @@ -507,12 +542,14 @@ print(y); # error #### If-then-else The syntax of an if-then-else statement is as follows: -``` + +```csharp if (condition) then-statement else else-statement ``` + *condition* is an expression returning a single value. If this value is `true` (when casted to value type `bool`, if necessary), the *then-statement* is executed. Otherwise, the *else-statement* is executed, *if it is present*. @@ -520,11 +557,13 @@ Note that the *else*-branch (keyword and statement) may be omitted. Furthermore, *then-statement* and *else-statement* can be block statements, to allow any number of statements in the then and else-branches. 
*Examples:* -``` + +```r if (sum(X) == 0) X = X + 1; ``` -``` + +```r if (2 * x > y) { z = z / 2; a = true; @@ -532,7 +571,8 @@ if (2 * x > y) { else z = z * 2; ``` -``` + +```r if (a) print("a"); else if (b) @@ -550,10 +590,12 @@ In the future we plan to support also parfor-loops as well as `break` and `conti For-loops are used to iterate over the elements of a sequence of integers. The syntax of a for-loop is as follows: -``` + +```r for (var in start:end[:step]) body-statement ``` + *var* must be a valid identifier and is assigned the values from *start* to *end* in increments of *step*. *start*, *end*, and *step* are expressions evaluating to a single number. *step* is optional and defaults to 1 if *end* is greater than *start*, or -1 otherwise. @@ -562,11 +604,13 @@ The *body-statement* is executed for each value in the sequence, and within the Note that the *body-statement* may be a block statement enclosing an arbitrary number of statements. *Examples:* -``` + +```csharp for(i in 1:3) print(i); # 1 2 3 ``` -``` + +```r x = 0; y = 0; for(i in 10:1:-3) { x = x + i; @@ -580,17 +624,20 @@ print(y); # 4 While loops are used to execute a (block of) statement(s) as long as an arbitrary condition holds true. The syntax of a while-loop is as follows: -``` + +```csharp while (condition) body-statement ``` + *condition* is an expression returning a single value, and is evaluated before each iteration. If this value is `true` (when casted to value type `bool`, if necessary), the *body-statement* is executed, and the loop starts anew. Otherwise, the program continues after the loop. Note that the *body-statement* may be a block statement enclosing an arbitrary number of statements. *Examples:* -``` + +```r i = 0; while(i < 10 && !converged) { A = A @ B; @@ -604,16 +651,19 @@ while(i < 10 && !converged) { Do-while-loops are a variant of while-loops, which checks the condition after each iteration. Consequently, a do-while-loop always executes at least one iteration. The syntax of a do-while-loop is as follows: -``` + +```csharp do body-statement while (condition); ``` + The semicolon at the end is optional. Note that the *body-statement* may be a block statement enclosing an arbitrary number of statements. *Examples:* -``` + +```csharp i = 5; do { A = sqrt(A); @@ -626,7 +676,7 @@ do { DaphneDSL allows users to define their own functions. The syntax of a function definition looks as follows: -``` +```csharp def funcName(paramName1[:paramType1], paramName2[:paramType2], ...) [-> returnType] { statement1 statement2 @@ -649,7 +699,8 @@ Functions must be defined in the top-level scope of a DaphneDSL script, i.e., a User-defined functions can return zero or more values. Values are returned by a `return`-statement with the following syntax: -``` + +```csharp return x; ``` @@ -658,7 +709,8 @@ Alternatively, it can be nested into if-then-else (early return), as long as it Note that multi-value returns are not fully supported yet. *Examples:* -``` + +```csharp def fib(n: si64) -> si64 { if (n <= 0) return 0; @@ -674,7 +726,8 @@ A user-defined function can be called like any other (built-in) function (see *f Note that user-defined functions returning multiple values are not fully supported yet. 
*Examples:* -``` + +```r fib(5); ``` @@ -694,6 +747,7 @@ Consistently, the types of untyped return values are infered from the parameter ## Example Scripts A few example DaphneDSL scripts can be found in: + - [scripts/algorithms/](/scripts/algorithms/) - [scripts/examples/](/scripts/examples/) - [test/api/cli/algorithms/](/test/api/cli/algorithms/) diff --git a/doc/DaphneLibAPIRef.md b/doc/DaphneLib/APIRef.md similarity index 77% rename from doc/DaphneLibAPIRef.md rename to doc/DaphneLib/APIRef.md index a3087d8f1..a458fa227 100644 --- a/doc/DaphneLibAPIRef.md +++ b/doc/DaphneLib/APIRef.md @@ -1,5 +1,5 @@ -# DaphneLib API Reference +# API Reference This document is a hand-crafted reference of the DaphneLib API. -A general introduction to [DaphneLib (DAPHNE's Python API)](/doc/DaphneLib.md) can be found in a separate document. +A general introduction to [DaphneLib (DAPHNE's Python API)](/doc/DaphneLib/Overview.md) can be found in a separate document. DaphneLib will offer numerous methods for *obtaining DAPHNE matrices and frames* as well as for *building complex computations* based on them. -Ultimately, DaphneLib will support all [DaphneDSL built-in functions](/doc/DaphneDSLBuiltins.md) on matrices and frames. +Ultimately, DaphneLib will support all [DaphneDSL built-in functions](/doc/DaphneDSL/Builtins.md) on matrices and frames. Futhermore, **we also plan to create a library of higher-level primitives** allowing users to productively implement integrated data analysis pipelines at a much higher level of abstraction. At the moment, the documentation is still rather incomplete. -However, as the methods largely map to DaphneDSL built-in functions, you can find some more information in the [List of DaphneDSL built-in functions](/doc/DaphneDSLBuiltins.md), for the time being. +However, as the methods largely map to DaphneDSL built-in functions, you can find some more information in the [List of DaphneDSL built-in functions](/doc/DaphneDSL/Builtins.md), for the time being. ## Obtaining DAPHNE Matrices and Frames ### `DaphneContext` -**Importing data from other Python libraries** +**Importing data from other Python libraries:** - **`from_numpy`**`(mat: np.array, shared_memory=True) -> Matrix` - **`from_pandas`**`(df: pd.DataFrame) -> Frame` -**Generating data in DAPHNE** +**Generating data in DAPHNE:** - **`fill`**`(arg, rows:int, cols:int) -> Matrix` - **`seq`**`(start, end, inc) -> Matrix` - **`rand`**`(rows: int, cols: int, min: Union[float, int] = None, max: Union[float, int] = None, sparsity: Union[float, int] = 0, seed: Union[float, int] = 0) -> Matrix` -**Reading files using DAPHNE's readers** +**Reading files using DAPHNE's readers:** - **`readMatrix`**`(file:str) -> Matrix` - **`readFrame`**`(file:str) -> Frame` ## Building Complex Computations -Complex computations can be built using Python operators (see [DaphneLib](/doc/DaphneLib.md)) and using DAPHNE matrix/frame/scalar methods. +Complex computations can be built using Python operators (see [DaphneLib](/doc/DaphneLib/Overview.md)) and using DAPHNE matrix/frame/scalar methods. In the following, we describe only the latter. 
### `Matrix` API Reference -**Data Generation** +**Data Generation:** - **`diagMatrix`**`()` -**Matrix dimensions** +**Matrix dimensions:** - **`ncol`**`()` - **`nrow`**`()` -**Elementwise unary** +**Elementwise unary:** - **`sqrt`**`()` -**Elementwise binary** +**Elementwise binary:** - **`max`**`(other: 'Matrix')` - **`min`**`(other: 'Matrix')` -**Aggregation** +**Aggregation:** - **`sum`**`(axis: int = None)` - **`aggMin`**`(axis: int = None)` @@ -78,46 +78,46 @@ In the following, we describe only the latter. - **`mean`**`(axis: int = None)` - **`stddev`**`(axis: int = None)` -**Reorganization** +**Reorganization:** - **`t`**`()` -**Other matrix operations** +**Other matrix operations:** - **`solve`**`(other: 'Matrix')` -**Input/output** +**Input/output:** - **`print`**`()` - **`write`**`(file: str)` ### `Frame` API Reference -**Frame dimensions** +**Frame dimensions:** - **`nrow`**`()` - **`ncol`**`()` -**Reorganization** +**Reorganization:** - **`cbind`**`(other)` - **`rbind`**`(other)` -**Extended relational algebra** +**Extended relational algebra:** - **`cartesian`**`(other)` -**Input/output** +**Input/output:** - **`print`**`()` - **`write`**`(file: str)` ### `Scalar` API Reference -**Unary operations** +**Unary operations:** - **`sqrt`**`()` -**Input/output** +**Input/output:** -- **`print`**`()` \ No newline at end of file +- **`print`**`()` diff --git a/doc/DaphneLib.md b/doc/DaphneLib/Overview.md similarity index 91% rename from doc/DaphneLib.md rename to doc/DaphneLib/Overview.md index d8d2787ad..c12da2903 100644 --- a/doc/DaphneLib.md +++ b/doc/DaphneLib/Overview.md @@ -1,5 +1,5 @@ -# DaphneLib: DAPHNE's Python API +# Overview: DAPHNE's Python API DaphneLib is a simple user-facing Python API that allows calling individual basic and higher-level DAPHNE built-in functions. -The overall design follows similar abstractions like PySpark and Dask by using lazy evaluation. When the evaluation is triggered, DaphneLib assembles and executes a [DaphneDSL](/doc/DaphneDSLLanguageRef.md) script that uses the entire DAPHNE compilation and runtime stack, including all optimizations. +The overall design follows similar abstractions like PySpark and Dask by using lazy evaluation. When the evaluation is triggered, DaphneLib assembles and executes a [DaphneDSL](/doc/DaphneDSL/LanguageRef.md) script that uses the entire DAPHNE compilation and runtime stack, including all optimizations. Users can easily mix and match DAPHNE computations with other Python libraries and plotting functionality. **DaphneLib is still in an experimental stage, feedback and bug reports via GitHub issues are highly welcome.** @@ -58,7 +58,7 @@ Up until here, no acutal computations are performed. Instead, an internal DAG (directed acyclic graph) representation of the computation is built. When calling `compute()` on the result **(5)**, the DAG is automatically optimized and executed by DAPHNE. This principle is known as *lazy evaluation*. -(Internally, a [DaphneDSL](/doc/DaphneDSLLanguageRef.md) script is created, which is sent through the entire DAPHNE compiler and runtime stack, thereby profiting from all optimizations in DAPHNE.) +(Internally, a [DaphneDSL](/doc/DaphneDSL/LanguageRef.md) script is created, which is sent through the entire DAPHNE compiler and runtime stack, thereby profiting from all optimizations in DAPHNE.) 
The result is returned as a `numpy.ndarray` (for DAPHNE matrices), as a `pandas.DataFrame` (for DAPHNE frames), or as a plain Python scalar (for DAPHNE scalars), and can then be further used in Python. The script above can be executed by: @@ -68,8 +68,10 @@ python3 scripts/examples/daphnelib/shift-and-scale.py ``` Note that there are some **temporary limitations** (which will be fixed in the future): + - `python3` must be executed from the DAPHNE base directory. - Before executing DaphneLib Python scripts, the environment variable `PYTHONPATH` must be updated by executing the following command once per session: + ```bash export PYTHONPATH="$PYTHONPATH:$PWD/src/" ``` @@ -81,35 +83,40 @@ The remainder of this document presents the core features of DaphneLib *as they DAPHNE differentiates *data types* and *value types*. Currently, DAPHNE supports the following *abstract* **data types**: + - `matrix`: homogeneous value type for all cells - `frame`: a table with columns of potentially different value types - `scalar`: a single value **Value types** specify the representation of individual values. We currently support: + - floating-point numbers of various widths: `f64`, `f32` - signed and unsigned integers of various widths: `si64`, `si32`, `si8`, `ui64`, `ui32`, `ui8` - strings `str` *(currently only for scalars, support for matrix elements is still experimental)* - booleans `bool` *(currently only for scalars)* Data types and value types can be combined, e.g.: + - `matrix` is a matrix of double-precision floating point values In DaphneLib, each node of the computation DAG has one of the types `api.python.operator.nodes.matrix.Matrix`, `api.python.operator.nodes.frame.Frame`, or `api.python.operator.nodes.scalar.Scalar`. -The type of a node determines which methods can be invoked on it (see [DaphneLib API reference](/doc/DaphneLibAPIRef.md)). +The type of a node determines which methods can be invoked on it (see [DaphneLib API reference](/doc/DaphneLib/APIRef.md)). ## Obtaining DAPHNE Matrices and Frames The `DaphneContext` offers means to obtain DAPHNE matrices and frames, which serve as the starting point for defining complex computations. More precisely, DAPHNE matrices and frames can be obtained in the following ways: + - importing data from other Python libraries (e.g., numpy and pandas) - generating data in DAPHNE (e.g., random data, constants, or sequences) - reading files using DAPHNE's readers (e.g., CSV, Matrix Market, Parquet, DAPHNE binary format) -A comprehensive list can be found in the [DaphneLib API reference](/doc/DaphneLibAPIRef.md#daphnecontext). +A comprehensive list can be found in the [DaphneLib API reference](/doc/DaphneLib/APIRef.md#daphnecontext). ## Building Complex Computations Based on DAPHNE matrices/frames/scalars and Python scalars, complex expressions can be defined by + - Python operators - DAPHNE matrix/frame/scalar methods @@ -143,22 +150,24 @@ The following table shows which combinations of inputs are allowed and which res | matrix (n x m) | matrix (1 x m) | matrix (n x m) | broadcasting of row-vector | | matrix (n x m) | matrix (n x 1) | matrix (n x m) | broadcasting of column-vector | -**(*)** *Scalar-`op`-matrix* operations are so far only supported for `+`, `-`, `*`, `/`; for `/` only if the matrix is of a floating-point value type. +**(\*)** *Scalar-`op`-matrix* operations are so far only supported for `+`, `-`, `*`, `/`; for `/` only if the matrix is of a floating-point value type. 
In the future, we will fully support *scalar-`op`-matrix* operations as well as row/column-matrices as the left-hand-side operands. -*Examples* -``` +*Examples:* + +```r 1.5 * X @ y + 0.001 ``` ### Matrix/Frame/Scalar Methods DaphneLib's classes `Matrix`, `Frame`, and `Scalar` offer a range of methods to call DAPHNE built-in functions. -A comprehensive list can be found in the [DaphneLib API reference](/doc/DaphneLibAPIRef.md#building-complex-computations). +A comprehensive list can be found in the [DaphneLib API reference](/doc/DaphneLib/APIRef.md#building-complex-computations). -*Examples* -``` +*Examples:* + +```r X.t() X.sqrt() X.cbind(Y) @@ -168,17 +177,18 @@ X.cbind(Y) DaphneLib will support efficient data exchange with other well-known Python libraries, in both directions. The data transfer from other Python libraries to DaphneLib can be triggered through the `from_...()` methods of the `DaphneContext` (e.g., `from_numpy()`). -A comprehensive list of these methods can be found in the [DaphneLib API reference](/doc/DaphneLibAPIRef.md#daphnecontext). +A comprehensive list of these methods can be found in the [DaphneLib API reference](/doc/DaphneLib/APIRef.md#daphnecontext). The data transfer from DaphneLib back to Python happens during the call to `compute()`. If the result of the computation in DAPHNE is a matrix, `compute()` returns a `numpy.ndarray`; if the result is a frame, it returns a `pandas.DataFrame`; and if the result is a scalar, it returns a plain Python scalar. So far, DaphneLib can exchange data with numpy (via shared memory) and pandas (via CSV files). Enabling data exchange with TensorFlow and PyTorch is on our agenda. -Furthermore, we are working on making the data exchange more efficient in general. +Furthermore, we are working on making the data exchange more efficient in general. ### Data Exchange with numpy *Example:* + ```python from api.python.context.daphne_context import DaphneContext import numpy as np @@ -201,12 +211,16 @@ X = X + 100.0 print("\nResult of adding 100 to each value, back in Python:") print(X.compute()) ``` + *Run by:* -``` + +```shell python3 scripts/examples/daphnelib/data-exchange-numpy.py ``` + *Output:* -``` + +```text How DAPHNE sees the data from numpy: DenseMatrix(2x4, double) 0 1 2 3 @@ -215,12 +229,12 @@ DenseMatrix(2x4, double) Result of adding 100 to each value, back in Python: [[100. 101. 102. 103.] [104. 105. 106. 107.]] - ``` ### Data Exchange with pandas *Example:* + ```python from api.python.context.daphne_context import DaphneContext import pandas as pd @@ -243,12 +257,16 @@ F = F.rbind(F) print("\nResult of appending the frame to itself, back in Python:") print(F.compute()) ``` + *Run by:* -``` + +```shell python3 scripts/examples/daphnelib/data-exchange-pandas.py ``` + *Output:* -``` + +```text How DAPHNE sees the data from pandas: Frame(3x2, [a:int64_t, b:double]) 1 1.1 @@ -274,4 +292,4 @@ We plan to fix all of these limitations in the future. - Using DAPHNE's command-line arguments to influence its behavior is not supported yet. - Many DaphneDSL built-in functions are not represented by DaphneLib methods yet. - Complex control flow (if-then-else, loops, functions) are not supported yet. Python control flow statements are of limited applicability for DaphneLib. -- High-level primitives for integrated data analysis pipelines, which are implemented in DaphneDSL, cannot be called from DaphneLib yet. 
\ No newline at end of file +- High-level primitives for integrated data analysis pipelines, which are implemented in DaphneDSL, cannot be called from DaphneLib yet. diff --git a/doc/Deploy.md b/doc/Deploy.md index 2a175d77e..eee8f4739 100644 --- a/doc/Deploy.md +++ b/doc/Deploy.md @@ -1,11 +1,11 @@ -# DAPHNE Packaging, Distributed Deployment, and Management +# Deploying + +DAPHNE Packaging, Distributed Deployment, and Management ## Overview This file explains the deployment of the **Daphne system**, on HPC with SLURM or manually through SSH, and highlights the excerpts from descriptions of functionalities in [deploy/](/deploy/) directory (mostly [deploy-distributed-on-slurm.sh](/deploy/deploy-distributed-on-slurm.sh)): + - compilation of the Singularity image, - compilation of Daphne (and the Daphne DistributedWorker) within the Singularity image, - packaging compiled Daphne, @@ -39,6 +42,7 @@ through an environmental variable. ## Deploying without Slurm support **`deployDistributed.sh`** can be used to manually connect to a list of machines and remotely start up workers, get status of running workers or terminate distributed worker processes. This script depends only on an SSH client/server and does not require any use of a resource management tool (e.g. SLURM). With this script you can: + - build and deploy DistributedWorkers to remote machines - start workers - check status of running workers @@ -53,6 +57,7 @@ Ssh username must be specified inside the script. For now the script assumes all Usage example: ```bash +# deploy distributed $ ./deployDistributed.sh --help $ ./deployDistributed.sh --deploy --pathToBuild /path/to/dir --peers localhost:5000,localhost:5001 $ ./deployDistributed.sh -r # (Uses default peers and path/to/build/ to start workers) @@ -67,8 +72,7 @@ Building the Daphne system (to be later deployed on distributed nodes) can be do This explains how to set up the Distributed Workers on a HPC platform, and it also briefly comments on what to do afterwards (how to run, manage, stop, and clean it). Commands, with their parameters and arguments, are hence described below for deployment with [deploy-distributed-on-slurm.sh](/deploy/deploy-distributed-on-slurm.sh). - -``` +```shell Usage: deploy-distributed-on-slurm.sh Start the DAPHNE distributed deployment on remote machines using Slurm. @@ -114,121 +118,125 @@ Logs can be found at [pathToBuild]/logs. ``` ### Short Examples + The following list presents few examples about how to use the [deploy-distributed-on-slurm.sh](/deploy/deploy-distributed-on-slurm.sh) command. These comprise more hands-on documentation about deployment, including tutorial-like explanation examples about how to package, distributively deploy, manage, and execute workloads using DAPHNE. 1. Builds the Singularity image and uses it to compile the build directory codes, then packages it. -```shell -./deploy-distributed-on-slurm.sh singularity && ./deploy-distributed-on-slurm.sh build && ./deploy-distributed-on-slurm.sh package -``` + ```shell + ./deploy-distributed-on-slurm.sh singularity && ./deploy-distributed-on-slurm.sh build && ./deploy-distributed-on-slurm.sh package + ``` -2. Transfers a package to the target platform through OpenSSH, using login node HPC, user hpc, and identify key hpc.pub. -```shell -./deploy-distributed-on-slurm.sh --login HPC --user hpc -i ~/.ssh/hpc.pub transfer -``` +1. Transfers a package to the target platform through OpenSSH, using login node HPC, user hpc, and identify key hpc.pub. 
+ ```shell + ./deploy-distributed-on-slurm.sh --login HPC --user hpc -i ~/.ssh/hpc.pub transfer + ``` -3. Using login node HPC, accesses the target platform and starts workers on remote machines. -```shell -./deploy-distributed-on-slurm.sh -l HPC start -``` +1. Using login node HPC, accesses the target platform and starts workers on remote machines. + ```shell + ./deploy-distributed-on-slurm.sh -l HPC start + ``` -4. Runs one request (script called example-time.daphne) on the deployment using 1024 cores, login node HPC, and default OpenSSH configuration. -```shell -./deploy-distributed-on-slurm.sh -l HPC -n 1024 run example-time.daphne -``` +1. Runs one request (script called example-time.daphne) on the deployment using 1024 cores, login node HPC, and default OpenSSH configuration. + ```shell + ./deploy-distributed-on-slurm.sh -l HPC -n 1024 run example-time.daphne + ``` -5. Executes one request (DaphneDSL script input from standard input) at a running deployed platform, using default singularity/srun configurations. -```shell -./deploy-distributed-on-slurm.sh run -``` +1. Executes one request (DaphneDSL script input from standard input) at a running deployed platform, using default singularity/srun configurations. + ```shell + ./deploy-distributed-on-slurm.sh run + ``` -6. Deploys once at the target platform through OpenSSH using default login node (localhost), then cleans. -```shell -./deploy-distributed-on-slurm.sh deploy -n 10 -``` +1. Deploys once at the target platform through OpenSSH using default login node (localhost), then cleans. + ```shell + ./deploy-distributed-on-slurm.sh deploy -n 10 + ``` -7. Starts workers at a running deployed platform using custom srun arguments (2 hours dual-core with 10G memory). -```shell -./deploy-distributed-on-slurm.sh workers -R="-t 120 --mem-per-cpu=10G --cpu-bind=cores --cpus-per-task=2" -``` +1. Starts workers at a running deployed platform using custom srun arguments (2 hours dual-core with 10G memory). + ```shell + ./deploy-distributed-on-slurm.sh workers -R="-t 120 --mem-per-cpu=10G --cpu-bind=cores --cpus-per-task=2" + ``` -8. Executes a request with custom srun arguments (30 minutes single-core). -```shell -./deploy-distributed-on-slurm.sh run -R="--time=30 --cpu-bind=cores --nodes=1 --ntasks-per-node=1 --cpus-per-task=1" -``` +1. Executes a request with custom srun arguments (30 minutes single-core). + ```shell + ./deploy-distributed-on-slurm.sh run -R="--time=30 --cpu-bind=cores --nodes=1 --ntasks-per-node=1 --cpus-per-task=1" + ``` -9. Example request job from a pipe. -```shell -cat ../scripts/examples/hello-world.daph | ./deploy-distributed-on-slurm.sh run -``` +1. Example request job from a pipe. + ```shell + cat ../scripts/examples/hello-world.daph | ./deploy-distributed-on-slurm.sh run + ``` ### Scenario Usage Example Here is a scenario usage as a longer example demo. 1. Fetch the code from the latest GitHub code repository. 
-```shell -function compile() { - git clone --recursive git@github.com:daphne-eu/daphne.git 2>&1 | tee daphne-$(date +%F-%T).log - cd daphne/deploy - ./deploy-distributed-on-slurm.sh singularity # creates the Singularity container image - ./deploy-distributed-on-slurm.sh build # Builds the daphne codes using the container -} -compile -``` + ```shell + function compile() { + git clone --recursive git@github.com:daphne-eu/daphne.git 2>&1 | tee daphne-$(date +%F-%T).log + cd daphne/deploy + ./deploy-distributed-on-slurm.sh singularity # creates the Singularity container image + ./deploy-distributed-on-slurm.sh build # Builds the daphne codes using the container + } + compile + ``` -2. Package the built targets (binaries) to packet file `daphne-package.tgz`. -```shell -./deploy-distributed-on-slurm.sh package -``` +1. Package the built targets (binaries) to packet file `daphne-package.tgz`. + ```shell + ./deploy-distributed-on-slurm.sh package + ``` -3. Transfer the packet file `daphne-package.tgz` to `HPC` (Slurm) with OpenSSH key `~/.ssh/hpc.pub` and unpack it. -```shell -./deploy-distributed-on-slurm.sh --login HPC --user $USER -i ~/.ssh/hpc.pub transfer -``` -E.g., for EuroHPC Vega, use the instance, if your username matches the one at Vega and the key is `~/.ssh/hpc.pub`: -```shell -./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub transfer -``` +1. Transfer the packet file `daphne-package.tgz` to `HPC` (Slurm) with OpenSSH key `~/.ssh/hpc.pub` and unpack it. + ```shell + ./deploy-distributed-on-slurm.sh --login HPC --user $USER -i ~/.ssh/hpc.pub transfer + ``` -4. Start the workers from the local computer by logging into the HPC login node: -```shell -./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub start -``` + E.g., for EuroHPC Vega, use the instance, if your username matches the one at Vega and the key is `~/.ssh/hpc.pub`: + ```shell + ./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub transfer + ``` -5. Starting a main target on the HPC (Slurm) and connecting it with the started workers, to execute payload from the stream. -```shell -cat ../scripts/examples/hello-world.daph | ./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub run -``` +1. Start the workers from the local computer by logging into the HPC login node: + ```shell + ./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub start + ``` -6. Starting a main target on the HPC (Slurm) and connecting it with the started workers, to execute payload from a file. -```shell -./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub run example-time.daphne -``` +1. Starting a main target on the HPC (Slurm) and connecting it with the started workers, to execute payload from the stream. + ```shell + cat ../scripts/examples/hello-world.daph | ./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub run + ``` -7. Stopping all workers on the HPC (Slurm). -```shell -./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub stop -``` +1. Starting a main target on the HPC (Slurm) and connecting it with the started workers, to execute payload from a file. + ```shell + ./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub run example-time.daphne + ``` -8. Cleaning the uploaded targets from the HPC login node. 
-```shell -./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub clean -``` +1. Stopping all workers on the HPC (Slurm). + + ```shell + ./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub stop + ``` + +1. Cleaning the uploaded targets from the HPC login node. + + ```shell + ./deploy-distributed-on-slurm.sh --login login.vega.izum.si --user $USER -i ~/.ssh/hpc.pub clean + ``` diff --git a/doc/DistributedRuntime.md b/doc/DistributedRuntime.md index 6ed6be72d..f68f35a05 100644 --- a/doc/DistributedRuntime.md +++ b/doc/DistributedRuntime.md @@ -1,7 +1,7 @@ -# Running DAPHNE on the Distributed Runtime +# Distributed Runtime + +Running DAPHNE on the Distributed Runtime ## Background -Daphne supports execution in a distributed fashion. Utilizing the Daphne Distributed Runtime -does not require any changes to the DaphneDSL script. -Similar to the local vectorized engine ([here, section 4](https://daphne-eu.eu/wp-content/uploads/2022/08/D2.2-Refined-System-Architecture.pdf)), the compiler automatically fuses operations and -creates pipelines for the distributed runtime, which then -uses multiple distributed nodes (workers) that work on their local data, while a main node, the coordinator, is responsible -for transferring the data and code to be executed. As mentioned above, changes at DaphneDSL -code are not needed, however the user is required to start the workers, either manually or -using an -HPC tool as SLURM (scripts that start the workers locally or remotely, natively or not, can be found [here](/deploy)). +Daphne supports execution in a distributed fashion. Utilizing the Daphne Distributed Runtime does not require any changes to the DaphneDSL script. +Similar to the local vectorized engine ([here, section 4](https://daphne-eu.eu/wp-content/uploads/2022/08/D2.2-Refined-System-Architecture.pdf)), the compiler automatically fuses operations and creates pipelines for the distributed runtime, which then uses multiple distributed nodes (workers) that work on their local data, while a main node, the coordinator, is responsible for transferring the data and code to be executed. +As mentioned above, changes at DaphneDSL code are not needed, however the user is required to start the workers, either manually or using an HPC tool as SLURM (scripts that start the workers locally or remotely, natively or not, can be found [here](/deploy)). -## Scope +## Scope This document focuses on: + - how to start distributed workers - executing Daphne scripts on the distributed runtime -- DAPHNE's distributed runtime has two different backends. This page explains how things work with the **gRPC backend**. -A brief introduction to the other backend using **OpenMPI** can be viewed in [this document](MPI-Usage.md). +- DAPHNE's distributed runtime has two different backends. This page explains how things work with the **gRPC backend**. +A brief introduction to the other backend using **OpenMPI** can be viewed in [this document](MPI-Usage.md). ## Build the Daphne prototype -First you need to build the Daphne prototype. This doc assumes that you already built Daphne and can run it locally. If -you need help building or running Daphne see [here](/doc/GettingStarted.md). +First you need to build the Daphne prototype. This doc assumes that you already built Daphne and can run it locally. If you need help building or running Daphne see [here](/doc/GettingStarted.md). 
## Building the Distributed Worker The Daphne distributed worker is a different executable which can be build using the build-script and providing the `--target` argument: + ```bash ./build.sh --target DistributedWorker ``` @@ -58,8 +55,7 @@ Before executing Daphne on the distributed runtime, worker nodes must first be u ./bin/DistributedWorker IP:PORT ``` -There are [scripts](/deploy) that automate this task and can help running multiple workers at once -locally or even utilizing tools (like SLURM) in HPC environments. +There are [scripts](/deploy) that automate this task and can help running multiple workers at once locally or even utilizing tools (like SLURM) in HPC environments. Each worker can be left running and reused for multiple scripts and pipeline executions (however, for now they might run into memory issues, see **Limitations** section below). @@ -67,8 +63,8 @@ Each worker can be terminated by sending a `SIGINT` (Ctrl+C) or by using the scr ## Set up environmental variables -After setting up the workers, before we run Daphne we need to specify which IPs -and ports the workers are listening too. For now we use an environmental variable called +After setting up the workers, before we run Daphne we need to specify which IPs +and ports the workers are listening too. For now we use an environmental variable called `DISTRIBUTED_WORKERS` where we list IPs and ports of the workers separated by a comma. ```bash @@ -88,7 +84,7 @@ Now that we have all workers up and running and the environmental variable is se ./bin/daphne --distributed ./example.script ``` -For now only asynchronous-gRPC is implemented as a distributed backend and selection is hardcoded [here](/src/runtime/distributed/coordinator/kernels/DistributedWrapper.h#L73). +For now only asynchronous-gRPC is implemented as a distributed backend and selection is hardcoded [here](/src/runtime/distributed/coordinator/kernels/DistributedWrapper.h#L73). @@ -96,12 +92,14 @@ TODO: PR #436 provides support for MPI and implements a cli argument for selecti ## Example On one terminal with start up a Distributed Worker: + ```bash $./bin/DistributedWorker localhost:5000 Started Distributed Worker on `localhost:5000` ``` On another terminal we set the environment variable and execute script [`distributed.daph`](/scripts/examples/distributed.daph): + ```bash $ export DISTRIBUTED_WORKERS=localhost:5000 $ ./bin/daphne --distributed ./scripts/example/distributed.daph @@ -115,13 +113,12 @@ Distributed Runtime is still under development and currently there are various l created and multiple operations are fused together (more [here - section 4](https://daphne-eu.eu/wp-content/uploads/2022/08/D2.2-Refined-System-Architecture.pdf)). This causes some limitations related to pipeline creation (e.g. [not supporting pipelines with different result outputs](/issues/397) or pipelines with no outputs). - For now distributed runtime only supports `DenseMatrix` types and value types `double` - `DenseMatrix` (issue [#194](/issues/194)). - A Daphne pipeline input might exist multiple times in the input array. For now this is not supported. In the future similar pipelines will simply omit multiple pipeline inputs and each one will be provided only once. -- Garbage collection at worker (node) level is not implemented yet. This means that after some time -the workers can fill up their memory completely, requiring a restart. - +- Garbage collection at worker (node) level is not implemented yet. 
This means that after some time the workers can fill up their memory completely, requiring a restart.

## What Next?

You might want to have a look at
+
- the [distributed runtime development guideline](/doc/development/ExtendingDistributedRuntime.md)
- the [contribution guidelines](/CONTRIBUTING.md)
- the [open distributed related issues](https://github.com/daphne-eu/daphne/issues?q=is%3Aopen+is%3Aissue+label%3ADistributed)
diff --git a/doc/FPGAconfiguration.md b/doc/FPGAconfiguration.md
index 8cd702d0d..7f3239edb 100644
--- a/doc/FPGAconfiguration.md
+++ b/doc/FPGAconfiguration.md
@@ -14,49 +14,46 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

-# FPGA configuration for usage in DAPHNE
+# FPGA Configuration
+
+FPGA configuration for usage in DAPHNE

-### System requirments
+## System Requirements

Daphne build script for FPGA kernels support requires additional QUARTUSDIR system variable definition.
Example command is presented in fpga-build-env.sh or in the following command:

-export QUARTUSDIR=/opt/intel/intelFPGA_pro/21.4
+`export QUARTUSDIR=/opt/intel/intelFPGA_pro/21.4`

-To build the Daphne with the FPGA support -fpgaopencl flag has to be used:
-
-    ./build.sh --fpgaopenc
+To build DAPHNE with FPGA support, the `--fpgaopencl` flag has to be used:
+`./build.sh --fpgaopencl`

To run developed or precompiled, included in Daphne repository FPGA OpenCL kernels an installedand configured FPGA device is required.
-Our example kernels have been tested using Intel(R) PAC D5005 card (https://www.intel.com/content/www/us/en/products/sku/193921/intel-fpga-pac-d5005/specifications.html)
+Our example kernels have been tested using [Intel(R) PAC D5005 card](https://www.intel.com/content/www/us/en/products/sku/193921/intel-fpga-pac-d5005/specifications.html)

-DAPHNE contains some example linear algebra kernels developed using T2SP framework(https://github.com/IntelLabs/t2sp/blob/master/README.md).
-Example precompiled FPGA kernels can be usedon DAPHNE DSL description level.
-To prepare the system for the precompiled FPGA kernels some FPGA and OpenCL system variables are required.
-The easiest way to set up required varables is to use the init_opencl.sh script from installed Intel(R) Quartus sowtware or from the
-Intel(R) OpenCL RTE or Intel(R) OpenCL SDK packages.
+DAPHNE contains some example linear algebra kernels developed using the [T2SP framework](https://github.com/IntelLabs/t2sp/blob/master/README.md).
+Example precompiled FPGA kernels can be used on the DAPHNE DSL description level.
+To prepare the system for the precompiled FPGA kernels, some FPGA and OpenCL system variables are required.
+The easiest way to set up the required variables is to use the init_opencl.sh script from the installed Intel(R) Quartus software or from the
+Intel(R) OpenCL RTE or Intel(R) OpenCL SDK packages.
Example script usage:

-source /opt/intel/intelFPGA_pro/21.4/hld/init_opencl.sh
-
-For additional details please look into https://www.intel.com/content/www/us/en/docs/programmable/683550/18-1/standard-edition-getting-started-guide.html
-or https://www.intel.com/content/www/us/en/software/programmable/sdk-for-opencl/overview.html.
+`source /opt/intel/intelFPGA_pro/21.4/hld/init_opencl.sh`
+
+For additional details, please look into the [Intel guide](https://www.intel.com/content/www/us/en/docs/programmable/683550/18-1/standard-edition-getting-started-guide.html)
+or the [SDK for OpenCL](https://www.intel.com/content/www/us/en/software/programmable/sdk-for-opencl/overview.html).
-### Precompiled FPGA kernels usage +### Precompiled FPGA Kernels To use a precompiled FPGA kernel a FPGA image is required (*.aocx). FPGA device has to programmed with particular image which contains required kernel implementation. Example FPGA programming command using example FPGA image: - aocl program acl0 src/runtime/local/kernels/FPGAOPENCL/bitstreams/sgemm.aocx - +`aocl program acl0 src/runtime/local/kernels/FPGAOPENCL/bitstreams/sgemm.aocx` Additionally the BITSTREAM variable has to be defind in the system. Please look into the following example: - export BITSTREAM=src/runtime/local/kernels/FPGAOPENCL/bitstreams/sgemm.aocx +`export BITSTREAM=src/runtime/local/kernels/FPGAOPENCL/bitstreams/sgemm.aocx` When another FPGA image contains implementation for another required computational kernel then FPGA device has to be reprogrammed and BITSTREAM variable value has to be changed. - diff --git a/doc/FileMetaDataFormat.md b/doc/FileMetaDataFormat.md index c951fc039..aa43d1c72 100644 --- a/doc/FileMetaDataFormat.md +++ b/doc/FileMetaDataFormat.md @@ -14,20 +14,22 @@ See the License for the specific language governing permissions and limitations under the License. --> -## Reading and writing (meta) data in Daphne +# Read and Write Data -When loading data with ``read()`` in a DaphneDSL script, the system expects a file with the same file name in the same -directory as the data file with an additional extension ``.meta``. This file contains a description of meta data stored +Reading and writing (meta) data in Daphne. + +When loading data with ``read()`` in a DaphneDSL script, the system expects a file with the same file name in the same +directory as the data file with an additional extension ``.meta``. This file contains a description of meta data stored in JSON format. -There are two slightly varying ways of specifying meta data depending on whether there is a schema for the columns (e.g., -a data frame - the corresponding C++ type is the Frame class) or not (this data can currently (as of version 0.1) be -loaded as DenseMatrix or CSRMatrix where VT is the value type template parameter). +There are two slightly varying ways of specifying meta data depending on whether there is a schema for the columns (e.g., +a data frame - the corresponding C++ type is the Frame class) or not (this data can currently (as of version 0.1) be +loaded as `DenseMatrix` or `CSRMatrix` where `VT` is the value type template parameter). + +If data is written from a DaphneDSL script via ``write()``, the meta data file will be written to the corresponding ``filename.meta``. -If data is written from a DaphneDSL script via ``write()``, the meta data file will be written to the corresponding -``filename.meta``. 
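+
+The same ``.meta`` convention should also hold when files are read through DaphneLib's ``readMatrix()``/``readFrame()`` (see the DaphneLib API reference), since these use DAPHNE's readers. A minimal sketch, where the file name ``example.csv`` is a made-up placeholder that would have to exist together with a matching ``example.csv.meta`` (e.g., like the matrix example further below):
+
+```python
+from api.python.context.daphne_context import DaphneContext
+
+dc = DaphneContext()
+
+# DAPHNE looks for ./example.csv.meta next to ./example.csv to learn the
+# shape and value type before reading the actual data.
+X = dc.readMatrix("example.csv")
+print(X.sum().compute())
+```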
+## Currently supported JSON fields -### Currently supported JSON fields: | Name | Expected Data | Allowed values | |-------------|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | numRows | Integer | # number of rows | @@ -37,16 +39,19 @@ If data is written from a DaphneDSL script via ``write()``, the meta data file w | schema | JSON | nested elements of "label" and "valueType" fields | | label | String | column name/header (optional, may be empty string "") | - +## Matrix Example -### Simple matrix example: The example below describes a 2 by 4 dense matrix of double precision values. -##### CSV Data -``` + +### Matrix CSV + +```text -0.1,-0.2,0.1,0.2 3.14,5.41,6.22216,5 ``` -#### Metadata + +### Matrix Metadata + ```json { "numRows": 2, @@ -56,17 +61,20 @@ The example below describes a 2 by 4 dense matrix of double precision values. } ``` +## Data Frame Example -### Simple example of a data frame The example below describes a 2 by 2 Frame with signed integers in the first columns named foo and double precision values in the second column named bar. -##### CSV Data -``` +### Data Frame CSV + +```text 1,0.5 2,1.0 ``` -#### Metadata + +### Data Frame Metadata + ```json { "numRows": 2, @@ -84,7 +92,8 @@ values in the second column named bar. } ``` -### Example meta data of a Frame with default valueType +#### Data Frame Meta Data with Default ValueType + ```json { "numRows": 5, @@ -107,7 +116,8 @@ values in the second column named bar. } ``` -### Example meta data of a Frame with empty labels +### Data Frame Meta Data with Empty Labels + ```json { "numRows": 5, diff --git a/doc/GettingStarted.md b/doc/GettingStarted.md index 7a8bbdaf8..98d02b0f0 100644 --- a/doc/GettingStarted.md +++ b/doc/GettingStarted.md @@ -18,7 +18,7 @@ limitations under the License. This document summarizes everything you need to know to get started with using or extending the DAPHNE system. -### System Requirements +## System Requirements Please ensure that your development system meets the following requirements before trying to build the system. @@ -26,7 +26,7 @@ Please ensure that your development system meets the following requirements befo You can view the version numbers as an orientation rather than a strict requirement. Newer versions should work as well, older versions might work as well. -##### Operating system +### Operating system | OS | distribution/version known to work (*) | Comment | |------------|----------------------------------------|----------------------------------------------------------------------------| @@ -35,11 +35,12 @@ Newer versions should work as well, older versions might work as well. | GNU/Linux | Ubuntu 18.04 | Used with Intel PAC D5005 FPGA, custom toolchain needed | | MS Windows | 10 Build 19041, 11 | Should work in Ubuntu WSL, using the provided Docker images is recommended | -##### Windows +#### Windows + Installing WSL and Docker should be straight forward using the documentation proveded by [Microsoft](https://learn.microsoft.com/en-us/windows/wsl/). On an installed WSL container launching DAPHNE via Docker (see below) should work the same way as in a native installation. 
-##### Software +### Software | tool/lib | version known to work (*) | comment | |--------------------------------------|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------| @@ -65,13 +66,13 @@ launching DAPHNE via Docker (see below) should work the same way as in a native | OneAPI SDK | 2022.x | Optional for OneAPI ops | | Intel FPGA SDK or OneAPI FPGA Add-On | 2022.x | Optional for FPGAOPENCL ops | -##### Hardware +### Hardware - - about 7.5 GB of free disk space to build from source (mostly due to dependencies) - * Optional: - * NVidia GPU for CUDA ops (tested on Pascal and newer architectures); 8GB for CUDA SDK - * Intel GPU for OneAPI ops (tested on Coffeelake graphics); 23 GB for OneAPI - * Intel FPGA for FPGAOPENCL ops (tested on PAC D5005 accelerator); 23 GB for OneAPI +- about 7.5 GB of free disk space to build from source (mostly due to dependencies) +- Optional: + - NVidia GPU for CUDA ops (tested on Pascal and newer architectures); 8GB for CUDA SDK + - Intel GPU for OneAPI ops (tested on Coffeelake graphics); 23 GB for OneAPI + - Intel FPGA for FPGAOPENCL ops (tested on PAC D5005 accelerator); 23 GB for OneAPI ### Obtaining the Source Code @@ -108,17 +109,22 @@ Simply build the system using the build-script without any arguments: When you do this the first time, or when there were updates to the LLVM submodule, this will also download and build the third-party material, which might increase the build time significantly. Subsequent builds, e.g., when you changed something in this repository, will be much faster. -If the build fails in between (e.g., due to missing packages), multiple build directories (e.g., daphne, antlr, llvm) +If the build fails in between (e.g., due to missing packages), multiple build directories (e.g., daphne, antlr, llvm) require cleanup. To only remove build output use the following two commands: + ```bash ./build.sh --clean ./build.sh --cleanDeps ``` + If you want to remove downloaded and extracted artifacts, use this: + ```bash ./build.sh --cleanCache ``` + For convenience, you can call the following to remove them all. + ```bash ./build.sh --cleanAll ``` @@ -127,9 +133,9 @@ See [this page](/doc/development/BuildingDaphne.md) for more information. ### Setting up the environment -As DAPHNE uses shared libraries, these need to be found by the operating system's loader to link them at runtime. -Since most DAPHNE setups will not end up in one of the standard directories (e.g., `/usr/local/lib`), environment variables -are a convenient way to set everything up without interfering with system installations (where you might not even have +As DAPHNE uses shared libraries, these need to be found by the operating system's loader to link them at runtime. +Since most DAPHNE setups will not end up in one of the standard directories (e.g., `/usr/local/lib`), environment variables +are a convenient way to set everything up without interfering with system installations (where you might not even have administrative privileges to do so). ```bash @@ -153,9 +159,9 @@ We use [catch2](https://github.com/catchorg/Catch2) as the unit test framework. ### Running the DAPHNE system -Write a little DaphneDSL script or use [`scripts/examples/hello-world.daph`](../scripts/examples/hello-world.daph)... +Write a little DaphneDSL script or use [`scripts/examples/hello-world.daph`](/scripts/examples/hello-world.daph)... 
-``` +```csharp x = 1; y = 2; print(x + y); @@ -166,36 +172,40 @@ print(m + m); print(t(m)); ``` -... and execute it as follows: `bin/daphne scripts/examples/hello-world.daph` (This command works if Daphne is run +... and execute it as follows: `bin/daphne scripts/examples/hello-world.daph` (This command works if Daphne is run after building from source. Omit "build" in the path to the Daphne binary if executed from the binary distribution). -Optionally flags like ``--cuda`` can be added after the daphne command and before the script file to activate support -for accelerated ops (see [software requirements](#software) above and [build instructions](development/BuildingDaphne.md)). +Optionally flags like ``--cuda`` can be added after the daphne command and before the script file to activate support +for accelerated ops (see [software requirements](#software) above and [build instructions](development/BuildingDaphne.md)). For further flags that can be set at runtime to activate additional functionality, run ``daphne --help``. -### Building and running with containers [Alternative path for building and running the system and the tests] -If one wants to avoid installing dependencies and avoid conflicting with his/her existing installed libraries, one may +### Building and Running with Containers [Alternative path for building and running the system and the tests] + +If one wants to avoid installing dependencies and avoid conflicting with his/her existing installed libraries, one may use containers. -- you need to install Docker or Singularity: Docker version 20.10.2 or higher | Singularity version 3.7.0-1.el7 or + +- you need to install Docker or Singularity: Docker version 20.10.2 or higher | Singularity version 3.7.0-1.el7 or higher are sufficient - you can use the provided docker files and scripts to create and run DAPHNE. **A full description on containers is available in the [containers](containers) subdirectory.** +The following recreates all images provided by [daphneeu](https://hub.docker.com/u/daphneeu) -The following recreates all images provided by [daphneeu](https://hub.docker.com/u/daphneeu) ```bash cd container ./build-containers.sh ``` -Running in an interactive container can be done with this run script, which takes care of mounting your +Running in an interactive container can be done with this run script, which takes care of mounting your current directory and handling permissions: + ```bash # please customize this script first ./containers/run-docker-example.sh ``` -For more about building and running with containers, refer (once again) to the directory `containers/` and its + +For more about building and running with containers, refer (once again) to the directory `containers/` and its [README.md](/containers/README.md). For documentation about using containers in conjunction with our cluster deployment scripts, refer to [Deploy.md](/doc/Deploy.md). 
@@ -207,18 +217,19 @@ On the top-level, there are the following directories:

- `bin`: after compilation, generated binaries will be placed here (e.g., daphne)
- `build`: temporary build output
-- [`containers`:](containers) scripts and configuration files to get/build/run with Docker or Singularity containers
-- [`deploy`:](deploy) shell scripts to ease deployment in SLURM clusters
-- [`doc`:](doc) documentation written in markdown (e.g., what you are reading at the moment)
+- [`containers`:](/containers) scripts and configuration files to get/build/run with Docker or Singularity containers
+- [`deploy`:](/deploy) shell scripts to ease deployment in SLURM clusters
+- [`doc`:](/doc) documentation written in markdown (e.g., what you are reading at the moment)
- `lib`: after compilation, generated library files will be placed here (e.g., libAllKernels.so, libCUDAKernels.so, ...)
-- [`scripts`:](scripts) a collection of algorithms and examples written in DAPHNE's own domain specific language ([DaphneDSL](DaphneDSLLanguageRef.md))
-- [`src`:](src) the actual source code, subdivided into the individual components of the system
-- [`test`:](test) test cases
+- [`scripts`:](/scripts) a collection of algorithms and examples written in DAPHNE's own domain specific language ([DaphneDSL](/doc/DaphneDSL/LanguageRef.md))
+- [`src`:](/src) the actual source code, subdivided into the individual components of the system
+- [`test`:](/test) test cases
- `thirdparty`: required external software

### What Next?

You might want to have a look at
-- the [documentation](/doc)
+
+- the [documentation](/doc/)
- the [contribution guidelines](/CONTRIBUTING.md)
- the [open issues](https://github.com/daphne-eu/daphne/issues)
diff --git a/doc/MPI-Usage.md b/doc/MPI-Usage.md
index 94295b16d..92eaabab2 100644
--- a/doc/MPI-Usage.md
+++ b/doc/MPI-Usage.md
@@ -14,50 +14,60 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

-# Employing MPI as a Distributed Runtime Backend
+# MPI Usage
+
+About employing MPI as a distributed runtime backend.
+
The DAPHNE runtime system is designed with the goal of supporting various distributed runtime that relies on various technologies, e.g. MPI and RPC.
This document shows how a DAPHNE user can execute DAPHNE scripts on a distributed computing environment with the MPI backend implementation of the DAPHNE runtime system.

-This document assumes that the DAPHNE was build with the --mpi options, if this is not the case please rebuild DAPHNE with the --mpi option
+This document assumes that DAPHNE was built with the `--mpi` option; if this is not the case, please rebuild DAPHNE with the `--mpi` option

```./build.sh --mpi```

The DAPHNE build script uses [Open MPI](https://www.open-mpi.org/). The DAPHNE build script does not configure the Open MPI installation with the SLURM support option.
-For users who want to add the SLURM, please visit the [Open MPI](https://www.open-mpi.org/) documentation (adding ```--with-slurm``` to the build command of the Open MPI libbrary) and edit the DAPHNE build script.
+For users who want to add SLURM support, please visit the [Open MPI](https://www.open-mpi.org/) documentation (adding ```--with-slurm``` to the build command of the Open MPI library) and edit the DAPHNE build script.
Also, users who wants to use other MPI implementations e.g., Intel MPI may edit the corresponding part in the DAPHNE build script.
-## When DAPHNE is installed natively (without container) +## When DAPHNE is Installed Natively (w/o Container) + 1. Ensure that your system knows about the installed MPI --- The ```PATH``` and ```LD_LIBRARY_PATH```environment variable has to be updated as follows -```bash -export PATH=$PATH:/thirdparty/installed/bin/ -export LD_LIBRARY_PATH=$LD_LIBRARY_PATH://thirdparty/installed/lib/ -``` -Please do not forget to replace with the actual path + -- The ```PATH``` and ```LD_LIBRARY_PATH```environment variable has to be updated as follows -2. Run basic example @ ```/examples/matrix_addition_for_mpi.daph``` as follows -```bash -mpirun -np 10 ./bin/daphne --distributed --dist_backend=MPI scripts/examples/matrix_addition_for_mpi.daph -``` -The command above executes 10 processes **locally** on one machine. + ```bash + export PATH=$PATH:/thirdparty/installed/bin/ + export LD_LIBRARY_PATH=$LD_LIBRARY_PATH://thirdparty/installed/lib/ + ``` + + Please do not forget to replace `` with the actual path -In order to run on **a distributed system**, you need to provide the machine names or the machinefile which contains the machine names. +1. Run basic example @ ```/examples/matrix_addition_for_mpi.daph``` as follows + + ```bash + mpirun -np 10 ./bin/daphne --distributed --dist_backend=MPI scripts/examples/matrix_addition_for_mpi.daph + ``` + +The command above executes 10 processes **locally** on one machine. + +In order to run on **a distributed system**, you need to provide the machine names or the machinefile which contains the machine names. For instance assuming that ```my_hostfile``` is a text file that contains machine names + ```bash mpirun -np 10 --hostfile my_hostfile ./bin/daphne --distributed --dist_backend=MPI scripts/examples/matrix_addition_for_mpi.daph ``` -The command above starts 10 processes distributed on following the hosts in the my_hostfile. -For more options, please check the [Open MPI documentation] (https://www.open-mpi.org/faq/?category=running#mpirun-hostfile). + +The command above starts 10 processes distributed on following the hosts in the my_hostfile. +For more options, please check the [Open MPI documentation](https://www.open-mpi.org/faq/?category=running#mpirun-hostfile). From a DAPHNE runtime point of view, the ```--distributed``` option tells the DAPHNE runtime system to utilize the distributed backend, while the ```--dist_backend=MPI``` indicate the type of the backend implementation. -## When DAPHNE is installed with (containers, e.g. singularity) +## When DAPHNE is Installed with Containers (e.g. singularity) + The main difference is that the mpirun command is called at the level of the container as follows + ```bash mpirun -np 10 singularity exec daphne/bin/daphne --distributed --dist_backend=MPI --vec --num-threads=2 daphne/scripts/examples/matrix_addition_for_mpi.daph ``` -Please do not forget to replace with the actual singularity image. - - +Please do not forget to replace `` with the actual singularity image. diff --git a/doc/Quickstart.md b/doc/Quickstart.md index 820eb78f8..cfb31e0d0 100644 --- a/doc/Quickstart.md +++ b/doc/Quickstart.md @@ -1,5 +1,5 @@ -# Quickstarting DAPHNE +# Quickstart -These reduced instructions should get you started by firing up a hello world script from the latest binary release. +These reduced instructions should get you started by firing up a hello world script from the latest binary release. -### The recipe is as follows: +**The recipe is as follows:** -1. 
Download and extract daphne--bin.tgz from the [release page](https://github.com/daphne-eu/daphne/releases). -Optionally choose the daphne-cuda--bin.tgz archive if you want to run DAPHNE with CUDA support (Nvidia Pascal +1. Download and extract `daphne--bin.tgz` from the [release page](https://github.com/daphne-eu/daphne/releases). +Optionally choose the `daphne-cuda--bin.tgz` archive if you want to run DAPHNE with CUDA support (Nvidia Pascal hardware or newer and an installed CUDA SDK are required) 2. In a bash (or compatible) shell, from the extracted DAPHNE directory, execute this command + ```bash ./run-daphne.sh scripts/examples/hello-world.daph - ```` - Optionally you can activate CUDA ops by including --cuda: - ```bash + ``` + + Optionally you can activate CUDA ops by including --cuda: + + ```bash ./run-daphne.sh --cuda scripts/examples/hello-world.daph ``` - - Earning extra points: To see one level of intermediate representation that the DAPHNE compiler generates in its wealth -of optimization passes run with the explain flag - ```bash + + Earning extra points: To see one level of intermediate representation that the DAPHNE compiler generates in its wealth of optimization passes run with the explain flag + + ```bash ./run-daphne.sh --explain=kernels scripts/examples/hello-world.daph ``` -### Explanation +## Explanation -The ``run-daphne.sh`` script sets up the required environment (so your system's shared library loader finds the required -.so files) and passes the provided parameters to the daphne executable. +The ``run-daphne.sh`` script sets up the required environment (so your system's shared library loader finds the required +.so files) and passes the provided parameters to the daphne executable. Interesting things to look at: -* file ``run-daphne.sh`` -* file ``UserConfig.json`` -* file ``scripts/examples/hello-world.daph`` -* output of ``run-daphne.sh --help`` + +- file ``run-daphne.sh`` +- file ``UserConfig.json`` +- file ``scripts/examples/hello-world.daph`` +- output of ``run-daphne.sh --help`` ### What Next? You might want to have a look at + - a more elaborate [getting started guide](/doc/GettingStarted.md) - the [documentation](/doc) - the [contribution guidelines](/CONTRIBUTING.md) diff --git a/doc/README.md b/doc/README.md index 7faeb194c..b2b62e3a7 100644 --- a/doc/README.md +++ b/doc/README.md @@ -1,20 +1,7 @@ - - -# Documentation - [Quickstart](/doc/Quickstart.md) - [Getting Started](/doc/GettingStarted.md) @@ -24,10 +11,10 @@ limitations under the License. 
- [Running DAPHNE in a Local Environment](/doc/RunningDaphneLocally.md) - [Running DAPHNE on the Distributed Runtime](/doc/DistributedRuntime.md) - [DAPHNE Packaging, Distributed Deployment, and Management](/doc/Deploy.md) -- [DaphneLib: DAPHNE's Python API](/doc/DaphneLib.md) -- [DaphneLib API Reference](/doc/DaphneLibAPIRef.md) -- [DaphneDSL Language Reference](/doc/DaphneDSLLanguageRef.md) -- [DaphneDSL Built-in Functions](/doc/DaphneDSLBuiltins.md) +- [DaphneLib: DAPHNE's Python API](/doc/DaphneLib/Overview.md) +- [DaphneLib API Reference](/doc/DaphneLib/APIRef.md) +- [DaphneDSL Language Reference](/doc/DaphneDSL/LanguageRef.md) +- [DaphneDSL Built-in Functions](/doc/DaphneDSL/Builtins.md) - [Using SQL in DaphneDSL](/doc/tutorial/sqlTutorial.md) - [A Few Early Example Algorithms in DaphneDSL](/scripts/algorithms/) - [FileMetaData Format (reading and writing data)](/doc/FileMetaDataFormat.md) diff --git a/doc/ReleaseScripts.md b/doc/ReleaseScripts.md index f43bd0f8b..4db943a96 100644 --- a/doc/ReleaseScripts.md +++ b/doc/ReleaseScripts.md @@ -14,38 +14,46 @@ See the License for the specific language governing permissions and limitations under the License. --> -## How to use the release scripts to create binary artifacts +# Release Scripts + +How to use the release scripts to create binary artifacts This is a quick write-up of how the scripts to create a binary release artifact are meant to be used. The release.sh script will call pack.sh which will call build.sh and test.sh. Only if testing completes successfully, the artifact, a gzipped tar archive (format open for discussion) is created. The command after ``--githash`` fetches the git hash of the current commit. The script checks out this git hash and restores the current commit after successful completion. This is a bit of a shortcoming as you have to issue a manual ``git checkout -`` if the script fails and terminates early. -#### Signing +## Signing + The release manager will have to sign the artifacts to verify that the provided software has been created by that person. To create an appropriate GPG key, [these instructions](https://downloads.apache.org/systemds/KEYS) can be adapted to our needs. The keys of Daphne release managers will be provided in [this file](/KEYS.txt). Ideally, future release managers sign each others keys. Key signing is a form of showing that the one key owner trusts the other. -#### The procedure (preliminary for 0.1) -0) Get into a bash shell and change to your working copy (aka daphne root) directory. -1) **Create the artifacts (plain Daphne):** ``./release.sh --version 0.1 --githash `git rev-parse HEAD` `` -2) **Create additional artifacts with extra features compiled in:**
``./release.sh --version 0.1 --githash `git rev-parse HEAD` --feature cuda`` +## The Procedure (Preliminary for v0.1) + +1. Get into a bash shell and change to your working copy (aka daphne root) directory. +1. **Create the artifacts (plain Daphne):** ``./release.sh --version 0.1 --githash `git rev-parse HEAD` `` +1. **Create additional artifacts with extra features compiled in:**
``./release.sh --version 0.1 --githash `git rev-parse HEAD` --feature cuda``
_Note that this adds additional constraints on the binaries (e.g., if CUDA support is compiled in, the executable will fail to load on a system without the CUDA SDK properly installed)_
+1. **Copy the artifacts** to a machine where you have your top secret signing key installed (can be skipped if this is the build machine):
``rsync -vuPah :path/to/daphne/artifacts .``
-4) **Signing and checksumming:**
-   * ``` bash
-     cd artifacts
-     ~/path/to/daphne/release.sh --version 0.1 --artifact ./daphne-0.1-bin.tgz --gpgkey --githash `cat daphne-0.1-bin.githash`
-     ```
+1. **Signing and checksumming:**
+
+    ``` bash
+    cd artifacts
+    ~/path/to/daphne/release.sh --version 0.1 --artifact ./daphne-0.1-bin.tgz --gpgkey --githash `cat daphne-0.1-bin.githash`
+    ```
+
    * repeat for other feature artifacts
-5) **Tag & push** The previous signing command will provide you with two more git commands to tag the commit that the artfiacts were made from and to push these tags to github.
-   This should look something like this:
-   ``` bash
-   git tag -a -u B28F8F4D 0.1 312b2b50b4e60b3c5157c3365ec38383d35e28d8
-   git push git@github.com:corepointer/daphne.git --tags
-   ```
-6) **Upload & release**:
-   * Click the "create new release" link on the front page of the Daphne github repository (right column under "Releases").
-   * Select the tag for the release, create a title, add release notes (highlights of this release, list of contributors, maybe a detailed change log at the end)
-   * Upload the artifacts: All the ``.{tgz,tgz.asc,tgz.sha512sum}`` files before either saving as draft for further polishing or finally release the new version.
+1. **Tag & push**: The previous signing command will provide you with two more git commands to tag the commit that the artifacts were made from and to push these tags to GitHub.
+   This should look something like this:
+
+   ``` bash
+   git tag -a -u B28F8F4D 0.1 312b2b50b4e60b3c5157c3365ec38383d35e28d8
+   git push git@github.com:corepointer/daphne.git --tags
+   ```
+
+1. **Upload & release**:
+   * Click the "create new release" link on the front page of the Daphne GitHub repository (right column under "Releases").
+   * Select the tag for the release, create a title, and add release notes (highlights of this release, list of contributors, maybe a detailed change log at the end).
+   * Upload the artifacts: all the ``.{tgz,tgz.asc,tgz.sha512sum}`` files, before either saving as a draft for further polishing or finally releasing the new version.
diff --git a/doc/RunningDaphneLocally.md b/doc/RunningDaphneLocally.md
index bbe6a0827..47d206d41 100644
--- a/doc/RunningDaphneLocally.md
+++ b/doc/RunningDaphneLocally.md
@@ -14,7 +14,9 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

-# Running DAPHNE in a Local Environment
+# Running DAPHNE Locally
+
+Running DAPHNE in a local environment.

This document explains how to run DAPHNE on a local machine.
For more details on running DAPHNE in a distributed setup, please see the documentation on the [distributed runtime](/doc/DistributedRuntime.md) and [distributed deployment](/doc/Deploy.md).
@@ -22,14 +24,16 @@ Before DAPHNE can be executed, the system must be built using `./build.sh` (for more details see [Getting Started](/doc/GettingStarted.md)).
The main executable of the DAPHNE system is `bin/daphne`.

The general scheme of an invocation of DAPHNE looks as follows:
-```
+
+```bash
bin/daphne [options] script [arguments]
```

-Where `script` is a [DaphneDSL](/doc/DaphneDSLLanguageRef.md) file.
+Where `script` is a [DaphneDSL](/doc/DaphneDSL/LanguageRef.md) file.
*Example:* -``` + +```bash bin/daphne scripts/examples/hello-world.daph ``` @@ -41,9 +45,11 @@ Arguments to the DaphneDSL script can be provided as space-separated pairs of th These can the accessed as `$key` in the DaphneDSL script. *Example:* -``` + +```bash bin/daphne test/api/cli/algorithms/kmeans.daphne r=1000 f=20 c=5 i=10 ``` + *This example executes a simplified variant of the k-means clustering algorithm on random data with 1000 rows and 20 features using 5 centroids and a fixed number of 10 iterations.* `value` must be a valid DaphneDSL literal, e.g., `key=123` (signed 64-bit integer), `key=-12.3` (double-precision floating-point), or `key="hello"` (string). @@ -56,22 +62,23 @@ To see the full list of available options, invoke `bin/daphne --help`. In the following, a few noteworthy general options are mentioned. Note that some of the more specific options are described in the documentation pages on the respective topics, e.g., [distributed execution](/doc/DistributedRuntime.md), [scheduling](/doc/SchedulingOptions.md), [configuration](/doc/Config.md), [FPGA configuration](/doc/FPGAconfiguration.md), etc. - + - **`--explain`** - Prints the MLIR-based intermediate representation (IR), the so-called *DaphneIR*, after the specified compiler passes. - For instance, to see the IR after parsing (and some initial simplifications) and after property inference, invoke - ``` - bin/daphne --explain parsing_simplified,property_inference test/api/cli/algorithms/kmeans.daphne r=1000 f=20 c=5 i=10 - ``` + Prints the MLIR-based intermediate representation (IR), the so-called *DaphneIR*, after the specified compiler passes. + For instance, to see the IR after parsing (and some initial simplifications) and after property inference, invoke + + ```bash + bin/daphne --explain parsing_simplified,property_inference test/api/cli/algorithms/kmeans.daphne r=1000 f=20 c=5 i=10 + ``` - **`--vec`** - Turns on DAPHNE's vectorized execution engine, which fuses qualifying operations into vectorized pipelines. *Experimental feature.* + Turns on DAPHNE's vectorized execution engine, which fuses qualifying operations into vectorized pipelines. *Experimental feature.* - **`--select-matrix-repr`** - Turns on the automatic selection of a suitable matrix representation (currently dense or sparse (CSR)). *Experimental feature.* + Turns on the automatic selection of a suitable matrix representation (currently dense or sparse (CSR)). *Experimental feature.* ## Return Codes @@ -94,24 +101,30 @@ In many (but not yet all) cases, there will be an error message indicating what *Examples:* - **Wrong way of passing string literals as DaphneDSL script arguments.** - ``` - line 1:0 mismatched input 'foo' expecting {'true', 'false', INT_LITERAL, FLOAT_LITERAL, STRING_LITERAL} - Parser error: unexpected literal - ``` - Maybe you tried to pass a string as an argument to a DaphneDSL script and forgot the quotation marks or they got lost. - Pass strings as `bin/daphne script.daphne foo=\"abc\"` (not `foo=abc` or `foo="abc"`) on a terminal. + + ```text + line 1:0 mismatched input 'foo' expecting {'true', 'false', INT_LITERAL, FLOAT_LITERAL, STRING_LITERAL} + Parser error: unexpected literal + ``` + + Maybe you tried to pass a string as an argument to a DaphneDSL script and forgot the quotation marks or they got lost. + Pass strings as `bin/daphne script.daphne foo=\"abc\"` (not `foo=abc` or `foo="abc"`) on a terminal. - **Missing metadata file.** - ``` - Parser error: Could not open file 'data/foo.csv.meta' for reading meta data. 
- ``` - Maybe you try to read a dataset called `data/foo.csv`, but the required [metadata file](/doc/FileMetaDataFormat.md) `data/foo.csv.meta` does not exist. + + ```text + Parser error: Could not open file 'data/foo.csv.meta' for reading meta data. + ``` + + Maybe you try to read a dataset called `data/foo.csv`, but the required [metadata file](/doc/FileMetaDataFormat.md) `data/foo.csv.meta` does not exist. - **Using the old file metadata format.** - ``` - Parser error: [json.exception.parse_error.101] parse error at line 1, column 7: syntax error while parsing value - unexpected ','; expected end of input - ``` - Maybe you try to read a dataset with `readMatrix()` or `readFrame()` in DaphneDSL, but the file metadata file does not have the right structure. Note that we changed the initial one-line text-based format to a more human-readable [JSON-based format](/doc/FileMetaDataFormat.md). + + ```text + Parser error: [json.exception.parse_error.101] parse error at line 1, column 7: syntax error while parsing value - unexpected ','; expected end of input + ``` + + Maybe you try to read a dataset with `readMatrix()` or `readFrame()` in DaphneDSL, but the file metadata file does not have the right structure. Note that we changed the initial one-line text-based format to a more human-readable [JSON-based format](/doc/FileMetaDataFormat.md). ### `JIT session error: Symbols not found: ...` @@ -125,7 +138,7 @@ Developers can fix this problem by adding the respective instantiation in `src/r *Example:* -``` +```text JIT session error: Symbols not found: [ _ewAdd__int32_t__int32_t__int32_t ] JIT-Engine invocation failed: Failed to materialize symbols: { (main, { _mlir_ciface_main, _mlir_main, _mlir__mlir_ciface_main, main }) }Program aborted due to an unhandled Error: Failed to materialize symbols: { (main, { _mlir_ciface_main, _mlir_main, _mlir__mlir_ciface_main, main }) } @@ -134,14 +147,15 @@ Aborted (core dumped) ### `Failed to create MemoryBuffer for: ...` -This error occurs when `daphne` is not invoked from the repository's root directory `daphne/` as `bin/daphne`. +This error occurs when `daphne` is not invoked from the repository's root directory `daphne/` as `bin/daphne`. It will be fixed in the future (see issue #445). In the meantime, please always invoke `daphne` from the repository's root directory `daphne/`. *Example:* -``` +```text Failed to create MemoryBuffer for: lib/libAllKernels.so Error: No such file or directory ``` + *Typically followed by an error or the type `JIT session error: Symbols not found: ...`, which is described above.* diff --git a/doc/SchedulingOptions.md b/doc/SchedulingOptions.md index d79c0c845..5c8ccce49 100644 --- a/doc/SchedulingOptions.md +++ b/doc/SchedulingOptions.md @@ -14,10 +14,12 @@ See the License for the specific language governing permissions and limitations under the License. --> -## Document Description -This document describes the use of the pipeline and task scheduling mechanisms currently supported in the DAPHNE system. - -### Scheduling Decisions in DAPHNE +# DAPHNE Scheduling + +This document describes the use of the pipeline and task scheduling mechanisms currently supported in the DAPHNE system. + +## Scheduling Decisions + The DAPHNE system considers four types of scheduling decisions: work partitioning, assignment, ordering and timing. - Work partitioning refers to the partitioning of the work into units of work (or tasks) according to a certain granularity (fine or coarse, equal or variable). 
@@ -29,13 +31,16 @@ The DAPHNE system considers four types of scheduling decisions: work partitionin **Work Assignment**: The current snapshot of the DAPHNE prototype supports two main assignment mechanisms: Single centralized work queue and Multiple work queues. When work assignment relies on a centralized work queue (CENTRALIZED), workers follow the self-scheduling principle, i.e., whenever a worker is free and idle, it obtains a task from a central queue. When work assignment relies on multiple work queues, workers follow the work-stealing principle, i.e., whenever workers are free, idle, and have no tasks in their queues, they steal tasks from the work queue of each other. Work queues can be per worker (PERCPU) or per group of workers (PERGROUP). In work-stealing, workers need to apply a victim selection mechanism to find a queue and steal work from it. The currently supported victim selection mechanisms are SEQ (steal from the next adjacent worker), SEQPRI (Steal from the next adjacent worker, but prioritize same NUMA domain), RANDOM (Steal from a random worker), RANDOMPRI (Steal from a random worker, but prioritize same NUMA domain). - -### Explore the scheduling options in DAPHNE -To list all possible execution options of the DAPHNE system, one needs to execute the following +## Scheduling Options + +To list all possible execution options of the DAPHNE system, one needs to execute the following: + ```shell -> ./bin/daphne --help +$ ./bin/daphne --help ``` -The output of this command shows all DAPHNE compilation and execution parameters including the scheduling options that are currently support. The output below shows only the scheduling options that we will cover in this document. + +The output of this command shows all DAPHNE compilation and execution parameters, including the scheduling options that are currently supported. The output below shows only the scheduling options that we will cover in this document. + ```shell > This program compiles and executes a DaphneDSL script. USAGE: daphne [options] script [arguments] @@ -96,70 +101,84 @@ EXAMPLES: daphne --vec example.daphne x=1 y=2.2 z="foo" daphne --vec --args x=1,y=2.2,z="foo" example.daphne ``` + **_NOTE:_** the DAPHNE system relies on the vectorized (tile) execution engine to support parallelism at the node level. The vectorized execution engine takes decision concerning work partition and assignment during applications’ execution. Therefore, one needs always to use the option --vec with any of the scheduling options that we present in this document. +### Multithreading Options -### Multithreading Options - **Number of threads**: A DAPHNE user can control the total number of threads spawn by the DAPHNE runtime system use the following parameter **--num-threads**. This parameter should be non-zero positive value. Illegal integer values will be ignored by the system and the default value will be used. The default value of --num-threads is equal to the total number of physical cores of the host machine. The option can be used as below, e.g., the DAPHNE system spawns only 4 threads. -```shell -./bin/daphne --vec --num-threads=4 some_daphne_script.daphne -``` + + ```shell + ./bin/daphne --vec --num-threads=4 some_daphne_script.daphne + ``` - **Thread Pinning**: A DAPHNE user can decide if the DAPHNE system pins its threads to the physical cores. Currently, the DAPHNE system supports one simple pining strategy, namely, round-robin strategy. By default, the DAPHNE system does not pin its threads.
The option **--pin-workers** can be used to activate thread pinning as follows -```shell -./bin/daphne --vec --pin-threads some_daphne_script.daphne -``` -- **Hyperthreading**: if a host machine supports hyperthreading, a DAPHNE user can decide to use logical cores, i.e., if the –num-threads is not specified, the DAPHNE system sets the total number of threads to the count of the physical cores. However, when the user specify the following parameter **--hyperthreading**, the DAPHNE system sets the number of threads to the count of the logical cores. -```shell -./bin/daphne --vec --hyperthreading some_daphne_script.daphne -``` + ```shell + ./bin/daphne --vec --pin-workers some_daphne_script.daphne + ``` + +- **Hyperthreading**: If a host machine supports hyperthreading, a DAPHNE user can decide to use logical cores, i.e., if `--num-threads` is not specified, the DAPHNE system sets the total number of threads to the count of the physical cores. However, when the user specifies the parameter **--hyperthreading**, the DAPHNE system sets the number of threads to the count of the logical cores. + + ```shell + ./bin/daphne --vec --hyperthreading some_daphne_script.daphne + ``` ### Work Partitioning Options + - **Partition Scheme**: A DAPHNE user selects the partition scheme by passing the name of the partition scheme as an argument to the DAPHNE system. If the user does not specify a partition scheme, the default partition scheme (STATIC) will be used. As an example, the following command uses GSS as a partition scheme. -```shell -./bin/daphne --vec --GSS some_daphne_script.daphne -``` -- **Task granularity**: The DAPHNE user can exploit the **--grain-size** parameter to set the minimum size of the tasks generated by the DAPHNE system. This parameter should be non-zero positive value. Illegal integer values will be ignored by the system and the default value will be used. The default value of **--grain-size** is 1, i.e., the data associated with a task represents 1 row of the input matrix. -As an example, the following command uses SS as a partition scheme with minimum task size of 100 -```shell -./bin/daphne --vec --SS --grain-size=100 some_daphne_script.daphne -``` + ```shell + ./bin/daphne --vec --GSS some_daphne_script.daphne + ``` + +- **Task granularity**: The DAPHNE user can exploit the **--grain-size** parameter to set the minimum size of the tasks generated by the DAPHNE system. This parameter should be a non-zero positive value. Illegal integer values will be ignored by the system and the default value will be used. The default value of **--grain-size** is 1, i.e., the data associated with a task represents 1 row of the input matrix. +As an example, the following command uses SS as a partition scheme with a minimum task size of 100: + + ```shell + ./bin/daphne --vec --SS --grain-size=100 some_daphne_script.daphne + ``` + ### Work Assignment Options + - **Single centralized work queue**: By default, the DAPHNE system uses a single centralized work queue. However, the user may explicitly use the following parameter **--CENTRALIZED** to ensure the use of single centralized work queue. -```shell -./bin/daphne --vec --GSS --CENTRALIZED some_daphne_script.daphne -``` -- **Multiple work queues**: a DAPHNE user can exploit the use of multiple work queues by passing one of the following parameters **--PERCPU** or **--PERGROUP**. The two parameters cannot be used together, and if **--CENTRALIZED** is used with any of them, --CENTRALIZED will be ignored by the system.
+ + ```shell + ./bin/daphne --vec --GSS --CENTRALIZED some_daphne_script.daphne + ``` + +- **Multiple work queues**: A DAPHNE user can exploit the use of multiple work queues by passing one of the following parameters: **--PERCPU** or **--PERGROUP**. The two parameters cannot be used together, and if **--CENTRALIZED** is used with any of them, --CENTRALIZED will be ignored by the system. - parameter **--PERGROUP** ensures that the DAPHNE system creates a number of groups equals to the number of NUMA domains on the target host machine. The DAPHNE system assigns equal number of workers (threads) to each of the groups. Workers within the same group share one work queue. The **–-PERGROUP** can be used as follows -```shell -./bin/daphne --vec --PERGROUP some_daphne_script.daphne -``` + + ```shell + ./bin/daphne --vec --PERGROUP some_daphne_script.daphne + ``` + - The parameter **--PERCPU** ensures that the DAPHNE system creates a number of queues equal to the total number of workers (threads), i.e., each worker is assigned to a single work queue. The parameter **--PERCPU** can be used as follows -```shell -./bin/daphne --vec --PERCPU some_daphne_script.daphne -``` + ```shell + ./bin/daphne --vec --PERCPU some_daphne_script.daphne + ``` - **Victim Selection**: A DAPHNE user can choose a victim selection strategy by passing one of the following parameters --SEQ, --SEQPRI, --RANDOM, and --RANDOMPRI. These parameters activate different victim selection strategies as follows - - - **--SEQ** activates a sequential victim selection strategy, i.e., the ith worker steals form the (i+1)th worker. The last worker steals from the first worker. - - **--SEQPRI** is similar to --SEQ except that --SEQPRI priorities workers assigned to the same NUMA domain. When the host machine has one NUMA domain, - - --SEQ and --SEQPRI have no difference. - - **--RANDOM** activates a random victim selection strategy, i.e., the ith worker steals form a randomly chosen worker. - - **--RANDOMPRI** is similar to --RANDOM except that --RANDOM priorities workers assigned to the same NUMA domain. When the host machine has one NUMA domain, --RANDOMPRI and --RANDOMPRI have no difference. + - **--SEQ** activates a sequential victim selection strategy, i.e., the ith worker steals from the (i+1)th worker. The last worker steals from the first worker. + - **--SEQPRI** is similar to --SEQ except that --SEQPRI prioritizes workers assigned to the same NUMA domain. When the host machine has one NUMA domain, --SEQ and --SEQPRI have no difference. + - **--RANDOM** activates a random victim selection strategy, i.e., the ith worker steals from a randomly chosen worker. + - **--RANDOMPRI** is similar to --RANDOM except that --RANDOMPRI prioritizes workers assigned to the same NUMA domain. When the host machine has one NUMA domain, --RANDOM and --RANDOMPRI have no difference. **_NOTE:_** When the user does not choose one of these parameters, the DAPHNE system considers --SEQ as a default victim selection strategy. As an example, the following command uses --SEQPRI as a victim selection strategy.
+ ```shell ./bin/daphne --vec --PERGROUP --SEQPRI some_daphne_script.daphne ``` -## References -[D4.1](https://daphne-eu.eu/wp-content/uploads/2021/11/Deliverable-4.1-fin.pdf) DAPHNE: D4.1 DSL Runtime Design, 11/2021 +## References + +[D4.1](https://daphne-eu.eu/wp-content/uploads/2021/11/Deliverable-4.1-fin.pdf) DAPHNE: D4.1 DSL Runtime Design, 11/2021 [D5.1](https://daphne-eu.eu/wp-content/uploads/2021/11/Deliverable-5.1-fin.pdf) DAPHNE: D5.1 Scheduler Design for Pipelines and Tasks, 11/2021 diff --git a/doc/assets/logo_large.png b/doc/assets/logo_large.png new file mode 100644 index 000000000..07a27835a Binary files /dev/null and b/doc/assets/logo_large.png differ diff --git a/doc/assets/logo_medium.png b/doc/assets/logo_medium.png new file mode 100644 index 000000000..feaac8bb2 Binary files /dev/null and b/doc/assets/logo_medium.png differ diff --git a/doc/assets/logo_small.png b/doc/assets/logo_small.png new file mode 100644 index 000000000..72e58d3ce Binary files /dev/null and b/doc/assets/logo_small.png differ diff --git a/doc/development/BuildingDaphne.md b/doc/development/BuildingDaphne.md index 27304abb5..6b06ebbc5 100644 --- a/doc/development/BuildingDaphne.md +++ b/doc/development/BuildingDaphne.md @@ -14,29 +14,30 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Building Daphne +# Building DAPHNE -The DAPHNE project provides a full-fledged build script. After cloning, it does everything from dependency setup to +The DAPHNE project provides a full-fledged build script. After cloning, it does everything from dependency setup to generation of the executable. -### What does the build script do? (simplified) +## What does the build script do? (simplified) - Download & build all code dependencies - Build Daphne - Clean Project -### How long does a build take? +## How long does a build take? The first run will take a while, due to long compilation times of the dependencies (~1 hour on a 16 vcore desktop, ~10 minutes on a 128 vcore cluster node). But they only have to be compiled once (except updates). Following builds only take a few seconds/minutes. Contents: - - [Usage of the build script](#1-usage-of-the-build-script) - - [Extension of the build script](#2-extension) +- [Usage of the build script](#usage-of-the-build-script) +- [Extension of the build script](#extension) ---- -## 1. Usage of the build script +--- + +## Usage of the build script This section shows the possibilities of the build script. @@ -59,15 +60,16 @@ Build a specific **target**. ```bash ./build.sh --target "target" ``` + For example the following builds the main test target. 
+ ```bash ./build.sh --target "run_tests" ``` - ### Clean -Clean all build directories, i.e., the daphne build dir `/build` and the build output in +Clean all build directories, i.e., the daphne build dir `/build` and the build output in `/bin` and `/lib` ```bash @@ -75,13 +77,13 @@ Clean all build directories, i.e., the daphne build dir `/build` a ``` Clean all downloads and extracted archive directories, i.e., ``/download-cache, ``/sources -and ``/*.download.success files: +and ``/*.download.success files: ```bash ./build.sh --cleanCache ``` -Clean third party build output, i.e., `/installed`, `/build` and +Clean third party build output, i.e., `/installed`, `/build` and ``/*.install.success files: ```bash @@ -94,28 +96,31 @@ Clean everything (DAPHNE build output and third party directory) ./build.sh --cleanAll ``` -### Minimize long compile times of dependencies +### Minimize Compile Times of Dependencies + The most time-consuming part of getting DAPHNE compiled is building the third party dependencies. -To avoid this, one can either use a prebuilt container image (in combination with some parameters to the build script +To avoid this, one can either use a prebuilt container image (in combination with some parameters to the build script see below) or at least build the dependencies once and subsequently point to the directory where the third party dependencies get installed. The bulid script must be invoked with the following two parameters to achieve this: -``` ./build.sh --no-deps --installPrefix | Install third party dependencies in \ (default: /thirdparty/installed) | -| --clean | Clean DAPHNE build output (/{bin,build,lib}) | +| --installPrefix | Install third party dependencies in `` (default: `/thirdparty/installed`) | +| --clean | Clean DAPHNE build output (`/{bin,build,lib}`) | | --cleanCache | Clean downloaded and extracted third party artifacts | | --cleanDeps | Clean third party dependency build output and installed files | | --cleanAll | Clean DAPHNE build output and reset the third party directory to the state in the git repo | -| --target \ | Build specific target | +| --target | Build specific target | | -nf, --no-fancy | Disable colorized output | | --no-deps | Avoid building third party dependencies | | -y, --yes | Accept prompt (e.g., when executing the clean command) | @@ -124,43 +129,46 @@ All possible options for the build script: | --oneapi | Compile with support for accelerated operations using the OneAPI SDK | | --fpgaopencl | Compile with support for FPGA operations using the Intel FPGA SDK or OneAPI+FPGA Add-On | +--- +## Extension ---- +### Overview over the build script -## 2. Extension -### 2.1 Overview over the build script The build script is divided into sections, visualized by -``` + +```bash #****************************************************************************** # #1 Section name #****************************************************************************** ``` + Each section should only contain functionality related to the section name. -The following list contains a rough overview over the sections and the concrete functions or functionality done here. +The following list contains a rough overview over the sections and the concrete functions or functionality done here. + 1. Help message 1. **printHelp()** // prints help message 2. Build message helper - 1. **daphne_msg(** \ **)** // prints a status message in DAPHNE style - 2. 
**printableTimestamp(** \ **)** // converts a unix epoch timestamp into a human readable string (e.g., 5min 20s 100ms) + 1. **daphne_msg(** **)** // prints a status message in DAPHNE style + 2. **printableTimestamp(** **)** // converts a unix epoch timestamp into a human readable string (e.g., 5min 20s 100ms) 3. **printLogo()** // prints a DAPHNE logo to the console 3. Clean build directories - 1. **clean(** \ \ **)** // removes all given directories (1. parameter) and all given files (2. parameter) from disk + 1. **clean(** **)** // removes all given directories (1. parameter) and all given files (2. parameter) from disk 2. **cleanBuildDirs()** // cleans build dirs (daphne and dependency build dirs) - 3. **cleanAll()** // cleans daphne build dir and wipes all dependencies from disk (resetting the third party directory) + 3. **cleanAll()** // cleans daphne build dir and wipes all dependencies from disk (resetting the third party directory) 4. **cleanDeps()** // removes third party build output 5. **cleanCache()** // removes downloaded third party artifacts (but leaving git submodules (only LLVM/MLIR at the time of writing) 4. Create / Check Indicator-files - 1. **dependency_install_success(** \ **)** // used after successful build of a dependency; creates related indicator file - 2. **dependency_download_success(** \ **)** // used after successful download of a dependency; creates related indicator file - 3. **is_dependency_installed(** \ **)** // checks if dependency is already installed/built successfully - 4. **is_dependency_downloaded(** \ **)** // checks if dependency is already downloaded successfully + 1. **dependency_install_success(** **)** // used after successful build of a dependency; creates related indicator file + 2. dependency_download_success() // used after successful download of a dependency; creates related indicator file + 3. **is_dependency_installed(** **)** // checks if dependency is already installed/built successfully + 4. **is_dependency_downloaded(** **)** // checks if dependency is already downloaded successfully 5. Versions of third party dependencies 1. Versions of the software dependencies are configured here 6. Set some prefixes, paths and dirs 1. Definition of project related paths - 2. Configuration of path prefixes. For example all build directories are prefixed with `buildPrefix`. If fast storage + 2. Configuration of path prefixes. For example all build directories are prefixed with `buildPrefix`. If fast storage is available on the system, build directories could be redirected with this central configuration. 7. Parse arguments 1. Parsing @@ -178,17 +186,19 @@ The following list contains a rough overview over the sections and the concrete 9. Build DAPHNE target 1. Compilation of the DAPHNE-target ('daphne' is default) -### 2.2 Adding a dependency +### Adding a dependency + 1. If the dependency is fixed to a specific version, add it to the dependency versions section (section 5). -2. Create a new segment in section 8 for the new dependency. -3. Define needed dependency variables: +1. Create a new segment in section 8 for the new dependency. +1. Define needed dependency variables: 1. Directory Name (which is used by the script to locate the dependency in different stages) - 2. Create an internal version variable in form of an array with two entries. Those are used for internal versioning and updating of the dependency without rebuilding each time. + 1. Create an internal version variable in form of an array with two entries. 
Those are used for internal versioning and updating of the dependency without rebuilding each time. 1. First: Name and version of the dependency as a string of the form `_v${dep_version}` (This one is updated, if a new version of the dependency is choosen.) - 2. Second: Thirdparty Version of the dependency as a string of the form `v1` (This one is incremented each time by hand, if something changes on the path system of the dependency or DAPHNE itself. This way already existing projects are updated automatically, if something changes.) - 3. Optionals: Dep-specific paths, Dep-specific files, etc. -4. Download the dependency, encased by: - ``` + 1. Second: Thirdparty Version of the dependency as a string of the form `v1` (This one is incremented each time by hand, if something changes on the path system of the dependency or DAPHNE itself. This way already existing projects are updated automatically, if something changes.) + 1. Optionals: Dep-specific paths, Dep-specific files, etc. +1. Download the dependency, encased by: + + ```bash # in segment 5 _version="" @@ -206,9 +216,11 @@ The following list contains a rough overview over the sections and the concrete dependency_download_success "${_version_internal[@]}" fi ``` - > Hint: It is recommended to use the paths defined in section 6 for dependency downloads and installations. There are predefined paths like 'cacheDir', 'sourcePrefix', 'buildPrefix' and 'installPrefix'. Take a look at other dependencies to see how to use them. -5. Install the dependency (if necessary), encased by: - ``` + + **Hint:** It is recommended to use the paths defined in section 6 for dependency downloads and installations. There are predefined paths like 'cacheDir', 'sourcePrefix', 'buildPrefix' and 'installPrefix'. Take a look at other dependencies to see how to use them. +1. Install the dependency (if necessary), encased by: + + ```bash if ! is_dependency_installed "${_version_internal[@]}"; then # do your stuff here @@ -216,8 +228,8 @@ The following list contains a rough overview over the sections and the concrete dependency_install_success "${_version_internal[@]}" fi ``` -6. Define a flag for the build script if your dependency is optional or poses unnecessary + +1. Define a flag for the build script if your dependency is optional or poses unnecessary overhead for users (e.g., CUDA is optional as the CUDA SDK is a considerably sized package that only owners of Nvidia hardware would want to install). See section 7 about argument parsing. Quick guide: define a variable and its default value and add an item to the argument handling loop. - diff --git a/doc/development/Contributing.md b/doc/development/Contributing.md new file mode 100644 index 000000000..427b918a2 --- /dev/null +++ b/doc/development/Contributing.md @@ -0,0 +1,3 @@ +# Contributing + +See [CONTRIBUTING.md](/CONTRIBUTING.md) diff --git a/doc/development/ExtendingDistributedRuntime.md b/doc/development/ExtendingDistributedRuntime.md index 898386f78..690849206 100644 --- a/doc/development/ExtendingDistributedRuntime.md +++ b/doc/development/ExtendingDistributedRuntime.md @@ -1,7 +1,7 @@ -# Distributed Worker + +## Distributed Worker Worker code can be found here: + ```bash src/runtime/distributed/worker ``` @@ -193,7 +195,8 @@ There are 3 important methods in this class: - The **Store** method, which stores an object in memory and returns an identifier. 
- The **Compute** method, which receives the IR code fragment along with identifier of inputs, computes the pipeline and returns identifiers of pipeline outputs. - And the **Transfer** method, which is used to return an object using an identifier. -```c++ + +```cpp /** * @brief Stores a matrix at worker's memory * @@ -224,6 +227,7 @@ Structure * Transfer(StoredInfo storedInfo); The developer can provide an implementation for a distributed worker by deriving `WorkerImpl` class. The derived class handles all the communication using the preferred distributed backend and invokes the parent methods for the logic. You can find the gRPC implementation of the distributed worker here: + ```bash src/runtime/distributed/worker/WorkerImplGrpc.h/.cpp ``` @@ -231,7 +235,7 @@ src/runtime/distributed/worker/WorkerImplGrpc.h/.cpp `main.cpp` is the entry point of the distributed worker. A distributed implementation is created using a pointer to the parent class `WorkerImpl`. The distributed node then blocks and waits for the coordinator to send a request by invoking the virtual method: -```C++ +```cpp virtual void Wait() { }; ``` diff --git a/doc/development/ExtendingSchedulingKnobs.md b/doc/development/ExtendingSchedulingKnobs.md index 80b05b0af..ce3e535ee 100644 --- a/doc/development/ExtendingSchedulingKnobs.md +++ b/doc/development/ExtendingSchedulingKnobs.md @@ -14,54 +14,56 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Extending DAPHNE with more scheduling knobs +# Extending DAPHNE with More Scheduling Knobs -This document focuses on: -- how a daphne developer may extend the DAPHNE system by adding new scheduling techniques +This document focuses on how a daphne developer may extend the DAPHNE system by adding new scheduling techniques -### Guidelines +## Guidelines The daphne developer should consider the following files for adding a new scheduling technique + 1. src/runtime/local/vectorized/LoadPartitioning.h -2. src/api/cli/daphne.cpp +1. src/api/cli/daphne.cpp -**Adding the actual code of the technique** +**Adding the actual code of the technique:** -The first file `LoadPartitioning.h` contains the implementation of the currently supported scheduling techniques, i.e., the current version of DAPHNE uses self-scheduling techniques to partition the tasks. Also, it uses the self-scheduling principle for executing the tasks. +The first file `LoadPartitioning.h` contains the implementation of the currently supported scheduling techniques, i.e., the current version of DAPHNE uses self-scheduling techniques to partition the tasks. Also, it uses the self-scheduling principle for executing the tasks. For more details, please visit [Scheduler design for tasks and pipelines](https://daphne-eu.eu/wp-content/uploads/2021/11/Deliverable-5.1-fin.pdf). In this file, the developer should change two things: -1. The enumeration that is called `SelfSchedulingScheme`. The developer will have to add a name for the new technique, e.g., `MYTECH` -```c++ -enum SelfSchedulingScheme { STATIC=0, SS, GSS, TSS, FAC2, TFSS, FISS, VISS, PLS, MSTATIC, MFSC, PSS, MYTECH }; -``` +1. The enumeration that is called `SelfSchedulingScheme`. The developer will have to add a name for the new technique, e.g., `MYTECH` -2. The function that is called `getNextChunk()`. This function has a switch case that selects the mathematical formula that corresponds to the chosen scheduling method. The developer has to add a new case to handle the new technique. 
- -```c++ - uint64_t getNextChunk(){ - //... - switch (schedulingMethod){ - //... - //Only the following part is what the developer has to add. The rest remains the same - case MYTECH:{ // the new technique - chunkSize= FORMULA;//Some Formula to calculate the chunksize (partition size) - break; - } - //... + ```cpp + enum SelfSchedulingScheme { STATIC=0, SS, GSS, TSS, FAC2, TFSS, FISS, VISS, PLS, MSTATIC, MFSC, PSS, MYTECH }; + ``` + +1. The function that is called `getNextChunk()`. This function has a switch case that selects the mathematical formula that corresponds to the chosen scheduling method. The developer has to add a new case to handle the new technique. + + ```cpp + uint64_t getNextChunk(){ + //... + switch (schedulingMethod){ + //... + //Only the following part is what the developer has to add. The rest remains the same + case MYTECH:{ // the new technique + chunkSize= FORMULA;//Some Formula to calculate the chunksize (partition size) + break; + } + //... + } + //... + return chunkSize; } - //... - return chunkSize; - } - -``` -**Enabling the selection of the newly added technique** - -The second file `daphne.cpp` contains the code that parses the command line arguments and passes them to the DAPHNE compiler and runtime. The developer has to add the new technique as a vaild option. Otherwise, the developer will not be able to use the newly added technique. + ``` + +**Enabling the selection of the newly added technique:** + +The second file `daphne.cpp` contains the code that parses the command line arguments and passes them to the DAPHNE compiler and runtime. The developer has to add the new technique as a vaild option. Otherwise, the developer will not be able to use the newly added technique. There is a variable called `taskPartitioningScheme` and it is of type `opt`. The developer should extend the declaration of `opt` as follows: -```c++ + +```cpp opt taskPartitioningScheme( cat(daphneOptions), desc("Choose task partitioning scheme:"), values( @@ -82,20 +84,23 @@ opt taskPartitioningScheme( ); ``` -**Usage of the new technique** +**Usage of the new technique:** Daphne developers may now pass the new technique as an option when they execute a DaphneDSL script. + ```bash daphne --vec --MYTECH --grain-size 10 --num-threads 4 --PERCPU --SEQPRI --hyperthreading --debug-mt my_script.daphne ``` + In this example, the daphne system will execute `my_script.daphne` with the following configuration: + 1. the vectorized engine is enabled due to `--vec` -2. the DAPHNE runtime will use MYTECH for task partitioning due to `--MYTECH` -3. the minimum partition size will be 10 due to `--grain-size 10 ` -4. the vectorized engine will use 4 threads due to `--num-threads 4` -5. work stealing will be used with a separate queue for each CPU due to `--PERCPU` -6. the work stealing victim selection will be sequential prioritized due to `--SEQPRI` -7. the rows will be evenly distributed before the scheduling technique is applied due to `--pre-partition` -8. the CPU workers will be pinned to CPU cores due to `--pin-workers` -9. if the number of threads were not specified the number of logical CPU cores would be used (instead of physical CPU cores) due to `--hyperthreading` -10. Debugging information related to the multithreading of vectorizable operations will be printed due to `--debug-mt` +1. the DAPHNE runtime will use MYTECH for task partitioning due to `--MYTECH` +1. the minimum partition size will be 10 due to `--grain-size 10` +1. 
the vectorized engine will use 4 threads due to `--num-threads 4` +1. work stealing will be used with a separate queue for each CPU due to `--PERCPU` +1. the work stealing victim selection will be sequential prioritized due to `--SEQPRI` +1. the rows will be evenly distributed before the scheduling technique is applied due to `--pre-partition` +1. the CPU workers will be pinned to CPU cores due to `--pin-workers` +1. if the number of threads were not specified the number of logical CPU cores would be used (instead of physical CPU cores) due to `--hyperthreading` +1. Debugging information related to the multithreading of vectorizable operations will be printed due to `--debug-mt` diff --git a/doc/development/HandlingPRs.md b/doc/development/HandlingPRs.md index c62248c62..35be1d877 100644 --- a/doc/development/HandlingPRs.md +++ b/doc/development/HandlingPRs.md @@ -14,8 +14,7 @@ See the License for the specific language governing permissions and limitations under the License. --> -# Guidelines for Handling a Pull Request (PR) - +# Pull Request (PR) Guideline ## Terminology @@ -23,7 +22,6 @@ limitations under the License. - *Collaborator*: a person who has the right to merge a branch into main (besides other rights) (official collaborator status on GitHub) - *Reviewer*: a person who provides feedback on the contribution - ## Disclaimer **These guidelines are mainly for DAPHNE *collaborators***. @@ -31,24 +29,22 @@ However, they could also be interesting for *contributors* to (a) understand how **Feel free to suggest changes to these guidelines** by opening an issue or a pull request if you feel something is missing, could be improved, needs further clarification, etc. - ## Goals of these Guidelines - **Merge useful contributions** (not necessarily perfect ones) into main without unnecessary delay - **Guarantee a certain quality level**, especially since many people are working with the main branch - **Balance the load** for handling PRs among collaborators - ## PR Review/Merging Procedure -### 1. PR creation +### PR creation The contributor creates the PR. - if the PR is marked as a draft, it is handled by an informal discussion depending on concrete questions by the contributor; if there are none, the PR is left alone for now - if the PR is not marked as a draft, the review/merge procedure continues -### 2. Initial response and reviewer assignment +### Initial response and reviewer assignment The DAPHNE collaborators provide an *initial response* and *assign one (or multiple) reviewers* (usually from among themselves, but can also be non-collaborators). @@ -69,180 +65,189 @@ The DAPHNE collaborators provide an *initial response* and *assign one (or multi - on GitHub - ideally, reviewer(s) should *communicate when review can be expected* (based on their availability and urgency of the PR) -### 3. Rounds of feedback and response +### Rounds of feedback and response If necessary, the reviewer(s) and the contributor prepare the contribution for a merge by multiple (but ideally not more than one) rounds of feedback and response. -**3.1. 
Reviewer examines the contribution** +**Reviewer examines the contribution:** + - **read the code, look at the diff** - - *level of detail* - - focus on integration into overall code base - - if really special topic, which reviewer is not familiar with, then no deep review possible/required - - especially for "good first issues": read code in detail - - be the stricter the more central the code is (the more people are affected by it) - - *clarify relation of PR to issue* - - if PR states to address an issue, check if it really does so - - it can be okay if a PR addresses just a part of a complex issue (if contribution still makes sense) - - briefly check if PR addresses further issues (if so, also mention that in feedback and commit message later) - - PR does not need to address an issue, but if it doesn't, check if contribution really belongs to the DAPHNE system itself (there might be useful contributions which should better reside in a separate repo, e.g., for the usage of DAPHNE, tools around DAPHNE, experiments/reproducibility, ...) - - *contribution DOs* - - readable code - - necessary API changes should be reflected in the documentation (e.g., DaphneDSL/DaphneLib, command line arguments, environment variables, ...) - - appropriate test cases (should be present and make sense, test expected cases, corner cases, and exceptional cases) - - comments/explanations in central pieces of the code - - meaningful placement of the contributed code in the directory hierarchy - - correct integration of additional third-party code (reasonable placement in directory hierarchy, license compatibility, ...) - - DAPHNE license header (*should be checked automatically*) - - ... - - *contributions DON'Ts* - - obvious bugs (also think of corner cases) - - changes unrelated to the PR (should be addressed in separate PRs) - - significant performance degradation (in terms of building as well as executing DAPHNE) (*such checks should be automated*) - - files that should not be committed, because they are not useful to others, too large, or can be generated from other files (e.g., IDE project files, output logs, executables, container images, empty files, unrelated files, experimental results, diagrams, unused files, auto-generated files, ...) - - *unnecessary* API changes (e.g., DaphneDSL/DaphneLib, command line arguments, possibly environment variables, ...) - - reimplementation of things we already have or that should better be imported from some third-party library - - breaking existing code, formatting, tests, documentation, etc. - - confidential information (usernames, passwords, ...) - - paths on local system of contributor - - misleading comments - - copy-paste errors - - extreme code duplication - - useless prints (might even fail test cases) - - whitespace changes that unnecessarily blow up the diff (especially in files that otherwise have no changes) - - ... - - *code style* - - don't be strict as long as we don't have a clearly defined code style which can be enforced automatically - - but watch out for things that make code hard to read, e.g. 
- - wrong indentation - - lots of commented out lines (especially artifacts from development/debugging) + - *level of detail* + - focus on integration into overall code base + - if really special topic, which reviewer is not familiar with, then no deep review possible/required + - especially for "good first issues": read code in detail + - be the stricter the more central the code is (the more people are affected by it) + - *clarify relation of PR to issue* + - if PR states to address an issue, check if it really does so + - it can be okay if a PR addresses just a part of a complex issue (if contribution still makes sense) + - briefly check if PR addresses further issues (if so, also mention that in feedback and commit message later) + - PR does not need to address an issue, but if it doesn't, check if contribution really belongs to the DAPHNE system itself (there might be useful contributions which should better reside in a separate repo, e.g., for the usage of DAPHNE, tools around DAPHNE, experiments/reproducibility, ...) + - *contribution DOs* + - readable code + - necessary API changes should be reflected in the documentation (e.g., DaphneDSL/DaphneLib, command line arguments, environment variables, ...) + - appropriate test cases (should be present and make sense, test expected cases, corner cases, and exceptional cases) + - comments/explanations in central pieces of the code + - meaningful placement of the contributed code in the directory hierarchy + - correct integration of additional third-party code (reasonable placement in directory hierarchy, license compatibility, ...) + - DAPHNE license header (*should be checked automatically*) + - ... + - *contributions DON'Ts* + - obvious bugs (also think of corner cases) + - changes unrelated to the PR (should be addressed in separate PRs) + - significant performance degradation (in terms of building as well as executing DAPHNE) (*such checks should be automated*) + - files that should not be committed, because they are not useful to others, too large, or can be generated from other files (e.g., IDE project files, output logs, executables, container images, empty files, unrelated files, experimental results, diagrams, unused files, auto-generated files, ...) + - *unnecessary* API changes (e.g., DaphneDSL/DaphneLib, command line arguments, possibly environment variables, ...) + - reimplementation of things we already have or that should better be imported from some third-party library + - breaking existing code, formatting, tests, documentation, etc. + - confidential information (usernames, passwords, ...) + - paths on local system of contributor + - misleading comments + - copy-paste errors + - extreme code duplication + - useless prints (might even fail test cases) + - whitespace changes that unnecessarily blow up the diff (especially in files that otherwise have no changes) + - ... + - *code style* + - don't be strict as long as we don't have a clearly defined code style which can be enforced automatically + - but watch out for things that make code hard to read, e.g. 
+ - wrong indentation + - lots of commented out lines (especially artifacts from development/debugging) - **try out the code** - - check out the branch - - If the contribution originates from a github fork, these steps will help to clone the PR's state into a branch of your working copy (example taken from PR #415): - - Make sure your local copy of the main branch is up to date - ``` bash - git checkout main - git pull - ``` - - Create a branch for the PR changes and pull them on top of that local branch - ``` bash - git checkout -b akroviakov-415-densemat-strings-kernels main - git pull git@github.com:akroviakov/daphne.git 415-densemat-strings-kernels - ``` - - Once you have resolved all potential merge conflicts, you will have to do a merge commit. To get rid of this and ensure a linear history, start an interactive rebase from the last commit in main. In that process all non-relevant commits can be squashed and meaningful commit messages created if necessary. - ``` bash - git rebase -i - ``` - - Once everything is cleaned up in the local PR branch, switch back to main and merge from the PR branch. This should yield clean commits on top of main because of the prior rebasing. - ``` bash - git checkout main - git merge akroviakov-415-densemat-strings-kernels - git push origin main - ``` - - check if the code builds at all (should be checked automatically) - - check if there are compiler warnings (should be fixed) (should be checked automatically) - - check if the test cases pass (should be checked automatically) - - whether these checks succeed or fail may be platform-specific - - **TODO:** think about that aspect in more detail - -**3.2. Reviewer fixes minor problems** + - check out the branch + - If the contribution originates from a github fork, these steps will help to clone the PR's state into a branch of your working copy (example taken from PR #415): + - Make sure your local copy of the main branch is up to date + + ```bash + git checkout main + git pull + ``` + + - Create a branch for the PR changes and pull them on top of that local branch + + ```bash + git checkout -b akroviakov-415-densemat-strings-kernels main + git pull git@github.com:akroviakov/daphne.git 415-densemat-strings-kernels + ``` + + - Once you have resolved all potential merge conflicts, you will have to do a merge commit. To get rid of this and ensure a linear history, start an interactive rebase from the last commit in main. In that process all non-relevant commits can be squashed and meaningful commit messages created if necessary. + + ```bash + git rebase -i + ``` + + - Once everything is cleaned up in the local PR branch, switch back to main and merge from the PR branch. This should yield clean commits on top of main because of the prior rebasing. + + ```bash + git checkout main + git merge akroviakov-415-densemat-strings-kernels + git push origin main + ``` + + - check if the code builds at all (should be checked automatically) + - check if there are compiler warnings (should be fixed) (should be checked automatically) + - check if the test cases pass (should be checked automatically) + - whether these checks succeed or fail may be platform-specific + - **TODO:** think about that aspect in more detail + +**Reviewer fixes minor problems:** + - things that are quicker to fix, than to communicate back and forth - - typos and grammar mistakes (in variable names, status/error messages, comments, ...) 
- - obvious minor bugs - - wording/terminology (especially in comments) + - typos and grammar mistakes (in variable names, status/error messages, comments, ...) + - obvious minor bugs + - wording/terminology (especially in comments) - add separate commit(s) on PR's branch - - to clearly separate these amendments from original contribution + - to clearly separate these amendments from original contribution - changes should be briefly mentioned/summarized in feedback - - to document that something was changed - - to notify contributor (ideally, they look at the changes in detail, learn from them, and do it better the next time) + - to document that something was changed + - to notify contributor (ideally, they look at the changes in detail, learn from them, and do it better the next time) - may be done at any point in time (before or after requested changes have been addressed by contributor) -**3.3. Reviewer provides feedback on the contribution** +**Reviewer provides feedback on the contribution:** + - **identify requests for concrete changes from contributor** - - *things that the reviewer cannot fix within a few minutes* - - more general corrections, refactoring, ... - - more difficult bugs - - *suitable for requesting mandatory changes* - - in general, things that must to be done before the contribution can be merged, because there will be problems of some kind otherwise - - bugs (functional, non-functional/performance) - - things that could hinder others (e.g., unsolicited refactoring) - - simplifications that make the code dramatically shorter and/or easier to read/maintain, and are straightforward to achieve - - potentially also things that are in conflict with upcoming other PRs - - *not suitable for requesting mandatory changes* - - nice-to-have extensions of the feature: anything that could be done in a separate PR without leaving the code base in a bad state should not be a requirement for merging in at least a meaningful part of a feature - - the contribution of a PR does not need to be perfect, but it should bring us forward - - requests based on personal opinions which cannot be convincingly justified (e.g., implementing a feature in a different way as a matter of taste) (but might be okay for consistency) - - top efficiency - - such points can become follow-up issues and/or todos in the code (feel free to include issue number in todo) + - *things that the reviewer cannot fix within a few minutes* + - more general corrections, refactoring, ... 
+ - more difficult bugs + - *suitable for requesting mandatory changes* + - in general, things that must to be done before the contribution can be merged, because there will be problems of some kind otherwise + - bugs (functional, non-functional/performance) + - things that could hinder others (e.g., unsolicited refactoring) + - simplifications that make the code dramatically shorter and/or easier to read/maintain, and are straightforward to achieve + - potentially also things that are in conflict with upcoming other PRs + - *not suitable for requesting mandatory changes* + - nice-to-have extensions of the feature: anything that could be done in a separate PR without leaving the code base in a bad state should not be a requirement for merging in at least a meaningful part of a feature + - the contribution of a PR does not need to be perfect, but it should bring us forward + - requests based on personal opinions which cannot be convincingly justified (e.g., implementing a feature in a different way as a matter of taste) (but might be okay for consistency) + - top efficiency + - such points can become follow-up issues and/or todos in the code (feel free to include issue number in todo) - **reviewer gives feedback by commenting on the PR** - - use the form on GitHub ("Files changed"-tab -> "Review changes": select "Approve" or "Request changes") - - things to change should be enumerated clearly in the feedback on the PR (ideally numbered list or bullet points) - - briefly explain why these requested changes are necessary - - ideally provide some rough hints on how they could be addressed (but contributor is responsible for figuring out the details) - - optional extensions can be added as suggestions (some contributors are very eager), but clearly say that they are not required before merging - - feedback should be polite, actionable, concrete, and constructive + - use the form on GitHub ("Files changed"-tab -> "Review changes": select "Approve" or "Request changes") + - things to change should be enumerated clearly in the feedback on the PR (ideally numbered list or bullet points) + - briefly explain why these requested changes are necessary + - ideally provide some rough hints on how they could be addressed (but contributor is responsible for figuring out the details) + - optional extensions can be added as suggestions (some contributors are very eager), but clearly say that they are not required before merging + - feedback should be polite, actionable, concrete, and constructive -**3.4. Contributor addresses reviewer comments** +**Contributor addresses reviewer comments:** - ideally, the contributor is willing to do this - otherwise (and especially for new contributors, for whom we want to lower the barrier of entry), the reviewer or someone else should take charge of this, if possible -### 4. 
Once the contribution is ready, a collaborator merges the PR +### Once the contribution is ready, a collaborator merges the PR - can be done by the reviewer or any collaborator - we want to keep a clean history on the main branch (and remember never to force-push to main) - - makes it easier for others to keep track of the changes that happen - - PR's branch might have untidy history with lots of commits for implementing the contribution and addressing reviewer comments; that should not end up on main + - makes it easier for others to keep track of the changes that happen + - PR's branch might have untidy history with lots of commits for implementing the contribution and addressing reviewer comments; that should not end up on main - typically, we want to rebase the PR branch on main, which may require resolving conflicts -- an example of how to use git on the command line is given in **try out the code in section 3.1** above +- an example of how to use git on the command line is given in **try out the code in section 3.1** above - **case A) if PR is conceptually one contribution** - - on GitHub: - - "Conversation"-tab: use "Squash and merge"-button (select this mode if necessary) - - on the command line: - - rebase and squash as required in the locally checked out PR branch - - force-push to the PR branch (but never force-push to main) - - locally switch to main and merge the PR branch - - push to main - - note: this procedure also ensures that the PR is shown as *merged* (not as *closed*) in GitHub later - - *this will place a single new commit onto main* because you rebased/squashed in the PR branch first + - on GitHub: + - "Conversation"-tab: use "Squash and merge"-button (select this mode if necessary) + - on the command line: + - rebase and squash as required in the locally checked out PR branch + - force-push to the PR branch (but never force-push to main) + - locally switch to main and merge the PR branch + - push to main + - note: this procedure also ensures that the PR is shown as *merged* (not as *closed*) in GitHub later + - *this will place a single new commit onto main* because you rebased/squashed in the PR branch first - **case B) if PR consists of individual meaningful commits of a larger feature (with meaningful commit messages)** - - on GitHub: - - "Conversation"-tab: use "Rebase and merge"-button (select this mode if necessary) - - on the command line: - - rebase as required in the locally checked out PR branch - - force-push to the PR branch (but never force-push to main) - - locally switch to main and merge the PR branch - - push to main - - note: this procedure also ensures that the PR is shown as *merged* (not as *closed*) in GitHub later - - *this will place the new commits onto main* because you rebased in the PR branch first + - on GitHub: + - "Conversation"-tab: use "Rebase and merge"-button (select this mode if necessary) + - on the command line: + - rebase as required in the locally checked out PR branch + - force-push to the PR branch (but never force-push to main) + - locally switch to main and merge the PR branch + - push to main + - note: this procedure also ensures that the PR is shown as *merged* (not as *closed*) in GitHub later + - *this will place the new commits onto main* because you rebased in the PR branch first - **in any case** - - enter a meaningful commit message (what and why, closed issues, ...) 
- - ideally reuse the initial description of the PR - - in case of squashing (case A above): please remove the unnecessarily long generated commit message) - - **TODO:** commit messages should be a separate item in the developer documentation - - *authorship* - - if multiple authors edited the branch: choose one of them as the main author (after squashing in GitHub it should be the person who opened the PR); more authors can be added by adding [`Co-authored-by: NAME NAME@EXAMPLE.COM`](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors) after two blank lines at the end of the commit message (one for each co-author). - - very often, reviewers may have made minor fixes, but should refrain from adding themselves as co-authors (prefer to give full credit for the contribution to the initial contributor, unless the reviewer's contribution was significant) + - enter a meaningful commit message (what and why, closed issues, ...) + - ideally reuse the initial description of the PR + - in case of squashing (case A above): please remove the unnecessarily long generated commit message) + - **TODO:** commit messages should be a separate item in the developer documentation + - *authorship* + - if multiple authors edited the branch: choose one of them as the main author (after squashing in GitHub it should be the person who opened the PR); more authors can be added by adding [`Co-authored-by: NAME NAME@EXAMPLE.COM`](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors) after two blank lines at the end of the commit message (one for each co-author). + - very often, reviewers may have made minor fixes, but should refrain from adding themselves as co-authors (prefer to give full credit for the contribution to the initial contributor, unless the reviewer's contribution was significant) -### 5. Creation of follow-up issues (optional) +### Creation of follow-up issues (optional) - things that were left out - nice-to-haves - functional, non-functional, documentation, tests -### 6. Inviting the contributor as a collaborator (conditional) +### Inviting the contributor as a collaborator (conditional) If this contributor has made enough non-trivial contributions of good quality (currently, we require three), he/she should be invited as a collaborator on GitHub. - ## More Hints - reviewer's time is precious, don't hesitate to request changes from the contributor (but keep in mind that we want to lower the barrier of entry for new contributors) - avoid making a PR too large - - makes it difficult to context-switch into it again and again - - makes overall changes hard to understand and diffs hard to read + - makes it difficult to context-switch into it again and again + - makes overall changes hard to understand and diffs hard to read - whenever in doubt: use discussion features on GitHub to get others' opinions - ## Communication We want to facilitate an open and inclusive atmosphere, which should be reflected in the way we communicate. 
@@ -252,7 +257,7 @@ We want to facilitate an open and inclusive atmosphere, which should be reflecte - always be polite and respectful to others - keep the conversation constructive - keep in mind the background of other persons - - experienced DAPHNE collaborator or new contributor - - level of technical experience - - English language skills + - experienced DAPHNE collaborator or new contributor + - level of technical experience + - English language skills - ... diff --git a/doc/development/ImplementBuiltinKernel.md b/doc/development/ImplementBuiltinKernel.md index f46773f9e..9b9d4b708 100644 --- a/doc/development/ImplementBuiltinKernel.md +++ b/doc/development/ImplementBuiltinKernel.md @@ -16,16 +16,17 @@ limitations under the License. # Implementing a Built-in Kernel for a DaphneIR Operation -### Background +## Background (Almost) every DaphneIR operation will be backed by a kernel (= physical operator) at run-time. Extensibility w.r.t. kernels is one on the core goals of the DAPHNE system. It shall be easy for a user to add a custom kernel. However, the system will offer a full set of built-in kernels so that all DaphneIR operations can be used out-of-the-box. -### Scope +## Scope This document focuses on: + - default built-in kernels (not custom/external kernels) - implementations for CPU (not HW accelerators) - local execution (not distributed) @@ -40,14 +41,14 @@ As we proceed, we might adapt these guidelines step by step. The goal is to clarify how to implement a built-in kernel and what the thoughts behind it are. This is meant as a proposal, *comments/suggestions are always welcome*. -**Integration into the directory tree** +**Integration into the directory tree:** The implementations of built-in kernels shall reside in `src/runtime/local/kernels`. By default, one C++ header-file should be used for all specializations of a kernel. Depending on the amount of code, the separation into multiple header files is also possible. At least, we should rather not mix kernels of different DaphneIR operations in one header file. -**Interfaces** +**Interfaces:** Technically, a kernel is a C++ function taking one or more data objects (matrices, frames) and/or scalars as input and returning one or more data objects and/or scalars as output. As a central idea, (almost) all DaphneIR operations should be able to process (almost) all kinds of Daphne data structures, whereby these could have any Daphne value type. @@ -69,7 +70,7 @@ The reason for passing output data objects as parameters is that this mechanism The declaration of a kernel function could look as follows: -```c++ +```cpp template void someOp(DTRes *& res, const DTArg * arg, VT otherArg); ``` @@ -91,7 +92,7 @@ Instead of partially specializing the kernel function, we partially specialize t Finally, callers will call an instantiation of the kernel template function. C++ templates offer many ways to express such (partial) specializations, some examples are given below: -```c++ +```cpp // Kernel struct to enable partial template specialization. template struct SomeOp { @@ -147,7 +148,7 @@ struct SomeOp, CSRMatrix, double> { }; ``` -**Implementation of the `apply`-functions** +**Implementation of the `apply`-functions:** As stated above, the `apply`-function contain the actual implementation of the kernel. Of course, that depends on what the kernel is supposed to do, but there some recurring actions. 
@@ -155,9 +156,11 @@ Of course, that depends on what the kernel is supposed to do, but there some rec
 - *Obtaining an output data object*
 
   Data objects like matrices and frames cannot be obtained using the `new`-operator, but must be obtained from the `DataObjectFactory`, e.g., as follows:
-  ```c++
+
+  ```cpp
   auto res = DataObjectFactory::create<CSRMatrix<double>>(3, 4, 6, false);
   ```
+
   Internally, this `create`-function calls a private constructor of the specified data type implementation; so please have a look at these.
 
 - *Accessing the input and output data objects*
@@ -172,6 +175,7 @@ Of course, that depends on what the kernel is supposed to do, but there some rec
 For concrete examples, please have a look at existing kernel implementations in [src/runtime/local/kernels](/src/runtime/local/kernels).
 For instance, the following kernels represent some interesting cases:
+
 - [ewBinarySca](/src/runtime/local/kernels/EwBinarySca.h) works only on scalars.
 - [ewBinaryMat](/src/runtime/local/kernels/EwBinaryMat.h) works only on matrices.
 - [ewBinaryObjSca](/src/runtime/local/kernels/EwBinaryObjSca.h) combines matrix/frame and scalar inputs.
diff --git a/doc/development/Logging.md b/doc/development/Logging.md
index 20f4f2527..90969a9c6 100644
--- a/doc/development/Logging.md
+++ b/doc/development/Logging.md
@@ -1,5 +1,5 @@
-# Logging in DAPHNE
-### General
+# Logging
+
+## General
+
 To write out messages of any kind from DAPHNE internals we use the [spdlog](https://github.com/gabime/spdlog/) library.
 E.g., not from a user's print() statement but when ``std::cout << "my value: " << value << std::endl;`` is needed. With
-spdlog, the previous std::cout example would read like this: ```spdlog::info("my value: {}", value);```. The only
+spdlog, the previous std::cout example would read like this: ```spdlog::info("my value: {}", value);```. The only
 difference being that we now need to choose a log level (which is arbitrarily chosen to be *info* in this case).
 
-### Usage
+## Usage
+
 1. Before using the logging functionality, the loggers need to be created and registered. Due to the nature of how singletons work in C++, this has to be done once per binary (e.g., daphne, run_tests, libAllKernels.so, libCUDAKernels.so, etc). For the mentioned binaries this has already been taken care of (either somewhere near the main program entrypoint or via context creation in case of the libs). All setup is handled by the class DaphneLogger (with some extras in ConfigParser).
-2. Log messages can be submitted in two forms:
-   3. ``spdlog::warn("my warning");``
-   4. ``spdlog::get("default")->warn("my warning");``
+1. Log messages can be submitted in two forms:
+    1. ``spdlog::warn("my warning");``
+    1. ``spdlog::get("default")->warn("my warning");``
+
+    The two statements have the same effect. But while the first form implicitly uses the default logger, the second
+    explicitly chooses the logger via the static get() method.
 
-   The two statements have the same effect. But while *iii.* is a short form for using the default logger, *iv.*
-explicitly chooses the logger via the static get() method.
-3. We can have several loggers, which can be configured differently. For example, to control how messages are logged
+1. We can have several loggers, which can be configured differently. For example, to control how messages are logged
 in the CUDA compiler pass MarkCUDAOpsPass, a logger named "compiler::cuda" is used.
 For each used logger, an entry in ``fallback_loggers`` (see DaphneLogger.cpp) must exist to prevent crashing when using an unconfigured logger.
-4. To configure log levels, formatting and output options, the DaphneUserConfig and ConfigParser have been extended.
+1. To configure log levels, formatting and output options, the DaphneUserConfig and ConfigParser have been extended.
 See an example of this inthe ``UserConfig.json`` in the root directory of the DAPHNE code base.
-5. At the moment, the output options of our logging infrastructure are a bit limited (inital version). A logger currently
+1. At the moment, the output options of our logging infrastructure are a bit limited (initial version). A logger currently
 always emmits messages to the console's std-out and optionally to a file if a file name is given in the config.
-6. The format of log messages can be customized. See the examples in ``UserConfig.json`` and the
+1. The format of log messages can be customized. See the examples in ``UserConfig.json`` and the
 [spdlog documentation](https://github.com/gabime/spdlog/).
-7. If a logger is called while running unit tests (run_tests executable), make sure to ```#include ``` and
+1. If a logger is called while running unit tests (run_tests executable), make sure to ```#include ``` and
 call ```auto dctx = setupContextAndLogger();``` somewhere before calling the kernel to be tested.
-8. Logging can be set to only work from a certain log level and above. This mechanism also serves as a global toggle.
-To set the log level limit, set ``` { "log-level-limit": "OFF" },```. In this example, taken from ``UserConfig.json``,
+1. Logging can be set to only work from a certain log level and above. This mechanism also serves as a global toggle.
+To set the log level limit, set ```{ "log-level-limit": "OFF" },```. In this example, taken from ``UserConfig.json``,
 all logging is switched off, regardless of configuration.
 
-### Log Levels
+## Log Levels
+
 These are the available log levels (taken from ``<spdlog/common.h>``). Since it's an enum, their numeric value start from 0 for TRACE to 6 for OFF.
+
 ```cpp
 namespace level {
 enum level_enum : int
@@ -65,7 +71,8 @@ enum level_enum : int
 };
 ```
 
-### ToDo:
+## ToDo
+
 * Guideline when which log level is recommended
 * Toggle console output
 * Other log sinks
diff --git a/doc/development/WriteDocs.md b/doc/development/WriteDocs.md
new file mode 100644
index 000000000..5229bb311
--- /dev/null
+++ b/doc/development/WriteDocs.md
@@ -0,0 +1,47 @@
+
+
+# Writing Documentation
+
+At the moment, the collection of markdown files in the `doc` directory is rendered to HTML and deployed via GitHub Pages.
+
+If you insert a new markdown file, you have to add it to the HTML docs tree in [mkdocs.yml](/mkdocs.yml) at a suitable position under the `nav` section (see the snippet below).
+
+## Markdown Guideline
+
+Please write clean markdown code to ensure proper parsing by the tools used to render HTML. It is highly recommended to use an IDE like *VS Code*, which can render markdown pages directly while you work on them. The extension [markdownlint](https://marketplace.visualstudio.com/items?itemName=DavidAnson.vscode-markdownlint) directly highlights syntax violations/problems.
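For example, assuming you created a hypothetical new page `doc/development/MyNewTopic.md`, it could be registered in the `nav` tree of `mkdocs.yml` roughly like this (the file name is only an illustration):

```yaml
nav:
  # ... existing entries ...
  - 'Developers':
    - development/Contributing.md
    - development/WriteDocs.md
    # hypothetical new page; the path is relative to the configured docs_dir (doc/)
    - development/MyNewTopic.md
```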
+
+### Links
+
+* With `[]()` you can link to other files in the repo
+* Write links to other markdown files or source code files/directories so that they work locally / in the GitHub repository
+* Do not use relative links like `../BuildingDaphne.md`
+* Always use absolute paths relative to the repo root like `/doc/development/BuildingDaphne.md`
+* The links/URLs will be altered automatically so that they also work on the rendered HTML page
+* Reference issues with `[](/issues/123)`. This won't work on GitHub itself, but it will be rendered correctly in the HTML page.
+
+### Additional Syntax
+
+While some markdown renderers are more lenient and render such content as intended anyway, the following points have to be considered so that mkdocs renders it correctly as well.
+
+* 4 spaces of indentation for nested lists (ordered/unordered) and for code blocks within lists to ensure proper rendering for HTML
+* Using <\>: To use angle brackets, use the `<\>` notation outside a code block
+    * Example: `` renders to
+
+## Toolstack
+
+* [MkDocs](https://www.mkdocs.org/) to build HTML from markdown files
+* [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/) as HTML theme
diff --git a/doc/docs-build-requirements.txt b/doc/docs-build-requirements.txt
new file mode 100644
index 000000000..383793d74
--- /dev/null
+++ b/doc/docs-build-requirements.txt
@@ -0,0 +1 @@
+mkdocs-material
diff --git a/doc/stylesheets/extra.css b/doc/stylesheets/extra.css
new file mode 100644
index 000000000..91ee9a0cc
--- /dev/null
+++ b/doc/stylesheets/extra.css
@@ -0,0 +1,11 @@
+.md-grid {
+    max-width: 1600px;
+}
+
+:root {
+    --md-primary-fg-color: #7889fb;
+    --md-primary-fg-color--light: #a0acfc;
+    --md-primary-fg-color--dark: #5c69bf;
+    --md-accent-fg-color: #5c69bf;
+    --md-accent-fg-color--transparent: #a0acfc;
+  }
diff --git a/doc/tutorial/sqlTutorial.md b/doc/tutorial/sqlTutorial.md
index c7941465c..19d2e4c51 100644
--- a/doc/tutorial/sqlTutorial.md
+++ b/doc/tutorial/sqlTutorial.md
@@ -15,6 +15,7 @@ limitations under the License. -->
 # Using SQL in DaphneDSL
+
 DAPHNE supports a rudimentary version of SQL. At any point in a DaphneDSL script, we can execute a SQL query on frames.
 We need two operations to achieve this: ```registerView(...)``` and ```sql(...)```
@@ -23,20 +24,24 @@ For the following examples we assume we already have a DaphneDSL script which in
 ## General Procedure
 
 ### registerView(...)
+
 RegisterView registers a frame for the sql operation. If we want to execute a SQL query on a frame, we *need* to register it before that.
 The operation has two inputs: the name of the table, as a string, and the frame which shall be associated with the given name.
 For example, we can register the frame "x", from previous calculations, under the name "Table1".
 The DaphneDSL script for this would look like this:
-```
+
+```cpp
 registerView("Table1", x);
 ```
 
 ### sql(...)
+
 Now that we have registered the tables that we need for our SQL query, we can go ahead and execute our query.
 The SQL operation takes one input: the SQL query, as a string. In it, we will reference the table names we previously have registered via registerView(...).
 As a result of this operation, we get back a frame. The columns of the frame are named after the projection arguments inside the SQL query.
 For example, we want to return all the rows of the frame x, which we have previously registered under the name "Table1", where the column "a" is greater than 5 and save it in a new frame named "y".
The DaphneDSL script for this would look like this: -``` + +```cpp y = sql("SELECT t.a as a, t.b as b, t.c as c FROM Table1 as t WHERE t.a > 5;"); ``` @@ -44,11 +49,13 @@ This results in a frame "y" that has three columns "a", "b" and "c". On the frame y we can continue to build our DaphneDSL script. ## Features + We don't support the complete SQL standard at the moment. For instance, we need to fully specify on which columns we want to operate. In the example above, we see "t.a" instead of simply "a". Also, not supported are DDL and DCL Queries. Our goal for DML queries is to only support SELECT-statements. Other features we do and don't support right now can be found below. ### Supported Features + * Cross Product * Complex Where Clauses * Inner Join with single and multiple join conditions separated by an "AND" Operator @@ -58,6 +65,7 @@ Other features we do and don't support right now can be found below. * As ### Not Yet Supported Features + * The Star Operator \* * Nested SQL Queries like: ```SELECT a FROM x WHERE a IN SELECT a FROM y``` * All Set Operations (Union, Except, Intersect) @@ -72,37 +80,37 @@ The DaphneDSL scripts can be found in `doc/tutorial/sqlExample1.daph` and `doc/t ### Example 1 -``` +```cpp //Creation of different matrices for a Frame - //seq(a, b, c) generates a sequences of the form [a, b] and step size c - employee_id = seq(1, 20, 1); - //rand(a, b, c, d, e, f) generates a matrix with a rows and b columns in a value range of [c, d] - salary = rand(20, 1, 250.0, 500.0, 1.0, -1); - //with [a, b, ..] we can create a matrix with the given values. - age = [20, 30, 23, 65, 70, 42, 34, 55, 76, 32, 53, 40, 42, 69, 63, 26, 70, 36, 21, 23]; + //seq(a, b, c) generates a sequences of the form [a, b] and step size c + employee_id = seq(1, 20, 1); + //rand(a, b, c, d, e, f) generates a matrix with a rows and b columns in a value range of [c, d] + salary = rand(20, 1, 250.0, 500.0, 1.0, -1); + //with [a, b, ..] we can create a matrix with the given values. + age = [20, 30, 23, 65, 70, 42, 34, 55, 76, 32, 53, 40, 42, 69, 63, 26, 70, 36, 21, 23]; - //createFrame() creates a Frame with the given matrices. The column names (strings) are optional. - employee_frame = createFrame(employee_id, salary, age, "employee_id", "salary", "age"); + //createFrame() creates a Frame with the given matrices. The column names (strings) are optional. + employee_frame = createFrame(employee_id, salary, age, "employee_id", "salary", "age"); //We register the employee_frame we created previously. note the name for the registration and the //name of the frame don't have to be the same. - registerView("employee", employee_frame); + registerView("employee", employee_frame); //We run a SQL Query on the registered Frame. Note here we have to reference the name we choose //during registration. - res = sql( - "SELECT e.employee_id as employee_id, e.salary as salary, e.age as age - FROM employee as e - WHERE e.salary > 450.0;"); + res = sql( + "SELECT e.employee_id as employee_id, e.salary as salary, e.age as age + FROM employee as e + WHERE e.salary > 450.0;"); //We can Print both employee and the query result to the console with print(). 
- print(employee_frame); - print(res); + print(employee_frame); + print(res); ``` ### Example 2 -``` +```cpp employee_id = seq(1, 20, 1); salary = rand(20, 1, 250.0, 500.0, 1.0, -1); age = [20, 30, 23, 65, 70, 42, 34, 55, 76, 32, 53, 40, 42, 69, 63, 26, 70, 36, 21, 23]; @@ -112,10 +120,10 @@ employee_frame = createFrame(employee_id, salary, age, "employee_id", "salary", registerView("employee", employee_frame); res = sql( - "SELECT e.age as age, avg(e.salary) as salary - FROM employee as e - GROUP BY e.age - ORDER BY e.age"); + "SELECT e.age as age, avg(e.salary) as salary + FROM employee as e + GROUP BY e.age + ORDER BY e.age"); print(employee_frame); print(res); diff --git a/mkdocs.yml b/mkdocs.yml new file mode 100755 index 000000000..961406c57 --- /dev/null +++ b/mkdocs.yml @@ -0,0 +1,73 @@ +site_name: DAPHNE +docs_dir: doc +site_dir: doc_build +theme: + name: material + logo: assets/logo_small.png + favicon: assets/logo_small.png + icon: + repo: fontawesome/brands/github + palette: + # Palette toggle for light mode + - scheme: default + primary: custom + accent: custom + toggle: + icon: material/weather-night + name: Switch to dark mode + # Palette toggle for dark mode + - scheme: slate + primary: pink + accent: pink + toggle: + icon: material/weather-sunny + name: Switch to light mode + features: + - content.code.copy + - navigation.tabs + - navigation.tabs.sticky +markdown_extensions: + - pymdownx.highlight: + anchor_linenums: true + - pymdownx.inlinehilite + - pymdownx.snippets + - pymdownx.superfences + - pymdownx.striphtml +extra_css: + - stylesheets/extra.css +repo_url: https://github.com/daphne-eu/daphne +repo_name: daphne-eu/daphne +nav: + - 'Home': + - README.md + - GettingStarted.md + - 'Users': + - Quickstart.md + - RunningDaphneLocally.md + - 'Configuration': Config.md + - 'DaphneDSL': + - DaphneDSL/LanguageRef.md + - DaphneDSL/Builtins.md + - DaphneDSL/Imports.md + - 'DaphneLib': + - DaphneLib/Overview.md + - DaphneLib/APIRef.md + - FileMetaDataFormat.md + - DistributedRuntime.md + - SchedulingOptions.md + - 'Tutorials': + - tutorial/sqlTutorial.md + - FPGAconfiguration.md + - MPI-Usage.md + - 'Developers': + - development/Contributing.md + - development/BuildingDaphne.md + - Deploy.md + - ReleaseScripts.md + - BinaryFormat.md + - development/Logging.md + - development/ExtendingDistributedRuntime.md + - development/ExtendingSchedulingKnobs.md + - development/HandlingPRs.md + - development/ImplementBuiltinKernel.md + - development/WriteDocs.md
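To preview how the pages will be rendered with this configuration, one option is to build the documentation locally from the repository root; a minimal sketch, assuming a working Python/pip environment:

```bash
# install MkDocs with the Material theme (the same requirements the docs build uses)
pip install -r doc/docs-build-requirements.txt

# build the static site into doc_build/ (the site_dir configured above) ...
mkdocs build

# ... or serve it with live reload while editing
mkdocs serve
```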