diff --git a/soda/contracts/adr/10_data_source_configuration.md b/soda/contracts/adr/10_data_source_configuration.md
index 668584dd2..28e5fbe69 100644
--- a/soda/contracts/adr/10_data_source_configuration.md
+++ b/soda/contracts/adr/10_data_source_configuration.md
@@ -2,15 +2,15 @@
 
 In order to verify (Soda) or push (Atlan) a contract, a data source is required.
 
-A contract file will reference a data source by name. Data sources are configured in
+A contract file will reference a data source by name. Data sources are configured in
 data source YAML files.
 
-Other configurations next to data sources are variables, Soda Cloud connection and Atlan
+Other configurations next to data sources are variables, Soda Cloud connection and Atlan
 connections.
 
-Different environments require different configurations. The configuration files are
-grouped per environment in a directory. When performing a contract operation, next to
-the contract files, the environment configurations is passed as a directory.
+Different environments require different configurations. The configuration files are
+grouped per environment in a directory. When performing a contract operation, next to
+the contract files, the environment configuration is passed as a directory.
 
 In an environment configuration directory, the file names are used to determine the file type:
 
@@ -25,7 +25,7 @@ In an environment configuration directory, the file names are used to determine
 A CLI or API operation like Soda's `verify` or Atlan's `push` requires one or more contract files. Each contract file will need to resolve the data source by name.
 
-A `data_source` is **required** in a contract file and refers to the data source in the data sources
+A `data_source` is **required** in a contract file and refers to the data source in the data sources
 configuration file.
 
 Eg `./customers.yml`
@@ -45,28 +45,30 @@ For example: with following file structure...
 +- contracts
    +- customers.yml
 ```
-And a command line prompt in the root directory, soda can be invoked with
+And a command line prompt in the root directory, soda can be invoked with
 ```
-> soda --configuration .soda --contract contracts/customers.yml
+> soda --configuration .soda --contract contracts/customers.yml
 ```
-In this case the `--configuration .soda` is optional as it will be 1 of 4 default locations
+In this case the `--configuration .soda` is optional as it will be 1 of 4 default locations
 where the configuration is fetched from:
 ```
-> soda --contract contracts/customers.yml
+> soda --contract contracts/customers.yml
 ```
-will by default take the first configuration directory that exists and is contains configuration files:
-1. ${user.home}/.soda
-2. ${user.home}/.atlan
-3. ./.soda
-4. ./.atlan
+will by default take the first configuration directory that exists and contains configuration files:
+1. ./.soda
+2. ./.atlan
+3. ${user.home}/.soda
+4. ${user.home}/.atlan
+
+*(Note: for the Atlan CLI the precedence of (1) and (2) may be inverted, and the precedence of (3) and (4) may be inverted.)*
 
 ### Working with environments
 
-Typically engineers need to work on different environments like (local) development, cicd and production.
+Typically engineers need to work on different environments like (local) development, CI/CD and production.
 
-All configuration files for an environment must be located in a single configuration directory. That
+All configuration files for an environment must be located in a single configuration directory. That
 environment configuration directory is passed to the CLI or API command.
 
 For example: with following file structure...
@@ -84,7 +86,7 @@ For example: with following file structure...
 +- contracts
 |  +- customers.yml
 ```
-Soda can be executing using the cicd environment configurations like this:
+Soda can be executing using the cicd environment configurations like this:
 ```shell
 soda -cfg ./.soda/cicd ./contracts/customers.yml
 ```
@@ -98,7 +100,7 @@ data_source: snowflake_landing_zone
 
 ### Data source file format
 
-(1) Example
+(1) Example
 ```yaml
 - type: snowflake
   connection:
@@ -118,11 +120,11 @@ data_source: snowflake_landing_zone
 
 A data source contains a list of connections. Each connection has a type.
 
-A connection also has a list of named `data_sources`. These data source keys must match the names referenced
-in the contract files with `data_source`.
+A connection also has a list of named `data_sources`. These data source keys must match the names referenced
+in the contract files with `data_source`.
 
-Each data source can specify a specific schema, database or any other structural element of the specific SQL engine
-Data source properties like `database`, `schema` (and in the bigquery case `project` and `datasets`) are dependent
+Each data source can specify a specific schema, database or any other structural element of the specific SQL engine.
+Data source properties like `database`, `schema` (and in the bigquery case `project` and `datasets`) are dependent
 on the connection type. For each of these properties we use connection-specific terminology.
 
 And each data source can also overwrite certain connection properties.
@@ -134,12 +136,12 @@ Variables must be upper case underscore: regex `[A-Z0-9_]+`
 (no jinja templating, only variable substitution without spaces)
 
-In the future we may define a vault.yml configuration file that specifies how to load environment variables from
+In the future we may define a `vault.yml` configuration file that specifies how to load environment variables from
 a vault or secret store.
 
 ## Identifying datasets
 
-TODO: describe how datasets are identified as a combination
+TODO: describe how datasets are identified as a combination
 * configuration folder name
 * data source name
 * dataset name
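
Review note: the reordered default lookup in this change (`./.soda`, `./.atlan`, `${user.home}/.soda`, `${user.home}/.atlan`, first directory that exists and contains configuration files wins) could be sketched roughly as below. This is only an illustration, not the actual Soda or Atlan implementation. The function names are hypothetical, and treating "contains configuration files" as "contains at least one `.yml` or `.yaml` file" is an assumption, since the ADR does not define that check. The order is the Soda order from the diff; per the added note, the Atlan CLI may invert each pair.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def default_candidates(cwd: Path, home: Path) -> List[Path]:
    """Candidate directories in the precedence order introduced by this diff.

    Hypothetical helper: project-local directories come before the user's
    home directory (Soda order; Atlan may swap each .soda/.atlan pair).
    """
    return [cwd / ".soda", cwd / ".atlan", home / ".soda", home / ".atlan"]


def contains_configuration_files(directory: Path) -> bool:
    """Assumption: a directory qualifies if it holds at least one YAML file."""
    if not directory.is_dir():
        return False
    return any(
        entry.is_file() and entry.suffix in {".yml", ".yaml"}
        for entry in directory.iterdir()
    )


def resolve_configuration_dir(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists and contains configuration files."""
    for candidate in candidates:
        if contains_configuration_files(candidate):
            return candidate
    return None
```

A caller would pass `default_candidates(Path.cwd(), Path.home())` when no configuration directory is given explicitly. A `None` result means no default location qualified, and how the CLI should react in that case is not specified by the ADR.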