-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathgenerate_surrogate_key.jinja
42 lines (38 loc) · 1.69 KB
/
generate_surrogate_key.jinja
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
{#
Macro: generate_surrogate_key(fields)
Description: This macro generates a surrogate key for a given list of fields.
The surrogate key is a unique identifier for each record in a table.
It is generated by concatenating the values of the given fields and applying the MD5 hash function to the result.
The fields are cast to varchar type before concatenation. There's also a delimited placed between each field value to avoid the same hash been generated from different field values.
If a field value is null, it is replaced with the string '_jinja_surrogate_key_null_'.
The resulting surrogate key is a varchar type.
Parameters:
fields - A list of field names for which the surrogate key should be generated.
Usage:
SELECT
{{ generate_surrogate_key(['column1', 'column2', 'column3']) }} AS surrogate_key,
column1,
column2,
column3
FROM
(Values ('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h', 'i')) AS t(column1, column2, column3);
This will produce:
SELECT
md5(
coalesce(cast(column1 as varchar),'_jinja_surrogate_key_null_')
|| '-' || coalesce(cast(column2 as varchar),'_jinja_surrogate_key_null_')
|| '-' || coalesce(cast(column3 as varchar),'_jinja_surrogate_key_null_')) AS surrogate_key,
...
Resulting in types like this:
| column_name | data_type | classifier |
+---------------+-----------+------------+
| surrogate_key | varchar | |
#}
{%- macro generate_surrogate_key(fields) -%}
md5(
{%- for field in fields %}
{% if not loop.first %}|| '-' || {% endif -%}
coalesce(cast({{ field }} as varchar),'_jinja_surrogate_key_null_')
{%- endfor -%}
)
{%- endmacro -%}