Stripped out brac models #81

Closed
wants to merge 81 commits into from
81 commits
4b38db5
feat: update contactview_metadata to read from couchdb and add indexes
njogz May 8, 2024
4fa0456
feat: more pruning of models
njogz May 8, 2024
e5cd36c
fix, missing comma
njogz May 8, 2024
6bdcd38
feat: add missing column
njogz May 8, 2024
4c91b49
feat: add reported_date to indexes
njogz May 8, 2024
c382000
feat: add other contactview views
njogz May 8, 2024
1481bca
feat: add references
njogz May 8, 2024
0daafaf
fix: try removing semicolons
njogz May 8, 2024
eb19e58
fix: model references
njogz May 8, 2024
53ca9c1
fix: model references
njogz May 8, 2024
9a6524e
fix: model reference
njogz May 8, 2024
91afb61
feat: update references
njogz May 8, 2024
b74ee43
fix: typo
njogz May 8, 2024
fe4534a
fix: column names
njogz May 8, 2024
cfcf5f3
fix: missed a reference
njogz May 8, 2024
d5ede3d
feat: add doc to metadata view
njogz May 8, 2024
bd8f042
fix: column names
njogz May 8, 2024
2996280
Add incremental filter logic
May 8, 2024
a383355
Convert root model to incremental and everything else to views
May 8, 2024
1dc8538
Add COALESCE
May 8, 2024
3b82da0
Move the use of coalesce to the right side of the filter
May 9, 2024
c15869f
Add subquery
May 9, 2024
021a032
Back to original structure
May 9, 2024
e085b39
Try subquery result for max_timestamp
May 9, 2024
0b8ecfa
Reverse
May 9, 2024
315f1db
Create max_existing_multiro_macros and use it
May 9, 2024
7bf0982
Add default value
May 9, 2024
4df2d18
Add default value to macro
May 9, 2024
5742459
add data_reocrd root model and brac form models
witash May 13, 2024
b5079cd
feat: change chp_with_branches back to table
njogz May 13, 2024
01390e8
feat: handle null values
njogz May 14, 2024
b13a7fe
feat: stop using macro for where clause
njogz May 14, 2024
33df728
chore: undo previous macro change
njogz May 14, 2024
7893789
chore: rename column to reduce ambiguity
njogz May 14, 2024
f98650b
chore: use metadata.reported for filter
njogz May 14, 2024
dae5ba3
chore: try subquery instead of macro
njogz May 14, 2024
bb40734
feat: stop using macro in data_record
njogz May 16, 2024
f6a60f0
feat: try btree indexes
njogz May 16, 2024
6c83336
feat: remove couchdb model, read from medic table for contact and dat…
njogz May 17, 2024
fd89811
chore: missed a couchdb reference
njogz May 17, 2024
f355cef
chore: fix typo in query
njogz May 17, 2024
e531b72
feat: add timestamp to contact model
njogz May 17, 2024
83781d5
fix: another typo
njogz May 17, 2024
bf87b20
feat: add the raw doc to the contact model
njogz May 17, 2024
d108cb6
fix: use reported and not reported_date
njogz May 17, 2024
83d9829
feat: bring back couchdb model and make use of auto_id
njogz May 20, 2024
a746700
fix: add auto_id to contact model
njogz May 20, 2024
8e523b8
feat: make use of type column which is indexed
njogz May 20, 2024
e1eb93a
feat: undo auto_id and try out macros again
njogz May 21, 2024
c8a25f9
feat: use sub query
njogz May 21, 2024
6ace4a4
feat: edit macros and try it on the data record model
njogz May 22, 2024
c87ad10
feat: try holding latest timestamps in mat views
njogz May 22, 2024
68f9bc7
feat: remove data_record and brac forms modesls, use macros for incre…
njogz May 22, 2024
3b72437
feat: try using mat view for incremental query
njogz May 23, 2024
db585f9
feat: using the mat views causes aa cyclical dep error
njogz May 23, 2024
8a5962b
feat: filter by person early
njogz May 23, 2024
c8ca5e9
fix: remove semicolon
njogz May 23, 2024
91ee6c8
try converting contact_person_fileds to mat view
njogz May 23, 2024
18eeecb
fix: remove indexes
njogz May 23, 2024
d87bb3a
contactview_chp as matview
njogz May 23, 2024
dfac556
feat: temporarily use sub query
njogz May 23, 2024
e757d03
fix: use macros again
njogz May 23, 2024
3204876
chore: temp change to confirm issue
njogz May 23, 2024
a2a8c8e
fix: issue with new tables confirmed
njogz May 23, 2024
28c64e8
fix: use macros for incremental view
njogz May 23, 2024
99ee50a
feat: convert matviews back to views
njogz May 24, 2024
b30b996
feat: try lamda view for couchdb model
njogz May 24, 2024
61b2522
fix: edit macros to get max from stable_couchdb
njogz May 24, 2024
ec372e8
fix: edit macros to get max from stable_couchdb
njogz May 24, 2024
3699d1b
fix: use CTE instead
njogz May 24, 2024
0b40fde
fix: latest timestamp from correct table
njogz May 24, 2024
c50ff94
fix: add coalsesce to fix new table issue
njogz May 24, 2024
ee66f48
fix: change where we add coalesce
njogz May 24, 2024
4e825b0
fix: coalesce not working
njogz May 24, 2024
a785b09
chore: get data unto the tables
njogz May 24, 2024
72e9ddb
fix: now we can use macros
njogz May 24, 2024
3736db7
fix: contact model still failing and update query in couchdb view
njogz May 24, 2024
0e6c948
fix: update query in couchdb view
njogz May 24, 2024
6f62150
fix: CTE in couchdb model
njogz May 24, 2024
60ba390
fix: address sonar issues
njogz May 27, 2024
f911a9a
fix: add soanr exclusions and undo previous variable introduction
njogz May 27, 2024
1 change: 1 addition & 0 deletions .sonarcloud.properties
@@ -0,0 +1 @@
sonar.exclusions=macros/max_existing_multiro_macros.sql, models/contacts/contact.sql, models/root/stable_couchdb.sql
237 changes: 237 additions & 0 deletions macros/max_existing_multiro_macros.sql
@@ -0,0 +1,237 @@
{% macro max_existing(max_field, target_ref=this) -%}
{#
Attribution: https://gist.github.com/davehowell/1d6564875f35e58d8da14c9adbcb92da
#}
{#
Gets the max existing value for some field in the target, or some other ref or source.
Use in incremental models, inside `if is_incremental()` or `if execute` blocks,
otherwise the dbt model will not compile.
Useful where you have a primary key or other watermark field and want to construct SQL with
that value determined at compile time.
Why?
When using the max value multiple times in a query it will be better than inlining multiple
subqueries to fetch the same value, and in many cases, hardcoding a value in a where clause
generates a significantly better execution plan.

Params: max_field - string, the name of the field
target_ref - string, pass in a call to ref or source to ensure dag dependency
defaults to this (current context model)
Return: A literal string of the max value - will not have any quoting
Usage: To return dates use `max_existing_date`
To return timestamps use `max_existing_timestamp`

For integers:

SELECT pk_column [,col2 ...]
FROM source_table_ref
WHERE True
{%- if is_incremental() %}
AND some_int_column > {{ max_existing('some_int_column') }}
{%- endif %}

For strings - n.b. probably not useful to use strings as watermarks

SELECT pk_column [,col2 ...]
FROM source_table_ref
WHERE True
{%- if is_incremental() %}
AND some_string_column > '{{ max_existing("some_string_column") }}'
{%- endif %}
#}
{% call statement('get_max_existing', fetch_result=True)-%}

SELECT max({{ max_field }}) as max_existing
FROM {{ '"' ~ this.schema ~ '"' ~ '.' ~ '"' ~ this.name~ '"' if target_ref in [this] else target_ref }}

{%- endcall %}
{% set max_existing_field = load_result('get_max_existing').table.columns['max_existing'].values()[0] %}
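{# A SQL NULL is loaded by agate as None, so test for that as well as the literal string #}
{% if max_existing_field is none or max_existing_field == 'NULL' %}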
{{ return('1990-01-01 23:00.000') }}
{% else %}
{{ return(max_existing_field) }}
{% endif %}
{%- endmacro %}


{% macro _max_existing_multirow(max_field, group_by_field) -%}
{#
Internal usage. Gets the max existing values for multiple groups in a target table.

Use in incremental model,
inside `if is_incremental()` or `if execute` otherwise
the dbt model will not compile.
Params:
max_field - column or valid SQL expression to be maxed
group_by_field - column or valid SQL expression to use in group by clause
Return: agate.Table.rows https://agate.readthedocs.io/en/1.6.1/api/table.html

Usage:
See the public version of this macro.
#}
{% call statement('get_max_existing_multirow', fetch_result=True)-%}

SELECT
{{ group_by_field }} as group_by_field,
max({{ max_field }}) AS max_existing
FROM {{ this.schema }}.{{ this.name }}
GROUP BY {{ group_by_field }}

{%- endcall %}
{% set result = load_result('get_max_existing_multirow') %}

{{ return(result.table.rows) }}
{%- endmacro %}

{% macro _two_column_matrix_to_list_dicts(agate_rows, col1_name, col2_name) %}
{#
Converts an agate.Table.rows to a list of dicts
Params: an agate.Table.rows and names of the two columns
Return: list of dicts with keys col1_name and col2_name
#}
{%- set list_dicts = [] -%}
{%- for row in agate_rows -%}
{{- list_dicts.append( {col1_name: row[0], col2_name : row[1] } ) -}}
{%- endfor -%}
{% for dict_val in list_dicts %}
{{ log(dict_val) }}
{%- endfor -%}
{{ return(list_dicts)}}
{% endmacro %}


{% macro max_existing_multirow(max_field, group_by_field) %}
{#
Gets the max existing values for multiple groups in a target table.
Use in incremental models, inside `if is_incremental()` or `if execute`
otherwise the dbt model will not compile.

Params:
max_field - column or valid SQL expression to be maxed
group_by_field - column or valid SQL expression to use in group by clause
Note: expressions should not have column aliases
Return: list of dicts where dict attributes have the keys 'type' and 'max_val'

Usage:
Iterate over the rows e.g.

{%- for row in max_existing_multirow('pk_column', 'type_column') %}
{{ row.type }}: {{ row.max_val }}
{%- endfor %}

Useful in an incremental model that selects from a `UNION ALL` across
many source models where the unique_key watermark field might not be unique
across those models. A surrogate key, hash or concatenated field in the target is not
adequate to check watermarks across multiple sources.
If you maintain some lineage indicator then it is possible to fetch all the max values
e.g.
Note for this example the fields 'type' and 'pk' in the target model store which parent
table the row comes from and the respective watermark of that table,
in this case a primary key. The watermark could also be some timestamp like "updated_at"


{%- set types = ['ada', 'grace', 'rosie'] %}

{%- if is_incremental() %}
{%- set max_values_list = max_existing_multirow('pk', 'type') %}
{%- endif -%}

{%- for t in types %}
SELECT pk, col2, col3, coln FROM {{ ref(t) }}
WHERE True

{%- if is_incremental() %}
AND pk > {{ lookup_max_value(max_values_list, t) }}
{%- endif -%}

{%- if not loop.last %}
UNION ALL
{%- endif %}

{%- endfor %}

#}
{% set agate_rows = _max_existing_multirow(max_field, group_by_field) %}
{% set list_dicts = _two_column_matrix_to_list_dicts(agate_rows, 'type', 'max_val') %}
{{ return(list_dicts) }}
{% endmacro %}

{% macro lookup_max_value(max_values_list, lookup_value, lookup_key='type', return_key='max_val') %}
{#
Use with the result of the max_existing_multirow macro.
Does a lookup into that list of dicts for a given lookup_key.

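Params: max_values_list - the output of the max_existing_multirow macro, a list of dicts
must have a 'type' and 'max_val' key
lookup_value - the value to match against lookup_key
lookup_key - dict key to search on, defaults to 'type'
return_key - dict key whose value is returned, defaults to 'max_val'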
Return: literal string of the max value if found or None
None will print blank and probably cause an error in the SQL of your model.
#}
{{ return (max_values_list | selectattr(lookup_key, 'eq', lookup_value) | map(attribute=return_key) | list | first) }}
{% endmacro %}

{% macro _cast_as_type(stringvalue, as_type) %}
{{ return("CAST('" ~ stringvalue ~ "' AS " ~ as_type ~ ")") }}
{% endmacro %}


{% macro max_existing_timestamp(timestampfield, target_ref=this) -%}
{#
Params:
timestampfield: date or timestamp
Returns a SQL expression casting the max value string to a timestamp.
e.g. "CAST('2019-01-01 23:00.000' AS TIMESTAMP)"
#}
{{ return(_cast_as_type(max_existing(timestampfield, target_ref), 'TIMESTAMP')) }}
{%- endmacro %}


{% macro max_existing_date(timestampfield, target_ref=this) -%}
{#
Params:
timestampfield: date or timestamp
target_ref: string, optional, defaults to the current context model
pass in a call to ref or source to target a different model
Returns a SQL expression casting the max value string to a date.
e.g. "CAST('2019-01-01' AS DATE)"
Usage:
current model - latest value of a field called "updated_at":
{{ max_existing_date('updated_at') }}

some other model ref - latest value of a field called "deactivated_at":
{{ max_existing_date('deactivated_at', ref('other_model')) }}

some other model source - latest value of a field called "deactivated_at":
{{ max_existing_date('deactivated_at', source('source_name', 'table_or_view_name')) }}

#}
{{ return(_cast_as_type(max_existing(timestampfield, target_ref), 'DATE')) }}
{%- endmacro %}


{% macro _max_existing_prior_date(datefield, target_ref=this) -%}
{% call statement('max_existing_prior_date', fetch_result=True)-%}

SELECT {{ dbt_utils.dateadd('day', -1, "max(" ~ datefield ~ ")" ) }} as max_existing_prior_date
FROM {{ '"' ~ this.schema ~ '"' ~ '.' ~ '"' ~ this.name~ '"' if target_ref in [this] else target_ref }}

{%- endcall %}
{% set max_existing_prior = load_result('max_existing_prior_date').table.columns['max_existing_prior_date'].values()[0] %}
{{ return(max_existing_prior) }}
{%- endmacro %}


{% macro max_existing_prior_date(datefield, target_ref=this) -%}
{#
A day prior to the max of some date field.
Params, Return and Usage as per the max_existing_date() macro
#}
{{ return(_cast_as_type(_max_existing_prior_date(datefield, target_ref), 'DATE')) }}
{%- endmacro %}


{% macro max_existing_prior_timestamp(datefield, target_ref=this) -%}
{#
A day prior to the max of some timestamp field
Params, Return and Usage as per the max_existing_date() macro
#}
{{ return(_cast_as_type(_max_existing_prior_date(datefield, target_ref), 'TIMESTAMP')) }}
{%- endmacro %}
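
As a usage sketch for the macros in this file: an incremental model could filter on a watermark that is resolved at compile time. The model name `example_source_model` and the column `updated_at` are hypothetical and not part of this PR; only `max_existing_timestamp` comes from the file above.

```sql
-- Hypothetical incremental model; the ref and column names are illustrative only.
{{ config(materialized = 'incremental') }}

SELECT *
FROM {{ ref('example_source_model') }}
{% if is_incremental() %}
-- max_existing_timestamp() queries the current model's table while the model
-- is being compiled and renders an expression of the form CAST('...' AS TIMESTAMP)
WHERE updated_at > {{ max_existing_timestamp('updated_at') }}
{% endif %}
```

The call sits inside `is_incremental()` because the macro queries the target table, which does not exist on the first run. The macro appears intended to fall back to a fixed early timestamp when the target has no rows.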
33 changes: 33 additions & 0 deletions models/contact_views/chp_with_branches.sql
@@ -0,0 +1,33 @@
{{
config(
materialized = 'incremental',
)
}}

SELECT
chp.name as CHP_Name,
chp.uuid as CHP_ID,
branch.name as Branch_Name,
chp.branch_uuid as Branch_ID,
chp.phone as Phone,
chp.phone2 as Phone2,
chp.date_of_birth as DOB,
chp.parent_type as Parent_Type,
chp.area_uuid as Area_ID,
metadata.name as supervisor_name,
metadata.reported as reported
FROM
{{ ref("contactview_chp") }} AS chp,
{{ ref("contactview_branch") }} AS branch,
{{ ref("contact") }} AS metadata,
{{ ref("contact") }} cm
WHERE
chp.branch_uuid = branch.uuid AND
chp.supervisor_uuid = metadata.uuid AND
chp.uuid = cm.contact_uuid AND
branch.name != 'HQ' AND branch.name != 'HQ OVC'
{% if is_incremental() %}
AND metadata.reported >= (SELECT MAX(reported) FROM {{ this }} WHERE reported IS NOT NULL)
{% endif %}
ORDER BY
CHP_Name
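
A note on the incremental filter above: if the target table exists but is empty, `MAX(reported)` is NULL and the `>=` comparison selects no rows, which may be what the later "coalesce" commits were working around. A minimal sketch of a guarded filter follows; it assumes `reported` is a timestamp column, and the floor literal is an arbitrary choice.

```sql
{% if is_incremental() %}
    -- MAX() over an empty table is NULL, and `x >= NULL` is never true,
    -- so fall back to a floor value (this literal is an assumption)
    AND metadata.reported >= (
        SELECT COALESCE(MAX(reported), '1990-01-01'::timestamp)
        FROM {{ this }}
    )
{% endif %}
```

Because the comparison is `>=` and the config sets no `unique_key`, rows at the boundary value would be appended again on each run; a strict `>` or a `unique_key` would avoid that.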
26 changes: 0 additions & 26 deletions models/contact_views/contact_views.yml

This file was deleted.

15 changes: 15 additions & 0 deletions models/contact_views/contactview_branch.sql
@@ -0,0 +1,15 @@
{{
config(
materialized = 'view',
)
}}

SELECT
ch.uuid,
ch.name,
cm.area,
cm.region
FROM
{{ ref("contactview_hospital") }} AS ch
INNER JOIN {{ ref("contact") }} AS cm
ON (cm.uuid = ch.uuid AND cm.type = 'district_hospital')
62 changes: 62 additions & 0 deletions models/contact_views/contactview_chp.sql
@@ -0,0 +1,62 @@
{{
config(
materialized = 'view',
)
}}

SELECT
chw.name,
chw.uuid,
user_settings.username,
chw.phone,
chw.phone2,
chw.date_of_birth,
chw.parent_type,
chw.area_uuid,
chw.branch_uuid,
branch.name AS branch_name,
branch.region,
coalesce(nullif(chp.doc ->> 'supervisor'::TEXT, ''::TEXT), '563649afa0e2a13740a1982abc0a2d0d'::TEXT) AS supervisor_uuid,
chp.doc #>> '{chp_profile, g_individual_info, nin}'::TEXT[] AS nin,
chp.doc #>> '{chp_profile, g_individual_info, district_of_residence}'::TEXT[] AS district_of_residence,
chp.doc #>> '{chp_profile, g_individual_info, county}'::TEXT[] AS county,
chp.doc #>> '{chp_profile, g_individual_info, sub_county}'::TEXT[] AS sub_county,
chp.doc #>> '{chp_profile, g_individual_info, parish}'::TEXT[] AS parish,
chp.doc #>> '{chp_profile, g_individual_info, village}'::TEXT[] AS village,
chp.doc #>> '{chp_profile, g_individual_info, dob}'::TEXT[] AS dob,
chp.doc #>> '{chp_profile, g_individual_info, sex}'::TEXT[] AS sex,
chp.doc #>> '{chp_profile, g_individual_info, marital_status}'::TEXT[] AS marital_status,
chp.doc #>> '{chp_profile, g_contact_info, phone_number}'::TEXT[] AS phone_number,
chp.doc #>> '{chp_profile, g_contact_info, alternate_number}'::TEXT[] AS alternate_number,
chp.doc #>> '{chp_profile, g_contact_info, brac_bank}'::TEXT[] AS brac_bank_ac,
chp.doc #>> '{chp_profile, g_position_info, chw_type}'::TEXT[] AS chw_type,
chp.doc #>> '{chp_profile, g_position_info, start_date}'::TEXT[] AS enrolment_date,
lower(chp.doc #>> '{chp_profile, g_position_info, facility_name}'::TEXT[]) AS facility_name,
chp.doc #>> '{chp_profile, g_position_info, facility_level}'::TEXT[] AS facility_level,
chp.doc #>> '{chp_profile, g_position_info, villages_served}'::TEXT[] AS villages_served,
chp.doc #>> '{chp_profile, g_education, education_level}'::TEXT[] AS education_level,
chp.doc #>> '{chp_profile, g_education, institution}'::TEXT[] AS institution,
chp.doc #>> '{chp_profile, g_education, completion_year}'::TEXT[] AS completion_year,
chp.doc #>> '{chp_profile, g_language, speak_english}'::TEXT[] AS speak_english,
chp.doc #>> '{chp_profile, g_language, read_english}'::TEXT[] AS read_english,
chp.doc #>> '{chp_profile, g_language, write_english}'::TEXT[] AS write_english,
chp.doc #>> '{chp_profile, g_language, mother_tongue}'::TEXT[] AS mother_tongue,
chp.doc #>> '{chp_profile, g_language, other_languages}'::TEXT[] AS other_languages,
chp.doc #>> '{chp_profile, g_other_details, incentives}'::TEXT[] AS incentives,
chp.doc #>> '{chp_profile, g_other_details, chp_services}'::TEXT[] AS chp_services
FROM
{{ ref("contactview_chw") }} chw
INNER JOIN {{ ref("contact") }} AS cm ON chw.area_uuid = cm.uuid
INNER JOIN {{ ref("contact") }} AS meta ON meta.uuid = chw.uuid
INNER JOIN {{ ref("contactview_branch") }} AS branch ON chw.branch_uuid = branch.uuid
LEFT JOIN {{ ref("contact") }} AS chp ON chp.uuid = chw.uuid
LEFT JOIN
(
SELECT
contact_id,
string_agg(doc ->>'name', ', ') AS username
FROM {{ ref("contact") }} AS c
WHERE type = 'user-settings' AND contact_id IS NOT NULL
GROUP BY contact_id
) AS user_settings ON user_settings.contact_id = chw.uuid
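
For readers unfamiliar with the JSON operators used in this view, here is a standalone PostgreSQL illustration. The document literal is made up for the example; the view itself reads `doc` from the `contact` model.

```sql
-- `->>` reads a top-level key as text; `#>>` follows a path of keys and returns text.
-- A path that does not exist in the document yields NULL rather than an error.
SELECT
    doc ->> 'name' AS name,
    doc #>> '{chp_profile, g_position_info, chw_type}'::TEXT[] AS chw_type,
    doc #>> '{chp_profile, g_language, mother_tongue}'::TEXT[] AS mother_tongue
FROM (
    SELECT '{"name": "A", "chp_profile": {"g_position_info": {"chw_type": "x"}}}'::jsonb AS doc
) AS example;
```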
