-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathunified_rag.yml
32 lines (31 loc) · 1.81 KB
/
unified_rag.yml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
version: 2
models:
- name: rag__unified_document
description: Each record represents a chunk of text prepared for semantic-search and additional fields for use in LLM workflows.
columns:
- name: unique_id
description: Unique identifier of the table represented as a combination of document_id, platform, chunk_index, and source_relation fields.
tests:
- unique
- not_null
- name: document_id
description: Identifier of the base object which the unstructured data is associated (ie. Zendesk ticket_id, Jira issue_id, and HubSpot deal_id).
- name: title
description: Title of the base object which the unstructured data is associated to. This will be the Zendesk ticket subject, Jira issue summary, or HubSpot deal name if found (otherwise the email_subject or the string `engagement_note`).
- name: platform
- name: url_reference
description: URL reference to the respective base object.
- name: platform
description: Record identifying the respective upstream connector type (ie. HubSpot, Jira, Zendesk).
- name: most_recent_chunk_update
description: Timestamp indicating the most recent update to the overall chunk.
- name: update_date
description: Truncated date of the most_recent_chunk_update field used for incremental and partition logic.
- name: chunk_index
description: The index of the chunk associated with the `document_id`.
- name: chunk_tokens_approximate
description: Approximate number of tokens for the chunk, assuming 4 characters per token.
- name: chunk
description: The text of the chunk.
- name: source_relation
description: The source of the record if the unioning functionality is being used. If it is not this field will be empty.