Updating new schema tweaks + a new diagram
RasonJ committed Apr 20, 2024
1 parent 168649b commit 5f61fcd
Showing 12 changed files with 150 additions and 55 deletions.
6 changes: 3 additions & 3 deletions documentation/documentation.md
@@ -27,7 +27,7 @@ docker compose -f docker-compose-cluster.yaml up
Initialize the `awesome` UBI store:

```
curl -X PUT "http://localhost:9200/_plugins/ubi/awesome?index=ecommerce&id_field=id"
curl -X PUT "http://localhost:9200/_plugins/ubi/awesome?index=ecommerce&object_id=id"
```

Send an event to the `awesome` store:
@@ -88,7 +88,7 @@ The current event mappings file can be found [here](https://github.com/o19s/open
- `event_attributes.object` - contains an associated JSONified data object (e.g. books, products, user info, etc.), if there is one
- `event_attributes.object.object_id` - points to a unique, internal id representing an instance of that object
- `event_attributes.object.key_value` - points to a unique, external key, matching the item that the user searched for, found and acted upon (i.e. sku, isbn, ean, etc.).
**This field value should match the value in for the object's value in the `id_field` [below](#id_field) from the search store**
**This field value should match the object's value in the `object_id` field [below](#object_id) from the search store**
It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users.
- `event_attributes.object.object_type` - indicates the type/class of object
- `event_attributes.object.description` - optional description of the object
@@ -122,7 +122,7 @@ The plugin exposes a REST API for managing UBI stores and persisting events.

| Method | Endpoint | Purpose |
|--------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `PUT` | `/_plugins/ubi/{store}?index={index}&id_field={id_field}` | <p id="id_field">Initialize a new UBI store for the given index. The `id_field` is optional and allows for providing the name of a field in the `index`'s schema to be used as the unique result/item ID for each search result. If not provided, the `_id` field is used. </p>|
| `PUT` | `/_plugins/ubi/{store}?index={index}&object_id={object_id}` | <p id="object_id">Initialize a new UBI store for the given index. The `object_id` is optional and allows for providing the name of a field in the `index`'s schema to be used as the unique result/item ID for each search result. If not provided, the `_id` field is used. </p>|
| `DELETE` | `/_plugins/ubi/{store}` | Delete a UBI store |
| `GET` | `/_plugins/ubi` | Get a list of all UBI stores |
| `POST` | `/_plugins/ubi/{store}` | Index an event into the UBI store |
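
The endpoints in the table above can be called from any HTTP client. The following is a minimal sketch, assuming Java 17; `UbiEndpoints` and its methods are hypothetical helper names, not part of the plugin. It only builds the method and path for each row (nothing is sent), and it leaves `object_id` off the `PUT` when none is given, in which case the plugin falls back to `_id`. `URLEncoder` applies form encoding, which is adequate for simple names such as `awesome`, `ecommerce`, and `id`; for example, `initStore("awesome", "ecommerce", "id")` produces the same path as the `curl -X PUT` call earlier in this file.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the method + path for each row of the UBI REST API table.
final class UbiEndpoints {

    record Request(String method, String path) { }

    // Form encoding; fine for simple store/index/field names without spaces.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // PUT /_plugins/ubi/{store}?index={index}&object_id={object_id}
    // objectId may be null or empty: the parameter is optional and _id is used instead.
    static Request initStore(String store, String index, String objectId) {
        StringBuilder path = new StringBuilder("/_plugins/ubi/")
                .append(enc(store))
                .append("?index=").append(enc(index));
        if (objectId != null && !objectId.isEmpty()) {
            path.append("&object_id=").append(enc(objectId));
        }
        return new Request("PUT", path.toString());
    }

    // DELETE /_plugins/ubi/{store}
    static Request deleteStore(String store) {
        return new Request("DELETE", "/_plugins/ubi/" + enc(store));
    }

    // GET /_plugins/ubi
    static Request listStores() {
        return new Request("GET", "/_plugins/ubi");
    }

    // POST /_plugins/ubi/{store} (the event JSON goes in the request body)
    static Request indexEvent(String store) {
        return new Request("POST", "/_plugins/ubi/" + enc(store));
    }
}
```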
6 changes: 3 additions & 3 deletions documentation/queries/sql_queries.md
@@ -9,7 +9,7 @@ Although it's trivial on the server side to find queries with no results, we can
select
count(0)
from .ubi_log_queries
where query_response_hit_ids is null
where query_response_objects_ids is null
order by user_id
```
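
For checking the same condition outside of SQL, here is a hypothetical in-memory equivalent, assuming query-store documents are available as `Map<String, Object>` values (for example, after deserializing `_source`). `ZeroResultQueries` is an illustrative name. A missing field and an explicit `null` are treated alike, and the per-user breakdown is sorted by `user_id` to mirror the `order by` clause.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalent of the SQL above: query-store documents whose
// query_response_objects_ids is missing or null are queries that returned no results.
final class ZeroResultQueries {

    static long countZeroResult(List<Map<String, Object>> queryDocs) {
        return queryDocs.stream()
                .filter(doc -> doc.get("query_response_objects_ids") == null)
                .count();
    }

    // Per-user counts, kept in a TreeMap so keys are sorted by user_id.
    static Map<String, Long> zeroResultByUser(List<Map<String, Object>> queryDocs) {
        return queryDocs.stream()
                .filter(doc -> doc.get("query_response_objects_ids") == null)
                .collect(Collectors.groupingBy(
                        doc -> String.valueOf(doc.get("user_id")),
                        TreeMap::new,
                        Collectors.counting()));
    }
}
```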

@@ -18,7 +18,7 @@ order by user_id
select
count(0)
from .ubi_log_events
where action_name='on_search' and event_attributes.data.data_detail.query_data.query_response_hit_ids is null
where action_name='on_search' and event_attributes.data.data_detail.query_data.query_response_objects_ids is null
order by timestamp
```
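
The event-store version reaches the hit IDs through a nested path. A minimal sketch of the same filter in Java, assuming events are nested `Map<String, Object>` documents; `ZeroResultSearchEvents` and `getPath` are hypothetical names, and ordering by `timestamp` is left out because the timestamp format is not fixed here.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory equivalent of the event-store SQL above: keep on_search
// events whose nested query_response_objects_ids value is missing or null.
final class ZeroResultSearchEvents {

    static final String HIT_IDS_PATH =
            "event_attributes.data.data_detail.query_data.query_response_objects_ids";

    // Walks a dotted path such as "a.b.c" through nested maps; returns null if any
    // segment is absent or an intermediate value is not itself a map.
    static Object getPath(Map<String, Object> doc, String dottedPath) {
        Object current = doc;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map<?, ?> m)) {
                return null;
            }
            current = m.get(key);
        }
        return current;
    }

    static List<Map<String, Object>> zeroResultSearches(List<Map<String, Object>> events) {
        return events.stream()
                .filter(e -> "on_search".equals(e.get("action_name")))
                .filter(e -> getPath(e, HIT_IDS_PATH) == null)
                .toList();
    }
}
```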

@@ -113,7 +113,7 @@ where query_id ='1065c70f-d46a-442f-8ce4-0b5e7a71a892'
order by timestamp
```
(In this generated data, the `query` field is plain text; however, in a real implementation the query will be expressed in the internal query DSL, along with its parameters.)
query_response_id|query_id|user_id|query|query_response_hit_ids|session_id|timestamp
query_response_id|query_id|user_id|query|query_response_objects_ids|session_id|timestamp
---|---|---|---|---|---|---
1065c70f-d46a-442f-8ce4-0b5e7a71a892|1065c70f-d46a-442f-8ce4-0b5e7a71a892|155_7e3471ff-14c8-45cb-bc49-83a056c37192|Blanditiis quo sint repudiandae a sit.|8659955|fa6e3b1c-3212-44d2-b16b-690b4aeddbba_1975|2027-04-17 10:16:45

155 changes: 125 additions & 30 deletions documentation/schemas.md
@@ -1,5 +1,60 @@

# Key UBI concepts
## Ubi Roles
- **User Behavior Insights** module: once activated, is in charge of indexing a user's queries and results in the **query store** with a unique [`query_id`](#query_id), and passing that `query_id` back to the search client.

- **Search Client**: in charge of searching and receiving the `query_id` from **User Behavior Insights**. This `query_id` is then passed to the **Ubi Logging Client**.

- **Ubi Logging Client**: in charge of indexing user events, such as onClick, in the **event store** along with the `query_id` that links to the underlying technical query DSL and the results' `object_id`s.

*Note:* We break out the roles of "search" and "Ubi logging" here, but many implementations will likely use the same OpenSearch client instance for both roles of searching and index writing.

```mermaid
%%{init: {
"flowchart": {"htmlLabels": false},
}
}%%
graph TB
style OS stroke-width:2px, stroke:#0A1CCF, fill:#62affb, opacity:.5
subgraph OS[OpenSearch Cluster fa:fa-database]
E[(&emsp;Ubi Events&emsp;)]
Docs[(Document Index)] --3) DSL & object_id's--> Q[(&emsp;Ubi Queries&emsp;)];
Q -."4) query_id".-> Docs ;
end
style *client-side* stroke-width:2px, stroke:#EC6363
subgraph "`*client-side*`"
   style User stroke-width:4px, stroke:#EC6363
User["`*User*`" fa:fa-user]
App
Search
U
style App fill:#EC6363,opacity:.5
subgraph App[UserApp fa:fa-store]
Search(&emsp;Search Client&emsp;)
U(&emsp;Ubi Client&emsp;)
end
User--1) raw search string-->Search;
end
Search--2) search string-->Docs
Docs -. 6) query_id & objects...->Search ;
Search --results--> User
Search-.7) query_id.->U;
User--8) selects
object_id:123-->U;
U-."9) index event:{query_id, onClick, object_id:123}".->E;
linkStyle 3,0,5 stroke-width:2px,fill:none,stroke:#0A1CCF
linkStyle 1,4,6,8 stroke-width:2px,fill:none,stroke:red
```
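
The numbered steps in the diagram can be traced in code. Below is a minimal sketch, assuming Java 17; `UbiFlow`, `SearchResult`, `simulateSearch`, and `clickEvent` are hypothetical names, and the search is simulated in memory rather than sent to OpenSearch. It shows the invariant the diagram depends on: the event written to the event store carries the same `query_id` the search returned, together with the `object_id` the user acted on. The `action_name` value is free-form; `on_click` is only an example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the roles in the diagram; none of these types belong to
// the plugin or to an OpenSearch client library.
final class UbiFlow {

    // Steps 2-6: what the search client gets back (a query_id plus result object_ids).
    record SearchResult(String queryId, List<String> objectIds) { }

    // Stand-in for the UBI module: every search gets a fresh query_id.
    static SearchResult simulateSearch(List<String> matchingObjectIds) {
        return new SearchResult(UUID.randomUUID().toString(), List.copyOf(matchingObjectIds));
    }

    // Steps 7-9: the logging client builds an event from the query_id handed over by
    // the search client and the object_id the user selected.
    static Map<String, Object> clickEvent(SearchResult result, String selectedObjectId) {
        if (!result.objectIds().contains(selectedObjectId)) {
            throw new IllegalArgumentException("object_id was not in the results: " + selectedObjectId);
        }
        Map<String, Object> object = new HashMap<>();
        object.put("object_id", selectedObjectId);

        Map<String, Object> event = new HashMap<>();
        event.put("query_id", result.queryId());
        event.put("action_name", "on_click");
        event.put("event_attributes", Map.of("object", object));
        return event;
    }
}
```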



Although the named fields below follow a schema that lends itself to easier analytics, the schema is dynamic, and users can add new fields where needed.

@@ -15,7 +70,6 @@ When UBI is turned on, a *search client* will get a `query_id` back from OpenSea
information on what part of the application the user is interacting with,
and [`event_attributes.object`](#object), which contains identifying information of the object returned from the query that the user interacts with (e.g. a book, a product, a post, etc.).

# TODO: `key_field` rename?
The `object` structure has two ways to refer to the object:
- `event_attributes.object.object_id` is the unique id that OpenSearch uses internally to index the object, similar to the `_id` field in the indices.
- `event_attributes.object.catalog_id` is the id that a user could look up the object in a *catalog*
@@ -31,55 +85,87 @@ and `event_attributes.object` is referring to the precise query result that the
The current event mappings file can be found [here](../src/main/resources/events-mapping.json).

**Primary fields include:**
- `application` <p id="application">
- `application`
<p id="application">
&ensp; (size 100) - name of application tracking UBI events
- `action_name` <p id="action_name">
- `action_name`
<p id="action_name">
&ensp; (size 100) - any name you want to call your event
- `timestamp`: \
- `timestamp`:

&ensp; Unix epoch time. <s>If not set, it will be set by the plugin when the event is received</s>
- `query_id` <p id="query_id">
- `query_id`
<p id="query_id">
&ensp; (size 100) - ID for some query. Either the client provides this, or the `query_id` is generated by the server.
- `user_id`, `session_id`, `source_id` <p id="user_id">
&ensp; (size 100) - IDs left largely to the calling client's discretion for tracking the users, sessions, and sources (e.g. pages) of events.
The `user_id` must be consistent in both the `query` and `event` stores.
- `message_type` \
- `message_type`

&ensp; (size 100) - originally thought of in terms of ERROR, INFO, WARN, but could be anything useful such as `QUERY` or `CONVERSION`.
It can be used to group related `action_name` values into logical bins.

- `message` \
- `message`

&ensp; (size 256) - optional text for the log entry

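The `(size N)` annotations are not explained further here. The mapping files in this commit use `ignore_above` for several keyword fields, so one reading is a maximum length in characters. Under that assumption, a client could flag oversized values before sending an event. `PrimaryFieldLimits` is a hypothetical helper, and `source_id` is given the same limit of 100 as `user_id` and `session_id` because they share one entry above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical client-side check of the "(size N)" limits, assuming they are
// maximum lengths in characters.
final class PrimaryFieldLimits {

    static final Map<String, Integer> MAX_LENGTH = Map.of(
            "application", 100,
            "action_name", 100,
            "query_id", 100,
            "user_id", 100,
            "session_id", 100,
            "source_id", 100,
            "message_type", 100,
            "message", 256);

    // Returns, in sorted order, the names of string fields that exceed their limit.
    static List<String> oversizedFields(Map<String, ?> event) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> limit : MAX_LENGTH.entrySet()) {
            Object value = event.get(limit.getKey());
            if (value instanceof String s && s.length() > limit.getValue()) {
                problems.add(limit.getKey());
            }
        }
        Collections.sort(problems);
        return problems;
    }
}
```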
**Other attribute fields & data objects** <p id="event_attributes">
- `event_attributes.object` \
- `event_attributes.object`

&ensp; represents the search result object (e.g. books, products, user info, etc.), if there is one
- `event_attributes.object.object_id` - points to a unique, internal, id representing and instance of that object
- `event_attributes.object.catalog_id` \

- `event_attributes.object.internal_id` - points to a unique, internal id representing an instance of that object

- `event_attributes.object.object_id`
<p id="object_id">
&ensp; points to a unique, external key matching the item that the user searched for, found, and acted upon (e.g. sku, isbn, ean, etc.).
**This field value should match the value in for the object's value in the `catalog_field` [below](#catalog_field) from the search store**
It is possible that the `object_id` and `key_value` match if the same id is used both internally for indexing and externally for the users.
- `event_attributes.object.object_type` \
**This field value should match the object's value in the `object_id` field [below](#object_id) from the search store**
It is possible that the `object_id` and `internal_id` match if the same id is used both internally for indexing and externally for the users.

- `event_attributes.object.object_type`

&ensp; indicates the type/class of object
- `event_attributes.object.description` \

- `event_attributes.object.description`

&ensp; optional description of the object
- `event_attributes.object.transaction_id` \

- `event_attributes.object.transaction_id`

&ensp; optionally points to a unique id representing a successful transaction
- `event_attributes.object.to_user_id` \

- `event_attributes.object.to_user_id`

&ensp; optionally points to another user, if they are the recipient of this object, perhaps as a gift, from the user's `user_id`
- `event_attributes.object.object_detail` \
- `event_attributes.object.object_detail`

&ensp; optional text for further data object details
- `event_attributes.object.object_detail.json` \
- `event_attributes.object.object_detail.json`

&ensp; if the user has a json object representing what was acted upon, it can be stored here; however, note that that could lead to index bloat if the json objects are large.
- `event_attributes.position` \

- `event_attributes.position`

&ensp; nested object that ties a user event to the location where it originated
- `event_attributes.position.ordinal` \
- `event_attributes.position.ordinal`

&ensp; tracks the position (the nth item) within a list of items that the user could select or click
- `event_attributes.position.{x,y}` \

- `event_attributes.position.{x,y}`

&ensp; tracks x and y values, as defined by the client
- `event_attributes.position.page_depth` \

- `event_attributes.position.page_depth`

&ensp; tracks page depth
- `event_attributes.position.scroll_depth` \

- `event_attributes.position.scroll_depth`

&ensp; tracks scroll depth
- `event_attributes.position.trail` \

- `event_attributes.position.trail`

&ensp; text field for tracking the path/trail that a user took to get to this location

* Note: developers can add optional, dynamic fields such as `user_name`, `email`, or `price` for individual use cases.
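
Putting the nested fields together, here is a minimal sketch of building `event_attributes` in Java, assuming the event is sent as a JSON object assembled from maps. `EventAttributes` and its methods are hypothetical; only a few of the documented fields are shown, and optional values are omitted entirely when they are `null` rather than being sent as `null`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for a subset of the nested event_attributes fields.
final class EventAttributes {

    // object_id: external key (e.g. a sku); internal_id: the search store's own id.
    static Map<String, Object> object(String objectId, String internalId, String objectType) {
        Map<String, Object> object = new LinkedHashMap<>();
        object.put("object_id", objectId);
        putIfPresent(object, "internal_id", internalId);   // may equal object_id
        putIfPresent(object, "object_type", objectType);
        return object;
    }

    static Map<String, Object> position(Integer ordinal, Integer pageDepth) {
        Map<String, Object> position = new LinkedHashMap<>();
        putIfPresent(position, "ordinal", ordinal);
        putIfPresent(position, "page_depth", pageDepth);
        return position;
    }

    static Map<String, Object> eventAttributes(Map<String, Object> object, Map<String, Object> position) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("object", object);
        attrs.put("position", position);
        return attrs;
    }

    // Optional fields are left out instead of being written as null.
    private static void putIfPresent(Map<String, Object> map, String key, Object value) {
        if (value != null) {
            map.put(key, value);
        }
    }
}
```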
@@ -88,13 +174,22 @@ The current event mappings file can be found [here](../src/main/resources/events

The current query mappings file can be found [here](../src/main/resources/queries-mapping.json).

- `timestamp` \
- `timestamp`

&ensp; A Unix timestamp of when the query was received
- `query_id` \

- `query_id`

&ensp; A unique ID for the query, provided by the client or generated automatically. The same query text issued multiple times generates a different `query_id` each time.
- `query_response_objects_ids` \
&ensp; This is an array of the `object_id`. The size
- `user_id` \

- `query_response_objects_ids`

&ensp; This is an array of the `object_id` values of the query's results.

- `user_id`

&ensp; A user ID provided by the client
- `session_id` \

- `session_id`

&ensp; An optional session ID provided by the client
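
A query-store document with these fields can be modeled as a small record. This is a sketch only, assuming Java 17 and a timestamp in epoch seconds (the text above says only "Unix timestamp"); `QueryRecord` and `issue` are hypothetical names, and `issue` uses a random UUID to show why the same query issued twice yields two different `query_id` values.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of one query-store document using the documented field names.
record QueryRecord(long timestamp, String queryId, List<String> queryResponseObjectsIds,
                   String userId, String sessionId) {

    // Every issued query gets a fresh query_id, even if the query text repeats.
    // Timestamp is assumed to be epoch seconds.
    static QueryRecord issue(List<String> objectIds, String userId, String sessionId) {
        return new QueryRecord(Instant.now().getEpochSecond(), UUID.randomUUID().toString(),
                List.copyOf(objectIds), userId, sessionId);
    }

    // Builds the _source map with the field names from the query mappings.
    Map<String, Object> toSource() {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("timestamp", timestamp);
        source.put("query_id", queryId);
        source.put("query_response_objects_ids", queryResponseObjectsIds);
        source.put("user_id", userId);
        if (sessionId != null) {   // session_id is optional
            source.put("session_id", sessionId);
        }
        return source;
    }
}
```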
2 changes: 1 addition & 1 deletion src/main/java/com/o19s/ubi/UserBehaviorInsightsPlugin.java
@@ -95,7 +95,7 @@ public List<Setting<?>> getSettings() {

settings.add(Setting.intSetting(SettingsConstants.VERSION_SETTING, 1, -1, Integer.MAX_VALUE, Setting.Property.IndexScope));
settings.add(Setting.simpleString(SettingsConstants.INDEX, "", Setting.Property.IndexScope));
settings.add(Setting.simpleString(SettingsConstants.ID_FIELD, "", Setting.Property.IndexScope));
settings.add(Setting.simpleString(SettingsConstants.object_id, "", Setting.Property.IndexScope));

return settings;

@@ -103,9 +103,9 @@ public void onResponse(Response response) {
if(!"".equals(storeName)) {

final String index = getStoreSettings(storeName, SettingsConstants.INDEX);
final String idField = getStoreSettings(storeName, SettingsConstants.ID_FIELD);
final String idField = getStoreSettings(storeName, SettingsConstants.object_id);

LOGGER.debug("Using id_field [{}] of index [{}] for UBI query.", idField, index);
LOGGER.debug("Using object_id [{}] of index [{}] for UBI query.", idField, index);

// Only consider this search if the index being searched matches the store's index setting.
if (Arrays.asList(searchRequest.indices()).contains(index)) {
@@ -124,7 +124,7 @@ public void onResponse(Response response) {

if (idField == null || "".equals(idField) || idField.equals("null")) {

// Use the _id since there is no id_field setting for this index.
// Use the _id since there is no object_id setting for this index.
queryResponseHitIds.add(String.valueOf(hit.docId()));

} else {
@@ -240,7 +240,7 @@ private void persistQuery(final String storeName, final QueryRequest queryReques
source.put("query_id", queryRequest.getQueryId());
source.put("query", queryRequest.getQuery());
source.put("query_response_id", queryResponse.getQueryResponseId());
source.put("query_response_hit_ids", queryResponse.getQueryResponseHitIds());
source.put("query_response_objects_ids", queryResponse.getQueryResponseHitIds());
source.put("user_id", queryRequest.getUserId());
source.put("session_id", queryRequest.getSessionId());
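
The hunks above decide which value ends up in `query_response_objects_ids` for each hit. A minimal sketch of that rule, assuming each hit exposes its `_id` and its `_source` as a map: when the store's `object_id` setting is `null`, empty, or the literal string `"null"`, the hit's `_id` is used (per the comment in the hunk); otherwise the configured field is read. The diff does not show the `else` branch, so the field lookup below is an assumption, and `ResultIdResolver` is a hypothetical name.

```java
import java.util.Map;

// Hypothetical sketch of choosing the per-hit result id for query_response_objects_ids.
final class ResultIdResolver {

    // Mirrors the "no object_id setting" test in the hunk: null, "" or "null".
    static boolean isUnset(String objectIdField) {
        return objectIdField == null || objectIdField.isEmpty() || objectIdField.equals("null");
    }

    // Assumed behavior: fall back to the hit's _id, else read the configured source field.
    static String resolve(String objectIdField, String hitId, Map<String, Object> source) {
        if (isUnset(objectIdField)) {
            return hitId;
        }
        Object value = source.get(objectIdField);
        return value == null ? null : String.valueOf(value);
    }
}
```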

2 changes: 1 addition & 1 deletion src/main/java/com/o19s/ubi/data/OpenSearchDataManager.java
@@ -160,7 +160,7 @@ private Map<String, Object> buildQueryRequestMap(final QueryRequest queryRequest
source.put("query_id", queryRequest.getQueryId());
source.put("query", queryRequest.getQuery());
source.put("query_response_id", queryRequest.getQueryResponse().getQueryResponseId());
source.put("query_response_hit_ids", queryRequest.getQueryResponse().getQueryResponseHitIds());
source.put("query_response_objects_ids", queryRequest.getQueryResponse().getQueryResponseHitIds());
source.put("user_id", queryRequest.getUserId());
source.put("session_id", queryRequest.getSessionId());

2 changes: 1 addition & 1 deletion src/main/java/com/o19s/ubi/model/SettingsConstants.java
@@ -26,6 +26,6 @@ public class SettingsConstants {
/**
* The field in an index's mapping that will be used as the unique identifier for a query result item.
*/
public static final String ID_FIELD = "index.ubi.id_field";
public static final String object_id = "index.ubi.object_id";

}
@@ -84,7 +84,7 @@ protected RestChannelConsumer prepareRequest(RestRequest restRequest, NodeClient

final String storeName = restRequest.param("store");
final String index = restRequest.param("index");
final String idField = restRequest.param("id_field");
final String idField = restRequest.param("object_id");

LOGGER.info("Received PUT for store {}", storeName);

@@ -191,7 +191,7 @@ private RestChannelConsumer create(final NodeClient nodeClient, final String sto
.put(IndexMetadata.INDEX_AUTO_EXPAND_REPLICAS_SETTING.getKey(), "0-2")
.put(IndexMetadata.SETTING_PRIORITY, Integer.MAX_VALUE)
.put(SettingsConstants.INDEX, index)
.put(SettingsConstants.ID_FIELD, idField)
.put(SettingsConstants.object_id, idField)
.put(SettingsConstants.VERSION_SETTING, VERSION)
.build();
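
On the server side, the handler now reads the optional `object_id` request parameter (`restRequest.param("object_id")`) and stores it under the `index.ubi.object_id` setting. A sketch of extracting the same three values from a raw request path, assuming Java 17; `StoreInitRequest` and `parse` are hypothetical, and this is not how OpenSearch's `RestRequest` parses parameters internally.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for a store-initialization request such as
// PUT /_plugins/ubi/awesome?index=ecommerce&object_id=id
final class StoreInitRequest {

    // objectId is null when the optional object_id parameter is omitted.
    record Parsed(String store, String index, String objectId) { }

    static Parsed parse(String pathAndQuery) {
        URI uri = URI.create(pathAndQuery);
        String[] segments = uri.getPath().split("/");
        String store = segments[segments.length - 1];   // last path segment is {store}

        Map<String, String> params = new HashMap<>();
        String query = uri.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String key = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                           URLDecoder.decode(value, StandardCharsets.UTF_8));
            }
        }
        return new Parsed(store, params.get("index"), params.get("object_id"));
    }
}
```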

4 changes: 2 additions & 2 deletions src/main/resources/events-mapping.json
@@ -31,8 +31,8 @@
},
"object": {
"properties": {
"catalog_id": { "type": "keyword" },
"object_id": { "type": "keyword", "ignore_above": 256 },
"internal_id": { "type": "keyword", "ignore_above": 256 },
"object_id": { "type": "keyword" },
"object_type": { "type": "keyword", "ignore_above": 100 },
"transaction_id": { "type": "keyword", "ignore_above": 100 },
"name": { "type": "keyword", "ignore_above": 256 },
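
This hunk drops `ignore_above` from `object.object_id` and gives `object.internal_id` a limit of 256. In OpenSearch, a `keyword` value longer than `ignore_above` is kept in `_source` but not indexed, so term lookups on that field will not match it. A minimal sketch of that rule, limited to the `object` fields visible in this hunk (the rest of the mapping is not shown); `IgnoreAboveCheck` is a hypothetical name, and `String.length()` counts UTF-16 code units, which only approximates the character count OpenSearch uses.

```java
import java.util.Map;

// Hypothetical illustration of keyword ignore_above: values longer than the limit stay
// in _source but are not indexed. Limits mirror the object fields in this hunk.
final class IgnoreAboveCheck {

    static final Map<String, Integer> OBJECT_LIMITS = Map.of(
            "internal_id", 256,
            "object_type", 100,
            "transaction_id", 100,
            "name", 256);

    // A field without an entry (such as object_id after this change) has no explicit limit.
    static boolean isIndexed(String field, String value) {
        Integer limit = OBJECT_LIMITS.get(field);
        return limit == null || value.length() <= limit;
    }
}
```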
2 changes: 1 addition & 1 deletion src/main/resources/queries-mapping.json
@@ -9,7 +9,7 @@
"type": "text"
},
"query_response_id": { "type": "keyword", "ignore_above": 100 },
"query_response_hit_ids": { "type": "keyword" },
"query_response_objects_ids": { "type": "keyword" },
"user_id": { "type": "keyword", "ignore_above": 100 },
"session_id": { "type": "keyword", "ignore_above": 100 }
}
@@ -24,7 +24,7 @@
"type": "string",
"description": "The name of the index being searched"
},
"id_field": {
"object_id": {
"required": false,
"type": "string",
"description": "The name of the field to use for the doc ID field"