Adding ESCI for UBI #40

Merged: 3 commits, Nov 11, 2024
2 changes: 2 additions & 0 deletions README.md
@@ -2,6 +2,8 @@

This repository contains the search quality evaluation framework as described in the [RFC](https://github.com/opensearch-project/OpenSearch/issues/15354).

Note: Some of the data files in this repository are tracked by `git lfs`.

## Repository Contents

* `data` - The data directory contains scripts for creating random UBI queries and events for purposes of development and testing.
1 change: 1 addition & 0 deletions data/esci/.gitattributes
@@ -0,0 +1 @@
ubi_queries_events_1000.ndjson.bz2 filter=lfs diff=lfs merge=lfs -text
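Because this attribute routes the archive through Git LFS, a clone without LFS installed may contain only a small text pointer in place of the real archive. The sketch below shows one way to detect that case. The function name `is_lfs_pointer` is hypothetical and not part of this PR, and the check relies on the pointer format's first line naming the LFS spec version.

```shell
# Return success if FILE still looks like a Git LFS pointer rather than the real payload.
# A pointer is a small text file whose first line names the LFS spec version.
is_lfs_pointer() {
  local file="$1"
  head -n 1 "$file" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical usage from the repository root (not executed here):
#   if is_lfs_pointer data/esci/ubi_queries_events_1000.ndjson.bz2; then
#     git lfs pull --include="data/esci/ubi_queries_events_1000.ndjson.bz2"
#   fi
```

If the check succeeds, the real archive has not been downloaded yet, and fetching it with `git lfs pull` is likely needed before anything else in this directory will work.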
15 changes: 15 additions & 0 deletions data/esci/README.md
@@ -0,0 +1,15 @@
# ESCI Data in UBI Format

This directory contains ESCI data in the UBI format, created using https://github.com/opensearch-project/user-behavior-insights/tree/main/ubi-data-generator.

Source dataset: https://github.com/amazon-science/esci-data

```
@article{reddy2022shopping,
title={Shopping Queries Dataset: A Large-Scale {ESCI} Benchmark for Improving Product Search},
author={Chandan K. Reddy and Lluís Màrquez and Fran Valero and Nikhil Rao and Hugo Zaragoza and Sambaran Bandyopadhyay and Arnab Biswas and Anlu Xing and Karthik Subbian},
year={2022},
eprint={2206.06588},
archivePrefix={arXiv}
}
```
3 changes: 3 additions & 0 deletions data/esci/index.sh
@@ -0,0 +1,3 @@
#!/bin/bash -e

curl -X POST "http://localhost:9200/_bulk?pretty" -H "Content-Type: application/x-ndjson" --data-binary @ubi_queries_events_1000.ndjson
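Note that the script posts `ubi_queries_events_1000.ndjson`, while the file added in this PR is the compressed `ubi_queries_events_1000.ndjson.bz2`, so the archive presumably has to be decompressed first. A minimal sketch of that step follows. The function names `ndjson_name` and `index_archive` are hypothetical, and it assumes `bzip2` is installed and OpenSearch is reachable at the same `localhost:9200` address the script uses.

```shell
# Derive the decompressed file name from the archive name (strip the .bz2 suffix).
ndjson_name() {
  printf '%s\n' "${1%.bz2}"
}

# Decompress the archive if the NDJSON file is missing, then post it to _bulk.
index_archive() {
  local archive="$1"
  local host="${2:-http://localhost:9200}"
  local ndjson
  ndjson="$(ndjson_name "$archive")"
  # -d decompresses; -k keeps the archive so the LFS-tracked file is not deleted.
  [ -f "$ndjson" ] || bzip2 -dk "$archive"
  curl -X POST "$host/_bulk?pretty" \
    -H "Content-Type: application/x-ndjson" \
    --data-binary "@$ndjson"
}

# Hypothetical usage from data/esci (not executed here):
#   index_archive ubi_queries_events_1000.ndjson.bz2
```

Keeping the archive with `-k` means the working tree still matches what Git LFS tracks after indexing.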
3 changes: 3 additions & 0 deletions data/esci/ubi_queries_events_1000.ndjson.bz2
Git LFS file not shown
@@ -9,6 +9,8 @@ services:
      plugins.security.disabled: "true"
      logger.level: info
      OPENSEARCH_INITIAL_ADMIN_PASSWORD: SuperSecretPassword_123
      http.max_content_length: 500mb
      OPENSEARCH_JAVA_OPTS: "-Xms8192m -Xmx8192m"
    ulimits:
      memlock:
        soft: -1
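The `http.max_content_length: 500mb` setting caps the size of a single HTTP request body, which matters here because the bulk file is sent in one request. As a rough sketch, a pre-flight size check could look like the following. The function name `fits_bulk_limit` is hypothetical, and treating `mb` as 1024 * 1024 bytes is an assumption about how the setting is interpreted.

```shell
# Return success if FILE is no larger than LIMIT_MB megabytes (default 500).
fits_bulk_limit() {
  local file="$1"
  local limit_mb="${2:-500}"
  local size_bytes
  # Arithmetic expansion strips any whitespace padding that wc may emit.
  size_bytes=$(( $(wc -c < "$file") ))
  [ "$size_bytes" -le $(( limit_mb * 1024 * 1024 )) ]
}

# Hypothetical usage (not executed here):
#   fits_bulk_limit ubi_queries_events_1000.ndjson || echo "file exceeds http.max_content_length"
```

A file that fails this check would need to be split into smaller bulk requests or the limit raised further.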