Merge pull request #40 from o19s/esci-data
Adding ESCI for UBI
jzonthemtn authored Nov 11, 2024
2 parents 216ef07 + 8908932 commit df2f49b
Showing 6 changed files with 26 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -2,6 +2,8 @@

This repository contains the search quality evaluation framework as described in the [RFC](https://github.com/opensearch-project/OpenSearch/issues/15354).

Note: Some of the data files in this repository are tracked by `git lfs`.

## Repository Contents

* `data` - The data directory contains scripts for creating random UBI queries and events for purposes of development and testing.
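Because the `.ndjson.bz2` archive is tracked by Git LFS, a clone made without LFS support contains only a small text pointer in its place, and decompressing that pointer fails. The minimal sketch below checks for this case. It assumes the standard LFS pointer layout, whose first line is `version https://git-lfs.github.com/spec/v1`. `is_lfs_pointer` is a hypothetical helper, not part of this repository. If it reports a pointer, `git lfs pull` fetches the real content.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if FILE is an un-fetched Git LFS pointer.
# LFS pointer files are small text files whose first line names the spec URL.
is_lfs_pointer() {
  local file="$1"
  [ -f "$file" ] || return 1
  head -n 1 "$file" | grep -qx 'version https://git-lfs.github.com/spec/v1'
}

# Example usage from the repository root (requires git-lfs to be installed):
#   if is_lfs_pointer data/esci/ubi_queries_events_1000.ndjson.bz2; then
#     git lfs install   # one-time Git LFS setup
#     git lfs pull      # replace pointers with the real file contents
#   fi
```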
1 change: 1 addition & 0 deletions data/esci/.gitattributes
@@ -0,0 +1 @@
ubi_queries_events_1000.ndjson.bz2 filter=lfs diff=lfs merge=lfs -text
15 changes: 15 additions & 0 deletions data/esci/README.md
@@ -0,0 +1,15 @@
# ESCI Data in UBI Format

This directory contains ESCI data in the UBI format, created using https://github.com/opensearch-project/user-behavior-insights/tree/main/ubi-data-generator.

https://github.com/amazon-science/esci-data

```
@article{reddy2022shopping,
title={Shopping Queries Dataset: A Large-Scale {ESCI} Benchmark for Improving Product Search},
author={Chandan K. Reddy and Lluís Màrquez and Fran Valero and Nikhil Rao and Hugo Zaragoza and Sambaran Bandyopadhyay and Arnab Biswas and Anlu Xing and Karthik Subbian},
year={2022},
eprint={2206.06588},
archivePrefix={arXiv}
}
```
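The UBI data in this directory is loaded through the `_bulk` API, where each `index` or `create` action line is followed by one document line. The hypothetical `count_docs_per_index` helper below is a hedged sketch for inspecting the archive before loading it. It tallies documents per target index and assumes every action line names its index via `_index`. The actual index names depend on how the generator wrote the file and are not asserted here.

```shell
#!/bin/bash
# Hypothetical helper: reads a _bulk NDJSON stream on stdin and prints
# "<index> <count>" for each target index, sorted by index name.
# Assumes odd-numbered lines are action lines containing "_index" and each is
# followed by exactly one document line (true for "index" and "create" actions).
count_docs_per_index() {
  awk 'NR % 2 == 1 {
         if (match($0, /"_index"[ \t]*:[ \t]*"[^"]*"/)) {
           name = substr($0, RSTART, RLENGTH)
           sub(/^"_index"[ \t]*:[ \t]*"/, "", name)   # drop the key and opening quote
           sub(/"$/, "", name)                        # drop the closing quote
           counts[name]++
         }
       }
       END { for (name in counts) print name, counts[name] }' | sort
}

# Example usage in this directory (requires the LFS file to be fetched):
#   bzcat ubi_queries_events_1000.ndjson.bz2 | count_docs_per_index
```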
3 changes: 3 additions & 0 deletions data/esci/index.sh
@@ -0,0 +1,3 @@
#!/bin/bash -e

curl -X POST "http://localhost:9200/_bulk?pretty" -H "Content-Type: application/x-ndjson" --data-binary @ubi_queries_events_1000.ndjson
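As committed, `index.sh` posts `ubi_queries_events_1000.ndjson`, but the directory tracks only the compressed `ubi_queries_events_1000.ndjson.bz2`. The archive therefore has to be decompressed first. The following is a minimal sketch of one alternative, assuming `bzcat` is installed. It streams the decompressed data straight into curl (`--data-binary @-` reads the request body from stdin). It also checks the top-level `errors` flag in the `_bulk` response, because a bulk request can return HTTP 200 even when individual items fail. `index_bz2`, `bulk_has_errors`, and the `OPENSEARCH_URL` variable are hypothetical names.

```shell
#!/bin/bash
# Hypothetical variant of index.sh that reads the compressed archive directly.
set -o pipefail

# Assumed variable; defaults to the URL used in index.sh.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Reads a _bulk response on stdin; succeeds if the top-level "errors" flag is true.
bulk_has_errors() {
  grep -q '"errors"[[:space:]]*:[[:space:]]*true'
}

# Decompresses a .bz2 NDJSON archive and streams it to the _bulk endpoint.
index_bz2() {
  local archive="$1" response
  response=$(mktemp)
  # With pipefail, a bzcat failure (e.g. an un-fetched LFS pointer) fails here too.
  if ! bzcat "$archive" | curl -sS -X POST "${OPENSEARCH_URL}/_bulk" \
      -H "Content-Type: application/x-ndjson" \
      --data-binary @- > "$response"; then
    echo "decompression or HTTP request failed for $archive" >&2
    return 1
  fi
  if bulk_has_errors < "$response"; then
    echo "some bulk items failed; see $response" >&2
    return 1
  fi
  echo "indexed $archive (response saved to $response)"
}

# Example usage (requires a running OpenSearch node):
#   index_bz2 ubi_queries_events_1000.ndjson.bz2
```

Streaming avoids writing the decompressed file to disk, at the cost of not being able to retry the request without decompressing again.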
3 changes: 3 additions & 0 deletions data/esci/ubi_queries_events_1000.ndjson.bz2
Git LFS file not shown
@@ -9,6 +9,8 @@ services:
plugins.security.disabled: "true"
logger.level: info
OPENSEARCH_INITIAL_ADMIN_PASSWORD: SuperSecretPassword_123
http.max_content_length: 500mb
OPENSEARCH_JAVA_OPTS: "-Xms8192m -Xmx8192m"
ulimits:
memlock:
soft: -1
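This change raises `http.max_content_length`, which caps the size of a single HTTP request body, from its 100mb default to 500mb. For payloads that would exceed even the raised limit, one alternative is to split the stream into chunks with an even number of lines and send one `_bulk` request per chunk. The sketch below makes the same pairing assumption as any `_bulk` file, namely that each action line is followed by exactly one document line. `split_bulk` is a hypothetical helper, and the chunk size in the usage comment is an arbitrary example to tune.

```shell
#!/bin/bash
# Hypothetical helper: splits a _bulk NDJSON stream read from stdin into files
# named PREFIXaa, PREFIXab, ... holding at most PAIRS action/document pairs each.
# Assumes each action line is followed by exactly one document line, so an even
# line count per chunk never separates an action from its document.
split_bulk() {
  local pairs="$1" prefix="$2"
  split -l "$((pairs * 2))" - "$prefix"
}

# Example usage (requires a running OpenSearch node):
#   bzcat ubi_queries_events_1000.ndjson.bz2 | split_bulk 50000 chunk_
#   for f in chunk_*; do
#     curl -sS -X POST "http://localhost:9200/_bulk" \
#       -H "Content-Type: application/x-ndjson" --data-binary @"$f"
#   done
```

Splitting by line count rather than by bytes keeps pairs intact. If document sizes vary widely, a smaller chunk size leaves more headroom under the limit.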