From 9aa63d15b83d82b8bbdc1350a6dcdf7a255286c0 Mon Sep 17 00:00:00 2001
From: Sergei Grebnov
Date: Mon, 10 Feb 2025 15:10:06 -0800
Subject: [PATCH] Searching GitHub Files: remove chunking step (#88)

---
 search_github_files/README.md | 53 ++---------------------------------
 1 file changed, 2 insertions(+), 51 deletions(-)

diff --git a/search_github_files/README.md b/search_github_files/README.md
index 3856f5b..e450c10 100644
--- a/search_github_files/README.md
+++ b/search_github_files/README.md
@@ -45,6 +45,7 @@ Result:
 +------------------------------+
 | docs/criteria/definitions.md |
 | docs/dev/error_handling.md   |
+| docs/dev/metrics.md          |
 | docs/dev/style_guide.md      |
 +------------------------------+
 ```
@@ -99,57 +100,7 @@ Result:
 }
 ```
 
-### Additional Configuration - Chunking
-
-1. Update the spicepod `datasets[0].columns[0].embeddings.chunking.enabled: true`.
-2. Restart the spiced.
-3. Rerun the search
-
-```shell
-curl -XPOST http://localhost:8090/v1/search \
--H 'Content-Type: application/json' \
--d "{
-  \"datasets\": [\"spiceai.files\"],
-  \"text\": \"errors\",
-  \"where\": \"not contains(path, 'docs/release_notes')\",
-  \"additional_columns\": [\"download_url\"],
-  \"limit\": 2
-}"
-```
-
-Result:
-
-````json
-{
-  "matches": [
-    {
-      "value": "# Spice.ai Extensibility\n\nThis document is an overview of all the interfaces and extension points in Spice.ai.\n\n| Component | Description | Definition Link |\n| --------------- | -----------------------------",
-      "score": 0.7811596783985292,
-      "dataset": "spiceai.files",
-      "primary_key": {
-        "path": "docs/EXTENSIBILITY.md"
-      },
-      "metadata": {
-        "download_url": "https://raw.githubusercontent.com/spiceai/spiceai/trunk/docs/EXTENSIBILITY.md"
-      }
-    },
-    {
-      "value": "# Guidelines for error handlling\n\n## Rust Error Traits\n\nIn Rust, the Error trait implements both the Debug and Display traits. All user-facing errors should use the Display trait, not the Debug trait.\n\ni.e.\n\nGood (uses Display trait)\n```rust\nif let Err(user_facing_err) = upload_data(datasource) {\n tracing::error!(\"Unable to upload data to {datasource}: {user_facing_err}\");\n}\n``",
-      "score": 0.8009972672322939,
-      "dataset": "spiceai.files",
-      "primary_key": {
-        "path": "docs/dev/error_handling.md"
-      },
-      "metadata": {
-        "download_url": "https://raw.githubusercontent.com/spiceai/spiceai/trunk/docs/dev/error_handling.md"
-      }
-    }
-  ],
-  "duration_ms": 48
-}
-````
-
-4. Rerun the search, and retrieve the full document (as an entry in `additional_coluumns`).
+4. Rerun the search, and retrieve the full document by adding `content` column to `additional_columns`).
 
 ```shell
 curl -XPOST http://localhost:8090/v1/search \