Searching GitHub Files: remove chunking step (#88)
sgrebnov authored Feb 10, 2025
1 parent 8a9ddd7 commit 9aa63d1
Showing 1 changed file with 2 additions and 51 deletions.
53 changes: 2 additions & 51 deletions search_github_files/README.md
@@ -45,6 +45,7 @@ Result:
+------------------------------+
| docs/criteria/definitions.md |
| docs/dev/error_handling.md |
+| docs/dev/metrics.md |
| docs/dev/style_guide.md |
+------------------------------+
```
@@ -99,57 +100,7 @@ Result:
}
```

### Additional Configuration - Chunking

1. Update the spicepod `datasets[0].columns[0].embeddings.chunking.enabled: true`.
2. Restart the spiced.
3. Rerun the search

```shell
curl -XPOST http://localhost:8090/v1/search \
-H 'Content-Type: application/json' \
-d "{
\"datasets\": [\"spiceai.files\"],
\"text\": \"errors\",
\"where\": \"not contains(path, 'docs/release_notes')\",
\"additional_columns\": [\"download_url\"],
\"limit\": 2
}"
```

Result:

````json
{
"matches": [
{
"value": "# Spice.ai Extensibility\n\nThis document is an overview of all the interfaces and extension points in Spice.ai.\n\n| Component | Description | Definition Link |\n| --------------- | -----------------------------",
"score": 0.7811596783985292,
"dataset": "spiceai.files",
"primary_key": {
"path": "docs/EXTENSIBILITY.md"
},
"metadata": {
"download_url": "https://raw.githubusercontent.com/spiceai/spiceai/trunk/docs/EXTENSIBILITY.md"
}
},
{
"value": "# Guidelines for error handlling\n\n## Rust Error Traits\n\nIn Rust, the Error trait implements both the Debug and Display traits. All user-facing errors should use the Display trait, not the Debug trait.\n\ni.e.\n\nGood (uses Display trait)\n```rust\nif let Err(user_facing_err) = upload_data(datasource) {\n tracing::error!(\"Unable to upload data to {datasource}: {user_facing_err}\");\n}\n``",
"score": 0.8009972672322939,
"dataset": "spiceai.files",
"primary_key": {
"path": "docs/dev/error_handling.md"
},
"metadata": {
"download_url": "https://raw.githubusercontent.com/spiceai/spiceai/trunk/docs/dev/error_handling.md"
}
}
],
"duration_ms": 48
}
````
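
The response above can also be consumed programmatically. The following is a minimal Python sketch, assuming only the response shape shown above (`matches`, each with `score` and `primary_key.path`); the helper name `summarize_matches` is hypothetical and not part of any Spice.ai client. The two matches in the sample are not listed in descending score order, so the sketch sorts them explicitly.

```python
import json


def summarize_matches(response: dict) -> list[tuple[str, float]]:
    """Return (path, score) pairs from a search response, highest score first.

    Hypothetical helper: assumes each match carries `score` and
    `primary_key.path`, as in the sample response above.
    """
    pairs = [
        (match["primary_key"]["path"], match["score"])
        for match in response.get("matches", [])
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the sample response ("value" and "metadata" omitted).
sample = json.loads("""
{
  "matches": [
    {"score": 0.7811596783985292, "dataset": "spiceai.files",
     "primary_key": {"path": "docs/EXTENSIBILITY.md"}},
    {"score": 0.8009972672322939, "dataset": "spiceai.files",
     "primary_key": {"path": "docs/dev/error_handling.md"}}
  ],
  "duration_ms": 48
}
""")

for path, score in summarize_matches(sample):
    print(f"{score:.4f}  {path}")
```

Using `response.get("matches", [])` lets the helper return an empty list when a response has no `matches` key, instead of raising.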

-4. Rerun the search, and retrieve the full document (as an entry in `additional_coluumns`).
+4. Rerun the search, and retrieve the full document by adding `content` column to `additional_columns`).

```shell
curl -XPOST http://localhost:8090/v1/search \