
Commit

docs: add sample parameter (#87)
shreyashankar authored Oct 9, 2024
1 parent 2e6997d commit 29ace39
Showing 8 changed files with 9 additions and 4 deletions.
1 change: 1 addition & 0 deletions docs/operators/filter.md
@@ -91,6 +91,7 @@ This example demonstrates how the Filter operation distinguishes between high-im
| `num_retries_on_validate_failure` | Number of retry attempts on validation failure | 0 |
| `timeout` | Timeout for each LLM call in seconds | 120 |
| `max_retries_per_timeout` | Maximum number of retries per timeout | 2 |
| `sample` | Number of samples to use for the operation | None |

!!! info "Validation"

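For orientation, here is a minimal sketch of where the new `sample` key could sit in a filter operation's YAML config (assuming DocETL's pipeline format; every name below other than `sample` is an illustrative placeholder, not part of this diff):

```yaml
operations:
  - name: filter_high_impact        # hypothetical operation name
    type: filter
    sample: 50                      # assumption: run the filter on only 50 sampled items
    prompt: |
      Is the following issue high-impact? Explain briefly, then decide.
      {{ input.issue_text }}
    output:
      schema:
        is_high_impact: bool        # boolean key the filter uses to keep or drop items
```

With `sample` left unset (the default of None), the operation processes all data.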
1 change: 1 addition & 0 deletions docs/operators/gather.md
@@ -170,6 +170,7 @@ The Gather operation includes several key components:
- `content_key`: Indicates the field containing the chunk content
- `peripheral_chunks`: Specifies how to include context from surrounding chunks
- `doc_header_key` (optional): Denotes a field representing extracted headers for each chunk
- `sample` (optional): Number of samples to use for the operation

### Peripheral Chunks Configuration

4 changes: 2 additions & 2 deletions docs/operators/map.md
@@ -136,7 +136,7 @@ This example demonstrates how the Map operation can transform long, unstructured
| `model` | The language model to use | Falls back to `default_model` |
| `optimize` | Flag to enable operation optimization | `True` |
| `recursively_optimize` | Flag to enable recursive optimization of operators synthesized as part of rewrite rules | `false` |
| `sample_size` | Number of samples to use for the operation | Processes all data |
| `sample` | Number of samples to use for the operation | Processes all data |
| `tools` | List of tool definitions for LLM use | None |
| `validate` | List of Python expressions to validate the output | None |
| `num_retries_on_validate_failure` | Number of retry attempts on validation failure | 0 |
@@ -223,5 +223,5 @@ You can use a map operation to act as an LLM no-op, and just drop any key-value
1. **Clear Prompts**: Write clear, specific prompts that guide the LLM to produce the desired output.
2. **Robust Validation**: Use validation to ensure output quality and consistency.
3. **Appropriate Model Selection**: Choose the right model for your task, balancing performance and cost.
4. **Optimize for Scale**: For large datasets, consider using `sample_size` to test your operation before running on the full dataset.
4. **Optimize for Scale**: For large datasets, consider using `sample` to test your operation before running on the full dataset.
5. **Use Tools Wisely**: Leverage tools for complex calculations or operations that the LLM might struggle with. You can write any Python code in the tools, so you can even use tools to call other APIs or search the internet.
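To make practice 4 concrete, a minimal sketch of testing a map operation on a small sample before the full run (again assuming DocETL's YAML pipeline format; all names except `sample` are illustrative):

```yaml
operations:
  - name: extract_insights          # hypothetical operation name
    type: map
    sample: 10                      # assumption: trial run on 10 documents
    prompt: |
      Summarize the key insights in the following record:
      {{ input.text }}
    output:
      schema:
        insights: string
```

Once the sampled output looks right, removing `sample` (or raising it) reruns the operation over the full dataset.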
2 changes: 1 addition & 1 deletion docs/operators/parallel-map.md
@@ -34,7 +34,7 @@ Each prompt configuration in the `prompts` list should contain:
| `model` | The default language model to use | Falls back to `default_model` |
| `optimize` | Flag to enable operation optimization | True |
| `recursively_optimize` | Flag to enable recursive optimization | false |
| `sample_size` | Number of samples to use for the operation | Processes all data |
| `sample` | Number of samples to use for the operation | Processes all data |
| `timeout` | Timeout for each LLM call in seconds | 120 |
| `max_retries_per_timeout` | Maximum number of retries per timeout | 2 |

1 change: 1 addition & 0 deletions docs/operators/reduce.md
@@ -51,6 +51,7 @@ This Reduce operation processes customer feedback grouped by department:

| Parameter | Description | Default |
| ------------------------- | ------------------------------------------------------------------------------------------------------ | --------------------------- |
| `sample` | Number of samples to use for the operation | None |
| `synthesize_resolve` | If false, won't synthesize a resolve operation between map and reduce | true |
| `model` | The language model to use | Falls back to default_model |
| `input` | Specifies the schema or keys to subselect from each item | All keys from input items |
2 changes: 1 addition & 1 deletion docs/operators/resolve.md
@@ -126,7 +126,7 @@ After determining eligible pairs for comparison, the Resolve operation uses a Un
| `limit_comparisons` | Maximum number of comparisons to perform | None |
| `timeout` | Timeout for each LLM call in seconds | 120 |
| `max_retries_per_timeout` | Maximum number of retries per timeout | 2 |

| `sample` | Number of samples to use for the operation | None |
## Best Practices

1. **Anticipate Resolve Needs**: If you anticipate needing a Resolve operation and want to control the prompts, create it in your pipeline and let the optimizer find the appropriate blocking rules and thresholds.
1 change: 1 addition & 0 deletions docs/operators/split.md
@@ -50,6 +50,7 @@ Note that chunks will not overlap in content.
| --------------------- | ------------------------------------------------------------------------------- | ----------------------------- |
| `model` | The language model's tokenizer to use | Falls back to `default_model` |
| `num_splits_to_group` | Number of splits to group together into one chunk (only for "delimiter" method) | 1 |
| `sample` | Number of samples to use for the operation | None |

### Splitting Methods

1 change: 1 addition & 0 deletions docs/operators/unnest.md
@@ -38,6 +38,7 @@ The Unnest operation is valuable in scenarios where you need to:
| expand_fields | A list of fields to expand from the nested dictionary into the parent dictionary, if unnesting a dict | [] |
| recursive | If true, the unnest operation will be applied recursively to nested arrays | false |
| depth | The maximum depth for recursive unnesting (only applicable if recursive is true) | inf |
| sample | Number of samples to use for the operation | None |

## Output

