Reorg
dehume committed Jan 23, 2025
1 parent 477b56c commit cae45d5
Showing 32 changed files with 42 additions and 62 deletions.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

6 changes: 0 additions & 6 deletions docs/docs-beta/docs/tutorials/category-one/third-tutorial.md

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

3 changes: 1 addition & 2 deletions docs/docs-beta/docs/tutorials/index.md
@@ -1,5 +1,4 @@
---
title: Dagster tutorials
sidebar_class_name: hidden
---

TK
@@ -31,7 +31,7 @@ To follow the steps in this guide, you'll need:

First, set up a new Dagster project.

1. Within the Dagster repo, navigate to the project:
1. Clone the [Dagster repo](https://github.com/dagster-io/dagster) and navigate to the project:

```bash
cd examples/dagster-llm-fine-tune
@@ -19,7 +19,3 @@ We will store the accuracy of both models as metadata in the check. Because this
We can also execute this asset check separately from the fine-tuning job if we ever want to compare the accuracy. Running it a few more times, we can see that the accuracy is plotted:

![2048 resolution](/images/tutorial/llm-fine-tuning/model_accuracy_2.png)

## Summary

This should give you a good sense of how to fine-tune a model end to end, from ingesting the files, to creating features, and generating and validating the model.
@@ -32,7 +32,7 @@ This I/O manager will be attached to the <PyObject section="definitions" module=

## Scraping embeddings

The assets for the documentation scraping will behave similar to the GitHub assets. We do not need to worry about rate limiting in the same way as GitHub, so we can leave out the partition that we had defined for GitHub. Instead, we will just include half a second sleep between scraping pages. But like the GitHub assets, our ingestion asset will return a collection of `Documents` that will be handled by the I/O manager. This asset will also include the <PyObject section="assets" module="dagster" object="AutomationCondition" /> to update data on the same cadence as our GitHub source.
The assets for the documentation scraping will behave similar to the GitHub assets. We will not partition by date like Github, so we can leave out that out of the asset. But like the GitHub assets, our ingestion asset will return a collection of `Documents` that will be handled by the I/O manager. This asset will also include the <PyObject section="assets" module="dagster" object="AutomationCondition" /> to update data on the same cadence as our GitHub source.

Check failure on line 35 in docs/docs-beta/docs/tutorials/rag/embeddings.md (GitHub Actions / runner / vale)

[vale] reported by reviewdog: [Vale.Terms] Use 'GitHub' instead of 'Github'. Location: line 35, column 120. Severity: ERROR.

The asset that generates the embeddings from the documentation site needs one additional change. Because the content of the documentation pages is so large, we need to split the data into chunks. The `split_text` function splits the text into equal-length chunks. We also want to keep similar chunks together and associated with the page they were on, so we will hash the index of the URL to ensure the data stays together correctly. Once the data is chunked, it can be batched and sent to Pinecone:

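The asset's own code is not shown in this hunk. As a rough illustration of the chunk-then-batch idea described above, here is a minimal sketch using only the standard library. Everything in it is an assumption made for illustration: this `split_text` is a hypothetical stand-in (the tutorial's real function may behave differently), `chunk_ids` and `batched` are invented helper names, and deriving IDs from a hash of the URL plus the chunk index is only one possible reading of "hash the index of the URL". The Pinecone call itself is left out.

```python
import hashlib
from typing import Iterator


def split_text(text: str, chunk_size: int = 1000) -> list[str]:
    """Hypothetical stand-in: cut text into consecutive pieces of
    chunk_size characters. The final piece may be shorter."""
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_ids(url: str, n_chunks: int) -> list[str]:
    """Assumed ID scheme: a shared prefix hashed from the page URL,
    followed by the chunk index, so chunks from one page stay grouped."""
    url_hash = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return [f"{url_hash}_{i}" for i in range(n_chunks)]


def batched(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


# Tiny demonstration with a made-up URL and a very small chunk size.
url = "https://example.com/docs/page"
chunks = split_text("abcdefghij", chunk_size=4)
ids = chunk_ids(url, len(chunks))
records = list(zip(ids, chunks))
batches = list(batched(records, batch_size=2))
```

Each batch in `batches` is a list of `(id, chunk)` pairs that an upload step could then send to the vector store in one request.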
File renamed without changes.
34 changes: 28 additions & 6 deletions docs/docs-beta/sidebars.ts
@@ -124,23 +124,45 @@ const sidebars: SidebarsConfig = {
'tutorials/index',
{
type: 'category',
label: 'Category one',
collapsed: false,
label: 'RAG',
collapsed: true,
items: [
{
type: 'autogenerated',
dirName: 'tutorials/category-one',
dirName: 'tutorials/rag',
},
],
},
{
type: 'category',
label: 'Category two',
collapsed: false,
label: 'Fine-tuning',
collapsed: true,
items: [
{
type: 'autogenerated',
dirName: 'tutorials/llm-fine-tuning',
},
],
},
{
type: 'category',
label: 'Prompt engineering',
collapsed: true,
items: [
{
type: 'autogenerated',
dirName: 'tutorials/prompt-engineering',
},
],
},
{
type: 'category',
label: 'Modal',
collapsed: true,
items: [
{
type: 'autogenerated',
dirName: 'tutorials/category-two',
dirName: 'tutorials/modal',
},
],
},
11 changes: 11 additions & 0 deletions docs/docs-beta/src/code-examples-content.js
