update migrated content
abvthecity committed May 9, 2024
1 parent a727ca8 commit 7fd61d0
Showing 73 changed files with 1,680 additions and 2,083 deletions.
@@ -6,7 +6,7 @@ title: "Create a container and endpoint using the CLI"
## Overview

The `octoai` command-line interface (CLI) makes it easy for you to create a custom endpoint for OctoAI Compute Service. The `octoai` CLI guides you through the process of creating an initial valid Python application with an example model, building it, and deploying it.

The `octoai` CLI includes some endpoint scaffolds with example models that you can deploy and try out right away. After you complete that initial workflow, you can follow the instructions in this document to modify the initial application to use the model or code of your choice on OctoAI Compute Service.

@@ -290,14 +290,14 @@ Congratulations! You have now created your first endpoint on OctoAI Compute Serv
The client code in `test_request.py` looks like this:

```Python Python
from octoai.client import OctoAI

inputs = {"prompt": "Hello world!"}

def main(endpoint):
    """Run inference against the endpoint."""
    # create an OctoAI client
    client = OctoAI()

    # perform inference
    response = client.infer(endpoint_url=f"{endpoint}/infer", inputs=inputs)
@@ -319,7 +319,7 @@ For example, using the CLI the logs look similar to this:

```Text Text
$ octoai logs --name hello-world
19s hello-world-<hash> octoai server
18s hello-world-<hash> Using service in service.HelloService.
18s hello-world-<hash> run
18s hello-world-<hash> Setting up.
@@ -346,7 +346,7 @@ This section shows you how to modify the `hello-world` endpoint implementation t

#### Step 1: Add Python Requirements

The `hello-world` endpoint implementation contains an empty `requirements.txt` file.

Edit this file and add the corresponding requirements for Flan-T5:
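The exact requirements are not shown in this excerpt. As a purely illustrative sketch (this package list is an assumption, not the official one), a Flan-T5 endpoint served with Hugging Face Transformers typically needs something like:

```Text Text
transformers
torch
sentencepiece
```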

@@ -391,12 +391,12 @@ In this case, our model of interest is of the same modality as the `hello-world`

#### Step 3: Modify Sample Client Code

The `hello-world` endpoint implementation has a sample `test_request.py` that makes a request to your endpoint. In this case, our model of interest is of the same modality as the `hello-world` example, so you can just change the prompt.

```Python Python
import argparse

from octoai.client import OctoAI

inputs = {"prompt": "What country is California in?"}
...
@@ -476,14 +476,13 @@ The `octoai.types` package contains helpful classes if you are customizing your

```Python Python
from octoai.service import Service
from octoai.types import Image

class MyService(Service):

    def infer(self, image: Image) -> str:
        image_pil = image.to_pil()
        output = self.model(image_pil)

        return output[0]
```

@@ -502,7 +501,7 @@ class MyService(Service):
    def infer(self, audio: Audio) -> str:
        audio_array, sampling_rate = audio.to_numpy()
        output = self.model(audio_array, sampling_rate)

        return output[0]
```

@@ -521,7 +520,7 @@ class MyService(Service):
    def infer(self, video: Video) -> str:
        video_frames = video.to_numpy()
        output = self.model(video_frames)

        return output[0]
```

@@ -598,19 +597,19 @@ The OctoAI SDK enables you to define additional routes in your endpoint. The fol
from octoai.service import Service, path

class MultipleRoutesService(Service):

    def setup(self):
        print("Setting up MultipleRoutesService")

    # the infer() endpoint is always required
    def infer(self, prompt: str) -> str:
        return prompt

    # this method is exposed as /new-route
    @path("/new-route")
    def my_new_route(self, input_text: str):
        return input_text

    # this method is exposed as /my-new-route2
    def my_new_route2(self, input_text: str):
        return input_text
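To call one of these additional routes from a client, you can reuse the same pattern as `test_request.py`. The sketch below rests on two assumptions: the `infer()` helper can post to any route (not only `/infer`), and the endpoint URL shown is only a placeholder.

```Python Python
from octoai.client import OctoAI

client = OctoAI()

# Placeholder endpoint URL; substitute your deployed endpoint's base URL.
# Assumes infer() can target custom routes such as /new-route, not only /infer.
response = client.infer(
    endpoint_url="https://my-endpoint.octoai.run/new-route",
    inputs={"input_text": "hello"},
)
print(response)
```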
@@ -642,7 +641,7 @@ endpoint_config:
env_overrides: # Environment variables to set in each replica (optional)
  key1: value1
  key2: value2
registry:
  host: docker.io # Registry hostname (required)
  path: username/yolov8 # Registry path to image (required)
  tag: v1 # Tag (optional; not recommended to be set. Defaults to a generated UUID.)
@@ -2,7 +2,7 @@
title: "Pricing & billing"
description: Only pay for what you use.
---
At OctoAI you only pay for what you use. Upon sign up, you will receive **$10 of free credit** in your account, and these credits don't expire. That is the equivalent of:
- Over 500,000 words with the largest Llama 2 70B model, and over a million words with the new Mixtral 8x7B model
- 1,000 SDXL default images
- 2+ hours of compute on our large tier hardware
@@ -84,16 +84,16 @@ Here are a few examples to illustrate how this works to assist you in applying t

We offer simple, competitive token-based pricing for text gen endpoints, with prices varying depending on parameter size and quantization level:

| Model Sizes                | Per M Tokens |
| -------------------------- | ------------ |
| Mixtral-8x7B models        | $0.45        |
| Mixtral-8x22B models       | $1.20        |
| 7B and 8B models           | $0.15        |
| 13B models                 | $0.20        |
| 32B models                 | $0.75        |
| 34B models                 | $0.75        |
| 70B models                 | $0.90        |
| GTE-large                  | $0.05        |
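For example, at the 7B and 8B rate of $0.15 per million tokens, a workload of 20 million tokens in a month costs 20 × $0.15 = $3.00; the same 20 million tokens on a 70B model at $0.90 per million tokens costs 20 × $0.90 = $18.00.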

If you would like to explore pricing for other models, quantization levels, or specific fine tunes, [contact us](https://octo.ai/contact-us/).

@@ -60,6 +60,8 @@ one of the greatest painters of all time, and his inventive and innovative works
influence artists and thinkers to this day. Some of his most famous works include the Mona Lisa,
The Last Supper, and Vitruvian Man.
```

### Learn with our demo apps
Get started today by following along with one of our demo apps:
- [DocTalk](https://octo.ai/demos/doctalk/)
- [Q&A app on a custom PDF](https://octo.ai/demos/q-a-on-custom-pdf/)
@@ -8,23 +8,65 @@ LlamaIndex strives to help manage the interactions between your language models and private data
If you are building your application with LlamaIndex, you benefit from its vast ecosystem of integrations and the top LLMs and embeddings models hosted by OctoAI.


## Using OctoAI's LLMs and LlamaIndex
Get started by reviewing the [LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/) and [signing up for a free OctoAI account](https://identity.octoml.ai/oauth/account/sign-up?redirectUrl=https%3A%2F%2Foctoai.cloud%2Foauth%2Fcallback). The examples below show how to use OctoAI-hosted LLMs and the `OctoAIEmbedding` class with LlamaIndex.

LlamaIndex has both Python and TypeScript libraries, and OctoAI is available in the Python SDK.

To use OctoAI LLM endpoints with LlamaIndex, start with the code below, which uses Llama 3 8B as the LLM.

```python
from os import environ
from llama_index.llms.octoai import OctoAI

OCTOAI_API_KEY = environ.get("OCTOAI_TOKEN")

octoai = OctoAI(model="meta-llama-3-8b-instruct", token=OCTOAI_API_KEY)

# Using complete
response = octoai.complete("Octopi can not play chess because...")
print(response)

print("\n=====================\n")

# Using the chat interface
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system",
        content="Below is an instruction that describes a task. Write a response that appropriately completes the request.",
    ),
    ChatMessage(role="user", content="Write a short blog about Seattle"),
]
response = octoai.chat(messages)
print(response)
```
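LlamaIndex LLM classes share a common interface that also includes streaming helpers. The snippet below, which reuses the `octoai` instance defined above, is a minimal sketch assuming the OctoAI integration supports the standard `stream_complete` method; that support is an assumption rather than something shown on this page.

```python
# Assumes the standard LlamaIndex streaming interface is available for OctoAI.
response_stream = octoai.stream_complete("Octopi can not play chess because...")
for chunk in response_stream:
    print(chunk.delta, end="", flush=True)
```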
To use OctoAI embedding endpoints with LlamaIndex, you can start with the code below. The example uses GTE-large (the default model).

```python
from os import environ
from llama_index.embeddings.octoai import OctoAIEmbedding

OCTOAI_API_KEY = environ.get("OCTOAI_TOKEN")
embed_model = OctoAIEmbedding(api_key=OCTOAI_API_KEY)

# Single embedding request
embeddings = embed_model.get_text_embedding("Once upon a time in Seattle.")
assert len(embeddings) == 1024
print(embeddings[:10])

# Batch embedding request
texts = [
    "Once upon a time in Seattle.",
    "This is a test.",
    "Hello, world!"
]
embeddings = embed_model.get_text_embedding_batch(texts)
assert len(embeddings) == 3
print(embeddings[0][:10])
```

If you are using LlamaIndex, you can easily switch model providers and enjoy using models hosted and optimized for scale on OctoAI.
@@ -8,7 +8,7 @@ OpenRouter provides a unified interface for using various LLMs and allows users
This lets users find the right LLM and the right mix of price and performance for their use case.


## Using OctoAI's LLMs and OpenRouter
To access OctoAI's best-in-class LLMs via OpenRouter, [sign in to OpenRouter](https://openrouter.ai/) and create an account to obtain an `OPENROUTER_API_KEY`.

Using the code snippet below, you can route your calls through OpenRouter via the OpenAI client API.
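The snippet itself is not included in this excerpt. The following is a minimal sketch under two assumptions: OpenRouter exposes its usual OpenAI-compatible API at `https://openrouter.ai/api/v1`, and the model identifier shown is only a placeholder for an OctoAI-served model listed on OpenRouter.

```python
import os

from openai import OpenAI

# Route requests through OpenRouter's OpenAI-compatible API (assumed base URL).
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Placeholder model ID; choose an OctoAI-served model from OpenRouter's model list.
completion = client.chat.completions.create(
    model="meta-llama/llama-3-8b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about Seattle."}],
)
print(completion.choices[0].message.content)
```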
Original file line number Diff line number Diff line change
@@ -1,22 +1,23 @@
---
title: "Integrations"
title: "All OctoAI Integrations"
sidebarTitle: "All"
description: "Browse OctoAI's partner integrations to help you build your custom solution."
---

<CardGroup cols={2}>
<Card title="LangChain" icon="square-1" href="/integrations/langchain">
LangChain provides a framework to easily construct LLM-powered apps. Langchain developers can leverage OctoAI LLM and embedding endpoints to easily access efficient compute across a wide selection of LLMs.
</Card>
<Card title="Unstructured.io" icon="square-2">
<Card title="Unstructured.io" icon="square-2" href="/integrations/unstructured">
Unstructured provides components to very easily embed text documents like PDFs, HTML, Word Docs, and more. The OctoAIEmbeddingEncoder is available, so documents parsed with Unstructured can easily be embedded with the OctoAI embeddings endpoint.
</Card>
<Card title="Pinecone (Canopy)" icon="square-3">
Pinecone provides storage and retrieval infrastructure needed ror building and running AI apps. This integration allows a developer using Canopy to choose from the best LLMs on OctoAI.
<Card title="Pinecone (Canopy)" icon="square-3" href="/integrations/pinecone">
Pinecone provides storage and retrieval infrastructure needed for building and running AI apps. This integration allows a developer using Canopy to choose from the best LLMs on OctoAI.
</Card>
<Card title="OpenRouter" icon="square-4">
<Card title="OpenRouter" icon="square-4" href="/integrations/openrouter">
OpenRouter has a unified interface for using various LLMs, allowing users to find and compare models for their needs. Users of the OpenRouter API can leverage OctoAI's best-in-class LLM endpoints.
</Card>
<Card title="LlamaIndex" icon="square-5">
<Card title="LlamaIndex" icon="square-5" href="/integrations/llamaindex">
LlamaIndex aids in the management of interactions between your LLMs and private data. A developer building AI apps can now access highly optimized LLMs and Embeddings models on OctoAI.
</Card>
</CardGroup>
@@ -33,6 +33,8 @@ chat_engine:
batch_size: 2048
```

### Learn with our demo apps

Get started today by following along with one of our demo apps:
- [DocTalk](https://octo.ai/demos/doctalk/)

@@ -6,14 +6,14 @@ description: "Custom checkpoints are fine-tuned versions of the original model a
Custom Stable Diffusion checkpoints are fine-tuned versions of the original model, trained to capture particular styles, subjects, or objects. They are designed to provide users with more control and customization options when generating images. These checkpoints can be tailored to produce images in various styles, such as realistic photography, artwork, or even specific themes like landscapes or portraits.
While checkpoints represent a significant investment in terms of storage and computational resources, they excel in maintaining the desired customizations consistently. OctoAI's Asset Library boasts a rich collection of pre-loaded custom checkpoints, offering a diverse array of styles to enhance your images. Additionally, users have the flexibility to import bespoke checkpoints from external sources, integrating them seamlessly into OctoAI's Asset Library as personalized assets.

The image results with different checkpoints, even using the same prompt, can be significantly different. Using the simple prompt `A photo of an Australian cattle dog running through a park`, you can see the results from the SDXL base model (left) and the samaritan model (right). The samaritan model represents a 3D-cartoon image style.

<CardGroup cols={2}>
<Card title="Using SDXL base model">
![](https://www.datocms-assets.com/45680/1706909113-4d5ffd9-1.jpeg?max-w=2000&auto=compress)
</Card>
<Card title="Using samaritan checkpoint">
![](https://www.datocms-assets.com/45680/1706909150-95229ac-2.jpeg?max-w=2000&auto=compress)
</Card>
</CardGroup>

@@ -41,12 +41,12 @@ curl -X POST "https://image.octoai.run/generate/sdxl" \

```Python Python
import os
from octoai.client import OctoAI
from octoai.util import to_file

if __name__ == "__main__":
    client = OctoAI(api_key=os.environ.get("OCTOAI_TOKEN"))
    image_gen_response = client.image_gen.generate_sdxl(
        prompt="A photo of an Australian cattle dog running through a park",
        negative_prompt="Blurry photo, distortion, low-res, poor quality",
        checkpoint="octoai:samaritan",
@@ -63,33 +63,32 @@ if __name__ == "__main__":
    images = image_gen_response.images

    for i, image in enumerate(images):
        to_file(image, f"result{i}.jpg")
```
```typescript Typescript
import fs from "fs";
import { Client } from "@octoai/client";
import { OctoAIClient } from "@octoai/sdk";

const client = new Client(OCTOAI_TOKEN);
const endpointUrl = "https://image.octoai.run/generate/sdxl";
const client = new OctoAIClient({
apiKey: "<OCTOAI_TOKEN>",
});

const inputs = {
const { images } = await client.imageGen.generateSdxl({
"prompt": "A photo of an Australian cattle dog running through a park",
"negative_prompt": "Blurry photo, distortion, low-res, poor quality",
"negativePrompt": "Blurry photo, distortion, low-res, poor quality",
"checkpoint": "octoai:samaritan",
"width": 1024,
"height": 1024,
"num_images": 1,
"numImages": 1,
"sampler": "DDIM",
"steps": 30,
"cfg_scale": 12,
"use_refiner": true,
"high_noise_frac": 0.8,
"style_preset": "base"
};

const outputs = await client.infer<any>(endpointUrl, inputs);
"cfgScale": 12,
"useRefiner": true,
"highNoiseFrac": 0.8,
"stylePreset": "base"
})

outputs.images.forEach((output, i) => {
images.forEach((output, i) => {
const buffer = Buffer.from(output.image_b64, "base64");
fs.writeFileSync(`result${i}.jpg`, buffer);
});