Commit

Refactor generateTranscription API.
lgrammel committed Jan 21, 2024
1 parent fa636bf commit a2d76f6
Showing 32 changed files with 194 additions and 155 deletions.
7 changes: 3 additions & 4 deletions README.md
@@ -269,13 +269,12 @@ Transcribe speech (audio) data into text. Also called speech-to-text (STT).

```ts
import { generateTranscription, openai } from "modelfusion";
import fs from "node:fs";

const transcription = await generateTranscription({
model: openai.Transcriber({ model: "whisper-1" }),
data: {
type: "mp3",
data: await fs.promises.readFile("data/test.mp3"),
},
mimeType: "audio/mp3",
audioData: await fs.promises.readFile("data/test.mp3"),
});
```

11 changes: 5 additions & 6 deletions docs/guide/function/generate-transcription.md
@@ -12,19 +12,18 @@ Transcribe speech (audio) data into text. Also called speech-to-text (STT).

[generateTranscription API](/api/modules#generatetranscription)

`generateTranscription` uses a model, audio data, and a mime type to generate a transcription.

#### With OpenAI transcription model

```ts
import { generateTranscription, openai } from "modelfusion";

const data = await fs.promises.readFile("data/test.mp3");
import fs from "node:fs";

const transcription = await generateTranscription({
model: openai.Transcriber({ model: "whisper-1" }),
data: {
type: "mp3",
data,
},
mimeType: "audio/mp3",
audioData: await fs.promises.readFile("data/test.mp3"),
});
```
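
For callers still holding the old `data: { type, data }` shape, the change in this hunk can be sketched as a small mapping onto the new `mimeType` and `audioData` parameters. The type and function names below are hypothetical and are not exported by modelfusion; only the two formats that appear in this commit (`mp3`, `wav`) are mapped.

```ts
// Hypothetical types for illustration; they are not exported by modelfusion.
type LegacyTranscriptionInput = {
  type: "mp3" | "wav";
  data: Uint8Array;
};

type TranscriptionInput = {
  mimeType: string;
  audioData: Uint8Array;
};

// Only the two formats that appear in this commit are mapped.
const MIME_TYPE_BY_LEGACY_TYPE: Record<LegacyTranscriptionInput["type"], string> = {
  mp3: "audio/mp3",
  wav: "audio/wav",
};

// Converts the old nested `data` object into the new flat parameters.
function migrateTranscriptionInput(
  legacy: LegacyTranscriptionInput
): TranscriptionInput {
  return {
    mimeType: MIME_TYPE_BY_LEGACY_TYPE[legacy.type],
    audioData: legacy.data,
  };
}

const migrated = migrateTranscriptionInput({
  type: "mp3",
  data: new Uint8Array([1, 2, 3]),
});
console.log(migrated.mimeType); // prints "audio/mp3"
```

The result can then be spread into the call, for example `generateTranscription({ model, ...migrated })`.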

9 changes: 4 additions & 5 deletions docs/integration/model-provider/openai/index.md
@@ -220,19 +220,18 @@ const tokensAndTokenTexts = await tokenizer.tokenizeWithTexts(text);
const reconstructedText = await tokenizer.detokenize(tokens);
```

### Generate Transcription
### [Generate Transcription](/guide/function/generate-transcription)

[OpenAITranscriptionModel API](/api/classes/OpenAITranscriptionModel)

```ts
import { generateTranscription, openai } from "modelfusion";
import fs from "node:fs";
import { openai, generateTranscription } from "modelfusion";

const data = await fs.promises.readFile("data/test.mp3");

const transcription = await generateTranscription({
model: openai.Transcriber({ model: "whisper-1" }),
data: { type: "mp3", data },
mimeType: "audio/mp3",
audioData: await fs.promises.readFile("data/test.mp3"),
});
```

11 changes: 4 additions & 7 deletions docs/integration/model-provider/whispercpp.md
@@ -19,25 +19,22 @@ Without the `--convert` parameter, the server expects WAV files with 16kHz sampl
`ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`
:::


## Model Functions

[Examples](https://github.com/lgrammel/modelfusion/tree/main/examples/basic/src/model-provider/whispercpp)

### Generate Transcription
### [Generate Transcription](/guide/function/generate-transcription)

[WhisperCppTranscriptionModel API](/api/classes/WhisperCppTranscriptionModel)

```ts
import fs from "node:fs";
import { whispercpp, generateTranscription } from "modelfusion";

const data = await fs.promises.readFile("data/test.wav");

const transcription = await generateTranscription({
// Whisper.cpp model:
model: whispercpp.Transcriber(),
data: { type: "wav", data },
mimeType: "audio/wav",
audioData: await fs.promises.readFile("data/test.wav"),
});
```

@@ -60,4 +57,4 @@ const model = whispercpp.Transcriber({
api,
// ...
});
```
```
@@ -10,14 +10,14 @@ import fs from "node:fs";
dotenv.config();

async function main() {
const data = await fs.promises.readFile("data/test.mp3");
const audioData = await fs.promises.readFile("data/test.mp3");

const run = new DefaultRun();

const transcription = await generateTranscription({
model: openai.Transcriber({ model: "whisper-1" }),
data: { type: "mp3", data },
run,
mimeType: "audio/mp3",
audioData,
});

console.log(transcription);
@@ -5,11 +5,10 @@ import fs from "node:fs";
dotenv.config();

async function main() {
const data = await fs.promises.readFile("data/test.mp3");

const transcription = await generateTranscription({
model: openai.Transcriber({ model: "whisper-1" }),
data: { type: "mp3", data },
mimeType: "audio/mp3",
audioData: await fs.promises.readFile("data/test.mp3"),
});

console.log(transcription);
@@ -5,14 +5,12 @@ import fs from "node:fs";
dotenv.config();

async function main() {
const data = await fs.promises.readFile("data/test.mp3");
const audioData = await fs.promises.readFile("data/test.mp3");

const transcription = await generateTranscription({
model: openai.Transcriber({ model: "whisper-1" }),
data: {
type: "mp3",
data,
},
mimeType: "audio/mp3",
audioData,
});

console.log(transcription);
@@ -5,22 +5,17 @@ import fs from "node:fs";
dotenv.config();

async function main() {
const data = await fs.promises.readFile("data/test.wav");
const audioData = await fs.promises.readFile("data/test.wav");

const transcription = await generateTranscription({
model: whispercpp.Transcriber({
// Custom API configuration:
api: whispercpp.Api({
baseUrl: {
host: "localhost",
port: "8080",
},
baseUrl: { host: "localhost", port: "8080" },
}),
}),
data: {
type: "wav",
data,
},
mimeType: "audio/wav",
audioData,
});

console.log(transcription);
@@ -5,11 +5,10 @@ import fs from "node:fs";
dotenv.config();

async function main() {
const data = await fs.promises.readFile("data/test.wav");

const transcription = await generateTranscription({
model: whispercpp.Transcriber(),
data: { type: "wav", data },
mimeType: "audio/wav",
audioData: await fs.promises.readFile("data/test.wav"),
});

console.log(transcription);
7 changes: 2 additions & 5 deletions examples/nextjs/app/api/generate-transcription/route.ts
@@ -3,7 +3,6 @@ import {
generateTranscription,
openai,
} from "modelfusion";
import { getAudioFileExtension } from "modelfusion-experimental";

export const runtime = "edge";

@@ -27,10 +26,8 @@ export async function POST(req: Request) {
}),
model: "whisper-1",
}),
data: {
type: getAudioFileExtension(audioFile.type),
data: new Uint8Array(fileData),
},
mimeType: audioFile.type,
audioData: fileData,
});

return Response.json(transcription);
2 changes: 1 addition & 1 deletion examples/nextjs/app/generate-transcription/page.tsx
@@ -2,7 +2,7 @@

import { Button } from "@/components/ui/button";
import { MicIcon } from "@/components/ui/mic-icon";
import { getAudioFileExtension } from "modelfusion-experimental";
import { getAudioFileExtension } from "modelfusion";
import { useRef, useState } from "react";

export default function () {
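
The page above now imports `getAudioFileExtension` from `modelfusion`, while the route passes `audioFile.type` straight through as `mimeType`. A minimal sketch of how a MIME type could be mapped to a file extension is shown below. It is not the library's implementation: the function name, the lookup table, and the set of supported formats are assumptions made for illustration.

```ts
// Hypothetical lookup table; the formats listed are an assumption for this sketch.
const EXTENSION_BY_MIME_TYPE: Record<string, string> = {
  "audio/mp3": "mp3",
  "audio/mpeg": "mp3",
  "audio/wav": "wav",
  "audio/webm": "webm",
  "audio/ogg": "ogg",
  "audio/flac": "flac",
};

// Hypothetical helper, not the modelfusion function of a similar name.
function audioExtensionFromMimeType(mimeType: string): string {
  // Drop parameters such as ";codecs=opus" and normalize case.
  const normalized = mimeType.split(";")[0].trim().toLowerCase();
  const extension = EXTENSION_BY_MIME_TYPE[normalized];
  if (extension === undefined) {
    throw new Error(`Unsupported audio MIME type: ${mimeType}`);
  }
  return extension;
}

console.log(audioExtensionFromMimeType("audio/webm;codecs=opus")); // prints "webm"
```

Stripping the parameters matters for browser recordings, where the reported type often carries a codec suffix.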
1 change: 0 additions & 1 deletion packages/modelfusion-experimental/src/index.ts
@@ -1,4 +1,3 @@
export * from "./composed-function/index.js";
export * from "./cost/index.js";
export * from "./guard/index.js";
export * from "./util/index.js";
4 changes: 2 additions & 2 deletions packages/modelfusion-experimental/tsconfig.json
@@ -20,8 +20,8 @@
"strict": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"types": ["vitest/globals"]
"types": ["vitest/globals"],
},
"include": ["src/**/*"],
"exclude": ["node_modules"]
"exclude": ["node_modules"],
}
4 changes: 2 additions & 2 deletions packages/modelfusion/src/core/api/postToApi.ts
@@ -1,4 +1,4 @@
import { toUint8Array } from "../../util/UInt8Utils.js";
import { convertDataContentToUint8Array } from "../../util/format/DataContent.js";
import { Schema } from "../schema/Schema.js";
import { parseJSON, safeParseJSON } from "../schema/parseJSON.js";
import { ApiCallError } from "./ApiCallError.js";
@@ -122,7 +122,7 @@ export const createAudioMpegResponseHandler =
});
}

return toUint8Array(await response.arrayBuffer());
return convertDataContentToUint8Array(await response.arrayBuffer());
};

export const postJsonToApi = async <T>({
@@ -1,5 +1,5 @@
import { FunctionOptions } from "../../core/FunctionOptions.js";
import { base64ToUint8Array } from "../../util/UInt8Utils.js";
import { base64ToUint8Array } from "../../util/format/UInt8Utils.js";
import { ModelCallMetadata } from "../ModelCallMetadata.js";
import { executeStandardCall } from "../executeStandardCall.js";
import {
@@ -1,4 +1,4 @@
import { uint8ArrayToBase64 } from "../../../util/UInt8Utils.js";
import { DataContent } from "../../../util/format/DataContent.js";
import { InvalidPromptError } from "./InvalidPromptError.js";

export interface TextPart {
@@ -14,9 +14,9 @@ export interface ImagePart {
type: "image";

/**
* Image data. Can either be a base64-encoded string, a Uint8Array, or a Buffer.
* Image data. Can either be a base64-encoded string, a Uint8Array, an ArrayBuffer, or a Buffer.
*/
image: string | Uint8Array | Buffer;
image: DataContent;

/**
* Optional mime type of the image.
@@ -39,14 +39,6 @@ export interface ToolResponsePart {
response: unknown;
}

export function getImageAsBase64(image: string | Uint8Array | Buffer): string {
if (typeof image === "string") {
return image;
}

return uint8ArrayToBase64(image);
}

export function validateContentIsString(
content: string | unknown,
prompt: unknown
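
The doc comment in this file describes `DataContent` as a base64-encoded string, a `Uint8Array`, an `ArrayBuffer`, or a `Buffer`. The sketch below shows one way such a union could be normalized to a `Uint8Array`, which is roughly what a helper like `convertDataContentToUint8Array` would need to do. The names ending in `Sketch` are hypothetical, the actual contents of `DataContent.ts` are not part of this diff, and base64 is decoded by hand here only to keep the example free of platform APIs.

```ts
// Assumed shape, based on the doc comment in this diff.
// Buffer is a subclass of Uint8Array, so it is covered by that branch.
type DataContentSketch = string | Uint8Array | ArrayBuffer;

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decodes standard base64 (with optional "=" padding) without platform APIs.
function decodeBase64Sketch(base64: string): Uint8Array {
  const clean = base64.replace(/=+$/, "");
  const bytes = new Uint8Array(Math.floor((clean.length * 6) / 8));
  let buffer = 0; // holds the bits that have not been emitted yet
  let bits = 0; // number of pending bits in `buffer`
  let index = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value === -1) {
      throw new Error(`Invalid base64 character: ${clean.charAt(i)}`);
    }
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes[index++] = (buffer >> bits) & 0xff;
    }
  }
  return bytes;
}

// Normalizes every accepted representation to a Uint8Array.
function toUint8ArraySketch(content: DataContentSketch): Uint8Array {
  if (typeof content === "string") {
    return decodeBase64Sketch(content);
  }
  if (content instanceof Uint8Array) {
    return content;
  }
  return new Uint8Array(content);
}

console.log(toUint8ArraySketch("aGVsbG8=").length); // prints 5
```

Accepting the union at the API boundary and converting once keeps call sites such as the Next.js route free of manual `new Uint8Array(...)` wrapping.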
@@ -1,14 +1,17 @@
import { DataContent } from "../../util/format/DataContent.js";
import { FunctionCallOptions } from "../../core/FunctionOptions.js";
import { Model, ModelSettings } from "../Model.js";

export interface TranscriptionModelSettings extends ModelSettings {}

export interface TranscriptionModel<
DATA,
SETTINGS extends TranscriptionModelSettings = TranscriptionModelSettings,
> extends Model<SETTINGS> {
doTranscribe: (
data: DATA,
input: {
mimeType: string;
audioData: DataContent;
},
options: FunctionCallOptions
) => PromiseLike<{
rawResponse: unknown;
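
With the `DATA` type parameter gone, every transcription model receives the same `{ mimeType, audioData }` input. The sketch below shows a fake model written against that shape, which can be handy in tests. It uses simplified stand-in types rather than the real `DataContent` and `FunctionCallOptions`, and the `text` field is an assumed name because the diff elides the rest of the result type.

```ts
// Simplified stand-ins; the real DataContent and FunctionCallOptions types live in modelfusion.
type AudioInputSketch = {
  mimeType: string;
  audioData: string | Uint8Array | ArrayBuffer;
};

type TranscriptionResultSketch = {
  rawResponse: unknown;
  text: string; // assumed field name; the diff elides the rest of the result type
};

// Fake model that records what it receives instead of calling an API.
class RecordingTranscriberSketch {
  readonly receivedInputs: AudioInputSketch[] = [];

  doTranscribe(input: AudioInputSketch): PromiseLike<TranscriptionResultSketch> {
    this.receivedInputs.push(input);
    return Promise.resolve({
      rawResponse: { mimeType: input.mimeType },
      text: `stub transcription for ${input.mimeType}`,
    });
  }
}

const fakeModel = new RecordingTranscriberSketch();
fakeModel
  .doTranscribe({ mimeType: "audio/wav", audioData: new Uint8Array([0, 1]) })
  .then((result) => console.log(result.text));
```

Because the input is recorded synchronously, a test can assert on `receivedInputs` without waiting for the returned promise.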