test stt #5634

Open · wants to merge 14 commits into main
11 changes: 11 additions & 0 deletions app/client/api.ts
@@ -63,6 +63,16 @@ export interface SpeechOptions {
  onController?: (controller: AbortController) => void;
}

export interface TranscriptionOptions {
  model?: "whisper-1";
  file: Blob;
  language?: string;
  prompt?: string;
  response_format?: "json" | "text" | "srt" | "verbose_json" | "vtt";
  temperature?: number;
  onController?: (controller: AbortController) => void;
}

export interface ChatOptions {
  messages: RequestMessage[];
  config: LLMConfig;
@@ -98,6 +108,7 @@ export interface LLMModelProvider {
export abstract class LLMApi {
  abstract chat(options: ChatOptions): Promise<void>;
  abstract speech(options: SpeechOptions): Promise<ArrayBuffer>;
  abstract transcription(options: TranscriptionOptions): Promise<string>;
  abstract usage(): Promise<LLMUsage>;
  abstract models(): Promise<LLMModel[]>;
}
41 changes: 41 additions & 0 deletions app/client/platforms/openai.ts
@@ -180,6 +180,47 @@ export class ChatGPTApi implements LLMApi {
    }
  }

  async transcription(options: TranscriptionOptions): Promise<string> {
    const formData = new FormData();
    formData.append("file", options.file, "audio.wav");
    formData.append("model", options.model ?? "whisper-1");
    if (options.language) formData.append("language", options.language);
    if (options.prompt) formData.append("prompt", options.prompt);
    if (options.response_format)
      formData.append("response_format", options.response_format);
    if (options.temperature)
      formData.append("temperature", options.temperature.toString());

    console.log("[Request] openai audio transcriptions payload: ", options);

    const controller = new AbortController();
    options.onController?.(controller);

    try {
      const path = this.path(OpenaiPath.TranscriptionPath, options.model);
Contributor:

⚠️ Potential issue

Incorrect number of arguments in this.path method call

In line 200, the this.path method is called with two arguments (OpenaiPath.TranscriptionPath, options.model), but the path method is defined to accept only one argument. This will result in a TypeError.

To fix this, update the path method signature to accept the optional model parameter:

-export class ChatGPTApi implements LLMApi {
-  // Existing code...
-  path(path: string): string {
+export class ChatGPTApi implements LLMApi {
+  // Existing code...
+  path(path: string, model?: string): string {
     // Method implementation...
   }

Ensure that you handle the model parameter within the path method as needed.

Committable suggestion was skipped due to low confidence.

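The reviewer's signature change can be sketched in isolation. The `buildPath` helper and its base-URL handling below are hypothetical stand-ins for illustration, not the project's actual `path` implementation:

```typescript
// Hypothetical sketch of a path helper extended with an optional model
// parameter, as the reviewer suggests. The base-URL normalization is an
// assumption; the real method resolves the base URL from app config.
function buildPath(baseUrl: string, path: string, model?: string): string {
  const root = baseUrl.replace(/\/+$/, "");
  // A model-specific route (e.g. an Azure deployment) could be chosen here;
  // by default the parameter is accepted but unused.
  void model;
  return `${root}/${path}`;
}
```

Threading the parameter through like this keeps existing one-argument call sites compiling while letting the transcription call pass the model.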
      const headers = getHeaders(true);
      const payload = {
        method: "POST",
        body: formData,
        signal: controller.signal,
        headers: headers,
      };

      // make a fetch request
      const requestTimeoutId = setTimeout(
        () => controller.abort(),
        REQUEST_TIMEOUT_MS,
      );
      const res = await fetch(path, payload);
      clearTimeout(requestTimeoutId);
      const json = await res.json();
      return json.text;
Comment on lines +215 to +218
Contributor:

⚠️ Potential issue

Add response status check before parsing JSON

In lines 214-217, the code attempts to parse the JSON response without verifying if the request was successful. If the request fails, res.json() might throw an error or the response might not contain the expected text property.

Consider checking res.ok before parsing the response and handle errors appropriately:

const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
+ if (!res.ok) {
+   const errorText = await res.text();
+   console.error(`[Response] Transcription request failed: ${errorText}`);
+   throw new Error(`Transcription request failed with status ${res.status}`);
+ }
const json = await res.json();
return json.text;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

      const res = await fetch(path, payload);
      clearTimeout(requestTimeoutId);
      if (!res.ok) {
        const errorText = await res.text();
        console.error(`[Response] Transcription request failed: ${errorText}`);
        throw new Error(`Transcription request failed with status ${res.status}`);
      }
      const json = await res.json();
      return json.text;

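The check-then-parse pattern the reviewer recommends can also be written as a standalone helper. The minimal response shape below is an assumption made so the sketch avoids DOM types; it is not the project's code:

```typescript
// Sketch of checking the HTTP status before parsing the transcription body.
// TranscriptionResponse is a deliberately minimal stand-in for fetch's
// Response so the helper can be exercised without a network.
interface TranscriptionResponse {
  ok: boolean;
  status: number;
  json(): Promise<{ text?: string }>;
}

async function parseTranscription(res: TranscriptionResponse): Promise<string> {
  if (!res.ok) {
    throw new Error(`Transcription request failed with status ${res.status}`);
  }
  const body = await res.json();
  // Fall back to an empty string rather than returning undefined.
  return body.text ?? "";
}
```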
    } catch (e) {
      console.log("[Request] failed to make a audio transcriptions request", e);
      throw e;
    }
  }
Contributor:

🛠️ Refactor suggestion

Enhance error handling and response parsing in the transcription method

The transcription method implementation looks good overall, but there are a few areas for improvement:

  1. Error handling could be more specific to provide better debugging information.
  2. The response parsing assumes the presence of a text property without checking.
  3. The headers variable is declared but not used in the fetch call.

Consider applying the following improvements:

  1. Enhance error handling:
 } catch (e) {
-  console.log("[Request] failed to make a audio transcriptions request", e);
-  throw e;
+  console.error("[Request] failed to make an audio transcriptions request", e);
+  throw new Error(`Transcription request failed: ${e.message}`);
 }
  1. Add response status check and error handling:
 const res = await fetch(path, payload);
 clearTimeout(requestTimeoutId);
+if (!res.ok) {
+  throw new Error(`Transcription request failed with status ${res.status}`);
+}
 const json = await res.json();
-return json.text;
+return json.text ?? '';
  1. Use the headers variable in the fetch call:
 const payload = {
   method: "POST",
   body: formData,
   signal: controller.signal,
-  headers: headers,
+  headers,
 };

These changes will improve the robustness and reliability of the transcription method.

📝 Committable suggestion

Suggested change

  async transcription(options: TranscriptionOptions): Promise<string> {
    const formData = new FormData();
    formData.append("file", options.file, "audio.wav");
    formData.append("model", options.model ?? "whisper-1");
    if (options.language) formData.append("language", options.language);
    if (options.prompt) formData.append("prompt", options.prompt);
    if (options.response_format)
      formData.append("response_format", options.response_format);
    if (options.temperature)
      formData.append("temperature", options.temperature.toString());
    console.log("[Request] openai audio transcriptions payload: ", options);
    const controller = new AbortController();
    options.onController?.(controller);
    try {
      const path = this.path(OpenaiPath.TranscriptionPath, options.model);
      const headers = getHeaders(true);
      const payload = {
        method: "POST",
        body: formData,
        signal: controller.signal,
        headers,
      };
      // make a fetch request
      const requestTimeoutId = setTimeout(
        () => controller.abort(),
        REQUEST_TIMEOUT_MS,
      );
      const res = await fetch(path, payload);
      clearTimeout(requestTimeoutId);
      if (!res.ok) {
        throw new Error(`Transcription request failed with status ${res.status}`);
      }
      const json = await res.json();
      return json.text ?? '';
    } catch (e) {
      console.error("[Request] failed to make an audio transcriptions request", e);
      throw new Error(`Transcription request failed: ${e.message}`);
    }
  }

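The timeout-plus-AbortController pattern the method uses can be factored into a reusable helper. This is a sketch; the `run` callback parameter is an assumption added for testability (in the app it would wrap the global `fetch`):

```typescript
// Enforce a deadline on any abortable async operation: abort via an
// AbortController when the timer fires, and always clear the timer,
// even when the operation throws.
async function withTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer); // mirrors clearTimeout(requestTimeoutId) above
  }
}
```

The `finally` block is the point of the refactor: the original code only clears the timer on the success path before parsing, so an early throw would leave the abort timer pending.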

  async chat(options: ChatOptions) {
    const modelConfig = {
      ...useAppConfig.getState().modelConfig,
56 changes: 54 additions & 2 deletions app/components/chat.tsx
@@ -10,6 +10,7 @@ import React, {
} from "react";

import SendWhiteIcon from "../icons/send-white.svg";
import VoiceWhiteIcon from "../icons/voice-white.svg";
import BrainIcon from "../icons/brain.svg";
import RenameIcon from "../icons/rename.svg";
import ExportIcon from "../icons/share.svg";
@@ -72,6 +73,7 @@ import {
  isDalle3,
  showPlugins,
  safeLocalStorage,
  isFirefox,
} from "../utils";

import { uploadImage as uploadImageRemote } from "@/app/utils/chat";
@@ -97,8 +99,9 @@ import {
} from "./ui-lib";
import { useNavigate } from "react-router-dom";
import {
  CHAT_PAGE_SIZE,
  DEFAULT_STT_ENGINE,
  DEFAULT_TTS_ENGINE,
  FIREFOX_DEFAULT_STT_ENGINE,
  ModelProvider,
  Path,
  REQUEST_TIMEOUT_MS,
@@ -118,6 +121,7 @@ import { MultimodalContent } from "../client/api";
const localStorage = safeLocalStorage();
import { ClientApi } from "../client/api";
import { createTTSPlayer } from "../utils/audio";
import { OpenAITranscriptionApi, WebTranscriptionApi } from "../utils/speech";
import { MsEdgeTTS, OUTPUT_FORMAT } from "../utils/ms_edge_tts";

const ttsPlayer = createTTSPlayer();
@@ -546,6 +550,44 @@ export function ChatActions(props: {
    }
  }, [chatStore, currentModel, models]);

  const [isListening, setIsListening] = useState(false);
  const [isTranscription, setIsTranscription] = useState(false);
  const [speechApi, setSpeechApi] = useState<any>(null);
Contributor:

🛠️ Refactor suggestion

Consider using a more specific type for speechApi state

The speechApi state is initialized with any type. Consider using a more specific type to improve type safety.

- const [speechApi, setSpeechApi] = useState<any>(null);
+ const [speechApi, setSpeechApi] = useState<WebTranscriptionApi | OpenAITranscriptionApi | null>(null);
📝 Committable suggestion

Suggested change

  const [speechApi, setSpeechApi] = useState<WebTranscriptionApi | OpenAITranscriptionApi | null>(null);

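An alternative to the union type is a shared interface, if both engine classes expose the same surface. The interface name and callback shape below are assumptions inferred from how the component calls `start()`/`stop()`, not declarations from the PR:

```typescript
// Hypothetical common interface for the two speech engines, which would let
// the state be typed TranscriptionApi | null instead of any.
interface TranscriptionApi {
  start(): Promise<void>;
  stop(): Promise<void>;
}

// A stub implementation standing in for WebTranscriptionApi /
// OpenAITranscriptionApi, reporting a fixed transcript on stop().
class StubTranscriptionApi implements TranscriptionApi {
  constructor(private onEnd: (transcription: string) => void) {}
  async start(): Promise<void> {
    // A real engine would begin capturing audio here.
  }
  async stop(): Promise<void> {
    // A real engine would finish recognition and report the transcript.
    this.onEnd("stub transcript");
  }
}
```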

  useEffect(() => {
    if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
    setSpeechApi(
      config.sttConfig.engine === DEFAULT_STT_ENGINE
        ? new WebTranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          )
        : new OpenAITranscriptionApi((transcription) =>
            onRecognitionEnd(transcription),
          ),
    );
  }, []);

  const startListening = async () => {
    if (speechApi) {
      await speechApi.start();
      setIsListening(true);
    }
  };
  const stopListening = async () => {
    if (speechApi) {
      if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
        setIsTranscription(true);
      await speechApi.stop();
      setIsListening(false);
    }
  };
  const onRecognitionEnd = (finalTranscript: string) => {
    console.log(finalTranscript);
    if (finalTranscript) props.setUserInput(finalTranscript);
    if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
      setIsTranscription(false);
  };
Contributor:

⚠️ Potential issue

Remove console.log in production code

There's a console.log statement in the onRecognitionEnd function. Consider removing it or replacing it with a more appropriate logging mechanism for production code.

  const onRecognitionEnd = (finalTranscript: string) => {
-   console.log(finalTranscript);
    if (finalTranscript) props.setUserInput(finalTranscript);
    if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
      setIsTranscription(false);
  };
📝 Committable suggestion

Suggested change

  const onRecognitionEnd = (finalTranscript: string) => {
    if (finalTranscript) props.setUserInput(finalTranscript);
    if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
      setIsTranscription(false);
  };


  return (
    <div className={styles["chat-input-actions"]}>
      {couldStop && (
@@ -780,6 +822,16 @@ export function ChatActions(props: {
          icon={<ShortcutkeyIcon />}
        />
      )}

      {config.sttConfig.enable && (
        <ChatAction
          onClick={async () =>
            isListening ? await stopListening() : await startListening()
          }
          text={isListening ? Locale.Chat.StopSpeak : Locale.Chat.StartSpeak}
          icon={<VoiceWhiteIcon />}
        />
      )}
    </div>
  );
}
@@ -1505,7 +1557,7 @@ function _Chat() {
    setAttachImages(images);
  }

  // 快捷键 shortcut keys
  // 快捷键
  const [showShortcutKeyModal, setShowShortcutKeyModal] = useState(false);

useEffect(() => {
12 changes: 12 additions & 0 deletions app/components/settings.tsx
@@ -83,6 +83,7 @@ import { nanoid } from "nanoid";
import { useMaskStore } from "../store/mask";
import { ProviderType } from "../utils/cloud";
import { TTSConfigList } from "./tts-config";
import { STTConfigList } from "./stt-config";

function EditPromptModal(props: { id: string; onClose: () => void }) {
const promptStore = usePromptStore();
@@ -1703,6 +1704,17 @@ export function Settings() {
        />
      </List>

      <List>
        <STTConfigList
          sttConfig={config.sttConfig}
          updateConfig={(updater) => {
            const sttConfig = { ...config.sttConfig };
            updater(sttConfig);
            config.update((config) => (config.sttConfig = sttConfig));
          }}
        />
      </List>

      <DangerItems />
    </div>
  </ErrorBoundary>
51 changes: 51 additions & 0 deletions app/components/stt-config.tsx
@@ -0,0 +1,51 @@
import { STTConfig, STTConfigValidator } from "../store";

import Locale from "../locales";
import { ListItem, Select } from "./ui-lib";
import { DEFAULT_STT_ENGINES } from "../constant";
import { isFirefox } from "../utils";

export function STTConfigList(props: {
  sttConfig: STTConfig;
  updateConfig: (updater: (config: STTConfig) => void) => void;
}) {
  return (
    <>
      <ListItem
        title={Locale.Settings.STT.Enable.Title}
        subTitle={Locale.Settings.STT.Enable.SubTitle}
      >
        <input
          type="checkbox"
          checked={props.sttConfig.enable}
          onChange={(e) =>
            props.updateConfig(
              (config) => (config.enable = e.currentTarget.checked),
            )
          }
        ></input>
      </ListItem>
Comment on lines +14 to +27
Contributor:

⚠️ Potential issue

Refactor the config update to avoid assignment in expression.

The checkbox implementation for enabling/disabling STT is correct. However, the update logic can be improved to address the static analysis warning about assignment in expression.

Consider refactoring the onChange handler as follows:

 onChange={(e) =>
   props.updateConfig(
-    (config) => (config.enable = e.currentTarget.checked),
+    (config) => ({ ...config, enable: e.currentTarget.checked })
   )
 }

This change creates a new object with the updated enable property, which is a more idiomatic way to update state in React and avoids the assignment in expression issue.

📝 Committable suggestion

Suggested change

      <ListItem
        title={Locale.Settings.STT.Enable.Title}
        subTitle={Locale.Settings.STT.Enable.SubTitle}
      >
        <input
          type="checkbox"
          checked={props.sttConfig.enable}
          onChange={(e) =>
            props.updateConfig(
              (config) => ({ ...config, enable: e.currentTarget.checked })
            )
          }
        ></input>
      </ListItem>
🧰 Tools
🪛 Biome

[error] 23-23: The assignment should not be in an expression.

The use of assignments in expressions is confusing.
Expressions are often considered as side-effect free.

(lint/suspicious/noAssignInExpressions)

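Whether the spread-based fix above actually takes effect depends on `updateConfig`'s semantics: the settings code copies the config and applies the updater by mutation, ignoring its return value, so a block-bodied mutating updater silences the lint while keeping the behavior. A sketch under that assumption (`STTDraft` is a simplified stand-in for `STTConfig`):

```typescript
// Simplified stand-in for STTConfig.
type STTDraft = { enable: boolean };

// Mirrors how the settings page applies updaters: copy, mutate the copy,
// ignore the updater's return value.
function applyUpdater(config: STTDraft, updater: (c: STTDraft) => void): STTDraft {
  const draft = { ...config };
  updater(draft); // return value ignored: mutation-style updater
  return draft;
}

// Lint-friendly: a block body performing the mutation, rather than an
// assignment used as an expression.
const enableStt = (c: STTDraft): void => {
  c.enable = true;
};
```

Under these semantics an updater that only returns a new object (without mutating the draft) would be a silent no-op, so the block-body form is the safer fix.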
      {!isFirefox() && (
        <ListItem title={Locale.Settings.STT.Engine.Title}>
          <Select
            value={props.sttConfig.engine}
            onChange={(e) => {
              props.updateConfig(
                (config) =>
                  (config.engine = STTConfigValidator.engine(
                    e.currentTarget.value,
                  )),
              );
            }}
          >
            {DEFAULT_STT_ENGINES.map((v, i) => (
              <option value={v} key={i}>
                {v}
              </option>
            ))}
          </Select>
        </ListItem>
      )}
    </>
  );
}
119 changes: 119 additions & 0 deletions app/components/stt.module.scss
@@ -0,0 +1,119 @@
@import "../styles/animation.scss";
.plugin-page {
  height: 100%;
  display: flex;
  flex-direction: column;

  .plugin-page-body {
    padding: 20px;
    overflow-y: auto;

    .plugin-filter {
      width: 100%;
      max-width: 100%;
      margin-bottom: 20px;
      animation: slide-in ease 0.3s;
      height: 40px;

      display: flex;

      .search-bar {
        flex-grow: 1;
        max-width: 100%;
        min-width: 0;
        outline: none;
      }

      .search-bar:focus {
        border: 1px solid var(--primary);
      }

      .plugin-filter-lang {
        height: 100%;
        margin-left: 10px;
      }

      .plugin-create {
        height: 100%;
        margin-left: 10px;
        box-sizing: border-box;
        min-width: 80px;
      }
    }

    .plugin-item {
      display: flex;
      justify-content: space-between;
      padding: 20px;
      border: var(--border-in-light);
      animation: slide-in ease 0.3s;

      &:not(:last-child) {
        border-bottom: 0;
      }

      &:first-child {
        border-top-left-radius: 10px;
        border-top-right-radius: 10px;
      }

      &:last-child {
        border-bottom-left-radius: 10px;
        border-bottom-right-radius: 10px;
      }

      .plugin-header {
        display: flex;
        align-items: center;

        .plugin-icon {
          display: flex;
          align-items: center;
          justify-content: center;
          margin-right: 10px;
        }

        .plugin-title {
          .plugin-name {
            font-size: 14px;
            font-weight: bold;
          }
          .plugin-info {
            font-size: 12px;
          }
          .plugin-runtime-warning {
            font-size: 12px;
            color: #f86c6c;
          }
        }
      }

      .plugin-actions {
        display: flex;
        flex-wrap: nowrap;
        transition: all ease 0.3s;
        justify-content: center;
        align-items: center;
      }

      @media screen and (max-width: 600px) {
        display: flex;
        flex-direction: column;
        padding-bottom: 10px;
        border-radius: 10px;
        margin-bottom: 20px;
        box-shadow: var(--card-shadow);

        &:not(:last-child) {
          border-bottom: var(--border-in-light);
        }

        .plugin-actions {
          width: 100%;
          justify-content: space-between;
          padding-top: 10px;
        }
      }
    }
  }
}
5 changes: 5 additions & 0 deletions app/constant.ts
@@ -150,6 +150,7 @@ export const Anthropic = {
export const OpenaiPath = {
  ChatPath: "v1/chat/completions",
  SpeechPath: "v1/audio/speech",
  TranscriptionPath: "v1/audio/transcriptions",
  ImagePath: "v1/images/generations",
  UsagePath: "dashboard/billing/usage",
  SubsPath: "dashboard/billing/subscription",
@@ -270,6 +271,10 @@ export const DEFAULT_TTS_VOICES = [
  "shimmer",
];

export const DEFAULT_STT_ENGINE = "WebAPI";
export const DEFAULT_STT_ENGINES = ["WebAPI", "OpenAI Whisper"];
export const FIREFOX_DEFAULT_STT_ENGINE = "OpenAI Whisper";

const openaiModels = [
  "gpt-3.5-turbo",
  "gpt-3.5-turbo-1106",