
!!! warning "AI tool caution"

    Like all other AI models, Whisper hallucinates while transcribing or translating: it can "make up" words or even whole sentences, which may misinterpret or misrepresent the speaker.


??? info "Quality of transcriptions/translations and audio formats"

    **Transcriptions (error rate)**: Swedish: ~10%, English: ~5%, English with a heavy accent: ~20%.

    **Translations**: any language to English: "DeepL"-level performance, slightly better than Google Translate.

    **Supported file types**: mp3, mp4, mpeg, mpga, m4a, wav, webm and wma.

    **Quality as a function of recording duration**:

    * A few minutes: Excellent
    * A few minutes to an hour: Excellent at the beginning, then deteriorates.
    * An hour or more: Excellent at the beginning, then deteriorates.

    **Quality as a function of noise and number of speakers**:

    * 2 speakers: Excellent
    * Background noise: Good
    * 2+ speakers: Very good
    * Conversational overlap: Average. Difficulty disambiguating speakers.
    * Long silences: Good. Might repeat sentences and get stuck in a loop.

    Whisper also tries to put the speech of different speakers in separate sentences, but this is not guaranteed.

!!! warning "Recordings from a dictaphone"

    If you record using a dictaphone such as the Olympus DS-9000, it records by default in the `.DSS` or `.DS2` file formats, which are NOT supported by Whisper.
    Make sure to change the dictaphone settings to the `.mp3` format before you start recording.
    Follow this [guide](https://audiosupport.omsystem.com/wp-content/uploads/2021/11/DictationModule.pdf) to convert your `.DSS` or `.DS2` recordings to `.mp3` using the software that comes with your dictaphone. Alternatively, you can download the software from [here](https://audiosupport.omsystem.com/en/product/odms-r8/) and then follow the same guide.
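If you have many recordings, a quick check on your own computer can catch files in unsupported formats (such as dictaphone `.DS2` files) before you transfer them. The following is a minimal Bash sketch, assuming the supported file types listed earlier on this page. It only looks at file names, not at the audio itself, and `is_supported_audio` and `list_unsupported` are hypothetical helpers, not part of the Whisper module.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Whisper module): flag recordings whose
# extension is not one of mp3, mp4, mpeg, mpga, m4a, wav, webm, wma.
# Only the file name is checked; the audio content is not inspected.

# Return 0 if the file name ends in a supported extension (case-insensitive).
is_supported_audio() {
  local name="${1##*/}" ext   # drop any leading directories
  # A name without a dot has no extension at all.
  case "$name" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Everything after the last dot, lowercased (tr also works with macOS bash 3.2).
  ext=$(printf '%s' "${name##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm|wma) return 0 ;;
    *) return 1 ;;
  esac
}

# Print every argument whose extension is not supported, one per line.
list_unsupported() {
  local f
  for f in "$@"; do
    is_supported_audio "$f" || printf '%s\n' "$f"
  done
}
```

For example, `list_unsupported ~/recordings/*` (where `~/recordings` is only an example folder) prints the files that need converting; no output means every file has a supported extension.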

## Glossary

* **SUPR account**: gives access to SUPR, where you submit and manage project proposals.
* **UPPMAX account**: gives access to UPPMAX servers, such as Bianca.
* **GUI**: Graphical User Interface, used here to enter the transcription/translation settings.
* **WinSCP / FileZilla**: applications for transferring data from your computer to Bianca and vice versa.
* **Terminal**: a text-based window in which you type commands.
* **Wharf**: private folder on Bianca that is used to transfer data to and from your computer.
* **Proj**: project folder on Bianca that is shared among all project members.
* **Job**: a request to transcribe/translate one or more recordings.
* **Slurm**: the "job" handler (job scheduler) on Bianca.

!!! info inline end "Checklist for new project"

    * [x] SUPR account
    * [x] Submit project proposal
    * [x] UPPMAX username and password
    * [x] UPPMAX two-factor authentication


## Accessing your project

The following steps are derived from [UPPMAX User Accounts](https://www.uu.se/en/centre/uppmax/get-started/create-account-and-apply-for-project/user-account):

1. Register an [account on SUPR](https://supr.naiss.se/person/register/).

2. Apply for a project for [sensitive data on Bianca](https://supr.naiss.se/round/senssmall2024/create_proposal). Give adequate information in your proposal by following [this template](#proposal-template).

3. Register an [account for UPPMAX](https://supr.naiss.se/account/) on SUPR by clicking the "Request Account at UPPMAX" button. You will receive an UPPMAX username and password via email.

4. [Set up two-factor authentication](https://www.uu.se/en/centre/uppmax/get-started/2-factor) for this newly created UPPMAX account. ([Video](https://www.youtube.com/watch?v=eSn0kLkU5Dc&ab_channel=Rich%C3%A8lJ.C.Bilderbeek))

5. Check access to your project on [Bianca](https://bianca.uppmax.uu.se/). ([Video](https://youtube.com/clip/UgkxKxnaebkokAuFGEXqafoIQo-RjTPEZbeJ?si=VFFfDSY2DhAYMvDO))

## Whisper App

### Step 1: Data transfer from local computer to Bianca

1. Transfer your data from your local computer to Wharf using the [WinSCP](https://docs.uppmax.uu.se/software/bianca_file_transfer_using_winscp/) app (Windows only) or the [FileZilla](https://docs.uppmax.uu.se/software/bianca_file_transfer_using_filezilla/) app (Mac, Windows or Linux). Instructions are available at the respective links, or watch the FileZilla [Video](https://www.youtube.com/watch?v=V-iPQLjvByc&t=136s&ab_channel=Rich%C3%A8lJ.C.Bilderbeek).

### Step 2: Transcribing/Translating

1. Log in to [Bianca](https://bianca.uppmax.uu.se/). This requires your UPPMAX username (visible in SUPR), your project name and a two-factor authentication code. Make sure you are inside SUNET for the link to work.

1. Click the Terminal icon at the bottom of the desktop and enter the following command to load the Whisper GUI:

    ```bash
    module load Whisper-gui
    ```

    ![Terminal on Bianca Desktop](./img/whisper_terminal.png)

1. You should now see the `proj` and `wharf` folders on your desktop, along with a Whisper application icon. `wharf` contains the data that was transferred in [Step 1](#step-1-data-transfer-from-local-computer-to-bianca).
    (The next time you log in to Bianca to transcribe/translate, you can start from this step and skip the previous one, since the `wharf` and `proj` folders have already been created.)

    ![Desktop view on Bianca after running `module load Whisper-gui`](./img/whisper_desktop.png)

1. Open the `wharf` and `proj` folders. Select all the data that you transferred to `wharf`, then drag and drop it into the `proj` folder.
    NOTE: drag and drop moves (cut-pastes) your data rather than copying it. Do not keep files in `wharf` for long: this folder is connected to the outside world and is therefore a security risk. `proj`, on the other hand, is cut off from the internet and safe for keeping data, so move your data there.

    ![whisper gui](./img/whisper_data_transfer.png){: style="height:90%;width:90%"}

1. Click the Whisper application icon on the desktop. It looks like this:

    ![whisper gui](./img/whisper-gui.png){: style="height:90%;width:90%"}

1. Select the appropriate options, or use the following for the best results:

    **Total audio length in hours**: [if transcribing files in bulk, give a rough average, rounded up to the nearest hour]

    **Language used in recordings (leave blank for autodetection)**: If the selected recordings contain multiple languages, or you are unsure about the spoken language, leave it blank. If your language is not in the drop-down list, check whether it appears in the "Languages available" list and, if it does, [contact support](https://supr.naiss.se/support/).

    **Select whether to transcribe or translate (English only)**: 'Transcribe' [for language X -> language X]. 'Translate' [for language X -> English].

    **Model**: large-v2

    **Initial Prompt**: [leave blank]
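To fill in **Total audio length in hours**, you can add up the recording durations from the terminal. The following is a minimal Bash sketch, assuming `ffprobe` (part of FFmpeg) is available on your `PATH`; the helper names are hypothetical and not part of the Whisper module. It prints both the total and the per-file average, each rounded up to whole hours.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for estimating audio length.
# Assumption: ffprobe (from FFmpeg) is available on PATH.

# Print the duration of one recording in seconds, as reported by ffprobe.
duration_seconds() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Round a (possibly fractional) number of seconds up to whole hours.
seconds_to_hours_ceil() {
  awk -v s="$1" 'BEGIN { h = int(s / 3600); if (h * 3600 < s) h++; print h }'
}

# Print the total and per-file average audio length, rounded up to whole hours.
audio_hours() {
  local f d avg total=0 n=0
  if (( $# == 0 )); then
    echo "usage: audio_hours FILE..." >&2
    return 1
  fi
  for f in "$@"; do
    d=$(duration_seconds "$f") || return 1
    # awk does the floating-point addition; bash arithmetic is integer-only.
    total=$(awk -v a="$total" -v b="$d" 'BEGIN { printf "%.3f", a + b }')
    n=$((n + 1))
  done
  avg=$(awk -v t="$total" -v n="$n" 'BEGIN { printf "%.3f", t / n }')
  printf 'total_hours=%s\naverage_hours=%s\n' \
    "$(seconds_to_hours_ceil "$total")" "$(seconds_to_hours_ceil "$avg")"
}
```

For example, `audio_hours /path/to/proj/*.mp3` (the path is a placeholder for your own project folder) prints two lines such as `total_hours=...` and `average_hours=...`. Rounding up means a 61-minute recording counts as 2 hours, which matches the "rounded up" instruction above.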

### Step 3: Monitoring jobs

1. Your job will first wait in a queue and then start executing. To check whether your job is waiting in the queue, type `squeue --me -o "%.30j"` in the terminal. If you see a job named `Whisper_xxx`, where `xxx` is the date and time of job submission (for example `Whisper_2024-10-25_11-10-30`), your job is in the queue.

2. To check whether your job has started executing, look for a file named `[Whisper_xxx_yyy].out` in the `Whisper_logs` folder inside the `proj` folder. Here `xxx` is the date and time of job submission and `yyy` is your username followed by a "job id", for example `Whisper_2024-10-25_11-10-30_jayan_234.out`. This file is created when the job starts and contains a progress bar for each recording that you sent for transcription/translation.

3. If the job name `Whisper_xxx` does not appear in the queue and no `[Whisper_xxx_yyy].out` file has been created in `Whisper_logs`, [contact support](https://supr.naiss.se/support/).
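The log-file checks in this step can also be done from the terminal. The following is a minimal Bash sketch: the file-name pattern is inferred from the example `Whisper_2024-10-25_11-10-30_jayan_234.out`, and `parse_whisper_log_name` and `latest_whisper_log` are hypothetical helpers, not part of the Whisper module.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Step 3. The name pattern is inferred from the
# example Whisper_2024-10-25_11-10-30_jayan_234.out.

# Split a Whisper log file name into submission time, user and job id.
# Returns 1 if the name does not match the expected pattern.
parse_whisper_log_name() {
  local name re
  name=$(basename "$1")
  re='^Whisper_([0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2})_(.+)_([0-9]+)\.out$'
  if [[ $name =~ $re ]]; then
    printf 'submitted=%s user=%s job_id=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    return 1
  fi
}

# Print the most recently modified Whisper log in the given Whisper_logs folder
# (prints nothing if there is none yet).
latest_whisper_log() {
  local dir="$1"
  ls -t "$dir"/Whisper_*.out 2>/dev/null | head -n 1
}
```

For example, `squeue --me -o "%.30j" | grep Whisper_` lists your Whisper jobs in the queue, and `parse_whisper_log_name "$(latest_whisper_log /path/to/proj/Whisper_logs)"` summarizes the newest log, where `/path/to/proj` is a placeholder for your project folder.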

### Step 4: Data transfer from project to local computer

1. Drag and drop your transcriptions/translations from the `proj` folder to `wharf`.

2. Use WinSCP/FileZilla as you did in [Step 1](#step-1-data-transfer-from-local-computer-to-bianca) to transfer your data from `wharf` to your local computer.

### Output files

By default you receive 5 types of output files for each file you transcribe/translate:

* With timestamps: `.srt`, `.vtt`, `.tsv`
* Without timestamps: `.txt`
* With detailed model metadata: `.json`

The most popular formats are `.srt` and `.txt`.

On a Mac, `.txt`, `.srt` and `.vtt` files can be opened in Word as follows: tap with two fingers, select "Unicode (UTF-8)" as the encoding, rename the file (for example to `some_name.docx`) and change its file type to `.docx`. Then open the file and use Save As to save it as a new file.

![Mac setting for UTF-8 export](../img/mac_utf8.png){: style="height:90%;width:90%"}
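After copying the results back, you may want to confirm that every recording has all five outputs. The following is a minimal Bash sketch, assuming the output files are named after the recording with its audio extension replaced (for example, `interview1.mp3` becomes `interview1.srt`), which is how the Whisper command-line tool names its outputs; check this against your actual output folder. `check_whisper_outputs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the five default output files are missing
# for one recording. Assumes outputs are named <recording name without extension>.<format>.

check_whisper_outputs() {
  local dir="$1" recording="$2" base fmt missing=0
  base="${recording##*/}"   # drop any leading directories
  base="${base%.*}"         # drop the audio extension, e.g. .mp3
  for fmt in srt vtt tsv txt json; do
    if [[ ! -f "$dir/$base.$fmt" ]]; then
      printf 'missing: %s\n' "$base.$fmt"
      missing=1
    fi
  done
  # Exit status 0 means all five files are present.
  return "$missing"
}
```

For example, `check_whisper_outputs ~/Downloads/results interview1.mp3` prints nothing when all five files are present (the folder name is only an example).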

??? tip "Advanced settings"

    Use the features below only if the transcriptions/translations are not satisfactory, or for less widely spoken languages that have few resources available online:

    1. When asked for an Initial Prompt, provide a comma-separated list of words or sentences (fewer than 80 words) that describe what the recording is about, or words that the speaker uses in the recording. Write it in the same language as the one spoken in the recordings.

    2. Try switching to Model: large-v3.

    3. Use a combination of 1 and 2.

    4. If you are sure about the language used in the recording, select it in the drop-down menu.
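If you prepare the Initial Prompt in a text editor first, a quick word count helps you stay under the 80-word guideline. The following is a minimal Bash sketch; it counts whitespace-separated words the way `wc -w` does, which is only an approximation of how the guideline counts words, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: check an Initial Prompt against the "fewer than 80 words"
# guideline. Words are whitespace-separated tokens, as counted by wc -w, so
# "interview, Uppsala, housing" counts as 3 words.

# Print the number of words in the prompt.
prompt_word_count() {
  # tr removes the padding spaces that some wc implementations (e.g. macOS) print.
  printf '%s\n' "$1" | wc -w | tr -d ' '
}

# Return 0 if the prompt has fewer than 80 words; otherwise warn on stderr.
check_initial_prompt() {
  local n
  n=$(prompt_word_count "$1")
  if (( n < 80 )); then
    return 0
  fi
  printf 'Initial Prompt has %s words; keep it under 80.\n' "$n" >&2
  return 1
}
```

For example, `check_initial_prompt "$(cat prompt.txt)"` checks a prompt saved in a hypothetical `prompt.txt` file and stays silent if it is short enough.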

??? note "Languages available"

    The following languages are available for transcription. If your language of choice does not appear in the Whisper application but is listed here, [contact support](https://supr.naiss.se/support/):

    `en`: "english",
    `zh`: "chinese",

## Proposal template

Under the Basic Information section on NAISS SUPR, provide the following compulsory details about your project:

* **Project Title**: Whisper service for [Name of the project]

* **Abstract**: [What the project is about; give links, funding information, duration, etc.]

* **Resource Usage**: [Explain where transcriptions/translations are needed, such as interview recordings on a device or Zoom, or other audio/video recordings from offline/online sources. Give the average and maximum number of recordings to be transcribed/translated, and the average and maximum length of the recordings in minutes/hours. Mention whether you need transcription or translation. Mention the language(s) spoken in the recordings, if known, and a rough estimate of the number of recordings in each language. Ignore the "core-hours" and "hours required to analyse one sample" requirements.]

* **Abridged Data Management Plan**: [Address all points. Mention the recording file types, for example .mp3, .mp4, .wav, etc.]

* **Primary Classification**: [Either follow the link provided to "Standard för svensk indelning av forskningsämnen" (the Swedish standard for classifying research subjects), or search by entering your field of research, such as 'Social Work', 'Human Geography', etc.]

* **Requested Duration**: [State the duration for which the Whisper service is strictly required. Requesting a longer duration than you actually need may count against you when you next request a new allocation for the same or a new project. You can request a shorter duration of 1 month at first, and then apply for a new allocation when the need arises again.]

[jayan@sens2024544-bianca jayan]$ module list
Currently Loaded Modules:
1) uppmax 2) python/3.11.4 3) FFmpeg/5.1.2 4) Whisper/20240930
```

### Command-line
