Indigenous Language Speech-to-Text (STT) for Cameroon/Africa #49

FotieMConstant · 2024-08-04T10:43:18Z

Introduction

Building a speech-to-text model that supports indigenous languages in Cameroon and across Africa. This project aims to fine-tune existing models such as OpenAI’s Whisper (whose code is open source) to understand and transcribe local languages that are not currently represented in mainstream STT technologies.

Description

In Cameroon and across Africa, indigenous languages are poorly represented in speech-to-text technologies. Current STT systems predominantly support international languages such as English and French. This lack of accessibility limits the development of apps and services for local populations, especially in rural areas where indigenous languages are more prevalent.

Relevant Technology

Most existing STT systems, such as Google Speech-to-Text and Whisper, support only a limited number of international languages. While these systems are effective, they do not address the linguistic diversity of regions like Cameroon, as shown in the image below:

Reference: https://github.com/openai/whisper?tab=readme-ov-file
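One way to put numbers behind this claim is to transcribe a small set of recordings in a Cameroonian language with an off-the-shelf model and score the output with word error rate (WER), the usual accuracy metric for STT. Below is a minimal, dependency-free sketch of WER (word-level edit distance divided by the number of reference words). The function name `word_error_rate` is our own choice for illustration, not part of Whisper or any library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c d")` returns 0.25 (one substitution out of four reference words). Computed on the same held-out recordings before and after fine-tuning, this score would show whether fine-tuning actually helps; for larger evaluations an existing package such as `jiwer` offers the same metric.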

Proposed Solution:
I suggest fine-tuning Whisper on datasets of indigenous languages from Cameroon. This would involve collecting and curating a large dataset of audio samples in these languages, likely in collaboration with local communities and linguistic experts to ensure the transcriptions are accurate and comprehensive. The project could also include research into better documenting these languages, many of which are not well represented in written form.
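Before any fine-tuning, the collected recordings need to be cleaned and split. The sketch below assumes a hypothetical manifest in which each clip has an audio path, a transcript, a speaker ID, a language tag and a duration; none of these names come from the proposal. It normalizes transcripts to Unicode NFC (many Cameroonian orthographies use letters such as ɛ, ɔ and ŋ plus tone marks, and accented letters can be typed either precomposed or with combining characters), drops clips longer than 30 seconds (Whisper processes audio in 30-second windows), and splits by speaker so that evaluation uses voices the model never heard in training.

```python
import random
import unicodedata
from dataclasses import dataclass, replace

# Whisper processes audio in 30-second windows, so longer clips are dropped here.
MAX_SECONDS = 30.0


@dataclass(frozen=True)
class Clip:
    """Hypothetical manifest record for one recorded utterance."""
    audio_path: str
    transcript: str
    speaker_id: str
    language: str  # hypothetical language tag chosen by the project
    duration_s: float


def normalize_transcript(text: str) -> str:
    # NFC gives one canonical encoding for accented letters and tone marks,
    # e.g. "e" followed by a combining acute accent becomes the single code point "é".
    text = unicodedata.normalize("NFC", text)
    # Collapse stray spaces, tabs and newlines left by manual transcription.
    return " ".join(text.split())


def curate(clips, test_fraction=0.2, seed=0):
    """Clean the clips, then split them so no speaker is in both train and test."""
    kept = []
    for clip in clips:
        text = normalize_transcript(clip.transcript)
        if not text or not 0 < clip.duration_s <= MAX_SECONDS:
            continue  # skip empty transcripts and clips that do not fit one window
        kept.append(replace(clip, transcript=text))

    # Shuffle speakers (not clips) deterministically, then hold some out for testing.
    speakers = sorted({c.speaker_id for c in kept})
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction)) if len(speakers) > 1 else 0
    test_speakers = set(speakers[:n_test])

    train = [c for c in kept if c.speaker_id not in test_speakers]
    test = [c for c in kept if c.speaker_id in test_speakers]
    return train, test
```

The resulting `train` and `test` lists could then feed whatever fine-tuning pipeline the project settles on. Splitting by speaker rather than by clip matters because the same voice in both sets would make the model look more accurate than it really is on new speakers.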

Disclaimer:
This idea draws inspiration from initiatives focused on improving technological accessibility in underrepresented regions and languages.

Complexity

Beginner - This project requires little or no prior knowledge of the specified technologies to contribute
Intermediate - The contributor should have some prior knowledge of the technologies, enough to know how to use them, but not necessarily all their nooks and crannies
Advanced - The project requires the contributor to have a good understanding of all its components

Required time

Little work - A couple of days
Medium work - A week or two
Much work - The project will take more than a couple of weeks and serious planning is required

github-actions bot · 2024-08-04

It's great having you contribute to this project.

Welcome to the community 🤓! We will carefully review your project idea and get back to you.

If you would like to follow our community's work, join us in our Telegram chat group and channel; we help and encourage each other to contribute to open source.
You can also support us financially here to help us build Cameroon one open-source project at a time.

billmetangmo · 2024-08-07T08:32:09Z

Cool project. Why do you call it "indigenous"? I think the word unfortunately has a pejorative connotation.

FotieMConstant · 2024-08-07T10:11:01Z

> Cool project. Why do you call it "indigenous"? I think the word unfortunately has a pejorative connotation.

My bad, I thought it would explain my point better, but I see what you mean. I shared the idea here so the community can discuss it and we can all make it better together.

billmetangmo · 2024-08-08T11:29:58Z

If I want to help, what can I do, @FotieMConstant?

pythonbrad · 2024-08-15T07:06:28Z

Before going down this path, I recommend checking out this similar project:
https://huggingface.co/Orange/SSA-HuBERT-base-60k/tree/main
