Indigenous Language Speech-to-Text (STT) for Cameroon/Africa #49
Comments
It's great having you contribute to this project. Welcome to the community 🤓! We will carefully review your project idea and get back to you. If you would like to follow our community's work, you should join our Telegram chat group and channel, where we help and encourage each other to contribute to open source.
Cool project. Why do you call it
My bad, I thought it would better explain my point, but yeah, I see what you mean. Anyway, I shared the idea here so the community can discuss it and we can all make it better.
If I want to help, what can I do, @FotieMConstant?
Before going down this route, I recommend you check out this similar project
Introduction
This project aims to build a speech-to-text model that supports indigenous languages in Cameroon and across Africa. The idea is to fine-tune existing models such as OpenAI’s Whisper (whose code is open source) to understand and transcribe local languages that are not currently represented in mainstream STT technologies.
Description
In Cameroon and across Africa, indigenous languages are poorly represented in speech-to-text technologies. Current STT systems predominantly support international languages like English and French. This gap limits the development of apps and services that could cater to local populations, especially in rural areas where indigenous languages are more prevalent.
Relevant Technology
Most existing STT systems, such as Google Speech-to-Text and Whisper, support only a limited number of international languages. While these systems are effective for the languages they cover, they do not address the linguistic diversity of regions like Cameroon, as seen in the image from the Whisper README:
ref link: https://github.com/openai/whisper?tab=readme-ov-file
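One way to put a number on this gap is word error rate (WER), the usual accuracy metric for STT. Here is a minimal sketch in plain Python, assuming you already have a reference transcript and a system's output for the same clip. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are my own simplifications, not something taken from Whisper or Google's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplifying assumption: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Running the same set of recordings through a model before and after fine-tuning and comparing the average WER would give a simple measure of whether the work is paying off.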
Proposed Solution:
I suggest fine-tuning Whisper on datasets of indigenous languages from Cameroon. This will involve collecting and curating a large dataset of audio samples in these languages, potentially requiring collaboration with local communities and linguistic experts to ensure accuracy and comprehensiveness. The project could also include research into better documenting these languages, which are often not well-represented in written form.
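As a rough sketch of what the curation step could look like, here is some plain Python that filters collected recordings and splits them by speaker before any fine-tuning happens. Everything in it is hypothetical and only for illustration: the `Clip` fields, the `is_usable` and `split_by_speaker` helpers, and the thresholds. The one grounded detail is the 30-second cap, since Whisper processes audio in 30-second windows.

```python
import random
from dataclasses import dataclass

# Whisper works on 30-second audio windows, so longer clips need splitting first.
MAX_CLIP_SECONDS = 30.0


@dataclass(frozen=True)
class Clip:
    """One recording in the dataset manifest (hypothetical schema)."""
    audio_path: str
    transcript: str
    language: str      # whatever language label the project agrees on
    speaker_id: str
    duration_s: float


def is_usable(clip: Clip, min_seconds: float = 1.0) -> bool:
    """Keep clips that have a transcript and a duration within range."""
    has_text = bool(clip.transcript.strip())
    return has_text and min_seconds <= clip.duration_s <= MAX_CLIP_SECONDS


def split_by_speaker(clips, val_fraction=0.2, seed=0):
    """Split into (train, validation) so that no speaker appears in both."""
    speakers = sorted({c.speaker_id for c in clips})
    random.Random(seed).shuffle(speakers)
    # Hold out at least one speaker when there is more than one.
    n_val = max(1, round(len(speakers) * val_fraction)) if len(speakers) > 1 else 0
    val_speakers = set(speakers[:n_val])
    train = [c for c in clips if c.speaker_id not in val_speakers]
    val = [c for c in clips if c.speaker_id in val_speakers]
    return train, val
```

Splitting by speaker rather than by clip is a deliberate choice here: with few contributors per language, a model can otherwise look good on validation simply because it has already heard the same voices.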
Disclaimer:
This idea draws inspiration from initiatives focused on improving technological accessibility in underrepresented regions and languages.
Complexity
Required time
Categories