English and Spanish optimized? #40

Open
masters3d opened this issue Sep 26, 2022 · 6 comments

Comments

@masters3d

I’m wondering, for folks who write in both English and Spanish, whether there is a common version they can use.

@Lobo-Feroz

Lobo-Feroz commented Sep 26, 2022

No, at least not that I know of. But this shouldn't be too hard to optimize.

English character frequency is well documented. A big Spanish corpus was cleaned up and analyzed by Ian Doug with my help, see details here: #21

So by weighting the frequencies of characters, bigrams and trigrams you could use Arno's code to optimize, for example, a keyboard for 50/50 English/Spanish, or some other ratio.
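
A minimal sketch of the weighting idea, assuming each language's n-gram statistics are available as plain frequency tables. The helper names (`ngram_counts`, `normalize`, `blend`) and the toy sample strings are made up for illustration; this is not Arno's code, and how the blended table is passed to the optimizer depends on that code.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping character n-grams in a text (n=1 gives letters)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def normalize(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def blend(freq_a, freq_b, weight_a=0.5):
    """Mix two frequency tables; weight_a=0.5 is the 50/50 case."""
    grams = set(freq_a) | set(freq_b)
    return {
        gram: weight_a * freq_a.get(gram, 0.0)
        + (1.0 - weight_a) * freq_b.get(gram, 0.0)
        for gram in grams
    }


# Toy samples standing in for real corpora (hypothetical data).
english_bigrams = normalize(ngram_counts("the quick brown fox", 2))
spanish_bigrams = normalize(ngram_counts("el zorro marrón", 2))

mixed = blend(english_bigrams, spanish_bigrams, weight_a=0.5)
```

Because both inputs are normalized first, the blended table still sums to 1, so a different ratio (say 70/30) only requires changing `weight_a`.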

@masters3d
Author

Is there a Spanish letter frequency similar to the English "Letter frequencies (Norvig, 2012)"?

@binarybottle
Owner

Is there a Spanish letter frequency similar to the English "Letter frequencies (Norvig, 2012)"?

Derived from https://zenodo.org/record/5501931

@edugit

edugit commented Apr 19, 2023

I’m wondering for folks who write English and Spanish if there is common version that they can use.

Hi, masters3d,
Did you succeed? Many of us have to write in both English and Spanish. Your idea would be the perfect balance.

@leogama

leogama commented Apr 28, 2023

Edit: moved this to a dedicated issue


An alternative solution: excerpt from my original comment

Hello, there! I've been using the (programmer's) Dvorak layout for almost a decade now, and learning it was a huge improvement over good ol' QWERTY. However, while it is really widespread and readily available on most current systems, its performance for the English language is sub-optimal. Also, its variations for languages with similar alphabets, like my dear Portuguese, are still "super-terrible" (a bit less terrible than QWERTY due to the vowels on the left home row).

A "Latin" or "Romance-Germanic" base keyboard layout

For whoever is interested, I propose the development of a base layout using the Latin alphabet that is optimized for all of these 5 languages [English, Spanish, French, Portuguese and German]. It wouldn't be a simple weighted optimization though. What I would expect to achieve with this design is:

  1. To have a common base for creating a new layout for each of the 5 languages;
  2. It must be really good at English, at least as good as other current designs by the same metrics;
  3. It should be reasonably good for the other 4 languages, but must not be terrible for any of them;
  4. The differences between the layouts should be minimal, so that one can constantly switch between layouts without hassle, create a custom hybrid bilingual layout or not even need to switch at all.

Full original comment

Hello, there! I've been using the (programmer's) Dvorak layout for almost a decade now, and learning it was a huge improvement over good ol' QWERTY. However, while it is really widespread and readily available on most current systems, its performance for the English language is sub-optimal. Also, its variations for languages with similar alphabets, like my dear Portuguese, are still "super-terrible" (a bit less terrible than QWERTY due to the vowels on the left home row).

The elephant in the room

I took a look at some of these newer designs, including yours. Congratulations, by the way! Amazing work. But the OP touched on a very important point that is still unaddressed by all of these: we live in an international, interconnected world now. Until the early 2000s, it wasn't a problem to have totally different keyboard layouts for every language. We even used different, incompatible text encodings! But now the most used encoding on both new devices and the Internet is Unicode. I believe the same transition should happen to keyboard layouts.

But is there a need for it? Well, most professionals who type a lot (journalists, academics, programmers, etc.) will need to either create content in more than one language, usually their native one and English, or at least communicate often with foreigners through text. This applies even to countries that have English as their primary language, like the US, where more and more people speak Spanish as a primary or secondary language each year (> 50 million today).

Is an "international" keyboard layout possible?

I know that many languages use completely different alphabets and, even when they use similar ones (like variations of the Latin or Cyrillic scripts), they have extra characters and wildly varying letter/n-gram frequencies. Therefore, there can't be a truly international base layout for keyboards. But can we do better?

Starting from English, the de facto international language, a non-monolingual layout can't stray far from ASCII. Looking at the languages with the most speakers in the world that use a Latin-script alphabet, we have in the top positions (Wikipedia/Ethnologue 2022):

| Position | Language | Family | Branch | 1st language | 2nd language | Total speakers |
|---|---|---|---|---|---|---|
| 1 | English | Indo-European | Germanic | 372.9 million | 1.080 billion | 1.452 billion |
| 4 | Spanish | Indo-European | Romance | 474.7 million | 73.6 million | 548.3 million |
| 5 | French | Indo-European | Romance | 79.9 million | 194.2 million | 274.1 million |
| 9 | Portuguese | Indo-European | Romance | 232.4 million | 25.2 million | 257.7 million |
| 12 | German | Indo-European | Germanic | 75.6 million | 59.1 million | 134.6 million |

I think it would be feasible to analyze these 5 languages, from two branches of the same language family (you already did it for two), and find a design that isn't awesome for one of them while sucking for all the others...

A "Latin" or "Romance-Germanic" base keyboard layout

For whoever is interested, I propose the development of a base layout using the Latin alphabet that is optimized for all of these 5 languages. It wouldn't be a simple weighted optimization though. What I would expect to achieve with this design is:

  1. To have a common base for creating a new layout for each of the 5 languages;
  2. It must be really good at English, at least as good as other current designs by the same metrics;
  3. It should be reasonably good for the other 4 languages, but must not be terrible for any of them;
  4. The differences between the layouts should be minimal, so that one can constantly switch between layouts without hassle, create a custom hybrid bilingual layout or not even need to switch at all.

Steps necessary to achieve these goals:

  1. Obtain a text corpus and n-gram frequency for French, German and Portuguese;
  2. Find the similarities between the 5 languages using some kind of distance measure(s);
  3. Define optimization weights for them considering these similarities, number of speakers, etc.;
  4. Develop a method for searching the layout space by optimizing primarily for English and secondarily for the 4 other languages (using the weights), with a penalty if the layout starts becoming too bad for any single language mid-search;
  5. Choose a winner base layout and then search for full layouts for each individual language, positioning specific keys and maybe repositioning some punctuation keys in the process.
  6. Profit. 😎
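
Steps 2 to 4 could be sketched roughly as follows. Everything here is an assumption made for illustration: the Jensen-Shannon divergence is just one possible distance measure, weighting by total speakers (figures taken from the table above) is just one possible weighting, and the cost values, `primary_share`, `threshold` and `penalty` are placeholders. A real search would get each per-language cost from a layout scoring function.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) between two
    frequency tables given as dicts that each sum to 1."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence of `a` from the mixture.
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def speaker_weights(speakers):
    """Weights proportional to speaker counts, summing to 1."""
    total = sum(speakers.values())
    return {lang: n / total for lang, n in speakers.items()}


def combined_cost(costs, weights, primary="en", primary_share=0.5,
                  threshold=1.5, penalty=10.0):
    """Blend the primary-language cost with the weighted secondary costs,
    then add a penalty for every language whose cost exceeds `threshold`.
    Lower is better; costs are assumed to be on a comparable scale."""
    secondary = sum(weights[lang] * costs[lang] for lang in weights)
    blended = primary_share * costs[primary] + (1.0 - primary_share) * secondary
    excess = sum(max(0.0, c - threshold) for c in costs.values())
    return blended + penalty * excess


# Total speakers in millions for the 4 secondary languages (table above).
weights = speaker_weights({"es": 548.3, "fr": 274.1, "pt": 257.7, "de": 134.6})

# Made-up costs for one candidate layout; "de" crosses the threshold.
candidate = {"en": 1.0, "es": 1.1, "fr": 1.2, "pt": 1.3, "de": 1.7}
print(combined_cost(candidate, weights))
```

The penalty term is what keeps the search from accepting a layout that is excellent on average but terrible for a single language; the divergence could additionally be used to adjust the weights so that languages already close to English count less.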

Advantages

  • Beyond the obvious advantages for multilingual typists, this base layout and its derivatives would benefit from having a unified, larger user base (likely very small in the beginning, but it could plausibly reach a critical size eventually).
  • Its software implementations could have common codebases, following the pattern of a base layout file (either the English layout or just the base itself) and modifications of it. This would be easier to maintain and port to different systems.
  • Being multilingual could be an eye-catching feature for anyone looking for a better layout to learn beyond QWERTY/Dvorak.
  • The new methods developed could be useful for custom/personal layout creation and also for other language subfamilies, like those that use the Cyrillic script.
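
The "base layout file plus modifications" pattern could look something like the sketch below. The key position names (`R1C1`, ...) and the character assignments are placeholders invented for this example, not a proposed layout; `differing_keys` is a hypothetical helper for checking the "minimal differences" goal.

```python
# Hypothetical base layout: physical key position -> character.
BASE_LAYOUT = {
    "R1C1": "q", "R1C2": "w", "R1C3": "e",
    "R2C1": "a", "R2C2": "s", "R2C10": ";",
    "R3C1": "z", "R3C2": "x",
}

# Each language only lists the positions it changes (placeholder values).
OVERRIDES = {
    "es": {"R2C10": "ñ"},
    "pt": {"R2C10": "ç"},
}


def derive_layout(base, overrides):
    """Apply per-language overrides to the base without mutating it."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"positions not in base layout: {sorted(unknown)}")
    return {**base, **overrides}


def differing_keys(layout_a, layout_b):
    """Sorted key positions whose characters differ between two layouts."""
    positions = layout_a.keys() | layout_b.keys()
    return sorted(p for p in positions if layout_a.get(p) != layout_b.get(p))


spanish = derive_layout(BASE_LAYOUT, OVERRIDES["es"])
portuguese = derive_layout(BASE_LAYOUT, OVERRIDES["pt"])
print(differing_keys(BASE_LAYOUT, spanish))
```

Keeping every language as a small set of overrides means a fix to the base reaches all derived layouts, and the length of `differing_keys(...)` gives a direct count of how much a typist has to relearn when switching.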

I'm seriously considering learning a new keyboard layout once more, but it would have to be a killer layout. It would have to be one to rule them all.

I am willing to dedicate some time to this idea if there are others interested. If not, maybe I'll end up trying to create my own Portuguese or Portuguese-English Engram layout.

Greetings from Brazil! 🇧🇷

@binarybottle
Owner

I think that this issue could be well addressed by an English-Spanish-French key layout -- see: binarybottle/engram-v2#58 (comment)
