Original discussion recommending a Spanish version of the Engram layout #3
Replies: 3 comments 34 replies
-
http://patorjk.com/keyboard-layout-analyzer/#/load/31nf5cXb Added some diacritics.
-
I'm using the frequencies from here: http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/spanish-letter-frequencies/
-
Extracts from corpus: I assume hbitos, pgina and lmites are typos (hábitos, página and límites with the accented letters dropped). I hope there aren't too many errors like that; they would skew the analysis.
-
Copied from https://github.com/binarybottle/engram/issues/25
NickG13 commented 19 days ago •
Spanish is one of the most spoken languages in the world, yet we really lack a specialized keyboard layout. We don't have any. Even the Dvorak variants ignore our letter frequencies.
Would you be able to make a Spanish Engram variant, following the same method?
You would make millions of typists very happy.
Thanks
@binarybottle
Owner
binarybottle commented 19 days ago •
I sure could! Could you identify the following resources?:
1. Large, accurate, and representative Spanish letter and bigram frequency counts, like Peter Norvig's for English: http://norvig.com/mayzner.html
2. Spanish punctuation frequency counts, like this example for English: http://www.viviancook.uk/Punctuation/PunctFigs.htm
Item 1 could potentially be generated from a large and representative Spanish text corpus comparable to what I assembled for English text: https://github.com/binarybottle/text_data, either with my Python code, or with available tools like https://onlinecryptotools.com/analyze-letter-frequency
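For anyone who wants to try this on a corpus, here is a minimal sketch of counting letters and bigrams in Python. It is not the code from the text_data repository; the function name `count_ngrams` and the decision to fold case and keep accented letters as separate symbols are assumptions made for illustration.

```python
from collections import Counter

# Letters treated as part of the Spanish alphabet (assumption: accented
# vowels, ü and ñ are counted as their own symbols, not folded to base letters).
SPANISH_LETTERS = set("abcdefghijklmnopqrstuvwxyzáéíóúüñ")


def count_ngrams(text):
    """Return (letter_counts, bigram_counts) for a piece of text.

    Bigrams are only counted inside words, so a pair that spans a space
    or a punctuation mark is ignored.
    """
    letters = Counter()
    bigrams = Counter()
    previous = None
    for ch in text.lower():
        if ch in SPANISH_LETTERS:
            letters[ch] += 1
            if previous is not None:
                bigrams[previous + ch] += 1
            previous = ch
        else:
            previous = None  # word boundary: reset the bigram window
    return letters, bigrams


letters, bigrams = count_ngrams("El niño de la señora está en España.")
print(letters.most_common(5))
print(bigrams.most_common(5))
```

Whether bigrams should cross word boundaries, and whether á should count as its own letter or as a plus an accent, depends on how the layout handles diacritics, so both choices would need revisiting.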
@binarybottle
Owner
binarybottle commented 19 days ago
How does one deal with diacritical marks for the seven additional letters: Á, É, Í, Ñ, Ó, Ú and Ü
(https://www.sttmedia.com/characterfrequency-spanish)? Should these be accessed using key binding with some other key besides the Shift key, such as the Alt-Right key?
@NickG13
Author
NickG13 commented 19 days ago •
You have an awesome attitude, thank you for helping out Spanish typists!
I was trying to create one on my own, but I think your system is a lot better. All I want is to help and hopefully reach an end point where we have a great Spanish layout.
I will reply to you in order.
Here is some of the data you need, from several sources:
Letter frequency (it looks like legitimate data; I have been searching for days for the best source and this is the most complete): https://www.sttmedia.com/characterfrequency-spanish I would add up the percentages of all the accented vowels to get the real frequency of the accent key ( ´ )
Bigrams, trigrams... here is a range of different options
https://www.ngrams.info/spanish.asp
Some reflections and top bigrams and other interesting stuff
Maybe we could use some actual online Spanish texts in http://patorjk.com/keyboard-layout-analyzer if the other samples are not enough
On the traditional keyboard layout, we accent characters with a dedicated key, as you may know. For example, to write "é" we first press " ´ " and then "e". I've been talking with a colleague and he seems to think like you: diacritical marks could be reached faster with other keys, but personally I don't consider it urgent. We already use a lot of key combinations in apps and games, so extra bindings could become counterproductive when writing in those apps with the new layout. You could try making two versions (one with the usual accent system and another with key bindings), but maybe that complicates things too much.
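As a rough illustration of adding up the accents, the sketch below counts diacritics in a text and estimates the share of keystrokes that go to the accent key under the sequential (accent first, then vowel) model. The function names and the keystroke model are assumptions for illustration, and the sample sentence is made up; Shift presses are ignored.

```python
import unicodedata
from collections import Counter

# Combining marks that Spanish uses, named for readability.
MARKS = {
    "\u0301": "acute",      # á é í ó ú
    "\u0308": "diaeresis",  # ü
    "\u0303": "tilde",      # ñ
}


def diacritic_counts(text):
    """Count combining marks after canonical decomposition (NFD)."""
    counts = Counter()
    for ch in unicodedata.normalize("NFD", text):
        if ch in MARKS:
            counts[MARKS[ch]] += 1
    return counts


def dead_key_share(text):
    """Fraction of keystrokes spent on the acute accent key.

    Assumes each accented vowel costs two keystrokes (accent key, then
    vowel) and every other character costs one.
    """
    acute = diacritic_counts(text)["acute"]
    keystrokes = len(unicodedata.normalize("NFC", text)) + acute
    return acute / keystrokes if keystrokes else 0.0


sample = "La pingüina comió rápido en la montaña."
print(diacritic_counts(sample))
print(round(dead_key_share(sample), 3))
```

Run over a real corpus, the same count would show how much weight the accent key deserves compared with ordinary letters.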
@NickG13
Author
NickG13 commented 19 days ago •
This is a ranking I found during my research (sorry, I can't recall where I found it right now):
E
A
O
S
R
N
I
D
L
C
T
U
M
P
B
.
,
G
V
Y
Q
H
F
Z
J
Ñ
X
K
W
So, yes, the comma and the full stop are used a lot
@NickG13
Author
NickG13 commented 19 days ago
I've been looking at http://www.viviancook.uk/Punctuation/PunctFigs.htm and want to give some extra info. For punctuation marks, we could in theory consider the following:
. full stop = Sentences and words tend to be longer in Spanish than in English, so full stops are not as frequent.
, comma = The same applies to the comma.
; semi-colon = Semi-colons are almost never used in Spanish nowadays, except maybe in lists.
: colon = Colons are used more than semi-colons, but also less than in English, IMHO.
! exclamation = Important! An exclamation mark always ends the sentence, but the sentence should always begin with "¡" to be grammatically correct (not everyone uses it, but it is important).
? question = The same applies to questions, which begin with "¿".
As for ! and ?, I think they have almost the same frequency as in English.
’ apostrophe / single quotation = Some neighbouring languages, like Catalan, use the apostrophe a lot, but Spanish doesn't.
“ double quotation = Same use as in English, maybe less.
- hyphen = We don't use it as much as in English.
@NickG13
Author
NickG13 commented 19 days ago
As for text length and word length, I don't have any study, but there are some reflections from experienced people (which match my own experience as a writer):
https://forum.wordreference.com/threads/word-length-in-english-vs-spanish.254504/
https://www.reddit.com/r/Spanish/comments/3hb8zx/does_spanish_language_on_average_have_longer/
Spanish seems to take up almost 25% more space
@Lobo-Feroz
Lobo-Feroz commented 19 days ago •
Hello there!
I'm another Spanish speaker interested in an optimised layout for Spanish. I've been researching a lot these last few years but didn't find any convincing alternative.
@NickG13 referred me to this thread. Engram was one of the algorithmically optimised layouts that I had investigated. I especially like that you kept no restrictions. None of that nonsense of keeping zxcv for Windows shortcuts, or not moving the punctuation around.
I also want to thank you for extending your project to help Spanish writers. There are really not many alternatives for a Spanish keyboard, only adaptations of old human-designed layouts like Dvorak. This might be a very helpful addition.
I will try to help where I can.
@Lobo-Feroz
Lobo-Feroz commented 19 days ago •
@NickG13 in regards to this:
! exclamation = Important! An exclamation mark always ends the sentence, but the sentence should always begin with "¡" to be grammatically correct (not everyone uses it, but it is important).
? question = The same applies to questions, which begin with "¿".
I think that the opening exclamation mark '¡' should be the shifted character of '!' (and likewise, '¿' should be the shifted character of '?').
Why? Because the major use of Shift is to type capital letters. Where do you do that most of the time? At the beginning of a sentence, just where you will use '¡' and '¿'.
There is already an example of this pairing on the Spanish keyboard: the <> key. However, it's the wrong way round: unshifted is < (opening) and shifted is > (closing). Yes, the Spanish QWERTY layout is weird.
@Lobo-Feroz
Lobo-Feroz commented 19 days ago •
Ok, my take on this one:
How does one deal with diacritical marks for the seven additional letters: Á, É, Í, Ñ, Ó, Ú and Ü
(https://www.sttmedia.com/characterfrequency-spanish)? Should these be accessed using key binding with some other key besides the Shift key, such as the Alt-Right key?
The name in Spanish for all of these diacritical marks is "tilde" (https://en.wikipedia.org/wiki/Tilde). Currently each of them is typed in a different way.
For the simple tildes ÁÉÍÓÚ, as @NickG13 said, it's the key '´' followed by the vowel key.
The two dots are called a "diaeresis" (https://en.wikipedia.org/wiki/Diaeresis_(diacritic)) and we have a key '¨' for that. Actually, it's the shifted character of the simple tilde key '´' (which, IMHO, is not that bad). You can add these two dots to any vowel (ÄËÏÖÜ äëïöü), but it's only correct on the u. The rest are not used in proper Spanish (but we can type them with our current keyboards).
The tilde on the Ñ is called a "virgulilla" (weirdest name) and it's considered a tilde as well. In qwerty-es we have a full key in the home row for the ñ, which I think is very inefficient, as ñ has a frequency of 0.17% according to one of @NickG13's links above. I think it could just be a modification of the n key: '´' + n = ñ
Now, I don't know what the best way to implement this tilde key would be. On qwerty-es it's sequential: press the tilde key, then press the vowel. An alternative could be a simultaneous press with the opposite hand, but AltGr is already used with the vowel "e": AltGr + e = € (just this one; 'aioun' have no AltGr uses). Also, having a single key to switch to a "tilde layer" would mean we'd need another method for the diaeresis '¨'.
And that's about all I have thought about tildes so far.
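To make the sequential option concrete, here is a small Python sketch of how a tilde key followed by a letter could be resolved into one character. The table includes the '´' + n = ñ idea proposed above, which is not how qwerty-es behaves today; the function name `compose` and the fallback for invalid combinations are assumptions for illustration.

```python
# Dead-key tables. The '´' + n -> ñ entry is the proposal from this comment,
# not the behaviour of the current qwerty-es layout.
DEAD_KEYS = {
    "´": {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú", "n": "ñ"},
    "¨": {"u": "ü"},
}


def compose(keys):
    """Turn a sequence of key presses into text, resolving dead keys.

    A dead key followed by a key it cannot modify emits both characters
    unchanged (one possible fallback; real keyboard drivers differ).
    """
    out = []
    pending = None  # dead key waiting for its base letter
    for key in keys:
        if pending is not None:
            table = DEAD_KEYS[pending]
            lower = key.lower()
            if lower in table:
                composed = table[lower]
                out.append(composed.upper() if key.isupper() else composed)
            else:
                out.append(pending + key)
            pending = None
        elif key in DEAD_KEYS:
            pending = key
        else:
            out.append(key)
    if pending is not None:
        out.append(pending)  # trailing dead key with nothing to modify
    return "".join(out)


print(compose("espa´nol"))   # uses the proposed ´ + n sequence
print(compose("ping¨uino"))
```

A simultaneous "tilde layer" would replace the `pending` state with a held modifier, which is why it would need a separate path for the diaeresis.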
@NickG13
Author
NickG13 commented 17 days ago •
Just to update you guys,
I've been doing some tests with Engram in the Keyboard Layout Analyzer (the updated version of Engram, not the one that is uploaded there), and Engram has a HUGE advantage over all the other popular layouts, even the Spanish Dvorak variant (Jorge Sanz) and the like.
The text used is mine, from three articles on https://nicolasgutierrez.com/ , which (as I checked afterwards) closely matches the Spanish letter frequencies described in the links we shared.
I'm excited about this being done and just want to give some extra info and support.
Here are some screenshots
image
image
image
image
Of course, these are just numbers from an automated website. I think it doesn't count accents (tildes) (´) or diaereses (¨), but Engram already looks like the BEST option available for Spanish typing. Maybe a little optimization will make it UNBEATABLE.
image
@NickG13
Author
NickG13 commented 17 days ago •
Just for laughs, I've been trying to follow your system, but it seems there's something I'm missing (or maybe the analysis system is not correct). I've found some layouts that feel more ergonomic, but they get a lot fewer points than your Engram layout. That makes sense if your layout is faster, but the program http://patorjk.com/keyboard-layout-analyzer/ doesn't seem to measure ergonomics. Here they are:
v1
image
heatmap
image
v2
image
heatmap
image
v3
image
heatmap
image
Engram v2
heatmap
image
image
@NickG13
Author
NickG13 commented 16 days ago
I just saw that I can share the URL.
I've tested a little more, so the layouts are a bit different in some cases.
http://patorjk.com/keyboard-layout-analyzer/#/load/kVFJVVLs
@NickG13
Author
NickG13 commented 16 days ago •
Comparing layouts a little more to get more ideas, I went back to the Alice in Wonderland heatmap for Engram
image
This reminded me of your first criterion, which arranges the most common letters in a ^ shape, not the / shape I was using.
image
Then I rearranged a few letters and got this heatmap with Spanish text:
image
I also took your other ergonomic criteria into account. The most difficult were:
7. Avoid stretching shorter fingers up and longer fingers down.
8. Avoid using the same finger.
I also took the most common bigrams into account, trying to respect points 7 and 8 and to make the movements roll inwards rather than outwards.
DE 2088635
ES 1882962
EN 1850294
EL 1638128
LA 1464005
OS 1460757
ON 1307790
AS 1268034
ER 1239281
RA 1197125
AD 1167809
I really like this layout, let's see what you think
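A small Python sketch of how points 7 and 8 could be checked against the bigram counts listed above. The rearranged layout only exists in the screenshots, so standard QWERTY touch-typing finger assignments are used here purely as a stand-in; the category names and function names are assumptions for illustration.

```python
# Standard touch-typing finger assignment for QWERTY letters (stand-in layout).
# Fingers are numbered from pinky (0) to index (3) on each hand.
FINGERS = {}
for hand, columns in (("L", ["qaz", "wsx", "edc", "rfvtgb"]),
                      ("R", ["p", "ol", "ik", "yhnujm"])):
    for finger, keys in enumerate(columns):
        for key in keys:
            FINGERS[key] = (hand, finger)

# Counts copied from the bigram list above.
BIGRAMS = {
    "DE": 2088635, "ES": 1882962, "EN": 1850294, "EL": 1638128,
    "LA": 1464005, "OS": 1460757, "ON": 1307790, "AS": 1268034,
    "ER": 1239281, "RA": 1197125, "AD": 1167809,
}


def classify(bigram):
    """Label a bigram as alternate, same_finger, inward or outward."""
    (hand1, f1), (hand2, f2) = (FINGERS[c] for c in bigram.lower())
    if hand1 != hand2:
        return "alternate"
    if f1 == f2:
        return "same_finger"
    # Moving from pinky towards index counts as an inward roll.
    return "inward" if f2 > f1 else "outward"


def weighted_share(bigrams):
    """Share of total bigram count that falls in each category."""
    totals = {}
    for bigram, count in bigrams.items():
        kind = classify(bigram)
        totals[kind] = totals.get(kind, 0) + count
    grand_total = sum(totals.values())
    return {kind: count / grand_total for kind, count in totals.items()}


for kind, share in sorted(weighted_share(BIGRAMS).items()):
    print(f"{kind}: {share:.1%}")
```

Swapping `FINGERS` for the finger map of a candidate layout would give a quick comparison of same-finger and outward-roll shares between layouts. It says nothing about row stretches, which point 7 also covers.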
@NickG13
Author
NickG13 commented 13 days ago •
@binarybottle Spanish corpus found: https://wortschatz.uni-leipzig.de/en/download/Spanish (huge database)
Do you have any software that computes bigram and single-letter frequencies?
We want to help you progress faster; if that's possible, count on us
@binarybottle
Owner
binarybottle commented 8 days ago
@NickG13 -- Thank you very much for sharing your experiments, the keyboard layout analyzer results, and the link to the Spanish corpus! I have downloaded all of the largest news corpora from each year in the top section, and the latest Wikipedia corpus.
@Lobo-Feroz -- Thank you for following up on the diacritical marks! I think I have an idea for how to accommodate these in an intuitive manner if I use an extra key like the Shift key.
I really appreciate your enthusiastic support. I am currently preoccupied with an impending deadline for a conference, but hope to begin tackling this next weekend…
@NickG13
Author
NickG13 commented 8 days ago
Sure, deadlines come first! Thank you for the update!
@Lobo-Feroz
Lobo-Feroz commented 8 days ago
@binarybottle - thanks for your answer!
Once you begin working on this layout (no hurry here), please don't hesitate to send us some grunt work. I'll be happy to help where I can.
@binarybottle
Owner
binarybottle commented 21 hours ago
@NickG13, @Lobo-Feroz, @sunaku -- I have created a new repo, setting up a Spanish version of the Engram layout, and am now crunching initial layouts:
https://github.com/binarybottle/engram-es
Let's continue our discussion there!
@binarybottle binarybottle closed this 21 hours ago