Statistically Tailored Keyboard Layout
Nothing about keyboards has appeared on this blog, because while I enjoy using nice keyboards (and building them!), I don’t have anything to say about them that’s worth reading. Besides, there’s plenty already written elsewhere.
Nothing about AI has been written here, because in the years since GitHub Copilot launched in 2021 and ChatGPT in 2022, the shock of these technologies has worn off, but my assessment of them has been in tumult. I wish I had a settled opinion, but lacking that, this post describes one small area where AI codegen served me well with no caveat.
Everything below is about both of those things! There’s keyboards! There’s robots! There’s robots showing me how to use keyboards!
This post is about AI, but was written entirely by a GOFH, on the keyboard in question. I do use LLMs for grammar & spell checking and finding other errors.
20 years ago, I learned about mechanical keyboards. Years passed, I bought several, started building custom keyboards, and eventually built a Corne.
It’s tiny and cut in half.
Meet mine:

With nearly all custom keyboards, the meaning of every key is fully customizable. The keycap labels you see are meaningless; what really matters is the USB HID key code the keyboard emits for each key, and keyboard firmwares like QMK, ZMK, and RMK let you assign those however you like.
Keyboards with non-standard layouts have a famously hard adjustment period. With no tradition or muscle memory, deciding which key should do which action is a bigger project than soldering the keyboard itself.
Think of it like this.

A blank canvas. Where would you put a? Where would you put Shift? Where would you put #?
For me, letters were laid out in traditional QWERTY, although with staggered columns instead of staggered rows. Keys like Enter, Escape, and so on were in more-or-less their standard locations. Since I knew the keyboard would be a big adjustment, I wanted to lean on muscle memory to the extent possible.
The alphabetic part of the keyboard was actually pretty easy to adjust to. Symbols were the hard part. I’m a programmer, and programming is a profession rich in symbols, each of which needs a placement, but every key is already taken up by letters and Space/Shift/etc. Numbers, too, need a home, as this keyboard has no room for a dedicated number row. Keyboard firmwares have a solution to this problem: layers. Similar to how one key can type a or A with the help of Shift, with layers any key can represent a different character with the help of a “layer key”.
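As a rough illustration of the layer idea, here is a minimal Python sketch. It is not how QMK, ZMK, or RMK implement layers; the key positions, the layer names, and the `resolve` helper are all made up for the example.

```python
# Hypothetical two-layer keymap. Positions are (row, column) and purely illustrative.
BASE, SYMBOLS = 0, 1

KEYMAP = {
    BASE:    {(1, 1): "a", (1, 2): "s"},
    SYMBOLS: {(1, 1): "(", (1, 2): ")"},
}

def resolve(position, layer_key_held):
    """Return what a physical key emits, depending on whether the layer key is held."""
    layer = SYMBOLS if layer_key_held else BASE
    return KEYMAP[layer].get(position)

print(resolve((1, 1), layer_key_held=False))  # a
print(resolve((1, 1), layer_key_held=True))   # (
```

The same physical key yields a letter or a symbol depending on the held layer key, much like Shift selects between a and A.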
I did my best to position numbers and symbols across two layers, guessing at placements by intuition, but the layout just didn’t click. It slowed me down while coding so much that I had difficulty maintaining focus and train of thought while fumbling over symbols. The once dreamlike fluidity of vim motions was reduced to a faltering mess. My pride was humbled, the context switching was too much for me, and the keyboard went back on the shelf.
A year later… I had an idea.
Guesswork had failed. So why not base the layout on something real, something measurable? Why not base it on my own code?
As the idea started to take shape, a few principles emerged:
- Symbol sequences should be based on real code that I’ve written, in the programming languages I tend to use.
- Symbol layout should maximize the likelihood of roll combos: sequences of symbols that can be pressed on the same hand in one comfortable movement.
Implementation would look like this:
- Read every Rust file in a given directory.
- Run a lexer on the code and extract symbols.
- Create a Markov chain of symbols appearing adjacently.
- Produce a suggested layout by placing the most-used symbols in the most comfortable locations.
- Symbols that commonly appear adjacently could be placed adjacently to enable roll combos.
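The counting step in the middle of that list can be sketched with the standard library alone. This is a simplified stand-in, not the script that was eventually generated: instead of a real lexer it treats every ASCII punctuation character as a symbol (so it would also count punctuation inside strings and comments), and the helper names `symbol_stream` and `follower_counts` are invented for the example.

```python
import string
from collections import Counter, defaultdict

# Assumption: any ASCII punctuation character counts as a "symbol".
# A real lexer would skip punctuation inside strings and comments.
SYMBOLS = set(string.punctuation)

def symbol_stream(code):
    """Reduce source text to its sequence of symbol characters."""
    return [ch for ch in code if ch in SYMBOLS]

def follower_counts(symbols):
    """First-order Markov counts: how often each symbol is followed by each other symbol."""
    chain = defaultdict(Counter)
    for current, following in zip(symbols, symbols[1:]):
        chain[current][following] += 1
    return chain

sample = "fn main() -> Vec<u8> { vec![] }"
chain = follower_counts(symbol_stream(sample))
print(chain["("].most_common(1))  # [(')', 1)]
```

Pairs with high counts in `chain` are the candidates for adjacent placement.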
Now, while I liked this idea, I have a job and a family and there’s little time left to dabble with side-side-side projects like this. I’m not sure how long it would take me to write a working version of what I had in mind, but I knew I didn’t have the time, so I handed the idea to Gemini to see what it would spit out.
Here’s the full Gemini session.
I explained the idea, then ran each iteration against a repository containing about 20,000 lines of Rust, much of it written by me. Gemini chose Python and Pygments, and wrote a script to do what I asked. I was impressed that the result was at all usable.
I’ll go ahead and disclose that I did not closely review the code, other than a quick scan to verify it wouldn’t make any network requests. This project was a sweet spot for AI codegen: disposable code. Once I got the layout I wanted, I’d never need this program again. That freed me from rigorously reviewing the code for maintainability, comprehensibility, and other hallmarks of programming rigor. After reviewing for security concerns, a high slop factor was not a deal-breaker for one-shot code like this.
If you can stomach reading AI-generated code, here it is:
corne-suggest.py
#!/usr/bin/env python3
import argparse
from pathlib import Path
from collections import defaultdict, Counter
import sys

try:
    from pygments import lex
    from pygments.lexers import RustLexer
    from pygments.token import Punctuation, Operator
except ImportError:
    print("Error: The 'Pygments' library is required. Please install it with 'pip install Pygments'", file=sys.stderr)
    sys.exit(1)

BLACKLIST = {';', "'", '"', ',', '.', '/'}

def extract_symbols(directory: Path) -> list[str]:
    all_symbols = []
    lexer = RustLexer()
    print(f"[*] Scanning for .rs files in '{directory}'...")
    files_found = list(directory.rglob('*.rs'))
    if not files_found:
        print(f"[!] No .rs files found in {directory}", file=sys.stderr)
        sys.exit(1)
    for i, filepath in enumerate(files_found):
        print(f"\r[*] Processing file {i+1}/{len(files_found)}: {filepath.name}", end="")
        try:
            code = filepath.read_text(encoding='utf-8')
            tokens = lex(code, lexer)
            for ttype, value in tokens:
                if ttype in Punctuation or ttype in Operator:
                    all_symbols.extend(list(value))
        except Exception as e:
            print(f"\n[!] Error processing {filepath}: {e}", file=sys.stderr)
    print("\n[*] Finished processing files.")
    return all_symbols

def build_models(symbols: list[str]):
    """Builds both a directed markov chain and an undirected co-occurrence graph."""
    chain = defaultdict(Counter)
    for s1, s2 in zip(symbols, symbols[1:]):
        chain[s1][s2] += 1
    graph = defaultdict(int)
    for s1, followers in chain.items():
        for s2, count in followers.items():
            edge = tuple(sorted((s1, s2)))
            graph[edge] += count
    return chain, graph

def print_frequency_analysis(chain, symbol_freq):
    """Prints the top 5 followers for the most common symbols."""
    print("\n## Symbol Frequency Analysis (Top Followers)")
    print("-" * 50)
    # Sort symbols by overall frequency to show the most relevant ones first
    sorted_symbols = [s for s, _ in symbol_freq.most_common() if s in chain]
    for symbol in sorted_symbols:
        followers = chain[symbol]
        top_followers = followers.most_common(5)
        followers_str = ", ".join([f"'{f}' ({c}x)" for f, c in top_followers])
        print(f"'{symbol}' is most often followed by: {followers_str}")
    print("-" * 50)

CORNE_KEYS = [(h, r, c) for h in ['L', 'R'] for r in range(3) for c in range(5)]
NUM_KEYS_PER_LAYER = len(CORNE_KEYS)
KEY_SCORES = {
    ('L', 0, 0): 1, ('L', 0, 1): 4, ('L', 0, 2): 8, ('L', 0, 3): 9, ('L', 0, 4): 7,
    ('L', 1, 0): 2, ('L', 1, 1): 6, ('L', 1, 2): 10, ('L', 1, 3): 10, ('L', 1, 4): 8,
    ('L', 2, 0): 1, ('L', 2, 1): 3, ('L', 2, 2): 5, ('L', 2, 3): 5, ('L', 2, 4): 4,
    ('R', 0, 0): 7, ('R', 0, 1): 9, ('R', 0, 2): 8, ('R', 0, 3): 4, ('R', 0, 4): 1,
    ('R', 1, 0): 8, ('R', 1, 1): 10, ('R', 1, 2): 10, ('R', 1, 3): 6, ('R', 1, 4): 2,
    ('R', 2, 0): 4, ('R', 2, 1): 5, ('R', 2, 2): 5, ('R', 2, 3): 3, ('R', 2, 4): 1,
}
IMPORTANT_PAIRS = [('(', ')'), ('[', ']'), ('{', '}'), ('<', '>')]

def partition_into_layers(graph, placeable_symbols):
    forced_layer1_symbols = {'<', '^', '>', '(', ')'}
    layer1 = {s for s in forced_layer1_symbols if s in placeable_symbols}
    layer2 = set()
    unassigned = [s for s in placeable_symbols if s not in layer1]
    while unassigned:
        best_attraction = -1
        best_symbol, best_layer = None, None
        for symbol in unassigned:
            attr1 = sum(graph.get(tuple(sorted((symbol, s))), 0) for s in layer1)
            attr2 = sum(graph.get(tuple(sorted((symbol, s))), 0) for s in layer2)
            if len(layer1) < NUM_KEYS_PER_LAYER and attr1 >= best_attraction:
                best_attraction, best_symbol, best_layer = attr1, symbol, layer1
            if len(layer2) < NUM_KEYS_PER_LAYER and attr2 > best_attraction:
                best_attraction, best_symbol, best_layer = attr2, symbol, layer2
        if best_symbol:
            best_layer.add(best_symbol)
            unassigned.remove(best_symbol)
        else: break
    for symbol in unassigned:
        if len(layer1) < NUM_KEYS_PER_LAYER: layer1.add(symbol)
        elif len(layer2) < NUM_KEYS_PER_LAYER: layer2.add(symbol)
    return list(layer1), list(layer2)

def get_roll_neighbors(key, layout):
    (hand, r, c) = key; neighbors = []
    if c > 0 and layout.get((hand, r, c - 1)) is not None: neighbors.append((hand, r, c - 1))
    if c < 4 and layout.get((hand, r, c + 1)) is not None: neighbors.append((hand, r, c + 1))
    return neighbors

def generate_corne_layout(graph, symbols_to_place, symbol_freq, apply_hardcoded_rules=False):
    layout = {}; unplaced = symbols_to_place[:]; empty_keys = CORNE_KEYS[:]
    if apply_hardcoded_rules:
        vim_arrow_placements = {('R', 1, 0): '<', ('R', 1, 2): '^', ('R', 1, 3): '>'}
        for key, symbol in vim_arrow_placements.items():
            if symbol in unplaced and key in empty_keys:
                layout[key] = symbol; unplaced.remove(symbol); empty_keys.remove(key)
    key_pair_locations = []
    for h in ['L', 'R']:
        for r in range(3):
            for c in range(4):
                k1, k2 = (h, r, c), (h, r, c + 1)
                score = KEY_SCORES[k1] + KEY_SCORES[k2]
                key_pair_locations.append(((k1, k2), score))
    key_pair_locations.sort(key=lambda x: x[1], reverse=True)
    scored_symbol_pairs = []
    for s1, s2 in IMPORTANT_PAIRS:
        if s1 in unplaced and s2 in unplaced:
            score = graph.get(tuple(sorted((s1, s2))), 0)
            scored_symbol_pairs.append(((s1, s2), score))
    scored_symbol_pairs.sort(key=lambda x: x[1], reverse=True)
    for (s1, s2), _ in scored_symbol_pairs:
        is_paren_pair = apply_hardcoded_rules and (s1, s2) == ('(', ')')
        for (k1, k2), _ in key_pair_locations:
            if is_paren_pair and k1[0] == 'L': continue
            if k1 in empty_keys and k2 in empty_keys:
                layout[k1], layout[k2] = s1, s2
                unplaced.remove(s1); unplaced.remove(s2)
                empty_keys.remove(k1); empty_keys.remove(k2)
                break
    while unplaced and empty_keys:
        best_score = -1
        best_symbol, best_key = None, None
        for symbol in unplaced:
            for key in empty_keys:
                roll_score = sum(graph.get(tuple(sorted((symbol, layout[nk]))), 0) for nk in get_roll_neighbors(key, layout))
                key_score = KEY_SCORES[key] * symbol_freq.get(symbol, 0) * 0.1
                total_score = roll_score + key_score
                if total_score > best_score:
                    best_score, best_symbol, best_key = total_score, symbol, key
        if best_key:
            layout[best_key] = best_symbol
            unplaced.remove(best_symbol)
            empty_keys.remove(best_key)
        else: break
    return layout

def print_corne_layout(layout, title):
    print(f"\n## {title}"); print("-" * 50)
    print(" Left Hand Right Hand")
    print("+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+")
    for r in range(3):
        row_str_l, row_str_r = "| ", "| "
        for c in range(5):
            sym_l = layout.get(('L', r, c), ' '); sym_r = layout.get(('R', r, c), ' ')
            row_str_l += f" {sym_l:^2} | "; row_str_r += f" {sym_r:^2} | "
        print(row_str_l + " " + row_str_r)
        print("+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+")

def main():
    parser = argparse.ArgumentParser(description="Analyze Rust source code to recommend a two-layer Corne keyboard symbol layout.")
    parser.add_argument("directory", type=str, help="The directory containing Rust source files to analyze.")
    args = parser.parse_args()
    src_path = Path(args.directory)
    if not src_path.is_dir(): print(f"Error: Directory not found at '{src_path}'", file=sys.stderr); sys.exit(1)
    all_symbols = extract_symbols(src_path)
    if not all_symbols: print("[!] No symbols were extracted. Exiting.", file=sys.stderr); sys.exit(1)
    chain, graph = build_models(all_symbols)
    placeable_symbols = [s for s in set(all_symbols) if s not in BLACKLIST]
    symbol_freq = Counter()
    for (s1, s2), count in graph.items():
        symbol_freq[s1] += count; symbol_freq[s2] += count
    print_frequency_analysis(chain, symbol_freq)
    print("\n[*] Partitioning into two layers...")
    layer1_symbols, layer2_symbols = partition_into_layers(graph, placeable_symbols)
    print("\n[*] Generating layouts based on symbol communities...")
    layout1 = generate_corne_layout(graph, layer1_symbols, symbol_freq, apply_hardcoded_rules=True)
    layout2 = generate_corne_layout(graph, layer2_symbols, symbol_freq, apply_hardcoded_rules=False)
    print_corne_layout(layout1, "Layer 1: Primary Symbols (with HJKL arrows)")
    print_corne_layout(layout2, "Layer 2: Secondary Symbols")

if __name__ == "__main__":
    main()
Here’s the symbol frequency it found in my codebase.
## Symbol Frequency Analysis (Top Followers)
--------------------------------------------------
'>' is most often followed by: ':' (53780x), '>' (51523x), ',' (44392x), ';' (34996x), '{' (3618x)
'<' is most often followed by: '<' (95134x), '>' (53829x), ',' (35077x), '(' (312x), ''' (257x)
',' is most often followed by: '>' (78857x), '(' (45980x), ')' (27871x), '<' (12738x), ':' (9849x)
':' is most often followed by: ':' (86013x), '(' (32741x), ';' (14598x), '<' (13736x), ',' (7103x)
')' is most often followed by: ',' (42766x), ')' (35555x), ';' (19748x), '{' (14900x), '=' (3195x)
'(' is most often followed by: ')' (54563x), ',' (47161x), '<' (13975x), ':' (2680x), '&' (1299x)
';' is most often followed by: '=' (41617x), '(' (17385x), '}' (15370x), ':' (3757x), '.' (685x)
'=' is most often followed by: '<' (48891x), ';' (6373x), '>' (3402x), ':' (3044x), '.' (1031x)
'{' is most often followed by: '=' (14713x), ':' (3920x), '(' (1368x), '.' (836x), ',' (326x)
'}' is most often followed by: '(' (14890x), ';' (2901x), '}' (1170x), '{' (896x), ')' (745x)
'.' is most often followed by: '(' (7068x), '.' (1571x), '}' (562x), ':' (316x), ')' (243x)
'&' is most often followed by: ')' (763x), ',' (350x), '.' (323x), ':' (131x), '&' (80x)
'|' is most often followed by: '|' (519x), ':' (366x), '.' (187x), '{' (178x), '(' (96x)
'-' is most often followed by: '>' (1291x), '.' (9x), '/' (8x), ')' (6x), ';' (6x)
'*' is most often followed by: ':' (499x), ',' (346x), ')' (148x), ';' (97x), '*' (87x)
'!' is most often followed by: '(' (736x), '.' (110x), '=' (34x), '[' (29x), '{' (23x)
'[' is most often followed by: ']' (138x), '=' (104x), '(' (72x), ';' (54x), '&' (50x)
']' is most often followed by: ';' (97x), ',' (89x), ')' (74x), '#' (57x), '.' (50x)
''' is most often followed by: '>' (272x), '{' (62x), ',' (52x), ')' (16x), ';' (9x)
'?' is most often followed by: ';' (242x), '.' (33x), ')' (13x), '}' (5x), ',' (2x)
'@' is most often followed by: ':' (263x)
'#' is most often followed by: '[' (153x), '.' (15x), '#' (12x), ':' (5x), '/' (3x)
'+' is most often followed by: '=' (71x), ')' (12x), '.' (11x), ';' (8x), '<' (6x)
'/' is most often followed by: '/' (16x), '.' (12x), '-' (8x), ';' (6x), ')' (3x)
'%' is most often followed by: ';' (2x), '=' (2x), '*' (1x)
'^' is most often followed by: '=' (2x)
--------------------------------------------------
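One way to read a row of that table is as a share rather than a raw count. The sketch below takes the counts from the '-' row above and computes how much of the listed follower traffic goes to '>'. Only the top five followers are printed, so this is the share among listed followers, a slight overestimate of the true transition probability. The `share` helper is invented for the example.

```python
from collections import Counter

# Counts copied from the "'-' is most often followed by" row of the output above.
followers_of_minus = Counter({">": 1291, ".": 9, "/": 8, ")": 6, ";": 6})

def share(followers, symbol):
    """Fraction of the listed follower counts that belong to `symbol`."""
    total = sum(followers.values())
    return followers[symbol] / total if total else 0.0

print(f"{share(followers_of_minus, '>'):.1%}")  # 97.8%
```

A share that lopsided suggests '-' and '>' belong next to each other, so that -> becomes a roll.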
Gemini’s script was completely helpless at producing a perfect ASCII art layout diagram, despite repeated requests to fix the alignment.
Left Hand Right Hand
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+
| | # | [ | ] | | | | & | { | } | - | |
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+
| | * | ( | ) | = | | < | : | ^ | > | |
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+
| | | ! | ? | % | | | @ | + | | |
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+
Corrected by me, a GOFH.
                Left Hand                                  Right Hand
+-------+-------+-------+-------+-------+   +-------+-------+-------+-------+-------+
|       |   #   |   [   |   ]   |   |   |   |   &   |   {   |   }   |   -   |       |
+-------+-------+-------+-------+-------+   +-------+-------+-------+-------+-------+
|       |   *   |   (   |   )   |   =   |   |   <   |   :   |   ^   |   >   |       |
+-------+-------+-------+-------+-------+   +-------+-------+-------+-------+-------+
|       |       |   !   |   ?   |   %   |   |       |   @   |   +   |       |       |
+-------+-------+-------+-------+-------+   +-------+-------+-------+-------+-------+
Each suggested layout provoked a nitpick, so I began feeding Gemini additional constraints to refine the layout.
- Arrow keys must be on hjkl (the vim motion keys) on layer 2.
- Paired symbols ((), [], {}, <>) must be adjacent.
- The most common symbols are placed on the stronger fingers on any row; least common symbols are given to the pinky on the home-row.
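The last constraint on its own amounts to a greedy pairing of the most frequent symbols with the highest-scored keys. The sketch below shows just that piece. The comfort scores are the left home-row values from `KEY_SCORES` in the script above, the symbol counts are invented for illustration, and the `assign_by_comfort` helper is hypothetical. Unlike Gemini's script, it ignores roll adjacency entirely.

```python
def assign_by_comfort(symbol_counts, key_scores):
    """Pair the most frequent symbols with the highest-scored keys, best first."""
    symbols = sorted(symbol_counts, key=symbol_counts.get, reverse=True)
    keys = sorted(key_scores, key=key_scores.get, reverse=True)
    return dict(zip(keys, symbols))

# Left home row scores, copied from KEY_SCORES: (hand, row, column) -> comfort.
home_row = {
    ("L", 1, 0): 2, ("L", 1, 1): 6, ("L", 1, 2): 10, ("L", 1, 3): 10, ("L", 1, 4): 8,
}
# Invented counts, for illustration only.
counts = {"(": 50, ")": 48, "=": 30, "#": 5, "%": 1}

layout = assign_by_comfort(counts, home_row)
print(layout[("L", 1, 2)], layout[("L", 1, 0)])  # ( %
```

The two strongest columns receive the parentheses and the pinky column receives the rarest symbol.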
If you look too closely, you’ll notice several symbols are missing since they don’t actually appear in my code, like \. Numbers, too, were not considered, since I decided to put them on the top row in layer 2.
But still, this suggested layout was far better than what I’d come up with myself. Rust programmers will recognize the roll possibilities.
- #[] for attributes like #[derive()]
- ::<> for turbo-fish
The ! placement is tantalizingly close to (). I swapped it with * to enable the !() roll, very useful for invoking Rust macros, such as panic!().
In the end, I made a flurry of manual changes to the layout, but kept the best bits like the rolls. I’m still faster on a standard keyboard, but I’m able to drive the Corne through a full workday with a minimum of symbol fumbling.
Left Hand Right Hand
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+-------+
| | # | [ | ] | % | | ^ | { | } | * | \ | |
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+-------+
| | ! | ( | ) | = | | - | < | > | | | | ` | <-- mine is a 6-column Corne
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+-------+
| | | & | : | + | | _ | $ | | | | |
+-------+-------+-------+-------+-------+ +-------+-------+-------+-------+-------+-------+
I’m happy with the outcome and won’t be working on this program any further (not that I worked on it in the first place). Still, I can’t help but imagine what this tool could become. Here are my ideas for future work, which I will not be doing.
- Turn this into a generic tool that could be used by anyone.
- Scan any number of directories, not just one.
- Support any programming language, or at least those supported by pygments.
- Generate suggested layouts for any non-standard keyboard, using whatever physical layout definition is used by VIA or VIAL.
- Add a way to declare personal constraints, like wanting arrow keys on hjkl or () on the right hand.
- Add a way to consider non-code symbol use, like vim keymaps, perhaps by running a keylogger during a vim session and feeding that in.
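For the personal-constraints idea, one possible shape is a small declaration plus a checker. Nothing like this exists in the script: the `Constraints` class, its fields, and the `violations` function are hypothetical, and the pinned positions simply reuse the arrow placements hard-coded in Gemini's version.

```python
from dataclasses import dataclass, field

@dataclass
class Constraints:
    # symbol -> exact key it must occupy, as (hand, row, column)
    pinned: dict = field(default_factory=dict)
    # symbol -> hand it must land on, "L" or "R"
    hand: dict = field(default_factory=dict)

def violations(layout, constraints):
    """List every constraint that a layout (key -> symbol) fails to satisfy."""
    where = {symbol: key for key, symbol in layout.items()}
    problems = []
    for symbol, key in constraints.pinned.items():
        if where.get(symbol) != key:
            problems.append(f"{symbol!r} should be on {key}")
    for symbol, hand in constraints.hand.items():
        if symbol in where and where[symbol][0] != hand:
            problems.append(f"{symbol!r} should be on the {hand} hand")
    return problems

rules = Constraints(
    pinned={"<": ("R", 1, 0), "^": ("R", 1, 2), ">": ("R", 1, 3)},
    hand={"(": "R", ")": "R"},
)
layout = {
    ("R", 1, 0): "<", ("R", 1, 2): "^", ("R", 1, 3): ">",
    ("L", 1, 2): "(", ("L", 1, 3): ")",
}
print(violations(layout, rules))
```

Here the arrows satisfy their pins, while both parentheses are reported because they sit on the left hand.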
As cool as all that would be, I don’t have the time even with robot help, but perhaps someone out there will read this and take up the idea.
P.S. It’s no surprise that after inviting AI into my keyboard, it seems to have become haunted. I captured the apparition in this photograph.
