What you type is what you get? Not with modern touch keyboards. This article visually explains four features at the heart of your smartphone’s keyboard – including personalisation, auto-correction, and word prediction. Based on material I created for my "Intelligent User Interfaces" lecture, we examine the inner workings of these features in our daily typing, and conclude with takeaways for inspiring, evaluating, and critically reflecting on data-driven and "intelligent" user interfaces.
Feature 1: Adapting key areas
Modern smartphone keyboards personalise the screen areas assigned to each key. This is usually not shown in the graphical user interface to avoid confusion and undesirable co-adaptation – but we can reveal it here.
Touch data
The keyboard collects touch locations (2D points: x,y) in the background while the user is typing. The figure shows such touch data for two people, collected over several days.

Touch keyboard model
Using these touches we can create personalised keys for each user: These key models capture each user’s finger placement behaviour on the keyboard. For example, the best location for the letter "x" might differ slightly between Anna and Bob.

As shown, we model a personalised keyboard with a normal distribution p(t|k) per key k, fitted to the touch locations observed for that key. Formally:

p(t|k) = N(t; μ_k, Σ_k)
For each key k, the model thus stores the mean location (x, y) of the touches observed for that key, and the covariance matrix (which can be intuitively thought of as describing the key’s size and shape). Visually, the circles show two and three standard deviations of these distributions.
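To make this concrete, here is a minimal Python sketch of fitting these per-key models. It assumes a hypothetical `touches_by_key` dictionary mapping each key to its logged touch locations; this is an illustration, not the code of any particular keyboard:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_key_models(touches_by_key):
    """Fit one 2D normal distribution p(t|k) per key k.

    Assumes `touches_by_key` maps a key (e.g. "c") to an (n, 2) array
    of logged touch locations (x, y) for that key.
    """
    models = {}
    for key, points in touches_by_key.items():
        mu = points.mean(axis=0)              # mean touch location (x, y)
        sigma = np.cov(points, rowvar=False)  # covariance: the key's "size" and "shape"
        models[key] = multivariate_normal(mean=mu, cov=sigma)
    return models
```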
We can also look at this in 3D: Each key is a "hill", and height corresponds to the touch likelihood p(t|k). The figure shows an example for the "c" key.

Which key did the user press? Decoding a touch
This keyboard model can be used to decode touches. "Decoding" simply means working out which key the user intended to press.
The result is shown in these plots, where colour indicates the most likely key for each touch location (i.e. pixel). We have arrived at a personalised keyboard, with pixel-to-key assignments based on the individual touch behaviour of each user.

Formally, to get to these plots, we evaluate p(k|t), that is, the probability of key k given the touch location t. The keyboard model yields p(t|k), which we flip to p(k|t) with Bayes’ rule. Since the denominator p(t) is the same for all keys, we can ignore it when maximising. This yields for the most likely key k’:

k’ = argmax_k p(k|t) = argmax_k p(t|k) p(k)
Notes:
p(k) is the prior distribution over keys k and describes the likelihood of a key in general, without considering the touch location. A simple prior is uniform (i.e. all keys equally likely). A better prior uses language context, which we examine later.
p(t|k) is the likelihood of the touch t assuming the key was k. This is the Gaussian key model described above.
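As a minimal sketch, single-touch decoding then reduces to a few lines, reusing the `models` fitted above (the commented example touch location is made up):

```python
def decode_touch(t, models, prior=None):
    """Most likely key: k' = argmax_k p(t|k) * p(k)  (Bayes' rule, unnormalised)."""
    keys = list(models)
    if prior is None:
        prior = {k: 1.0 / len(keys) for k in keys}   # uniform prior over keys
    scores = {k: models[k].pdf(t) * prior[k] for k in keys}
    return max(scores, key=scores.get)

# Example with a hypothetical touch location in pixels:
# decode_touch((142.0, 310.5), models)
```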
Uncertainty
Since this is a probabilistic model, we can look at the keyboard’s uncertainty. The entropy of the posterior p(k|t) is one such measure of uncertainty. Intuitively, it is high if many keys k are equally likely for a touch location t. Conversely, it is low if one specific key is considered the only likely one.

The plots above show this entropy: Uncertainty is highest at the "borders" between different pixel-to-key assignments. This matches our intuition that a touch near the edge of a key is "sloppy", or in other words, less clear to interpret than a touch that hits a key dead centre.
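Computing this entropy is straightforward; the sketch below normalises the unnormalised Bayes scores from `decode_touch` above and returns the entropy in bits:

```python
import numpy as np

def posterior_entropy(t, models, prior=None):
    """Entropy (in bits) of the posterior p(k|t) for a touch location t."""
    keys = list(models)
    if prior is None:
        prior = {k: 1.0 / len(keys) for k in keys}
    scores = np.array([models[k].pdf(t) * prior[k] for k in keys])
    p = scores / scores.sum()       # normalise to obtain the posterior p(k|t)
    p = p[p > 0]                    # guard against log(0)
    return float(-(p * np.log2(p)).sum())
```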
Feature 2: Integrating language context
This personalised keyboard can be further improved by integrating language context. For illustration, we use a simple bigram language model here. A bigram is a pair of letters (e.g. "th"). It is easy to build: Simply count letter pairs in a large amount of text and compute their relative frequencies.
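For illustration, counting and normalising letter pairs might look like this (the training text is whatever corpus you choose):

```python
from collections import Counter

def fit_bigram_model(text):
    """Relative frequencies p(next letter | previous letter) from a text corpus."""
    pair_counts = Counter(zip(text, text[1:]))   # counts of letter pairs, e.g. ("t", "h")
    prev_counts = Counter(text[:-1])             # counts of each pair's first letter
    return {(prev, nxt): c / prev_counts[prev]
            for (prev, nxt), c in pair_counts.items()}
```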
Formally, this bigram language model is the likelihood of the next key, given the previous one. Using it as the prior in the decoding equation, we have:

k’ = argmax_k p(t|k) p(k|k_prev)

where k_prev is the previously entered key.
We can examine the influence of the language model by comparing the example case where the previous letter was "q" to the case where it was "t". Considering common pairings for these letters in English, we expect "u" to gain screen space after "q", and "h" to gain screen space after "t". Indeed, the figure reveals this change.
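In code, this amounts to swapping the uniform prior in `decode_touch` for a context-dependent one built from the `bigram` model above (the touch location below is made up for illustration; the prior need not be normalised for the argmax):

```python
def contextual_prior(prev_letter, bigram, keys, floor=1e-9):
    """Language-context prior p(k | previous letter) from the bigram model."""
    return {k: bigram.get((prev_letter, k), floor) for k in keys}

t = (57.0, 128.0)                                # hypothetical touch location in pixels
keys = list(models)
best_after_q = decode_touch(t, models, contextual_prior("q", bigram, keys))
best_after_t = decode_touch(t, models, contextual_prior("t", bigram, keys))
```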

To further examine this result we next compare the difference in the model’s uncertainty between two language contexts. This shows the gain (or loss) in certainty at each pixel due to the changing context.
For our example of switching from "q" to "t" as the previous letter, we expect the pixels around "u" to lose certainty and the pixels around "h" to gain certainty. This is because:
- Many letters are likely to follow "t", in contrast to "q", which is rarely followed by anything other than "u".
- Conversely, "th" is very common, increasing the model’s certainty of "h" after "t".
Indeed, the figure reveals this effect.
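One way to reproduce such a certainty-gain map, assuming a hypothetical keyboard area in pixels, is to evaluate the entropy under both contexts at each pixel and subtract (slow, but illustrative):

```python
import numpy as np

width, height = 1080, 400                        # hypothetical keyboard area in pixels
keys = list(models)
gain = np.zeros((height, width))
for y in range(height):
    for x in range(width):
        h_q = posterior_entropy((x, y), models, contextual_prior("q", bigram, keys))
        h_t = posterior_entropy((x, y), models, contextual_prior("t", bigram, keys))
        gain[y, x] = h_q - h_t                   # positive: more certain after "t"
```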

Feature 3: Decoding whole words
So far, we have decoded a single touch at a time, which amounts to key area personalisation. Typing consists of many touches in sequence. Hence, we can also examine how to decode sequences of touches into words, including language context. This can be used to implement "auto correction".
Generalising to sequences
We now seek to find the most likely letter sequence s, given an observed sequence o of touch locations. That is:

s’ = argmax_s p(s|o) = argmax_s p(o|s) p(s)
The equation has the same form as for single touches, and the principle remains the same; to decode words we only need to generalise its components to sequences of keys and touches. Assuming n touches for a word of length n for simplicity (i.e. no missing or spurious touches), this can be formalised as:

s’ = argmax_s ∏_{i=1..n} p(k_i|k_{i-1}) p(t_i|k_i)

where k_i is the i-th letter of the sequence s and t_i is the i-th touch in o.
Notes:
p(s) is a prior over letter sequences. For illustration, we reuse our bigram model. That is, the joint probability of a sequence of n letters (e.g. of a word) is the product of its key-to-key transitions (i.e. bigrams).
p(o|s) is the likelihood of the touch observations o assuming the letter sequence was s. Here we reuse our Gaussian key model that yields p(t|k) per touch. We aggregate this over the sequence via multiplication as the joint probability.
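Under these assumptions, scoring one candidate word is a simple product of the factors above, computed in log-space here to avoid numerical underflow when multiplying many small probabilities:

```python
import math

def sequence_score(word, touches, models, bigram, floor=1e-9):
    """log( p(s) * p(o|s) ) for a candidate word, assuming one touch per letter."""
    log_p, prev = 0.0, None
    for key, t in zip(word, touches):
        if prev is not None:
            log_p += math.log(bigram.get((prev, key), floor))  # prior: bigram transition
        log_p += models[key].logpdf(t)                         # likelihood: key model p(t|k)
        prev = key
    return log_p
```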
Finding the most likely word with the token passing algorithm
Trying out all possible letter sequences s of length n to find the most likely one is prohibitively expensive: the number of candidates grows exponentially with n. We need to make compromises. Concretely, here we use a token passing algorithm with beam pruning.
In short, this algorithm keeps track of a set of partial sequences as "tokens" which can be split (i.e. a "fork" in the path: exploring multiple continuations of the current sequence) or discarded (i.e. a "dead end": the sequence represented by the token has become too unlikely).
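A minimal sketch of this idea (not the exact algorithm of any particular keyboard) keeps each token as a (log-score, partial word) pair, forks every token over all keys, and prunes down to the `beam_width` best:

```python
import heapq
import math

def beam_decode(touches, models, bigram, beam_width=5, floor=1e-9):
    """Token passing with beam pruning over a sequence of touch locations."""
    tokens = [(0.0, "")]                               # start with a single empty token
    for t in touches:
        expanded = []
        for score, seq in tokens:                      # fork: try every key as continuation
            prev = seq[-1] if seq else None
            for key in models:
                s = score + models[key].logpdf(t)                  # touch likelihood p(t|k)
                if prev is not None:
                    s += math.log(bigram.get((prev, key), floor))  # language prior
                expanded.append((s, seq + key))
        tokens = heapq.nlargest(beam_width, expanded)  # prune: discard unlikely tokens
    return max(tokens)[1]                              # most likely hypothesis
```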
A concrete example is shown below: A user intended to type "hello" (left plot). The algorithm explored two paths in the hypothesis space (right plot): "hello" and "helli". As indicated by the line thickness, "hello" is more likely; decoding found the correct word as the most likely one (red path).

Beam search width
Intuitively, the beam width defines how much exploration we want to allow. More exploration might increase the chance of finding the correct word, yet also increases computing time.
The influence of the beam width becomes visible by comparing explored hypotheses for different beam widths, as shown below: On the right, the algorithm additionally explores "jello", yet it considers this as less likely than the correct "hello" (see the thinner path starting with "je", compared to "he").
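With the sketch above, this comparison is a single parameter change (`touches_hello` stands in for a hypothetical logged touch sequence for "hello"):

```python
narrow = beam_decode(touches_hello, models, bigram, beam_width=2)   # fast, little exploration
wide = beam_decode(touches_hello, models, bigram, beam_width=10)    # also explores "jello"
```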

Insertion and deletion
Extensions of the token passing algorithm introduce insertion (producing a letter without processing the next touch) and deletion (processing the next touch without producing a letter). These address two common user errors: spurious (mis)touches and accidentally skipped keys.
While we do not go into technical details here, the next two figures illustrate decoding with insertion and deletion with concrete examples.
For insertion, we look at the example "hllo" (accidentally skipped "e"). The insertion decoder in the figure below correctly finds "hello" as the most likely word: Intuitively, inserting "e" yields a higher overall likelihood than simply following the touch evidence because "he" is more likely in English than "hl".

Note that the decoder in this example also explored "thllo" by inserting "t" at the beginning, because "th" is highly likely in English.
Complementing insertion, for deletion we look at the example "hqello" (an accidental extra "q"). The decoder correctly finds "hello" as the most likely hypothesis: While "hq" is likely given the touches, "qe" is less likely than skipping "q" via "ε" (empty) and taking "he" instead.
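While the details are beyond this article, roughly speaking the token expansion step gains two extra branches. The sketch below is an illustrative assumption (the fixed penalties in particular are made up), not the exact algorithm; each token now also tracks how many touches it has consumed:

```python
import math

INS_PENALTY = DEL_PENALTY = math.log(0.05)   # hypothetical fixed costs per operation

def expand(token, touches, models, bigram, floor=1e-9):
    """Yield successor tokens; a token is (log-score, partial word, touches consumed)."""
    score, seq, i = token
    prev = seq[-1] if seq else None
    for key in models:
        prior = math.log(bigram.get((prev, key), floor)) if prev else 0.0
        if i < len(touches):  # substitution: consume touch i and emit `key` (normal step)
            yield (score + prior + models[key].logpdf(touches[i]), seq + key, i + 1)
        # insertion: emit `key` without consuming a touch (recovers skipped keys)
        yield (score + prior + INS_PENALTY, seq + key, i)
    if i < len(touches):      # deletion: consume touch i without emitting a letter
        yield (score + DEL_PENALTY, seq, i + 1)
```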

Feature 4: Suggesting next words
So far, all features assumed that the user has already touched keys. In contrast, if we have no touches (yet) we have to rely solely on the language context. This gives rise to the feature of "word suggestions".

For example, we could look at the last n-1 words to predict the next one:

p(w_i | w_{i-n+1}, …, w_{i-1})
This is a simple n-gram model at the word level and can be trained by counting word sequences in a large text corpus. More recently, deep learning language models have also been explored for this task. They offer some advantages, such as being able to take longer contexts into account.
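As a minimal sketch of the n-gram variant (here n = 2, i.e. word bigrams):

```python
from collections import Counter, defaultdict

def fit_word_bigrams(words):
    """Word-level bigram counts: how often each word follows another."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def suggest(prev_word, counts, n=3):
    """Top-n next-word suggestions given only the previous word (no touches)."""
    return [w for w, _ in counts[prev_word].most_common(n)]

# Example: suggest("to", fit_word_bigrams("to be or not to be".split())) -> ["be"]
```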
Discussion & Takeaways
After this deep-dive into your keyboard, what does this all mean? Here are three insights that might inform the design of future "intelligent" user interfaces, regarding 1) ideation, 2) evaluation, and 3) a critical reflection beyond interaction.
A lot is going on behind the scenes of your data-driven keyboard
Go back to a keyboard app from, say, five to ten years ago. I’ve recently done this for fun and something didn’t feel quite right: I felt oddly clumsy and slow – likely because that old keyboard did not have touch data about me yet, and certainly did not have the quality of language modelling and decoding that I’ve grown accustomed to.
→ Ideation: To inspire new ideas for "intelligent" UIs we might ask how to transfer the successful data-driven and probabilistic concepts from keyboards to GUIs more generally (also see suggested reading at the end).
UX is more than the UI, in particular with "intelligent" UIs
That old keyboard looked almost the same as my latest keyboard app, yet my user experience was a lot worse, as mentioned above. Today’s keyboards are a prime example of an interactive system where the UX cannot be inferred from the visual UI, which does not reveal the underlying algorithmic qualities.
→ Evaluation: Empirical evaluation is crucial for the design of adaptive UIs. Future progress can be expected to benefit from building on knowledge and methodology from both HCI and AI.
How is the data in data-driven UIs used?
We already interact with data-driven UIs every day, and these collect our data. Keyboards are one of very few interactive systems that, at least in principle, only collect input data to improve the input method itself.
→ Reflection: We might critically examine data-driven UIs by inquiring into their use of user data in-situ (i.e. in interaction) vs ex-situ (i.e. beyond use).
Recap and thinking points
To stimulate further discussion and insight, consider that we can view today’s touch keyboards as:
- Data-driven UIs: Keyboards use touch data and language data to improve your typing experience and efficiency.
- Probabilistic UIs: Keyboards take into account uncertainty when interpreting your input.
- Dynamic and adaptive UIs: Integrating a language model is not "magic" detached from typing but rather results in dynamic changes in pixel-to-key assignments.
- Biometric UIs: To adapt their keys, keyboards learn a representation of your individual touch behaviour, which varies between users, like a behavioural fingerprint.
- Deceptive UIs: Keyboards internally use different pixel-to-key assignments than the keys shown on the screen (with good intentions: to avoid confusion and undesirable co-adaptation).
- Searchers: Keyboards search a vast hypothesis space to "decode" your input touches into words and sentences.
Conclusion
We have derived the core features that make today’s smartphone keyboards smart and explored their effects and central parameters via visualisations based on concrete touch and language data.
Crucially, the four core features explained here all arise from one probabilistic framework (Bayes’ rule), by filling in its slots with different concrete components. Here is a summary:
- Feature 1 (adapting key areas): uniform prior over keys, combined with the Gaussian touch likelihood p(t|k).
- Feature 2 (language context): bigram prior p(k|k_prev), combined with the Gaussian touch likelihood p(t|k).
- Feature 3 (decoding whole words): sequence prior p(s), combined with the touch sequence likelihood p(o|s).
- Feature 4 (word suggestions): language model prior only – no touch observations yet.
To conclude, mobile touch keyboards have become a prevalent everyday example of an "intelligent", data-driven and probabilistic user interface.
Further reading:
- Probabilistic concepts for other touch GUIs than keyboards
- More on token passing and input decoding
- Looking for slides, code etc.? This is based on material I created for my "Intelligent User Interfaces" lecture at University of Bayreuth, Germany. An open version, co-taught with others, is available here: https://iui-lecture.org/ (incl. Python notebooks to recreate the plots).