The future of user interface design is multimodal and context-aware

Current user interfaces are primarily limited to a single input device. We type on a keyboard, we point and drag with a mouse, or we speak to an application that accepts only voice. Our computer doesn’t know whether we are sitting in front of it, let alone whether we are talking to someone else or looking at something else. As a result, computers appear stupid and rude. They are oblivious to our frantic mouse movements on screen 2 while we are looking at screen 1, wondering where the mouse pointer is hiding. They pop up messages and make noises even though we are busy with another task or in the middle of a conversation.

This will change. In the future, computers will become more aware of their surroundings and of their users. They will track your eyes and listen constantly to what you are saying and to what is going on in the room around them. They will begin to appear smarter and a whole lot less rude than they are today.

I also think that UI will become multimodal. We will combine mouse movement with speech, and computers will ask questions and learn. For example, I was just working on a repetitive edit to a block of code. In the future, the computer will ask, in a spoken voice, whether we want it to apply the same edit to the remaining lines in that block. Because the interaction is multimodal, it does not interrupt the flow of what you are doing. There is no pop-up message to distract you visually and no dismiss button you need to click to make Clippy go away. As humans, we do fairly well at isolating different inputs (visual, verbal, auditory, tactile, …) and can ignore one while we focus on another. Multimodal UI will take advantage of that and will appear less intrusive and far less annoying than previous attempts at assistive technology.

But even without computerized assistants, multimodal UI will become common. You will be able to select a block of text and say “make it larger” without having to find the right command in a dialog launched from a menu. As we move from desktops to tablets, this capability becomes even more important because we have less screen real estate and because typing and mousing are harder.

Multimodal UI also benefits from the multiplicative effect of combining different inputs. The pointer supplies the object and the voice supplies the verb, so you need only one command, “make it larger”, rather than “make paragraph larger”, “make image larger”, “make space between lines larger”, “make margin larger”, …
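To make the "one command, many targets" idea concrete, here is a minimal sketch of how a spoken phrase might be combined with the current selection. Everything in it is hypothetical: the `Paragraph`, `Image` and `Margin` classes, the `handle` function and the scaling factors are invented for illustration and do not describe any real speech or UI toolkit.

```python
from dataclasses import dataclass


# Hypothetical selection targets. Each one knows which of its own
# properties should change when the user asks for "larger".
@dataclass
class Paragraph:
    font_size: float


@dataclass
class Image:
    width: float
    height: float


@dataclass
class Margin:
    width: float


def scale(target, factor):
    """Scale whatever is selected; the target type decides what that means."""
    if isinstance(target, Paragraph):
        target.font_size *= factor
    elif isinstance(target, Image):
        target.width *= factor
        target.height *= factor
    elif isinstance(target, Margin):
        target.width *= factor
    else:
        raise TypeError(f"cannot resize {type(target).__name__}")
    return target


# One entry per spoken verb, shared by every kind of selection.
# The factors 1.25 and 0.8 are arbitrary assumptions.
COMMANDS = {
    "make it larger": lambda target: scale(target, 1.25),
    "make it smaller": lambda target: scale(target, 0.8),
}


def handle(utterance, selection):
    """Combine the spoken utterance with the object the pointer selected."""
    action = COMMANDS.get(utterance.strip().lower())
    if action is None:
        raise ValueError(f"unknown command: {utterance!r}")
    return action(selection)
```

With two verbs and three target types, this sketch covers six operations from two vocabulary entries. Adding a new target type extends every existing verb, which is the multiplicative effect described above.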

Multimodal UI also saves a lot of mouse movement and solves many of the issues around selections (e.g. do they go away when you select a menu item or click on a different application?).

Multimodal UI will also dramatically reduce how often you need to click: merely pointing at something and saying the command will be sufficient.

I strongly believe that this will be a key part of future user interfaces, but in the meantime, let me go dismiss this dialog asking if I want to reboot to install updates…



Thu May 08 2014 16:23:39 GMT-0700 (Pacific Daylight Time)

