The creator of the famous voice assistant dreams of a world where Alexa is everywhere, anticipating your every need.
Speaking with MIT Technology Review, Rohit Prasad, Alexa’s head scientist, revealed further details about where Alexa is headed next. The crux of the plan is for the voice assistant to move from passive to proactive interactions. Rather than wait for and respond to requests, Alexa will anticipate what the user might want. The idea is to turn Alexa into an omnipresent companion that actively shapes and orchestrates your life. This will require Alexa to get to know you better than ever before.

In June at the re:Mars conference, he demoed a feature called Alexa Conversations (the demo starts at 53:54), showing how it might be used to help you plan a night out. Instead of manually initiating a new request for every part of the evening, you would need only to begin the conversation—for example, by asking to book movie tickets. Alexa would then follow up to ask whether you also wanted to make a restaurant reservation or call an Uber.
A more intelligent Alexa
Here’s how Alexa’s software updates will come together to execute the night-out planning scenario. To follow up on a movie ticket request with prompts for dinner and an Uber, a neural network learns—through billions of user interactions a week—to recognize which skills are commonly used with one another. This is where intelligent prediction comes in: when enough users book a dinner after a movie, Alexa will package the two skills together and recommend them in conjunction.
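In code, that co-occurrence idea might look something like the toy sketch below. The skill names, session logs, and threshold are made-up illustrations, not Amazon’s actual neural-network model.

```python
# A toy sketch of skill co-occurrence prediction. The skill names, session
# logs, and threshold are illustrative assumptions, not Amazon's model.
from collections import Counter
from itertools import combinations

# Hypothetical logs: the skills each user invoked during one evening.
sessions = [
    ["book_movie_tickets", "reserve_restaurant", "request_ride"],
    ["book_movie_tickets", "reserve_restaurant"],
    ["book_movie_tickets", "request_ride"],
    ["play_music"],
]

# Count how often each pair of skills shows up in the same session.
pair_counts = Counter()
for skills in sessions:
    for pair in combinations(sorted(set(skills)), 2):
        pair_counts[pair] += 1

def suggest_followups(skill, min_count=2):
    """Recommend skills that frequently co-occur with the one just used."""
    return [
        other
        for (a, b), count in pair_counts.items()
        if count >= min_count and skill in (a, b)
        for other in (a, b)
        if other != skill
    ]

print(suggest_followups("book_movie_tickets"))
# ['request_ride', 'reserve_restaurant']
```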
But reasoning is required to know what time to book the Uber. Taking into account your location, the theater’s location, your movie’s start time, and the expected traffic, Alexa figures out when the car should pick you up to get you there on time.
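A back-of-the-envelope version of that calculation, using made-up travel-time estimates in place of live traffic and location data, might look like this:

```python
# Hypothetical pickup-time reasoning; the showtime, drive time, and buffer
# below are invented, and a real system would pull live traffic data.
from datetime import datetime, timedelta

movie_start = datetime(2019, 11, 8, 19, 30)   # 7:30 pm showtime
drive_time = timedelta(minutes=25)            # estimated with current traffic
buffer = timedelta(minutes=15)                # tickets, popcorn, finding seats

pickup_time = movie_start - drive_time - buffer
print(pickup_time.strftime("%I:%M %p"))       # 06:50 PM
```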

Prasad imagines many other scenarios that might require more complex reasoning. You could imagine a skill, for example, that would allow you to ask your Echo Buds where the tomatoes are while you’re standing in a Whole Foods. The Buds would need to register that you’re in that particular Whole Foods, access a map of its floor plan, and then tell you the tomatoes are in aisle seven.
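A toy version of that lookup, with an invented store identifier and floor plan standing in for real retailer data, could be as simple as:

```python
# Illustrative store-navigation lookup; the store ID and floor-plan data
# are fabricated examples, not a real Whole Foods planogram.
store_floor_plans = {
    "whole_foods_downtown": {"tomatoes": "aisle 7", "olive oil": "aisle 3"},
}

def locate_item(store_id, item):
    aisle = store_floor_plans.get(store_id, {}).get(item)
    return f"The {item} are in {aisle}." if aisle else f"I couldn't find {item} here."

# The Buds register which store you're standing in, then answer the question.
print(locate_item("whole_foods_downtown", "tomatoes"))
# The tomatoes are in aisle 7.
```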
In another scenario, you might ask Alexa through your communal home Echo to send you a notification if your flight is delayed. By the time the notification is due, you may already be driving. Alexa needs to realize (by identifying your voice in your initial request) that you, not a roommate or family member, need the notification—and, based on the last Echo-enabled device you interacted with, that you are now in your car. Therefore, the notification should go to your car rather than your home.
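One hypothetical way to express that routing decision is sketched below; the speaker-ID results and device registry are assumptions for illustration, since Alexa’s actual implementation isn’t public.

```python
# A hypothetical sketch of notification routing by speaker and device.
def route_notification(requesting_speaker, message, devices):
    """Send the alert to the device the requesting person used most recently."""
    # Keep only devices whose last interaction came from the requester.
    candidates = [d for d in devices if d["last_speaker"] == requesting_speaker]
    # The most recently used one wins (the car, not the kitchen Echo).
    target = max(candidates, key=lambda d: d["last_used"])
    return f"Sending '{message}' to {target['name']}"

devices = [
    {"name": "kitchen Echo", "last_speaker": "you", "last_used": 8},
    {"name": "Echo Auto in the car", "last_speaker": "you", "last_used": 9},
    {"name": "bedroom Echo", "last_speaker": "roommate", "last_used": 10},
]

print(route_notification("you", "Your flight is delayed", devices))
# Sending 'Your flight is delayed' to Echo Auto in the car
```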
This level of prediction and reasoning will also need to account for video data as more and more Alexa-compatible products include cameras. Let’s say you’re not home, Prasad muses, and a Girl Scout knocks on your door selling cookies. The Alexa on your Amazon Ring, a camera-equipped doorbell, should register (through video and audio input) who is at your door and why, know that you are not home, send you a note on a nearby Alexa device asking how many cookies you want, and order them on your behalf.
To make this possible, Prasad’s team is now testing a new software architecture for processing user commands. It involves filtering audio and visual information through many more layers. First, Alexa needs to register which skill the user is trying to access among the roughly 100,000 available. Next, it will have to understand the command in the context of who the user is, what device that person is using, and where they are. Finally, it will need to refine the response on the basis of the user’s previously expressed preferences.
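Schematically, those three stages might be chained together as in the sketch below; every function body here is a stand-in assumption, not Amazon’s code.

```python
# A schematic of the three stages: skill routing, context resolution, and
# personalization. All logic and data below are illustrative placeholders.
def resolve_skill(utterance):
    # Stage 1: pick one skill out of ~100,000 (here, a trivial keyword match).
    return "movie_tickets" if "movie" in utterance else "general_qa"

def build_context(speaker_id, device_id):
    # Stage 2: who is asking, on which device, and where.
    return {"user": speaker_id, "device": device_id, "location": "home"}

def personalize(skill, context, preferences):
    # Stage 3: refine the response using previously expressed preferences.
    theater = preferences.get(context["user"], {}).get(
        "favorite_theater", "the nearest theater"
    )
    return f"Booking {skill} at {theater} for {context['user']}"

preferences = {"you": {"favorite_theater": "the downtown cinema"}}
skill = resolve_skill("book movie tickets for tonight")
context = build_context("you", "kitchen_echo")
print(personalize(skill, context, preferences))
# Booking movie_tickets at the downtown cinema for you
```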
Why It’s Hot: “This is what I believe the next few years will be about: reasoning and making it more personal, with more context,” says Prasad. “It’s like bringing everything together to make these massive decisions.”