The Alexa Skills revolution that wasn’t

An Amazon Echo breaking under the pressure of many different Skills surrounding it.
Image: Mojo Wang for The Verge

Ten years ago, Amazon imagined a future beyond apps — and it had the idea basically right. But the perfect ambient computer remains frustratingly far away.

The first Amazon Echo, all the way back in 2014, was pitched as a device for a few simple things: playing music, asking basic questions, getting the weather. Since then, Amazon has found a few new things for people to do, like control smart home devices. But a decade later, Alexa is still mostly for playing music, asking basic questions, and getting the weather. And that's largely because, even as Amazon made Alexa ubiquitous in devices and homes, it never convinced developers to care.

Alexa was never supposed to have an app store. Instead, it had "skills," which Amazon hoped developers would use to connect Alexa to new functionality and information. Developers weren't supposed to build their own things on top of an operating system; they were supposed to build new things for Alexa to do. The difference is subtle but important. Our phones are mostly a series of disconnected experiences — Instagram is a universe entirely apart from TikTok and Snapchat and your calendar app and Gmail. That just doesn't work for Alexa or any other successful assistant. If it knows your to-do list but not your calendar, or knows your favorite kind of pizza but not your credit card number, it can't do much. It needs access to everything, and all the necessary tools at its disposal, to get things done for you.

In Amazon’s dream world, where “ambient computing” is perfect and everywhere, you’d just ask Alexa a question or give it an instruction: “Find me something fun to do this weekend.” “Book my train to New York next week.” “Get me up to speed on deep learning.” Alexa would have access to all the apps and information sources it needs, but you’d never need to worry about that; Alexa would just handle it however it needed and bring you the answers. There are a thousand complicated questions about how it actually works, but that’s still the big idea.

“Alexa Skills made it fast and easy for developers to build voice-driven experiences, unlocking an entirely new way for developers and brands to engage with their customers,” Amazon spokesperson Jill Tornifoglio said in a statement. Customers use them billions of times a year, she said, and as the company embraces generative AI, “we’re excited for what’s next.”

In retrospect, Amazon’s idea was pretty much exactly right. All these years later, OpenAI and other companies are also trying to build their own third-party ecosystems around chatbots, which are just another take on the idea of an interactive interface for the internet. But for all its prescience on the AI revolution, Amazon never figured out how to make skills work. It never solved some fundamental problems for developers, never cracked the user interface, and never found a way to show people all the things their Alexa device could do if only they’d ask.

Amazon certainly tried its best to make skills happen. The company steadily rolled out new tools for developers, paid them in AWS credits and cash when their skills got used (though it recently stopped doing so), and tried to make skill development practically effortless. And on some level, all that effort paid off: Amazon says there are more than 160,000 skills available for the platform. That pales next to the millions of app store apps on smartphones, but it’s still a big number.

The interface for finding and using all those skills, though, has always been a mess. Let’s just take one simple example: if you ask Alexa to order you pizza, it might tell you it has a few skills for that and recommend Domino’s. (If you’re wondering why Amazon would pick Domino’s and not Pizza Hut or DoorDash or any other pizza-summoning service? Great question. No idea.) You respond yes. “Here’s Domino’s,” Alexa says. Then a moment later: “Here’s the skill Domino’s, by Domino’s Pizza, LLC.” Another moment, then: “To link your Domino’s Pizza Profile please go to the Skills setting in your Alexa app. We’ll need your email address to place a guest order. Please enable ‘Email Address’ permissions in your Alexa app.” At this point, you have to find a buried setting in an app you might not even have on your phone; it would be vastly easier to just go to Domino’s website. Or, heck, call the place.

If you know the skill you’re looking for, the system is a little better. You can say “Alexa, open Nature Sounds” or “Alexa, enable Jeopardy,” and it’ll open the skill with that name. But if you don’t remember that the skill is called “Easy Yoga,” asking Alexa to start a yoga workout won’t get you anywhere.

A screenshot of a video showing guidance for Alexa skills.
Image: Amazon
Alexa can do a lot of things. Figuring out which ones is the real challenge.

There are little friction points like this all across the system. When you’ve activated a skill, you have to explicitly say “stop” or “cancel” to back out of it in order to use another one. You can’t easily do things across skills — I’d like to price-check my pizza, but Alexa won’t let me. And maybe most frustrating of all, even once you’ve enabled a skill, you still have to address it specifically. Saying “Alexa, ask AnyList to add spaghetti to my grocery list” is not seamless interaction with an all-knowing assistant; that’s having to learn a computer’s incredibly specific language just to use it properly.
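The rigidity of that grammar is easy to see in code. Here's a rough sketch — the skill names and the parsing below are illustrative, not Alexa's actual routing logic — of invocation that only works when the user's phrasing names the skill exactly:

```python
import re

# Hypothetical registry of enabled skills, keyed by invocation name.
SKILLS = {
    "anylist": "AnyList",
    "easy yoga": "Easy Yoga",
    "nature sounds": "Nature Sounds",
}

# Simplified model of the "open/ask/enable <skill> [to <request>]" grammar.
INVOCATION = re.compile(r"^(?:open|ask|enable) (?P<name>[\w ]+?)(?: to (?P<request>.+))?$")

def route(utterance):
    """Return (skill, request) if the utterance names a skill, else None."""
    m = INVOCATION.match(utterance.lower())
    if not m or m.group("name") not in SKILLS:
        # "start a yoga workout" falls through here: it never says "Easy Yoga,"
        # so a name-based router has nothing to match against.
        return None
    return SKILLS[m.group("name")], m.group("request")
```

Under this model, "ask AnyList to add spaghetti to my grocery list" routes cleanly, while "start a yoga workout" returns nothing at all — which is exactly the gap between how people talk and how skills get invoked.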

As it has turned out, many of the most popular Alexa skills have two things in common: they’re simple Q&A games, and they’re made by a company called Volley. From Song Quiz to Jeopardy to Who Wants to Be a Millionaire to Are You Smarter Than a 5th Grader, Volley is one of the companies that has figured out how to make skills that really work. And Max Child, Volley’s cofounder and CEO, says that getting your skill in front of people is one of the most important — and hardest — parts of the job.

“I think one of the underrated reasons that the iOS and Android app stores are so successful is because Facebook ads are so good,” he says. The pipeline from a hyper-targeted ad to an app install has been ruthlessly perfected over the years, and there’s just nothing like that for voice assistants. The nearest equivalent is probably people asking their Alexa devices what they can do — which Child says does happen! — but there’s just no competing with in-feed ads and hours of social scrolling. “Because you don’t have that hyper-targeted marketing, you end up having to do broad marketing, and you have to build broad games.” Hence games like Jeopardy and Millionaire, which are huge brands that appeal to practically everyone.

One way Volley makes money is through subscriptions. The full Jeopardy experience, for instance, is $12.99 a month, and like so many other modern subscriptions, it’s a lot easier to subscribe than to cancel. It’s also one of the few ways to make money with a skill: developers are allowed to have audio ads in some kinds of skills, or to ask users to add their credit card details directly the way Domino’s does, but asking a voice-first user to pick up their phone and dig through settings is a high bar to clear. Ads are only useful at vast scale — there was a brief moment when a lot of media companies thought the so-called “flash briefings” might be a hit, but that hasn’t turned into much.

These are hardly unique challenges, by the way. Mobile app stores have similar huge discovery problems, issues with monetization, sketchy subscription systems, and more. It’s just that with Alexa, the solution seemed so enticing: you shouldn’t, and wouldn’t, even need an app store. You should just be able to ask for what you want, and Alexa can go do it for you.

A decade on, it appears that an all-powerful, omni-capable voice AI might just be impossible to pull off. If Amazon were to make everything so seamless and fast that you never even have to know you’re interacting with a third-party developer and your pizza just magically appears at your door, it raises some huge privacy concerns and questions about how Amazon picks those providers. If it asked you to choose all those defaults for yourself, it’s signing every new user up for an awful lot of busy work. If it allows developers to own and operate even more of the experience, it wrecks the ambient simplicity that makes Alexa so enticing in the first place. Too much simplicity and abstraction is actually a problem.

We’re at something of an inflection point, though. A decade after its launch, Alexa is changing in two key ways. One is good news for the future of skills, the other might be bad. The good is that Alexa is no longer a voice-only, or even voice-first, experience — as Echo Show and Fire TV devices have gotten more popular, more people are interacting with Alexa with a screen nearby. That could solve a lot of interaction problems and give developers new ways to put their skills in front of users. (Screens are also a great place to advertise your skill, a fact Amazon knows maybe too well.) When Alexa can show you things, it can do a lot more.

Already, Child says that a majority of Volley’s players are on a device with a screen. “We’re very long on smart TVs,” he says, laughing. “Every single smart TV that’s sold now has a microphone in the remote. I really think casual voice games … might make a lot of sense, and I think could be even more immersive.”

Amazon is also about to re-architect Alexa around LLMs, which could be the key to making all of this work. A smarter, AI-powered Alexa could finally understand what you're actually trying to do, and do away with some of the awkward syntax required to use skills. It could understand more complicated questions and multistep instructions and use skills on your behalf. "Developers now need to only describe the capabilities of their device," Amazon's Charlie French said at the company's AI-focused Alexa launch event last year. "They don't need to try and predict what a customer is going to say." Amazon is just one of the companies promising that LLMs will be able to do things on your behalf with no extra work required; in that world, do skills even need to exist, or will the model simply figure out how to order pizza?
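A sketch of what that shift looks like: instead of enumerating sample utterances, a developer publishes a capability description, and the model decides when to call it. The schema below mirrors the tool-calling format popularized by LLM APIs, not Amazon's actual developer interface, and the keyword-overlap `plan` function is a crude stand-in for what a real model does with genuine language understanding:

```python
# Hypothetical capability description in the style of LLM tool-calling
# schemas. The shape and names are illustrative, not Amazon's real API.
ORDER_PIZZA = {
    "name": "order_pizza",
    "description": "Order a pizza for delivery from the user's preferred vendor",
    "parameters": {
        "size": {"type": "string", "enum": ["small", "medium", "large"]},
        "toppings": {"type": "array", "items": {"type": "string"}},
    },
}

def plan(request, tools):
    """Crude stand-in for the model: pick the tool whose description
    shares the most words with the request. A real LLM does this with
    language understanding rather than keyword overlap."""
    words = set(request.lower().split())
    scored = [(len(words & set(t["description"].lower().split())), t) for t in tools]
    score, best = max(scored, key=lambda pair: pair[0])
    return best if score > 0 else None
```

The point of the design is that "order me a large pizza" needs no skill name and no memorized syntax; the developer describes what their capability does once, and the routing problem moves from the user to the model.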

There’s some evidence that Amazon is behind in its AI work and that plugging in a language model won’t suddenly make Alexa amazing. (Even the best LLMs feel like they’re only sort of slightly close to almost being good enough to do this stuff.) But even if the overhaul works, it only makes the bigger question more important: what can virtual assistants really do for us? And how do we ask them to do it? The correct answers are “anything you want” and “any way you like.” That requires a lot of developers to give Alexa new powers. Which requires Amazon to give them a product, and a business, worth the effort.