When I try to remember my childhood home, I close my eyes and the first thing that comes to mind is a sound—a teaspoon stirring sugar into a cup of tea, then being dropped onto the granite countertop with satisfying completion. Only then do I have a vision—I see Dad in his pajamas carrying a cup of tea upstairs to wake Mom, like he did every morning. That sound embodies love and home and childhood more than any picture ever could.
The natural world is filled with sound. Sounds are streaming into your brain right now without you even paying attention. You can hear the air conditioning in the office, or maybe the refrigerator at home, or your fingers on the keyboard. If something out of the ordinary occurs and makes a sound, like a teacup being dropped or a fire alarm going off, you attend to that sound immediately and unconsciously. Hearing is a powerful sense. It is essentially impossible to close your ears.
The digital world, on the other hand, has been strangely silent. For most aspects of our digital user experience, sound is in the periphery. Designers are not guaranteed the user will be listening. Users turn off their speakers so that they do not impose on coworkers, and they hesitate to wear headphones in public for fear of missing the honking horn of an oncoming bus. When they do wear headphones, they are intentionally becoming unavailable to the world around them.
The Rise of Audio-First Interfaces
Until now, designers have been forced to assume their interfaces were silent and only include sound that is secondary to the visual interface. As a result, the sound design industry is nowhere near the size of the visual design industry.
The current state of the audio-first interface is similar to the early World Wide Web. Content providers do not fully control how their content is rendered. In 1994-95, users controlled the styling of content, but within a few years, the balance of power shifted to the developer rather than the consumer. That shift allowed corporations to brand the look and feel of their websites.
The effect of the commercialization of the Web was almost immeasurable. As visual branding became possible, every business decided to build its own branded website. At the time, I worked on early commercial websites for Fortune 500 companies that spent tens of millions of dollars for a team of 60 designers, software engineers, database administrators, etc. to develop their websites. The commercialization of the Web was arguably the primary economic engine for the United States in the late ‘90s.
Beyond the financial impact, there was a rapid expansion in the design industry. According to the Bureau of Labor Statistics, the number of graphic design jobs in the US increased by 77% during the period dating from the public release of the first Web browser in September of 1993 to the peak of the industry in June of 2001. We also saw the creation of new fields of design, most notably web design and interaction design. More importantly, the Web spread graphic design literacy. Discussions of typeface, whitespace, and hierarchy spread from a small office in the back of the marketing department to the entire company.
It would be foolish to predict the commercialization of audio-first interfaces will be the economic engine that the World Wide Web was, but it will be significant. In the near future, there will be a massive expansion in the demand for audio design as corporations embrace audio-first devices and want to brand how their content sounds to better reflect their company. The discipline of audio design will be formalized to the level of graphic design, and the general public will become conversant in the language of audio.
When Fiction Becomes Reality
Audio-first interfaces have been the stuff of science fiction and have fallen broadly into two categories: the assistant who lives in a place, like HAL 9000 from 2001: A Space Odyssey, and the assistant who lives in your ear, like J.A.R.V.I.S. from Iron Man. Only recently has voice recognition technology become sufficiently advanced to build a ubiquitous, reliable voice user experience.
The Amazon Echo is an audio-first interface intended to be used in the home. It is essentially a Bluetooth speaker you primarily control with your voice, whereas most Bluetooth speakers are primarily remote-controlled with your phone. In addition to the high-quality speakers, it has an array of microphones that provide far-field voice recognition, whether or not music is playing. When I first used the Amazon Echo, I made the obvious comparison to the Federation Computer from Star Trek, wishing that I had one in every room of my house and that the keyword were "computer" instead of "Alexa."
To operate the Echo, you say a keyword ("Alexa"), followed by a command. For example, if you say, "Alexa, play Nina Simone," the Echo responds by playing a Nina Simone song. The Echo will answer questions through an interface that is similar to Apple's Siri, and control home automation products: "Alexa, turn off the bedroom lights." Among the Echo's capabilities are ordering products online, making to-do lists, and telling jokes. However, the most impressive feature is its extensibility. Developers can easily write new skills for the Echo and deploy the new features to their own device, or share them online. Uber has just released a skill that allows users to say, "Alexa, call me an Uber," and an Uber driver will arrive at their doorstep in minutes. I have programmed my Echo to tell me when my morning bus will arrive at the stop based on the bus's GPS coordinates.
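To give a feel for how small such a skill can be, here is a minimal sketch of a custom-skill handler written as a plain AWS-Lambda-style Python function. The intent name (NextBusIntent) and the bus-lookup helper are hypothetical, and the request/response envelope shown is a simplified version of the JSON that the Alexa service actually exchanges with a skill, not the full specification.

```python
def next_bus_minutes():
    """Hypothetical lookup of the next bus arrival (e.g., from a GPS feed)."""
    return 7


def build_response(speech_text):
    """Wrap plain text in a simplified Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": True,
        },
    }


def handler(event, context=None):
    """Route an incoming Alexa request to a spoken reply."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # The user opened the skill without asking anything specific.
        return build_response("Ask me when your next bus arrives.")
    if (request["type"] == "IntentRequest"
            and request["intent"]["name"] == "NextBusIntent"):
        minutes = next_bus_minutes()
        return build_response(f"Your next bus arrives in {minutes} minutes.")
    return build_response("Sorry, I didn't understand that.")
```

The essential point is that the developer only maps intents to text; the platform handles the wake word, speech recognition, and text-to-speech, which is what makes skills so quick to build.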
The Amazon Echo was so successful that Amazon ran out of them at Christmastime. It has over 33,000 reviews on Amazon.com (compared with 4,500 reviews of the best-selling Bluetooth speaker) with an average rating of 4.4 stars. Due to the popularity of the Echo, Amazon has recently announced it has expanded its line of voice-controlled hardware to include Amazon Echo, Echo Dot, and Amazon Tap. Echo Dot is half the price of the Echo and is intended to extend Echo's reach throughout the home. The future Gene Roddenberry envisioned when he created Star Trek in 1966 has finally arrived.
January 2014 marked the US release of Her, a Spike Jonze film showcasing his version of an audio-first interface. Whenever Theodore (Joaquin Phoenix) is interacting with his operating system, Samantha (Scarlett Johansson), he wears a conspicuous single-ear earbud. This earbud allows Theodore to listen to Samantha, but also keeps his other ear open to the world around him.
In the months that followed, nearly 16,000 Kickstarter backers supported the creation of the "world's first wireless smart in ear headphones" with well over $3 million in pledges. Now ready to ship its first commercial model, the Bragi Dash boasts a hefty $299 price tag. The Dash allows the wearer to hear the world around them while also hearing and talking to their smartphone via Bluetooth.
The Dash is more than a pair of wireless headphones, just as the Echo is more than a Bluetooth speaker. The Dash is an audio-first computing platform that includes 23 sensors to provide context. For example, the Dash can tell when you shake your head and when your heart rate increases. More importantly, the Dash can help the user isolate which audio they want to hear—real world or virtual. Competitive products with a wide array of features have been announced by Intel, Apple, Samsung, Microsoft, and Google, as well as niche players like Here Active Listening.
Enter Audio Branding
Audio branding is a small but growing industry. Audio branding agencies develop a brand's identity through sound, as traditional branding agencies do through visuals. Audio branding engagements might be as small as developing a sonic logo, or as large as the recent 22-month overhaul of Vienna's public transit system audio.
Currently, the audio presentation on these audio-first devices is controlled by the device manufacturer. Amazon Echo, for example, always speaks in Alexa's voice, and doesn't change when running a given company's software. However, when Home Depot launches an app for Amazon Echo, I suspect they will want the voice of their spokesman, Josh Lucas, to represent them. And Motel 6 will want Tom Bodett to echo the company's signature phrase, "We'll leave the light on for you," instead of Alexa. In addition to the voice, brands will want their own music and earcons—audio icons—to represent them. Comprehensive audio design will be required for a brand's content on these interfaces, just as graphic design was required for its website.
Silence is No Longer an Option
As designers, when we leave our applications silent, we ignore a key dimension of interaction. The coming adoption of audio-first interfaces is going to drastically increase the demand for audio services. Branding funded the Dot-com Boom, and branding will fund significant growth in audio design. Audio-first interfaces are proving silence isn't golden. Let the new gold rush begin.