Interview with Ilana Shalowitz, Senior VUI Designer at Emmi
by Thomas Brandenburg
What language (keywords or phrases) do you like to use to explain Conversational UI to an audience who is not familiar with it?
I like to start by exposing the complexity present in even a small task, and then introduce keywords as they relate to the task.
Say, the complexity in finding a restaurant. First you have to find out how many different ways people will ask you. You may want to narrow down your options by neighborhood, what type of food, the price range, ratings preference, dietary preferences, or portion sizes. It’s pretty easy through Yelp’s interface to find what you need, but how does that translate to voice?
Each of these filters is called an entity in conversation design. Voice technology is able to sort through whole sentences to extract these entities. So, a user may ask “Where can I find Mexican food near me”, and just the entities will trigger the response you design. The designer decides which of these entities is required before the system says it has enough information to complete a search. For example, you may want neighborhood, price range, and dietary preference, while the rest are optional.
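As a rough illustration of the idea above, here is a minimal Python sketch of pulling entities out of a sentence and checking which required ones are still missing. The keyword lists, the entity names, and the helper functions `extract_entities` and `missing_required` are all hypothetical, chosen only for this example; a production voice platform would rely on a trained language-understanding model rather than keyword patterns.

```python
import re

# Hypothetical keyword patterns for each entity (illustration only).
ENTITY_PATTERNS = {
    "cuisine": r"\b(mexican|italian|thai)\b",
    "neighborhood": r"\b(near me|downtown|uptown)\b",
    "price_range": r"\b(cheap|moderate|expensive)\b",
    "dietary_preference": r"\b(vegetarian|vegan|gluten-free)\b",
}

# The designer's choice of entities needed before a search can run,
# mirroring the example in the text.
REQUIRED = ("neighborhood", "price_range", "dietary_preference")


def extract_entities(utterance):
    """Return a dict of the entities found in a whole-sentence utterance."""
    text = utterance.lower()
    found = {}
    for name, pattern in ENTITY_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            found[name] = match.group(1)
    return found


def missing_required(entities):
    """List the required entities the system still has to ask about."""
    return [name for name in REQUIRED if name not in entities]


entities = extract_entities("Where can I find Mexican food near me")
still_needed = missing_required(entities)
```

In this sketch the sentence yields a cuisine and a neighborhood, so the design would still need to prompt for price range and dietary preference before searching.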
The designer also decides how much information the user hears: too much information can actually be a hindrance to decision-making. For example, if you list off the top twenty restaurants, the user will likely be overwhelmed, have trouble processing the information, and will not be any closer to their meal than when they started. The design of conversations should seek to limit this burden of effortful thinking, which is called cognitive load. In this situation, simply listing the top three options reduces the cognitive load.
Users may want to hear the full list eventually, but for a pleasant user experience the list is best punctuated with questions, called prompts. In this scenario, asking “Would you like to hear about other restaurants?” would do nicely to bring users to restaurants four through six in the list.
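One way to picture this chunking is the short Python sketch below. It splits a result list into groups of three and attaches the follow-up prompt whenever more results remain. The function name `paginate_results` and the default page size are assumptions made for illustration, not part of any particular voice platform.

```python
def paginate_results(restaurants, page_size=3):
    """Yield (chunk, prompt) pairs so the list is read a few items at a time.

    The prompt is None on the last chunk, since there is nothing left to offer.
    """
    for start in range(0, len(restaurants), page_size):
        chunk = restaurants[start:start + page_size]
        has_more = start + page_size < len(restaurants)
        prompt = "Would you like to hear about other restaurants?" if has_more else None
        yield chunk, prompt


# Hypothetical placeholder names standing in for search results.
names = ["A", "B", "C", "D", "E", "F", "G"]
pages = list(paginate_results(names))
```

With seven placeholder results, the user hears three, is asked whether to continue, hears three more, is asked again, and then hears the final one with no further prompt.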
This technique of breaking up long blocks of speech into interactions is part of another crucial element of conversation design: listenability. Have you ever noticed that some sentences are easier to glide your eyes across than others?
Pay attention to your effort reading the following sentences:
“There are twenty restaurants situated there” vs. “Twenty restaurants are nearby.”
The second example is much easier to absorb.
In conversation design, too, there are some ways of organizing information that are more listenable. Brevity is one way to make things easy on the ears. A designer must also consider the hard and soft sounds of each word, and the rhythm and sound of how they all fit together. In one short word, the way rhythm and sounds of speech fit together is called prosody.
Prompt, entity, cognitive load, listenability, prosody: these terms are a great start to understanding the elements of conversation design. Once a designer masters these concepts, they may like to start thinking about how conversations evolve over time, how to address sensitive information, and how to account for context.
Given we’re entering a new era of computing, what do you consider to be the promise of Conversational UI?
Linguistically, eventually we should be able to pick up when a user speaks multiple dialects of a language and the circumstances in which they use each. Beyond understanding when the user switches dialects, the conversational UI should be able to mimic that pattern to communicate back with the user. The same is true for conversing with people who speak multiple languages. For example, you may have heard someone speak in a combination of English and Spanish.
In addition to making it easier to use the interface, mimicking the user’s patterns of switching dialects or languages creates a common culture between the user and the interface. Ever notice that in the office or among your friends there are certain words that are popular? It’s subtle, but using similar words signals who is an insider to your group.
Functionally, the promise comes from the power of historical data and the adaptability of the interface. It’s not unique to conversational interfaces, but they too will benefit from aggregating data and learning to classify users by type and anticipate their needs. Imagine there’s someone out there with the same values who behaves in a similar way to you. When they encountered X, they found Y to be very helpful. Right now, as you approach X, the conversational interface, having learned from its past experience with your type of person facing X, tells you Y. The conversational element is that it will tell you in exactly the words it knows will get through to you. In this way, everyone is able to benefit from everyone else’s experiences in the new era of computing.
Do you think there are any specific aspects that make it difficult to adopt conversational UI for end-to-end experiences?
Our five senses each have their strengths. Conversation pulls primarily from sound and secondarily from sight.
For those new to conversational UIs, the first question to ask is whether an exclusively conversation-based experience is the right match for the task. Often, it’s difficult to adopt conversational UI because it’s simply not the best overall design choice. Sometimes you’ll want an experience that’s primarily sight-based, or a combination of sight and sound. In the future, we will be able to incorporate even more senses. Many people and companies are anxious to incorporate anything voice, and that anxiety gets in the way of delivering the best experience to their users.
Sometimes, too, people and companies assume that because most everyone converses, it must be easy to build a conversation. They don’t realize, however, that to do it well requires an understanding of the underlying principles of conversation as well as the artful applications of those principles. That’s why sometimes you experience clunky conversational apps.
For people and companies already well-versed in conversational UIs, there are some good end-to-end experiences. What holds them back from wider-scope experiences is that the experience is tied to particular devices. Alexa does some hopping from the Echo to the app, but as a discipline we don’t quite have the sophistication to be platform agnostic.
We’ll get there.
Do you think that you can design interfaces to build trust?
Yes, for better or for worse. The same social engineering that makes phishing scams successful can empower designers to build interfaces that engender users’ trust. We learn cues to tell us when something can be trusted, and those cues can be codified and employed. The good news is that as humans we have the ability to sniff out danger and adapt. So, when something becomes dangerous, we alert each other and change which cues signal trust.
Tell us about the metrics that your team works toward to capture the value of conversational UI? How do you share it within your organization?
I work in healthcare, so the success of our designs is closely tied to our clients’ success.
We measure how many patients:
• Interact with our UI
• Continue to engage with the UI on subsequent occasions
• Transfer to schedule appointments
• Report health issues they have not brought up with their care team
• Report health issues requiring follow-up
• Complete the experience
Additionally, we assess our designs on the number of successful interactions and errors that occur.
Oftentimes, if there’s an issue, we’ll hear it first from our Client Services team, who have close relationships with our clients. After reviewing the design to identify the source of the problem, we’ll adjust the experience and communicate back to the Client Services team so they can share the changes with our clients.
We also collaborate closely with our Research and Marketing teams, who often generate the metrics from our databases.
Any final thoughts you would like to share?
If you want to try your hand at conversation design, go to a cafe and diagram a conversation you overhear. Really look at it. How does it evolve? What is its shape?
Then, diagram some possible ways it could have gone if a single response had been different.
You’ll be able to learn the tools to make conversation designs fairly quickly. It’s this mode of creative thinking about conversation that takes time, and practice, to develop.