SoundHound is an artificial intelligence (AI) voice assistant company with a turbulent history. After a successful IPO in 2022, the company’s stock fell around 85% in 2023 as growth slowed and it failed to turn a profit.
However, Nvidia’s recent disclosure that it has invested in SoundHound – among a number of other AI-focused companies – has seen the company’s stock value more than double over the last month.
Fresh from speaking to sister publication Just Auto in January about his company’s new in-vehicle voice assistant features, SoundHound COO Michael Zagorsek caught up with Verdict to discuss its context in the broader AI market.
What is Nvidia’s relationship with SoundHound?
Zagorsek: They’re what we refer to as a strategic investor [alongside] Oracle, Samsung, Vizio and Hyundai. We also work with them because we use their product. We’ve been partners of theirs for a long time because we’re very much in the same space and they’re a big enabler of the market that we’re in. So, absolutely, there’s a business relationship on top of the investment.
SoundHound is pretty old by tech company standards. When did you integrate AI?
There were three co-founders, Stanford University graduates. They started in 2005 with a vision, which was that people would be talking to the products around them.
SoundHound’s first chapter was in the music recognition space, specifically a humming engine. You could hum a song and it would tell you [what it is]. In those first 10 years, that [voice assistant] part of the business was being built in stealth. The founders were working on our voice AI platform, which was essentially a voice assistant: you ask a question, it provides a response. Almost 10 years to the day [after founding], in 2015, we announced our voice assistant capability.
We have taken that voice interface, and we’ve been applying it in two different ways. One is in products. Cars are a great market for us because there’s a voice interface already in cars, and we’ve grown to over 20 brands. Our customers in automotive represent 25% of the market there, and they’re continuing to add our capabilities to all of their models. We’ve got a few TV brands in the US, so millions of televisions use our voice interface.
More recently, it’s in the service industry. In particular, the restaurant sector has staffing challenges; they need automation to help take orders faster and more consistently. So we can take orders over the phone when you call a restaurant. It’s entirely our conversational AI technology. Drive-thrus as well: you go to a drive-thru, you make your order [with our software]. That’s a big, growing market for us.
We have our own proprietary voice AI technology. It takes what I’m saying and transcribes it, but it also applies natural language understanding. This is the key difference. A transcription engine isn’t necessarily trying to figure out what you mean; it just needs to make sure that my words are accurately represented. Natural language understanding is really about making sense of what you’re saying.
We brought those two together early, and with generative AI and ChatGPT it really understands well. But you can’t just plug a voice interface into that, because it’ll get things wrong. We were able to incorporate it into our technology stack so that it becomes a part of our voice assistant, not the entirety of it. That’s really where the excitement is today.
What are the key differences between your offering and your Big Five competitors besides scale?
It’s always an interesting dilemma. If you’re a company that’s in a market and you’re the only one, that’s advantageous because you can capture more market share. If you’re in a market where there is competition, especially [from] massive companies like Google and Amazon, that can be positive too. When Google Assistant and Alexa came forward, it really accelerated people’s awareness and appreciation for voice interfaces. We weren’t in the market to build speakers in the home, and they were, so people started to appreciate it.
Then [companies] would say, well, we want a voice experience ourselves, and they didn’t necessarily want Amazon or Google to infiltrate their product. So one of the services we provide is a wake word. We will tune a phrase for customers so that if you say something like “Hey Hyundai”, it activates the assistant. That’s important because it is a gateway into a customized voice experience that we can then build.
We also believe that smaller, more focused companies can be disruptors. This is all we focus on, not just a part of it. For example, we have the ability to handle compound statements. If you said something like, “show me Asian restaurants within a mile of me except Japanese and Korean”, it would understand those exceptions, whereas the big tech providers’ systems weren’t built to do that. So they would just include them, [or] sometimes they would think you just want Japanese and Korean and only show those.
Are you confident that your software is avoiding the risk of hallucinations?
Our track record to date has been pretty strong, meaning that we haven’t experienced anything like that. I should also point out that we’re not limited to GPT; there are other large language models emerging, and the nice thing about what we’ve built is we are agnostic to who we use. We think there’s going to be a world where there are multiple LLMs – broad ones, narrow ones – and you want to be able to use the best of what’s out there. ChatGPT is [just] the first and furthest along so they’re dominating the headlines, and I think justifiably so. For the user, we want it to feel like one assistant, even when it’s switching from ChatGPT to navigation, for instance. If you try to coax a particular response, we have some proprietary things we do to really make sure we don’t enter the realm of these hallucinations or unpredictable behaviours.
Where do you see this going in a blue-sky world?
When we step back and really look at the road ahead, we have essentially two predictions with a bonus third one. The first is that customer service AI will become as important to businesses as electricity and Wi-Fi. If you have a business, you obviously need to have power, you need to have connectivity [and eventually] you’re going to have customer service AI. There will be an AI that will be able to interact with your customers and ultimately scale whatever team you have.
The second prediction is that voice AI becomes the preferred way to interact with all our devices. It’s natural, and it’s really not that hard to put a speaker and a microphone into a physical product; it’s much easier than adding a big screen and having to design an interface. As we know, when the next generation of kids are exposed to a product they can talk to, they talk to it right away, even before they can type and swipe, so that becomes the natural way.
The third is that there will be companies that will capitalise on this, and they won’t always come from the places you expect. Naturally, everyone looks to the Big Tech players, but I think the reason there’s interest in what we’re doing is because the landscape is broad. Very specialized, capable organizations who’ve been working on this for a while will be in a position to capitalize on it. Just yesterday Stellantis, which owns Peugeot, Citroën, Vauxhall, Opel and the DS brand, announced that they’re going live in production with our ChatGPT integration. They will be the first car company to have an in-production vehicle that incorporates ChatGPT. They’re pioneers thanks to the work we’re doing together.