AI Weekly: Nvidia’s Commitment to Voice AI — And A Farewell

This week, Nvidia announced a slew of AI-focused hardware and software innovations at its March GTC 2022 conference. The company unveiled the Grace CPU Superchip, a data center processor designed for high-performance computing and AI applications. It also detailed the H100, the first in a new line of GPU hardware aimed at accelerating AI workloads, including training large natural language models.

But one announcement that slipped under the radar was the general availability of Nvidia’s Riva 2.0 SDK, as well as the company’s managed offering, Riva Enterprise. Both can be used to build AI applications for speech, and both point to the growing market for speech recognition in particular. The speech and voice recognition market is expected to grow from $8.3 billion in 2021 to $22.0 billion in 2026, according to Markets and Markets, driven by enterprise applications.

In 2018, a Pindrop survey of 500 IT and business decision makers found that 28% were using voice technology with customers. Meanwhile, in 2019, Gartner predicted that 25% of digital workers would be using virtual employee assistants on a daily basis by 2021. And a recent Opus survey found that 73% of executives see value in AI speech technologies for “operational efficiency.”

“As speech AI expands into new applications, enterprise data scientists want to develop, adapt and deploy speech applications,” an Nvidia spokesperson told VentureBeat via email. “Riva 2.0 includes strong integration with TAO, a low-code solution for data scientists, to customize and deploy speech applications. This is an active area of focus and we plan to make the workflow even more accessible to customers in the future. We’ve also introduced Riva to early access embedded platforms, and we’ll share more at a later date.”

Nvidia says Snap, the company behind Snapchat, has integrated Riva’s automatic speech recognition and text-to-speech technologies into its developer platform. RingCentral, another customer, uses Riva’s automatic speech recognition for live captioning of video conferences.

Speech technologies also encompass speech generation, including voice cloning tools that use AI to mimic the pitch and prosody of a person’s speech. Last fall, Nvidia unveiled Riva Custom Voice, a new toolkit that the company claims can enable customers to create custom, “human-like” voices with as little as 30 minutes of voice recording data.

Brand voices like Flo from Progressive are often tasked with recording phone trees and e-learning scripts for corporate training video series. For businesses, the costs can add up: one source puts the average hourly rate for voice actors at $39.63, plus additional costs for interactive voice response (IVR) prompts. Synthesis could increase actors’ productivity by reducing the need for additional recordings, potentially freeing them to pursue more creative work, all while saving companies money.

According to Markets and Markets, the global voice cloning market could grow from $456 million in 2018 to $1.739 billion in 2023.

On the horizon, Nvidia sees new speech applications entering production via augmented reality, video conferencing and conversational AI. Customers expect high accuracy and ways to customize speech experiences, the company says.

“Low-code solutions for speech AI [will continue to grow] because non-software developers want to build, refine, and deploy voice solutions,” the spokesperson continued, referring to low-code development platforms that require little to no coding to build voice apps. “New research brings emotional text-to-speech and transforms the way humans interact with machines.”

As exciting as these technologies are, they will introduce, and already have introduced, new ethical challenges. For example, fraudsters have used cloning to imitate a CEO’s voice well enough to initiate a wire transfer. And some speech recognition and text-to-speech algorithms have been shown to recognize the voices of minority users less accurately than those of users with more common inflections.

It is the duty of companies like Nvidia to make an effort to address these challenges before putting their technologies into production. To its credit, the company has taken steps in the right direction, such as banning the use of Riva to create “fraudulent, false, misleading or deceptive” content, as well as content that “promote[s] discrimination, bigotry, racism, hatred, harassment or harm to any person or group.” Hopefully, more will follow along these lines.

A farewell

As a supplement to this week’s newsletter, I am saddened to announce that I am leaving VentureBeat to pursue professional opportunities elsewhere. This edition of AI Weekly will be my last – a bittersweet realization, indeed, as I try to find the words to put on paper.

When I joined VentureBeat as an AI staff writer four years ago, I had only a vague idea of the arduous journey ahead. I wasn’t exceptionally well versed in AI (my background was in consumer technology), and the industry jargon was overwhelming to me, not to mention contradictory. But as I learned, mostly from those on the academic side of data science, an open mind, along with a frank willingness to admit ignorance, is perhaps the most important ingredient in understanding AI.

I haven’t always been able to do that. But as a reporter, I’ve tried not to lose sight of the fact that my domain knowledge pales in comparison to that of the titans of industry and academia. Whether addressing stories of bias in computer vision models or the environmental impact of training language systems, it has been my policy to lean on others for their expert perspectives and present those perspectives, lightly edited, to readers. As I see it, my job is to contextualize and relay, not pontificate. There’s a place for pontification, but it’s on opinion pages, not in news articles.

I’ve learned that a healthy dose of skepticism also goes a long way when reporting on AI. It’s not just snake oil sellers that you should be wary of, but also the well-oiled PR companies, lobbyists and paid consultants that claim to prevent harm but in fact do the opposite. I have lost track of the number of ethics committees that have been dissolved or have turned out to be toothless, the number of malicious algorithms that have been resold to customers, and the number of companies that have attempted to silence or undermine whistleblowers.

The silver lining is regulators’ growing awareness of the industry’s deception. But as elsewhere in Silicon Valley, techno-optimism has turned out to be nothing more than a publicity tool.

It’s easy to get carried away by the novelty of new technology. I once did, and still do. The challenge is to recognize the danger in this novelty. I am reminded of the Chilean writer Benjamín Labatut’s novel When We Cease to Understand the World, which examines great scientific discoveries that have led in equal parts to prosperity and untold suffering. For example, German chemist Fritz Haber developed the Haber-Bosch process, which synthesizes ammonia from nitrogen and hydrogen gases and almost certainly prevented famine by enabling mass production of fertilizers. At the same time, the Haber-Bosch process made the production of explosives simpler and cheaper, contributing to the deaths of millions of soldiers during World War I.

AI, like the Haber-Bosch process, has the potential for tremendous good, and good actors are desperately trying to make it happen. But any technology can be abused, and it’s the job of reporters to expose and highlight those abuses, ideally to bring about change. I hope that, along with my distinguished colleagues at VentureBeat, I have achieved this to some extent. Here’s to a future of strong AI reporting.

For AI reporting, subscribe to the AI Weekly newsletter and bookmark our AI channel, The Machine.

Thank you for reading,

Kyle Wiggers

Senior AI Staff Writer

