Technology Update: Speech-to-Text Conversion

Deepgram's automatic speech recognition streamlines the building of voice-enabled applications, delivering accurate, fast, and cost-effective transcription at scale.

Administrator

July 7, 2025, 11:35 PM

Deepgram, a voice AI platform, has launched Nova-3, its most advanced speech-to-text model to date. The model is tailored for enterprise use cases, offering a range of features designed to streamline global operations and enhance customer engagement.

Nova-3's strength lies in its multilingual capabilities, high accuracy in noisy environments, real-time streaming integration, customization options, and seamless incorporation into a full conversational AI stack. These features make it highly suited for complex, global enterprise voice applications.

One of the standout features of Nova-3 is its real-time multilingual speech recognition with code-switching support. The model can process conversations that switch naturally between 10 languages (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch) without needing explicit routing or language-specific mechanisms. This makes it ideal for global operations handling diverse multilingual inputs seamlessly.
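As a rough illustration of what a code-switching request might look like, the sketch below builds the query URL for a transcription call. The endpoint `https://api.deepgram.com/v1/listen` and the `model=nova-3` and `language=multi` parameters are assumptions drawn from Deepgram's public API documentation, not from this article, and the helper name `build_listen_url` is hypothetical. Verify them against the current documentation before use.

```python
from urllib.parse import urlencode

# Assumed endpoint; the article itself does not name it.
LISTEN_ENDPOINT = "https://api.deepgram.com/v1/listen"


def build_listen_url(model: str = "nova-3", language: str = "multi") -> str:
    """Return a transcription URL carrying the model and language options.

    "multi" is assumed to select multilingual (code-switching) recognition,
    so no per-language routing is encoded in the request.
    """
    params = {"model": model, "language": language}
    return f"{LISTEN_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_listen_url())
```

The point of the sketch is that a single request configuration covers all supported languages; the client does not detect the language first or route audio to a language-specific model.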

Another key advantage of Nova-3 is its ability to deliver accurate transcriptions in challenging audio environments, such as recordings with heavy background noise. This is crucial for enterprise applications, where reliable speech-to-text results are essential.

Nova-3 also offers self-service customization through keyterm prompting, allowing enterprises to tailor the model to better recognize domain-specific terminology or key phrases, improving transcription relevance and user experience.
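A minimal sketch of how keyterm prompting might be expressed in a request follows. It assumes that each term is passed as its own `keyterm` query parameter, which matches Deepgram's public documentation as far as we know but is not stated in this article. The function name `build_keyterm_query` and the example terms are hypothetical.

```python
from urllib.parse import urlencode


def build_keyterm_query(keyterms: list[str], model: str = "nova-3") -> str:
    """Encode one `keyterm` parameter per domain-specific word or phrase."""
    # A list of pairs (rather than a dict) lets the same key repeat.
    pairs = [("model", model)] + [("keyterm", term) for term in keyterms]
    return urlencode(pairs)


if __name__ == "__main__":
    # Illustrative terms only; spaces inside a phrase are encoded as "+".
    print(build_keyterm_query(["tretinoin", "prior authorization"]))
```

Because the terms travel with each request, they can be changed per call without retraining or redeploying a model, which is what makes the customization self-service.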

The model forms the transcription backbone of a unified voice agent system that combines speech-to-text, text-to-speech (with Aura-2), and large language model orchestration. This integrated runtime handles conversation flow, turn-taking, and knowledge retrieval in real time, reducing latency and increasing responsiveness.

Nova-3 also boasts improved interaction speed and responsiveness. Transcripts are incrementally processed and sent to language models as utterances occur, with speech synthesis starting before full punctuation or silence thresholds are detected. This lowers time-to-first-byte and supports interruptible, more natural conversations.
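The incremental hand-off described above can be pictured with a small, self-contained sketch. It is not Deepgram's runtime: `partial_transcripts`, `respond_incrementally`, and the `min_words` threshold are hypothetical stand-ins that only show partial transcripts being forwarded before the utterance has ended.

```python
from typing import Iterable, Iterator


def partial_transcripts() -> Iterator[str]:
    """Stand-in for a streaming recognizer that emits words as they are heard."""
    yield from ["what", "is", "my", "account", "balance"]


def respond_incrementally(words: Iterable[str], min_words: int = 3) -> list[str]:
    """Forward the growing transcript downstream once min_words have arrived.

    Returns every transcript prefix that was forwarded, in order.
    """
    heard: list[str] = []
    forwarded: list[str] = []
    for word in words:
        heard.append(word)
        if len(heard) >= min_words:
            # A real system would pass this prefix to the language model here
            # and could begin speech synthesis on the first response tokens.
            forwarded.append(" ".join(heard))
    return forwarded


if __name__ == "__main__":
    for prefix in respond_incrementally(partial_transcripts()):
        print(prefix)
```

In this toy run the downstream stage first receives "what is my", two words before the speaker finishes. Starting work on a prefix instead of waiting for a silence threshold is how such a pipeline can reduce time-to-first-byte.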

In terms of deployment, Nova-3 and associated features are available for both hosted and self-hosted use, supporting real-time streaming and pre-recorded transcription needs. The model also supports automatic formatting of entities such as dates, alphanumerics, currencies, payment details, SSNs, and time zones, improving readability and usability of transcriptions.

Deepgram emphasizes enterprise scalability and cost efficiency: Nova-3 is part of a Voice Agent API priced for large-scale deployment. The platform also provides seamless updates, helping businesses stay competitive as they scale.

Nova-3 outperforms competitors in both batch and streaming use cases, with consistently lower Word Error Rates (WER). In streaming, Nova-3 achieves a WER of 6.84%, a 54.2% relative reduction compared with the next-best competitor (14.92% WER).
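For readers unfamiliar with the metric, the sketch below computes WER using its standard definition (word-level edit distance divided by the number of reference words) and reproduces the relative-improvement arithmetic behind the quoted figures. The article does not describe the benchmark methodology, so this is only the textbook calculation, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the error rate versus `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(round(relative_reduction(14.92, 6.84) * 100, 1))
```

The second call prints 54.2, since (14.92 - 6.84) / 14.92 ≈ 0.542. The 54.2% figure is therefore a relative reduction in errors, not a difference in percentage points, which would be 8.08.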

Moreover, Nova-3 is the industry's first voice AI model to enable self-serve customization, allowing users to adapt the model to specialized domains without requiring deep expertise in machine learning. In multilingual testing, Nova-3 outperforms OpenAI's Whisper across seven languages, with preference ratios of up to 8:1 in some languages.

Deepgram's focus on continuous model and platform improvements ensures users always have access to the latest advancements. Figures 1 and 2 compare the batch and streaming WER between various voice AI models, demonstrating Nova-3's superior performance.

In summary, Nova-3 is a significant leap forward in AI-driven speech-to-text technology. Its advanced real-time multilingual conversation transcription empowers enterprises to scale globally, enhancing international customer engagement and driving business efficiency. Trusted by industry leaders like Twilio, Jack in the Box, and Kore.ai, Nova-3 is set to redefine the landscape of enterprise voice applications.

Data and cloud computing solutions enable enterprises to deploy and use Nova-3, Deepgram's advanced speech-to-text model, which uses artificial intelligence to deliver real-time, multilingual speech recognition with code-switching support for global operations and enhanced customer engagement.

Nova-3's integration into a full conversational AI stack, combined with its self-service customization options, makes it highly suited for complex, global enterprise voice applications that require a high degree of accuracy even in noisy environments and support for diverse multilingual inputs.
