
Technology Update: Conversion of Speech into Text

Deepgram's automatic speech recognition simplifies the creation of voice-enabled applications, delivering accurate, fast, and cost-effective transcription at scale.


Deepgram, a leading voice AI platform, has launched its most advanced speech-to-text model to date - Nova-3. This groundbreaking model is tailored for enterprise use cases, offering a range of features designed to streamline global operations and enhance customer engagement.

Nova-3's strength lies in its multilingual capabilities, high accuracy in noisy environments, real-time streaming integration, customization options, and seamless incorporation into a full conversational AI stack. These features make it highly suited for complex, global enterprise voice applications.

One of the standout features of Nova-3 is its real-time multilingual speech recognition with code-switching support. The model can process conversations that switch naturally between 10 languages (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch) without needing explicit routing or language-specific mechanisms. This makes it ideal for global operations handling diverse multilingual inputs seamlessly.
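To make code-switching concrete, the sketch below groups a word-level transcript into runs of consecutive words that share a language. The input shape (a list of dictionaries with `word` and `language` keys) and the sample words are hypothetical, chosen only for illustration; they are not taken from Deepgram's response schema.

```python
from itertools import groupby

# Hypothetical word-level output: each word carries a language tag.
# The field names "word" and "language" are assumptions for this sketch.
words = [
    {"word": "hello", "language": "en"},
    {"word": "team", "language": "en"},
    {"word": "vamos", "language": "es"},
    {"word": "a", "language": "es"},
    {"word": "empezar", "language": "es"},
    {"word": "okay", "language": "en"},
]


def language_runs(words):
    """Group consecutive words that share a language tag into (language, text) runs."""
    return [
        (lang, " ".join(w["word"] for w in group))
        for lang, group in groupby(words, key=lambda w: w["language"])
    ]


for lang, text in language_runs(words):
    print(f"[{lang}] {text}")
```

Each run marks a point where the speaker switched language, which is the kind of boundary a single multilingual model has to handle without routing audio to separate per-language models.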

Another key advantage of Nova-3 is its ability to deliver accurate transcriptions under challenging audio environments. This is crucial for enterprise applications, where reliable speech-to-text results are essential.

Nova-3 also offers self-service customization through keyterm prompting, allowing enterprises to tailor the model to better recognize domain-specific terminology or key phrases, improving transcription relevance and user experience.
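As a rough illustration of keyterm prompting, the sketch below builds a transcription request URL that carries one query parameter per domain term. The endpoint `https://api.deepgram.com/v1/listen` and the parameter names `model`, `language`, and `keyterm` are assumptions that are not stated in this article and should be checked against Deepgram's current API reference; the helper `build_listen_url` and the sample terms are hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current Deepgram API reference.
BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(keyterms, model="nova-3", language="en"):
    """Return a request URL with one `keyterm` query parameter per phrase.

    The parameter names used here are assumptions made for this sketch.
    """
    params = [("model", model), ("language", language)]
    params += [("keyterm", term) for term in keyterms]
    # urlencode percent-encodes values and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"


url = build_listen_url(["tretinoin", "prior authorization"])
print(url)
```

Passing the terms as repeated parameters keeps the customization entirely on the request side, which matches the self-service idea described above: no retraining step is involved.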

The model forms the transcription backbone of a unified voice agent system that combines speech-to-text, text-to-speech (with Aura-2), and large language model orchestration. This integrated runtime handles conversation flow, turn-taking, and knowledge retrieval in real time, reducing latency and increasing responsiveness.

Nova-3 also boasts improved interaction speed and responsiveness. Transcripts are incrementally processed and sent to language models as utterances occur, with speech synthesis starting before full punctuation or silence thresholds are detected. This lowers time-to-first-byte and supports interruptible, more natural conversations.
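The incremental hand-off described here can be sketched with a small generator that forwards only the newly recognized text from a growing series of partial transcripts. This is a simplified illustration under stated assumptions, not Deepgram's implementation: the function name `forward_increments` and the sample partials are hypothetical, and a real pipeline would also need to handle timing, interruptions, and final results.

```python
from typing import Iterable, Iterator


def forward_increments(partials: Iterable[str]) -> Iterator[str]:
    """Yield only the newly recognized text from a growing sequence of partial transcripts."""
    sent = ""
    for partial in partials:
        if partial.startswith(sent):
            # Normal case: the recognizer appended words to what was already sent.
            delta = partial[len(sent):]
        else:
            # The recognizer revised earlier words; resend the whole partial.
            delta = partial
        if delta.strip():
            yield delta.strip()
        sent = partial


partials = [
    "book",
    "book a table",
    "book a table for two",
    "book a table for two at seven",
]
print(list(forward_increments(partials)))
```

A downstream language model can start working on each yielded fragment as it arrives, rather than waiting for the end of the utterance, which is the source of the lower time-to-first-byte mentioned above.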

In terms of deployment, Nova-3 and associated features are available for both hosted and self-hosted use, supporting real-time streaming and pre-recorded transcription needs. The model also supports automatic formatting of entities such as dates, alphanumerics, currencies, payment details, SSNs, and time zones, improving readability and usability of transcriptions.

Deepgram emphasizes enterprise scalability and cost efficiency, with Nova-3 being part of a Voice Agent API priced affordably for large-scale deployment. The platform offers cost-efficiency and seamless updates, helping businesses stay competitive and future-proofed as they scale.

Nova-3 outperforms competitors in both batch and streaming use cases, with consistently lower Word Error Rates (WER). In streaming tests, Nova-3 achieves a WER of 6.84%, a 54.2% relative reduction compared with the next-best competitor (14.92% WER).
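For readers who want to check the arithmetic, the sketch below computes WER using the standard definition (word-level edit distance divided by the number of reference words) and the relative reduction between two WER values. The function names and the sample sentences are illustrative; only the two WER figures come from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes the reference contains at least one word.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(wer: float, baseline_wer: float) -> float:
    """Percentage by which `wer` is lower than `baseline_wer`."""
    return (baseline_wer - wer) / baseline_wer * 100


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))

# Streaming figures quoted in the article.
print(f"{relative_reduction(6.84, 14.92):.1f}%")  # prints 54.2%
```

The second result shows that the 54.2% figure is a relative improvement, (14.92 - 6.84) / 14.92, rather than a difference in percentage points.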

Moreover, Nova-3 is the industry's first voice AI model to enable self-serve customization, allowing users to fine-tune the model for specialized domains without requiring deep expertise in machine learning. In multilingual testing, Nova-3 outperforms OpenAI's Whisper across seven languages, delivering up to 8:1 preference ratios in some languages.

Deepgram's focus on continuous model and platform improvements ensures users always have access to the latest advancements. Figures 1 and 2 compare the batch and streaming WER between various voice AI models, demonstrating Nova-3's superior performance.

In summary, Nova-3 is a significant leap forward in AI-driven speech-to-text technology. Its advanced real-time multilingual conversation transcription empowers enterprises to scale globally, enhancing international customer engagement and driving business efficiency. Trusted by industry leaders like Twilio, Jack in the Box, and Kore.ai, Nova-3 is set to redefine the landscape of enterprise voice applications.


Data and cloud computing solutions enable enterprises to deploy and use Nova-3, Deepgram's advanced speech-to-text model, which uses artificial intelligence to deliver real-time multilingual speech recognition with code-switching support for global operations and customer engagement.

Nova-3's integration into a full conversational AI stack, combined with its self-service customization options, makes it highly suited for complex, global enterprise voice applications that require a high degree of accuracy even in noisy environments and support for diverse multilingual inputs.
