
Google Gemini AI Introduces Audio Transcription Feature
Google's Gemini AI assistant now lets users upload audio files for transcription, summarization, and key information extraction. The feature handles recordings up to 10 minutes long, including voice memos, lectures, meetings, and interviews, and converts them into searchable documents within the Gemini platform. It is available on both the web and mobile apps through the standard file-upload interface. Unlike Gemini Live, which focuses on real-time voice commands, the new feature analyzes pre-recorded audio.
Josh Woodward, Google's VP of Gemini, stated that audio upload was the most requested feature, which points to strong demand for simpler audio handling. In testing, transcription accuracy was high across varied types of recordings, such as comedy sketches and phone calls, though names were occasionally misrecognized. Gemini also extracted tasks, generated to-do lists, and highlighted key elements from uploaded recordings, which makes it useful for both personal and professional workflows.
The update builds on Gemini's growing set of integrations, including app connections, testing of a card-based interface, and expanded personalization tools. Competitors offer related capabilities: OpenAI's ChatGPT uses the Whisper model for transcription, Anthropic's Claude supports audio in some developer environments, and Perplexity extracts data from YouTube. Gemini aims to distinguish itself by emphasizing everyday usability for a wide audience.
Beyond transcription, Gemini can process the audio content further. Users can request output in simplified language, isolate a specific speaker's remarks, generate questions, or build study guides from recorded content. These options give users flexible ways to repurpose a recording into material they can act on.
However, limitations remain. The 10-minute cap rules out longer recordings, and free-tier users face daily usage limits that may hinder heavy users. Google has not revealed pricing for large-scale processing, but the feature draws on the standard Gemini quota, so users need to keep track of how much they consume.