Voice Input¶

Replace typing with speaking - everywhere on your computer.

Two powerful use cases:

Talk to DeskAgent - Give instructions by voice instead of typing, for example "Reply to this email professionally" or "Create an offer for this customer".
Dictate in any application - Use voice input in Word, your browser, email clients, and chat apps (anywhere you can type). Your speech is transcribed and inserted into the active text field.

Both use OpenAI's Whisper for professional-grade transcription that handles technical terms, names, and multiple languages with high accuracy.

Overview¶

DeskAgent supports voice input in two ways:

Method	How to use
WebUI Microphone	Click the mic button in the chat input area
System Hotkey	Press a hotkey from any application - even with DeskAgent minimized

Both methods use OpenAI's Whisper for accurate speech recognition.

Requirements¶

OpenAI API Key Required

Voice input requires an OpenAI API key for the Whisper transcription service.

Cost: about $0.006 per minute of audio (0.6 cents per minute)
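As a worked example of that rate, cost grows linearly with recorded minutes. The sketch below assumes 10 minutes of dictation per working day and 22 working days per month (illustrative numbers, not measurements). `transcription_cost` is a hypothetical helper, and OpenAI's pricing may change, so check the current rate:

```python
# Hypothetical helper: estimate Whisper transcription cost from recorded minutes.
# The rate is the one quoted on this page; OpenAI's pricing may change.
RATE_PER_MINUTE_USD = 0.006


def transcription_cost(minutes: float, rate: float = RATE_PER_MINUTE_USD) -> float:
    """Return the estimated cost in US dollars for `minutes` of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * rate


# Illustrative usage: 10 minutes per day over 22 working days.
monthly_minutes = 10 * 22
print(f"{monthly_minutes} min per month is about ${transcription_cost(monthly_minutes):.2f}")
```

With these assumptions that is 220 minutes, or about $1.32 per month.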

Setup¶

1. Get an API key from the OpenAI Platform.
2. Add it to the ai_backends section of config/backends.json:

"ai_backends": {
  "openai": {
    "type": "openai_api",
    "api_key": "sk-your-api-key-here"
  }
}
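To confirm the key is where DeskAgent looks for it, a small check like the one below can help. It is a sketch that assumes config/backends.json holds one JSON object with the key at ai_backends.openai.api_key (the path named under Troubleshooting) and that you run it from the DeskAgent folder. The function names are hypothetical, and the check only tests that a key is present, not that OpenAI accepts it:

```python
import json
from pathlib import Path

# Placeholder value shown in this guide; it is not a working key.
PLACEHOLDER_KEY = "sk-your-api-key-here"


def openai_key_from_config(data: dict) -> str:
    """Hypothetical helper: return ai_backends.openai.api_key, or "" if missing."""
    return data.get("ai_backends", {}).get("openai", {}).get("api_key", "")


def openai_key_configured(path: str = "config/backends.json") -> bool:
    """Hypothetical helper: True if the file has a non-empty, non-placeholder key."""
    config_file = Path(path)
    if not config_file.is_file():
        return False
    data = json.loads(config_file.read_text(encoding="utf-8"))
    key = openai_key_from_config(data)
    return bool(key) and key != PLACEHOLDER_KEY


if __name__ == "__main__":
    print("OpenAI API key configured:", openai_key_configured())
```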

WebUI Voice Input¶

How to Use¶

1. Click the microphone button (🎤) next to the text input.
2. Speak your request. The button pulses while recording.
3. Click again to stop. Your speech is transcribed and, if auto-submit is enabled (the default), sent automatically.

Voice input button: the microphone button (🎤) next to the input field.

Recording active: while recording, a red stop button ends the recording.

Keyboard Shortcuts¶

Shortcut	Action
`Ctrl`+`M`	Start/stop recording
`Esc`	Cancel recording

Auto-Submit¶

By default, the transcribed text is automatically submitted. To review before sending:

config/system.json

"voice_input": {
  "auto_submit": false
}

Agent Input Dialogs¶

Voice input also works in agent pre-prompt dialogs. When an agent requires text input before starting (like a description or instructions), you can use the microphone button to dictate instead of typing.

This is especially useful for agents like:

Archive Files - Dictate the description for documents
Create Offer - Speak special requirements or notes
Any agent with text inputs - Look for the 🎤 button next to text fields

System-Wide Hotkeys¶

The real power comes from system-wide hotkeys. They work in any application (Outlook, your browser, Word, and so on), even while DeskAgent is minimized.

Available Hotkeys¶

Hotkey	Name	Action
`Ctrl`+`Shift`+`Space`	Dictate	Record → paste text into active app
`Ctrl`+`Shift`+`Enter`	Dictate + Enter	Record → paste text → press Enter
`Ctrl`+`Shift`+`Backspace`	Agent	Record → start email reply agent

Dictate Mode¶

Dictate into any application:

1. Click in a text field (Word, browser, Notepad, chat, etc.)
2. Press Ctrl+Shift+Space → 🎤 Recording starts
3. Dictate your text
4. Press Ctrl+Shift+Space again → Text is pasted

Tip: Use Ctrl+Shift+Enter to paste and press Enter automatically - perfect for chat apps like Teams or Slack.
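The Troubleshooting section lists pyperclip as a requirement for pasting, which suggests the transcript is put on the clipboard and then pasted into the focused field. Below is a minimal sketch of that final step under that assumption; it is not DeskAgent's actual code. `paste_transcript` is a hypothetical helper, and the clipboard and key-sending functions are passed in so the logic can be tried without touching the real clipboard:

```python
import time
from typing import Callable


def paste_transcript(
    text: str,
    copy: Callable[[str], None],
    send_keys: Callable[[str], None],
    press_enter: bool = False,
    settle_seconds: float = 0.05,
) -> bool:
    """Hypothetical helper: paste text into the focused window via the clipboard.

    copy:      puts text on the clipboard, e.g. pyperclip.copy
    send_keys: sends a key combination, e.g. keyboard.send
    Returns False (and does nothing) when the transcript is empty.
    """
    if not text.strip():
        return False  # nothing recognized; avoid pasting an empty string
    copy(text)
    time.sleep(settle_seconds)  # give the clipboard a moment to update
    send_keys("ctrl+v")
    if press_enter:  # Dictate + Enter mode, e.g. for Teams or Slack
        send_keys("enter")
    return True
```

With the real libraries installed, Dictate + Enter would correspond to a call like `paste_transcript(text, pyperclip.copy, keyboard.send, press_enter=True)`. A clipboard-based paste like this one replaces whatever was on the clipboard before.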

Agent Mode¶

Start the email reply agent with voice instructions:

1. Select an email in Outlook
2. Press Ctrl+Shift+Backspace → 🎤 Recording starts
3. Say: "Please reply professionally, mention our 30-day trial"
4. Press Ctrl+Shift+Backspace again → Recording stops
5. DeskAgent starts the reply agent with your instructions

The agent reads the selected email, drafts a reply based on your instructions, and opens it in Outlook for review.

Configuration¶

Full configuration options in config/system.json:

config/system.json

"voice_input": {
  "enabled": true,
  "language": "de",
  "auto_submit": true,
  "hotkey": "Ctrl+M",
  "dictate_hotkey": "Ctrl+Shift+Space",
  "dictate_hotkey_enter": "Ctrl+Shift+Enter",
  "agent_hotkey": "Ctrl+Shift+Backspace",
  "outlook_agent": "reply_email"
}

Option	Default	Description
`enabled`	`true`	Enable/disable voice input globally
`language`	`"de"`	Transcription language (de, en, fr, etc.)
`auto_submit`	`true`	Auto-send after transcription in WebUI
`hotkey`	`"Ctrl+M"`	WebUI recording hotkey
`dictate_hotkey`	`"Ctrl+Shift+Space"`	Dictate hotkey (paste text)
`dictate_hotkey_enter`	`"Ctrl+Shift+Enter"`	Dictate + Enter hotkey
`agent_hotkey`	`"Ctrl+Shift+Backspace"`	Agent hotkey (starts `outlook_agent`)
`outlook_agent`	`"reply_email"`	Agent to start with agent hotkey
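Because all four hotkeys are configurable, it is easy to assign the same combination twice or misspell an option name. The sketch below merges a voice_input section over the defaults from the table and reports unknown options and duplicate hotkeys. The function names are hypothetical, and DeskAgent may validate its settings differently or accept options not listed here:

```python
# Defaults copied from the configuration table above.
DEFAULTS = {
    "enabled": True,
    "language": "de",
    "auto_submit": True,
    "hotkey": "Ctrl+M",
    "dictate_hotkey": "Ctrl+Shift+Space",
    "dictate_hotkey_enter": "Ctrl+Shift+Enter",
    "agent_hotkey": "Ctrl+Shift+Backspace",
    "outlook_agent": "reply_email",
}
HOTKEY_OPTIONS = ("hotkey", "dictate_hotkey", "dictate_hotkey_enter", "agent_hotkey")


def normalize_hotkey(combo: str) -> frozenset:
    """Hypothetical helper: compare ignoring case and key order ("m+CTRL" == "Ctrl+M")."""
    return frozenset(part.strip().lower() for part in combo.split("+"))


def check_voice_input(section: dict) -> list:
    """Hypothetical helper: list problems in a voice_input section (empty if none)."""
    settings = {**DEFAULTS, **section}
    # Options missing from the table above, e.g. a typo such as "langauge".
    problems = [f"unknown option: {key}" for key in section if key not in DEFAULTS]
    seen = {}
    for option in HOTKEY_OPTIONS:
        combo = normalize_hotkey(settings[option])
        if combo in seen:
            problems.append(f"{option} duplicates {seen[combo]}: {settings[option]}")
        else:
            seen[combo] = option
    return problems
```

Assuming voice_input is a top-level key in config/system.json, you would load the file with `json.load` and pass the `"voice_input"` object to `check_voice_input`; an empty list means nothing was flagged.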

Improving Recognition¶

Whisper works well out of the box, but you can improve accuracy for specialized terms.

Keywords File (Recommended)¶

Create knowledge/whisper_keywords.md with terms Whisper should recognize:

knowledge/whisper_keywords.md

realvirtual GmbH, game4automation, DeskAgent, Digital Twin, Unity
OPC UA, PLC, Siemens, Beckhoff, MQTT
Professional Edition, Research & Education Bundle
Thomas Strigl, Kranya

Include:

Company and product names
Industry terms and acronyms
People's names
Unusual spellings

Tip: Keep it to ~20 keywords for best performance.
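OpenAI's transcription API accepts an optional prompt that can steer Whisper toward particular spellings of names and acronyms. This page does not say how DeskAgent hands the keywords file to Whisper; the sketch below assumes the terms are sent as that prompt. It uses the official openai Python package (v1 client) and the whisper-1 model. `load_keywords`, `build_prompt`, and `transcribe` are hypothetical helpers, and the 20-term cap mirrors the tip above:

```python
from pathlib import Path


def load_keywords(text: str, limit: int = 20) -> list:
    """Hypothetical helper: split comma/newline-separated terms, drop blanks and duplicates."""
    keywords = []
    for line in text.splitlines():
        for item in line.split(","):
            term = item.strip()
            if term and term not in keywords:
                keywords.append(term)
    return keywords[:limit]  # the tip above suggests about 20 terms


def build_prompt(keywords: list) -> str:
    """Hypothetical helper: join terms into a vocabulary hint for Whisper's prompt."""
    return ", ".join(keywords)


def transcribe(audio_path: str,
               keywords_path: str = "knowledge/whisper_keywords.md",
               language: str = "de") -> str:
    """Hypothetical helper: transcribe with whisper-1, passing the keywords as a prompt."""
    from openai import OpenAI  # imported here so the helpers above work without the SDK

    text = Path(keywords_path).read_text(encoding="utf-8")
    prompt = build_prompt(load_keywords(text))
    client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=audio, language=language, prompt=prompt
        )
    return result.text
```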

Automatic Extraction¶

If you don't create a keywords file, DeskAgent automatically extracts terms from:

knowledge/company.md
knowledge/products.md

Audio Feedback¶

DeskAgent provides audio feedback so you know what's happening:

Sound	Meaning
High beep (800 Hz)	Recording started
Low beep (400 Hz)	Recording stopped
Soft ticks	Processing/transcribing
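The two beeps are simple tones at the listed frequencies. If you are unsure whether you can hear the feedback, the sketch below writes similar sine tones to WAV files using only Python's standard library, so you can play them through the same speakers. The 0.15-second duration and 50% volume are assumptions; DeskAgent's actual beeps may differ in length and volume, and `write_beep` is a hypothetical helper:

```python
import math
import struct
import wave


def write_beep(path: str, freq_hz: float, duration_s: float = 0.15,
               sample_rate: int = 44100, volume: float = 0.5) -> int:
    """Hypothetical helper: write a mono 16-bit sine tone to a WAV file, return frame count."""
    n_frames = round(duration_s * sample_rate)
    amplitude = int(volume * 32767)  # 32767 is the 16-bit maximum
    frames = b"".join(
        struct.pack("<h", int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return n_frames
```

For example, `write_beep("start_beep.wav", 800)` and `write_beep("stop_beep.wav", 400)` produce tones at the start and stop frequencies from the table.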

Outlook Web Support¶

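The agent hotkey also works with Outlook Web (Office 365 in the browser):

1. Open Outlook Web in Chrome or Edge.
2. Click an email to select it (the URL should then contain its message ID).
3. Press Ctrl+Shift+Backspace to start recording, speak your instructions, and press it again to stop.
4. DeskAgent extracts the message ID from the URL.
5. The reply agent processes the email just as it does in desktop Outlook.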
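Outlook Web URLs for an open message commonly look like https://outlook.office.com/mail/inbox/id/ followed by a percent-encoded ID. This page does not document the URL layout, so the host list and the "/id/" pattern below are assumptions; `message_id_from_url` is a hypothetical helper showing how such an ID could be pulled out of a URL:

```python
from typing import Optional
from urllib.parse import unquote, urlparse

# Assumed Outlook Web hosts; extend the set if your tenant uses another one.
OUTLOOK_HOSTS = {"outlook.office.com", "outlook.office365.com", "outlook.live.com"}


def message_id_from_url(url: str) -> Optional[str]:
    """Hypothetical helper: return the decoded segment after "/id/", or None."""
    parsed = urlparse(url)
    if parsed.hostname not in OUTLOOK_HOSTS:
        return None
    segments = parsed.path.split("/")
    if "id" not in segments:
        return None  # e.g. the inbox list view, where no message is selected
    index = segments.index("id")
    if index + 1 >= len(segments) or not segments[index + 1]:
        return None
    # IDs are percent-encoded in the URL ("+" as %2B, "=" as %3D).
    return unquote(segments[index + 1])
```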

Browser Integration

On first use, DeskAgent may show a consent dialog for browser integration. Browser integration starts the browser with remote debugging enabled so DeskAgent can read the URL of the current tab.
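Chrome and Edge, when started with remote debugging, serve a small HTTP endpoint that lists open tabs and their URLs, which is one way a tool can read the current page address. The sketch below assumes the browser was started with --remote-debugging-port=9222 (the page does not say which port DeskAgent uses) and picks the first tab whose host starts with "outlook."; it is not DeskAgent's implementation, and `list_targets` and `find_outlook_tab` are hypothetical helpers:

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlparse


def list_targets(port: int = 9222) -> list:
    """Hypothetical helper: fetch the browser's open targets (tabs, workers, ...)."""
    url = f"http://127.0.0.1:{port}/json/list"
    with urllib.request.urlopen(url, timeout=2) as response:
        return json.load(response)


def find_outlook_tab(targets: list) -> Optional[str]:
    """Hypothetical helper: URL of the first page target on an outlook.* host."""
    for target in targets:
        if target.get("type") != "page":
            continue  # skip service workers, extensions, etc.
        url = target.get("url", "")
        host = urlparse(url).hostname or ""
        if host.startswith("outlook."):
            return url
    return None
```

Calling `find_outlook_tab(list_targets())` returns the first matching tab, which may not be the tab you are looking at when several Outlook tabs are open.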

Troubleshooting¶

Voice button not showing¶

Check: Is the OpenAI API key configured?

# In DeskAgent chat, ask:
"Is voice input available?"

"OpenAI API key not configured"¶

Add your API key to config/backends.json under ai_backends.openai.api_key.

Recording doesn't start¶

Check dependencies:

pip install sounddevice soundfile numpy pyperclip keyboard pynput
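To see which of these packages are missing before reinstalling, a short check with importlib can help. Run it with the same Python interpreter that runs DeskAgent; for these six packages the import name matches the pip name. `missing_modules` is a hypothetical helper:

```python
import importlib.util

# Packages from the pip command above; each one is imported under the same name.
REQUIRED = ("sounddevice", "soundfile", "numpy", "pyperclip", "keyboard", "pynput")


def missing_modules(names=REQUIRED) -> list:
    """Hypothetical helper: names the current Python interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All voice input dependencies are installed.")
```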

Text not pasting (Dictate mode)¶

Make sure a text field is focused
Try clicking in the target field before pressing the hotkey
Check if pyperclip is installed

Agent hotkey not starting agent¶

Make sure an email is selected in Outlook (single click, not opened)
Check that outlook_agent is configured in config/system.json
On Outlook Web: Make sure you're on an email detail page (URL contains message ID)

Poor transcription quality¶

Create a knowledge/whisper_keywords.md file
Speak clearly and at normal pace
Reduce background noise
Check your microphone settings

API Details¶

For developers integrating voice input:

Endpoint	Method	Description
`/transcribe/status`	GET	Check availability and config
`/transcribe`	POST	Transcribe audio file (multipart/form-data)

Transcription cost: $0.006 per minute (tracked in cost statistics)
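The endpoints can be called from any HTTP client. The sketch below uses only Python's standard library. The base URL, the multipart field name "file", and the JSON response format are assumptions not stated on this page, so adjust them to your installation. `encode_multipart`, `get_status`, and `transcribe_file` are hypothetical helpers:

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Hypothetical address; replace with the host and port your DeskAgent instance uses.
BASE_URL = "http://localhost:8000"


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str = "audio/wav") -> tuple:
    """Hypothetical helper: one-file multipart/form-data body plus its Content-Type."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def get_status(base_url: str = BASE_URL) -> dict:
    """Hypothetical helper: GET /transcribe/status, assuming a JSON response."""
    with urllib.request.urlopen(f"{base_url}/transcribe/status", timeout=5) as resp:
        return json.load(resp)


def transcribe_file(path: str, base_url: str = BASE_URL, field: str = "file") -> dict:
    """Hypothetical helper: POST an audio file to /transcribe as multipart/form-data."""
    audio = Path(path)
    body, content_type = encode_multipart(field, audio.name, audio.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)
```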

Next Steps¶

Keyboard Shortcuts - Learn all the shortcuts for efficient work (Your Assistant)
Email Automation - Automate your email workflows (Email Guide)