Testing & Comparison
Test agents safely and compare AI backend performance.
Overview
DeskAgent includes the following testing features for developers:
| Feature | Description |
|---|---|
| Dry-Run Mode | Execute agents without making actual changes |
| Backend Comparison | Run same agent across multiple AI backends |
| Split View | See parallel streaming output from all backends |
| Simulated Actions | See what actions would happen without executing |
Developer Mode Required
Testing features require developer_mode: true in config/system.json.
Quick Access
| Action | Trigger |
|---|---|
| Preview (Dry-Run) | Right-click agent → "Vorschau" |
| Compare Backends | Ctrl+Shift+Click on agent tile |
| Compare with Dialog | Right-click → "Vergleichen" |
Dry-Run Mode
Dry-run mode lets you test agents against real data without executing destructive operations.
How It Works
- Read operations execute normally (real emails, real documents)
- Write operations are simulated (moves, deletes, flags, transfers)
- Results show what would have happened
Simulated Operations
These tools return simulated success instead of executing:
| Category | Tools |
|---|---|
| Outlook | move_email, delete_email, flag_email, batch_email_actions |
| SEPA | create_sepa_transfer, create_sepa_batch |
| Paperless | upload_document, update_document, delete_document |
| Billomat | create_offer, create_invoice, finalize_offer |
Using Dry-Run
Via Right-Click Menu:
- Right-click on an agent tile
- Select "Vorschau" (Preview)
- Agent runs with dry-run enabled
- Review simulated actions in the output
Via API:
Output Format
In dry-run mode, simulated actions show:
{
  "success": true,
  "simulated": true,
  "action": "outlook_move_email",
  "args": {"entry_id": "ABC123", "folder": "ToDelete"},
  "message": "[DRY-RUN] Would execute: outlook_move_email"
}
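The dry-run behaviour described above can be sketched as a thin wrapper around tool dispatch. This is an illustration, not DeskAgent's actual implementation: `call_tool`, `SIMULATED_TOOLS`, and the `registry` mapping are hypothetical names, and the `outlook_` prefix on tool names is inferred from the sample output.

```python
from typing import Any, Callable, Dict

# Assumed set of write tools, based on the "Simulated Operations" table and
# the outlook_ prefix seen in the sample output; the real registry may differ.
SIMULATED_TOOLS = {
    "outlook_move_email",
    "outlook_delete_email",
    "outlook_flag_email",
    "outlook_batch_email_actions",
}


def call_tool(
    name: str,
    args: Dict[str, Any],
    registry: Dict[str, Callable[..., Any]],
    dry_run: bool = False,
) -> Any:
    """Run a tool, or return a simulated result for write tools in dry-run mode."""
    if dry_run and name in SIMULATED_TOOLS:
        # The real tool is never invoked here; only the intent is reported.
        return {
            "success": True,
            "simulated": True,
            "action": name,
            "args": args,
            "message": f"[DRY-RUN] Would execute: {name}",
        }
    # Read operations (and every call outside dry-run) execute normally.
    return registry[name](**args)
```

Because the check happens before the tool is looked up, a simulated call leaves no side effects, while read tools still return real data.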
Backend Comparison
Compare how different AI backends handle the same agent.
Starting a Comparison
Method 1: Ctrl+Shift+Click
- Hold Ctrl+Shift and click an agent tile
- The comparison dialog opens
- Select which backends to test
- Toggle dry-run mode (recommended on)
- Click "Vergleichen starten" (Start comparison)
Method 2: Right-Click Menu
- Right-click on an agent tile
- Select "Vergleichen" (Compare)
- Configure backends and options
- Start comparison
Split View UI
When comparing multiple backends, DeskAgent shows a split view:
┌─────────────────────────────────────────────────────┐
│ Backend Comparison: daily_check         [DRY RUN]  X│
├─────────────────┬─────────────────┬─────────────────┤
│ claude_sdk      │ gemini          │ openai          │
│ ● Running...    │ ✓ Completed     │ ● Running...    │
├─────────────────┼─────────────────┼─────────────────┤
│ Streaming       │ Found 5 emails  │ Streaming       │
│ output here...  │ to process...   │ output here...  │
│                 │                 │                 │
├─────────────────┼─────────────────┼─────────────────┤
│ 12.5s           │ 8.3s            │ --              │
│ 1500/800 tok    │ 1200/650 tok    │ --              │
│ $0.0450         │ $0.0180         │ --              │
├─────────────────┴─────────────────┴─────────────────┤
│ 2/3 successful | Fastest: gemini | Cheapest: gemini │
└─────────────────────────────────────────────────────┘
Comparison Metrics
Each backend shows:
| Metric | Description |
|---|---|
| Time | Execution duration in seconds |
| Tokens | Input/Output token count |
| Cost | Estimated cost in USD |
| Status | Success/Error indicator |
Winners
After completion, DeskAgent identifies:
- Fastest: Lowest execution time
- Cheapest: Lowest cost (excluding free backends)
- Most Tokens: Highest output token count
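As a rough illustration of these rules, the selection could look like the sketch below. It assumes per-backend results shaped like the entries in the JSON export (`success`, `duration_sec`, `cost_usd`, `tokens.output`). Treating a zero or missing cost as "free" and ignoring failed runs are assumptions, and `pick_winners` and the `most_tokens` key are hypothetical names.

```python
from typing import Dict, Optional


def pick_winners(results: Dict[str, dict]) -> Dict[str, Optional[str]]:
    """Pick fastest, cheapest, and most-tokens backends among successful runs."""
    # Assumption: failed backends are not eligible for any category.
    ok = {name: r for name, r in results.items() if r.get("success")}
    # Assumption: a "free" backend is one whose cost is zero or missing.
    paid = {name: r for name, r in ok.items() if r.get("cost_usd")}

    def best(pool, key, pick):
        # pick is min or max; returns None when no backend qualifies.
        return pick(pool, key=lambda n: key(pool[n])) if pool else None

    return {
        "fastest": best(ok, lambda r: r["duration_sec"], min),
        "cheapest": best(paid, lambda r: r["cost_usd"], min),
        "most_tokens": best(ok, lambda r: r["tokens"]["output"], max),
    }
```

Excluding free backends from the cost ranking keeps a local model with zero cost from winning "cheapest" by default.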
Comparison Results
JSON Export
Click "Export JSON" to download comparison results:
{
  "agent": "daily_check",
  "timestamp": "2025-01-03T10:30:00",
  "dry_run": true,
  "backends": {
    "claude_sdk": {
      "success": true,
      "duration_sec": 12.5,
      "tokens": {"input": 1500, "output": 800},
      "cost_usd": 0.045,
      "simulated_actions": [
        {"tool": "outlook_move_email", "args": {...}}
      ]
    },
    "gemini": {...}
  },
  "winner": {
    "fastest": "gemini",
    "cheapest": "gemini"
  }
}
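An exported file can be inspected with a few lines of Python, for example to see which write tools each backend would have called. The sketch below assumes the export follows the structure shown above; `simulated_tools_by_backend` is a hypothetical helper, not part of DeskAgent.

```python
import json
from typing import Dict, List


def simulated_tools_by_backend(export_text: str) -> Dict[str, List[str]]:
    """Map each backend in an exported comparison to the tools it would have called."""
    data = json.loads(export_text)
    summary = {}
    for name, result in data.get("backends", {}).items():
        # Backends without simulated actions get an empty list.
        actions = result.get("simulated_actions", [])
        summary[name] = [action["tool"] for action in actions]
    return summary
```

Comparing these lists side by side is a quick way to spot a backend that would delete or move more items than the others.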
Saved Comparisons
Results are automatically saved to:
API Endpoints
| Endpoint | Description |
|---|---|
| POST /test/compare | Run comparison |
| GET /test/comparisons | List saved comparisons |
| GET /test/comparison/{file} | Get specific comparison |
| DELETE /test/comparisons | Clear all comparisons |
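A comparison could be triggered from a script along these lines. Only the `POST /test/compare` path comes from the table above; the base URL is a placeholder, and the payload field names (`agent`, `backends`, `dry_run`) are guesses modelled on the JSON export, so check them against your installation before relying on this sketch.

```python
import json
import urllib.request
from typing import List

# Placeholder address; replace with wherever your DeskAgent server listens.
BASE_URL = "http://localhost:8000"


def build_compare_request(
    agent: str, backends: List[str], dry_run: bool = True
) -> urllib.request.Request:
    """Build (but do not send) a POST /test/compare request."""
    # Hypothetical payload shape; the field names are not documented here.
    payload = {"agent": agent, "backends": backends, "dry_run": dry_run}
    return urllib.request.Request(
        url=f"{BASE_URL}/test/compare",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass the returned object to `urllib.request.urlopen` to send it. Keeping `dry_run` on by default mirrors the recommendation in the comparison dialog.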
Backend Selection
Configuring Backends
In config/backends.json, enable the backends you want to test:
{
  "ai_backends": {
    "claude_sdk": {
      "type": "claude_agent_sdk",
      "enabled": true
    },
    "gemini": {
      "type": "gemini_adk",
      "enabled": true
    },
    "openai": {
      "type": "openai_api",
      "enabled": true
    }
  }
}
Backend Requirements
Each backend needs:
- API Key configured (if cloud-based)
- enabled: true in config
- Valid configuration (the comparison dialog shows unconfigured backends in gray)
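To check quickly which backends a config would offer for comparison, something like the following could be used. It only looks at the `enabled` flag under `ai_backends` as shown in the example config; API key validation is not covered, and `enabled_backends` is a hypothetical helper.

```python
from typing import Dict, List


def enabled_backends(config: Dict) -> List[str]:
    """Return the names of backends marked enabled in a backends.json-style dict."""
    backends = config.get("ai_backends", {})
    # Only an explicit boolean true counts; a missing flag is treated as disabled.
    return [
        name for name, settings in backends.items()
        if settings.get("enabled") is True
    ]
```

Load the file with `json.load` and pass the resulting dict in. An empty result is a likely reason for the comparison dialog showing no backends.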
Best Practices
When to Use Dry-Run
- Testing new agent logic
- Verifying email categorization rules
- Checking SEPA transfer amounts before execution
- Training new team members
When to Compare Backends
- Optimizing cost vs. quality
- Evaluating new AI models
- Finding the best backend for specific tasks
- Benchmarking performance
Recommended Workflow
- Develop agent with dry-run enabled
- Compare across backends to find best fit
- Test with real data (still dry-run)
- Deploy with chosen backend
Keyboard Shortcuts Summary
| Shortcut | Action |
|---|---|
| Click | Run agent normally |
| Ctrl+Click | Edit agent |
| Shift+Click | Add context before running |
| Ctrl+Shift+Click | Compare across backends |
| Right-Click | Open context menu |
Troubleshooting
Comparison Dialog Shows No Backends
- Check developer_mode: true in system.json
- Verify backends are configured in backends.json
- Restart DeskAgent after config changes
Backend Shows "Not Configured"
- Add required API key to backends.json
- Check API key format and validity
- Verify internet connection for cloud backends
Split View Not Loading
- Check browser console for errors (F12)
- Ensure DeskAgent server is running
- Refresh the page