
Testing & Comparison

Test agents safely and compare AI backend performance.

Overview

DeskAgent includes powerful testing features for developers:

Feature              Description
Dry-Run Mode         Execute agents without making actual changes
Backend Comparison   Run the same agent across multiple AI backends
Split View           See parallel streaming output from all backends
Simulated Actions    See which actions would be executed, without executing them

Developer Mode Required

Testing features require developer_mode: true in config/system.json.

Quick Access

Action                Trigger
Preview (Dry-Run)     Right-click agent → "Vorschau"
Compare Backends      Ctrl+Shift+Click on agent tile
Compare with Dialog   Right-click → "Vergleichen"

Dry-Run Mode

Dry-run mode lets you test agents against real data without executing destructive operations.

How It Works

  1. Read operations execute normally (real emails, real documents)
  2. Write operations are simulated (moves, deletes, flags, transfers)
  3. Results show what would have happened
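
Conceptually, a dry-run layer sits between the agent and its tools: read tools pass through unchanged, while write tools are short-circuited into a simulated result. The Python sketch below is purely illustrative (the dispatcher is hypothetical, not DeskAgent's internal code) and only mirrors the behavior described above:

# Illustrative only: a hypothetical dispatcher that simulates write tools in dry-run mode.
# The tool names follow the "Simulated Operations" table below; everything else is assumed.
WRITE_TOOLS = {
    "move_email", "delete_email", "flag_email", "batch_email_actions",
    "create_sepa_transfer", "create_sepa_batch",
    "upload_document", "update_document", "delete_document",
    "create_offer", "create_invoice", "finalize_offer",
}

def dispatch_tool(name: str, args: dict, dry_run: bool, execute) -> dict:
    """Run a tool, or return a simulated success if it would write data in dry-run mode."""
    if dry_run and name in WRITE_TOOLS:
        return {
            "success": True,
            "simulated": True,
            "action": name,
            "args": args,
            "message": f"[DRY-RUN] Would execute: {name}",
        }
    return execute(name, args)  # read tools (and real runs) execute normally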

Simulated Operations

These tools return simulated success instead of executing:

Category    Tools
Outlook     move_email, delete_email, flag_email, batch_email_actions
SEPA        create_sepa_transfer, create_sepa_batch
Paperless   upload_document, update_document, delete_document
Billomat    create_offer, create_invoice, finalize_offer

Using Dry-Run

Via Right-Click Menu:

  1. Right-click on an agent tile
  2. Select "Vorschau" (Preview)
  3. Agent runs with dry-run enabled
  4. Review simulated actions in the output

Via API:

GET /agent/{name}?dry_run=true
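
For scripted testing, the same endpoint can be called from any HTTP client. A minimal Python sketch using the requests library; the base URL (http://localhost:8000) is an assumption and should be replaced with wherever your DeskAgent server listens:

import requests

# Assumed base URL; adjust to match your DeskAgent server.
BASE_URL = "http://localhost:8000"

# Trigger the "daily_check" agent in dry-run mode (GET /agent/{name}?dry_run=true).
resp = requests.get(f"{BASE_URL}/agent/daily_check", params={"dry_run": "true"}, timeout=300)
resp.raise_for_status()
print(resp.text)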

Output Format

In dry-run mode, simulated actions show:

{
  "success": true,
  "simulated": true,
  "action": "outlook_move_email",
  "args": {"entry_id": "ABC123", "folder": "ToDelete"},
  "message": "[DRY-RUN] Would execute: outlook_move_email"
}
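
Because every simulated action carries the simulated flag, dry-run output is easy to post-process. A small, hypothetical helper that prints one line per simulated action from a list of result objects in the shape shown above:

def summarize_simulated(results: list[dict]) -> None:
    """Print one line per simulated action from a list of dry-run result objects."""
    for r in results:
        if r.get("simulated"):
            print(f"{r['action']}  args={r['args']}")

# Example with the documented result shape:
summarize_simulated([
    {
        "success": True,
        "simulated": True,
        "action": "outlook_move_email",
        "args": {"entry_id": "ABC123", "folder": "ToDelete"},
        "message": "[DRY-RUN] Would execute: outlook_move_email",
    }
])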

Backend Comparison

Compare how different AI backends handle the same agent.

Starting a Comparison

Method 1: Ctrl+Shift+Click

  1. Hold Ctrl+Shift and click an agent tile
  2. The comparison dialog opens
  3. Select which backends to test
  4. Toggle dry-run mode (recommended on)
  5. Click "Vergleichen starten"

Method 2: Right-Click Menu

  1. Right-click on an agent tile
  2. Select "Vergleichen" (Compare)
  3. Configure backends and options
  4. Start comparison

Split View UI

When comparing multiple backends, DeskAgent shows a split view:

┌─────────────────────────────────────────────────────┐
│ Backend Comparison: daily_check        [DRY RUN]   X│
├─────────────────┬─────────────────┬─────────────────┤
│ claude_sdk      │ gemini          │ openai          │
│ ● Running...    │ ✓ Completed     │ ● Running...    │
├─────────────────┼─────────────────┼─────────────────┤
│ Streaming       │ Found 5 emails  │ Streaming       │
│ output here...  │ to process...   │ output here...  │
│                 │                 │                 │
├─────────────────┼─────────────────┼─────────────────┤
│ 12.5s           │ 8.3s            │ --              │
│ 1500/800 tok    │ 1200/650 tok    │ --              │
│ $0.0450         │ $0.0180         │ --              │
├─────────────────┴─────────────────┴─────────────────┤
│ 2/3 successful | Fastest: gemini | Cheapest: gemini │
└─────────────────────────────────────────────────────┘

Comparison Metrics

Each backend shows:

Metric   Description
Time     Execution duration in seconds
Tokens   Input/output token counts
Cost     Estimated cost in USD
Status   Success/error indicator

Winners

After completion, DeskAgent identifies:

  • Fastest: Lowest execution time
  • Cheapest: Lowest cost (excluding free backends)
  • Most Tokens: Highest output token count
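
These rules are easy to reproduce when post-processing exported results yourself. A sketch under two assumptions: field names follow the JSON export shown below, and free backends are identified by a cost of zero:

def pick_winners(backends: dict) -> dict:
    """Determine the fastest and cheapest backend from per-backend result dicts."""
    ok = {name: b for name, b in backends.items() if b.get("success")}
    fastest = min(ok, key=lambda n: ok[n]["duration_sec"], default=None)
    # Exclude free backends (assumed to report cost_usd == 0) from the cheapest ranking.
    paid = {n: b for n, b in ok.items() if b.get("cost_usd", 0) > 0}
    cheapest = min(paid, key=lambda n: paid[n]["cost_usd"], default=None)
    return {"fastest": fastest, "cheapest": cheapest}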

Comparison Results

JSON Export

Click "Export JSON" to download comparison results:

{
  "agent": "daily_check",
  "timestamp": "2025-01-03T10:30:00",
  "dry_run": true,
  "backends": {
    "claude_sdk": {
      "success": true,
      "duration_sec": 12.5,
      "tokens": {"input": 1500, "output": 800},
      "cost_usd": 0.045,
      "simulated_actions": [
        {"tool": "outlook_move_email", "args": {...}}
      ]
    },
    "gemini": {...}
  },
  "winner": {
    "fastest": "gemini",
    "cheapest": "gemini"
  }
}

Saved Comparisons

Results are automatically saved to:

workspace/.logs/comparisons/compare_{agent}_{timestamp}.json
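
Since saved comparisons are plain JSON files, they can be inspected without the UI. A minimal sketch that loads every saved comparison from that directory and prints per-backend metrics, using the field names from the export format above:

import json
from pathlib import Path

comparison_dir = Path("workspace/.logs/comparisons")

for path in sorted(comparison_dir.glob("compare_*.json")):
    data = json.loads(path.read_text())
    print(f"{path.name}: agent={data['agent']} dry_run={data['dry_run']}")
    for name, result in data["backends"].items():
        status = "ok" if result.get("success") else "error"
        print(f"  {name}: {status}, {result.get('duration_sec', '?')}s, ${result.get('cost_usd', 0):.4f}")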

API Endpoints

Endpoint                      Description
POST /test/compare            Run a comparison
GET /test/comparisons         List saved comparisons
GET /test/comparison/{file}   Get a specific comparison
DELETE /test/comparisons      Clear all saved comparisons
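
The same data is also reachable programmatically. A short Python sketch for the two read endpoints; the base URL is an assumption, and the exact shape of the list response is not documented here, so it is simply printed as returned:

import requests

BASE_URL = "http://localhost:8000"  # assumed; match your DeskAgent server

# List saved comparisons (GET /test/comparisons).
resp = requests.get(f"{BASE_URL}/test/comparisons", timeout=30)
resp.raise_for_status()
print(resp.json())

# A single comparison can then be fetched by its file name
# (GET /test/comparison/{file}); the name here is a hypothetical placeholder.
file_name = "compare_daily_check_<timestamp>.json"
detail = requests.get(f"{BASE_URL}/test/comparison/{file_name}", timeout=30)
print(detail.json())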

Backend Selection

Configuring Backends

In config/backends.json, enable the backends you want to test:

{
  "ai_backends": {
    "claude_sdk": {
      "type": "claude_agent_sdk",
      "enabled": true
    },
    "gemini": {
      "type": "gemini_adk",
      "enabled": true
    },
    "openai": {
      "type": "openai_api",
      "enabled": true
    }
  }
}

Backend Requirements

Each backend needs:

  1. API Key configured (if cloud-based)
  2. enabled: true in config
  3. Valid configuration (the comparison dialog shows unconfigured backends in gray)
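
One way to sanity-check this before opening the comparison dialog is to read config/backends.json directly. The sketch below only relies on the documented enabled and type fields; the api_key field name is an assumption and may differ per backend type:

import json
from pathlib import Path

config = json.loads(Path("config/backends.json").read_text())

for name, backend in config.get("ai_backends", {}).items():
    enabled = backend.get("enabled", False)
    has_key = bool(backend.get("api_key"))  # assumed field name; relevant for cloud backends
    print(f"{name}: type={backend.get('type')}, enabled={enabled}, api_key={'set' if has_key else 'missing'}")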

Best Practices

When to Use Dry-Run

  • Testing new agent logic
  • Verifying email categorization rules
  • Checking SEPA transfer amounts before execution
  • Training new team members

When to Compare Backends

  • Optimizing cost vs. quality
  • Evaluating new AI models
  • Finding the best backend for specific tasks
  • Benchmarking performance

Recommended Workflow

  1. Develop the agent with dry-run enabled
  2. Compare across backends to find the best fit
  3. Test with real data (still in dry-run)
  4. Deploy with the chosen backend

Keyboard Shortcuts Summary

Shortcut           Action
Click              Run agent normally
Ctrl+Click         Edit agent
Shift+Click        Add context before running
Ctrl+Shift+Click   Compare across backends
Right-Click        Open context menu

Troubleshooting

Comparison Dialog Shows No Backends

  • Check developer_mode: true in system.json
  • Verify backends are configured in backends.json
  • Restart DeskAgent after config changes

Backend Shows "Not Configured"

  • Add required API key to backends.json
  • Check API key format and validity
  • Verify internet connection for cloud backends

Split View Not Loading

  • Check browser console for errors (F12)
  • Ensure DeskAgent server is running
  • Refresh the page