
Testing & Comparison

Test agents safely and compare AI backend performance.

Overview

DeskAgent includes powerful testing features for developers:

Feature              Description
Dry-Run Mode         Execute agents without making actual changes
Backend Comparison   Run the same agent across multiple AI backends
Split View           See parallel streaming output from all backends
Simulated Actions    See which actions would be executed, without executing them

Developer Mode Required

Testing features require developer_mode: true in config/system.json.

Quick Access

Action                Trigger
Preview (Dry-Run)     Right-click agent → "Vorschau"
Compare Backends      Ctrl+Shift+Click on agent tile
Compare with Dialog   Right-click → "Vergleichen"

Dry-Run Mode

Dry-run mode lets you test agents against real data without executing destructive operations.

How It Works

  1. Read operations execute normally (real emails, real documents)
  2. Write operations are simulated (moves, deletes, flags, transfers)
  3. Results show what would have happened
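
Conceptually, a dry-run layer sits between the agent and its tools: read tools pass through unchanged, while write tools are short-circuited into a simulated result. The Python sketch below is purely illustrative (the dispatcher is hypothetical, not DeskAgent's internal code) and only mirrors the behavior described above:

# Illustrative only: a hypothetical dispatcher that simulates write tools in dry-run mode.
# The tool names follow the "Simulated Operations" table below; everything else is assumed.
WRITE_TOOLS = {
    "move_email", "delete_email", "flag_email", "batch_email_actions",
    "create_sepa_transfer", "create_sepa_batch",
    "upload_document", "update_document", "delete_document",
    "create_offer", "create_invoice", "finalize_offer",
}

def dispatch_tool(name: str, args: dict, dry_run: bool, execute) -> dict:
    """Run a tool, or return a simulated success if it would write data in dry-run mode."""
    if dry_run and name in WRITE_TOOLS:
        return {
            "success": True,
            "simulated": True,
            "action": name,
            "args": args,
            "message": f"[DRY-RUN] Would execute: {name}",
        }
    return execute(name, args)  # read tools (and real runs) execute normally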

Simulated Operations

These tools return simulated success instead of executing:

Category    Tools
Outlook     move_email, delete_email, flag_email, batch_email_actions
SEPA        create_sepa_transfer, create_sepa_batch
Paperless   upload_document, update_document, delete_document
Billomat    create_offer, create_invoice, finalize_offer

Using Dry-Run

Via Right-Click Menu:

  1. Right-click on an agent tile
  2. Select "Vorschau" (Preview)
  3. Agent runs with dry-run enabled
  4. Review simulated actions in the output

Via API:

GET /agent/{name}?dry_run=true
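
For scripted testing, the same endpoint can be called from any HTTP client. A minimal Python sketch using the requests library; the base URL (http://localhost:8000) is an assumption and should be replaced with wherever your DeskAgent server listens:

import requests

# Assumed base URL; adjust to match your DeskAgent server.
BASE_URL = "http://localhost:8000"

# Trigger the "daily_check" agent in dry-run mode (GET /agent/{name}?dry_run=true).
resp = requests.get(f"{BASE_URL}/agent/daily_check", params={"dry_run": "true"}, timeout=300)
resp.raise_for_status()
print(resp.text)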

Output Format

In dry-run mode, simulated actions show:

{
  "success": true,
  "simulated": true,
  "action": "outlook_move_email",
  "args": {"entry_id": "ABC123", "folder": "ToDelete"},
  "message": "[DRY-RUN] Would execute: outlook_move_email"
}
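
Because every simulated action carries the simulated flag, dry-run output is easy to post-process. A small, hypothetical helper that prints one line per simulated action from a list of result objects in the shape shown above:

def summarize_simulated(results: list[dict]) -> None:
    """Print one line per simulated action from a list of dry-run result objects."""
    for r in results:
        if r.get("simulated"):
            print(f"{r['action']}  args={r['args']}")

# Example with the documented result shape:
summarize_simulated([
    {
        "success": True,
        "simulated": True,
        "action": "outlook_move_email",
        "args": {"entry_id": "ABC123", "folder": "ToDelete"},
        "message": "[DRY-RUN] Would execute: outlook_move_email",
    }
])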

Backend Comparison

Compare how different AI backends handle the same agent.

Starting a Comparison

Method 1: Ctrl+Shift+Click

  1. Hold Ctrl+Shift and click an agent tile
  2. The comparison dialog opens
  3. Select which backends to test
  4. Toggle dry-run mode (recommended on)
  5. Click "Vergleichen starten"

Method 2: Right-Click Menu

  1. Right-click on an agent tile
  2. Select "Vergleichen" (Compare)
  3. Configure backends and options
  4. Start comparison

Split View UI

When comparing multiple backends, DeskAgent shows a split view:

┌─────────────────────────────────────────────────────┐
│ Backend Comparison: daily_check        [DRY RUN]   X│
├─────────────────┬─────────────────┬─────────────────┤
│ claude_sdk      │ gemini          │ openai          │
│ ● Running...    │ ✓ Completed     │ ● Running...    │
├─────────────────┼─────────────────┼─────────────────┤
│ Streaming       │ Found 5 emails  │ Streaming       │
│ output here...  │ to process...   │ output here...  │
│                 │                 │                 │
├─────────────────┼─────────────────┼─────────────────┤
│ 12.5s           │ 8.3s            │ --              │
│ 1500/800 tok    │ 1200/650 tok    │ --              │
│ $0.0450         │ $0.0180         │ --              │
├─────────────────┴─────────────────┴─────────────────┤
│ 2/3 successful | Fastest: gemini | Cheapest: gemini │
└─────────────────────────────────────────────────────┘

Comparison Metrics

Each backend shows:

Metric   Description
Time     Execution duration in seconds
Tokens   Input/output token counts
Cost     Estimated cost in USD
Status   Success/error indicator

Winners

After completion, DeskAgent identifies:

  • Fastest: Lowest execution time
  • Cheapest: Lowest cost (excluding free backends)
  • Most Tokens: Highest output token count
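
These rules are easy to reproduce when post-processing exported results yourself. A sketch under two assumptions: field names follow the JSON export shown below, and free backends are identified by a cost of zero:

def pick_winners(backends: dict) -> dict:
    """Determine the fastest and cheapest backend from per-backend result dicts."""
    ok = {name: b for name, b in backends.items() if b.get("success")}
    fastest = min(ok, key=lambda n: ok[n]["duration_sec"], default=None)
    # Exclude free backends (assumed to report cost_usd == 0) from the cheapest ranking.
    paid = {n: b for n, b in ok.items() if b.get("cost_usd", 0) > 0}
    cheapest = min(paid, key=lambda n: paid[n]["cost_usd"], default=None)
    return {"fastest": fastest, "cheapest": cheapest}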

Comparison Results

JSON Export

Click "Export JSON" to download comparison results:

{
  "agent": "daily_check",
  "timestamp": "2025-01-03T10:30:00",
  "dry_run": true,
  "backends": {
    "claude_sdk": {
      "success": true,
      "duration_sec": 12.5,
      "tokens": {"input": 1500, "output": 800},
      "cost_usd": 0.045,
      "simulated_actions": [
        {"tool": "outlook_move_email", "args": {...}}
      ]
    },
    "gemini": {...}
  },
  "winner": {
    "fastest": "gemini",
    "cheapest": "gemini"
  }
}

Saved Comparisons

Results are automatically saved to:

workspace/.logs/comparisons/compare_{agent}_{timestamp}.json
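
Since saved comparisons are plain JSON files, they can be inspected without the UI. A minimal sketch that loads every saved comparison from that directory and prints per-backend metrics, using the field names from the export format above:

import json
from pathlib import Path

comparison_dir = Path("workspace/.logs/comparisons")

for path in sorted(comparison_dir.glob("compare_*.json")):
    data = json.loads(path.read_text())
    print(f"{path.name}: agent={data['agent']} dry_run={data['dry_run']}")
    for name, result in data["backends"].items():
        status = "ok" if result.get("success") else "error"
        print(f"  {name}: {status}, {result.get('duration_sec', '?')}s, ${result.get('cost_usd', 0):.4f}")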

API Endpoints

Endpoint                      Description
POST /test/compare            Run a comparison
GET /test/comparisons         List saved comparisons
GET /test/comparison/{file}   Get a specific comparison
DELETE /test/comparisons      Clear all saved comparisons
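
The same data is also reachable programmatically. A short Python sketch for the two read endpoints; the base URL is an assumption, and the exact shape of the list response is not documented here, so it is simply printed as returned:

import requests

BASE_URL = "http://localhost:8000"  # assumed; match your DeskAgent server

# List saved comparisons (GET /test/comparisons).
resp = requests.get(f"{BASE_URL}/test/comparisons", timeout=30)
resp.raise_for_status()
print(resp.json())

# A single comparison can then be fetched by its file name
# (GET /test/comparison/{file}); the name here is a hypothetical placeholder.
file_name = "compare_daily_check_<timestamp>.json"
detail = requests.get(f"{BASE_URL}/test/comparison/{file_name}", timeout=30)
print(detail.json())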

Backend Selection

Configuring Backends

In config/backends.json, enable the backends you want to test:

{
  "ai_backends": {
    "claude_sdk": {
      "type": "claude_agent_sdk",
      "enabled": true
    },
    "gemini": {
      "type": "gemini_adk",
      "enabled": true
    },
    "openai": {
      "type": "openai_api",
      "enabled": true
    }
  }
}

Backend Requirements

Each backend needs:

  1. API Key configured (if cloud-based)
  2. enabled: true in config
  3. Valid configuration (the comparison dialog shows unconfigured backends in gray)
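
One way to sanity-check this before opening the comparison dialog is to read config/backends.json directly. The sketch below only relies on the documented enabled and type fields; the api_key field name is an assumption and may differ per backend type:

import json
from pathlib import Path

config = json.loads(Path("config/backends.json").read_text())

for name, backend in config.get("ai_backends", {}).items():
    enabled = backend.get("enabled", False)
    has_key = bool(backend.get("api_key"))  # assumed field name; relevant for cloud backends
    print(f"{name}: type={backend.get('type')}, enabled={enabled}, api_key={'set' if has_key else 'missing'}")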

Best Practices

When to Use Dry-Run

  • Testing new agent logic
  • Verifying email categorization rules
  • Checking SEPA transfer amounts before execution
  • Training new team members

When to Compare Backends

  • Optimizing cost vs. quality
  • Evaluating new AI models
  • Finding the best backend for specific tasks
  • Benchmarking performance

Recommended Workflow

  1. Develop the agent with dry-run enabled
  2. Compare across backends to find the best fit
  3. Test with real data (still in dry-run)
  4. Deploy with the chosen backend

Keyboard Shortcuts Summary

Shortcut           Action
Click              Run agent normally
Ctrl+Click         Edit agent
Shift+Click        Add context before running
Ctrl+Shift+Click   Compare across backends
Right-Click        Open context menu

Troubleshooting

Comparison Dialog Shows No Backends

  • Check developer_mode: true in system.json
  • Verify backends are configured in backends.json
  • Restart DeskAgent after config changes

Backend Shows "Not Configured"

  • Add required API key to backends.json
  • Check API key format and validity
  • Verify internet connection for cloud backends

Split View Not Loading

  • Check browser console for errors (F12)
  • Ensure DeskAgent server is running
  • Refresh the page