Paperless Extension¶
Manage documents with Paperless-ngx, an open-source document management system with powerful OCR and full-text search.
Why Paperless-ngx?¶
| Feature | Benefit |
|---|---|
| Open Source | Free, transparent, community-driven |
| Docker-ready | Easy deployment, runs anywhere |
| Excellent API | Perfect integration with DeskAgent |
| OCR Built-in | Automatic text extraction from scans |
| Full-text Search | Find any document by content |
Perfect Fit for DeskAgent
Paperless-ngx exposes a comprehensive REST API for document management. DeskAgent uses it to search, upload, classify, and organize documents, which makes Paperless-ngx a good fit for automated document workflows.
Requirement
Requires a running Paperless-ngx installation. Get started with Docker →
What You Can Do¶
Search Documents¶
Full-text search:
Find documents by their content thanks to OCR.
Filter by metadata:
Narrow results by correspondent, tags, or document type.
Find unprocessed:
See documents that haven't been classified yet.
Example requests:
- "Find invoices from 2024"
- "Search for documents containing 'contract renewal'"
- "Show unclassified documents"
- "Find documents tagged as 'Urgent'"
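Requests like these correspond to query parameters on the Paperless-ngx document list endpoint (`/api/documents/`). The following is a minimal sketch of how such a URL could be assembled; `build_search_url` is a hypothetical helper for illustration, not DeskAgent's actual implementation, and the numeric IDs are made up.

```python
from urllib.parse import urlencode


def build_search_url(base_url, text=None, tag_ids=None, correspondent_id=None):
    """Build a GET URL for the Paperless-ngx document list endpoint.

    Hypothetical helper for illustration; DeskAgent's own client may differ.
    """
    params = {}
    if text:
        params["query"] = text  # full-text search over the OCR content
    if tag_ids:
        # only documents that carry all of the given tag IDs
        params["tags__id__all"] = ",".join(str(t) for t in tag_ids)
    if correspondent_id is not None:
        params["correspondent__id"] = correspondent_id
    return f"{base_url.rstrip('/')}/api/documents/?{urlencode(params)}"


# Example: full-text search restricted to a (made-up) tag ID
print(build_search_url("http://localhost:8000", text="contract renewal", tag_ids=[3]))
```

Tags and correspondents are referenced by numeric ID in the API, so a name such as 'Urgent' has to be resolved to its ID first.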
Read Documents¶
Get OCR text:
Read the extracted text from any document.
View metadata:
See tags, correspondent, document type, and dates.
Find similar:
Discover related documents automatically.
Example requests:
- "Read the content of document 42"
- "Show details for the latest invoice"
- "Find documents similar to this one"
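Behind these requests, a single document is available at `/api/documents/<id>/` as a JSON object that includes the OCR text in its `content` field, and similar documents can be requested with the `more_like_id` parameter. The sketch below condenses such an object into a short summary; the helper names and the sample values in the usage note are illustrative assumptions.

```python
def summarize_document(doc, preview_chars=80):
    """Condense a Paperless-ngx document object (parsed JSON) into a summary."""
    content = (doc.get("content") or "").strip()
    return {
        "id": doc.get("id"),
        "title": doc.get("title"),
        "correspondent": doc.get("correspondent"),  # numeric ID, or None if unset
        "tags": doc.get("tags", []),                # list of numeric tag IDs
        "created": doc.get("created"),
        "has_text": bool(content),                  # False while OCR text is missing
        "preview": content[:preview_chars],
    }


def similar_documents_path(doc_id):
    """Path that asks Paperless-ngx for documents similar to doc_id."""
    return f"/api/documents/?more_like_id={int(doc_id)}"
```

For example, `summarize_document({"id": 42, "title": "Invoice", "content": "Total due: 100 EUR"})` reports `has_text` as `True` and uses the text itself as the preview.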
Add Documents¶
Upload files:
Add new documents to Paperless.
Set metadata:
Assign title, correspondent, and tags on upload.
Monitor processing:
Check if OCR is complete.
Example requests:
- "Upload this PDF to Paperless"
- "Add this invoice from Supplier GmbH with tag 'Unpaid'"
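Uploads go to the `post_document` endpoint as a multipart form, with the file in a `document` field and optional metadata alongside it. As a rough sketch, the metadata part could be prepared like this; `build_upload_fields` is a hypothetical helper, and the commented-out call assumes the third-party `requests` library is installed.

```python
def build_upload_fields(title=None, correspondent_id=None, tag_ids=()):
    """Prepare the optional form fields for POST /api/documents/post_document/.

    Correspondent and tags are passed as numeric IDs; `tags` may repeat.
    """
    fields = []
    if title:
        fields.append(("title", title))
    if correspondent_id is not None:
        fields.append(("correspondent", str(correspondent_id)))
    for tag_id in tag_ids:
        fields.append(("tags", str(tag_id)))
    return fields


# Sending it (assumption: `requests` is available; not executed here):
# with open("invoice.pdf", "rb") as fh:
#     requests.post(
#         "http://localhost:8000/api/documents/post_document/",
#         headers={"Authorization": "Token <your token>"},
#         data=build_upload_fields("Invoice", correspondent_id=4, tag_ids=[1]),
#         files={"document": fh},
#     )
```

The upload is processed asynchronously, so the new document only becomes searchable once consumption and OCR have finished.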
Organize Documents¶
Update metadata:
Change tags, correspondent, or document type.
Bulk operations:
Apply changes to multiple documents at once.
Auto-tagging:
Set up automatic classification rules.
Example requests:
- "Add the 'Reviewed' tag to document 42"
- "Change the correspondent to 'New Supplier GmbH'"
- "Tag all invoices from last month as 'Q4-2024'"
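Bulk changes like the last request can be expressed as one call to the Paperless-ngx `bulk_edit` endpoint, which takes a list of document IDs, a method name, and method-specific parameters. A minimal sketch of the request body, with made-up IDs and a hypothetical helper name:

```python
import json


def bulk_edit_payload(document_ids, method, **parameters):
    """Body for POST /api/documents/bulk_edit/.

    Example methods: "add_tag" (parameter: tag) and
    "set_correspondent" (parameter: correspondent).
    """
    return {
        "documents": list(document_ids),
        "method": method,
        "parameters": parameters,
    }


# Add the tag with (made-up) ID 7 to documents 42 and 43
print(json.dumps(bulk_edit_payload([42, 43], "add_tag", tag=7)))
```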
Key Concepts¶
| Concept | Description |
|---|---|
| Correspondents | Who the document is from (sender/company) |
| Tags | Labels for categorization (can have multiple) |
| Document Types | Category like Invoice, Receipt, Contract |
| Storage Paths | Where the file is stored on disk |
| OCR | Automatic text extraction from scans |
DeskAgent Auto-Tagging¶
DeskAgent includes a powerful Auto-Tagging Agent that uses AI to automatically classify your documents. This is especially useful for processing months of unclassified documents.
How It Works¶
graph LR
A[Scan unclassified docs] --> B[Read OCR text]
B --> C[AI analyzes content]
C --> D[Suggest tags + correspondent]
D --> E[You confirm]
E --> F[Batch classify]
- Find unclassified: DeskAgent searches for documents without a correspondent
- Read content: AI reads the OCR text of each document
- Classify: AI determines the document type, sender, and appropriate tags
- Confirm: You review the suggestions before applying
- Apply: All changes are made in one batch operation
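The steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not DeskAgent's implementation: the AI step is replaced by a trivial keyword classifier, and all function names are hypothetical.

```python
def keyword_classifier(text):
    """Stand-in for the AI step: picks a tag from simple keywords."""
    lowered = text.lower()
    if "rechnung" in lowered or "invoice" in lowered:
        return {"tags": ["Eingangsrechnung"]}
    if "kontoauszug" in lowered or "bank statement" in lowered:
        return {"tags": ["Kontoauszug"]}
    return {"tags": ["NichtZugeordnet"]}  # fallback for unclear documents


def suggest_classifications(documents, classify):
    """Steps 1-3: skip processed documents, read the text, classify."""
    suggestions = []
    for doc in documents:
        if doc.get("correspondent") is not None:
            continue  # already has a correspondent, so already processed
        result = classify(doc.get("content") or "")
        suggestions.append({"id": doc["id"], **result})
    return suggestions


def confirmed_only(suggestions, confirmed_ids):
    """Step 4: keep only the suggestions the user approved."""
    return [s for s in suggestions if s["id"] in confirmed_ids]
```

Step 5 would then send the confirmed suggestions to Paperless in a single batch; nothing is written before the confirmation.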
Recommended Tag System¶
Set up these tags in Paperless for a complete filing system:
| Tag | Use For |
|---|---|
| Eingangsrechnung | Incoming invoices (from suppliers) |
| Ausgangsrechnung | Outgoing invoices (to customers) |
| Kontoauszug | Bank statements |
| Vertrag | Contracts, agreements |
| AnSteuerberater | Tax-relevant documents |
| Finanzen | General financial documents |
| NichtZugeordnet | Fallback for unclear documents |
Using Correspondents as a Status Marker¶
Pro tip: Use the correspondent field to track classification status:
- No correspondent = Document needs classification
- Has correspondent = Already processed
This makes unprocessed documents easy to find: ask DeskAgent for documents without a correspondent, for example "Show unclassified documents".
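At the API level, this convention corresponds to filtering the document list for entries whose correspondent is unset. A minimal sketch, assuming the `correspondent__isnull` filter and the `page_size` parameter of the Paperless-ngx document list endpoint; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def unprocessed_documents_url(base_url, page_size=25):
    """URL listing documents that have no correspondent assigned."""
    params = {"correspondent__isnull": "true", "page_size": page_size}
    return f"{base_url.rstrip('/')}/api/documents/?{urlencode(params)}"


print(unprocessed_documents_url("http://localhost:8000"))
```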
Storage Paths for Business vs Private¶
Organize documents by ownership:
| Storage Path | For Documents |
|---|---|
| Business | Company invoices, contracts, bank statements |
| Private | Personal documents, private purchases |
DeskAgent can automatically set the storage path based on document content.
Example: Monthly Classification¶
Say: "Classify all untagged documents from January 2025"
DeskAgent will:
- Find all documents from January without a correspondent
- Read each document's OCR text
- Show you a table with suggested classifications:
| Document | Correspondent | Tags | Storage Path |
|---|---|---|---|
| Invoice-001.pdf | Starlink | Eingangsrechnung, AnSteuerberater | Business |
| Amazon-Receipt.pdf | Amazon | Eingangsrechnung | Private |
| Bank-Statement.pdf | Sparkasse | Kontoauszug, AnSteuerberater | Private |
- After your confirmation, apply all changes at once
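The first step of this example, finding January's documents without a correspondent, could translate into filter parameters like the ones below. This is a sketch that assumes the `created__year`, `created__month`, and `correspondent__isnull` filters of the document list endpoint; `monthly_unclassified_params` is a hypothetical name.

```python
def monthly_unclassified_params(year, month):
    """Query parameters for documents created in a month without correspondent."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    return {
        "created__year": year,
        "created__month": month,
        "correspondent__isnull": "true",
    }


print(monthly_unclassified_params(2025, 1))
```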
Time Savings
- Processing 50 documents manually: ~2 hours
- With DeskAgent Auto-Tagging: ~5 minutes
Workflow Examples¶
Single Invoice Processing¶
- Upload: "Upload the invoice PDF from my Downloads"
- Wait for OCR: Paperless processes the document (usually seconds)
- Review: "Read the content of the new document"
- Classify: "Set correspondent to Supplier GmbH and tag as Invoice"
- Later: "Mark all invoices from last month as 'Paid'"
Bulk Classification¶
- Find: "Show unclassified documents from December"
- Auto-tag: "Classify these documents automatically"
- Review: Check the AI suggestions
- Apply: Confirm to apply all changes
Tax Preparation¶
- Export: "Find all documents tagged 'AnSteuerberater' from 2024"
- Download: "Download these as a ZIP file"
- Send: Ready for your accountant
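The download step could be backed by the Paperless-ngx bulk download endpoint, which returns a ZIP archive for a list of document IDs. The sketch below builds a request body under the assumption that the endpoint is `/api/documents/bulk_download/` and accepts `documents` and `content` fields; verify both against your server's API documentation before relying on them.

```python
ALLOWED_CONTENT = ("archive", "originals", "both")  # assumed set of valid values


def bulk_download_payload(document_ids, content="archive"):
    """Body for POST /api/documents/bulk_download/ (field names are assumptions)."""
    if content not in ALLOWED_CONTENT:
        raise ValueError(f"content must be one of {ALLOWED_CONTENT}")
    return {"documents": list(document_ids), "content": content}


# IDs would come from a search for the 'AnSteuerberater' tag; these are made up
print(bulk_download_payload([101, 102, 103]))
```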
Matching Rules¶
Paperless can also classify documents automatically using built-in rules:
| Match Type | How it works |
|---|---|
| Any | One keyword matches |
| All | All keywords must match |
| Literal | Exact text match |
| Regex | Pattern matching |
| Fuzzy | Approximate matching |
AI vs Rules
Paperless rules work great for predictable documents (same sender, same format). DeskAgent AI excels at documents that vary in format or require context understanding.
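To make the match types in the table more concrete, here is a rough Python approximation of their semantics. It is not Paperless-ngx's own matching code: in particular, the fuzzy branch is a simplified stand-in that compares a single keyword against each word using `difflib`, and the threshold is an arbitrary choice.

```python
import re
from difflib import SequenceMatcher


def matches(match_type, pattern, text, fuzzy_threshold=0.8):
    """Approximate the rule match types; case-insensitive throughout."""
    text_l = text.lower()
    words = set(re.findall(r"\w+", text_l))
    keywords = pattern.lower().split()
    if match_type == "any":
        return any(k in words for k in keywords)
    if match_type == "all":
        return all(k in words for k in keywords)
    if match_type == "literal":
        return pattern.lower() in text_l
    if match_type == "regex":
        return re.search(pattern, text, re.IGNORECASE) is not None
    if match_type == "fuzzy":
        # simplified: similarity of the pattern to any single word
        return any(
            SequenceMatcher(None, pattern.lower(), w).ratio() >= fuzzy_threshold
            for w in words
        )
    raise ValueError(f"unknown match type: {match_type}")


print(matches("any", "invoice receipt", "Your invoice is attached"))
print(matches("fuzzy", "invoice", "Invoce 12"))
```

Rules of this kind depend on the wording staying stable, which is why they suit documents from the same sender in the same format.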
Setup¶
Step 1: Get Your API Token¶
In Paperless-ngx:
- Go to Settings > Users
- Click on your user
- Find or generate an API token
Or ask DeskAgent: "Get Paperless token for username password"
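Paperless-ngx can also issue a token through its `/api/token/` endpoint in exchange for a username and password. The following sketch only builds that request with the standard library; `token_request` is a hypothetical helper, and the credentials shown are placeholders.

```python
import json
from urllib.request import Request


def token_request(base_url, username, password):
    """Build (but do not send) a POST request to /api/token/."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/api/token/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = token_request("http://localhost:8000", "alice", "placeholder-password")
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) should return JSON
# containing a "token" field when the credentials are valid.
```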
Step 2: Configure DeskAgent¶
- Open DeskAgent Settings
- Go to Integrations
- Find Paperless
- Enter:
  - Server URL (e.g., http://localhost:8000)
  - API Token
- Click Test Connection
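A connection test of this kind can be approximated by requesting a single document with the token in the `Authorization` header. This is a minimal sketch using only the standard library; the function names are hypothetical, and what DeskAgent's Test Connection button actually checks is not specified here.

```python
from urllib.request import Request, urlopen


def auth_request(base_url, token, path="/api/documents/?page_size=1"):
    """Build a GET request that authenticates with a Paperless-ngx API token."""
    return Request(
        f"{base_url.rstrip('/')}{path}",
        headers={"Authorization": f"Token {token}", "Accept": "application/json"},
    )


def check_connection(base_url, token, timeout=5):
    """Return True if the server answers the authenticated request with 200."""
    try:
        with urlopen(auth_request(base_url, token), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection errors, timeouts, and HTTP error codes
        return False
```

A refused connection and a rejected token both yield `False` here; distinguishing the two would require inspecting the raised exception.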
Paperless vs ecoDMS¶
| Feature | Paperless-ngx | ecoDMS |
|---|---|---|
| Price | Free (open source) | Commercial |
| OCR | Excellent | Good |
| Self-hosted | Required | Required |
| Mobile app | Web only | Yes |
| Setup complexity | Moderate | Easy |
Tips¶
- Let OCR complete: Wait a few seconds after upload before reading content.
- Use correspondents: Set up correspondents for frequent senders.
- Tag system: Create a consistent tagging system (e.g., Year, Status, Type).
- Bulk classify: Update multiple documents at once for efficiency.
Common Issues¶
| Problem | Solution |
|---|---|
| "Connection refused" | Check Paperless is running |
| "Invalid token" | Generate a new API token |
| "Document not found" | Wait for consumption to complete |
| "No text content" | OCR may still be processing |
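The last two problems usually resolve themselves once consumption and OCR finish, so a client can simply retry for a short while. Below is a sketch of such a retry loop; `fetch_document` is a caller-supplied function (an assumption) that returns the document as a dict, or `None` if it does not exist yet.

```python
import time


def wait_for_text(fetch_document, doc_id, attempts=5, delay=2.0, sleep=time.sleep):
    """Poll until the document exists and has non-empty OCR text.

    Returns the document dict, or None if it is still not ready
    after the given number of attempts.
    """
    for attempt in range(attempts):
        doc = fetch_document(doc_id)
        if doc and (doc.get("content") or "").strip():
            return doc
        if attempt < attempts - 1:
            sleep(delay)  # give consumption/OCR time before retrying
    return None
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default is fine.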