LLM usage is optional and must be explicitly configured. You can maintain full control by running local models on your own systems through Ollama, LM Studio, or similar tools. No data is sent to external services unless you configure a cloud provider.
Configuration via settings
Configure AI settings through the web interface:
- Go to Settings → Self-Hosting
- Scroll to the AI Provider section
- Configure:
  - OpenAI Access Token - Your API key
  - OpenAI URI Base - Custom endpoint (leave blank for OpenAI)
  - OpenAI Model - Model name (required for custom endpoints)
OpenAI-compatible API
Sure supports any OpenAI-compatible API endpoint, giving you flexibility to use:
- OpenAI - Direct access to GPT models
- Ollama - Run models locally on your hardware
- LM Studio - Local model hosting with a GUI
- OpenRouter - Access to multiple providers (Anthropic, Google, etc.)
- Other providers - Groq, Together AI, Anyscale, Replicate, and more
OpenAI
- gpt-4.1 - Default, best balance of speed and quality
- gpt-5 - Latest model, highest quality
- gpt-4o-mini - Cheaper, good quality
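A minimal sketch of the corresponding environment configuration, assuming a docker-compose service named app (the service name and compose layout are assumptions; adjust to your deployment):

```yaml
# docker-compose.yml (excerpt) - service name "app" is an assumption
services:
  app:
    environment:
      OPENAI_ACCESS_TOKEN: ${OPENAI_ACCESS_TOKEN}   # your OpenAI API key
      # OPENAI_URI_BASE left unset so the official OpenAI endpoint is used
      OPENAI_MODEL: gpt-4.1                          # or gpt-5 / gpt-4o-mini
```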
Ollama (local)
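A minimal sketch, assuming Ollama runs on the same host on its default port 11434 and exposes its OpenAI-compatible API under /v1; the service name, host mapping, and model name are illustrative:

```yaml
# docker-compose.yml (excerpt) - names are assumptions
services:
  app:
    environment:
      OPENAI_URI_BASE: http://host.docker.internal:11434/v1   # Ollama's OpenAI-compatible endpoint
      OPENAI_MODEL: llama3.1                                   # any model you have pulled with Ollama
      OPENAI_ACCESS_TOKEN: ollama                              # placeholder value; Ollama does not validate API keys
```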
LM Studio (local)
- Download from lmstudio.ai
- Download a model through the UI
- Start the local server
- Configure Sure:
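A sketch of the corresponding settings, assuming LM Studio's local server is running on its default port 1234; enter the same values in the settings UI if you prefer:

```yaml
# docker-compose.yml (excerpt) - names are assumptions
services:
  app:
    environment:
      OPENAI_URI_BASE: http://host.docker.internal:1234/v1   # LM Studio's OpenAI-compatible server
      OPENAI_MODEL: your-downloaded-model                      # must match the model loaded in LM Studio
      OPENAI_ACCESS_TOKEN: lm-studio                           # placeholder; LM Studio does not require a key
```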
OpenRouter
Access multiple providers through a single API:
- google/gemini-2.5-flash - Fast and capable
- anthropic/claude-sonnet-4.5 - Excellent reasoning
- anthropic/claude-haiku-4.5 - Fast and cost-effective
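A sketch using environment variables; the base URL is OpenRouter's standard OpenAI-compatible endpoint, and the model can be any of the identifiers above:

```yaml
# docker-compose.yml (excerpt) - service name is an assumption
services:
  app:
    environment:
      OPENAI_ACCESS_TOKEN: ${OPENROUTER_API_KEY}      # your OpenRouter key
      OPENAI_URI_BASE: https://openrouter.ai/api/v1   # OpenRouter's OpenAI-compatible endpoint
      OPENAI_MODEL: anthropic/claude-sonnet-4.5
```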
Token budget
Sure applies a token budget to every outbound LLM call — chat history, auto-categorization, merchant detection, and PDF processing. The defaults are conservative (2048-token context window) so small-context local models like Ollama work out of the box. If you use a cloud provider or a larger-context local model, raise these values.
Configure via settings UI
- Go to Settings → Self-Hosting
- Scroll to the AI Provider section
- Under Token Budget, configure:
  - Context Window — total tokens the model accepts (default: 2048)
  - Max Response Tokens — tokens reserved for the model’s reply (default: 512)
  - Max Items Per Batch — upper bound for auto-categorize and merchant detection batches (default: 25)
Configure via environment variables
Environment variables take precedence over the settings UI.

| Variable | Description | Default |
|---|---|---|
| LLM_CONTEXT_WINDOW | Total tokens the model will accept | 2048 |
| LLM_MAX_RESPONSE_TOKENS | Tokens reserved for the model’s reply | 512 |
| LLM_MAX_HISTORY_TOKENS | Maximum tokens for conversation history. Derived automatically if unset (context_window - max_response_tokens - system_prompt_reserve) | Derived |
| LLM_SYSTEM_PROMPT_RESERVE | Tokens reserved for the system prompt | 256 |
| LLM_MAX_ITEMS_PER_CALL | Upper bound on auto-categorize / merchant detection batch size | 25 |
Large batches of transactions are automatically sliced to fit the configured context window. You no longer need to worry about the previous 25-item hard limit — it is now a soft default that adapts to your model’s capacity.
Recommended values
| Setup | Context window | Max response tokens | Max items per batch |
|---|---|---|---|
| Ollama / small local models | 2048 (default) | 512 (default) | 25 (default) |
| Cloud OpenAI (gpt-4.1, gpt-5) | 16384 or higher | 4096 | 50 |
| Large-context local models | 8192 | 2048 | 50 |
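Expressed as environment variables, the cloud OpenAI row above would look roughly like this (a sketch; adjust to your model's actual context size):

```yaml
# docker-compose.yml (excerpt)
services:
  app:
    environment:
      LLM_CONTEXT_WINDOW: "16384"
      LLM_MAX_RESPONSE_TOKENS: "4096"
      LLM_MAX_ITEMS_PER_CALL: "50"
```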
Responses API routing
Sure automatically routes chat requests to the OpenAI Responses API when using the official OpenAI endpoint, and falls back to the Chat Completions API for custom providers. You can override this behavior with the OPENAI_SUPPORTS_RESPONSES_ENDPOINT environment variable.
| Variable | Description | Default |
|---|---|---|
| OPENAI_SUPPORTS_RESPONSES_ENDPOINT | Set to true to force the Responses API on a custom provider, or false to force the Chat Completions API on all providers | Auto-detected |
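For example, if a custom provider advertises Responses API support, you could opt in explicitly (a sketch; most setups can rely on auto-detection):

```yaml
# docker-compose.yml (excerpt)
services:
  app:
    environment:
      OPENAI_SUPPORTS_RESPONSES_ENDPOINT: "true"   # force the Responses API on a custom endpoint
```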
AI cache management
Sure caches AI-generated results (like auto-categorization and merchant detection) to avoid redundant API calls and costs.
What is the AI cache?
When AI rules process transactions, Sure stores:
- Enrichment records - Which attributes were set by AI (category, merchant, etc.)
- Attribute locks - Prevent rules from re-processing already-handled transactions

This means:
- Transactions won’t be sent to the LLM repeatedly
- Your API costs are minimized
- Processing is faster on subsequent rule runs
When to reset the AI cache
You might want to reset the cache when:
- Switching LLM models - Different models may produce better categorizations
- Improving prompts - After system updates with better prompts
- Fixing miscategorizations - When AI made systematic errors
- Testing - During development or evaluation of AI features
How to reset the AI cache
Via UI (recommended):
- Go to Settings → Rules
- Click the menu button (three dots)
- Select Reset AI cache
- Confirm the action
What happens when cache is reset
- AI-locked attributes are unlocked - Transactions can be re-enriched
- AI enrichment records are deleted - The history of AI changes is cleared
- User edits are preserved - If you manually changed a category after AI set it, your change is kept
Evaluation system
Test and compare different LLMs for your specific use case. The eval system helps you benchmark models for transaction categorization, merchant detection, and chat assistant functionality. See the evaluation framework documentation for details on:
- Running evaluations
- Comparing models
- Creating custom datasets
- Langfuse integration for tracking experiments
Additional environment variables
These optional variables fine-tune the behavior of the OpenAI-compatible provider.

| Variable | Description | Default |
|---|---|---|
| OPENAI_ACCESS_TOKEN | API key for the provider | — |
| OPENAI_URI_BASE | Custom endpoint URL (leave blank for OpenAI) | — |
| OPENAI_MODEL | Model name (required for custom endpoints) | gpt-4.1 |
| OPENAI_REQUEST_TIMEOUT | HTTP timeout in seconds. Raise for slow local models | 60 |
| OPENAI_SUPPORTS_PDF_PROCESSING | Set to false for endpoints without vision support | true |
| OPENAI_SUPPORTS_RESPONSES_ENDPOINT | Override Responses API vs Chat Completions routing | Auto-detected |
| LLM_JSON_MODE | JSON output mode: auto, strict, json_object, or none | — |
Token budget variables such as LLM_CONTEXT_WINDOW and LLM_MAX_RESPONSE_TOKENS are covered in the Token budget section above.
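As a sketch, a slow local model without vision support might combine several of these:

```yaml
# docker-compose.yml (excerpt)
services:
  app:
    environment:
      OPENAI_REQUEST_TIMEOUT: "300"               # allow several minutes for slow local inference
      OPENAI_SUPPORTS_PDF_PROCESSING: "false"     # endpoint has no vision support
      LLM_JSON_MODE: json_object                  # request JSON object output from the model
```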
Docker compose example
Basic Ollama setup
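A minimal sketch of a compose file running Sure alongside Ollama. The image name for Sure and the service layout are assumptions; adapt them to the project's official compose file, which also defines the database and other core services omitted here:

```yaml
# docker-compose.yml - illustrative excerpt, not the project's canonical file
services:
  app:
    image: ghcr.io/we-promise/sure:latest    # assumed image name; use the one from the official compose file
    environment:
      OPENAI_URI_BASE: http://ollama:11434/v1
      OPENAI_MODEL: llama3.1                  # any model you have pulled into Ollama
      OPENAI_ACCESS_TOKEN: ollama             # placeholder; Ollama ignores the key
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-data:/root/.ollama             # persist downloaded models

volumes:
  ollama-data:
```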
Advanced AI setup with OpenClaw
For advanced AI features including code execution and tool use, you can use the local-ai profile with OpenClaw.
The local-ai profile includes:
- Ollama - Local LLM inference
- OpenClaw - Gateway providing enhanced AI capabilities and tool use
- Automatic routing between Sure and the AI stack
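A sketch of how a Compose profile like this is typically wired up; the openclaw service definition here is illustrative, not the project's actual configuration:

```yaml
# docker-compose.yml (excerpt) - image and service names are assumptions
services:
  ollama:
    image: ollama/ollama:latest
    profiles: ["local-ai"]           # only started when the local-ai profile is enabled

  openclaw:
    image: openclaw/openclaw:latest  # assumed image name; check the project's compose file
    profiles: ["local-ai"]
    depends_on:
      - ollama
```

Start the stack with `docker compose --profile local-ai up -d`; without the flag, only the core Sure services run.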