LLM usage is optional and must be explicitly configured. You can maintain full control by running local models on your own systems through Ollama, LM Studio, or similar tools. No data is sent to external services unless you configure a cloud provider.
Configuration via settings
Configure AI settings through the web interface:
- Go to Settings → Self-Hosting
- Scroll to the AI Provider section
- Configure:
  - OpenAI Access Token - Your API key
  - OpenAI URI Base - Custom endpoint (leave blank for OpenAI)
  - OpenAI Model - Model name (required for custom endpoints)
OpenAI-compatible API
Sure supports any OpenAI-compatible API endpoint, giving you flexibility to use:
- OpenAI - Direct access to GPT models
- Ollama - Run models locally on your hardware
- LM Studio - Local model hosting with a GUI
- OpenRouter - Access to multiple providers (Anthropic, Google, etc.)
- Other providers - Groq, Together AI, Anyscale, Replicate, and more
OpenAI
- gpt-4.1 - Default, best balance of speed and quality
- gpt-5 - Latest model, highest quality
- gpt-4o-mini - Cheaper, good quality
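A minimal sketch of the corresponding environment configuration, assuming a docker-compose service named app (the service name and compose layout are assumptions; adjust to your deployment):

```yaml
# docker-compose.yml (excerpt) - service name "app" is an assumption
services:
  app:
    environment:
      OPENAI_ACCESS_TOKEN: ${OPENAI_ACCESS_TOKEN}   # your OpenAI API key
      # OPENAI_URI_BASE left unset so the official OpenAI endpoint is used
      OPENAI_MODEL: gpt-4.1                          # or gpt-5 / gpt-4o-mini
```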
Ollama (local)
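A minimal sketch, assuming Ollama runs on the same host on its default port 11434 and exposes its OpenAI-compatible API under /v1; the service name, host mapping, and model name are illustrative:

```yaml
# docker-compose.yml (excerpt) - names are assumptions
services:
  app:
    environment:
      OPENAI_URI_BASE: http://host.docker.internal:11434/v1   # Ollama's OpenAI-compatible endpoint
      OPENAI_MODEL: llama3.1                                   # any model you have pulled with Ollama
      OPENAI_ACCESS_TOKEN: ollama                              # placeholder value; Ollama does not validate API keys
```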
LM Studio (local)
- Download from lmstudio.ai
- Download a model through the UI
- Start the local server
- Configure Sure:
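A sketch of the corresponding settings, assuming LM Studio's local server is running on its default port 1234; enter the same values in the settings UI if you prefer:

```yaml
# docker-compose.yml (excerpt) - names are assumptions
services:
  app:
    environment:
      OPENAI_URI_BASE: http://host.docker.internal:1234/v1   # LM Studio's OpenAI-compatible server
      OPENAI_MODEL: your-downloaded-model                      # must match the model loaded in LM Studio
      OPENAI_ACCESS_TOKEN: lm-studio                           # placeholder; LM Studio does not require a key
```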
OpenRouter
Access multiple providers through a single API:
- google/gemini-2.5-flash - Fast and capable
- anthropic/claude-sonnet-4.5 - Excellent reasoning
- anthropic/claude-haiku-4.5 - Fast and cost-effective
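A sketch using environment variables; the base URL is OpenRouter's standard OpenAI-compatible endpoint, and the model can be any of the identifiers above:

```yaml
# docker-compose.yml (excerpt) - service name is an assumption
services:
  app:
    environment:
      OPENAI_ACCESS_TOKEN: ${OPENROUTER_API_KEY}      # your OpenRouter key
      OPENAI_URI_BASE: https://openrouter.ai/api/v1   # OpenRouter's OpenAI-compatible endpoint
      OPENAI_MODEL: anthropic/claude-sonnet-4.5
```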
Token budget
Sure applies a token budget to every outbound LLM call — chat history, auto-categorization, merchant detection, and PDF processing. The defaults are conservative (2048-token context window) so small-context local models like Ollama work out of the box. If you use a cloud provider or a larger-context local model, raise these values.
Configure via settings UI
- Go to Settings → Self-Hosting
- Scroll to the AI Provider section
- Under Token Budget, configure:
  - Context Window — total tokens the model accepts (default: 2048)
  - Max Response Tokens — tokens reserved for the model’s reply (default: 512)
  - Max Items Per Batch — upper bound for auto-categorize and merchant detection batches (default: 25)
Configure via environment variables
Environment variables take precedence over the settings UI.

| Variable | Description | Default |
|---|---|---|
| LLM_CONTEXT_WINDOW | Total tokens the model will accept | 2048 |
| LLM_MAX_RESPONSE_TOKENS | Tokens reserved for the model’s reply | 512 |
| LLM_MAX_HISTORY_TOKENS | Maximum tokens for conversation history. Derived automatically if unset (context_window - max_response_tokens - system_prompt_reserve) | Derived |
| LLM_SYSTEM_PROMPT_RESERVE | Tokens reserved for the system prompt | 256 |
| LLM_MAX_ITEMS_PER_CALL | Upper bound on auto-categorize / merchant detection batch size | 25 |
Large batches of transactions are automatically sliced to fit the configured context window. You no longer need to worry about the previous 25-item hard limit — it is now a soft default that adapts to your model’s capacity.
Recommended values
| Setup | Context window | Max response tokens | Max items per batch |
|---|---|---|---|
| Ollama / small local models | 2048 (default) | 512 (default) | 25 (default) |
| Cloud OpenAI (gpt-4.1, gpt-5) | 16384 or higher | 4096 | 50 |
| Large-context local models | 8192 | 2048 | 50 |
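Expressed as environment variables, the cloud OpenAI row above would look roughly like this (a sketch; adjust to your model's actual context size):

```yaml
# docker-compose.yml (excerpt)
services:
  app:
    environment:
      LLM_CONTEXT_WINDOW: "16384"
      LLM_MAX_RESPONSE_TOKENS: "4096"
      LLM_MAX_ITEMS_PER_CALL: "50"
```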
Responses API routing
Sure automatically routes chat requests to the OpenAI Responses API when using the official OpenAI endpoint, and falls back to the Chat Completions API for custom providers. You can override this behavior with the OPENAI_SUPPORTS_RESPONSES_ENDPOINT environment variable.
| Variable | Description | Default |
|---|---|---|
| OPENAI_SUPPORTS_RESPONSES_ENDPOINT | Set to true to force the Responses API on a custom provider, or false to force the Chat Completions API on all providers | Auto-detected |
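For example, if a custom provider advertises Responses API support, you could opt in explicitly (a sketch; most setups can rely on auto-detection):

```yaml
# docker-compose.yml (excerpt)
services:
  app:
    environment:
      OPENAI_SUPPORTS_RESPONSES_ENDPOINT: "true"   # force the Responses API on a custom endpoint
```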
AI cache management
Sure caches AI-generated results (like auto-categorization and merchant detection) to avoid redundant API calls and costs.
What is the AI cache?
When AI rules process transactions, Sure stores:
- Enrichment records - Which attributes were set by AI (category, merchant, etc.)
- Attribute locks - Prevent rules from re-processing already-handled transactions

This means:
- Transactions won’t be sent to the LLM repeatedly
- Your API costs are minimized
- Processing is faster on subsequent rule runs
When to reset the AI cache
You might want to reset the cache when:
- Switching LLM models - Different models may produce better categorizations
- Improving prompts - After system updates with better prompts
- Fixing miscategorizations - When AI made systematic errors
- Testing - During development or evaluation of AI features
How to reset the AI cache
Via UI (recommended):
- Go to Settings → Rules
- Click the menu button (three dots)
- Select Reset AI cache
- Confirm the action
What happens when cache is reset
- AI-locked attributes are unlocked - Transactions can be re-enriched
- AI enrichment records are deleted - The history of AI changes is cleared
- User edits are preserved - If you manually changed a category after AI set it, your change is kept
Evaluation system
Test and compare different LLMs for your specific use case. The eval system helps you benchmark models for transaction categorization, merchant detection, and chat assistant functionality. See the evaluation framework documentation for details on:
- Running evaluations
- Comparing models
- Creating custom datasets
- Langfuse integration for tracking experiments
Additional environment variables
These optional variables fine-tune the behavior of the OpenAI-compatible provider.

| Variable | Description | Default |
|---|---|---|
| OPENAI_ACCESS_TOKEN | API key for the provider | — |
| OPENAI_URI_BASE | Custom endpoint URL (leave blank for OpenAI) | — |
| OPENAI_MODEL | Model name (required for custom endpoints) | gpt-4.1 |
| OPENAI_REQUEST_TIMEOUT | HTTP timeout in seconds. Raise for slow local models | 60 |
| OPENAI_SUPPORTS_PDF_PROCESSING | Set to false for endpoints without vision support | true |
| OPENAI_SUPPORTS_RESPONSES_ENDPOINT | Override Responses API vs Chat Completions routing | Auto-detected |
| LLM_JSON_MODE | JSON output mode: auto, strict, json_object, or none | — |
Token budget variables such as LLM_CONTEXT_WINDOW and LLM_MAX_RESPONSE_TOKENS are covered in the Token budget section above.
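As a sketch, a slow local model without vision support might combine several of these:

```yaml
# docker-compose.yml (excerpt)
services:
  app:
    environment:
      OPENAI_REQUEST_TIMEOUT: "300"               # allow several minutes for slow local inference
      OPENAI_SUPPORTS_PDF_PROCESSING: "false"     # endpoint has no vision support
      LLM_JSON_MODE: json_object                  # request JSON object output from the model
```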
Docker compose example
Basic Ollama setup
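A minimal sketch of a compose file running Sure alongside Ollama. The image name for Sure and the service layout are assumptions; adapt them to the project's official compose file, which also defines the database and other core services omitted here:

```yaml
# docker-compose.yml - illustrative excerpt, not the project's canonical file
services:
  app:
    image: ghcr.io/we-promise/sure:latest    # assumed image name; use the one from the official compose file
    environment:
      OPENAI_URI_BASE: http://ollama:11434/v1
      OPENAI_MODEL: llama3.1                  # any model you have pulled into Ollama
      OPENAI_ACCESS_TOKEN: ollama             # placeholder; Ollama ignores the key
    depends_on:
      - ollama

  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama-data:/root/.ollama             # persist downloaded models

volumes:
  ollama-data:
```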
Advanced AI setup with OpenClaw
For advanced AI features including code execution and tool use, you can use the local-ai profile with OpenClaw.
The local-ai profile includes:
- Ollama - Local LLM inference
- OpenClaw - Gateway providing enhanced AI capabilities and tool use
- Automatic routing between Sure and the AI stack
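A sketch of how a Compose profile like this is typically wired up; the openclaw service definition here is illustrative, not the project's actual configuration:

```yaml
# docker-compose.yml (excerpt) - image and service names are assumptions
services:
  ollama:
    image: ollama/ollama:latest
    profiles: ["local-ai"]           # only started when the local-ai profile is enabled

  openclaw:
    image: openclaw/openclaw:latest  # assumed image name; check the project's compose file
    profiles: ["local-ai"]
    depends_on:
      - ollama
```

Start the stack with `docker compose --profile local-ai up -d`; without the flag, only the core Sure services run.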