
Building SemanticStudio Series

An 8-part deep dive into building a production-ready multi-agent chat platform

SemanticStudio · RAG · LLM Configuration · Enterprise AI

RAG Chain Configuration: Models, Modes, and Fine-Tuning

The power user's guide to configuring SemanticStudio's RAG chain—multi-provider LLM support, mode parameters, and full control over cost vs. quality.

January 26, 2026 · 7 min read

TL;DR

  • Multi-provider LLM support: OpenAI, Anthropic, Ollama—mix and match by role
  • Five configurable modes (including Auto) with full parameter control
  • Cost vs. quality tradeoffs: tune max results, graph hops, memory tiers per mode

Most RAG systems give you one model, one mode, and hope it works.

SemanticStudio gives you full control: configure every model role, tune every mode parameter, and mix providers as needed. The cost vs. quality tradeoff is yours to make.

Multi-Provider LLM Support

You're not locked into one provider. SemanticStudio supports:

  • OpenAI: GPT-5.2, GPT-5-mini, o3-deep-research
  • Anthropic: Claude models
  • Ollama: Local models for private data
[Image: SemanticStudio model configuration showing all roles and providers]

Provider Status

The admin dashboard shows provider availability:

  • OpenAI: Connected, API key configured
  • Anthropic: Not configured (add API key to enable)
  • Ollama: Not available (requires local setup)

You can run SemanticStudio with any combination of providers.

Model Role Configuration

SemanticStudio uses different models for different roles. Each role can be configured independently:

Model Roles

| Role | Purpose | Default Model |
|---|---|---|
| Embeddings | Vector embeddings for RAG | text-embedding-3-large |
| Composer | Main response generation | gpt-5.2 |
| Composer Fast | Quick mode responses | gpt-5-mini |
| Planner | Query planning and domain detection | gpt-5-mini |
| Reflection | Answer quality review | gpt-5.2 |
| Mode Classifier | Classify query complexity | gpt-5-mini |
| Memory Extractor | Extract facts from conversation | gpt-5-mini |
| Research | Deep research mode | o3-deep-research |
| Image Generation | Image creation | gpt-image-1.5 |

Why Separate Roles?

Different tasks have different requirements:

  • Embeddings: Need consistency, not creativity
  • Composer: Need quality and reasoning
  • Planner: Need speed, handles high volume
  • Research: Need depth, can afford latency

By separating roles, you optimize each for its specific purpose.

Mixing Providers

You can mix providers per role:

Embeddings: OpenAI (text-embedding-3-large)
Composer: Anthropic (Claude)
Planner: OpenAI (gpt-5-mini)
Research: OpenAI (o3-deep-research)
Memory Extractor: Ollama (llama-3)  ← Private processing

Use Ollama for sensitive operations that shouldn't leave your infrastructure.
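
To make this concrete, a per-role model map might look like the sketch below. It is purely illustrative: the dictionary keys, the `resolve` helper, and the Claude model name are assumptions, not SemanticStudio's actual configuration schema.

```python
# Hypothetical per-role model map -- illustrative only, not the real schema.
MODEL_ROLES = {
    "embeddings":       {"provider": "openai",    "model": "text-embedding-3-large"},
    "composer":         {"provider": "anthropic", "model": "claude-sonnet"},  # model name assumed
    "composer_fast":    {"provider": "openai",    "model": "gpt-5-mini"},
    "planner":          {"provider": "openai",    "model": "gpt-5-mini"},
    "research":         {"provider": "openai",    "model": "o3-deep-research"},
    "memory_extractor": {"provider": "ollama",    "model": "llama-3"},        # stays on-prem
}

def resolve(role: str) -> tuple[str, str]:
    """Look up the (provider, model) pair configured for a pipeline role."""
    cfg = MODEL_ROLES[role]
    return cfg["provider"], cfg["model"]
```

The point of the pattern: every pipeline step asks for a role, not a hard-coded model, so swapping providers is a one-line config change.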

Mode Configuration Deep Dive

SemanticStudio has five modes, each fully configurable:

Four Preset Modes (Plus Auto)

Every parameter adjustable. Cost vs. quality, your call.

| Parameter | Quick | Think | Deep | Research |
|---|---|---|---|---|
| Max Results | 5 | 15 | 30 | 50 |
| Graph Hops | 0 | 1 | 2 | 3 |
| Memory Tiers | Tier 1 only | Tiers 1-2 | All tiers | All tiers |
| Reflection | Disabled | Enabled | Enabled | Enabled |
| Clarification | Disabled | Disabled | Disabled | Enabled |
| Response Model | gpt-5-mini | gpt-5.2 | gpt-5.2 | o3-deep-research |
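
One way to picture these presets in code is a small dataclass per mode, sketched below. The field and preset names mirror the table but are assumptions, not the platform's actual types; "all tiers" is written as tiers 1-4, matching the four-tier memory system covered in Part 5.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModePreset:
    """One column of the mode table (field names assumed for illustration)."""
    max_results: int               # chunks retrieved from the vector store
    graph_hops: int                # knowledge-graph traversal depth
    memory_tiers: tuple[int, ...]  # memory tiers consulted
    reflection: bool               # self-review pass on the draft answer
    clarification: bool            # may ask the user clarifying questions
    response_model: str            # model that composes the final answer

PRESETS = {
    "quick":    ModePreset(5,  0, (1,),         False, False, "gpt-5-mini"),
    "think":    ModePreset(15, 1, (1, 2),       True,  False, "gpt-5.2"),
    "deep":     ModePreset(30, 2, (1, 2, 3, 4), True,  False, "gpt-5.2"),
    "research": ModePreset(50, 3, (1, 2, 3, 4), True,  True,  "o3-deep-research"),
}
```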


Pro tip: Start with Think mode for most queries. Use Quick for simple lookups, Deep for analysis, Research for investigations.

Auto Mode (Default)

Use case: New users, general queries where the optimal mode isn't obvious

How it works:

  • LLM classifier analyzes query complexity, intent, and scope
  • Automatically selects Quick, Think, Deep, or Research
  • Returns confidence score with selection

Auto mode is the default for new users. It removes the cognitive load of choosing a mode while still providing optimal responses.
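
A minimal sketch of that classification step, assuming a JSON-speaking classifier model and a generic `llm_call` stand-in for the provider client (both names are hypothetical):

```python
import json

CLASSIFIER_PROMPT = (
    "Classify the query as one of: quick, think, deep, research. "
    "Consider complexity, intent, and scope. "
    'Reply as JSON: {"mode": "<mode>", "confidence": <0-1>}.\n'
    "Query: "
)

def select_mode(query: str, llm_call, default: str = "think") -> tuple[str, float]:
    """Ask the mode-classifier model (e.g. gpt-5-mini) to pick a mode.
    Falls back to `default` if the reply isn't parseable."""
    raw = llm_call(CLASSIFIER_PROMPT + query)
    try:
        parsed = json.loads(raw)
        return parsed["mode"], float(parsed["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return default, 0.0
```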

Quick Mode

Use case: Simple factual queries, lookups, definitions

Default settings:

  • Max Results: 5
  • Graph Hops: 0 (entity match only)
  • Memory Tiers: Tier 1 only
  • Reflection: Disabled
  • Response Model: gpt-5-mini

Quick mode is optimized for speed. Sub-second responses, minimal token usage.

[Image: SemanticStudio Quick mode configuration]

Think Mode

Use case: Standard questions, balanced approach

Default settings:

  • Max Results: 15
  • Graph Hops: 1 (direct relationships)
  • Memory Tiers: Tiers 1-2
  • Reflection: Enabled
  • Response Model: gpt-5.2

Think mode is the default for most queries. Good quality, reasonable latency.

[Image: SemanticStudio Think mode pipeline visualization]

Deep Mode

Use case: Complex analysis, multi-faceted questions

Default settings:

  • Max Results: 30
  • Graph Hops: 2 (extended network)
  • Memory Tiers: All tiers
  • Reflection: Enabled
  • Response Model: gpt-5.2

Deep mode retrieves more context and reasons more thoroughly.

[Image: SemanticStudio Deep mode configuration]

Research Mode

Use case: Investigations, comprehensive research

Default settings:

  • Max Results: 50
  • Graph Hops: 3 (full exploration)
  • Memory Tiers: All tiers
  • Reflection: Enabled
  • Clarification: Enabled
  • Response Model: o3-deep-research

Research mode can ask clarifying questions and explore extensively.

[Image: SemanticStudio Research mode configuration]

The Pipeline Visualization

Each mode shows its pipeline—the steps from query to response:

Memory T1+T2 → GraphRAG 1-hop → Web Search → Compose → Reflect → Judge

Pipeline steps are:

  • Required: Always runs (green badge)
  • Conditional: Runs if conditions met (orange badge)
  • Disabled: Won't run in this mode (gray)

The visualization helps you understand exactly what happens for each query.
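
One plausible way to encode those badges, sketched here with Think mode's pipeline. The step names follow the visualization above, but the structure itself is an assumption for illustration:

```python
from enum import Enum

class StepStatus(Enum):
    REQUIRED = "required"        # always runs (green badge)
    CONDITIONAL = "conditional"  # runs if its condition is met (orange badge)
    DISABLED = "disabled"        # won't run in this mode (gray)

# Think mode's pipeline, per the visualization above.
THINK_PIPELINE = [
    ("memory_t1_t2",  StepStatus.REQUIRED),
    ("graphrag_1hop", StepStatus.REQUIRED),
    ("web_search",    StepStatus.CONDITIONAL),  # only when the web toggle is on
    ("compose",       StepStatus.REQUIRED),
    ("reflect",       StepStatus.REQUIRED),
    ("judge",         StepStatus.CONDITIONAL),  # assumed: tied to inline evaluation
]
```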

Retrieval Settings

Each mode configures retrieval behavior:

Max Results

How many chunks to retrieve from the vector store:

  • Quick: 5 (minimal context)
  • Think: 15 (balanced)
  • Deep: 30 (extensive)
  • Research: 50 (comprehensive)

More results = more context = better grounding, but higher cost.
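
Some back-of-envelope math shows why max results is the biggest cost lever. The numbers below are assumptions (roughly 500 tokens per chunk, an illustrative $2.50 per million input tokens), not actual SemanticStudio pricing:

```python
# Rough context-cost estimate per query; chunk size and price are assumptions.
TOKENS_PER_CHUNK = 500
PRICE_PER_M_INPUT = 2.50  # dollars per million input tokens (illustrative)

for mode, max_results in [("quick", 5), ("think", 15), ("deep", 30), ("research", 50)]:
    tokens = max_results * TOKENS_PER_CHUNK
    cost = tokens / 1_000_000 * PRICE_PER_M_INPUT
    print(f"{mode:8s} ~{tokens:>6,} context tokens, ~${cost:.4f} per query")
```

Under these assumptions, Research mode carries ten times Quick mode's retrieval cost before the response model even runs.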

Graph Hops

How far to traverse the knowledge graph:

  • 0 hops: Entity match only
  • 1 hop: Direct relationships
  • 2 hops: Second-degree connections
  • 3 hops: Full exploration

More hops = discovers more relationships, but adds latency.
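
Conceptually, the hop limit is a bounded breadth-first traversal starting from the entities matched in the query. A minimal sketch over a plain adjacency map (the real knowledge graph API will differ):

```python
from collections import deque

def traverse(graph: dict[str, list[str]], seeds: set[str], max_hops: int) -> set[str]:
    """Collect entities reachable from the seed matches within max_hops edges.
    With max_hops=0 this returns only the direct entity matches."""
    seen = set(seeds)
    frontier = deque((node, 0) for node in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted; don't expand further
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen
```

Each extra hop can grow the frontier multiplicatively, which is exactly why hops add latency.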

Web Results (when enabled)

When web search is toggled on:

  • Quick: 3 results
  • Think: 5 results
  • Deep: 8 results
  • Research: 12 results

Pipeline Settings

Enable Reflection

When enabled, the system reviews its own response:

  1. Generate initial response
  2. Reflection model evaluates quality
  3. Revise if issues found
  4. Return improved response

Adds latency but improves accuracy.
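
The loop itself is simple; the sketch below captures the shape, with `generate` and `review` as stand-ins for the composer and reflection model calls (signatures assumed):

```python
def compose_with_reflection(query, context, generate, review, max_revisions=1):
    """Draft, review, and revise if needed -- the reflection pass in miniature."""
    draft = generate(query, context)
    for _ in range(max_revisions):
        critique = review(query, context, draft)  # reflection model scores the draft
        if critique.get("ok"):                    # no issues found: keep the draft
            break
        draft = generate(query, context, feedback=critique.get("issues"))
    return draft
```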

Enable Clarification

Research mode only. When enabled:

  1. Analyze query for ambiguity
  2. Ask clarifying questions if needed
  3. Use answers to refine search
  4. Generate comprehensive response

Show Evaluation in Chat

Display quality scores inline with responses:

  • Relevance: Does it answer the question?
  • Groundedness: Are claims supported?
  • Coherence: Is it logical and clear?
  • Completeness: Does it cover the scope?

Useful for testing and quality monitoring.
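
For intuition, an inline evaluation might surface a payload like the one below. The field names mirror the four dimensions above, while the scores and the 0.85 threshold are invented for illustration:

```python
# Illustrative evaluation payload (scores and threshold are made up).
evaluation = {
    "relevance": 0.92,     # does it answer the question?
    "groundedness": 0.88,  # are claims supported by retrieved sources?
    "coherence": 0.95,     # is it logical and clear?
    "completeness": 0.81,  # does it cover the scope?
}
flagged = [dim for dim, score in evaluation.items() if score < 0.85]
print("needs review:", flagged or "none")
```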

Model Assignment per Mode

Each mode can use a different response model:

| Mode | Default Response Model |
|---|---|
| Quick | composer_fast (gpt-5-mini) |
| Think | composer (gpt-5.2) |
| Deep | composer (gpt-5.2) |
| Research | research (o3-deep-research) |

The Auto Mode Default setting determines which mode gets auto-selected.

Performance Tuning

Cost Optimization

To reduce costs:

  • Use gpt-5-mini for more roles
  • Reduce max results
  • Disable reflection for non-critical queries
  • Use Quick mode by default

Quality Optimization

To improve quality:

  • Use gpt-5.2 or Claude for composer
  • Increase max results
  • Enable reflection
  • Use Think or Deep by default
  • Increase graph hops

Latency Optimization

To reduce latency:

  • Use gpt-5-mini for fast response
  • Reduce max results
  • Disable reflection
  • Use Quick mode
  • Reduce graph hops to 0-1
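
In practice, all three tuning directions reduce to overriding a few preset fields. A sketch, using assumed key names that mirror the mode table:

```python
# Latency-tuned variant of Think mode (key names assumed, not the real schema).
think = {"max_results": 15, "graph_hops": 1, "reflection": True,
         "response_model": "gpt-5.2"}

fast_think = {**think, "max_results": 8, "graph_hops": 0,
              "reflection": False, "response_model": "gpt-5-mini"}
```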

When to Adjust Settings

Increase max results if:

  • Answers are missing relevant information
  • Users frequently say "you forgot about..."
  • Queries span multiple topics

Increase graph hops if:

  • Users ask relationship questions
  • "Who worked with..." queries fail
  • Connected information isn't discovered

Enable reflection if:

  • Response quality is inconsistent
  • Hallucinations are occurring
  • Accuracy is critical

Switch to Research mode if:

  • Questions are investigative
  • Users need comprehensive coverage
  • Time is less important than thoroughness

What's Next

Models and modes determine how SemanticStudio retrieves and generates. But what it retrieves depends on the memory system—how context persists within and across sessions.

Next up: Part 5 — Memory as Infrastructure, the complete 4-tier memory system.