Introduction to Spring AI RAG
Let me tell you about the day I almost threw my laptop out the window. I was desperately searching through hundreds of PDF documents, trying to find that one specific piece of information for a client demo. You know the drill – Ctrl+F, scroll, scan, repeat. My eyes were burning, my patience was gone, and I was questioning my life choices.
That’s when I discovered Spring AI RAG (Retrieval-Augmented Generation), and honestly, it changed everything. Instead of manually hunting through documents like a digital archaeologist, I could simply ask: “What are the key features of Baldur’s Gate 3?” and get an intelligent, contextual answer pulled from my entire document collection.
If you’ve ever felt frustrated with traditional search methods, or if you’re curious about building applications that can actually understand and reason about your data, you’re in for a treat. RAG isn’t just a buzzword – it’s the future of intelligent applications, and Spring AI makes it surprisingly accessible.
Building on our Spring AI and Ollama integration, we’re now taking things to the next level. Today, we’re going to build something pretty cool together: a fully functional Spring AI RAG application that can intelligently search through your documents and provide contextual answers. Think of it as having a super-smart assistant that has read every document in your collection and can answer questions with pinpoint accuracy.
Part 1: Understanding Spring AI RAG Fundamentals
What is Spring AI RAG and Why Should You Care?
Let me explain RAG with my favorite coffee shop analogy (because everything is better with coffee, right?).
Imagine you walk into a coffee shop and ask the barista: “What’s the best drink for someone who likes sweet but not too sweet, with a caffeine kick but not too jittery?”
A traditional search system would be like a barista who can only point you to the menu board and say “Figure it out yourself.” Frustrating, right?
But a RAG system is like having a barista who:
- Retrieves relevant information from their knowledge base (maybe they remember you ordered a caramel macchiato last week and loved it)
- Augments that information with current context (it’s 3 PM, you mentioned you have a meeting later)
- Generates a personalized response (“I’d recommend our honey oat milk latte with an extra shot – it’s sweet but balanced, and the extra caffeine will keep you sharp for your meeting”)
That’s exactly what Spring AI RAG does with your documents and data.
RAG in Simple Terms:
- Retrieval: Find relevant documents or chunks of information
- Augmentation: Add that context to your question
- Generation: Use an AI model to create a smart, contextual response
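The three steps above can be sketched in plain Java. This is a toy, not Spring AI code: retrieval here is naive word matching standing in for real vector search, and the model call is stubbed out as a function so you can see the data flow end to end:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RagToy {

    // Retrieval: find chunks that share significant words with the question
    // (a naive stand-in for vector similarity search)
    static List<String> retrieve(String question, List<String> chunks) {
        Set<String> queryWords = Arrays.stream(question.toLowerCase().split("\\W+"))
            .filter(w -> w.length() > 3) // ignore short, stop-word-ish tokens
            .collect(Collectors.toSet());
        return chunks.stream()
            .filter(chunk -> Arrays.stream(chunk.toLowerCase().split("\\W+"))
                .anyMatch(queryWords::contains))
            .collect(Collectors.toList());
    }

    // Augmentation: prepend the retrieved context to the question
    static String augment(String question, List<String> context) {
        return "CONTEXT:\n" + String.join("\n", context) + "\n\nQUESTION: " + question;
    }

    // Generation: hand the augmented prompt to a model (stubbed as a plain function)
    static String answer(String question, List<String> chunks, Function<String, String> llm) {
        return llm.apply(augment(question, retrieve(question, chunks)));
    }

    public static void main(String[] args) {
        List<String> knowledgeBase = List.of(
            "Baldur's Gate 3 is a party-based RPG.",
            "Espresso is brewed under pressure.");
        // Echo the prompt instead of calling a real model
        System.out.println(answer("What is Baldur's Gate 3?", knowledgeBase, prompt -> prompt));
    }
}
```

Only the RPG chunk survives retrieval, so the stubbed "model" receives a prompt containing just the relevant context – exactly the contract the real pipeline honors later in this post.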
Why Spring AI Makes RAG Development a Breeze:
Spring AI is like having a Swiss Army knife for AI development. Before Spring AI, building RAG applications meant juggling multiple libraries, dealing with complex integrations, and writing tons of boilerplate code. Now? It’s as simple as adding a few annotations and beans.
Here’s what makes Spring AI RAG special:
- Unified API: One consistent interface for different AI models and vector stores
- Spring Boot Integration: All the conveniences you love about Spring Boot
- Production Ready: Built-in monitoring, error handling, and scalability
- Extensible: Easy to customize and extend for your specific needs
Real-World Use Cases That’ll Make You Excited:
- Internal Knowledge Base: Your company’s documentation, policies, and procedures become instantly searchable
- Customer Support: Automatically find relevant information to answer customer queries
- Research Assistant: Quickly find information across academic papers, reports, and research documents
- Legal Document Analysis: Search through contracts, legal precedents, and case studies
- Personal Knowledge Management: Your own digital brain that remembers everything you’ve read
Spring AI RAG Architecture Deep Dive
Now, let’s peek under the hood and see how Spring AI RAG actually works. Don’t worry – I’ll keep it interesting and skip the boring technical jargon.
The Three Pillars: Retrieval, Augmentation, Generation
Think of Spring AI RAG as a three-act play:
Act 1: Retrieval (The Detective)
This is where your application becomes Sherlock Holmes. When you ask a question, the retrieval system:
- Converts your question into a vector (think of it as a mathematical fingerprint)
- Searches through your vector database to find similar content
- Returns the most relevant document chunks
Here’s what the retrieval layer of our Spring AI RAG application looks like:
@Service
public class HybridSearchService {

    private final VectorStore vectorStore;
    private final List<Document> allDocuments;

    public List<Document> hybridSearch(String query) {
        // Vector similarity search
        SearchRequest vectorSearchRequest = SearchRequest.builder()
            .query(query)
            .topK(5)
            .similarityThreshold(0.65)
            .build();
        List<Document> vectorResults = vectorStore.similaritySearch(vectorSearchRequest);

        // Enhanced keyword search
        List<Document> keywordResults = allDocuments.stream()
            .filter(doc -> isRelevantForKeywordSearch(doc, query))
            .sorted(Comparator.comparingInt(
                doc -> -calculateRelevanceScore(doc.getFormattedContent(), query)))
            .limit(8)
            .collect(Collectors.toList());

        // Combine results using hybrid ranking
        return new HybridRanker().fuse(vectorResults, keywordResults);
    }
}
Act 2: Augmentation (The Librarian)
This is where the magic happens. The augmentation step takes your original question and says: “Hey, here’s some relevant context that might help answer this question better.”
@Service
public class ResponseService {

    private final ChatClient chatClient;

    private static final String SYSTEM_PROMPT = """
        You are a helpful assistant that answers questions based on provided context.
        FORMATTING RULES:
        1. ONLY use information from the CONTEXT section below
        2. Format your response in clear, readable markdown
        3. Use **bold** for important terms
        4. Always include source references at the end
        5. If the answer is not in the context, respond with: "I don't have information about that in the provided documents."
        CONTEXT:
        {context}
        QUESTION: {question}
        """;

    public ResponseService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public String generateResponse(String query, List<Document> context) {
        String formattedContext = formatContext(context);
        PromptTemplate template = new PromptTemplate(SYSTEM_PROMPT);
        Prompt prompt = template.create(Map.of(
            "context", formattedContext,
            "question", query
        ));
        return chatClient.prompt(prompt).call().content();
    }
}
Act 3: Generation (The Storyteller)
Finally, the AI model takes your question and the relevant context and crafts a response that’s both accurate and natural. It’s like having a really smart friend who’s just read all your documents and can explain things in a way that makes sense.
How Vector Databases Fit into the Spring AI RAG Ecosystem
Vector databases are the secret sauce that makes Spring AI RAG possible. Here’s why they’re so powerful:
Traditional Database: “Find me all documents where title = ‘Spring AI’”
Vector Database: “Find me all documents that are semantically similar to ‘Spring AI framework for building intelligent applications’”
See the difference? Vector databases understand meaning, not just exact matches.
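Under the hood, “semantically similar” usually means cosine similarity between embedding vectors. Here’s the arithmetic with made-up 3-dimensional vectors (real embeddings from a model like nomic-embed-text have 768 dimensions, which is why the `dimensions` property is set to 768 in our config):

```java
public class CosineDemo {

    // Cosine similarity: dot product of the vectors divided by the product of their lengths
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] springAi = {0.9, 0.1, 0.3};    // pretend embedding of "Spring AI"
        double[] aiFramework = {0.8, 0.2, 0.4}; // pretend embedding of "AI framework"
        double[] coffee = {0.1, 0.9, 0.2};      // pretend embedding of "coffee"
        System.out.printf("similar:   %.3f%n", cosine(springAi, aiFramework));
        System.out.printf("unrelated: %.3f%n", cosine(springAi, coffee));
    }
}
```

The semantically related pair scores close to 1.0 while the unrelated pair scores much lower – that score ordering is what `similaritySearch` ranks by.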
In our Spring AI RAG setup, we’re using PGVector (PostgreSQL with vector extensions):
# Configuration for PGVector (application.properties)
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768
Component Interaction Flow
Here’s how all the pieces work together in our Spring AI RAG application:
- Document Ingestion: Documents are chunked and converted to vectors
- Storage: Vectors are stored in PGVector database
- Query Processing: User question is converted to a vector
- Retrieval: Similar vectors (documents) are found
- Augmentation: Context is added to the original question
- Generation: AI model generates a response
- Response: User gets an intelligent, contextual answer
@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final HybridSearchService hybridSearchService;
    private final ResponseService responseService;

    @PostMapping("/query")
    public ResponseEntity<ChatResponse> queryRAG(@RequestBody ChatRequest request) {
        // 1. Retrieve relevant context documents
        List<Document> context = hybridSearchService.hybridSearch(request.getQuery());

        // 2. Generate response using context
        String llmResponse = responseService.generateResponse(request.getQuery(), context);

        // 3. Create structured response with sources
        ChatResponse response = new ChatResponse(
            llmResponse,
            context.size(),
            extractSources(context),
            LocalDateTime.now(),
            "success"
        );
        return ResponseEntity.ok(response);
    }
}
The beauty of Spring AI RAG is that it handles all the complex orchestration behind the scenes. You just need to focus on your business logic and let Spring AI handle the heavy lifting.
In the next section, we’ll dive into setting up your development environment and getting your hands dirty with some actual code. Trust me, once you see how easy it is to get started, you’ll wonder why you didn’t try Spring AI RAG sooner!
Part 2: Setting Up Your Spring AI RAG Environment
Spring AI RAG Development Environment Setup
Now that we’ve covered the theory, let’s get our hands dirty! Setting up a Spring AI RAG environment isn’t just about installing dependencies – it’s about creating a robust foundation that won’t crumble under pressure. Trust me, I’ve been there. Nothing’s worse than having your RAG system work perfectly in development, only to crash spectacularly in production because you skipped the proper setup.
PGVector Database Configuration for Spring AI RAG
For the PGVector database setup, I’ve prepared a comprehensive installation guide that covers everything from Docker setup to performance tuning. You can [download the complete PGVector setup guide here] – it’s got all the Docker commands, configuration files, and troubleshooting tips you’ll need. No more googling “why is my vector database so slow” at 2 AM! Once you nail the PGVector setup and spin it up successfully, you’ll see PostgreSQL accepting connections in your terminal.
Ollama Integration with Spring AI RAG
Similarly, I’ve created a detailed Ollama setup guide that walks you through installing and configuring Ollama with the right models for your Spring AI RAG system. [Download the Ollama setup guide here] – it includes model recommendations, performance optimizations, and those little tricks that make all the difference. If things go well, you’ll see Ollama start listening (on port 11434 by default) once you run the ollama serve command.
Spring Boot Project Structure for RAG Applications
Let’s dive into the project structure that makes Spring AI RAG applications maintainable and scalable. Here’s how I organize my RAG projects:
src/main/java/com/sundrymind/
├── config/
│   ├── SpringAiRagApplication.java
│   ├── DocumentConfig.java
│   └── DocumentIngestionRunner.java
├── controller/
│   └── RagController.java
└── service/
    ├── DocumentService.java
    ├── HybridSearchService.java
    ├── ResponseService.java
    └── RetrievalService.java
src/main/resources/
├── application.properties
├── data/
└── static/index.html
Maven Dependencies Breakdown
Let’s start with the Maven dependencies that make Spring AI RAG magic happen:
<dependencies>
    <!-- Spring AI Core -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>

    <!-- Vector Store Support -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    </dependency>

    <!-- Document Readers -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-tika-document-reader</artifactId>
    </dependency>

    <!-- PostgreSQL Driver -->
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>

    <!-- Apache Commons for String Utils -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
    </dependency>
</dependencies>
Application Properties Deep Dive
Your application.properties file is the command center of your Spring AI RAG application. Here’s my battle-tested configuration:
spring.application.name=SpringAIRag
# Database Configuration for PGVector
spring.datasource.url=jdbc:postgresql://localhost:5432/vectordb
spring.datasource.username=postgres
spring.datasource.password=password
spring.datasource.driver-class-name=org.postgresql.Driver
# PGVector Configuration
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768
# Ollama base endpoint
spring.ai.ollama.base-url=http://localhost:11434
# Use llama3.2 for chat
spring.ai.ollama.chat.enabled=true
spring.ai.ollama.chat.options.model=llama3.2
spring.ai.ollama.chat.options.temperature=0.3
spring.ai.ollama.chat.options.top-k=40
# Use nomic-embed-text for embeddings
spring.ai.ollama.embedding.enabled=true
spring.ai.ollama.embedding.options.model=nomic-embed-text
# Disable default ONNX auto-config to avoid conflict
spring.ai.embedding.transformers.enabled=false
Pro tip: that temperature=0.3 setting? That’s the sweet spot for RAG responses. Too low (0.1), and your AI sounds like a robot reading a manual. Too high (0.8), and it starts hallucinating like it’s at a creative writing workshop. But you can always play around with this and find what fits your case.
Configuration Class Explanations
The DocumentConfig class is where we set up our text processing pipeline:
@Configuration
public class DocumentConfig {

    @Bean
    public TokenTextSplitter documentSplitter() {
        return new TokenTextSplitter(
            512,  // Target chunk size in tokens
            128,  // Minimum chunk size in characters
            50,   // Minimum chunk length to embed
            2048, // Maximum number of chunks per document
            true  // Keep separators, preserving paragraph breaks
        );
    }

    @Bean
    public ChatClient chatClient(ChatModel chatModel) {
        return ChatClient.builder(chatModel)
            .defaultSystem("You are a friendly AI assistant. " +
                "Keep responses concise and helpful. " +
                "Be conversational but professional.")
            .build();
    }
}
Those numbers aren’t random – they’re based on months of experimentation. 512 tokens is the goldilocks zone for most documents. One caveat: TokenTextSplitter takes no overlap parameter – the 128 is a minimum chunk size in characters, and the final true keeps separators so chunks break on paragraph boundaries rather than mid-thought.
Part 3: Building the Core Spring AI RAG Components
Implementing Document Ingestion in Spring AI RAG
Document ingestion is where the magic begins. It’s like preparing ingredients for a gourmet meal – do it wrong, and even the best chef can’t save your dish.
Document Service Architecture
Here’s our DocumentService that handles multiple document formats:
@Service
public class DocumentService {

    private final TextSplitter documentSplitter;
    private final VectorStore vectorStore;

    public DocumentService(TextSplitter documentSplitter, VectorStore vectorStore) {
        this.documentSplitter = documentSplitter;
        this.vectorStore = vectorStore;
    }

    public void ingestDocument(Resource resource) throws Exception {
        String filename = resource.getFilename();
        String extension = filename.substring(filename.lastIndexOf('.')).toLowerCase();
        DocumentReader reader = switch (extension) {
            case ".pdf" -> new PagePdfDocumentReader(resource);
            case ".docx" -> new TikaDocumentReader(resource);
            case ".txt", ".md" -> new TextReader(resource);
            default -> throw new IllegalArgumentException("Unsupported format: " + extension);
        };
        List<Document> chunks = documentSplitter.apply(reader.get());
        vectorStore.add(chunks);
    }
}
Multi-format Support (PDF, DOCX, TXT)
The beauty of Spring AI RAG is its plug-and-play document readers. Want to add PowerPoint support? Just add the dependency and another case to the switch statement. It’s like having a Swiss Army knife for documents.
Chunking Strategies That Actually Work
Here’s the thing about chunking – it’s not just about splitting text. It’s about preserving meaning. Our TokenTextSplitter configuration keeps separators intact, so chunks break on natural paragraph boundaries instead of mid-sentence. I learned this the hard way when my early RAG system kept giving fragmented answers because chunks were too isolated.
Vector Embedding Generation
The automatic document ingestion happens through our DocumentIngestionRunner:
@Component
public class DocumentIngestionRunner implements CommandLineRunner {

    private static final Logger logger = LoggerFactory.getLogger(DocumentIngestionRunner.class);
    private static final String DOCUMENTS_FOLDER = "classpath:/data/";

    private final DocumentService documentService;
    private final ResourceLoader resourceLoader;

    public DocumentIngestionRunner(DocumentService documentService, ResourceLoader resourceLoader) {
        this.documentService = documentService;
        this.resourceLoader = resourceLoader;
    }

    @Override
    public void run(String... args) throws Exception {
        logger.info("Starting document ingestion process...");
        try {
            Resource folderResource = resourceLoader.getResource(DOCUMENTS_FOLDER);
            Path folderPath = Paths.get(folderResource.getURI());
            try (Stream<Path> paths = Files.walk(folderPath)) {
                paths.filter(Files::isRegularFile)
                    .forEach(filePath -> {
                        try {
                            Resource fileResource = resourceLoader.getResource("file:" + filePath.toAbsolutePath());
                            logger.info("Ingesting document: {}", fileResource.getFilename());
                            documentService.ingestDocument(fileResource);
                        } catch (Exception e) {
                            logger.error("Failed to ingest document {}: {}", filePath.getFileName(), e.getMessage());
                        }
                    });
            }
            logger.info("Document ingestion process completed.");
        } catch (IOException e) {
            logger.error("Could not find or access the documents folder at {}", DOCUMENTS_FOLDER, e);
        }
    }
}
This runner automatically processes all documents in your data folder on startup. Just drop your files there and let Spring AI RAG do its thing!
Advanced Search Capabilities in Spring AI RAG
Hybrid Search Implementation
Here’s where we get fancy. Pure vector search is great, but combining it with keyword search is like having both a telescope and a magnifying glass – you see both the big picture and the fine details.
@Service
public class HybridSearchService {

    private final VectorStore vectorStore;
    private final List<Document> allDocuments;

    public HybridSearchService(VectorStore vectorStore, List<Document> allDocuments) {
        this.vectorStore = vectorStore;
        this.allDocuments = allDocuments;
    }

    public List<Document> hybridSearch(String query) {
        // 1. Vector similarity search with higher threshold
        SearchRequest vectorSearchRequest = SearchRequest.builder()
            .query(query)
            .topK(5)
            .similarityThreshold(0.65)
            .build();
        List<Document> vectorResults = vectorStore.similaritySearch(vectorSearchRequest);

        // 2. Enhanced keyword search with better filtering
        List<Document> keywordResults = allDocuments.stream()
            .filter(doc -> isRelevantForKeywordSearch(doc, query))
            .sorted(Comparator.comparingInt(
                doc -> -calculateRelevanceScore(doc.getFormattedContent(), query)))
            .limit(8)
            .collect(Collectors.toList());

        // 3. Combine and filter results
        List<Document> fusedResults = new HybridRanker().fuse(vectorResults, keywordResults);

        // 4. Post-process to ensure quality
        return fusedResults.stream()
            .filter(doc -> isHighQualityResult(doc, query))
            .limit(5)
            .collect(Collectors.toList());
    }
}
Vector Similarity vs Keyword Matching
Vector similarity is like having a conversation with someone who understands context – it gets the semantic meaning. Keyword matching is like having a really good librarian who knows exactly where everything is filed. Together, they’re unstoppable.
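The hybrid search code above calls two helpers, isRelevantForKeywordSearch and calculateRelevanceScore, that aren’t shown in this post. Here’s one plausible implementation – a simple term-frequency count over significant query words. Note this is my sketch, operating on plain strings rather than Document objects, so adapt it to the real signatures:

```java
import java.util.Arrays;

public class KeywordScoring {

    // Count how many times significant query words (longer than 3 chars) appear in the content
    public static int calculateRelevanceScore(String content, String query) {
        String contentLower = content.toLowerCase();
        return Arrays.stream(query.toLowerCase().split("\\s+"))
            .filter(word -> word.length() > 3) // skip "the", "and", etc.
            .mapToInt(word -> countOccurrences(contentLower, word))
            .sum();
    }

    // A document qualifies for the keyword branch if it matches at least one significant word
    public static boolean isRelevantForKeywordSearch(String content, String query) {
        return calculateRelevanceScore(content, query) > 0;
    }

    private static int countOccurrences(String text, String word) {
        int count = 0, idx = 0;
        while ((idx = text.indexOf(word, idx)) != -1) {
            count++;
            idx += word.length();
        }
        return count;
    }
}
```

For the query “best game” against “Baldur’s Gate 3 is the best game. The game is long.”, this scores 3 (one “best” plus two “game”), which is what the `Comparator` in hybridSearch sorts on.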
Result Ranking and Fusion Techniques
The HybridRanker uses Reciprocal Rank Fusion (RRF) to combine results from both search methods:
public static class HybridRanker {

    private static final double K = 60.0;

    public List<Document> fuse(List<Document> vectorResults, List<Document> keywordResults) {
        Map<String, Double> fusedScores = new HashMap<>();
        Map<String, Document> documentMap = new HashMap<>();

        // Process vector results
        for (int i = 0; i < vectorResults.size(); i++) {
            Document doc = vectorResults.get(i);
            String docId = doc.getId() != null ? doc.getId() : doc.getFormattedContent().hashCode() + "";
            double score = 1.0 / (K + (i + 1));
            fusedScores.merge(docId, score, Double::sum);
            documentMap.putIfAbsent(docId, doc);
        }

        // Process keyword results
        for (int i = 0; i < keywordResults.size(); i++) {
            Document doc = keywordResults.get(i);
            String docId = doc.getId() != null ? doc.getId() : doc.getFormattedContent().hashCode() + "";
            double score = 1.0 / (K + (i + 1));
            fusedScores.merge(docId, score, Double::sum);
            documentMap.putIfAbsent(docId, doc);
        }

        return fusedScores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .map(entry -> documentMap.get(entry.getKey()))
            .collect(Collectors.toList());
    }
}
Response Generation with Spring AI RAG
Prompt Engineering Best Practices
Here’s where the rubber meets the road. Your prompt is like a recipe – get it wrong, and even the best ingredients won’t save you:
@Service
public class ResponseService {

    private final ChatClient chatClient;

    private static final String SYSTEM_PROMPT = """
        You are a helpful assistant that answers questions based on provided context.
        FORMATTING RULES:
        1. ONLY use information from the CONTEXT section below
        2. Format your response in clear, readable markdown
        3. Use bullet points for lists
        4. Use numbered lists for step-by-step instructions
        5. Use **bold** for important terms
        6. Use code blocks for any technical terms or file names
        7. Always include source references at the end
        8. If the answer is not in the context, respond with: "I don't have information about that in the provided documents."
        CONTEXT:
        {context}
        QUESTION: {question}
        Provide a well-formatted, helpful response based only on the context above.
        """;

    public String generateResponse(String query, List<Document> context) {
        if (context == null || context.isEmpty()) {
            return "❌ I don't have any relevant documents to answer your question.";
        }
        String formattedContext = formatContext(context);
        if (!isContextRelevant(formattedContext, query)) {
            return "❌ I don't have information about that in the provided documents.";
        }
        PromptTemplate template = new PromptTemplate(SYSTEM_PROMPT);
        Prompt prompt = template.create(Map.of(
            "context", formattedContext,
            "question", query
        ));
        return chatClient.prompt(prompt)
            .call()
            .content();
    }
}
Context Formatting for Better Results
The formatContext method is crucial for giving your AI the right information in the right format:
private String formatContext(List<Document> documents) {
    if (documents == null || documents.isEmpty()) {
        return "No relevant context found.";
    }
    return documents.stream()
        .map(document -> {
            String content = document.getFormattedContent();
            Map<String, Object> metadata = document.getMetadata();
            StringBuilder formattedDoc = new StringBuilder();
            formattedDoc.append("=== DOCUMENT SECTION ===\n");
            if (metadata.containsKey("file_name")) {
                formattedDoc.append("Source: ").append(metadata.get("file_name")).append("\n");
            }
            if (metadata.containsKey("page_number")) {
                formattedDoc.append("Page: ").append(metadata.get("page_number")).append("\n");
            }
            formattedDoc.append("Content:\n").append(content).append("\n");
            formattedDoc.append("=== END SECTION ===\n");
            return formattedDoc.toString();
        })
        .collect(Collectors.joining("\n"));
}
Handling Edge Cases Gracefully
The isContextRelevant method prevents your AI from making stuff up when it doesn’t have good information:
private boolean isContextRelevant(String context, String query) {
    String[] queryWords = query.toLowerCase().split("\\s+");
    String contextLower = context.toLowerCase();
    int matches = 0;
    for (String word : queryWords) {
        if (word.length() > 3 && contextLower.contains(word)) {
            matches++;
        }
    }
    return matches >= Math.max(1, queryWords.length * 0.2);
}
This ensures that at least 20% of significant query words (longer than 3 characters) appear in the context before we proceed with generation.
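To make that rule concrete, here’s the same check as a standalone method with two quick probes (the strings are toy examples of my own, not from the app):

```java
public class ThresholdDemo {

    // Same logic as isContextRelevant above, inlined for a worked example
    public static boolean relevant(String context, String query) {
        String[] queryWords = query.toLowerCase().split("\\s+");
        String contextLower = context.toLowerCase();
        int matches = 0;
        for (String word : queryWords) {
            if (word.length() > 3 && contextLower.contains(word)) {
                matches++;
            }
        }
        // Require at least 20% of all query words (minimum one) to match
        return matches >= Math.max(1, queryWords.length * 0.2);
    }

    public static void main(String[] args) {
        String context = "Our review covers the best PC games of the year.";
        // "best" and "games" both appear, 2 >= max(1, 4 * 0.2) -> relevant
        System.out.println(relevant("best PC games ranked", context));
        // No significant word matches -> not relevant, so generation is skipped
        System.out.println(relevant("quantum physics lectures", context));
    }
}
```

For the first query the threshold is max(1, 4 × 0.2) = 1 and two words match, so we proceed; for the second, zero matches means the service returns the “I don’t have information” fallback instead of letting the model guess.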
Part 4: Creating the User Interface
Building a Modern Chat Interface for Spring AI RAG
Let’s face it – nobody wants to interact with your brilliant Spring AI RAG system through curl commands. Users want a sleek, responsive interface that doesn’t make them feel like they’re using software from 1995.
RESTful API Design
Our RagController provides a clean REST API for the frontend:
@RestController
@RequestMapping("/api/rag")
@CrossOrigin(origins = "*")
public class RagController {

    private final HybridSearchService hybridSearchService;
    private final ResponseService responseService;

    @PostMapping("/query")
    public ResponseEntity<ChatResponse> queryRAG(@RequestBody ChatRequest request) {
        try {
            // 1. Retrieve relevant context documents
            List<Document> context = hybridSearchService.hybridSearch(request.getQuery());

            // 2. Generate response
            String llmResponse = responseService.generateResponse(request.getQuery(), context);

            // 3. Create structured response
            ChatResponse response = new ChatResponse(
                llmResponse,
                context.size(),
                extractSources(context),
                LocalDateTime.now(),
                "success"
            );
            return ResponseEntity.ok(response);
        } catch (Exception e) {
            ChatResponse errorResponse = new ChatResponse(
                "❌ Sorry, I encountered an error while processing your request. Please try again.",
                0,
                List.of(),
                LocalDateTime.now(),
                "error"
            );
            return ResponseEntity.ok(errorResponse);
        }
    }

    private List<SourceInfo> extractSources(List<Document> documents) {
        return documents.stream()
            .map(doc -> {
                Map<String, Object> metadata = doc.getMetadata();
                String source = Optional.ofNullable(metadata.get("source"))
                    .or(() -> Optional.ofNullable(metadata.get("file_name")))
                    .map(Object::toString)
                    .orElse("Unknown");
                return new SourceInfo(
                    source,
                    metadata.getOrDefault("page_number", "").toString(),
                    truncateContent(doc.getFormattedContent(), 150)
                );
            })
            .distinct()
            .toList();
    }
}
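The ChatRequest, ChatResponse, and SourceInfo types aren’t shown in this post. Here’s one way to define them as records; the field names are my guesses based on how the controller and the frontend (data.response, data.sources) use them, so adjust to match the actual codebase:

```java
import java.time.LocalDateTime;
import java.util.List;

// Request body: { "query": "..." }
record ChatRequest(String query) {
    String getQuery() { return query; } // the controller calls request.getQuery()
}

// One citation per retrieved chunk
record SourceInfo(String fileName, String pageNumber, String snippet) {}

// Response body the frontend reads as data.response / data.sources
record ChatResponse(String response, int contextCount, List<SourceInfo> sources,
                    LocalDateTime timestamp, String status) {}
```

Records are a good fit here: they give you the equals/hashCode that the .distinct() call in extractSources relies on, plus JSON serialization out of the box with Jackson.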
WebSocket Alternatives (Why We Chose REST)
You might be wondering, “Why not WebSockets for real-time chat?” Here’s the thing – for most Spring AI RAG applications, REST is perfect. WebSockets add complexity without much benefit unless you’re building a collaborative chat system. REST is simple, cacheable, and works great with our hybrid search approach.
Frontend Implementation with Vanilla JavaScript
I’m a big believer in keeping things simple. Here’s our vanilla JavaScript chat interface that doesn’t require a PhD in React:
async function sendMessage() {
    const message = messageInput.value.trim();
    if (!message) return;

    // Add user message
    addMessage(message, true);

    // Clear input and disable button
    messageInput.value = "";
    sendButton.disabled = true;

    // Add loading message
    addLoadingMessage();

    try {
        const response = await fetch("/api/rag/query", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
            },
            body: JSON.stringify({ query: message }),
        });
        if (!response.ok) {
            throw new Error("Network response was not ok");
        }
        const data = await response.json();

        // Remove loading message
        removeLoadingMessage();

        // Add bot response
        addMessage(data.response, false, data.sources || []);
    } catch (error) {
        console.error("Error:", error);
        removeLoadingMessage();
        addMessage("❌ Sorry, I encountered an error. Please try again.", false);
    } finally {
        sendButton.disabled = false;
        messageInput.focus();
    }
}
Responsive Design Considerations
Our CSS uses flexbox and modern techniques to ensure the interface looks great on everything from phones to ultrawide monitors:
.chat-container {
    width: 800px;
    height: 700px;
    background: white;
    border-radius: 20px;
    box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
    display: flex;
    flex-direction: column;
    overflow: hidden;
}

@media (max-width: 768px) {
    .chat-container {
        width: 95%;
        height: 95vh;
        margin: 10px;
    }
}
Enhancing User Experience in Spring AI RAG Applications
Loading States and Error Handling
Nothing kills user experience like uncertainty. Our loading animation gives users immediate feedback:
function addLoadingMessage() {
    const messageDiv = document.createElement("div");
    messageDiv.className = "message bot";
    messageDiv.id = "loadingMessage";

    const bubbleDiv = document.createElement("div");
    bubbleDiv.className = "message-bubble";

    const loadingDiv = document.createElement("div");
    loadingDiv.className = "loading";
    loadingDiv.innerHTML = `
        <span>Thinking</span>
        <div class="loading-dots">
            <div class="loading-dot"></div>
            <div class="loading-dot"></div>
            <div class="loading-dot"></div>
        </div>
    `;

    bubbleDiv.appendChild(loadingDiv);
    messageDiv.appendChild(bubbleDiv);
    messagesContainer.appendChild(messageDiv);
    messagesContainer.scrollTop = messagesContainer.scrollHeight;
}
Source Citation Display
One of the coolest features of our Spring AI RAG system is automatic source citation:
// Add sources if available
if (sources && sources.length > 0) {
    const sourcesDiv = document.createElement("div");
    sourcesDiv.className = "sources";
    sourcesDiv.innerHTML = "<strong>📚 Sources:</strong>";
    sources.forEach((source) => {
        const sourceDiv = document.createElement("div");
        sourceDiv.className = "source-item";
        sourceDiv.innerHTML = `
            <div class="source-file">${source.fileName}${
                source.pageNumber ? ` (Page ${source.pageNumber})` : ""
            }</div>
            ${
                source.snippet
                    ? `<div class="source-snippet">${source.snippet}</div>`
                    : ""
            }
        `;
        sourcesDiv.appendChild(sourceDiv);
    });
    contentDiv.appendChild(sourcesDiv);
}
Markdown Rendering for Rich Responses
We use the Marked.js library to render markdown responses beautifully:
if (isUser) {
    contentDiv.textContent = content;
} else {
    // Parse markdown for bot messages
    contentDiv.innerHTML = marked.parse(content);
}
The CSS styles ensure code blocks, lists, and headers all look professional:
.message-content h1,
.message-content h2,
.message-content h3 {
    margin: 10px 0;
    color: #1f2937;
}

.message-content ul,
.message-content ol {
    margin: 10px 0;
    padding-left: 20px;
}

.message-content code {
    background: #f1f5f9;
    padding: 2px 6px;
    border-radius: 4px;
    font-family: "Monaco", "Consolas", monospace;
    font-size: 0.9em;
}

.message-content pre {
    background: #f1f5f9;
    padding: 15px;
    border-radius: 8px;
    overflow-x: auto;
    margin: 10px 0;
}
Ok, so now our system is ready! Want to see this system in action? Check out the complete code repository at https://github.com/SundrymindTech/Spring-AI-RAG.
Part 5: Testing and Troubleshooting
Testing Your Spring AI RAG Application
Testing a Spring AI RAG system is like debugging a Rube Goldberg machine – there are so many moving parts that can go wrong! Let me share my testing strategy that’s saved me countless hours of debugging.
Unit Testing Strategies
Let’s start with unit tests for our core components:
@ExtendWith(MockitoExtension.class)
class RetrievalServiceTest {

    @Mock
    private VectorStore vectorStore;
    @InjectMocks
    private RetrievalService retrievalService;

    @Test
    void testRetrieveContext_WithValidQuery() {
        // Given
        String query = "What are the best PC games?";
        List<Document> expectedDocs = Arrays.asList(
            new Document("Baldur's Gate 3 is amazing", Map.of("source", "pcg.pdf"))
        );
        when(vectorStore.similaritySearch(any(SearchRequest.class)))
            .thenReturn(expectedDocs);

        // When
        List<Document> result = retrievalService.retrieveContext(query, null);

        // Then
        assertThat(result).isNotEmpty();
        assertThat(result).hasSize(1);
        assertThat(result.get(0).getFormattedContent()).contains("Baldur's Gate 3");
    }

    @Test
    void testRetrieveContext_WithSourceFilter() {
        // Given
        String query = "Canada information";
        String sourceFilter = "Canada_Info.docx";
        when(vectorStore.similaritySearch(any(SearchRequest.class)))
            .thenReturn(List.of());

        // When
        retrievalService.retrieveContext(query, sourceFilter);

        // Then
        ArgumentCaptor<SearchRequest> captor = ArgumentCaptor.forClass(SearchRequest.class);
        verify(vectorStore).similaritySearch(captor.capture());
        SearchRequest request = captor.getValue();
        assertThat(request.getFilterExpression().toString()).contains("Canada_Info.docx");
    }
}
Testing the response generation:
@ExtendWith(MockitoExtension.class)
class ResponseServiceTest {

    @Mock
    private ChatClient chatClient;
    @Mock
    private ChatClient.ChatClientRequestSpec requestSpec;
    @Mock
    private ChatClient.CallResponseSpec callResponseSpec;
    @InjectMocks
    private ResponseService responseService;

    @Test
    void testGenerateResponse_WithValidContext() {
        // Given
        String query = "What's the best RPG?";
        List<Document> context = Arrays.asList(
            new Document("Baldur's Gate 3 is the best RPG",
                Map.of("file_name", "pcg.pdf", "page_number", "1"))
        );
        when(chatClient.prompt(any(Prompt.class))).thenReturn(requestSpec);
        when(requestSpec.call()).thenReturn(callResponseSpec);
        when(callResponseSpec.content()).thenReturn("Baldur's Gate 3 is highly recommended");

        // When
        String result = responseService.generateResponse(query, context);

        // Then
        assertThat(result).contains("Baldur's Gate 3");
        assertThat(result).doesNotContain("❌");
    }

    @Test
    void testGenerateResponse_WithEmptyContext() {
        // Given
        String query = "What's the best RPG?";
        List<Document> emptyContext = Arrays.asList();

        // When
        String result = responseService.generateResponse(query, emptyContext);

        // Then
        assertThat(result).contains("❌");
        assertThat(result).contains("don't have any relevant documents");
    }
}
Testing from the Real Spring AI RAG UI
Now for the exciting part – real integration testing! We’ll fire up the full application in a browser and interact with our chatbot interface, sending actual queries that will:
- Process through our Spring backend
- Query the live database
- Generate AI-powered responses
This end-to-end test validates that all components work together seamlessly, from the UI down to the vector store. Watch as your RAG system comes to life, handling real user interactions exactly as it would in production!
Once you download the full codebase and set it up in your favorite IDE, and the application launches and ingests the documents correctly, you should see something like this:
Once the application spins up, you can access the chatbot at http://localhost:8080/ and fire queries about the documents you feed it. In my codebase I fed it two documents: one PDF about the best PC games, and another with general information about Canada. Here’s how it responded:
Notice how it shows the information sources as well:
With the hard part behind us, it’s time to geek out on some advanced concepts!
Performance Benchmarking
Here’s how I benchmark my Spring AI RAG performance:
@Component
public class RagPerformanceBenchmark {
private final RagController ragController;
private final MeterRegistry meterRegistry;
@EventListener(ApplicationReadyEvent.class)
public void runBenchmarks() {
benchmarkQueryPerformance();
benchmarkConcurrentQueries();
}
private void benchmarkQueryPerformance() {
List<String> testQueries = Arrays.asList(
"What are the best PC games?",
"Tell me about Canada",
"What is Spring AI RAG?"
);
for (String query : testQueries) {
Timer.Sample sample = Timer.start(meterRegistry);
try {
ChatRequest request = new ChatRequest(query);
ragController.queryRAG(request);
} finally {
sample.stop(Timer.builder("rag.benchmark")
.tag("query", query)
.register(meterRegistry));
}
}
}
private void benchmarkConcurrentQueries() {
int numberOfThreads = 10;
int queriesPerThread = 5;
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<Future<Long>> futures = new ArrayList<>();
for (int i = 0; i < numberOfThreads; i++) {
futures.add(executor.submit(() -> {
long totalTime = 0;
for (int j = 0; j < queriesPerThread; j++) {
long start = System.currentTimeMillis();
ChatRequest request = new ChatRequest("Test query " + j);
ragController.queryRAG(request);
totalTime += System.currentTimeMillis() - start;
}
return totalTime;
}));
}
// Collect results
futures.forEach(future -> {
try {
Long result = future.get();
log.info("Thread completed in {} ms", result);
} catch (Exception e) {
log.error("Benchmark thread failed", e);
}
});
executor.shutdown();
}
}
Common Pitfalls and Solutions
Let me share the most common mistakes I’ve seen (and made myself):
- Forgetting to initialize the vector store schema
// DON'T DO THIS - Will fail silently
@Bean
public VectorStore vectorStore() {
return new PgVectorStore(jdbcTemplate, embeddingModel);
}
// DO THIS - Proper initialization
@Bean
public VectorStore vectorStore() {
return new PgVectorStore.Builder(jdbcTemplate, embeddingModel)
.withSchemaName("public")
.withTableName("vector_store")
.withInitializeSchema(true)
.build();
}
- Not handling empty search results gracefully
// Bad - Will throw NullPointerException
public String generateResponse(String query, List<Document> context) {
String formattedContext = context.stream()
.map(Document::getFormattedContent)
.collect(Collectors.joining("\n"));
// ... rest of method
}
// Good - Always check for null/empty
public String generateResponse(String query, List<Document> context) {
if (context == null || context.isEmpty()) {
return "❌ I don't have any relevant documents to answer your question.";
}
// ... rest of method
}
Spring AI RAG Troubleshooting Guide
Time for the troubleshooting section – aka “What to do when everything breaks at 3 AM.”
Vector Store Connection Issues
Problem: Can’t connect to PostgreSQL/PGVector
org.postgresql.util.PSQLException: Connection to localhost:5432 refused
Solution:
@Configuration
public class DatabaseHealthConfig {
@Bean
@Primary
public DataSource dataSource() {
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/vectordb");
config.setUsername("postgres");
config.setPassword("password");
config.setMaximumPoolSize(10);
config.setConnectionTestQuery("SELECT 1");
config.setConnectionTimeout(30000);
config.setIdleTimeout(600000);
config.setMaxLifetime(1800000);
return new HikariDataSource(config);
}
@EventListener(ApplicationReadyEvent.class)
public void testDatabaseConnection() {
try (Connection conn = dataSource().getConnection()) {
log.info("Database connection successful!");
} catch (SQLException e) {
log.error("Database connection failed: {}", e.getMessage());
throw new RuntimeException("Cannot connect to database", e);
}
}
}
Ollama Model Problems
Problem: Ollama models not loading or responding
org.springframework.web.client.ResourceAccessException: I/O error on POST request for "http://localhost:11434/api/generate"
Solution:
@Component
public class OllamaHealthChecker {
@Value("${spring.ai.ollama.base-url}")
private String ollamaBaseUrl;
@PostConstruct
public void checkOllamaHealth() {
try {
RestTemplate restTemplate = new RestTemplate();
String response = restTemplate.getForObject(
ollamaBaseUrl + "/api/tags", String.class);
log.info("Ollama is running, available models: {}", response);
} catch (Exception e) {
log.error("Ollama health check failed. Is Ollama running?", e);
pullRequiredModels();
}
}
private void pullRequiredModels() {
try {
// Runtime.exec() returns immediately – use ProcessBuilder and wait
// for each pull to finish before declaring success
new ProcessBuilder("ollama", "pull", "llama3.2").inheritIO().start().waitFor();
new ProcessBuilder("ollama", "pull", "nomic-embed-text").inheritIO().start().waitFor();
log.info("Required models pulled successfully");
} catch (Exception e) {
log.error("Failed to pull required models", e);
}
}
}
Search Quality Improvements
Problem: Poor search results, irrelevant context
Solution: Enhanced hybrid search with better relevance scoring:
@Service
public class ImprovedHybridSearchService {
public List<Document> enhancedHybridSearch(String query) {
// 1. Pre-process query for better matching
String processedQuery = preprocessQuery(query);
// 2. Multi-strategy search
List<Document> vectorResults = performVectorSearch(processedQuery);
List<Document> keywordResults = performKeywordSearch(processedQuery);
List<Document> semanticResults = performSemanticSearch(processedQuery);
// 3. Intelligent fusion with weighted scoring
return fuseResultsWithWeights(vectorResults, keywordResults, semanticResults);
}
private String preprocessQuery(String query) {
// Remove stop words, normalize, expand abbreviations
// (note the doubled backslashes – \b and \s must be escaped in Java strings)
return query.toLowerCase()
.replaceAll("\\b(what|how|when|where|why|is|are|the|a|an)\\b", "")
.replaceAll("\\s+", " ")
.trim();
}
private List<Document> fuseResultsWithWeights(
List<Document> vectorResults,
List<Document> keywordResults,
List<Document> semanticResults) {
Map<String, ScoredDocument> scoredDocs = new HashMap<>();
// Weight vector results highly (0.5)
addWeightedResults(scoredDocs, vectorResults, 0.5);
// Weight keyword results moderately (0.3)
addWeightedResults(scoredDocs, keywordResults, 0.3);
// Weight semantic results lightly (0.2)
addWeightedResults(scoredDocs, semanticResults, 0.2);
return scoredDocs.values().stream()
.sorted(Comparator.comparing(ScoredDocument::getScore).reversed())
.limit(5)
.map(ScoredDocument::getDocument)
.collect(Collectors.toList());
}
}
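The fuseResultsWithWeights method above leans on two helpers the snippet doesn’t show (addWeightedResults and ScoredDocument). Here’s a minimal, self-contained sketch of the same idea using plain document IDs – the rank-decay scoring is my own illustrative choice, not necessarily the article’s exact formula:

```java
import java.util.*;
import java.util.stream.*;

public class WeightedFusion {
    // Each strategy's result list contributes rank-based scores scaled by
    // its weight; documents found by several strategies accumulate score.
    static void addWeighted(Map<String, Double> scores, List<String> ids, double weight) {
        for (int rank = 0; rank < ids.size(); rank++) {
            // simple rank decay: the first hit is worth more than later ones
            double score = weight / (rank + 1);
            scores.merge(ids.get(rank), score, Double::sum);
        }
    }

    public static List<String> fuse(List<String> vector, List<String> keyword, List<String> semantic) {
        Map<String, Double> scores = new HashMap<>();
        addWeighted(scores, vector, 0.5);   // vector results weighted highly
        addWeighted(scores, keyword, 0.3);  // keyword results moderately
        addWeighted(scores, semantic, 0.2); // semantic results lightly
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(5)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "d2" appears in both vector and keyword results, so it outranks "d1"
        System.out.println(fuse(
                List.of("d1", "d2"),
                List.of("d2", "d3"),
                List.of("d3")));
    }
}
```

A document that shows up in two strategies ("d2" here: 0.25 + 0.3 = 0.55) beats the top vector-only hit ("d1": 0.5) – which is exactly the behavior you want from fusion.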
Performance Bottlenecks
Problem: Slow response times, high memory usage
Solution: Comprehensive performance optimization:
@Component
public class RagPerformanceOptimizer {
@Autowired
private ApplicationContext applicationContext;
@Autowired
private HybridSearchService hybridSearchService;
@Scheduled(fixedRate = 60000) // Every minute
public void optimizePerformance() {
monitorMemoryUsage();
optimizeVectorQueries();
cleanupExpiredCache();
}
private void monitorMemoryUsage() {
Runtime runtime = Runtime.getRuntime();
long totalMemory = runtime.totalMemory();
long freeMemory = runtime.freeMemory();
long usedMemory = totalMemory - freeMemory;
double memoryUsagePercent = (double) usedMemory / totalMemory * 100;
if (memoryUsagePercent > 80) {
log.warn("High memory usage detected: {}%", memoryUsagePercent);
// Trigger garbage collection
System.gc();
// Clear non-essential caches
clearNonEssentialCaches();
}
}
private void optimizeVectorQueries() {
// Analyze query patterns and optimize accordingly
QueryAnalyzer analyzer = applicationContext.getBean(QueryAnalyzer.class);
Map<String, Integer> queryPatterns = analyzer.getQueryPatterns();
// Pre-warm cache for common queries
queryPatterns.entrySet().stream()
.filter(entry -> entry.getValue() > 10) // Queries asked more than 10 times
.forEach(entry -> preWarmCache(entry.getKey()));
}
private void preWarmCache(String commonQuery) {
try {
// Asynchronously warm up cache
CompletableFuture.runAsync(() -> {
hybridSearchService.hybridSearch(commonQuery);
});
} catch (Exception e) {
log.debug("Cache pre-warming failed for query: {}", commonQuery);
}
}
}
That’s a wrap on our Spring AI RAG testing and troubleshooting guide! With these tools and techniques, you’ll be able to build, debug, and maintain a Spring AI RAG system that can handle real-world traffic and complexity.
Remember, building a great RAG system is like cooking a perfect meal – it takes the right ingredients, proper technique, and a lot of patience. But once you get it right, it’s absolutely delicious! 🍽️
Happy coding, and may your vectors always be relevant and your search results always satisfying!
In the next part, we’ll dive into advanced topics like performance optimization, deployment strategies, and scaling your Spring AI RAG system to handle thousands of users. Plus, I’ll share some war stories from production deployments that’ll save you from the same mistakes I made!
Part 6: Advanced Spring AI RAG Features
After building our basic Spring AI RAG application, let’s dive into the advanced features that will make your system production-ready. Trust me, this is where the real magic happens!
Optimizing Spring AI RAG Performance
Let me share some hard-earned wisdom from my journey with Spring AI RAG optimization. I’ve made every mistake in the book (and then some), so you don’t have to.
Caching Strategies for Spring AI RAG
First up – caching! Your Spring AI RAG system will be hitting the vector database and LLM frequently. Without proper caching, you’ll be burning through resources faster than a Tesla in ludicrous mode.
Here’s my battle-tested caching configuration:
@Configuration
@EnableCaching
public class RagCacheConfig {
@Bean
public CacheManager cacheManager() {
RedisCacheManager.Builder builder = RedisCacheManager
.RedisCacheManagerBuilder
.fromConnectionFactory(redisConnectionFactory())
.cacheDefaults(cacheConfiguration());
return builder.build();
}
@Bean
public RedisCacheConfiguration cacheConfiguration() {
return RedisCacheConfiguration.defaultCacheConfig()
.entryTtl(Duration.ofMinutes(30)) // Cache for 30 minutes
.disableCachingNullValues()
.serializeKeysWith(RedisSerializationContext.SerializationPair
.fromSerializer(new StringRedisSerializer()))
.serializeValuesWith(RedisSerializationContext.SerializationPair
.fromSerializer(new GenericJackson2JsonRedisSerializer()));
}
}
Now, let’s enhance our services with smart caching:
@Service
public class CachedRetrievalService {
private final VectorStore vectorStore;
@Cacheable(value = "vectorSearch", key = "#query + '-' + #sourceFilter")
public List<Document> retrieveContext(String query, String sourceFilter) {
SearchRequest.Builder requestBuilder = SearchRequest.builder()
.query(query)
.topK(5)
.similarityThreshold(0.65);
if (sourceFilter != null && !sourceFilter.isEmpty()) {
requestBuilder.filterExpression(
String.format("metadata.source == '%s'", sourceFilter));
}
SearchRequest request = requestBuilder.build();
return vectorStore.similaritySearch(request);
}
@CacheEvict(value = "vectorSearch", allEntries = true)
public void clearSearchCache() {
// Called when new documents are added
}
}
Pro tip: Cache the expensive vector searches, but be careful not to cache LLM responses if you want fresh answers each time!
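One subtlety with the @Cacheable key above: trivially different spellings of the same query (extra whitespace, different casing) produce distinct cache entries. A small normalization helper can improve hit rates – the names and key format here are my own sketch, not part of the codebase:

```java
import java.util.Locale;

public class CacheKeys {
    // Hypothetical helper: normalize a (query, sourceFilter) pair into a
    // stable cache key so "  What is RAG? " and "what is rag?" share an entry.
    public static String searchKey(String query, String sourceFilter) {
        String q = query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        // collapse null/empty filters to a fixed token so keys stay unambiguous
        String f = (sourceFilter == null || sourceFilter.isEmpty()) ? "ALL" : sourceFilter;
        return q + "|" + f;
    }

    public static void main(String[] args) {
        System.out.println(searchKey("  What is RAG? ", null));
        System.out.println(searchKey("hello   world", "pcg.pdf"));
    }
}
```

You could wire this in via a SpEL expression like `key = "T(com.example.CacheKeys).searchKey(#query, #sourceFilter)"` – assuming you place the class on a matching package path.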
Database Indexing for Vector Search
Your Spring AI RAG performance lives and dies by your vector database indexes. Here’s how I set up PGVector for optimal performance:
-- Create proper indexes for metadata filtering
CREATE INDEX CONCURRENTLY idx_documents_metadata_source
ON vector_store USING GIN ((metadata->'source'));
-- Index for similarity search optimization (HNSW, matching the
-- index-type configured in application.properties)
CREATE INDEX CONCURRENTLY idx_documents_embedding_cosine
ON vector_store USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- For better query performance
CREATE INDEX CONCURRENTLY idx_documents_created_at
ON vector_store (created_at DESC);
And here’s the configuration to make it sing:
# Enhanced PGVector Configuration
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768
spring.ai.vectorstore.pgvector.m=16
spring.ai.vectorstore.pgvector.ef-construction=200
spring.ai.vectorstore.pgvector.ef-search=100
Memory Management Tips
Nothing kills a Spring AI RAG application faster than memory leaks. Here’s my memory management strategy:
@Component
public class MemoryOptimizedDocumentProcessor {
private final int BATCH_SIZE = 50;
@Async
public CompletableFuture<Void> processDocumentsInBatches(List<Document> documents) {
List<List<Document>> batches = Lists.partition(documents, BATCH_SIZE);
for (List<Document> batch : batches) {
try {
processBatch(batch);
// Force garbage collection between batches
System.gc();
} catch (Exception e) {
log.error("Error processing batch", e);
}
}
return CompletableFuture.completedFuture(null);
}
private void processBatch(List<Document> batch) {
// Process documents in smaller chunks
vectorStore.add(batch);
// Clear references to help GC
batch.clear();
}
}
JVM configuration for optimal Spring AI RAG performance. Note that JVM flags can’t be set from application.properties – pass them on the command line (or via the JAVA_TOOL_OPTIONS environment variable) when launching the app:
# Start-up command
java -Xmx4g -Xms2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar app.jar
Scaling Considerations
When your Spring AI RAG application starts getting serious traffic, here’s how to scale:
@Configuration
public class RagScalingConfig {
@Bean
@ConfigurationProperties("app.rag.scaling")
public RagScalingProperties ragScalingProperties() {
return new RagScalingProperties();
}
@Bean
public TaskExecutor ragTaskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(10);
executor.setMaxPoolSize(50);
executor.setQueueCapacity(100);
executor.setThreadNamePrefix("rag-async-");
executor.initialize();
return executor;
}
}
Production-Ready Spring AI RAG Deployment
Let’s talk about taking your Spring AI RAG system from “it works on my machine” to “it’s ready for the world.”
Environment Configuration
Here’s my production-ready configuration structure:
# application-prod.properties
# Spring Profiles
spring.profiles.active=prod
# Ollama Configuration
spring.ai.ollama.base-url=${OLLAMA_BASE_URL:http://ollama-service:11434}
spring.ai.ollama.chat.options.model=${OLLAMA_CHAT_MODEL:llama3.2}
spring.ai.ollama.chat.options.temperature=${OLLAMA_TEMPERATURE:0.3}
spring.ai.ollama.embedding.options.model=${OLLAMA_EMBEDDING_MODEL:nomic-embed-text}
# Database Configuration
spring.datasource.url=${DB_URL:jdbc:postgresql://postgres-service:5432/vectordb}
spring.datasource.username=${DB_USERNAME:postgres}
spring.datasource.password=${DB_PASSWORD}
# PGVector Configuration
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768
# Application RAG Configuration
app.rag.document-path=${DOCUMENT_PATH:/app/documents}
app.rag.max-documents=${MAX_DOCUMENTS:1000}
app.rag.cache-ttl=${CACHE_TTL:1800}
Monitoring and Logging
Production Spring AI RAG needs observability. Here’s my monitoring setup:
@Component
public class RagMetrics {
private final MeterRegistry meterRegistry;
private final Counter queryCounter;
private final Timer responseTimer;
private final Gauge documentCount;
public RagMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.queryCounter = Counter.builder("rag.queries.total")
.description("Total RAG queries")
.register(meterRegistry);
this.responseTimer = Timer.builder("rag.response.time")
.description("RAG response time")
.register(meterRegistry);
// Gauge.builder takes the observed object and value function up front
this.documentCount = Gauge.builder("rag.documents.count", this, RagMetrics::getDocumentCount)
.description("Number of documents in vector store")
.register(meterRegistry);
}
public void recordQuery() {
queryCounter.increment();
}
public Timer.Sample startResponseTimer() {
// Timer.start takes the registry; stop the sample against responseTimer
return Timer.start(meterRegistry);
}
private double getDocumentCount() {
// Placeholder – VectorStore exposes no count() API, so query your
// vector_store table (e.g. via JdbcTemplate) for the real figure
return 0.0;
}
}
Enhanced logging configuration:
@Aspect
@Component
public class RagLoggingAspect {
private static final Logger logger = LoggerFactory.getLogger(RagLoggingAspect.class);
@Around("@annotation(org.springframework.web.bind.annotation.PostMapping)")
public Object logRagRequests(ProceedingJoinPoint joinPoint) throws Throwable {
String methodName = joinPoint.getSignature().getName();
Object[] args = joinPoint.getArgs();
logger.info("RAG Request: {} with args: {}", methodName, Arrays.toString(args));
long startTime = System.currentTimeMillis();
try {
Object result = joinPoint.proceed();
long executionTime = System.currentTimeMillis() - startTime;
logger.info("RAG Response: {} completed in {} ms", methodName, executionTime);
return result;
} catch (Exception e) {
logger.error("RAG Error: {} failed with exception: {}", methodName, e.getMessage());
throw e;
}
}
}
Security Best Practices
Security in Spring AI RAG is non-negotiable. Here’s my fortress-like approach:
@Configuration
@EnableWebSecurity
public class RagSecurityConfig {
@Bean
public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
http
.csrf(csrf -> csrf.disable())
.authorizeHttpRequests(auth -> auth
.requestMatchers("/api/rag/**").authenticated()
.requestMatchers("/actuator/health").permitAll()
.anyRequest().authenticated()
)
.oauth2ResourceServer(oauth2 -> oauth2
.jwt(jwt -> jwt.jwtDecoder(jwtDecoder()))
);
return http.build();
}
@Bean
public JwtDecoder jwtDecoder() {
return NimbusJwtDecoder.withJwkSetUri("https://your-auth-server/.well-known/jwks.json")
.build();
}
}
Input validation and sanitization:
@RestController
@RequestMapping("/api/rag")
@Validated
public class SecureRagController {
@PostMapping("/query")
public ResponseEntity<ChatResponse> secureQuery(
@Valid @RequestBody ChatRequest request,
Authentication authentication) {
// Sanitize input
String sanitizedQuery = sanitizeInput(request.getQuery());
// Rate limiting
if (!rateLimitService.isAllowed(authentication.getName())) {
return ResponseEntity.status(429).build();
}
// Proceed with query
return processQuery(sanitizedQuery);
}
private String sanitizeInput(String input) {
// Remove potential injection attempts, then cap the length of the
// cleaned string (capping by the original length could overflow)
String cleaned = input.replaceAll("[<>\"']", "").trim();
return cleaned.substring(0, Math.min(cleaned.length(), 1000));
}
}
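The controller above calls a rateLimitService that isn’t shown. Purely as an illustration, here’s a minimal fixed-window limiter with the same isAllowed(user) shape – a real deployment would more likely use a library such as Bucket4j or Redis-backed counters:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SimpleRateLimiter {
    // Hypothetical fixed-window limiter: at most maxRequests per window per user.
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    record Window(long startMillis, int count) {}

    public SimpleRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean isAllowed(String user) {
        long now = System.currentTimeMillis();
        Window w = windows.get(user);
        // start a fresh window if the user is new or the old window expired
        if (w == null || now - w.startMillis() >= windowMillis) {
            windows.put(user, new Window(now, 1));
            return true;
        }
        if (w.count() < maxRequests) {
            windows.put(user, new Window(w.startMillis(), w.count() + 1));
            return true;
        }
        return false; // over the limit for this window
    }

    public static void main(String[] args) {
        SimpleRateLimiter limiter = new SimpleRateLimiter(2, 60_000);
        System.out.println(limiter.isAllowed("alice")); // first request passes
        System.out.println(limiter.isAllowed("alice")); // second passes
        System.out.println(limiter.isAllowed("alice")); // third is rejected
    }
}
```

Fixed windows are the simplest scheme but allow bursts at window boundaries; sliding-window or token-bucket algorithms smooth that out at the cost of a little bookkeeping.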
Docker Containerization
Let’s containerize our Spring AI RAG application properly:
# The openjdk images are deprecated on Docker Hub – Temurin is the maintained option
FROM eclipse-temurin:17-jre-jammy
WORKDIR /app
COPY target/spring-ai-rag-*.jar app.jar
# curl is needed for the health check below (not present in slim images)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
RUN chown -R appuser:appuser /app
USER appuser
# Health check (note the line continuation – CMD is part of HEALTHCHECK)
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/actuator/health || exit 1
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
And the docker-compose for the full stack:
version: "3.8"
services:
postgres:
image: pgvector/pgvector:pg16
environment:
POSTGRES_DB: vectordb
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 30s
timeout: 10s
retries: 5
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
interval: 30s
timeout: 10s
retries: 5
spring-ai-rag:
build: .
ports:
- "8080:8080"
depends_on:
postgres:
condition: service_healthy
ollama:
condition: service_healthy
environment:
- DB_URL=jdbc:postgresql://postgres:5432/vectordb
- DB_USERNAME=postgres
- DB_PASSWORD=password
- OLLAMA_BASE_URL=http://ollama:11434
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
interval: 30s
timeout: 10s
retries: 5
volumes:
postgres_data:
ollama_data:
Part 7: Real-World Examples and Extensions
Extending Your Spring AI RAG Application
Now that we’ve built our Spring AI RAG foundation, let’s talk about taking it to the next level. Because let’s be honest – a basic RAG system is like a bicycle with training wheels. It gets you moving, but eventually, you’ll want to do some serious cycling!
Multi-Tenant Architecture for Spring AI RAG
One of the first questions I get is: “How do I make this work for multiple customers?” Well, buckle up, because multi-tenancy in Spring AI RAG is where things get spicy! 🌶️
Here’s how I typically handle tenant isolation in my Spring AI RAG applications:
@Service
public class TenantAwareRetrievalService {
private final VectorStore vectorStore;
private final TenantContextHolder tenantContext;
public TenantAwareRetrievalService(VectorStore vectorStore,
TenantContextHolder tenantContext) {
this.vectorStore = vectorStore;
this.tenantContext = tenantContext;
}
public List<Document> retrieveContext(String query) {
String tenantId = tenantContext.getCurrentTenant();
SearchRequest request = SearchRequest.builder()
.query(query)
.topK(5)
.similarityThreshold(0.65)
.filterExpression(String.format("metadata.tenant_id == '%s'", tenantId))
.build();
return vectorStore.similaritySearch(request);
}
}
The magic happens in that filterExpression. By tagging each document with a tenant_id during ingestion, we ensure data isolation. It’s like having separate filing cabinets for each customer – nobody gets to peek into someone else’s documents!
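One caveat worth hedging on: because the filter expression is built with String.format, a tenant id containing a single quote could corrupt (or, worst case, escape) the quoted literal. A small, hypothetical helper that escapes quotes before interpolation:

```java
public class TenantFilter {
    // Build a metadata filter expression for the vector store, doubling
    // single quotes so a hostile tenant id cannot break out of the literal.
    public static String tenantFilter(String tenantId) {
        String escaped = tenantId.replace("'", "''");
        return String.format("metadata.tenant_id == '%s'", escaped);
    }

    public static void main(String[] args) {
        System.out.println(tenantFilter("acme"));
        System.out.println(tenantFilter("o'brien-corp"));
    }
}
```

Even better: if your tenant ids come from your own auth system, validate them against a strict pattern (say, `[a-z0-9-]+`) at assignment time, and the escaping becomes a belt-and-suspenders measure.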
Custom Document Processors
Sometimes the built-in document readers just don’t cut it. Maybe you’re dealing with proprietary formats, or you need special preprocessing. Here’s how I extend the DocumentService for custom scenarios:
@Service
public class EnhancedDocumentService extends DocumentService {
private final TextSplitter documentSplitter;
private final VectorStore vectorStore;
private final CustomMetadataExtractor metadataExtractor;
public void ingestCustomDocument(Resource resource, Map<String, Object> customMetadata)
throws Exception {
DocumentReader reader = createCustomReader(resource);
List<Document> documents = reader.get();
// Enhance with custom metadata
documents.forEach(doc -> {
Map<String, Object> enhancedMetadata = new HashMap<>(doc.getMetadata());
enhancedMetadata.putAll(customMetadata);
enhancedMetadata.put("processed_date", LocalDateTime.now().toString());
enhancedMetadata.put("content_type", detectContentType(doc));
doc.getMetadata().putAll(enhancedMetadata);
});
List<Document> chunks = documentSplitter.apply(documents);
vectorStore.add(chunks);
}
private DocumentReader createCustomReader(Resource resource) {
String extension = getFileExtension(resource.getFilename());
return switch (extension) {
case ".json" -> new JsonDocumentReader(resource);
case ".csv" -> new CsvDocumentReader(resource);
case ".xml" -> new XmlDocumentReader(resource);
default -> throw new UnsupportedOperationException(
"Custom reader not available for: " + extension);
};
}
}
Pro tip: Always think about metadata! It’s like seasoning in cooking – the right amount makes everything better, but too much ruins the dish.
Advanced Filtering Capabilities
Let’s supercharge our filtering game. Sometimes you need more than just tenant isolation – maybe you want to filter by document type, date ranges, or content categories:
@Service
public class AdvancedSearchService {
private final VectorStore vectorStore;
public List<Document> advancedSearch(SearchCriteria criteria) {
StringBuilder filterBuilder = new StringBuilder();
List<String> conditions = new ArrayList<>();
// Tenant isolation (always include this!)
if (criteria.getTenantId() != null) {
conditions.add(String.format("metadata.tenant_id == '%s'", criteria.getTenantId()));
}
// Document type filtering
if (criteria.getDocumentTypes() != null && !criteria.getDocumentTypes().isEmpty()) {
String typeFilter = criteria.getDocumentTypes().stream()
.map(type -> String.format("metadata.document_type == '%s'", type))
.collect(Collectors.joining(" || "));
conditions.add("(" + typeFilter + ")");
}
// Date range filtering
if (criteria.getFromDate() != null) {
conditions.add(String.format("metadata.created_date >= '%s'",
criteria.getFromDate().toString()));
}
// Category filtering with OR logic
if (criteria.getCategories() != null && !criteria.getCategories().isEmpty()) {
String categoryFilter = criteria.getCategories().stream()
.map(cat -> String.format("metadata.category == '%s'", cat))
.collect(Collectors.joining(" || "));
conditions.add("(" + categoryFilter + ")");
}
String finalFilter = String.join(" && ", conditions);
SearchRequest request = SearchRequest.builder()
.query(criteria.getQuery())
.topK(criteria.getLimit())
.similarityThreshold(criteria.getSimilarityThreshold())
.filterExpression(finalFilter.isEmpty() ? null : finalFilter)
.build();
return vectorStore.similaritySearch(request);
}
}
Integration with External APIs
Real-world Spring AI RAG applications rarely live in isolation. Here’s how I integrate with external services to enrich the retrieval process:
@Service
public class EnrichedRetrievalService {
private final HybridSearchService hybridSearchService;
private final ExternalKnowledgeService externalKnowledgeService;
private final CacheManager cacheManager;
@Cacheable(value = "enriched-search", key = "#query")
public EnrichedSearchResult enrichedSearch(String query) {
// 1. Get documents from our local RAG
List<Document> localResults = hybridSearchService.hybridSearch(query);
// 2. Enrich with external data if needed
List<ExternalKnowledgeItem> externalData = Collections.emptyList();
if (shouldFetchExternalData(query, localResults)) {
externalData = externalKnowledgeService.searchExternal(query);
}
// 3. Combine and rank results
return new EnrichedSearchResult(localResults, externalData,
calculateConfidenceScore(localResults));
}
private boolean shouldFetchExternalData(String query, List<Document> localResults) {
// Only fetch external data if local results are insufficient
return localResults.isEmpty() ||
localResults.stream()
.noneMatch(doc -> calculateRelevanceScore(doc, query) > 0.8);
}
}
Spring AI RAG Best Practices and Patterns
After building dozens of Spring AI RAG applications, I’ve learned some hard lessons. Let me save you from making the same mistakes I did!
Code Organization Strategies
Here’s my go-to project structure for Spring AI RAG applications:
src/main/java/com/yourcompany/rag/
├── config/ # Configuration classes
│ ├── DocumentConfig.java
│ ├── VectorStoreConfig.java
│ └── SecurityConfig.java
├── controller/ # REST endpoints
│ ├── RagController.java
│ └── AdminController.java
├── service/ # Business logic
│ ├── core/ # Core RAG services
│ │ ├── RetrievalService.java
│ │ ├── ResponseService.java
│ │ └── HybridSearchService.java
│ ├── document/ # Document processing
│ │ ├── DocumentService.java
│ │ ├── DocumentProcessor.java
│ │ └── MetadataExtractor.java
│ └── integration/ # External integrations
│ ├── ExternalApiService.java
│ └── CacheService.java
├── domain/ # Domain models and DTOs
│ ├── SearchCriteria.java
│ ├── SearchResult.java
│ └── DocumentMetadata.java
└── repository/ # Data access (if needed)
└── DocumentRepository.java
Configuration Management
I learned the hard way that configuration management can make or break your Spring AI RAG application. Here’s my battle-tested approach:
@ConfigurationProperties(prefix = "app.rag")
@Configuration
public class RagProperties {
private Search search = new Search();
private Processing processing = new Processing();
private Security security = new Security();
public static class Search {
private int topK = 5;
private double similarityThreshold = 0.65;
private int maxResults = 20;
private boolean hybridSearchEnabled = true;
// getters and setters...
}
public static class Processing {
private int chunkSize = 512;
private int chunkOverlap = 128;
private boolean preserveParagraphs = true;
private List<String> supportedFormats = List.of("pdf", "docx", "txt", "md");
// getters and setters...
}
public static class Security {
private boolean multiTenantEnabled = true;
private String defaultTenant = "default";
private boolean auditingEnabled = true;
// getters and setters...
}
}
And in your application.properties:
# RAG Search Configuration
app.rag.search.top-k=5
app.rag.search.similarity-threshold=0.65
app.rag.search.max-results=20
app.rag.search.hybrid-search-enabled=true
# RAG Processing Configuration
app.rag.processing.chunk-size=512
app.rag.processing.chunk-overlap=128
app.rag.processing.preserve-paragraphs=true
app.rag.processing.supported-formats=pdf,docx,txt,md
# RAG Security Configuration
app.rag.security.multi-tenant-enabled=true
app.rag.security.default-tenant=default
app.rag.security.auditing-enabled=true
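To make the chunk-size and chunk-overlap settings concrete, here’s a toy chunker showing how a configuration like 512/128 slides a window across the text. It’s simplified to characters – real splitters work on tokens and try to respect sentence or paragraph boundaries:

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapChunker {
    // Split text into chunks of at most chunkSize characters, each
    // overlapping the previous one by 'overlap' characters so context
    // spanning a chunk boundary isn't lost.
    public static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // last chunk reached the end
        }
        return chunks;
    }

    public static void main(String[] args) {
        // chunkSize 4, overlap 2 -> window advances 2 chars at a time
        System.out.println(chunk("abcdefghij", 4, 2));
    }
}
```

The trade-off the config exposes: larger overlap means fewer "lost" cross-boundary facts but more chunks (and thus more embeddings to compute and store).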
Error Handling Patterns
Nothing ruins a user experience like cryptic error messages. Here’s how I handle errors in my Spring AI RAG applications:
@ControllerAdvice
public class RagExceptionHandler {
private static final Logger logger = LoggerFactory.getLogger(RagExceptionHandler.class);
@ExceptionHandler(DocumentProcessingException.class)
public ResponseEntity<ErrorResponse> handleDocumentProcessing(
DocumentProcessingException ex) {
logger.warn("Document processing failed: {}", ex.getMessage());
return ResponseEntity.status(HttpStatus.BAD_REQUEST)
.body(new ErrorResponse(
"DOCUMENT_PROCESSING_ERROR",
"Unable to process the document. Please check the format and try again.",
ex.getDocumentName()
));
}
@ExceptionHandler(SearchException.class)
public ResponseEntity<ErrorResponse> handleSearchError(SearchException ex) {
logger.error("Search operation failed", ex);
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(new ErrorResponse(
"SEARCH_ERROR",
"Something went wrong with the search. Our team has been notified.",
null
));
}
@ExceptionHandler(VectorStoreException.class)
public ResponseEntity<ErrorResponse> handleVectorStore(VectorStoreException ex) {
logger.error("Vector store operation failed", ex);
return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
.body(new ErrorResponse(
"SERVICE_UNAVAILABLE",
"The search service is temporarily unavailable. Please try again later.",
null
));
}
}
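The handler references an ErrorResponse DTO that isn’t defined in the snippets. Assuming its three constructor arguments are a machine-readable code, a user-facing message, and an optional detail (as the call sites suggest), it could be as simple as a record – this is my sketch, not the repository’s actual class:

```java
public class ErrorResponseDemo {
    // Hypothetical DTO matching the three-arg constructor calls in the
    // @ControllerAdvice: code, user-facing message, optional detail.
    public record ErrorResponse(String code, String message, String detail) {}

    public static void main(String[] args) {
        ErrorResponse err = new ErrorResponse(
                "DOCUMENT_PROCESSING_ERROR",
                "Unable to process the document. Please check the format and try again.",
                "report.pdf");
        System.out.println(err.code() + ": " + err.message());
    }
}
```

Records serialize cleanly to JSON with Jackson 2.12+, so the REST layer needs no extra getters or annotations.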
Monitoring and Observability
You can’t improve what you don’t measure! Here’s my monitoring setup for Spring AI RAG:
```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.time.Duration;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Component;

@Component
public class RagMetricsCollector {

    private final MeterRegistry meterRegistry;
    private final Timer searchTimer;
    private final Counter successfulQueries;

    public RagMetricsCollector(MeterRegistry meterRegistry, VectorStore vectorStore) {
        this.meterRegistry = meterRegistry;
        this.searchTimer = Timer.builder("rag.search.duration")
            .description("Time taken for RAG search operations")
            .register(meterRegistry);
        this.successfulQueries = Counter.builder("rag.queries.successful")
            .description("Number of successful queries")
            .register(meterRegistry);
        // A gauge samples its value on demand, so the vector store and the
        // extraction function are handed to the builder at registration time.
        Gauge.builder("rag.vectorstore.size", vectorStore, this::getVectorStoreSize)
            .description("Number of documents in vector store")
            .register(meterRegistry);
    }

    public void recordSearchTime(Duration duration) {
        searchTimer.record(duration);
    }

    public void recordSuccessfulQuery() {
        successfulQueries.increment();
    }

    public void recordFailedQuery(String errorType) {
        // A Micrometer counter is bound to a fixed tag set, so the tagged
        // counter is resolved per call; register() returns the cached meter
        // on repeat invocations with the same name and tags.
        Counter.builder("rag.queries.failed")
            .description("Number of failed queries")
            .tag("error.type", errorType)
            .register(meterRegistry)
            .increment();
    }

    private double getVectorStoreSize(VectorStore vectorStore) {
        // Implementation depends on your vector store
        // This is a placeholder
        return 0.0;
    }
}
```
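To show how the collector gets fed, here's a framework-free sketch of wrapping a search call so that timing, success, and failure are all recorded in one place. `SearchInstrumentation` and `timeAndRecord` are hypothetical names of mine, not Spring AI APIs; the `Recorder` interface simply mirrors `RagMetricsCollector`'s three recording methods so the sketch runs without Micrometer on the classpath:

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical wrapper: runs a search, reports the outcome, and always
// records the elapsed time, even when the search throws.
class SearchInstrumentation {

    // Mirrors RagMetricsCollector's recording methods.
    interface Recorder {
        void recordSearchTime(Duration duration);
        void recordSuccessfulQuery();
        void recordFailedQuery(String errorType);
    }

    static <T> T timeAndRecord(Recorder metrics, Supplier<T> search) {
        long start = System.nanoTime();
        try {
            T result = search.get();
            metrics.recordSuccessfulQuery();
            return result;
        } catch (RuntimeException e) {
            // Tag failures by exception type so dashboards can break them down.
            metrics.recordFailedQuery(e.getClass().getSimpleName());
            throw e;
        } finally {
            metrics.recordSearchTime(Duration.ofNanos(System.nanoTime() - start));
        }
    }
}
```

In the real application the `Recorder` role would be played by the injected `RagMetricsCollector`, and `search` would be the lambda calling your vector store.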
Conclusion
Whew! What a journey we’ve been on together! 🎉
What We’ve Accomplished
Let me take a moment to appreciate what we’ve built here. We started with a simple question: “How can I build a Spring AI RAG application that actually works in production?” And look where we ended up!
We’ve created:
- A robust document ingestion pipeline that can handle multiple formats
- A sophisticated hybrid search system that combines vector similarity with keyword matching
- A responsive web interface that users actually want to use
Tasks for you to enhance it further, based on the snippets shared above:
- A scalable architecture that can grow with your needs
- Caching for frequently repeated queries
- Monitoring and error handling that keeps you sane at 3 AM
But more importantly, we’ve built something that solves real problems. Your users can now ask questions in natural language and get accurate, contextual answers from your documents. That’s pretty magical when you think about it!
Next Steps for Your Spring AI RAG Journey
This is just the beginning! Here are some exciting directions you can take your Spring AI RAG application:
Immediate Enhancements:
- Add user authentication and authorization
- Implement conversation memory to handle follow-up questions
- Create a document management interface for easy uploads
- Add support for real-time document updates
Advanced Features:
- Multi-modal RAG (images, videos, audio)
- Integration with enterprise systems (SharePoint, Confluence, etc.)
- Custom fine-tuning for domain-specific language
- Advanced analytics and user behavior tracking
Production Readiness:
- Implement comprehensive logging and monitoring
- Set up automated testing and CI/CD pipelines
- Add rate limiting and security headers
- Create backup and disaster recovery procedures
Resources for Continued Learning
The Spring AI ecosystem is evolving rapidly, and staying current is crucial. Here are my go-to resources:
Official Documentation:
- The Spring AI reference documentation on docs.spring.io
Community Resources:
- Spring AI GitHub repository (watch it for updates!)
- Stack Overflow (tag: spring-ai)
- Spring Community Forums
My Personal Recommendations:
- Follow the Spring team on Twitter for announcements
- Join RAG-focused Discord servers and Slack channels
- Experiment with different embedding models and vector stores
Call to Action
If this tutorial helped you build your first Spring AI RAG application, I’d love to hear about it!
🌟 Star the GitHub repository: https://github.com/SundrymindTech/Spring-AI-RAG
💌 Subscribe to new blog notifications for more Spring AI tutorials, tips, and real-world case studies. I promise no spam – just good content when I have something valuable to share.
And remember – building great software is a team sport. Don’t be afraid to:
- Ask questions in the comments
- Share your own implementations and improvements
- Contribute back to the open-source community
The future of AI-powered applications is bright, and with Spring AI RAG in your toolkit, you’re well-equipped to be part of that future.
Now go forth and build amazing things! 🚀
P.S. If you run into any issues with the code, check out the troubleshooting guide here. And if that doesn’t help, don’t hesitate to reach out. We’re all in this together!