Introduction to Spring AI RAG
Let me tell you about the day I almost threw my laptop out the window. I was desperately searching through hundreds of PDF documents, trying to find that one specific piece of information for a client demo. You know the drill – Ctrl+F, scroll, scan, repeat. My eyes were burning, my patience was gone, and I was questioning my life choices.
That’s when I discovered Spring AI RAG (Retrieval-Augmented Generation), and honestly, it changed everything. Instead of manually hunting through documents like a digital archaeologist, I could simply ask: “What are the key features of Baldur’s Gate 3?” and get an intelligent, contextual answer pulled from my entire document collection.
If you’ve ever felt frustrated with traditional search methods, or if you’re curious about building applications that can actually understand and reason about your data, you’re in for a treat. RAG isn’t just a buzzword – it’s the future of intelligent applications, and Spring AI makes it surprisingly accessible.
Building on our Spring AI and Ollama integration, we’re now taking things to the next level. Today, we’re going to build something pretty cool together: a fully functional Spring AI RAG application that can intelligently search through your documents and provide contextual answers. Think of it as having a super-smart assistant that has read every document in your collection and can answer questions with pinpoint accuracy.
Part 1: Understanding Spring AI RAG Fundamentals
What is Spring AI RAG and Why Should You Care?
Let me explain RAG with my favorite coffee shop analogy (because everything is better with coffee, right?).
Imagine you walk into a coffee shop and ask the barista: “What’s the best drink for someone who likes sweet but not too sweet, with a caffeine kick but not too jittery?”
A traditional search system would be like a barista who can only point you to the menu board and say “Figure it out yourself.” Frustrating, right?
But a RAG system is like having a barista who:
- Retrieves relevant information from their knowledge base (maybe they remember you ordered a caramel macchiato last week and loved it)
- Augments that information with current context (it’s 3 PM, you mentioned you have a meeting later)
- Generates a personalized response (“I’d recommend our honey oat milk latte with an extra shot – it’s sweet but balanced, and the extra caffeine will keep you sharp for your meeting”)
That’s exactly what Spring AI RAG does with your documents and data.
RAG in Simple Terms:
- Retrieval: Find relevant documents or chunks of information
- Augmentation: Add that context to your question
- Generation: Use an AI model to create a smart, contextual response
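The three steps above can be sketched in plain Java. This is a toy, not Spring AI code: retrieval here is naive word matching standing in for real vector search, and the model call is stubbed out as a function so you can see the data flow end to end:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

public class RagToy {

    // Retrieval: find chunks that share significant words with the question
    // (a naive stand-in for vector similarity search)
    static List<String> retrieve(String question, List<String> chunks) {
        Set<String> queryWords = Arrays.stream(question.toLowerCase().split("\\W+"))
            .filter(w -> w.length() > 3) // ignore short, stop-word-ish tokens
            .collect(Collectors.toSet());
        return chunks.stream()
            .filter(chunk -> Arrays.stream(chunk.toLowerCase().split("\\W+"))
                .anyMatch(queryWords::contains))
            .collect(Collectors.toList());
    }

    // Augmentation: prepend the retrieved context to the question
    static String augment(String question, List<String> context) {
        return "CONTEXT:\n" + String.join("\n", context) + "\n\nQUESTION: " + question;
    }

    // Generation: hand the augmented prompt to a model (stubbed as a plain function)
    static String answer(String question, List<String> chunks, Function<String, String> llm) {
        return llm.apply(augment(question, retrieve(question, chunks)));
    }

    public static void main(String[] args) {
        List<String> knowledgeBase = List.of(
            "Baldur's Gate 3 is a party-based RPG.",
            "Espresso is brewed under pressure.");
        // Echo the prompt instead of calling a real model
        System.out.println(answer("What is Baldur's Gate 3?", knowledgeBase, prompt -> prompt));
    }
}
```

Only the RPG chunk survives retrieval, so the stubbed "model" receives a prompt containing just the relevant context – exactly the contract the real pipeline honors later in this post.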
Why Spring AI Makes RAG Development a Breeze:
Spring AI is like having a Swiss Army knife for AI development. Before Spring AI, building RAG applications meant juggling multiple libraries, dealing with complex integrations, and writing tons of boilerplate code. Now? It’s as simple as adding a few annotations and beans.
Here’s what makes Spring AI RAG special:
- Unified API: One consistent interface for different AI models and vector stores
- Spring Boot Integration: All the conveniences you love about Spring Boot
- Production Ready: Built-in monitoring, error handling, and scalability
- Extensible: Easy to customize and extend for your specific needs
Real-World Use Cases That’ll Make You Excited:
- Internal Knowledge Base: Your company’s documentation, policies, and procedures become instantly searchable
- Customer Support: Automatically find relevant information to answer customer queries
- Research Assistant: Quickly find information across academic papers, reports, and research documents
- Legal Document Analysis: Search through contracts, legal precedents, and case studies
- Personal Knowledge Management: Your own digital brain that remembers everything you’ve read
Spring AI RAG Architecture Deep Dive
Now, let’s peek under the hood and see how Spring AI RAG actually works. Don’t worry – I’ll keep it interesting and skip the boring technical jargon.
The Three Pillars: Retrieval, Augmentation, Generation
Think of Spring AI RAG as a three-act play:
Act 1: Retrieval (The Detective)
This is where your application becomes Sherlock Holmes. When you ask a question, the retrieval system:
- Converts your question into a vector (think of it as a mathematical fingerprint)
- Searches through your vector database to find similar content
- Returns the most relevant document chunks
Here’s what the retrieval layer of our Spring AI RAG application looks like:
@Service
public class HybridSearchService {

    private final VectorStore vectorStore;
    private final List<Document> allDocuments;

    public List<Document> hybridSearch(String query) {
        // Vector similarity search
        SearchRequest vectorSearchRequest = SearchRequest.builder()
            .query(query)
            .topK(5)
            .similarityThreshold(0.65)
            .build();
        List<Document> vectorResults = vectorStore.similaritySearch(vectorSearchRequest);

        // Enhanced keyword search
        List<Document> keywordResults = allDocuments.stream()
            .filter(doc -> isRelevantForKeywordSearch(doc, query))
            .sorted(Comparator.comparingInt(
                doc -> -calculateRelevanceScore(doc.getFormattedContent(), query)))
            .limit(8)
            .collect(Collectors.toList());

        // Combine results using hybrid ranking
        return new HybridRanker().fuse(vectorResults, keywordResults);
    }
}
Act 2: Augmentation (The Librarian)
This is where the magic happens. The augmentation step takes your original question and says: “Hey, here’s some relevant context that might help answer this question better.”
@Service
public class ResponseService {

    private final ChatClient chatClient;

    private static final String SYSTEM_PROMPT = """
        You are a helpful assistant that answers questions based on provided context.
        FORMATTING RULES:
        1. ONLY use information from the CONTEXT section below
        2. Format your response in clear, readable markdown
        3. Use **bold** for important terms
        4. Always include source references at the end
        5. If the answer is not in the context, respond with: "I don't have information about that in the provided documents."
        CONTEXT:
        {context}
        QUESTION: {question}
        """;

    public ResponseService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public String generateResponse(String query, List<Document> context) {
        String formattedContext = formatContext(context);
        PromptTemplate template = new PromptTemplate(SYSTEM_PROMPT);
        Prompt prompt = template.create(Map.of(
            "context", formattedContext,
            "question", query
        ));
        return chatClient.prompt(prompt).call().content();
    }
}
Act 3: Generation (The Storyteller)
Finally, the AI model takes your question and the relevant context and crafts a response that’s both accurate and natural. It’s like having a really smart friend who’s just read all your documents and can explain things in a way that makes sense.
How Vector Databases Fit into the Spring AI RAG Ecosystem
Vector databases are the secret sauce that makes Spring AI RAG possible. Here’s why they’re so powerful:
Traditional Database: “Find me all documents where title = ‘Spring AI’”
Vector Database: “Find me all documents that are semantically similar to ‘Spring AI framework for building intelligent applications’”
See the difference? Vector databases understand meaning, not just exact matches.
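Under the hood, “semantically similar” usually means cosine similarity between embedding vectors. Here’s the arithmetic with made-up 3-dimensional vectors (real embeddings from a model like nomic-embed-text have 768 dimensions, which is why the `dimensions` property is set to 768 in our config):

```java
public class CosineDemo {

    // Cosine similarity: dot product of the vectors divided by the product of their lengths
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] springAi = {0.9, 0.1, 0.3};    // pretend embedding of "Spring AI"
        double[] aiFramework = {0.8, 0.2, 0.4}; // pretend embedding of "AI framework"
        double[] coffee = {0.1, 0.9, 0.2};      // pretend embedding of "coffee"
        System.out.printf("similar:   %.3f%n", cosine(springAi, aiFramework));
        System.out.printf("unrelated: %.3f%n", cosine(springAi, coffee));
    }
}
```

The semantically related pair scores close to 1.0 while the unrelated pair scores much lower – that score ordering is what `similaritySearch` ranks by.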
In our Spring AI RAG setup, we’re using PGVector (PostgreSQL with vector extensions):
# Configuration for PGVector (application.properties)
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768
Component Interaction Flow
Here’s how all the pieces work together in our Spring AI RAG application:
- Document Ingestion: Documents are chunked and converted to vectors
- Storage: Vectors are stored in PGVector database
- Query Processing: User question is converted to a vector
- Retrieval: Similar vectors (documents) are found
- Augmentation: Context is added to the original question
- Generation: AI model generates a response
- Response: User gets an intelligent, contextual answer
@RestController
@RequestMapping("/api/rag")
public class RagController {

    private final HybridSearchService hybridSearchService;
    private final ResponseService responseService;

    @PostMapping("/query")
    public ResponseEntity<ChatResponse> queryRAG(@RequestBody ChatRequest request) {
        // 1. Retrieve relevant context documents
        List<Document> context = hybridSearchService.hybridSearch(request.getQuery());

        // 2. Generate response using context
        String llmResponse = responseService.generateResponse(request.getQuery(), context);

        // 3. Create structured response with sources
        ChatResponse response = new ChatResponse(
            llmResponse,
            context.size(),
            extractSources(context),
            LocalDateTime.now(),
            "success"
        );
        return ResponseEntity.ok(response);
    }
}
The beauty of Spring AI RAG is that it handles all the complex orchestration behind the scenes. You just need to focus on your business logic and let Spring AI handle the heavy lifting.
In the next section, we’ll dive into setting up your development environment and getting your hands dirty with some actual code. Trust me, once you see how easy it is to get started, you’ll wonder why you didn’t try Spring AI RAG sooner!
Part 2: Setting Up Your Spring AI RAG Environment
Spring AI RAG Development Environment Setup
Now that we’ve covered the theory, let’s get our hands dirty! Setting up a Spring AI RAG environment isn’t just about installing dependencies – it’s about creating a robust foundation that won’t crumble under pressure. Trust me, I’ve been there. Nothing’s worse than having your RAG system work perfectly in development, only to crash spectacularly in production because you skipped the proper setup.
PGVector Database Configuration for Spring AI RAG
For the PGVector database setup, I’ve prepared a comprehensive installation guide that covers everything from Docker setup to performance tuning. You can [download the complete PGVector setup guide here] – it’s got all the Docker commands, configuration files, and troubleshooting tips you’ll need. No more googling “why is my vector database so slow” at 2 AM! Once you nail the PGVector setup and spin it up successfully, you’ll see PostgreSQL accepting connections in your terminal.
Ollama Integration with Spring AI RAG
Similarly, I’ve created a detailed Ollama setup guide that walks you through installing and configuring Ollama with the right models for your Spring AI RAG system. [Download the Ollama setup guide here] – it includes model recommendations, performance optimizations, and those little tricks that make all the difference. If things go well, you’ll see Ollama start listening (on port 11434 by default) once you run the ollama serve command.
Spring Boot Project Structure for RAG Applications
Let’s dive into the project structure that makes Spring AI RAG applications maintainable and scalable. Here’s how I organize my RAG projects:
src/main/java/com/sundrymind/
├── config/
│   ├── SpringAiRagApplication.java
│   ├── DocumentConfig.java
│   └── DocumentIngestionRunner.java
├── controller/
│   └── RagController.java
└── service/
    ├── DocumentService.java
    ├── HybridSearchService.java
    ├── ResponseService.java
    └── RetrievalService.java
src/main/resources/
├── application.properties
├── data/
└── static/index.html
Maven Dependencies Breakdown
Let’s start with the Maven dependencies that make Spring AI RAG magic happen:
<dependencies>
    <!-- Spring AI Core -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
    </dependency>

    <!-- Vector Store Support -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
    </dependency>

    <!-- Document Readers -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-pdf-document-reader</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-tika-document-reader</artifactId>
    </dependency>

    <!-- PostgreSQL Driver -->
    <dependency>
        <groupId>org.postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <scope>runtime</scope>
    </dependency>

    <!-- Apache Commons for String Utils -->
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-lang3</artifactId>
    </dependency>
</dependencies>
Application Properties Deep Dive
Your application.properties file is the command center of your Spring AI RAG application. Here’s my battle-tested configuration:
spring.application.name=SpringAIRag
# Database Configuration for PGVector
spring.datasource.url=jdbc:postgresql://localhost:5432/vectordb
spring.datasource.username=postgres
spring.datasource.password=password
spring.datasource.driver-class-name=org.postgresql.Driver
# PGVector Configuration
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768
# Ollama base endpoint
spring.ai.ollama.base-url=http://localhost:11434
# Use llama3.2 for chat
spring.ai.ollama.chat.enabled=true
spring.ai.ollama.chat.options.model=llama3.2
spring.ai.ollama.chat.options.temperature=0.3
spring.ai.ollama.chat.options.top-k=40
# Use nomic-embed-text for embeddings
spring.ai.ollama.embedding.enabled=true
spring.ai.ollama.embedding.options.model=nomic-embed-text
# Disable default ONNX auto-config to avoid conflict
spring.ai.embedding.transformers.enabled=false
Pro tip: that temperature=0.3 setting? That’s the sweet spot for RAG responses. Too low (0.1), and your AI sounds like a robot reading a manual. Too high (0.8), and it starts hallucinating like it’s at a creative writing workshop. But you can always play around with this and find what fits your case.
Configuration Class Explanations
The DocumentConfig class is where we set up our text processing pipeline:
@Configuration
public class DocumentConfig {

    @Bean
    public TokenTextSplitter documentSplitter() {
        return new TokenTextSplitter(
            512,  // Target chunk size in tokens
            128,  // Minimum chunk size in characters
            50,   // Minimum chunk length to embed
            2048, // Maximum number of chunks per document
            true  // Keep separators, preserving paragraph breaks
        );
    }

    @Bean
    public ChatClient chatClient(ChatModel chatModel) {
        return ChatClient.builder(chatModel)
            .defaultSystem("You are a friendly AI assistant. " +
                "Keep responses concise and helpful. " +
                "Be conversational but professional.")
            .build();
    }
}
Those numbers aren’t random – they’re based on months of experimentation. 512 tokens is the goldilocks zone for most documents. One caveat: TokenTextSplitter takes no overlap parameter – the 128 is a minimum chunk size in characters, and the final true keeps separators so chunks break on paragraph boundaries rather than mid-thought.
Part 3: Building the Core Spring AI RAG Components
Implementing Document Ingestion in Spring AI RAG
Document ingestion is where the magic begins. It’s like preparing ingredients for a gourmet meal – do it wrong, and even the best chef can’t save your dish.
Document Service Architecture
Here’s our DocumentService that handles multiple document formats:
@Service
public class DocumentService {

    private final TextSplitter documentSplitter;
    private final VectorStore vectorStore;

    public DocumentService(TextSplitter documentSplitter, VectorStore vectorStore) {
        this.documentSplitter = documentSplitter;
        this.vectorStore = vectorStore;
    }

    public void ingestDocument(Resource resource) throws Exception {
        String filename = resource.getFilename();
        String extension = filename.substring(filename.lastIndexOf('.')).toLowerCase();
        DocumentReader reader = switch (extension) {
            case ".pdf" -> new PagePdfDocumentReader(resource);
            case ".docx" -> new TikaDocumentReader(resource);
            case ".txt", ".md" -> new TextReader(resource);
            default -> throw new IllegalArgumentException("Unsupported format: " + extension);
        };
        List<Document> chunks = documentSplitter.apply(reader.get());
        vectorStore.add(chunks);
    }
}
Multi-format Support (PDF, DOCX, TXT)
The beauty of Spring AI RAG is its plug-and-play document readers. Want to add PowerPoint support? Just add the dependency and another case to the switch statement. It’s like having a Swiss Army knife for documents.
Chunking Strategies That Actually Work
Here’s the thing about chunking – it’s not just about splitting text. It’s about preserving meaning. Our TokenTextSplitter configuration keeps separators intact, so chunks break on natural paragraph boundaries instead of mid-sentence. I learned this the hard way when my early RAG system kept giving fragmented answers because chunks were too isolated.
Vector Embedding Generation
The automatic document ingestion happens through our DocumentIngestionRunner:
@Component
public class DocumentIngestionRunner implements CommandLineRunner {

    private static final Logger logger = LoggerFactory.getLogger(DocumentIngestionRunner.class);
    private static final String DOCUMENTS_FOLDER = "classpath:/data/";

    private final DocumentService documentService;
    private final ResourceLoader resourceLoader;

    public DocumentIngestionRunner(DocumentService documentService, ResourceLoader resourceLoader) {
        this.documentService = documentService;
        this.resourceLoader = resourceLoader;
    }

    @Override
    public void run(String... args) throws Exception {
        logger.info("Starting document ingestion process...");
        try {
            Resource folderResource = resourceLoader.getResource(DOCUMENTS_FOLDER);
            Path folderPath = Paths.get(folderResource.getURI());
            try (Stream<Path> paths = Files.walk(folderPath)) {
                paths.filter(Files::isRegularFile)
                    .forEach(filePath -> {
                        try {
                            Resource fileResource = resourceLoader.getResource("file:" + filePath.toAbsolutePath());
                            logger.info("Ingesting document: {}", fileResource.getFilename());
                            documentService.ingestDocument(fileResource);
                        } catch (Exception e) {
                            logger.error("Failed to ingest document {}: {}", filePath.getFileName(), e.getMessage());
                        }
                    });
            }
            logger.info("Document ingestion process completed.");
        } catch (IOException e) {
            logger.error("Could not find or access the documents folder at {}", DOCUMENTS_FOLDER, e);
        }
    }
}
This runner automatically processes all documents in your data folder on startup. Just drop your files there and let Spring AI RAG do its thing!
Advanced Search Capabilities in Spring AI RAG
Hybrid Search Implementation
Here’s where we get fancy. Pure vector search is great, but combining it with keyword search is like having both a telescope and a magnifying glass – you see both the big picture and the fine details.
@Service
public class HybridSearchService {

    private final VectorStore vectorStore;
    private final List<Document> allDocuments;

    public HybridSearchService(VectorStore vectorStore, List<Document> allDocuments) {
        this.vectorStore = vectorStore;
        this.allDocuments = allDocuments;
    }

    public List<Document> hybridSearch(String query) {
        // 1. Vector similarity search with higher threshold
        SearchRequest vectorSearchRequest = SearchRequest.builder()
            .query(query)
            .topK(5)
            .similarityThreshold(0.65)
            .build();
        List<Document> vectorResults = vectorStore.similaritySearch(vectorSearchRequest);

        // 2. Enhanced keyword search with better filtering
        List<Document> keywordResults = allDocuments.stream()
            .filter(doc -> isRelevantForKeywordSearch(doc, query))
            .sorted(Comparator.comparingInt(
                doc -> -calculateRelevanceScore(doc.getFormattedContent(), query)))
            .limit(8)
            .collect(Collectors.toList());

        // 3. Combine and filter results
        List<Document> fusedResults = new HybridRanker().fuse(vectorResults, keywordResults);

        // 4. Post-process to ensure quality
        return fusedResults.stream()
            .filter(doc -> isHighQualityResult(doc, query))
            .limit(5)
            .collect(Collectors.toList());
    }
}
Vector Similarity vs Keyword Matching
Vector similarity is like having a conversation with someone who understands context – it gets the semantic meaning. Keyword matching is like having a really good librarian who knows exactly where everything is filed. Together, they’re unstoppable.
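The hybrid search code above calls two helpers, isRelevantForKeywordSearch and calculateRelevanceScore, that aren’t shown in this post. Here’s one plausible implementation – a simple term-frequency count over significant query words. Note this is my sketch, operating on plain strings rather than Document objects, so adapt it to the real signatures:

```java
import java.util.Arrays;

public class KeywordScoring {

    // Count how many times significant query words (longer than 3 chars) appear in the content
    public static int calculateRelevanceScore(String content, String query) {
        String contentLower = content.toLowerCase();
        return Arrays.stream(query.toLowerCase().split("\\s+"))
            .filter(word -> word.length() > 3) // skip "the", "and", etc.
            .mapToInt(word -> countOccurrences(contentLower, word))
            .sum();
    }

    // A document qualifies for the keyword branch if it matches at least one significant word
    public static boolean isRelevantForKeywordSearch(String content, String query) {
        return calculateRelevanceScore(content, query) > 0;
    }

    private static int countOccurrences(String text, String word) {
        int count = 0, idx = 0;
        while ((idx = text.indexOf(word, idx)) != -1) {
            count++;
            idx += word.length();
        }
        return count;
    }
}
```

For the query “best game” against “Baldur’s Gate 3 is the best game. The game is long.”, this scores 3 (one “best” plus two “game”), which is what the `Comparator` in hybridSearch sorts on.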
Result Ranking and Fusion Techniques
The HybridRanker uses Reciprocal Rank Fusion (RRF) to combine results from both search methods:
public static class HybridRanker {

    private static final double K = 60.0;

    public List<Document> fuse(List<Document> vectorResults, List<Document> keywordResults) {
        Map<String, Double> fusedScores = new HashMap<>();
        Map<String, Document> documentMap = new HashMap<>();

        // Process vector results
        for (int i = 0; i < vectorResults.size(); i++) {
            Document doc = vectorResults.get(i);
            String docId = doc.getId() != null ? doc.getId() : doc.getFormattedContent().hashCode() + "";
            double score = 1.0 / (K + (i + 1));
            fusedScores.merge(docId, score, Double::sum);
            documentMap.putIfAbsent(docId, doc);
        }

        // Process keyword results
        for (int i = 0; i < keywordResults.size(); i++) {
            Document doc = keywordResults.get(i);
            String docId = doc.getId() != null ? doc.getId() : doc.getFormattedContent().hashCode() + "";
            double score = 1.0 / (K + (i + 1));
            fusedScores.merge(docId, score, Double::sum);
            documentMap.putIfAbsent(docId, doc);
        }

        return fusedScores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .map(entry -> documentMap.get(entry.getKey()))
            .collect(Collectors.toList());
    }
}
Response Generation with Spring AI RAG
Prompt Engineering Best Practices
Here’s where the rubber meets the road. Your prompt is like a recipe – get it wrong, and even the best ingredients won’t save you:
@Service
public class ResponseService {

    private final ChatClient chatClient;

    private static final String SYSTEM_PROMPT = """
        You are a helpful assistant that answers questions based on provided context.
        FORMATTING RULES:
        1. ONLY use information from the CONTEXT section below
        2. Format your response in clear, readable markdown
        3. Use bullet points for lists
        4. Use numbered lists for step-by-step instructions
        5. Use **bold** for important terms
        6. Use code blocks for any technical terms or file names
        7. Always include source references at the end
        8. If the answer is not in the context, respond with: "I don't have information about that in the provided documents."
        CONTEXT:
        {context}
        QUESTION: {question}
        Provide a well-formatted, helpful response based only on the context above.
        """;

    public String generateResponse(String query, List<Document> context) {
        if (context == null || context.isEmpty()) {
            return "❌ I don't have any relevant documents to answer your question.";
        }
        String formattedContext = formatContext(context);
        if (!isContextRelevant(formattedContext, query)) {
            return "❌ I don't have information about that in the provided documents.";
        }
        PromptTemplate template = new PromptTemplate(SYSTEM_PROMPT);
        Prompt prompt = template.create(Map.of(
            "context", formattedContext,
            "question", query
        ));
        return chatClient.prompt(prompt)
            .call()
            .content();
    }
}
Context Formatting for Better Results
The formatContext method is crucial for giving your AI the right information in the right format:
private String formatContext(List<Document> documents) {
    if (documents == null || documents.isEmpty()) {
        return "No relevant context found.";
    }
    return documents.stream()
        .map(document -> {
            String content = document.getFormattedContent();
            Map<String, Object> metadata = document.getMetadata();
            StringBuilder formattedDoc = new StringBuilder();
            formattedDoc.append("=== DOCUMENT SECTION ===\n");
            if (metadata.containsKey("file_name")) {
                formattedDoc.append("Source: ").append(metadata.get("file_name")).append("\n");
            }
            if (metadata.containsKey("page_number")) {
                formattedDoc.append("Page: ").append(metadata.get("page_number")).append("\n");
            }
            formattedDoc.append("Content:\n").append(content).append("\n");
            formattedDoc.append("=== END SECTION ===\n");
            return formattedDoc.toString();
        })
        .collect(Collectors.joining("\n"));
}
Handling Edge Cases Gracefully
The isContextRelevant method prevents your AI from making stuff up when it doesn’t have good information:
private boolean isContextRelevant(String context, String query) {
    String[] queryWords = query.toLowerCase().split("\\s+");
    String contextLower = context.toLowerCase();
    int matches = 0;
    for (String word : queryWords) {
        if (word.length() > 3 && contextLower.contains(word)) {
            matches++;
        }
    }
    return matches >= Math.max(1, queryWords.length * 0.2);
}
This ensures that at least 20% of significant query words (longer than 3 characters) appear in the context before we proceed with generation.
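To make that rule concrete, here’s the same check as a standalone method with two quick probes (the strings are toy examples of my own, not from the app):

```java
public class ThresholdDemo {

    // Same logic as isContextRelevant above, inlined for a worked example
    public static boolean relevant(String context, String query) {
        String[] queryWords = query.toLowerCase().split("\\s+");
        String contextLower = context.toLowerCase();
        int matches = 0;
        for (String word : queryWords) {
            if (word.length() > 3 && contextLower.contains(word)) {
                matches++;
            }
        }
        // Require at least 20% of all query words (minimum one) to match
        return matches >= Math.max(1, queryWords.length * 0.2);
    }

    public static void main(String[] args) {
        String context = "Our review covers the best PC games of the year.";
        // "best" and "games" both appear, 2 >= max(1, 4 * 0.2) -> relevant
        System.out.println(relevant("best PC games ranked", context));
        // No significant word matches -> not relevant, so generation is skipped
        System.out.println(relevant("quantum physics lectures", context));
    }
}
```

For the first query the threshold is max(1, 4 × 0.2) = 1 and two words match, so we proceed; for the second, zero matches means the service returns the “I don’t have information” fallback instead of letting the model guess.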
Part 4: Creating the User Interface
Building a Modern Chat Interface for Spring AI RAG
Let’s face it – nobody wants to interact with your brilliant Spring AI RAG system through curl commands. Users want a sleek, responsive interface that doesn’t make them feel like they’re using software from 1995.
RESTful API Design
Our RagController provides a clean REST API for the frontend:
@RestController
@RequestMapping("/api/rag")
@CrossOrigin(origins = "*")
public class RagController {

    private final HybridSearchService hybridSearchService;
    private final ResponseService responseService;

    @PostMapping("/query")
    public ResponseEntity<ChatResponse> queryRAG(@RequestBody ChatRequest request) {
        try {
            // 1. Retrieve relevant context documents
            List<Document> context = hybridSearchService.hybridSearch(request.getQuery());

            // 2. Generate response
            String llmResponse = responseService.generateResponse(request.getQuery(), context);

            // 3. Create structured response
            ChatResponse response = new ChatResponse(
                llmResponse,
                context.size(),
                extractSources(context),
                LocalDateTime.now(),
                "success"
            );
            return ResponseEntity.ok(response);
        } catch (Exception e) {
            ChatResponse errorResponse = new ChatResponse(
                "❌ Sorry, I encountered an error while processing your request. Please try again.",
                0,
                List.of(),
                LocalDateTime.now(),
                "error"
            );
            return ResponseEntity.ok(errorResponse);
        }
    }

    private List<SourceInfo> extractSources(List<Document> documents) {
        return documents.stream()
            .map(doc -> {
                Map<String, Object> metadata = doc.getMetadata();
                String source = Optional.ofNullable(metadata.get("source"))
                    .or(() -> Optional.ofNullable(metadata.get("file_name")))
                    .map(Object::toString)
                    .orElse("Unknown");
                return new SourceInfo(
                    source,
                    metadata.getOrDefault("page_number", "").toString(),
                    truncateContent(doc.getFormattedContent(), 150)
                );
            })
            .distinct()
            .toList();
    }
}
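The ChatRequest, ChatResponse, and SourceInfo types aren’t shown in this post. Here’s one way to define them as records; the field names are my guesses based on how the controller and the frontend (data.response, data.sources) use them, so adjust to match the actual codebase:

```java
import java.time.LocalDateTime;
import java.util.List;

// Request body: { "query": "..." }
record ChatRequest(String query) {
    String getQuery() { return query; } // the controller calls request.getQuery()
}

// One citation per retrieved chunk
record SourceInfo(String fileName, String pageNumber, String snippet) {}

// Response body the frontend reads as data.response / data.sources
record ChatResponse(String response, int contextCount, List<SourceInfo> sources,
                    LocalDateTime timestamp, String status) {}
```

Records are a good fit here: they give you the equals/hashCode that the .distinct() call in extractSources relies on, plus JSON serialization out of the box with Jackson.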
WebSocket Alternatives (Why We Chose REST)
You might be wondering, “Why not WebSockets for real-time chat?” Here’s the thing – for most Spring AI RAG applications, REST is perfect. WebSockets add complexity without much benefit unless you’re building a collaborative chat system. REST is simple, cacheable, and works great with our hybrid search approach.
Frontend Implementation with Vanilla JavaScript
I’m a big believer in keeping things simple. Here’s our vanilla JavaScript chat interface that doesn’t require a PhD in React:
async function sendMessage() {
    const message = messageInput.value.trim();
    if (!message) return;

    // Add user message
    addMessage(message, true);

    // Clear input and disable button
    messageInput.value = "";
    sendButton.disabled = true;

    // Add loading message
    addLoadingMessage();

    try {
        const response = await fetch("/api/rag/query", {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
            },
            body: JSON.stringify({ query: message }),
        });
        if (!response.ok) {
            throw new Error("Network response was not ok");
        }
        const data = await response.json();

        // Remove loading message
        removeLoadingMessage();

        // Add bot response
        addMessage(data.response, false, data.sources || []);
    } catch (error) {
        console.error("Error:", error);
        removeLoadingMessage();
        addMessage("❌ Sorry, I encountered an error. Please try again.", false);
    } finally {
        sendButton.disabled = false;
        messageInput.focus();
    }
}
Responsive Design Considerations
Our CSS uses flexbox and modern techniques to ensure the interface looks great on everything from phones to ultrawide monitors:
.chat-container {
    width: 800px;
    height: 700px;
    background: white;
    border-radius: 20px;
    box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
    display: flex;
    flex-direction: column;
    overflow: hidden;
}

@media (max-width: 768px) {
    .chat-container {
        width: 95%;
        height: 95vh;
        margin: 10px;
    }
}
Enhancing User Experience in Spring AI RAG Applications
Loading States and Error Handling
Nothing kills user experience like uncertainty. Our loading animation gives users immediate feedback:
function addLoadingMessage() {
    const messageDiv = document.createElement("div");
    messageDiv.className = "message bot";
    messageDiv.id = "loadingMessage";

    const bubbleDiv = document.createElement("div");
    bubbleDiv.className = "message-bubble";

    const loadingDiv = document.createElement("div");
    loadingDiv.className = "loading";
    loadingDiv.innerHTML = `
        <span>Thinking</span>
        <div class="loading-dots">
            <div class="loading-dot"></div>
            <div class="loading-dot"></div>
            <div class="loading-dot"></div>
        </div>
    `;

    bubbleDiv.appendChild(loadingDiv);
    messageDiv.appendChild(bubbleDiv);
    messagesContainer.appendChild(messageDiv);
    messagesContainer.scrollTop = messagesContainer.scrollHeight;
}
Source Citation Display
One of the coolest features of our Spring AI RAG system is automatic source citation:
// Add sources if available
if (sources && sources.length > 0) {
    const sourcesDiv = document.createElement("div");
    sourcesDiv.className = "sources";
    sourcesDiv.innerHTML = "<strong>📚 Sources:</strong>";
    sources.forEach((source) => {
        const sourceDiv = document.createElement("div");
        sourceDiv.className = "source-item";
        sourceDiv.innerHTML = `
            <div class="source-file">${source.fileName}${
                source.pageNumber ? ` (Page ${source.pageNumber})` : ""
            }</div>
            ${
                source.snippet
                    ? `<div class="source-snippet">${source.snippet}</div>`
                    : ""
            }
        `;
        sourcesDiv.appendChild(sourceDiv);
    });
    contentDiv.appendChild(sourcesDiv);
}
Markdown Rendering for Rich Responses
We use the Marked.js library to render markdown responses beautifully:
if (isUser) {
    contentDiv.textContent = content;
} else {
    // Parse markdown for bot messages
    contentDiv.innerHTML = marked.parse(content);
}
The CSS styles ensure code blocks, lists, and headers all look professional:
.message-content h1,
.message-content h2,
.message-content h3 {
    margin: 10px 0;
    color: #1f2937;
}

.message-content ul,
.message-content ol {
    margin: 10px 0;
    padding-left: 20px;
}

.message-content code {
    background: #f1f5f9;
    padding: 2px 6px;
    border-radius: 4px;
    font-family: "Monaco", "Consolas", monospace;
    font-size: 0.9em;
}

.message-content pre {
    background: #f1f5f9;
    padding: 15px;
    border-radius: 8px;
    overflow-x: auto;
    margin: 10px 0;
}
Ok, so now our system is ready! Want to see this system in action? Check out the complete code repository at https://github.com/SundrymindTech/Spring-AI-RAG.
Part 5: Testing and Troubleshooting
Testing Your Spring AI RAG Application
Testing a Spring AI RAG system is like debugging a Rube Goldberg machine – there are so many moving parts that can go wrong! Let me share my testing strategy that’s saved me countless hours of debugging.
Unit Testing Strategies
Let’s start with unit tests for our core components:
@ExtendWith(MockitoExtension.class)
class RetrievalServiceTest {

    @Mock
    private VectorStore vectorStore;
    @InjectMocks
    private RetrievalService retrievalService;

    @Test
    void testRetrieveContext_WithValidQuery() {
        // Given
        String query = "What are the best PC games?";
        List<Document> expectedDocs = Arrays.asList(
            new Document("Baldur's Gate 3 is amazing", Map.of("source", "pcg.pdf"))
        );
        when(vectorStore.similaritySearch(any(SearchRequest.class)))
            .thenReturn(expectedDocs);

        // When
        List<Document> result = retrievalService.retrieveContext(query, null);

        // Then
        assertThat(result).isNotEmpty();
        assertThat(result).hasSize(1);
        assertThat(result.get(0).getFormattedContent()).contains("Baldur's Gate 3");
    }

    @Test
    void testRetrieveContext_WithSourceFilter() {
        // Given
        String query = "Canada information";
        String sourceFilter = "Canada_Info.docx";
        when(vectorStore.similaritySearch(any(SearchRequest.class)))
            .thenReturn(List.of());

        // When
        retrievalService.retrieveContext(query, sourceFilter);

        // Then
        ArgumentCaptor<SearchRequest> captor = ArgumentCaptor.forClass(SearchRequest.class);
        verify(vectorStore).similaritySearch(captor.capture());
        SearchRequest request = captor.getValue();
        assertThat(request.getFilterExpression().toString()).contains("Canada_Info.docx");
    }
}
Testing the response generation:
@ExtendWith(MockitoExtension.class)
class ResponseServiceTest {

    @Mock
    private ChatClient chatClient;
    @Mock
    private ChatClient.ChatClientRequestSpec requestSpec;
    @Mock
    private ChatClient.CallResponseSpec callResponseSpec;
    @InjectMocks
    private ResponseService responseService;

    @Test
    void testGenerateResponse_WithValidContext() {
        // Given
        String query = "What's the best RPG?";
        List<Document> context = Arrays.asList(
            new Document("Baldur's Gate 3 is the best RPG",
                Map.of("file_name", "pcg.pdf", "page_number", "1"))
        );
        when(chatClient.prompt(any(Prompt.class))).thenReturn(requestSpec);
        when(requestSpec.call()).thenReturn(callResponseSpec);
        when(callResponseSpec.content()).thenReturn("Baldur's Gate 3 is highly recommended");

        // When
        String result = responseService.generateResponse(query, context);

        // Then
        assertThat(result).contains("Baldur's Gate 3");
        assertThat(result).doesNotContain("❌");
    }

    @Test
    void testGenerateResponse_WithEmptyContext() {
        // Given
        String query = "What's the best RPG?";
        List<Document> emptyContext = Arrays.asList();

        // When
        String result = responseService.generateResponse(query, emptyContext);

        // Then
        assertThat(result).contains("❌");
        assertThat(result).contains("don't have any relevant documents");
    }
}
Testing from the Real Spring AI RAG UI
Now for the exciting part – real integration testing! We’ll fire up the full application in a browser and interact with our chatbot interface, sending actual queries that will:
- Process through our Spring backend
- Query the live database
- Generate AI-powered responses
This end-to-end test validates that all components work together seamlessly, from the UI down to the vector store. Watch as your RAG system comes to life, handling real user interactions exactly as it would in production!
Once you download the full codebase and set it up in your favorite IDE, and the application launches and ingests the documents correctly, you should see something like this:
Once the application spins up, you can access the chatbot at http://localhost:8080/ and fire queries about the documents you feed it. In my codebase I fed it two documents: one PDF about the best PC games, and another with general information about Canada. Here’s how it responded:
Notice how it shows the information sources as well:
With the hard part behind us, it’s time to geek out on some advanced concepts!
Performance Benchmarking
Here’s how I benchmark my Spring AI RAG performance:
@Component
public class RagPerformanceBenchmark {
private final RagController ragController;
private final MeterRegistry meterRegistry;
@EventListener(ApplicationReadyEvent.class)
public void runBenchmarks() {
benchmarkQueryPerformance();
benchmarkConcurrentQueries();
}
private void benchmarkQueryPerformance() {
List<String> testQueries = Arrays.asList(
"What are the best PC games?",
"Tell me about Canada",
"What is Spring AI RAG?"
);
for (String query : testQueries) {
Timer.Sample sample = Timer.start(meterRegistry);
try {
ChatRequest request = new ChatRequest(query);
ragController.queryRAG(request);
} finally {
sample.stop(Timer.builder("rag.benchmark")
.tag("query", query)
.register(meterRegistry));
}
}
}
private void benchmarkConcurrentQueries() {
int numberOfThreads = 10;
int queriesPerThread = 5;
ExecutorService executor = Executors.newFixedThreadPool(numberOfThreads);
List<Future<Long>> futures = new ArrayList<>();
for (int i = 0; i < numberOfThreads; i++) {
futures.add(executor.submit(() -> {
long totalTime = 0;
for (int j = 0; j < queriesPerThread; j++) {
long start = System.currentTimeMillis();
ChatRequest request = new ChatRequest("Test query " + j);
ragController.queryRAG(request);
totalTime += System.currentTimeMillis() - start;
}
return totalTime;
}));
}
// Collect results
futures.forEach(future -> {
try {
Long result = future.get();
log.info("Thread completed in {} ms", result);
} catch (Exception e) {
log.error("Benchmark thread failed", e);
}
});
executor.shutdown();
}
}
Common Pitfalls and Solutions
Let me share the most common mistakes I’ve seen (and made myself):
- Forgetting to initialize the vector store schema
// DON'T DO THIS - Will fail silently
@Bean
public VectorStore vectorStore() {
return new PgVectorStore(jdbcTemplate, embeddingModel);
}
// DO THIS - Proper initialization
@Bean
public VectorStore vectorStore() {
return new PgVectorStore.Builder(jdbcTemplate, embeddingModel)
.withSchemaName("public")
.withTableName("vector_store")
.withInitializeSchema(true)
.build();
}
- Not handling empty search results gracefully
// Bad - Will throw NullPointerException
public String generateResponse(String query, List<Document> context) {
String formattedContext = context.stream()
.map(Document::getFormattedContent)
.collect(Collectors.joining("\n"));
// ... rest of method
}
// Good - Always check for null/empty
public String generateResponse(String query, List<Document> context) {
if (context == null || context.isEmpty()) {
return "❌ I don't have any relevant documents to answer your question.";
}
// ... rest of method
}
Spring AI RAG Troubleshooting Guide
Time for the troubleshooting section – aka “What to do when everything breaks at 3 AM.”
Vector Store Connection Issues
Problem: Can’t connect to PostgreSQL/PGVector
org.postgresql.util.PSQLException: Connection to localhost:5432 refused
Solution:
@Configuration
public class DatabaseHealthConfig {
@Bean
@Primary
public DataSource dataSource() {
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/vectordb");
config.setUsername("postgres");
config.setPassword("password");
config.setMaximumPoolSize(10);
config.setConnectionTestQuery("SELECT 1");
config.setConnectionTimeout(30000);
config.setIdleTimeout(600000);
config.setMaxLifetime(1800000);
return new HikariDataSource(config);
}
@EventListener(ApplicationReadyEvent.class)
public void testDatabaseConnection() {
try (Connection conn = dataSource().getConnection()) {
log.info("Database connection successful!");
} catch (SQLException e) {
log.error("Database connection failed: {}", e.getMessage());
throw new RuntimeException("Cannot connect to database", e);
}
}
}
Ollama Model Problems
Problem: Ollama models not loading or responding
org.springframework.web.client.ResourceAccessException: I/O error on POST request for "http://localhost:11434/api/generate"
Solution:
@Component
public class OllamaHealthChecker {
@Value("${spring.ai.ollama.base-url}")
private String ollamaBaseUrl;
@PostConstruct
public void checkOllamaHealth() {
try {
RestTemplate restTemplate = new RestTemplate();
String response = restTemplate.getForObject(
ollamaBaseUrl + "/api/tags", String.class);
log.info("Ollama is running, available models: {}", response);
} catch (Exception e) {
log.error("Ollama health check failed. Is Ollama running?", e);
pullRequiredModels();
}
}
private void pullRequiredModels() {
try {
// Runtime.exec() returns immediately – use ProcessBuilder and wait
// for each pull to finish before declaring success
new ProcessBuilder("ollama", "pull", "llama3.2").inheritIO().start().waitFor();
new ProcessBuilder("ollama", "pull", "nomic-embed-text").inheritIO().start().waitFor();
log.info("Required models pulled successfully");
} catch (Exception e) {
log.error("Failed to pull required models", e);
}
}
}
Search Quality Improvements
Problem: Poor search results, irrelevant context
Solution: Enhanced hybrid search with better relevance scoring:
@Service
public class ImprovedHybridSearchService {
public List<Document> enhancedHybridSearch(String query) {
// 1. Pre-process query for better matching
String processedQuery = preprocessQuery(query);
// 2. Multi-strategy search
List<Document> vectorResults = performVectorSearch(processedQuery);
List<Document> keywordResults = performKeywordSearch(processedQuery);
List<Document> semanticResults = performSemanticSearch(processedQuery);
// 3. Intelligent fusion with weighted scoring
return fuseResultsWithWeights(vectorResults, keywordResults, semanticResults);
}
private String preprocessQuery(String query) {
// Remove stop words, normalize, expand abbreviations
// (note the doubled backslashes – \b and \s must be escaped in Java strings)
return query.toLowerCase()
.replaceAll("\\b(what|how|when|where|why|is|are|the|a|an)\\b", "")
.replaceAll("\\s+", " ")
.trim();
}
private List<Document> fuseResultsWithWeights(
List<Document> vectorResults,
List<Document> keywordResults,
List<Document> semanticResults) {
Map<String, ScoredDocument> scoredDocs = new HashMap<>();
// Weight vector results highly (0.5)
addWeightedResults(scoredDocs, vectorResults, 0.5);
// Weight keyword results moderately (0.3)
addWeightedResults(scoredDocs, keywordResults, 0.3);
// Weight semantic results lightly (0.2)
addWeightedResults(scoredDocs, semanticResults, 0.2);
return scoredDocs.values().stream()
.sorted(Comparator.comparing(ScoredDocument::getScore).reversed())
.limit(5)
.map(ScoredDocument::getDocument)
.collect(Collectors.toList());
}
}
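The fuseResultsWithWeights method above leans on two helpers the snippet doesn’t show (addWeightedResults and ScoredDocument). Here’s a minimal, self-contained sketch of the same idea using plain document IDs – the rank-decay scoring is my own illustrative choice, not necessarily the article’s exact formula:

```java
import java.util.*;
import java.util.stream.*;

public class WeightedFusion {
    // Each strategy's result list contributes rank-based scores scaled by
    // its weight; documents found by several strategies accumulate score.
    static void addWeighted(Map<String, Double> scores, List<String> ids, double weight) {
        for (int rank = 0; rank < ids.size(); rank++) {
            // simple rank decay: the first hit is worth more than later ones
            double score = weight / (rank + 1);
            scores.merge(ids.get(rank), score, Double::sum);
        }
    }

    public static List<String> fuse(List<String> vector, List<String> keyword, List<String> semantic) {
        Map<String, Double> scores = new HashMap<>();
        addWeighted(scores, vector, 0.5);   // vector results weighted highly
        addWeighted(scores, keyword, 0.3);  // keyword results moderately
        addWeighted(scores, semantic, 0.2); // semantic results lightly
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(5)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "d2" appears in both vector and keyword results, so it outranks "d1"
        System.out.println(fuse(
                List.of("d1", "d2"),
                List.of("d2", "d3"),
                List.of("d3")));
    }
}
```

A document that shows up in two strategies ("d2" here: 0.25 + 0.3 = 0.55) beats the top vector-only hit ("d1": 0.5) – which is exactly the behavior you want from fusion.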
Performance Bottlenecks
Problem: Slow response times, high memory usage
Solution: Comprehensive performance optimization:
@Component
public class RagPerformanceOptimizer {
@Autowired
private ApplicationContext applicationContext;
@Autowired
private HybridSearchService hybridSearchService;
@Scheduled(fixedRate = 60000) // Every minute
public void optimizePerformance() {
monitorMemoryUsage();
optimizeVectorQueries();
cleanupExpiredCache();
}
private void monitorMemoryUsage() {
Runtime runtime = Runtime.getRuntime();
long totalMemory = runtime.totalMemory();
long freeMemory = runtime.freeMemory();
long usedMemory = totalMemory - freeMemory;
double memoryUsagePercent = (double) usedMemory / totalMemory * 100;
if (memoryUsagePercent > 80) {
log.warn("High memory usage detected: {}%", memoryUsagePercent);
// Trigger garbage collection
System.gc();
// Clear non-essential caches
clearNonEssentialCaches();
}
}
private void optimizeVectorQueries() {
// Analyze query patterns and optimize accordingly
QueryAnalyzer analyzer = applicationContext.getBean(QueryAnalyzer.class);
Map<String, Integer> queryPatterns = analyzer.getQueryPatterns();
// Pre-warm cache for common queries
queryPatterns.entrySet().stream()
.filter(entry -> entry.getValue() > 10) // Queries asked more than 10 times
.forEach(entry -> preWarmCache(entry.getKey()));
}
private void preWarmCache(String commonQuery) {
try {
// Asynchronously warm up cache
CompletableFuture.runAsync(() -> {
hybridSearchService.hybridSearch(commonQuery);
});
} catch (Exception e) {
log.debug("Cache pre-warming failed for query: {}", commonQuery);
}
}
}
That’s a wrap on our Spring AI RAG testing and troubleshooting guide! With these tools and techniques, you’ll be able to build, debug, and maintain a Spring AI RAG system that can handle real-world traffic and complexity.
Remember, building a great RAG system is like cooking a perfect meal – it takes the right ingredients, proper technique, and a lot of patience. But once you get it right, it’s absolutely delicious! 🍽️
Happy coding, and may your vectors always be relevant and your search results always satisfying!
In the next part, we’ll dive into advanced topics like performance optimization, deployment strategies, and scaling your Spring AI RAG system to handle thousands of users. Plus, I’ll share some war stories from production deployments that’ll save you from the same mistakes I made!
Part 6: Advanced Spring AI RAG Features
After building our basic Spring AI RAG application, let’s dive into the advanced features that will make your system production-ready. Trust me, this is where the real magic happens!
Optimizing Spring AI RAG Performance
Let me share some hard-earned wisdom from my journey with Spring AI RAG optimization. I’ve made every mistake in the book (and then some), so you don’t have to.
Caching Strategies for Spring AI RAG
First up – caching! Your Spring AI RAG system will be hitting the vector database and LLM frequently. Without proper caching, you’ll be burning through resources faster than a Tesla in ludicrous mode.
Here’s my battle-tested caching configuration:
@Configuration
@EnableCaching
public class RagCacheConfig {
@Bean
public CacheManager cacheManager() {
RedisCacheManager.Builder builder = RedisCacheManager
.RedisCacheManagerBuilder
.fromConnectionFactory(redisConnectionFactory())
.cacheDefaults(cacheConfiguration());
return builder.build();
}
@Bean
public RedisCacheConfiguration cacheConfiguration() {
return RedisCacheConfiguration.defaultCacheConfig()
.entryTtl(Duration.ofMinutes(30)) // Cache for 30 minutes
.disableCachingNullValues()
.serializeKeysWith(RedisSerializationContext.SerializationPair
.fromSerializer(new StringRedisSerializer()))
.serializeValuesWith(RedisSerializationContext.SerializationPair
.fromSerializer(new GenericJackson2JsonRedisSerializer()));
}
}
Now, let’s enhance our services with smart caching:
@Service
public class CachedRetrievalService {
private final VectorStore vectorStore;
@Cacheable(value = "vectorSearch", key = "#query + '-' + #sourceFilter")
public List<Document> retrieveContext(String query, String sourceFilter) {
SearchRequest.Builder requestBuilder = SearchRequest.builder()
.query(query)
.topK(5)
.similarityThreshold(0.65);
if (sourceFilter != null && !sourceFilter.isEmpty()) {
requestBuilder.filterExpression(
String.format("metadata.source == '%s'", sourceFilter));
}
SearchRequest request = requestBuilder.build();
return vectorStore.similaritySearch(request);
}
@CacheEvict(value = "vectorSearch", allEntries = true)
public void clearSearchCache() {
// Called when new documents are added
}
}
Pro tip: Cache the expensive vector searches, but be careful not to cache LLM responses if you want fresh answers each time!
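One subtlety with the @Cacheable key above: trivially different spellings of the same query (extra whitespace, different casing) produce distinct cache entries. A small normalization helper can improve hit rates – the names and key format here are my own sketch, not part of the codebase:

```java
import java.util.Locale;

public class CacheKeys {
    // Hypothetical helper: normalize a (query, sourceFilter) pair into a
    // stable cache key so "  What is RAG? " and "what is rag?" share an entry.
    public static String searchKey(String query, String sourceFilter) {
        String q = query.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        // collapse null/empty filters to a fixed token so keys stay unambiguous
        String f = (sourceFilter == null || sourceFilter.isEmpty()) ? "ALL" : sourceFilter;
        return q + "|" + f;
    }

    public static void main(String[] args) {
        System.out.println(searchKey("  What is RAG? ", null));
        System.out.println(searchKey("hello   world", "pcg.pdf"));
    }
}
```

You could wire this in via a SpEL expression like `key = "T(com.example.CacheKeys).searchKey(#query, #sourceFilter)"` – assuming you place the class on a matching package path.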
Database Indexing for Vector Search
Your Spring AI RAG performance lives and dies by your vector database indexes. Here’s how I set up PGVector for optimal performance:
-- Create proper indexes for metadata filtering
CREATE INDEX CONCURRENTLY idx_documents_metadata_source
ON vector_store USING GIN ((metadata->'source'));
-- Index for similarity search optimization (HNSW, matching the
-- index-type configured in application.properties)
CREATE INDEX CONCURRENTLY idx_documents_embedding_cosine
ON vector_store USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
-- For better query performance
CREATE INDEX CONCURRENTLY idx_documents_created_at
ON vector_store (created_at DESC);
And here’s the configuration to make it sing:
# Enhanced PGVector Configuration
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768
spring.ai.vectorstore.pgvector.m=16
spring.ai.vectorstore.pgvector.ef-construction=200
spring.ai.vectorstore.pgvector.ef-search=100
Memory Management Tips
Nothing kills a Spring AI RAG application faster than memory leaks. Here’s my memory management strategy:
@Component
public class MemoryOptimizedDocumentProcessor {
private final int BATCH_SIZE = 50;
@Async
public CompletableFuture<Void> processDocumentsInBatches(List<Document> documents) {
List<List<Document>> batches = Lists.partition(documents, BATCH_SIZE);
for (List<Document> batch : batches) {
try {
processBatch(batch);
// Force garbage collection between batches
System.gc();
} catch (Exception e) {
log.error("Error processing batch", e);
}
}
return CompletableFuture.completedFuture(null);
}
private void processBatch(List<Document> batch) {
// Process documents in smaller chunks
vectorStore.add(batch);
// Clear references to help GC
batch.clear();
}
}
JVM configuration for optimal Spring AI RAG performance. Note that JVM flags can’t be set from application.properties – pass them on the command line (or via the JAVA_TOOL_OPTIONS environment variable) when launching the app:
# Start-up command
java -Xmx4g -Xms2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar app.jar
Scaling Considerations
When your Spring AI RAG application starts getting serious traffic, here’s how to scale:
@Configuration
public class RagScalingConfig {
@Bean
@ConfigurationProperties("app.rag.scaling")
public RagScalingProperties ragScalingProperties() {
return new RagScalingProperties();
}
@Bean
public TaskExecutor ragTaskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(10);
executor.setMaxPoolSize(50);
executor.setQueueCapacity(100);
executor.setThreadNamePrefix("rag-async-");
executor.initialize();
return executor;
}
}
Production-Ready Spring AI RAG Deployment
Let’s talk about taking your Spring AI RAG system from “it works on my machine” to “it’s ready for the world.”
Environment Configuration
Here’s my production-ready configuration structure:
# application-prod.properties
# Spring Profiles
spring.profiles.active=prod
# Ollama Configuration
spring.ai.ollama.base-url=${OLLAMA_BASE_URL:http://ollama-service:11434}
spring.ai.ollama.chat.options.model=${OLLAMA_CHAT_MODEL:llama3.2}
spring.ai.ollama.chat.options.temperature=${OLLAMA_TEMPERATURE:0.3}
spring.ai.ollama.embedding.options.model=${OLLAMA_EMBEDDING_MODEL:nomic-embed-text}
# Database Configuration
spring.datasource.url=${DB_URL:jdbc:postgresql://postgres-service:5432/vectordb}
spring.datasource.username=${DB_USERNAME:postgres}
spring.datasource.password=${DB_PASSWORD}
# PGVector Configuration
spring.ai.vectorstore.pgvector.index-type=HNSW
spring.ai.vectorstore.pgvector.distance-type=COSINE_DISTANCE
spring.ai.vectorstore.pgvector.dimensions=768
# Application RAG Configuration
app.rag.document-path=${DOCUMENT_PATH:/app/documents}
app.rag.max-documents=${MAX_DOCUMENTS:1000}
app.rag.cache-ttl=${CACHE_TTL:1800}
Monitoring and Logging
Production Spring AI RAG needs observability. Here’s my monitoring setup:
@Component
public class RagMetrics {
private final MeterRegistry meterRegistry;
private final Counter queryCounter;
private final Timer responseTimer;
private final Gauge documentCount;
public RagMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.queryCounter = Counter.builder("rag.queries.total")
.description("Total RAG queries")
.register(meterRegistry);
this.responseTimer = Timer.builder("rag.response.time")
.description("RAG response time")
.register(meterRegistry);
// Gauge.builder takes the observed object and value function up front
this.documentCount = Gauge.builder("rag.documents.count", this, RagMetrics::getDocumentCount)
.description("Number of documents in vector store")
.register(meterRegistry);
}
public void recordQuery() {
queryCounter.increment();
}
public Timer.Sample startResponseTimer() {
// Timer.start takes the registry; stop the sample against responseTimer
return Timer.start(meterRegistry);
}
private double getDocumentCount() {
// Placeholder – VectorStore exposes no count() API, so query your
// vector_store table (e.g. via JdbcTemplate) for the real figure
return 0.0;
}
}
Enhanced logging configuration:
@Aspect
@Component
public class RagLoggingAspect {
private static final Logger logger = LoggerFactory.getLogger(RagLoggingAspect.class);
@Around("@annotation(org.springframework.web.bind.annotation.PostMapping)")
public Object logRagRequests(ProceedingJoinPoint joinPoint) throws Throwable {
String methodName = joinPoint.getSignature().getName();
Object[] args = joinPoint.getArgs();
logger.info("RAG Request: {} with args: {}", methodName, Arrays.toString(args));
long startTime = System.currentTimeMillis();
try {
Object result = joinPoint.proceed();
long executionTime = System.currentTimeMillis() - startTime;
logger.info("RAG Response: {} completed in {} ms", methodName, executionTime);
return result;
} catch (Exception e) {
logger.error("RAG Error: {} failed with exception: {}", methodName, e.getMessage());
throw e;
}
}
}
Security Best Practices
Security in Spring AI RAG is non-negotiable. Here’s my fortress-like approach:
@Configuration
@EnableWebSecurity
public class RagSecurityConfig {
@Bean
public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
http
.csrf(csrf -> csrf.disable())
.authorizeHttpRequests(auth -> auth
.requestMatchers("/api/rag/**").authenticated()
.requestMatchers("/actuator/health").permitAll()
.anyRequest().authenticated()
)
.oauth2ResourceServer(oauth2 -> oauth2
.jwt(jwt -> jwt.jwtDecoder(jwtDecoder()))
);
return http.build();
}
@Bean
public JwtDecoder jwtDecoder() {
return NimbusJwtDecoder.withJwkSetUri("https://your-auth-server/.well-known/jwks.json")
.build();
}
}
Input validation and sanitization:
@RestController
@RequestMapping("/api/rag")
@Validated
public class SecureRagController {
@PostMapping("/query")
public ResponseEntity<ChatResponse> secureQuery(
@Valid @RequestBody ChatRequest request,
Authentication authentication) {
// Sanitize input
String sanitizedQuery = sanitizeInput(request.getQuery());
// Rate limiting
if (!rateLimitService.isAllowed(authentication.getName())) {
return ResponseEntity.status(429).build();
}
// Proceed with query
return processQuery(sanitizedQuery);
}
private String sanitizeInput(String input) {
// Remove potential injection attempts, then cap the length of the
// cleaned string (capping by the original length could overflow)
String cleaned = input.replaceAll("[<>\"']", "").trim();
return cleaned.substring(0, Math.min(cleaned.length(), 1000));
}
}
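The controller above calls a rateLimitService that isn’t shown. Purely as an illustration, here’s a minimal fixed-window limiter with the same isAllowed(user) shape – a real deployment would more likely use a library such as Bucket4j or Redis-backed counters:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SimpleRateLimiter {
    // Hypothetical fixed-window limiter: at most maxRequests per window per user.
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    record Window(long startMillis, int count) {}

    public SimpleRateLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean isAllowed(String user) {
        long now = System.currentTimeMillis();
        Window w = windows.get(user);
        // start a fresh window if the user is new or the old window expired
        if (w == null || now - w.startMillis() >= windowMillis) {
            windows.put(user, new Window(now, 1));
            return true;
        }
        if (w.count() < maxRequests) {
            windows.put(user, new Window(w.startMillis(), w.count() + 1));
            return true;
        }
        return false; // over the limit for this window
    }

    public static void main(String[] args) {
        SimpleRateLimiter limiter = new SimpleRateLimiter(2, 60_000);
        System.out.println(limiter.isAllowed("alice")); // first request passes
        System.out.println(limiter.isAllowed("alice")); // second passes
        System.out.println(limiter.isAllowed("alice")); // third is rejected
    }
}
```

Fixed windows are the simplest scheme but allow bursts at window boundaries; sliding-window or token-bucket algorithms smooth that out at the cost of a little bookkeeping.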
Docker Containerization
Let’s containerize our Spring AI RAG application properly:
# The openjdk images are deprecated on Docker Hub – Temurin is the maintained option
FROM eclipse-temurin:17-jre-jammy
WORKDIR /app
COPY target/spring-ai-rag-*.jar app.jar
# curl is needed for the health check below (not present in slim images)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
RUN chown -R appuser:appuser /app
USER appuser
# Health check (note the line continuation – CMD is part of HEALTHCHECK)
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8080/actuator/health || exit 1
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]
And the docker-compose for the full stack:
version: "3.8"
services:
postgres:
image: pgvector/pgvector:pg16
environment:
POSTGRES_DB: vectordb
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
ports:
- "5432:5432"
volumes:
- postgres_data:/var/lib/postgresql/data
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 30s
timeout: 10s
retries: 5
ollama:
image: ollama/ollama:latest
ports:
- "11434:11434"
volumes:
- ollama_data:/root/.ollama
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
interval: 30s
timeout: 10s
retries: 5
spring-ai-rag:
build: .
ports:
- "8080:8080"
depends_on:
postgres:
condition: service_healthy
ollama:
condition: service_healthy
environment:
- DB_URL=jdbc:postgresql://postgres:5432/vectordb
- DB_USERNAME=postgres
- DB_PASSWORD=password
- OLLAMA_BASE_URL=http://ollama:11434
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/actuator/health"]
interval: 30s
timeout: 10s
retries: 5
volumes:
postgres_data:
ollama_data:
Part 7: Real-World Examples and Extensions
Extending Your Spring AI RAG Application
Now that we’ve built our Spring AI RAG foundation, let’s talk about taking it to the next level. Because let’s be honest – a basic RAG system is like a bicycle with training wheels. It gets you moving, but eventually, you’ll want to do some serious cycling!
Multi-Tenant Architecture for Spring AI RAG
One of the first questions I get is: “How do I make this work for multiple customers?” Well, buckle up, because multi-tenancy in Spring AI RAG is where things get spicy! 🌶️
Here’s how I typically handle tenant isolation in my Spring AI RAG applications:
@Service
public class TenantAwareRetrievalService {
private final VectorStore vectorStore;
private final TenantContextHolder tenantContext;
public TenantAwareRetrievalService(VectorStore vectorStore,
TenantContextHolder tenantContext) {
this.vectorStore = vectorStore;
this.tenantContext = tenantContext;
}
public List<Document> retrieveContext(String query) {
String tenantId = tenantContext.getCurrentTenant();
SearchRequest request = SearchRequest.builder()
.query(query)
.topK(5)
.similarityThreshold(0.65)
.filterExpression(String.format("metadata.tenant_id == '%s'", tenantId))
.build();
return vectorStore.similaritySearch(request);
}
}
The magic happens in that filterExpression. By tagging each document with a tenant_id during ingestion, we ensure data isolation. It’s like having separate filing cabinets for each customer – nobody gets to peek into someone else’s documents!
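One caveat worth hedging on: because the filter expression is built with String.format, a tenant id containing a single quote could corrupt (or, worst case, escape) the quoted literal. A small, hypothetical helper that escapes quotes before interpolation:

```java
public class TenantFilter {
    // Build a metadata filter expression for the vector store, doubling
    // single quotes so a hostile tenant id cannot break out of the literal.
    public static String tenantFilter(String tenantId) {
        String escaped = tenantId.replace("'", "''");
        return String.format("metadata.tenant_id == '%s'", escaped);
    }

    public static void main(String[] args) {
        System.out.println(tenantFilter("acme"));
        System.out.println(tenantFilter("o'brien-corp"));
    }
}
```

Even better: if your tenant ids come from your own auth system, validate them against a strict pattern (say, `[a-z0-9-]+`) at assignment time, and the escaping becomes a belt-and-suspenders measure.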
Custom Document Processors
Sometimes the built-in document readers just don’t cut it. Maybe you’re dealing with proprietary formats, or you need special preprocessing. Here’s how I extend the DocumentService for custom scenarios:
@Service
public class EnhancedDocumentService extends DocumentService {
private final TextSplitter documentSplitter;
private final VectorStore vectorStore;
private final CustomMetadataExtractor metadataExtractor;
public void ingestCustomDocument(Resource resource, Map<String, Object> customMetadata)
throws Exception {
DocumentReader reader = createCustomReader(resource);
List<Document> documents = reader.get();
// Enhance with custom metadata
documents.forEach(doc -> {
Map<String, Object> enhancedMetadata = new HashMap<>(doc.getMetadata());
enhancedMetadata.putAll(customMetadata);
enhancedMetadata.put("processed_date", LocalDateTime.now().toString());
enhancedMetadata.put("content_type", detectContentType(doc));
doc.getMetadata().putAll(enhancedMetadata);
});
List<Document> chunks = documentSplitter.apply(documents);
vectorStore.add(chunks);
}
private DocumentReader createCustomReader(Resource resource) {
String extension = getFileExtension(resource.getFilename());
return switch (extension) {
case ".json" -> new JsonDocumentReader(resource);
case ".csv" -> new CsvDocumentReader(resource);
case ".xml" -> new XmlDocumentReader(resource);
default -> throw new UnsupportedOperationException(
"Custom reader not available for: " + extension);
};
}
}
Pro tip: Always think about metadata! It’s like seasoning in cooking – the right amount makes everything better, but too much ruins the dish.
Advanced Filtering Capabilities
Let’s supercharge our filtering game. Sometimes you need more than just tenant isolation – maybe you want to filter by document type, date ranges, or content categories:
@Service
public class AdvancedSearchService {
private final VectorStore vectorStore;
public List<Document> advancedSearch(SearchCriteria criteria) {
StringBuilder filterBuilder = new StringBuilder();
List<String> conditions = new ArrayList<>();
// Tenant isolation (always include this!)
if (criteria.getTenantId() != null) {
conditions.add(String.format("metadata.tenant_id == '%s'", criteria.getTenantId()));
}
// Document type filtering
if (criteria.getDocumentTypes() != null && !criteria.getDocumentTypes().isEmpty()) {
String typeFilter = criteria.getDocumentTypes().stream()
.map(type -> String.format("metadata.document_type == '%s'", type))
.collect(Collectors.joining(" || "));
conditions.add("(" + typeFilter + ")");
}
// Date range filtering
if (criteria.getFromDate() != null) {
conditions.add(String.format("metadata.created_date >= '%s'",
criteria.getFromDate().toString()));
}
// Category filtering with OR logic
if (criteria.getCategories() != null && !criteria.getCategories().isEmpty()) {
String categoryFilter = criteria.getCategories().stream()
.map(cat -> String.format("metadata.category == '%s'", cat))
.collect(Collectors.joining(" || "));
conditions.add("(" + categoryFilter + ")");
}
String finalFilter = String.join(" && ", conditions);
SearchRequest request = SearchRequest.builder()
.query(criteria.getQuery())
.topK(criteria.getLimit())
.similarityThreshold(criteria.getSimilarityThreshold())
.filterExpression(finalFilter.isEmpty() ? null : finalFilter)
.build();
return vectorStore.similaritySearch(request);
}
}
Integration with External APIs
Real-world Spring AI RAG applications rarely live in isolation. Here’s how I integrate with external services to enrich the retrieval process:
@Service
public class EnrichedRetrievalService {
private final HybridSearchService hybridSearchService;
private final ExternalKnowledgeService externalKnowledgeService;
private final CacheManager cacheManager;
@Cacheable(value = "enriched-search", key = "#query")
public EnrichedSearchResult enrichedSearch(String query) {
// 1. Get documents from our local RAG
List<Document> localResults = hybridSearchService.hybridSearch(query);
// 2. Enrich with external data if needed
List<ExternalKnowledgeItem> externalData = Collections.emptyList();
if (shouldFetchExternalData(query, localResults)) {
externalData = externalKnowledgeService.searchExternal(query);
}
// 3. Combine and rank results
return new EnrichedSearchResult(localResults, externalData,
calculateConfidenceScore(localResults));
}
private boolean shouldFetchExternalData(String query, List<Document> localResults) {
// Only fetch external data if local results are insufficient
return localResults.isEmpty() ||
localResults.stream()
.noneMatch(doc -> calculateRelevanceScore(doc, query) > 0.8);
}
}
Spring AI RAG Best Practices and Patterns
After building dozens of Spring AI RAG applications, I’ve learned some hard lessons. Let me save you from making the same mistakes I did!
Code Organization Strategies
Here’s my go-to project structure for Spring AI RAG applications:
src/main/java/com/yourcompany/rag/
├── config/ # Configuration classes
│ ├── DocumentConfig.java
│ ├── VectorStoreConfig.java
│ └── SecurityConfig.java
├── controller/ # REST endpoints
│ ├── RagController.java
│ └── AdminController.java
├── service/ # Business logic
│ ├── core/ # Core RAG services
│ │ ├── RetrievalService.java
│ │ ├── ResponseService.java
│ │ └── HybridSearchService.java
│ ├── document/ # Document processing
│ │ ├── DocumentService.java
│ │ ├── DocumentProcessor.java
│ │ └── MetadataExtractor.java
│ └── integration/ # External integrations
│ ├── ExternalApiService.java
│ └── CacheService.java
├── domain/ # Domain models and DTOs
│ ├── SearchCriteria.java
│ ├── SearchResult.java
│ └── DocumentMetadata.java
└── repository/ # Data access (if needed)
└── DocumentRepository.java
Configuration Management
I learned the hard way that configuration management can make or break your Spring AI RAG application. Here’s my battle-tested approach:
@ConfigurationProperties(prefix = "app.rag")
@Configuration
public class RagProperties {
private Search search = new Search();
private Processing processing = new Processing();
private Security security = new Security();
public static class Search {
private int topK = 5;
private double similarityThreshold = 0.65;
private int maxResults = 20;
private boolean hybridSearchEnabled = true;
// getters and setters...
}
public static class Processing {
private int chunkSize = 512;
private int chunkOverlap = 128;
private boolean preserveParagraphs = true;
private List<String> supportedFormats = List.of("pdf", "docx", "txt", "md");
// getters and setters...
}
public static class Security {
private boolean multiTenantEnabled = true;
private String defaultTenant = "default";
private boolean auditingEnabled = true;
// getters and setters...
}
}
And in your application.properties:
# RAG Search Configuration
app.rag.search.top-k=5
app.rag.search.similarity-threshold=0.65
app.rag.search.max-results=20
app.rag.search.hybrid-search-enabled=true
# RAG Processing Configuration
app.rag.processing.chunk-size=512
app.rag.processing.chunk-overlap=128
app.rag.processing.preserve-paragraphs=true
app.rag.processing.supported-formats=pdf,docx,txt,md
# RAG Security Configuration
app.rag.security.multi-tenant-enabled=true
app.rag.security.default-tenant=default
app.rag.security.auditing-enabled=true
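To make the chunk-size and chunk-overlap settings concrete, here’s a toy chunker showing how a configuration like 512/128 slides a window across the text. It’s simplified to characters – real splitters work on tokens and try to respect sentence or paragraph boundaries:

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapChunker {
    // Split text into chunks of at most chunkSize characters, each
    // overlapping the previous one by 'overlap' characters so context
    // spanning a chunk boundary isn't lost.
    public static List<String> chunk(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("need 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) break; // last chunk reached the end
        }
        return chunks;
    }

    public static void main(String[] args) {
        // chunkSize 4, overlap 2 -> window advances 2 chars at a time
        System.out.println(chunk("abcdefghij", 4, 2));
    }
}
```

The trade-off the config exposes: larger overlap means fewer "lost" cross-boundary facts but more chunks (and thus more embeddings to compute and store).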
Error Handling Patterns
Nothing ruins a user experience like cryptic error messages. Here’s how I handle errors in my Spring AI RAG applications:
@ControllerAdvice
public class RagExceptionHandler {
private static final Logger logger = LoggerFactory.getLogger(RagExceptionHandler.class);
@ExceptionHandler(DocumentProcessingException.class)
public ResponseEntity<ErrorResponse> handleDocumentProcessing(
DocumentProcessingException ex) {
logger.warn("Document processing failed: {}", ex.getMessage());
return ResponseEntity.status(HttpStatus.BAD_REQUEST)
.body(new ErrorResponse(
"DOCUMENT_PROCESSING_ERROR",
"Unable to process the document. Please check the format and try again.",
ex.getDocumentName()
));
}
@ExceptionHandler(SearchException.class)
public ResponseEntity<ErrorResponse> handleSearchError(SearchException ex) {
logger.error("Search operation failed", ex);
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(new ErrorResponse(
"SEARCH_ERROR",
"Something went wrong with the search. Our team has been notified.",
null
));
}
@ExceptionHandler(VectorStoreException.class)
public ResponseEntity<ErrorResponse> handleVectorStore(VectorStoreException ex) {
logger.error("Vector store operation failed", ex);
return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE)
.body(new ErrorResponse(
"SERVICE_UNAVAILABLE",
"The search service is temporarily unavailable. Please try again later.",
null
));
}
}
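The handler references an ErrorResponse DTO that isn’t defined in the snippets. Assuming its three constructor arguments are a machine-readable code, a user-facing message, and an optional detail (as the call sites suggest), it could be as simple as a record – this is my sketch, not the repository’s actual class:

```java
public class ErrorResponseDemo {
    // Hypothetical DTO matching the three-arg constructor calls in the
    // @ControllerAdvice: code, user-facing message, optional detail.
    public record ErrorResponse(String code, String message, String detail) {}

    public static void main(String[] args) {
        ErrorResponse err = new ErrorResponse(
                "DOCUMENT_PROCESSING_ERROR",
                "Unable to process the document. Please check the format and try again.",
                "report.pdf");
        System.out.println(err.code() + ": " + err.message());
    }
}
```

Records serialize cleanly to JSON with Jackson 2.12+, so the REST layer needs no extra getters or annotations.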
Monitoring and Observability
You can’t improve what you don’t measure! Here’s my monitoring setup for Spring AI RAG:
```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.time.Duration;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Component;

@Component
public class RagMetricsCollector {

    private final MeterRegistry meterRegistry;
    private final Timer searchTimer;
    private final Counter successfulQueries;

    public RagMetricsCollector(MeterRegistry meterRegistry, VectorStore vectorStore) {
        this.meterRegistry = meterRegistry;
        this.searchTimer = Timer.builder("rag.search.duration")
            .description("Time taken for RAG search operations")
            .register(meterRegistry);
        this.successfulQueries = Counter.builder("rag.queries.successful")
            .description("Number of successful queries")
            .register(meterRegistry);
        // A gauge samples its value on demand, so the vector store and the
        // extraction function are handed to the builder at registration time.
        Gauge.builder("rag.vectorstore.size", vectorStore, this::getVectorStoreSize)
            .description("Number of documents in vector store")
            .register(meterRegistry);
    }

    public void recordSearchTime(Duration duration) {
        searchTimer.record(duration);
    }

    public void recordSuccessfulQuery() {
        successfulQueries.increment();
    }

    public void recordFailedQuery(String errorType) {
        // A Micrometer counter is bound to a fixed tag set, so the tagged
        // counter is resolved per call; register() returns the cached meter
        // on repeat invocations with the same name and tags.
        Counter.builder("rag.queries.failed")
            .description("Number of failed queries")
            .tag("error.type", errorType)
            .register(meterRegistry)
            .increment();
    }

    private double getVectorStoreSize(VectorStore vectorStore) {
        // Implementation depends on your vector store
        // This is a placeholder
        return 0.0;
    }
}
```
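To show how the collector gets fed, here's a framework-free sketch of wrapping a search call so that timing, success, and failure are all recorded in one place. `SearchInstrumentation` and `timeAndRecord` are hypothetical names of mine, not Spring AI APIs; the `Recorder` interface simply mirrors `RagMetricsCollector`'s three recording methods so the sketch runs without Micrometer on the classpath:

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical wrapper: runs a search, reports the outcome, and always
// records the elapsed time, even when the search throws.
class SearchInstrumentation {

    // Mirrors RagMetricsCollector's recording methods.
    interface Recorder {
        void recordSearchTime(Duration duration);
        void recordSuccessfulQuery();
        void recordFailedQuery(String errorType);
    }

    static <T> T timeAndRecord(Recorder metrics, Supplier<T> search) {
        long start = System.nanoTime();
        try {
            T result = search.get();
            metrics.recordSuccessfulQuery();
            return result;
        } catch (RuntimeException e) {
            // Tag failures by exception type so dashboards can break them down.
            metrics.recordFailedQuery(e.getClass().getSimpleName());
            throw e;
        } finally {
            metrics.recordSearchTime(Duration.ofNanos(System.nanoTime() - start));
        }
    }
}
```

In the real application the `Recorder` role would be played by the injected `RagMetricsCollector`, and `search` would be the lambda calling your vector store.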
Conclusion
Whew! What a journey we’ve been on together! 🎉
What We’ve Accomplished
Let me take a moment to appreciate what we’ve built here. We started with a simple question: “How can I build a Spring AI RAG application that actually works in production?” And look where we ended up!
We’ve created:
- A robust document ingestion pipeline that can handle multiple formats
- A sophisticated hybrid search system that combines vector similarity with keyword matching
- A responsive web interface that users actually want to use
Tasks for you to enhance it further, based on the snippets shared above:
- A scalable architecture that can grow with your needs
- Caching for frequently repeated queries
- Monitoring and error handling that keeps you sane at 3 AM
But more importantly, we’ve built something that solves real problems. Your users can now ask questions in natural language and get accurate, contextual answers from your documents. That’s pretty magical when you think about it!
Next Steps for Your Spring AI RAG Journey
This is just the beginning! Here are some exciting directions you can take your Spring AI RAG application:
Immediate Enhancements:
- Add user authentication and authorization
- Implement conversation memory to handle follow-up questions
- Create a document management interface for easy uploads
- Add support for real-time document updates
Advanced Features:
- Multi-modal RAG (images, videos, audio)
- Integration with enterprise systems (SharePoint, Confluence, etc.)
- Custom fine-tuning for domain-specific language
- Advanced analytics and user behavior tracking
Production Readiness:
- Implement comprehensive logging and monitoring
- Set up automated testing and CI/CD pipelines
- Add rate limiting and security headers
- Create backup and disaster recovery procedures
Resources for Continued Learning
The Spring AI ecosystem is evolving rapidly, and staying current is crucial. Here are my go-to resources:
Official Documentation:
- The Spring AI reference documentation on docs.spring.io
Community Resources:
- Spring AI GitHub repository (watch it for updates!)
- Stack Overflow (tag: spring-ai)
- Spring Community Forums
My Personal Recommendations:
- Follow the Spring team on Twitter for announcements
- Join RAG-focused Discord servers and Slack channels
- Experiment with different embedding models and vector stores
Call to Action
If this tutorial helped you build your first Spring AI RAG application, I’d love to hear about it!
🌟 Star the GitHub repository: https://github.com/SundrymindTech/Spring-AI-RAG
💌 Subscribe to new blog notifications for more Spring AI tutorials, tips, and real-world case studies. I promise no spam – just good content when I have something valuable to share.
And remember – building great software is a team sport. Don’t be afraid to:
- Ask questions in the comments
- Share your own implementations and improvements
- Contribute back to the open-source community
The future of AI-powered applications is bright, and with Spring AI RAG in your toolkit, you’re well-equipped to be part of that future.
Now go forth and build amazing things! 🚀
P.S. If you run into any issues with the code, check out the troubleshooting guide here. And if that doesn’t help, don’t hesitate to reach out. We’re all in this together!