Introduction
Building AI-powered applications has become increasingly accessible with the rise of large language models (LLMs) and frameworks like Spring AI. While we’ve covered the strategic benefits and enterprise features of Spring AI 1.0 in our comprehensive guide, this Spring AI tutorial focuses on practical implementation—showing you exactly how to build your first AI application with Spring Boot and Ollama integration.
This step-by-step guide will walk you through creating a complete local AI chat application with memory using Spring AI from scratch, with detailed explanations of every component, configuration setup, and common troubleshooting scenarios. Unlike cloud-based AI services, using Ollama allows you to run AI models locally, giving you full control over your data and eliminating API costs.
By the end of this tutorial, you’ll have a working chat application that can engage in conversations, maintain context, and handle errors gracefully—all running entirely on your local machine.
Understanding the Basics: LLMs and Ollama
What are Large Language Models (LLMs)?
Large Language Models are AI systems trained on vast amounts of text data to understand and generate human-like text. They work by:
- Predicting the next word: Given a sequence of words, they predict what comes next
- Understanding context: They can maintain context across conversations
- Following instructions: They can perform tasks based on natural language instructions
- Generating coherent responses: They produce human-like text that’s contextually relevant
Popular LLMs include:
- GPT models (OpenAI): General-purpose models great for conversation
- Llama models (Meta): Open-source alternatives with good performance
- Phi models (Microsoft): Smaller, efficient models for lightweight applications
What is Ollama?
Ollama is a tool that makes it easy to run LLMs locally on your machine. Here’s why it’s valuable:
Why Ollama?
Ollama lets you run LLMs (like Llama2) locally—avoiding cloud costs, API limits, and privacy risks.
- Privacy: Your data never leaves your machine
- Cost: Completely free to use
- Speed: No network latency for API calls
- Offline capability: Works without internet connection
How Ollama Works:
- Downloads pre-trained models to your local machine
- Provides a REST API interface to interact with models
- Handles model loading, memory management, and optimization
- Supports multiple models simultaneously
Popular Ollama Models:
- llama3.2 (3B parameters): Good balance of speed and quality
- phi3:mini (3.8B parameters): Faster, smaller model
- mistral (7B parameters): Strong general-purpose model with higher output quality, slower responses
- codellama (7B parameters): Specialized for code generation
Spring AI Framework
Spring AI is Spring’s framework for building AI-powered applications. It provides:
- Consistent API: Familiar Spring patterns for AI integration
- Multiple Provider Support: Works with OpenAI, Ollama, Hugging Face, etc.
- Chat Memory: Maintains conversation context
- Prompt Templates: Reusable prompt structures
- Auto-configuration: Spring Boot integration with minimal setup
Ollama vs. OpenAI: Which Should You Use?
Ollama + Spring AI vs. OpenAI API: Key Differences
| Feature | Ollama + Spring AI | OpenAI API (e.g., GPT-4) |
|---|---|---|
| Cost | Free (runs locally) | Pay-per-use (~$0.01–$0.06 per 1K tokens) |
| Privacy | Fully offline—no data leaves your machine | Requests sent to cloud servers |
| Customization | Use any open-source model (Llama 2, Mistral) | Limited to OpenAI’s models |
| Setup Complexity | Requires local setup (Ollama + Spring AI) | Just an API key |
| Latency | Depends on your hardware (slower on CPU) | Fast (cloud GPUs) |
| Best For | Privacy-sensitive apps, offline use, custom models | Quick prototyping, production apps needing scale |
Build Spring AI Chatbot with Memory
This tutorial creates a complete AI chat application with memory:
- REST API endpoints for programmatic access
- Web interface for interactive testing
- Conversation memory to maintain context
- Error handling for production readiness
- Health monitoring for system status
- Configurable AI models for different use cases
Prerequisites and Setup Requirements
System Requirements
Before starting, ensure you have:
- Java 17 or higher: Required for Spring Boot 3.x
- Maven 3.6+ or Gradle 7.0+: For dependency management
- IDE: IntelliJ IDEA, Eclipse, or VS Code
- Basic Spring Boot knowledge: Understanding of controllers, services, and configuration
Hardware Requirements for Ollama
- RAM: At least 8GB (16GB recommended for larger models)
- Disk Space: 10GB+ for model storage
- CPU: Modern multi-core processor (M1/M2 Macs work great)
Step 1: Setting Up Your Spring Boot Ollama Integration
Creating the Project
Visit start.spring.io
Configure your project:
- Project: Maven
- Language: Java
- Spring Boot: 3.2.0 or higher
- Packaging: Jar
- Java: 17
- Group: com.sundrymind
- Artifact: spring-ai-tutorial
- Name: Spring AI Tutorial
- Package name: com.sundrymind.springaitutorial
Add dependencies:
- Spring Web
- Spring Boot DevTools
- Thymeleaf
Understanding the Dependencies
Let’s examine each dependency in your pom.xml:
<properties>
<java.version>17</java.version>
<spring-ai.version>1.0.0</spring-ai.version>
</properties>
<dependencies>
<!-- Core Spring Boot Dependencies -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<!-- Provides REST API capabilities, embedded Tomcat server -->
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-thymeleaf</artifactId>
<!-- Template engine for creating web pages -->
</dependency>
<!-- Spring AI Dependencies -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
<version>${spring-ai.version}</version>
<!-- Auto-configures Ollama client and ChatClient beans -->
</dependency>
<!-- Optional: Alternative AI providers -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
<version>${spring-ai.version}</version>
<!-- Enables OpenAI/Groq integration -->
</dependency>
</dependencies>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-bom</artifactId>
<version>${spring-ai.version}</version>
<type>pom</type>
<scope>import</scope>
<!-- Manages versions of all Spring AI components -->
</dependency>
</dependencies>
</dependencyManagement>
Common Setup Issues:
Issue 1: Version Conflicts
Error: Could not resolve dependencies for project
Solution: Ensure Spring Boot version is 3.2.0+ and Spring AI version is compatible.
Issue 2: Missing BOM
Error: Failed to resolve version for org.springframework.ai
Solution: Add the spring-ai-bom in the dependencyManagement section.
Step 2: Installing and Configuring Ollama
Installation Process
For macOS:
# Using Homebrew (recommended)
brew install ollama
# Alternative: Download the macOS app from https://ollama.ai/download
For Linux:
curl -fsSL https://ollama.ai/install.sh | sh
For Windows:
Download the installer from ollama.ai and run it.
Starting Ollama Service
# Start the Ollama service (required before downloading models)
ollama serve
Common Ollama Issues:
Issue 1: Port Already in Use
Error: listen tcp 127.0.0.1:11434: bind: address already in use
Solution:
# Kill existing Ollama process
pkill ollama
# Or use different port
OLLAMA_HOST=0.0.0.0:11435 ollama serve
Issue 2: Permission Denied
Error: permission denied while trying to connect to the Docker daemon socket
Solution (this appears when running Ollama inside Docker):
# Add your user to the docker group, or run the native service with elevated privileges
sudo usermod -aG docker $USER
sudo ollama serve
Downloading Models
# Download Llama 3.2 (3B parameters - balanced performance)
ollama pull llama3.2
# Alternative: Smaller, faster model for testing
ollama pull phi3:mini
# Verify model download
ollama list
Model Selection Guide:
- phi3:mini (3.8B): Fastest, good for development/testing
- llama3.2 (3B): Best balance of speed and quality
- mistral (7B): Better quality, slower response
- codellama (7B): Specialized for coding tasks
Testing Your Installation
# Test the model interactively
ollama run llama3.2
# Test via API
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Hello, how are you?",
"stream": false
}'
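If you prefer to run the same smoke test from Java, the sketch below uses the JDK's built-in HttpClient to call the same /api/generate endpoint with the same JSON body as the curl command above (the class name is purely illustrative).
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaSmokeTest {
    public static void main(String[] args) throws Exception {
        String body = """
                {"model": "llama3.2", "prompt": "Hello, how are you?", "stream": false}
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Expect HTTP 200 and a JSON payload containing a "response" field
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}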
Common Model Issues:
Issue 1: Model Not Found
Error: model 'llama3.2' not found
Solution: Ensure model is downloaded: ollama pull llama3.2
Issue 2: Out of Memory
Error: failed to load model: not enough memory
Solution: Use a smaller model (phi3:mini) or increase system RAM.
With the model successfully installed, let’s now explore how to use Ollama with Spring AI.
Step 3: Spring Boot Configuration
Basic Configuration
Create the Spring Boot main class SpringAIdemoApplication.java:
package com.sundrymind.springaitutorial.config;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ComponentScan;
@SpringBootApplication
@ComponentScan(basePackages = {"com.sundrymind.springaitutorial"})
public class SpringAIdemoApplication {
public static void main(String[] args) {
SpringApplication.run(SpringAIdemoApplication.class, args);
}
}
Configuring Ollama Models in Spring AI
Create src/main/resources/application.yml:
spring:
  application:
    name: spring-ai-tutorial

  # AI Configuration
  ai:
    ollama:
      base-url: http://localhost:11434      # Ollama server URL
      chat:
        options:
          model: llama3.2                   # Model name
          temperature: 0.7                  # Creativity level (0.0-1.0)
          top-p: 0.9                        # Nucleus sampling parameter
          max-tokens: 1000                  # Maximum response length

  autoconfigure:
    exclude:
      - org.springframework.ai.autoconfigure.vectorstore.chroma.ChromaVectorStoreAutoConfiguration

# Server Configuration
server:
  port: 8080

# Logging Configuration
logging:
  level:
    com.sundrymind.springaitutorial: DEBUG
    org.springframework.ai: INFO
    org.springframework.web: INFO
Configuration Parameters Explained
Temperature (0.0 – 1.0):
- 0.0: Deterministic, same response every time
- 0.3: Focused, consistent responses
- 0.7: Balanced creativity and consistency (recommended)
- 1.0: Maximum creativity, more varied responses
Top-p (0.0 – 1.0):
- Controls diversity by limiting token selection
- 0.9: Good balance (recommended)
- Lower values = more focused responses
Max-tokens:
- Maximum number of tokens in response
- 1000: Good for chat applications
- Adjust based on your needs
Environment-Specific Configuration
Development (application-dev.yml):

spring:
  ai:
    ollama:
      chat:
        options:
          temperature: 0.8   # More creative for testing

logging:
  level:
    org.springframework.ai: DEBUG   # Detailed logging
Production (application-prod.yml):

spring:
  ai:
    ollama:
      chat:
        options:
          temperature: 0.6   # More consistent for production

logging:
  level:
    org.springframework.ai: WARN   # Less verbose logging
Configuration Issues:
Issue 1: Connection Refused
Error: Connection refused: http://localhost:11434
Solution: Ensure Ollama is running: ollama serve
Issue 2: Invalid Model Name
Error: model 'wrong-name' not found
Solution: Check available models: ollama list
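To surface connection problems early, you can add a small startup check (a hypothetical helper, not required by the tutorial) that pings the configured base URL. It reuses the spring.ai.ollama.base-url property from application.yml and Ollama's /api/tags endpoint, which lists locally available models just like ollama list.
package com.sundrymind.springaitutorial.config;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class OllamaStartupCheck implements CommandLineRunner {

    private static final Logger logger = LoggerFactory.getLogger(OllamaStartupCheck.class);

    // Same property configured in application.yml above
    @Value("${spring.ai.ollama.base-url:http://localhost:11434}")
    private String baseUrl;

    @Override
    public void run(String... args) {
        try {
            // /api/tags returns the locally available models
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/api/tags"))
                    .GET()
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            logger.info("Ollama reachable at {} (HTTP {})", baseUrl, response.statusCode());
        } catch (Exception e) {
            logger.warn("Ollama is not reachable at {} - is 'ollama serve' running?", baseUrl, e);
        }
    }
}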
Step 4: Creating the AI Service Layer
The service layer handles all AI interactions and business logic.
// src/main/java/com/sundrymind/springaitutorial/service/ChatService.java
package com.sundrymind.springaitutorial.service;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.stereotype.Service;
@Service
public class ChatService {
private static final Logger logger = LoggerFactory.getLogger(ChatService.class);
// The main interface for AI interactions
private final ChatClient chatClient;
/**
* Constructor that configures the ChatClient with default settings
*
* @param chatClientBuilder Auto-injected builder from Spring AI
*/
public ChatService(ChatClient.Builder chatClientBuilder) {
this.chatClient = chatClientBuilder
// Set default system prompt that defines AI behavior
.defaultSystem("You are a friendly AI assistant. " +
"Keep responses concise and helpful. " +
"Be conversational but professional.")
// Add memory advisor for conversation context
.defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()))
.build();
logger.info("ChatService initialized with Ollama client");
}
/**
* Generates a simple response without conversation context
* Used for stateless interactions
*
* @param userMessage The user's input message
* @return AI-generated response or error message
*/
public String generateResponse(String userMessage) {
// Log the request (truncated for privacy)
logger.debug("Processing message: {}",
userMessage.substring(0, Math.min(50, userMessage.length())));
try {
// Create a prompt and get response
String response = chatClient
.prompt() // Start building a prompt
.user(userMessage) // Add user message
.call() // Make the API call
.content(); // Extract response content
logger.debug("Generated response successfully");
return response;
} catch (Exception e) {
// Log error and return user-friendly message
logger.error("Error generating AI response: {}", e.getMessage(), e);
return "I'm sorry, I'm having trouble processing your request right now. " +
"Please try again in a moment.";
}
}
/**
* Generates response with conversation context
* Maintains conversation history for more coherent interactions
*
* @param userMessage The user's input message
* @param conversationId Unique identifier for this conversation
* @return AI-generated response with conversation context
*/
public String generateResponseWithContext(String userMessage, String conversationId) {
logger.debug("Processing contextual message for conversation: {}", conversationId);
try {
return chatClient
.prompt()
.user(userMessage)
// Configure memory advisor with conversation ID
.advisors(advisorSpec -> advisorSpec
.param(MessageChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY,
conversationId))
.call()
.content();
} catch (Exception e) {
logger.error("Error generating contextual response: {}", e.getMessage(), e);
return "I apologize, but I'm experiencing technical difficulties. " +
"Please try again in a moment.";
}
}
/**
* Health check method to verify AI service is working
* Used by monitoring endpoints
*
* @return true if service is healthy, false otherwise
*/
public boolean isServiceHealthy() {
try {
// Send a simple test message
String testResponse = chatClient
.prompt()
.user("Hello")
.call()
.content();
// Check if we got a valid response
boolean isHealthy = testResponse != null && !testResponse.trim().isEmpty();
logger.debug("Health check result: {}", isHealthy);
return isHealthy;
} catch (Exception e) {
logger.warn("Health check failed: {}", e.getMessage());
return false;
}
}
}
Key Concepts Explained
ChatClient: The main interface for AI interactions. It provides a fluent API for building prompts and getting responses.
MessageChatMemoryAdvisor: Handles conversation memory by storing previous messages and including them as context in new requests.
InMemoryChatMemory: Stores conversation history in application memory (lost when application restarts).
System Prompts: Instructions that define how the AI should behave throughout the conversation.
Conversation ID: A unique identifier that groups related messages together for context.
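A quick illustration of how the conversation ID ties messages together (a sketch that assumes the ChatService above is injected wherever this runs, for example a test or a CommandLineRunner):
package com.sundrymind.springaitutorial.service;

import java.util.UUID;

public class MemoryDemo {

    private final ChatService chatService;

    public MemoryDemo(ChatService chatService) {
        this.chatService = chatService;
    }

    public void demo() {
        String conversationId = UUID.randomUUID().toString();

        // Both calls share the same conversation ID, so the memory advisor sends the
        // first exchange along with the second request as context.
        String first = chatService.generateResponseWithContext("My name is John.", conversationId);
        String second = chatService.generateResponseWithContext("What is my name?", conversationId);

        System.out.println(first);
        System.out.println(second);   // should mention "John" thanks to the conversation memory
    }
}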
Step 5: Building the REST API Controller
The controller handles HTTP requests and responses, providing RESTful endpoints for AI interactions.
//src/main/java/com/sundrymind/springaitutorial/controller/ChatController.java
package com.sundrymind.springaitutorial.controller;
import com.sundrymind.springaitutorial.service.ChatService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import java.util.Map;
import java.util.UUID;
@RestController
@RequestMapping("/api/chat")
@CrossOrigin(origins = "*") // Allow requests from any origin (for development)
public class ChatController {
private static final Logger logger = LoggerFactory.getLogger(ChatController.class);
private final ChatService chatService;
/**
* Constructor injection of ChatService
* Spring automatically provides the ChatService instance
*/
public ChatController(ChatService chatService) {
this.chatService = chatService;
}
/**
* Endpoint for single message interactions (stateless)
* POST /api/chat/message
*
* @param request Contains the user message
* @return ChatResponse with AI reply and new conversation ID
*/
@PostMapping("/message")
public ResponseEntity<ChatResponse> sendMessage(@RequestBody ChatRequest request) {
logger.info("Received message request");
// Validate input
if (request.getMessage() == null || request.getMessage().trim().isEmpty()) {
logger.warn("Empty message received");
return ResponseEntity.badRequest()
.body(new ChatResponse("Message cannot be empty", null));
}
// Validate message length (prevent abuse)
if (request.getMessage().length() > 1000) {
logger.warn("Message too long: {} characters", request.getMessage().length());
return ResponseEntity.badRequest()
.body(new ChatResponse("Message too long. Please limit to 1000 characters.", null));
}
try {
// Generate AI response
String response = chatService.generateResponse(request.getMessage());
// Generate new conversation ID for potential follow-up
String conversationId = UUID.randomUUID().toString();
logger.info("Message processed successfully");
return ResponseEntity.ok(new ChatResponse(response, conversationId));
} catch (Exception e) {
logger.error("Unexpected error processing message", e);
return ResponseEntity.internalServerError()
.body(new ChatResponse("Service temporarily unavailable. Please try again later.", null));
}
}
/**
* Endpoint for continuing an existing conversation (stateful)
* POST /api/chat/conversation/{conversationId}
*
* @param conversationId ID of existing conversation
* @param request Contains the user message
* @return ChatResponse with contextual AI reply
*/
@PostMapping("/conversation/{conversationId}")
public ResponseEntity<ChatResponse> continueConversation(
@PathVariable String conversationId,
@RequestBody ChatRequest request) {
logger.info("Continuing conversation: {}", conversationId);
// Validate conversation ID
if (conversationId == null || conversationId.trim().isEmpty()) {
return ResponseEntity.badRequest()
.body(new ChatResponse("Invalid conversation ID", null));
}
// Validate message
if (request.getMessage() == null || request.getMessage().trim().isEmpty()) {
return ResponseEntity.badRequest()
.body(new ChatResponse("Message cannot be empty", conversationId));
}
try {
// Generate contextual response
String response = chatService.generateResponseWithContext(
request.getMessage(),
conversationId
);
return ResponseEntity.ok(new ChatResponse(response, conversationId));
} catch (Exception e) {
logger.error("Error continuing conversation: {}", conversationId, e);
return ResponseEntity.internalServerError()
.body(new ChatResponse("Unable to continue conversation. Please try again.", conversationId));
}
}
/**
* Health check endpoint
* GET /api/chat/health
*
* @return Service status information
*/
@GetMapping("/health")
public ResponseEntity<Map<String, Object>> healthCheck() {
boolean isHealthy = chatService.isServiceHealthy();
Map<String, Object> status = Map.of(
"status", isHealthy ? "healthy" : "unhealthy",
"service", "AI Chat Service",
"timestamp", System.currentTimeMillis()
);
// Return appropriate HTTP status
return isHealthy ?
ResponseEntity.ok(status) :
ResponseEntity.status(503).body(status);
}
/**
* Get conversation statistics (bonus endpoint)
* GET /api/chat/stats
*/
@GetMapping("/stats")
public ResponseEntity<Map<String, Object>> getStats() {
// In a real application, you'd track these metrics
Map<String, Object> stats = Map.of(
"totalConversations", 0,
"totalMessages", 0,
"averageResponseTime", "0ms",
"uptime", System.currentTimeMillis()
);
return ResponseEntity.ok(stats);
}
// DTO Classes for Request/Response
/**
* Request object for chat messages
*/
public static class ChatRequest {
private String message;
// Default constructor for JSON deserialization
public ChatRequest() {}
public ChatRequest(String message) {
this.message = message;
}
public String getMessage() {
return message;
}
public void setMessage(String message) {
this.message = message;
}
@Override
public String toString() {
return "ChatRequest{message='" +
(message != null ? message.substring(0, Math.min(50, message.length())) : "null") +
"'}";
}
}
/**
* Response object for chat messages
*/
public static class ChatResponse {
private String response;
private String conversationId;
private long timestamp;
public ChatResponse(String response, String conversationId) {
this.response = response;
this.conversationId = conversationId;
this.timestamp = System.currentTimeMillis();
}
// Getters
public String getResponse() { return response; }
public String getConversationId() { return conversationId; }
public long getTimestamp() { return timestamp; }
@Override
public String toString() {
return "ChatResponse{conversationId='" + conversationId +
"', timestamp=" + timestamp +
", responseLength=" + (response != null ? response.length() : 0) + "}";
}
}
}
Controller Concepts Explained
@RestController: Combines @Controller and @ResponseBody, automatically serializing return values to JSON.
@RequestMapping: Defines the base URL path for all endpoints in this controller.
@CrossOrigin: Allows requests from different origins (important for web development).
ResponseEntity: Provides fine-grained control over HTTP response status and headers.
@PathVariable: Extracts values from the URL path.
@RequestBody: Automatically deserializes JSON request body to Java objects.
UUID.randomUUID(): Generates unique conversation identifiers.
Common Controller Issues:
Issue 1: CORS Errors
Access to fetch at 'http://localhost:8080/api/chat/message' from origin 'null' has been blocked by CORS policy
Solution: Add the @CrossOrigin annotation or configure CORS globally.
Issue 2: JSON Serialization Errors
Error: Could not read JSON: Unrecognized field "msg"
Solution: Ensure request JSON matches DTO field names exactly.
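To lock in the request/response contract without needing Ollama running, here is a minimal controller test sketch (assuming spring-boot-starter-test is on the classpath); it mocks ChatService and checks both the validation path and the happy path, including the exact "message" field name expected by the ChatRequest DTO.
package com.sundrymind.springaitutorial.controller;

import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.when;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import com.sundrymind.springaitutorial.service.ChatService;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.WebMvcTest;
import org.springframework.boot.test.mock.mockito.MockBean;
import org.springframework.http.MediaType;
import org.springframework.test.web.servlet.MockMvc;

@WebMvcTest(ChatController.class)
class ChatControllerTest {

    @Autowired
    private MockMvc mockMvc;

    @MockBean
    private ChatService chatService;

    @Test
    void emptyMessageIsRejected() throws Exception {
        mockMvc.perform(post("/api/chat/message")
                        .contentType(MediaType.APPLICATION_JSON)
                        .content("{\"message\": \"\"}"))
                .andExpect(status().isBadRequest());
    }

    @Test
    void validMessageReturnsResponseAndConversationId() throws Exception {
        when(chatService.generateResponse(anyString())).thenReturn("Hi there!");

        mockMvc.perform(post("/api/chat/message")
                        .contentType(MediaType.APPLICATION_JSON)
                        .content("{\"message\": \"Hello\"}"))
                .andExpect(status().isOk())
                .andExpect(jsonPath("$.response").value("Hi there!"))
                .andExpect(jsonPath("$.conversationId").isNotEmpty());
    }
}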
Step 6: Creating the Web Interface
<!-- src/main/resources/templates/index.html -->
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Spring AI Chat Demo</title>
<style>
/* Modern, responsive styling */
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
"Helvetica Neue", Arial, sans-serif;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
display: flex;
align-items: center;
justify-content: center;
padding: 20px;
}
.chat-container {
background: rgba(255, 255, 255, 0.95);
backdrop-filter: blur(10px);
border-radius: 20px;
padding: 30px;
box-shadow: 0 20px 40px rgba(0, 0, 0, 0.15);
width: 100%;
max-width: 800px;
height: 80vh;
display: flex;
flex-direction: column;
}
.chat-header {
text-align: center;
margin-bottom: 20px;
padding-bottom: 15px;
border-bottom: 2px solid #f0f0f0;
}
.chat-header h1 {
color: #333;
font-size: 2rem;
margin-bottom: 10px;
}
.status-indicator {
display: inline-block;
width: 10px;
height: 10px;
border-radius: 50%;
background-color: #28a745;
margin-right: 8px;
animation: pulse 2s infinite;
}
@keyframes pulse {
0% {
opacity: 1;
}
50% {
opacity: 0.5;
}
100% {
opacity: 1;
}
}
.chat-messages {
flex: 1;
overflow-y: auto;
padding: 20px;
margin-bottom: 20px;
border: 1px solid #e0e0e0;
border-radius: 15px;
background: #fafafa;
scroll-behavior: smooth;
}
.message {
margin-bottom: 20px;
padding: 15px 20px;
border-radius: 18px;
max-width: 80%;
word-wrap: break-word;
animation: slideIn 0.3s ease-out;
}
@keyframes slideIn {
from {
opacity: 0;
transform: translateY(20px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
.user-message {
background: linear-gradient(135deg, #007bff, #0056b3);
color: white;
margin-left: auto;
border-bottom-right-radius: 5px;
}
.ai-message {
background: linear-gradient(135deg, #f8f9fa, #e9ecef);
color: #333;
margin-right: auto;
border-bottom-left-radius: 5px;
border: 1px solid #dee2e6;
}
.message-time {
font-size: 0.75rem;
opacity: 0.7;
margin-top: 5px;
}
.typing-indicator {
display: flex;
align-items: center;
color: #666;
font-style: italic;
margin-right: auto;
}
.typing-dots {
display: inline-block;
margin-left: 10px;
}
.typing-dots span {
display: inline-block;
width: 8px;
height: 8px;
border-radius: 50%;
background-color: #999;
margin: 0 2px;
animation: typing 1.4s infinite both;
}
.typing-dots span:nth-child(2) {
animation-delay: 0.2s;
}
.typing-dots span:nth-child(3) {
animation-delay: 0.4s;
}
@keyframes typing {
0%,
60%,
100% {
transform: translateY(0);
opacity: 0.5;
}
30% {
transform: translateY(-15px);
opacity: 1;
}
}
.input-container {
display: flex;
gap: 15px;
align-items: flex-end;
}
.input-group {
flex: 1;
position: relative;
}
.input-group textarea {
width: 100%;
padding: 15px 20px;
border: 2px solid #e0e0e0;
border-radius: 25px;
font-size: 16px;
font-family: inherit;
resize: none;
outline: none;
transition: all 0.3s ease;
min-height: 50px;
max-height: 120px;
}
.input-group textarea:focus {
border-color: #007bff;
box-shadow: 0 0 0 3px rgba(0, 123, 255, 0.1);
}
.send-button {
padding: 15px 25px;
background: linear-gradient(135deg, #007bff, #0056b3);
color: white;
border: none;
border-radius: 25px;
cursor: pointer;
font-size: 16px;
font-weight: 600;
transition: all 0.3s ease;
white-space: nowrap;
}
.send-button:hover:not(:disabled) {
background: linear-gradient(135deg, #0056b3, #004085);
transform: translateY(-2px);
box-shadow: 0 5px 15px rgba(0, 123, 255, 0.3);
}
.send-button:disabled {
opacity: 0.6;
cursor: not-allowed;
transform: none;
}
.error-message {
background: #f8d7da;
color: #721c24;
border: 1px solid #f5c6cb;
padding: 10px 15px;
border-radius: 8px;
margin-bottom: 15px;
text-align: center;
}
/* Responsive design */
@media (max-width: 768px) {
.chat-container {
height: 90vh;
padding: 20px;
margin: 10px;
}
.message {
max-width: 90%;
}
.input-container {
flex-direction: column;
gap: 10px;
}
.send-button {
width: 100%;
}
}
</style>
</head>
<body>
<div class="chat-container">
<div class="chat-header">
<h1>🤖 Spring AI Chat Demo</h1>
<p><span class="status-indicator"></span>Connected to AI Service</p>
</div>
<div id="errorContainer"></div>
<div id="chatMessages" class="chat-messages">
<!-- Messages will be added here dynamically -->
</div>
<div class="input-container">
<div class="input-group">
<textarea
id="messageInput"
placeholder="Type your message here... (Press Ctrl+Enter to send)"
rows="1"
></textarea>
</div>
<button id="sendButton" class="send-button" onclick="sendMessage()">
Send
</button>
</div>
</div>
<script>
// Global variables for conversation management
let conversationId = null;
let isTyping = false;
/**
* Add a message to the chat interface
* @param {string} message - The message content
* @param {boolean} isUser - Whether this is a user message
* @param {boolean} isError - Whether this is an error message
*/
function addMessage(message, isUser = false, isError = false) {
const messagesDiv = document.getElementById("chatMessages");
const messageDiv = document.createElement("div");
if (isError) {
messageDiv.className = "message ai-message";
messageDiv.style.background = "#f8d7da";
messageDiv.style.color = "#721c24";
messageDiv.style.border = "1px solid #f5c6cb";
} else {
messageDiv.className = `message ${
isUser ? "user-message" : "ai-message"
}`;
}
// Create message content with timestamp
const messageContent = document.createElement("div");
messageContent.textContent = message;
messageDiv.appendChild(messageContent);
const timeDiv = document.createElement("div");
timeDiv.className = "message-time";
timeDiv.textContent = new Date().toLocaleTimeString();
messageDiv.appendChild(timeDiv);
messagesDiv.appendChild(messageDiv);
messagesDiv.scrollTop = messagesDiv.scrollHeight;
}
/**
* Show typing indicator
*/
function showTypingIndicator() {
const messagesDiv = document.getElementById("chatMessages");
const typingDiv = document.createElement("div");
typingDiv.id = "typingIndicator";
typingDiv.className = "message typing-indicator";
typingDiv.innerHTML = `
AI is thinking
<div class="typing-dots">
<span></span>
<span></span>
<span></span>
</div>
`;
messagesDiv.appendChild(typingDiv);
messagesDiv.scrollTop = messagesDiv.scrollHeight;
}
/**
* Remove typing indicator
*/
function hideTypingIndicator() {
const typingIndicator = document.getElementById("typingIndicator");
if (typingIndicator) {
typingIndicator.remove();
}
}
/**
* Show error message
* @param {string} message - Error message to display
*/
function showError(message) {
const errorContainer = document.getElementById("errorContainer");
errorContainer.innerHTML = `<div class="error-message">${message}</div>`;
setTimeout(() => {
errorContainer.innerHTML = "";
}, 5000);
}
/**
* Auto-resize textarea based on content
*/
function autoResizeTextarea() {
const textarea = document.getElementById("messageInput");
textarea.style.height = "auto";
textarea.style.height = Math.min(textarea.scrollHeight, 120) + "px";
}
/**
* Handle keyboard shortcuts
* @param {KeyboardEvent} event - Keyboard event
*/
function handleKeyPress(event) {
if (event.key === "Enter") {
if (event.ctrlKey || event.metaKey) {
// Ctrl+Enter or Cmd+Enter sends message
event.preventDefault();
sendMessage();
} else if (!event.shiftKey) {
// Enter without Shift sends message on mobile
if (window.innerWidth <= 768) {
event.preventDefault();
sendMessage();
}
}
}
}
/**
* Validate message before sending
* @param {string} message - Message to validate
* @returns {boolean} - Whether message is valid
*/
function validateMessage(message) {
if (!message || message.trim().length === 0) {
showError("Please enter a message");
return false;
}
if (message.length > 1000) {
showError("Message too long. Please limit to 1000 characters.");
return false;
}
return true;
}
/**
* Send message to AI service
*/
async function sendMessage() {
const input = document.getElementById("messageInput");
const sendButton = document.getElementById("sendButton");
const message = input.value.trim();
// Validate message
if (!validateMessage(message)) {
return;
}
// Disable input during processing
input.disabled = true;
sendButton.disabled = true;
sendButton.textContent = "Sending...";
// Add user message to chat
addMessage(message, true);
input.value = "";
autoResizeTextarea();
// Show typing indicator
showTypingIndicator();
isTyping = true;
try {
// Determine API endpoint based on conversation state
const url = conversationId
? `/api/chat/conversation/${conversationId}`
: "/api/chat/message";
// Make API request
const response = await fetch(url, {
method: "POST",
headers: {
"Content-Type": "application/json",
Accept: "application/json",
},
body: JSON.stringify({ message: message }),
});
// Hide typing indicator
hideTypingIndicator();
isTyping = false;
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
const data = await response.json();
// Add AI response to chat
addMessage(data.response, false);
// Update conversation ID for future messages
if (data.conversationId) {
conversationId = data.conversationId;
}
} catch (error) {
hideTypingIndicator();
isTyping = false;
console.error("Error sending message:", error);
// Show user-friendly error message
let errorMessage = "Sorry, something went wrong. Please try again.";
if (error.name === "TypeError" && error.message.includes("fetch")) {
errorMessage =
"Unable to connect to the server. Please check your connection.";
} else if (error.message.includes("500")) {
errorMessage =
"The AI service is temporarily unavailable. Please try again later.";
} else if (error.message.includes("400")) {
errorMessage =
"Invalid message format. Please try rephrasing your message.";
}
addMessage(errorMessage, false, true);
showError(errorMessage);
} finally {
// Re-enable input
input.disabled = false;
sendButton.disabled = false;
sendButton.textContent = "Send";
input.focus();
}
}
/**
* Check service health periodically
*/
async function checkServiceHealth() {
try {
const response = await fetch("/api/chat/health");
const statusIndicator = document.querySelector(".status-indicator");
if (response.ok) {
statusIndicator.style.backgroundColor = "#28a745";
statusIndicator.title = "Service is healthy";
} else {
statusIndicator.style.backgroundColor = "#ffc107";
statusIndicator.title = "Service has issues";
}
} catch (error) {
const statusIndicator = document.querySelector(".status-indicator");
statusIndicator.style.backgroundColor = "#dc3545";
statusIndicator.title = "Service is unavailable";
}
}
/**
* Initialize the application
*/
function initializeApp() {
const input = document.getElementById("messageInput");
// Set up event listeners
input.addEventListener("input", autoResizeTextarea);
input.addEventListener("keydown", handleKeyPress);
// Check service health on load and periodically
checkServiceHealth();
setInterval(checkServiceHealth, 30000); // Check every 30 seconds
// Add welcome message
addMessage(
"👋 Hello! I'm your AI assistant powered by Spring AI and Ollama. How can I help you today?",
false
);
// Focus input
input.focus();
}
// Initialize when page loads
window.addEventListener("load", initializeApp);
// Handle page visibility changes (pause health checks when not visible)
document.addEventListener("visibilitychange", function () {
if (document.visibilityState === "visible") {
checkServiceHealth();
}
});
</script>
</body>
</html>
This gives us a user-friendly web interface to test our AI integration.
Creating the Web Controller
Now create a controller to serve this web page:
// src/main/java/com/sundrymind/springaitutorial/controller/WebController.java
package com.sundrymind.springaitutorial.controller;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;
/**
* Controller for serving web pages
* Uses Thymeleaf template engine to render HTML pages
*/
@Controller
public class WebController {
/**
* Serve the main chat interface
* GET /
*
* @param model Spring's Model object for passing data to templates
* @return Template name (maps to src/main/resources/templates/index.html)
*/
@GetMapping("/")
public String index(Model model) {
// Add any model attributes needed by the template
model.addAttribute("appName", "Spring AI Chat Demo");
model.addAttribute("version", "1.0.0");
// Return template name (without .html extension)
return "index";
}
/**
* Alternative endpoint for the chat interface
* GET /chat
*/
@GetMapping("/chat")
public String chat() {
return "index";
}
}
Web Interface Features Explained
Responsive Design: The interface adapts to different screen sizes using CSS media queries.
Real-time Typing Indicators: Shows animated dots while AI is processing.
Message Validation: Prevents empty messages and enforces length limits.
Error Handling: Displays user-friendly error messages for different failure scenarios.
Health Monitoring: Periodically checks service status and updates the indicator.
Keyboard Shortcuts: Supports Ctrl+Enter for sending messages.
Auto-resize Textarea: Input field grows with content up to a maximum height.
Step 7: Testing Your Integration
Testing the Application
- Start Ollama Service:
ollama serve
- Verify Model is Available:
ollama list
# Should show llama3.2 or your chosen model
- Start Spring Boot Application:
mvn spring-boot:run
Or from your IDE, run the main application class.
Testing the REST API
Use curl to test the API endpoints:
Test Single Message:
curl -X POST http://localhost:8080/api/chat/message \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, how are you today?"}'
Expected Response:
{
"response": "Hello! I'm doing well, thank you for asking. As an AI assistant, I'm here and ready to help you with any questions or tasks you might have. How can I assist you today?",
"conversationId": "550e8400-e29b-41d4-a716-446655440000",
"timestamp": 1703123456789
}
Test Conversation Context:
# Use the conversationId from the previous response
curl -X POST http://localhost:8080/api/chat/conversation/550e8400-e29b-41d4-a716-446655440000 \
  -H "Content-Type: application/json" \
  -d '{"message": "What did I just ask you?"}'
Test Health Endpoint:
curl http://localhost:8080/api/chat/health
Testing the Web Interface
- Open your browser and navigate to http://localhost:8080
- You should see the chat interface with a welcome message
- Try these test conversations:
Basic Interaction:
- Type: “Hello, what can you help me with?”
- Verify you get a relevant response
Context Testing:
- First message: “My name is John”
- Second message: “What’s my name?”
- The AI should remember your name
Error Handling:
- Try sending an empty message
- Try sending a very long message (>1000 characters)
- Verify appropriate error messages appear
Common Testing Issues
Issue 1: 404 Not Found
Error: GET http://localhost:8080/ 404
Solutions:
- Ensure WebController is in the correct package
- Check that the @Controller annotation is present
- Verify the template file is in src/main/resources/templates/
Issue 2: Template Resolution Error
Error: Template might not exist or might not be accessible
Solutions:
- Ensure the file is named index.html (case-sensitive)
- Check that the Thymeleaf dependency is included
- Verify template syntax is correct
Issue 3: AI Service Unavailable
Error: Service temporarily unavailable
Solutions:
- Check Ollama is running: ollama serve
- Verify the model is downloaded: ollama list
- Check application.yml configuration
- Review application logs for detailed errors
Issue 4: Slow Responses
Solutions:
- Use a smaller model: ollama pull phi3:mini
- Increase timeout in configuration
- Check system resources (RAM, CPU)
Step 8: Enhanced Error Handling and Monitoring
Let’s add comprehensive error handling and monitoring capabilities:
// src/main/java/com/sundrymind/springaitutorial/exception/GlobalExceptionHandler.java
package com.sundrymind.springaitutorial.exception;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.context.request.WebRequest;
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;
/**
* Global exception handler for the application
* Catches and handles exceptions across all controllers
*/
@ControllerAdvice
public class GlobalExceptionHandler {
private static final Logger logger = LoggerFactory.getLogger(GlobalExceptionHandler.class);
/**
* Handle AI service specific exceptions
*/
@ExceptionHandler(AIServiceException.class)
public ResponseEntity<Map<String, Object>> handleAIServiceException(
AIServiceException ex, WebRequest request) {
logger.error("AI Service error: {}", ex.getMessage(), ex);
Map<String, Object> response = new HashMap<>();
response.put("error", "AI Service Error");
response.put("message", ex.getMessage());
response.put("timestamp", LocalDateTime.now());
response.put("path", request.getDescription(false));
return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body(response);
}
/**
* Handle validation errors
*/
@ExceptionHandler(IllegalArgumentException.class)
public ResponseEntity<Map<String, Object>> handleValidationException(
IllegalArgumentException ex, WebRequest request) {
logger.warn("Validation error: {}", ex.getMessage());
Map<String, Object> response = new HashMap<>();
response.put("error", "Validation Error");
response.put("message", ex.getMessage());
response.put("timestamp", LocalDateTime.now());
return ResponseEntity.badRequest().body(response);
}
/**
* Handle all other exceptions
*/
@ExceptionHandler(Exception.class)
public ResponseEntity<Map<String, Object>> handleGenericException(
Exception ex, WebRequest request) {
logger.error("Unexpected error: {}", ex.getMessage(), ex);
Map<String, Object> response = new HashMap<>();
response.put("error", "Internal Server Error");
response.put("message", "An unexpected error occurred. Please try again later.");
response.put("timestamp", LocalDateTime.now());
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(response);
}
}
/**
* Custom exception for AI service errors
*/
class AIServiceException extends RuntimeException {
public AIServiceException(String message) {
super(message);
}
public AIServiceException(String message, Throwable cause) {
super(message, cause);
}
}
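To see the handler in action, the service layer can translate low-level failures into AIServiceException instead of returning a canned string. The sketch below assumes AIServiceException is moved to its own file and made public; it is a variation on the catch block in ChatService, not the tutorial's required code.
// Hypothetical variant of ChatService.generateResponse that propagates failures
// so GlobalExceptionHandler can map them to an HTTP 503 with a structured JSON body.
public String generateResponse(String userMessage) {
    try {
        return chatClient
                .prompt()
                .user(userMessage)
                .call()
                .content();
    } catch (Exception e) {
        throw new AIServiceException("AI backend is unavailable: " + e.getMessage(), e);
    }
}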
Enhanced Chat Service with Metrics
// Enhanced version of ChatService with monitoring
package com.sundrymind.springaitutorial.service;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.stereotype.Service;
import java.time.LocalDateTime;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
@Service
public class ChatService {
private static final Logger logger = LoggerFactory.getLogger(ChatService.class);
private final ChatClient chatClient;
// Metrics for monitoring
private final AtomicInteger totalRequests = new AtomicInteger(0);
private final AtomicInteger successfulRequests = new AtomicInteger(0);
private final AtomicInteger failedRequests = new AtomicInteger(0);
private final AtomicLong totalResponseTime = new AtomicLong(0);
private final LocalDateTime startTime = LocalDateTime.now();
public ChatService(ChatClient.Builder chatClientBuilder) {
this.chatClient = chatClientBuilder
.defaultSystem("""
You are a helpful AI assistant created with Spring AI and Ollama.
Keep your responses concise but informative.
Be friendly and professional in your interactions.
If you're unsure about something, it's okay to say so.
""")
.defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()))
.build();
logger.info("ChatService initialized successfully");
}
/**
* Generate response with comprehensive error handling and metrics
*/
public String generateResponse(String userMessage) {
long startTime = System.currentTimeMillis();
totalRequests.incrementAndGet();
try {
// Validate input
validateMessage(userMessage);
logger.debug("Processing message: {}", truncateMessage(userMessage));
// Generate response
String response = chatClient
.prompt()
.user(userMessage)
.call()
.content();
// Validate response
if (response == null || response.trim().isEmpty()) {
throw new RuntimeException("AI service returned empty response");
}
// Update metrics
long responseTime = System.currentTimeMillis() - startTime;
totalResponseTime.addAndGet(responseTime);
successfulRequests.incrementAndGet();
logger.debug("Response generated successfully in {}ms", responseTime);
return response;
} catch (Exception e) {
failedRequests.incrementAndGet();
logger.error("Error generating response: {}", e.getMessage(), e);
// Return contextual error message based on exception type
if (e.getMessage().contains("connection")) {
return "I'm having trouble connecting to my AI service. Please try again in a moment.";
} else if (e.getMessage().contains("timeout")) {
return "I'm taking longer than usual to respond. Please try a shorter message or try again later.";
} else {
return "I apologize, but I'm experiencing technical difficulties right now. Please try again.";
}
}
}
/**
* Generate response with conversation context
*/
public String generateResponseWithContext(String userMessage, String conversationId) {
long startTime = System.currentTimeMillis();
totalRequests.incrementAndGet();
try {
validateMessage(userMessage);
validateConversationId(conversationId);
logger.debug("Processing contextual message for conversation: {}", conversationId);
String response = chatClient
.prompt()
.user(userMessage)
.advisors(advisorSpec -> advisorSpec
.param(MessageChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY, conversationId))
.call()
.content();
if (response == null || response.trim().isEmpty()) {
throw new RuntimeException("AI service returned empty response");
}
long responseTime = System.currentTimeMillis() - startTime;
totalResponseTime.addAndGet(responseTime);
successfulRequests.incrementAndGet();
logger.debug("Contextual response generated in {}ms", responseTime);
return response;
} catch (Exception e) {
failedRequests.incrementAndGet();
logger.error("Error generating contextual response: {}", e.getMessage(), e);
return "I'm having trouble maintaining our conversation context. Please try starting a new conversation.";
}
}
/**
* Comprehensive health check
*/
public boolean isServiceHealthy() {
try {
long startTime = System.currentTimeMillis();
String testResponse = chatClient
.prompt()
.user("Hello")
.call()
.content();
long responseTime = System.currentTimeMillis() - startTime;
boolean isHealthy = testResponse != null &&
!testResponse.trim().isEmpty() &&
responseTime < 10000; // Less than 10 seconds
logger.debug("Health check completed: {} ({}ms)", isHealthy, responseTime);
return isHealthy;
} catch (Exception e) {
logger.warn("Health check failed: {}", e.getMessage());
return false;
}
}
/**
* Get service metrics
*/
public ServiceMetrics getMetrics() {
long avgResponseTime = totalRequests.get() > 0 ?
totalResponseTime.get() / totalRequests.get() : 0;
return new ServiceMetrics(
totalRequests.get(),
successfulRequests.get(),
failedRequests.get(),
avgResponseTime,
startTime
);
}
// Helper methods
private void validateMessage(String message) {
if (message == null || message.trim().isEmpty()) {
throw new IllegalArgumentException("Message cannot be empty");
}
if (message.length() > 2000) {
throw new IllegalArgumentException("Message too long. Maximum 2000 characters allowed.");
}
}
private void validateConversationId(String conversationId) {
if (conversationId == null || conversationId.trim().isEmpty()) {
throw new IllegalArgumentException("Conversation ID cannot be empty");
}
}
private String truncateMessage(String message) {
return message.length() > 50 ? message.substring(0, 50) + "..." : message;
}
/**
* Data class for service metrics
*/
public static class ServiceMetrics {
private final int totalRequests;
private final int successfulRequests;
private final int failedRequests;
private final long averageResponseTime;
private final LocalDateTime startTime;
public ServiceMetrics(int totalRequests, int successfulRequests, int failedRequests,
long averageResponseTime, LocalDateTime startTime) {
this.totalRequests = totalRequests;
this.successfulRequests = successfulRequests;
this.failedRequests = failedRequests;
this.averageResponseTime = averageResponseTime;
this.startTime = startTime;
}
// Getters
public int getTotalRequests() { return totalRequests; }
public int getSuccessfulRequests() { return successfulRequests; }
public int getFailedRequests() { return failedRequests; }
public long getAverageResponseTime() { return averageResponseTime; }
public LocalDateTime getStartTime() { return startTime; }
public double getSuccessRate() {
return totalRequests > 0 ? (double) successfulRequests / totalRequests * 100 : 0;
}
}
}
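One way to surface these numbers over HTTP is a small, hypothetical controller that exposes the ServiceMetrics defined above (a working alternative to the placeholder /api/chat/stats endpoint from Step 5):
package com.sundrymind.springaitutorial.controller;

import java.util.Map;

import com.sundrymind.springaitutorial.service.ChatService;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/chat")
public class MetricsController {

    private final ChatService chatService;

    public MetricsController(ChatService chatService) {
        this.chatService = chatService;
    }

    // GET /api/chat/metrics - live numbers from the enhanced ChatService
    @GetMapping("/metrics")
    public ResponseEntity<Map<String, Object>> getMetrics() {
        ChatService.ServiceMetrics metrics = chatService.getMetrics();
        return ResponseEntity.ok(Map.of(
                "totalRequests", metrics.getTotalRequests(),
                "successfulRequests", metrics.getSuccessfulRequests(),
                "failedRequests", metrics.getFailedRequests(),
                "successRatePercent", metrics.getSuccessRate(),
                "averageResponseTimeMs", metrics.getAverageResponseTime(),
                "startedAt", metrics.getStartTime().toString()
        ));
    }
}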
Step 9: Configuration for Different Environments
Environment-Specific Configurations
Create separate configuration files for different environments:
Development (application-dev.yml):

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: phi3:mini   # Faster model for development
          temperature: 0.8
          max-tokens: 500
          timeout: 30s

logging:
  level:
    com.sundrymind.springaitutorial: DEBUG
    org.springframework.ai: DEBUG
    org.springframework.web: DEBUG

server:
  port: 8080

# Development-specific settings
debug: true

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics
Production (application-prod.yml):

spring:
  ai:
    ollama:
      base-url: ${OLLAMA_BASE_URL:http://localhost:11434}
      chat:
        options:
          model: ${AI_MODEL:llama3.2}
          temperature: 0.6
          max-tokens: 1000
          timeout: 60s

logging:
  level:
    com.sundrymind.springaitutorial: INFO
    org.springframework.ai: WARN
    org.springframework.web: WARN

server:
  port: ${PORT:8080}

# Production security and monitoring
management:
  endpoints:
    web:
      exposure:
        include: health,info
  endpoint:
    health:
      show-details: when-authorized
Using Environment Variables
For production deployments, use environment variables:
# Set environment variables
export SPRING_PROFILES_ACTIVE=prod
export OLLAMA_BASE_URL=http://your-ollama-server:11434
export AI_MODEL=llama3.2
export PORT=8080
# Run the application
java -jar target/spring-ai-tutorial-1.0.0.jar
Step 10: Alternative AI Providers
Groq Integration
Groq provides fast cloud-based AI inference. Here’s how to set it up:
Get Groq API Key:
- Visit console.groq.com
- Create an account and generate an API key
Configuration:
spring:
  ai:
    openai:   # Groq uses an OpenAI-compatible API
      api-key: ${GROQ_API_KEY}
      base-url: https://api.groq.com/openai
      chat:
        options:
          model: mixtral-8x7b-32768   # Fast, high-quality model
          temperature: 0.7
          max-tokens: 1000
- Set Environment Variable:
export GROQ_API_KEY=your_groq_api_key_here
Hugging Face Integration
<!-- Add Hugging Face dependency -->
<dependency>
<groupId>org.springframework.ai</groupId>
<artifactId>spring-ai-huggingface-spring-boot-starter</artifactId>
<version>${spring-ai.version}</version>
</dependency>
Configuration:
spring:
  ai:
    huggingface:
      api-key: ${HUGGINGFACE_API_KEY}
      chat:
        model: microsoft/DialoGPT-large
Common Issues and Comprehensive Troubleshooting
Ollama Issues
Issue 1: Ollama Service Not Running
Error: Connection refused to http://localhost:11434
Solutions:
# Check if Ollama is running
ps aux | grep ollama
# Start Ollama service
ollama serve
# Check if port is available
netstat -an | grep 11434
Issue 2: Model Download Issues
Error: failed to pull model: network error
Solutions:
# Check internet connection
ping ollama.ai
# Try different model
ollama pull phi3:mini
# Clear cache and retry
rm -rf ~/.ollama/models
ollama pull llama3.2
Issue 3: Memory Issues
Error: failed to load model: not enough memory
Solutions:
- Use a smaller model: ollama pull phi3:mini
- Close other applications to free RAM
- Increase swap space (Linux/Mac)
- Use cloud-based alternative (Groq)
Spring Boot Issues
Issue 1: Auto-configuration Failures
Error: Consider defining a bean of type 'ChatClient' in your configuration
Solutions:
- Ensure Spring AI BOM is in dependencyManagement
- Check Spring Boot version is 3.2.0+
- Verify Ollama starter dependency is included
Issue 2: Template Resolution Issues
Error: Template might not exist or might not be accessible
Solutions:
- Check the file path: src/main/resources/templates/index.html
- Verify Thymeleaf dependency
- Ensure proper file encoding (UTF-8)
Issue 3: CORS Issues
Error: CORS policy: No 'Access-Control-Allow-Origin' header
Solutions:
// Add global CORS configuration
@Configuration
public class WebConfig implements WebMvcConfigurer {
@Override
public void addCorsMappings(CorsRegistry registry) {
registry.addMapping("/api/**")
.allowedOrigins("*")
.allowedMethods("GET", "POST", "PUT", "DELETE");
}
}
Performance Issues
Issue 1: Slow Response Times
Symptom: Chat responses taking 30+ seconds
Solutions:
// Optimize ChatClient configuration
@Configuration
public class ChatConfig {
    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
            .defaultOptions(OllamaOptions.builder()   // Ollama-specific chat options
                .withModel("llama3.2:1b")             // Use a smaller model
                .withTemperature(0.7f)
                .withNumPredict(512)                  // Limit response length (Ollama's max-tokens equivalent)
                .build())
            .build();
    }
}
Additional optimizations:
- Make sure a supported GPU is available (Ollama uses it automatically when detected)
- Increase the client timeout in your Spring configuration for slower models
- Implement connection pooling
- Cache frequent responses
Issue 2: High Memory Usage
Symptom: Application consuming excessive RAM
Solutions:
# application.yml
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: phi3:mini   # Smaller model
          num-ctx: 2048      # Reduced context window
JVM optimizations:
# Limit heap size
java -Xmx2g -Xms1g -jar your-app.jar
# Use G1 garbage collector
java -XX:+UseG1GC -jar your-app.jar
Issue 3: Connection Timeouts
Error: Read timeout executing GET http://localhost:11434/api/chat
Solutions:
@Configuration
public class HttpClientConfig {
@Bean
public RestTemplate restTemplate() {
HttpComponentsClientHttpRequestFactory factory =
new HttpComponentsClientHttpRequestFactory();
factory.setConnectTimeout(30000); // 30 seconds
factory.setReadTimeout(60000); // 60 seconds
return new RestTemplate(factory);
}
}
Security Issues
Issue 1: API Key Exposure
Warning: Hardcoded API keys in source code
Solutions:
# application-dev.yml (for development)
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY:}

# Use environment variables
export OPENAI_API_KEY=your-key-here
Issue 2: Unsecured Endpoints
// Secure your chat endpoints
@RestController
@RequestMapping("/api/chat")
@PreAuthorize("hasRole('USER')")
public class SecureChatController {

    private final ChatClient chatClient;

    public SecureChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @PostMapping
    @PreAuthorize("@rateLimitService.isAllowed(authentication.name)")
    public String chat(@RequestBody String message, Authentication auth) {
        // Rate limiting and authentication required
        return chatClient.prompt(message).call().content();
    }
}
Issue 3: Input Validation
// Validate and sanitize user input
@Service
public class InputValidationService {
public String sanitizeInput(String input) {
if (input == null || input.trim().isEmpty()) {
throw new IllegalArgumentException("Input cannot be empty");
}
// Remove potentially harmful content
String sanitized = input.replaceAll("[<>\"'&]", "");
// Limit length
if (sanitized.length() > 1000) {
sanitized = sanitized.substring(0, 1000);
}
return sanitized;
}
}
Production Deployment Issues
Issue 1: Docker Container Issues
# Complete Dockerfile for production
FROM openjdk:21-jdk-slim
# Install Ollama
RUN curl -fsSL https://ollama.ai/install.sh | sh
# Copy application
COPY target/spring-ai-chat-*.jar app.jar
# Expose ports
EXPOSE 8080 11434
# Start both services
CMD ["sh", "-c", "ollama serve & java -jar app.jar"]
Issue 2: Load Balancing with Multiple Instances
# docker-compose.yml for scaled deployment
version: "3.8"
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
  app:
    build: .
    ports:
      - "8080-8082:8080"
    depends_on:
      - ollama
    environment:
      - SPRING_AI_OLLAMA_BASE_URL=http://ollama:11434
    deploy:
      replicas: 3
volumes:
  ollama_data:
Issue 3: Health Checks and Monitoring
// Custom health indicator
@Component
public class OllamaHealthIndicator implements HealthIndicator {
@Autowired
private ChatClient chatClient;
@Override
public Health health() {
try {
// Simple health check
String response = chatClient.prompt("Hello").call().content();
return Health.up()
.withDetail("ollama", "Available")
.withDetail("response-time", System.currentTimeMillis())
.build();
} catch (Exception e) {
return Health.down()
.withDetail("ollama", "Unavailable")
.withDetail("error", e.getMessage())
.build();
}
}
}
Testing Issues
Issue 1: Integration Test Failures
// Robust integration testing
@SpringBootTest
@TestPropertySource(properties = {
"spring.ai.ollama.base-url=http://localhost:11434",
"spring.ai.ollama.chat.options.model=llama3.2:1b"
})
class ChatServiceIntegrationTest {
@Autowired
private ChatService chatService;
@Test
@Timeout(30) // Prevent hanging tests
void testChatResponse() {
// Skip if Ollama not available
assumeTrue(isOllamaAvailable());
String response = chatService.generateResponse("Hello");
assertThat(response).isNotEmpty();
}
private boolean isOllamaAvailable() {
try {
RestTemplate rest = new RestTemplate();
rest.getForEntity("http://localhost:11434/api/tags", String.class);
return true;
} catch (Exception e) {
return false;
}
}
}
Issue 2: Mocking AI Services
// Mock the ChatClient for unit tests. Deep stubs let us stub the fluent chain
// prompt().user(...).call().content() without building every intermediate spec.
@MockBean(answer = Answers.RETURNS_DEEP_STUBS)
private ChatClient chatClient;

@Test
void testChatServiceWithMock() {
    when(chatClient.prompt().user(anyString()).call().content())
        .thenReturn("Mocked response");

    String result = chatService.generateResponse("Test message");
    assertEquals("Mocked response", result);
}
Advanced Configuration and Best Practices
Production-Ready Configuration
# application-prod.yml
spring:
  ai:
    ollama:
      base-url: ${OLLAMA_URL:http://localhost:11434}
      chat:
        options:
          model: ${AI_MODEL:llama3.2}
          temperature: 0.7
          max-tokens: 1024
          timeout: 60s

  # Database configuration for chat history
  datasource:
    url: jdbc:postgresql://localhost:5432/chatdb
    username: ${DB_USER}
    password: ${DB_PASSWORD}

  # Redis for caching
  data:
    redis:
      host: ${REDIS_HOST:localhost}
      port: 6379
      password: ${REDIS_PASSWORD:}

# Logging configuration
logging:
  level:
    org.springframework.ai: DEBUG
    com.sundrymind.springaitutorial: INFO
  pattern:
    file: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"
  file:
    name: logs/spring-ai-chat.log

# Actuator endpoints
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: always
  metrics:
    export:
      prometheus:
        enabled: true
Caching Strategy
The code below defines a Spring service that caches chat responses in Redis for 30 minutes using @Cacheable, with the ability to manually evict the cache.
@Service
public class CachedChatService {

    @Autowired
    private ChatClient chatClient;

    // Cache responses keyed by the message; skip caching very short (likely error) replies
    @Cacheable(value = "chatCache", key = "#message", unless = "#result.length() < 10")
    public String getCachedResponse(String message) {
        return chatClient.prompt(message).call().content();
    }

    @CacheEvict(value = "chatCache", allEntries = true)
    public void clearCache() {
        // Manual cache clearing
    }
}

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager(RedisConnectionFactory redisConnectionFactory) {
        return RedisCacheManager
                .RedisCacheManagerBuilder
                .fromConnectionFactory(redisConnectionFactory)
                .cacheDefaults(cacheConfiguration())
                .build();
    }

    private RedisCacheConfiguration cacheConfiguration() {
        return RedisCacheConfiguration.defaultCacheConfig()
                .entryTtl(Duration.ofMinutes(30))
                .disableCachingNullValues()
                .serializeKeysWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new StringRedisSerializer()))
                .serializeValuesWith(RedisSerializationContext.SerializationPair
                        .fromSerializer(new GenericJackson2JsonRedisSerializer()));
    }
}
Rate Limiting
The below code implements a per-user rate limiter using Redis, allowing up to 100 requests per hour.
package com.sundrymind.springaitutorial.service;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;
import java.time.Duration;
@Component
public class RateLimitService {
private final RedisTemplate<String, String> redisTemplate;
private final int maxRequests = 100; // per hour
public RateLimitService(RedisTemplate<String, String> redisTemplate) {
this.redisTemplate = redisTemplate;
}
public boolean isAllowed(String userId) {
String key = "rate_limit:" + userId;
String currentCount = redisTemplate.opsForValue().get(key);
if (currentCount == null) {
redisTemplate.opsForValue().set(key, "1", Duration.ofHours(1));
return true;
}
int count = Integer.parseInt(currentCount);
if (count >= maxRequests) {
return false;
}
redisTemplate.opsForValue().increment(key);
return true;
}
}
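Here is a sketch of using the rate limiter programmatically inside a controller, as an alternative to the @PreAuthorize SpEL expression shown earlier. The client IP stands in for a real user ID purely for illustration, and the endpoint path is hypothetical.
package com.sundrymind.springaitutorial.controller;

import com.sundrymind.springaitutorial.service.ChatService;
import com.sundrymind.springaitutorial.service.RateLimitService;
import jakarta.servlet.http.HttpServletRequest;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/limited-chat")
public class RateLimitedChatController {

    private final ChatService chatService;
    private final RateLimitService rateLimitService;

    public RateLimitedChatController(ChatService chatService, RateLimitService rateLimitService) {
        this.chatService = chatService;
        this.rateLimitService = rateLimitService;
    }

    @PostMapping("/message")
    public ResponseEntity<String> sendMessage(@RequestBody String message, HttpServletRequest request) {
        // Reject the request before it ever reaches the model if the caller is over the limit
        if (!rateLimitService.isAllowed(request.getRemoteAddr())) {
            return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
                    .body("Rate limit exceeded. Please try again later.");
        }
        return ResponseEntity.ok(chatService.generateResponse(message));
    }
}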
Comprehensive Monitoring
The below code defines a Spring component that tracks the number of chat requests and measures their response time using Micrometer metrics.
package com.sundrymind.springaitutorial.service;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;
import java.util.function.Supplier;
@Component
public class ChatMetrics {
private final MeterRegistry meterRegistry;
private final Counter chatRequestCounter;
private final Timer chatResponseTimer;
public ChatMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.chatRequestCounter = Counter.builder("chat.requests.total")
.description("Total number of chat requests")
.register(meterRegistry);
this.chatResponseTimer = Timer.builder("chat.response.time")
.description("Chat response time")
.register(meterRegistry);
}
public String timedChatCall(String message, Supplier<String> chatCall) throws Exception {
chatRequestCounter.increment();
return chatResponseTimer.recordCallable(() -> chatCall.get());
}
}
- @Component makes this class a Spring-managed bean.
- MeterRegistry, from Micrometer, is used to register metrics.
- Counter and Timer track the total number of chat requests and the time taken for chat responses.
- timedChatCall() increments the counter and times the execution of a Supplier<String> chat call (like a call to OpenAI, Ollama, etc.).
Example Usage
@Autowired
private ChatMetrics chatMetrics;
public String handleChat(String userInput) throws Exception {
    // timedChatCall declares a checked exception because Timer.recordCallable does
    return chatMetrics.timedChatCall(userInput, () -> openAiService.callLLM(userInput));
}
Get the complete code
Explore the full Maven project on GitHub for this Spring AI Ollama integration:
[Spring AI + Ollama Integration Code]
Includes:
✅ Pre-configured application.yml
✅ Ready-to-run Spring Boot project
✅ Ollama model loading examples
Frequently Asked Questions
Can I use multiple Ollama models simultaneously in Spring AI?
Yes! Configure separate models for chat and embeddings in your application.properties or application.yml:

spring.ai.ollama.chat.options.model=llama2
spring.ai.ollama.embedding.options.model=mistral
How to secure Ollama endpoints with Spring AI?
If exposing Ollama via REST:
- Use Spring Security to add API keys.
- Bind Ollama to localhost (the default) and avoid exposing its port publicly.
Can I fine-tune an Ollama model for Spring AI?
Yes! Fine-tune a model using Ollama’s Modelfile, then reference it in Spring AI:
ollama create my-model -f Modelfile
In application.properties or application.yml:

spring.ai.ollama.chat.options.model=my-model
Conclusion
This comprehensive guide has covered the complete setup, configuration, troubleshooting, and deployment of a Spring AI + Ollama local LLM application.
This foundation gives you everything needed to integrate AI capabilities into existing Spring applications or build new AI-first features.
The combination of Spring AI’s familiar patterns with powerful language models opens up countless possibilities. Whether you’re building chatbots, content generators, or intelligent data processors, this setup provides a solid starting point.
Remember to experiment with different models and configurations to find what works best for your specific use case. The AI landscape evolves rapidly, but with Spring AI handling the integration complexity, you can focus on building great user experiences.
Key takeaways:
- Start Simple: Begin with basic setup and gradually add complexity
- Monitor Everything: Implement proper logging, metrics, and health checks
- Security First: Never expose API keys, validate inputs, and implement rate limiting
- Performance Matters: Use appropriate models, caching, and connection pooling
- Test Thoroughly: Include both unit and integration tests
- Plan for Scale: Design for horizontal scaling and load balancing
Remember that AI applications require careful consideration of resource usage, security implications, and user experience. Regular monitoring and optimization are essential for production success.
This Spring AI + Ollama local LLM example showed how easy it is to run models offline and use them with Spring AI. Try it yourself and share your results in the comments!