Spring AI Tutorial: Build your first AI application with Spring Boot Ollama integration


Introduction

Building AI-powered applications has become increasingly accessible with the rise of large language models (LLMs) and frameworks like Spring AI. While we’ve covered the strategic benefits and enterprise features of Spring AI 1.0 in our comprehensive guide, this Spring AI tutorial focuses on practical implementation—showing you exactly how to build your first AI application with Spring Boot and Ollama integration.

This step-by-step guide will walk you through creating a complete local AI chat application with memory using Spring AI from scratch, with detailed explanations of every component, configuration setup, and common troubleshooting scenarios. Unlike cloud-based AI services, using Ollama allows you to run AI models locally, giving you full control over your data and eliminating API costs.

By the end of this tutorial, you’ll have a working chat application that can engage in conversations, maintain context, and handle errors gracefully—all running entirely on your local machine.

 

Understanding the Basics: LLMs and Ollama

What are Large Language Models (LLMs)?

Large Language Models are AI systems trained on vast amounts of text data to understand and generate human-like text. They work by:

  • Predicting the next word: Given a sequence of words, they predict what comes next (given “The capital of France is”, the model assigns the highest probability to “Paris”)
  • Understanding context: They can maintain context across conversations
  • Following instructions: They can perform tasks based on natural language instructions
  • Generating coherent responses: They produce human-like text that’s contextually relevant

Popular LLMs include:

  • GPT models (OpenAI): General-purpose models great for conversation
  • Llama models (Meta): Open-source alternatives with good performance
  • Phi models (Microsoft): Smaller, efficient models for lightweight applications

What is Ollama?

Ollama is a tool that makes it easy to run LLMs locally on your machine.

Why Ollama?

Ollama lets you run LLMs (such as Llama 3.2) locally—avoiding cloud costs, API limits, and privacy risks.

  • Privacy: Your data never leaves your machine
  • Cost: Completely free to use
  • Speed: No network latency for API calls
  • Offline capability: Works without internet connection

How Ollama Works:

  1. Downloads pre-trained models to your local machine
  2. Provides a REST API interface to interact with models
  3. Handles model loading, memory management, and optimization
  4. Supports multiple models simultaneously

Popular models you can run through Ollama include:

  • llama3.2 (3B parameters): Good balance of speed and quality
  • phi3:mini (3.8B parameters): Faster, smaller model
  • mistral (7B parameters): Higher-quality general-purpose model
  • codellama (7B parameters): Specialized for code generation

Spring AI Framework

Spring AI is Spring’s framework for building AI-powered applications. It provides:

  • Consistent API: Familiar Spring patterns for AI integration
  • Multiple Provider Support: Works with OpenAI, Ollama, Hugging Face, etc.
  • Chat Memory: Maintains conversation context
  • Prompt Templates: Reusable prompt structures (see the sketch after this list)
  • Auto-configuration: Spring Boot integration with minimal setup
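
For instance, the prompt-template support looks roughly like this; a minimal sketch assuming the Spring AI dependencies from Step 1 are on the classpath (the class name and prompt text are illustrative):

import java.util.Map;

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;

public class PromptTemplateExample {
    public static void main(String[] args) {
        // A reusable prompt structure with a {topic} placeholder
        PromptTemplate template = new PromptTemplate("Explain {topic} in two sentences.");

        // Fill in the placeholder to produce a concrete Prompt
        Prompt prompt = template.create(Map.of("topic", "dependency injection"));

        System.out.println(prompt.getContents());
    }
}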

Ollama vs. OpenAI: Which Should You Use?

Ollama + Spring AI vs. OpenAI API: Key Differences

Feature            | Ollama + Spring AI                                  | OpenAI API (e.g., GPT-4)
Cost               | Free (runs locally)                                 | Pay-per-use (~$0.01–$0.06 per 1K tokens)
Privacy            | Fully offline—no data leaves your machine           | Requests sent to cloud servers
Customization      | Use any open-source model (Llama 2, Mistral)        | Limited to OpenAI’s models
Setup Complexity   | Requires local setup (Ollama + Spring AI)           | Just an API key
Latency            | Depends on your hardware (slower on CPU)            | Fast (cloud GPUs)
Best For           | Privacy-sensitive apps, offline use, custom models  | Quick prototyping, production apps needing scale

Build Spring AI Chatbot with Memory

This tutorial creates a complete AI chat application with memory:

  1. REST API endpoints for programmatic access
  2. Web interface for interactive testing
  3. Conversation memory to maintain context
  4. Error handling for production readiness
  5. Health monitoring for system status
  6. Configurable AI models for different use cases

Prerequisites and Setup Requirements

System Requirements

Before starting, ensure you have:

  • Java 17 or higher: Required for Spring Boot 3.x
  • Maven 3.6+ or Gradle 7.0+: For dependency management
  • IDE: IntelliJ IDEA, Eclipse, or VS Code
  • Basic Spring Boot knowledge: Understanding of controllers, services, and configuration

Hardware Requirements for Ollama

  • RAM: At least 8GB (16GB recommended for larger models)
  • Disk Space: 10GB+ for model storage
  • CPU: Modern multi-core processor (M1/M2 Macs work great)

Step 1: Setting Up Your Spring Boot Ollama Integration

Creating the Project

  1. Visit start.spring.io

  2. Configure your project:

    • Project: Maven
    • Language: Java
    • Spring Boot: 3.2.0 or higher
    • Packaging: Jar
    • Java: 17
    • Group: com.sundrymind
    • Artifact: spring-ai-tutorial
    • Name: Spring AI Tutorial
    • Package name: com.sundrymind.springaitutorial
  3. Add dependencies:

    • Spring Web
    • Spring Boot DevTools
    • Thymeleaf

Understanding the Dependencies

Let’s examine each dependency in your pom.xml:

<properties>
    <java.version>17</java.version>
    <spring-ai.version>1.0.0</spring-ai.version>
</properties>

<dependencies>
    <!-- Core Spring Boot Dependencies -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
        <!-- Provides REST API capabilities, embedded Tomcat server -->
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
        <!-- Template engine for creating web pages -->
    </dependency>

    <!-- Spring AI Dependencies -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
        <version>${spring-ai.version}</version>
        <!-- Auto-configures Ollama client and ChatClient beans -->
    </dependency>

    <!-- Optional: Alternative AI providers -->
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
        <version>${spring-ai.version}</version>
        <!-- Enables OpenAI/Groq integration -->
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
            <!-- Manages versions of all Spring AI components -->
        </dependency>
    </dependencies>
</dependencyManagement>

Common Setup Issues:

Issue 1: Version Conflicts

Error: Could not resolve dependencies for project

Solution: Ensure Spring Boot version is 3.2.0+ and Spring AI version is compatible.

Issue 2: Missing BOM

Error: Failed to resolve version for org.springframework.ai

Solution: Add the spring-ai-bom in dependencyManagement section.

Step 2: Installing and Configuring Ollama

Installation Process

For macOS:

# Using Homebrew (recommended)
brew install ollama

# Alternative: download the macOS app from https://ollama.com/download

For Linux:

curl -fsSL https://ollama.ai/install.sh | sh

For Windows:
Download the installer from ollama.ai and run it.

Starting Ollama Service

# Start the Ollama service (required before downloading models)
ollama serve

Common Ollama Issues:

Issue 1: Port Already in Use

Error: listen tcp 127.0.0.1:11434: bind: address already in use

Solution:

# Kill existing Ollama process
pkill ollama
# Or use different port
OLLAMA_HOST=0.0.0.0:11435 ollama serve

Issue 2: Permission Denied

Error: permission denied while trying to connect to Docker daemon

This appears when running Ollama via its Docker image rather than the native install. Solution:

# On Linux, add your user to the docker group (log out and back in afterwards)
sudo usermod -aG docker $USER

Downloading Models

# Download Llama 3.2 (3B parameters - balanced performance)
ollama pull llama3.2

# Alternative: Smaller, faster model for testing
ollama pull phi3:mini

# Verify model download
ollama list

Model Selection Guide:

  • phi3:mini (3.8B): Fastest, good for development/testing
  • llama3.2 (3B): Best balance of speed and quality
  • mistral (7B): Better quality, slower response
  • codellama (7B): Specialized for coding tasks

Testing Your Installation

# Test the model interactively
ollama run llama3.2

# Test via API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello, how are you?",
  "stream": false
}'
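
If everything is working, the API call returns a JSON payload whose response field holds the generated text, roughly of this shape (fields abridged):

{
  "model": "llama3.2",
  "created_at": "2024-01-01T12:00:00Z",
  "response": "Hello! I'm doing well, thanks for asking.",
  "done": true
}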

Common Model Issues:

Issue 1: Model Not Found

Error: model 'llama3.2' not found

Solution: Ensure model is downloaded: ollama pull llama3.2

Issue 2: Out of Memory

Error: failed to load model: not enough memory

Solution: Use smaller model (phi3:mini) or increase system RAM.

With the model successfully installed, let’s now explore how to use Ollama with Spring AI.

Step 3: Spring Boot Configuration

Basic Configuration

Create the Spring Boot main class, SpringAIdemoApplication.java:

package com.sundrymind.springaitutorial;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SpringAIdemoApplication {

	public static void main(String[] args) {
		SpringApplication.run(SpringAIdemoApplication.class, args);
	}
}

Configuring Ollama Models in Spring AI

Create src/main/resources/application.yml:

spring:
  application:
    name: spring-ai-tutorial

  # AI Configuration
  ai:
    ollama:
      base-url: http://localhost:11434 # Ollama server URL
      chat:
        model: llama3.2 # Model name
        options:
          temperature: 0.7 # Creativity level (0.0-1.0)
          top-p: 0.9 # Nucleus sampling parameter
          num-predict: 1000 # Maximum response length (Ollama's equivalent of max-tokens)
  autoconfigure:
    exclude:
      - org.springframework.ai.autoconfigure.vectorstore.chroma.ChromaVectorStoreAutoConfiguration

# Server Configuration
server:
  port: 8080

# Logging Configuration
logging:
  level:
    com.sundrymind.springaitutorial: DEBUG
    org.springframework.ai: INFO
    org.springframework.web: INFO

Configuration Parameters Explained

Temperature (0.0 – 1.0):

  • 0.0: Deterministic, same response every time
  • 0.3: Focused, consistent responses
  • 0.7: Balanced creativity and consistency (recommended)
  • 1.0: Maximum creativity, more varied responses

Top-p (0.0 – 1.0):

  • Controls diversity by limiting token selection
  • 0.9: Good balance (recommended)
  • Lower values = more focused responses

Num-predict (max tokens):

  • Maximum number of tokens in the response (Ollama’s equivalent of max-tokens)
  • 1000: Good for chat applications
  • Adjust based on your needs
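
You can experiment with these parameters directly against the Ollama API before wiring them into Spring. A quick sketch (note that Ollama's raw API uses snake_case names such as top_p and num_predict):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Name one programming language.",
  "stream": false,
  "options": {
    "temperature": 0.0,
    "top_p": 0.9,
    "num_predict": 50
  }
}'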

Environment-Specific Configuration

Development (application-dev.yml):

spring:
  ai:
    ollama:
      chat:
        options:
          temperature: 0.8 # More creative for testing
logging:
  level:
    org.springframework.ai: DEBUG # Detailed logging

Production (application-prod.yml):

spring:
  ai:
    ollama:
      chat:
        options:
          temperature: 0.6 # More consistent for production
logging:
  level:
    org.springframework.ai: WARN # Less verbose logging

Configuration Issues:

Issue 1: Connection Refused

Error: Connection refused: http://localhost:11434

Solution: Ensure Ollama is running: ollama serve

Issue 2: Invalid Model Name

Error: model 'wrong-name' not found

Solution: Check available models: ollama list

Step 4: Creating the AI Service Layer

The service layer handles all AI interactions and business logic.

// src/main/java/com/sundrymind/springaitutorial/service/ChatService.java
package com.sundrymind.springaitutorial.service;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.stereotype.Service;

@Service
public class ChatService {

    private static final Logger logger = LoggerFactory.getLogger(ChatService.class);

    // The main interface for AI interactions
    private final ChatClient chatClient;

    /**
     * Constructor that configures the ChatClient with default settings
     *
     * @param chatClientBuilder Auto-injected builder from Spring AI
     */
    public ChatService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder
            // Set default system prompt that defines AI behavior
            .defaultSystem("You are a friendly AI assistant. " +
                         "Keep responses concise and helpful. " +
                         "Be conversational but professional.")
            // Add memory advisor for conversation context
            .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()))
            .build();

        logger.info("ChatService initialized with Ollama client");
    }

    /**
     * Generates a simple response without conversation context
     * Used for stateless interactions
     *
     * @param userMessage The user's input message
     * @return AI-generated response or error message
     */
    public String generateResponse(String userMessage) {
        // Log the request (truncated for privacy)
        logger.debug("Processing message: {}",
                    userMessage.substring(0, Math.min(50, userMessage.length())));

        try {
            // Create a prompt and get response
            String response = chatClient
                .prompt()                    // Start building a prompt
                .user(userMessage)          // Add user message
                .call()                     // Make the API call
                .content();                 // Extract response content

            logger.debug("Generated response successfully");
            return response;

        } catch (Exception e) {
            // Log error and return user-friendly message
            logger.error("Error generating AI response: {}", e.getMessage(), e);
            return "I'm sorry, I'm having trouble processing your request right now. " +
                   "Please try again in a moment.";
        }
    }

    /**
     * Generates response with conversation context
     * Maintains conversation history for more coherent interactions
     *
     * @param userMessage The user's input message
     * @param conversationId Unique identifier for this conversation
     * @return AI-generated response with conversation context
     */
    public String generateResponseWithContext(String userMessage, String conversationId) {
        logger.debug("Processing contextual message for conversation: {}", conversationId);

        try {
            return chatClient
                .prompt()
                .user(userMessage)
                // Configure memory advisor with conversation ID
                .advisors(advisorSpec -> advisorSpec
                    .param(MessageChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY,
                           conversationId))
                .call()
                .content();

        } catch (Exception e) {
            logger.error("Error generating contextual response: {}", e.getMessage(), e);
            return "I apologize, but I'm experiencing technical difficulties. " +
                   "Please try again in a moment.";
        }
    }

    /**
     * Health check method to verify AI service is working
     * Used by monitoring endpoints
     *
     * @return true if service is healthy, false otherwise
     */
    public boolean isServiceHealthy() {
        try {
            // Send a simple test message
            String testResponse = chatClient
                .prompt()
                .user("Hello")
                .call()
                .content();

            // Check if we got a valid response
            boolean isHealthy = testResponse != null && !testResponse.trim().isEmpty();
            logger.debug("Health check result: {}", isHealthy);
            return isHealthy;

        } catch (Exception e) {
            logger.warn("Health check failed: {}", e.getMessage());
            return false;
        }
    }
}

Key Concepts Explained

ChatClient: The main interface for AI interactions. It provides a fluent API for building prompts and getting responses.

MessageChatMemoryAdvisor: Handles conversation memory by storing previous messages and including them as context in new requests.

InMemoryChatMemory: Stores conversation history in application memory (lost when application restarts).

System Prompts: Instructions that define how the AI should behave throughout the conversation.

Conversation ID: A unique identifier that groups related messages together for context.
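
To make the conversation ID concrete, here is a hypothetical caller (e.g., another bean with chatService injected by Spring) reusing one ID across two calls, so the memory advisor can supply the first exchange as context for the second:

// Hypothetical usage — assumes a ChatService field injected by Spring
String conversationId = java.util.UUID.randomUUID().toString();

chatService.generateResponseWithContext("My name is John.", conversationId);

// Same ID, so the advisor replays the earlier messages as context and the
// model can answer "John"
String reply = chatService.generateResponseWithContext("What's my name?", conversationId);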

Step 5: Building the REST API Controller

The controller handles HTTP requests and responses, providing RESTful endpoints for AI interactions.

// src/main/java/com/sundrymind/springaitutorial/controller/ChatController.java
package com.sundrymind.springaitutorial.controller;

import com.sundrymind.springaitutorial.service.ChatService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

import java.util.Map;
import java.util.UUID;

@RestController
@RequestMapping("/api/chat")
@CrossOrigin(origins = "*") // Allow requests from any origin (for development)
public class ChatController {

	private static final Logger logger = LoggerFactory.getLogger(ChatController.class);

	private final ChatService chatService;

	/**
	 * Constructor injection of ChatService
	 * Spring automatically provides the ChatService instance
	 */
	public ChatController(ChatService chatService) {
		this.chatService = chatService;
	}

	/**
	 * Endpoint for single message interactions (stateless)
	 * POST /api/chat/message
	 *
	 * @param request Contains the user message
	 * @return ChatResponse with AI reply and new conversation ID
	 */
	@PostMapping("/message")
	public ResponseEntity<ChatResponse> sendMessage(@RequestBody ChatRequest request) {
		logger.info("Received message request");

		// Validate input
		if (request.getMessage() == null || request.getMessage().trim().isEmpty()) {
			logger.warn("Empty message received");
			return ResponseEntity.badRequest()
					.body(new ChatResponse("Message cannot be empty", null));
		}

		// Validate message length (prevent abuse)
		if (request.getMessage().length() > 1000) {
			logger.warn("Message too long: {} characters", request.getMessage().length());
			return ResponseEntity.badRequest()
					.body(new ChatResponse("Message too long. Please limit to 1000 characters.", null));
		}

		try {
			// Generate AI response
			String response = chatService.generateResponse(request.getMessage());

			// Generate new conversation ID for potential follow-up
			String conversationId = UUID.randomUUID().toString();

			logger.info("Message processed successfully");
			return ResponseEntity.ok(new ChatResponse(response, conversationId));

		} catch (Exception e) {
			logger.error("Unexpected error processing message", e);
			return ResponseEntity.internalServerError()
					.body(new ChatResponse("Service temporarily unavailable. Please try again later.", null));
		}
	}

	/**
	 * Endpoint for continuing an existing conversation (stateful)
	 * POST /api/chat/conversation/{conversationId}
	 *
	 * @param conversationId ID of existing conversation
	 * @param request Contains the user message
	 * @return ChatResponse with contextual AI reply
	 */
	@PostMapping("/conversation/{conversationId}")
	public ResponseEntity<ChatResponse> continueConversation(
			@PathVariable String conversationId,
			@RequestBody ChatRequest request) {

		logger.info("Continuing conversation: {}", conversationId);

		// Validate conversation ID
		if (conversationId == null || conversationId.trim().isEmpty()) {
			return ResponseEntity.badRequest()
					.body(new ChatResponse("Invalid conversation ID", null));
		}

		// Validate message
		if (request.getMessage() == null || request.getMessage().trim().isEmpty()) {
			return ResponseEntity.badRequest()
					.body(new ChatResponse("Message cannot be empty", conversationId));
		}

		try {
			// Generate contextual response
			String response = chatService.generateResponseWithContext(
					request.getMessage(),
					conversationId
					);

			return ResponseEntity.ok(new ChatResponse(response, conversationId));

		} catch (Exception e) {
			logger.error("Error continuing conversation: {}", conversationId, e);
			return ResponseEntity.internalServerError()
					.body(new ChatResponse("Unable to continue conversation. Please try again.", conversationId));
		}
	}

	/**
	 * Health check endpoint
	 * GET /api/chat/health
	 *
	 * @return Service status information
	 */
	@GetMapping("/health")
	public ResponseEntity<Map<String, Object>> healthCheck() {
		boolean isHealthy = chatService.isServiceHealthy();

		Map<String, Object> status = Map.of(
				"status", isHealthy ? "healthy" : "unhealthy",
						"service", "AI Chat Service",
						"timestamp", System.currentTimeMillis()
				);

		// Return appropriate HTTP status
		return isHealthy ?
				ResponseEntity.ok(status) :
					ResponseEntity.status(503).body(status);
	}

	/**
	 * Get conversation statistics (bonus endpoint)
	 * GET /api/chat/stats
	 */
	@GetMapping("/stats")
	public ResponseEntity<Map<String, Object>> getStats() {
		// In a real application, you'd track these metrics
		Map<String, Object> stats = Map.of(
				"totalConversations", 0,
				"totalMessages", 0,
				"averageResponseTime", "0ms",
				"uptime", System.currentTimeMillis()
				);

		return ResponseEntity.ok(stats);
	}

	// DTO Classes for Request/Response

	/**
	 * Request object for chat messages
	 */
	public static class ChatRequest {
		private String message;

		// Default constructor for JSON deserialization
		public ChatRequest() {}

		public ChatRequest(String message) {
			this.message = message;
		}

		public String getMessage() {
			return message;
		}

		public void setMessage(String message) {
			this.message = message;
		}

		@Override
		public String toString() {
			return "ChatRequest{message='" +
					(message != null ? message.substring(0, Math.min(50, message.length())) : "null") +
					"'}";
		}
	}

	/**
	 * Response object for chat messages
	 */
	public static class ChatResponse {
		private String response;
		private String conversationId;
		private long timestamp;

		public ChatResponse(String response, String conversationId) {
			this.response = response;
			this.conversationId = conversationId;
			this.timestamp = System.currentTimeMillis();
		}

		// Getters
		public String getResponse() { return response; }
		public String getConversationId() { return conversationId; }
		public long getTimestamp() { return timestamp; }

		@Override
		public String toString() {
			return "ChatResponse{conversationId='" + conversationId +
					"', timestamp=" + timestamp +
					", responseLength=" + (response != null ? response.length() : 0) + "}";
		}
	}
}

Controller Concepts Explained

@RestController: Combines @Controller and @ResponseBody, automatically serializing return values to JSON.

@RequestMapping: Defines the base URL path for all endpoints in this controller.

@CrossOrigin: Allows requests from different origins (important for web development).

ResponseEntity: Provides fine-grained control over HTTP response status and headers.

@PathVariable: Extracts values from the URL path.

@RequestBody: Automatically deserializes JSON request body to Java objects.

UUID.randomUUID(): Generates unique conversation identifiers.

Common Controller Issues:

Issue 1: CORS Errors

Access to fetch at 'http://localhost:8080/api/chat/message' from origin 'null' has been blocked by CORS policy

Solution: Add @CrossOrigin annotation or configure CORS globally.

Issue 2: JSON Serialization Errors

Error: Could not read JSON: Unrecognized field "msg"

Solution: Ensure request JSON matches DTO field names exactly.
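
For reference, a request body that matches the ChatRequest DTO above looks like:

{
  "message": "Hello, how are you today?"
}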

Step 6: Creating the Web Interface

Let’s create a user-friendly web interface to test our AI integration.

<!-- src/main/resources/templates/index.html -->
<!DOCTYPE html>
<html lang="en" xmlns:th="http://www.thymeleaf.org">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Spring AI Chat Demo</title>
    <style>
      /* Modern, responsive styling */
      * {
        margin: 0;
        padding: 0;
        box-sizing: border-box;
      }

      body {
        font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto,
          "Helvetica Neue", Arial, sans-serif;
        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
        min-height: 100vh;
        display: flex;
        align-items: center;
        justify-content: center;
        padding: 20px;
      }

      .chat-container {
        background: rgba(255, 255, 255, 0.95);
        backdrop-filter: blur(10px);
        border-radius: 20px;
        padding: 30px;
        box-shadow: 0 20px 40px rgba(0, 0, 0, 0.15);
        width: 100%;
        max-width: 800px;
        height: 80vh;
        display: flex;
        flex-direction: column;
      }

      .chat-header {
        text-align: center;
        margin-bottom: 20px;
        padding-bottom: 15px;
        border-bottom: 2px solid #f0f0f0;
      }

      .chat-header h1 {
        color: #333;
        font-size: 2rem;
        margin-bottom: 10px;
      }

      .status-indicator {
        display: inline-block;
        width: 10px;
        height: 10px;
        border-radius: 50%;
        background-color: #28a745;
        margin-right: 8px;
        animation: pulse 2s infinite;
      }

      @keyframes pulse {
        0% {
          opacity: 1;
        }
        50% {
          opacity: 0.5;
        }
        100% {
          opacity: 1;
        }
      }

      .chat-messages {
        flex: 1;
        overflow-y: auto;
        padding: 20px;
        margin-bottom: 20px;
        border: 1px solid #e0e0e0;
        border-radius: 15px;
        background: #fafafa;
        scroll-behavior: smooth;
      }

      .message {
        margin-bottom: 20px;
        padding: 15px 20px;
        border-radius: 18px;
        max-width: 80%;
        word-wrap: break-word;
        animation: slideIn 0.3s ease-out;
      }

      @keyframes slideIn {
        from {
          opacity: 0;
          transform: translateY(20px);
        }
        to {
          opacity: 1;
          transform: translateY(0);
        }
      }

      .user-message {
        background: linear-gradient(135deg, #007bff, #0056b3);
        color: white;
        margin-left: auto;
        border-bottom-right-radius: 5px;
      }

      .ai-message {
        background: linear-gradient(135deg, #f8f9fa, #e9ecef);
        color: #333;
        margin-right: auto;
        border-bottom-left-radius: 5px;
        border: 1px solid #dee2e6;
      }

      .message-time {
        font-size: 0.75rem;
        opacity: 0.7;
        margin-top: 5px;
      }

      .typing-indicator {
        display: flex;
        align-items: center;
        color: #666;
        font-style: italic;
        margin-right: auto;
      }

      .typing-dots {
        display: inline-block;
        margin-left: 10px;
      }

      .typing-dots span {
        display: inline-block;
        width: 8px;
        height: 8px;
        border-radius: 50%;
        background-color: #999;
        margin: 0 2px;
        animation: typing 1.4s infinite both;
      }

      .typing-dots span:nth-child(2) {
        animation-delay: 0.2s;
      }

      .typing-dots span:nth-child(3) {
        animation-delay: 0.4s;
      }

      @keyframes typing {
        0%,
        60%,
        100% {
          transform: translateY(0);
          opacity: 0.5;
        }
        30% {
          transform: translateY(-15px);
          opacity: 1;
        }
      }

      .input-container {
        display: flex;
        gap: 15px;
        align-items: flex-end;
      }

      .input-group {
        flex: 1;
        position: relative;
      }

      .input-group textarea {
        width: 100%;
        padding: 15px 20px;
        border: 2px solid #e0e0e0;
        border-radius: 25px;
        font-size: 16px;
        font-family: inherit;
        resize: none;
        outline: none;
        transition: all 0.3s ease;
        min-height: 50px;
        max-height: 120px;
      }

      .input-group textarea:focus {
        border-color: #007bff;
        box-shadow: 0 0 0 3px rgba(0, 123, 255, 0.1);
      }

      .send-button {
        padding: 15px 25px;
        background: linear-gradient(135deg, #007bff, #0056b3);
        color: white;
        border: none;
        border-radius: 25px;
        cursor: pointer;
        font-size: 16px;
        font-weight: 600;
        transition: all 0.3s ease;
        white-space: nowrap;
      }

      .send-button:hover:not(:disabled) {
        background: linear-gradient(135deg, #0056b3, #004085);
        transform: translateY(-2px);
        box-shadow: 0 5px 15px rgba(0, 123, 255, 0.3);
      }

      .send-button:disabled {
        opacity: 0.6;
        cursor: not-allowed;
        transform: none;
      }

      .error-message {
        background: #f8d7da;
        color: #721c24;
        border: 1px solid #f5c6cb;
        padding: 10px 15px;
        border-radius: 8px;
        margin-bottom: 15px;
        text-align: center;
      }

      /* Responsive design */
      @media (max-width: 768px) {
        .chat-container {
          height: 90vh;
          padding: 20px;
          margin: 10px;
        }

        .message {
          max-width: 90%;
        }

        .input-container {
          flex-direction: column;
          gap: 10px;
        }

        .send-button {
          width: 100%;
        }
      }
    </style>
  </head>
  <body>
    <div class="chat-container">
      <div class="chat-header">
        <h1>🤖 Spring AI Chat Demo</h1>
        <p><span class="status-indicator"></span>Connected to AI Service</p>
      </div>

      <div id="errorContainer"></div>

      <div id="chatMessages" class="chat-messages">
        <!-- Messages will be added here dynamically -->
      </div>

      <div class="input-container">
        <div class="input-group">
          <textarea
            id="messageInput"
            placeholder="Type your message here... (Press Ctrl+Enter to send)"
            rows="1"
          ></textarea>
        </div>
        <button id="sendButton" class="send-button" onclick="sendMessage()">
          Send
        </button>
      </div>
    </div>

    <script>
      // Global variables for conversation management
      let conversationId = null;
      let isTyping = false;

      /**
       * Add a message to the chat interface
       * @param {string} message - The message content
       * @param {boolean} isUser - Whether this is a user message
       * @param {boolean} isError - Whether this is an error message
       */
      function addMessage(message, isUser = false, isError = false) {
        const messagesDiv = document.getElementById("chatMessages");
        const messageDiv = document.createElement("div");

        if (isError) {
          messageDiv.className = "message ai-message";
          messageDiv.style.background = "#f8d7da";
          messageDiv.style.color = "#721c24";
          messageDiv.style.border = "1px solid #f5c6cb";
        } else {
          messageDiv.className = `message ${
            isUser ? "user-message" : "ai-message"
          }`;
        }

        // Create message content with timestamp
        const messageContent = document.createElement("div");
        messageContent.textContent = message;
        messageDiv.appendChild(messageContent);

        const timeDiv = document.createElement("div");
        timeDiv.className = "message-time";
        timeDiv.textContent = new Date().toLocaleTimeString();
        messageDiv.appendChild(timeDiv);

        messagesDiv.appendChild(messageDiv);
        messagesDiv.scrollTop = messagesDiv.scrollHeight;
      }

      /**
       * Show typing indicator
       */
      function showTypingIndicator() {
        const messagesDiv = document.getElementById("chatMessages");
        const typingDiv = document.createElement("div");
        typingDiv.id = "typingIndicator";
        typingDiv.className = "message typing-indicator";
        typingDiv.innerHTML = `
                AI is thinking
                <div class="typing-dots">
                    <span></span>
                    <span></span>
                    <span></span>
                </div>
            `;
        messagesDiv.appendChild(typingDiv);
        messagesDiv.scrollTop = messagesDiv.scrollHeight;
      }

      /**
       * Remove typing indicator
       */
      function hideTypingIndicator() {
        const typingIndicator = document.getElementById("typingIndicator");
        if (typingIndicator) {
          typingIndicator.remove();
        }
      }

      /**
       * Show error message
       * @param {string} message - Error message to display
       */
      function showError(message) {
        const errorContainer = document.getElementById("errorContainer");
        errorContainer.innerHTML = `<div class="error-message">${message}</div>`;
        setTimeout(() => {
          errorContainer.innerHTML = "";
        }, 5000);
      }

      /**
       * Auto-resize textarea based on content
       */
      function autoResizeTextarea() {
        const textarea = document.getElementById("messageInput");
        textarea.style.height = "auto";
        textarea.style.height = Math.min(textarea.scrollHeight, 120) + "px";
      }

      /**
       * Handle keyboard shortcuts
       * @param {KeyboardEvent} event - Keyboard event
       */
      function handleKeyPress(event) {
        if (event.key === "Enter") {
          if (event.ctrlKey || event.metaKey) {
            // Ctrl+Enter or Cmd+Enter sends message
            event.preventDefault();
            sendMessage();
          } else if (!event.shiftKey) {
            // Enter without Shift sends message on mobile
            if (window.innerWidth <= 768) {
              event.preventDefault();
              sendMessage();
            }
          }
        }
      }

      /**
       * Validate message before sending
       * @param {string} message - Message to validate
       * @returns {boolean} - Whether message is valid
       */
      function validateMessage(message) {
        if (!message || message.trim().length === 0) {
          showError("Please enter a message");
          return false;
        }

        if (message.length > 1000) {
          showError("Message too long. Please limit to 1000 characters.");
          return false;
        }

        return true;
      }

      /**
       * Send message to AI service
       */
      async function sendMessage() {
        const input = document.getElementById("messageInput");
        const sendButton = document.getElementById("sendButton");
        const message = input.value.trim();

        // Validate message
        if (!validateMessage(message)) {
          return;
        }

        // Disable input during processing
        input.disabled = true;
        sendButton.disabled = true;
        sendButton.textContent = "Sending...";

        // Add user message to chat
        addMessage(message, true);
        input.value = "";
        autoResizeTextarea();

        // Show typing indicator
        showTypingIndicator();
        isTyping = true;

        try {
          // Determine API endpoint based on conversation state
          const url = conversationId
            ? `/api/chat/conversation/${conversationId}`
            : "/api/chat/message";

          // Make API request
          const response = await fetch(url, {
            method: "POST",
            headers: {
              "Content-Type": "application/json",
              Accept: "application/json",
            },
            body: JSON.stringify({ message: message }),
          });

          // Hide typing indicator
          hideTypingIndicator();
          isTyping = false;

          if (!response.ok) {
            throw new Error(`HTTP ${response.status}: ${response.statusText}`);
          }

          const data = await response.json();

          // Add AI response to chat
          addMessage(data.response, false);

          // Update conversation ID for future messages
          if (data.conversationId) {
            conversationId = data.conversationId;
          }
        } catch (error) {
          hideTypingIndicator();
          isTyping = false;

          console.error("Error sending message:", error);

          // Show user-friendly error message
          let errorMessage = "Sorry, something went wrong. Please try again.";

          if (error.name === "TypeError" && error.message.includes("fetch")) {
            errorMessage =
              "Unable to connect to the server. Please check your connection.";
          } else if (error.message.includes("500")) {
            errorMessage =
              "The AI service is temporarily unavailable. Please try again later.";
          } else if (error.message.includes("400")) {
            errorMessage =
              "Invalid message format. Please try rephrasing your message.";
          }

          addMessage(errorMessage, false, true);
          showError(errorMessage);
        } finally {
          // Re-enable input
          input.disabled = false;
          sendButton.disabled = false;
          sendButton.textContent = "Send";
          input.focus();
        }
      }

      /**
       * Check service health periodically
       */
      async function checkServiceHealth() {
        try {
          const response = await fetch("/api/chat/health");
          const statusIndicator = document.querySelector(".status-indicator");

          if (response.ok) {
            statusIndicator.style.backgroundColor = "#28a745";
            statusIndicator.title = "Service is healthy";
          } else {
            statusIndicator.style.backgroundColor = "#ffc107";
            statusIndicator.title = "Service has issues";
          }
        } catch (error) {
          const statusIndicator = document.querySelector(".status-indicator");
          statusIndicator.style.backgroundColor = "#dc3545";
          statusIndicator.title = "Service is unavailable";
        }
      }

      /**
       * Initialize the application
       */
      function initializeApp() {
        const input = document.getElementById("messageInput");

        // Set up event listeners
        input.addEventListener("input", autoResizeTextarea);
        input.addEventListener("keydown", handleKeyPress);

        // Check service health on load and periodically
        checkServiceHealth();
        setInterval(checkServiceHealth, 30000); // Check every 30 seconds

        // Add welcome message
        addMessage(
          "👋 Hello! I'm your AI assistant powered by Spring AI and Ollama. How can I help you today?",
          false
        );

        // Focus input
        input.focus();
      }

      // Initialize when page loads
      window.addEventListener("load", initializeApp);

      // Handle page visibility changes (pause health checks when not visible)
      document.addEventListener("visibilitychange", function () {
        if (document.visibilityState === "visible") {
          checkServiceHealth();
        }
      });
    </script>
  </body>
</html>


Creating the Web Controller

Now create a controller to serve this web page:

// src/main/java/com/sundrymind/springaitutorial/controller/WebController.java
package com.sundrymind.springaitutorial.controller;

import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;

/**
 * Controller for serving web pages
 * Uses Thymeleaf template engine to render HTML pages
 */
@Controller
public class WebController {

    /**
     * Serve the main chat interface
     * GET /
     *
     * @param model Spring's Model object for passing data to templates
     * @return Template name (maps to src/main/resources/templates/index.html)
     */
    @GetMapping("/")
    public String index(Model model) {
        // Add any model attributes needed by the template
        model.addAttribute("appName", "Spring AI Chat Demo");
        model.addAttribute("version", "1.0.0");

        // Return template name (without .html extension)
        return "index";
    }

    /**
     * Alternative endpoint for the chat interface
     * GET /chat
     */
    @GetMapping("/chat")
    public String chat() {
        return "index";
    }
}

Web Interface Features Explained

Responsive Design: The interface adapts to different screen sizes using CSS media queries.

Real-time Typing Indicators: Shows animated dots while AI is processing.

Message Validation: Prevents empty messages and enforces length limits.

Error Handling: Displays user-friendly error messages for different failure scenarios.

Health Monitoring: Periodically checks service status and updates the indicator.

Keyboard Shortcuts: Supports Ctrl+Enter for sending messages.

Auto-resize Textarea: Input field grows with content up to a maximum height.

Step 7: Testing Your Integration

Testing the Application

  1. Start the Ollama service:
ollama serve
  2. Verify the model is available:
ollama list
# Should show llama3.2 or your chosen model
  3. Start the Spring Boot application:
mvn spring-boot:run

Or run the main application class from your IDE.

Testing the REST API

Use curl to test the API endpoints:

Test Single Message:

curl -X POST http://localhost:8080/api/chat/message \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, how are you today?"}'

Expected Response:

{
  "response": "Hello! I'm doing well, thank you for asking. As an AI assistant, I'm here and ready to help you with any questions or tasks you might have. How can I assist you today?",
  "conversationId": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": 1703123456789
}

Test Conversation Context:

# Use the conversationId from the previous response
curl -X POST http://localhost:8080/api/chat/conversation/550e8400-e29b-41d4-a716-446655440000 \
  -H "Content-Type: application/json" \
  -d '{"message": "What did I just ask you?"}'

Test Health Endpoint:

curl http://localhost:8080/api/chat/health
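
Based on the healthCheck() method from Step 5, a healthy service returns HTTP 200 with a body along these lines (the timestamp will differ):

{
  "status": "healthy",
  "service": "AI Chat Service",
  "timestamp": 1703123456789
}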

Testing the Web Interface

  1. Open your browser and navigate to http://localhost:8080
  2. You should see the chat interface with a welcome message
  3. Try these test conversations:

Basic Interaction:

  • Type: “Hello, what can you help me with?”
  • Verify you get a relevant response

Context Testing:

  • First message: “My name is John”
  • Second message: “What’s my name?”
  • The AI should remember your name

Error Handling:

  • Try sending an empty message
  • Try sending a very long message (>1000 characters)
  • Verify appropriate error messages appear

Common Testing Issues

Issue 1: 404 Not Found

Error: GET http://localhost:8080/ 404

Solutions:

  • Ensure WebController is in the correct package
  • Check that @Controller annotation is present
  • Verify template file is in src/main/resources/templates/

Issue 2: Template Resolution Error

Error: Template might not exist or might not be accessible

Solutions:

  • Ensure file is named index.html (case-sensitive)
  • Check Thymeleaf dependency is included
  • Verify template syntax is correct

Issue 3: AI Service Unavailable

Error: Service temporarily unavailable

Solutions:

  • Check Ollama is running: ollama serve
  • Verify model is downloaded: ollama list
  • Check application.yml configuration
  • Review application logs for detailed errors

Issue 4: Slow Responses
Solutions:

  • Use a smaller model: ollama pull phi3:mini
  • Increase timeout in configuration
  • Check system resources (RAM, CPU)

Step 8: Enhanced Error Handling and Monitoring

Let’s add comprehensive error handling and monitoring capabilities:

// src/main/java/com/sundrymind/springaitutorial/exception/GlobalExceptionHandler.java
package com.sundrymind.springaitutorial.exception;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.context.request.WebRequest;

import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;

/**
 * Global exception handler for the application
 * Catches and handles exceptions across all controllers
 */
@ControllerAdvice
public class GlobalExceptionHandler {

    private static final Logger logger = LoggerFactory.getLogger(GlobalExceptionHandler.class);

    /**
     * Handle AI service specific exceptions
     */
    @ExceptionHandler(AIServiceException.class)
    public ResponseEntity<Map<String, Object>> handleAIServiceException(
            AIServiceException ex, WebRequest request) {

        logger.error("AI Service error: {}", ex.getMessage(), ex);

        Map<String, Object> response = new HashMap<>();
        response.put("error", "AI Service Error");
        response.put("message", ex.getMessage());
        response.put("timestamp", LocalDateTime.now());
        response.put("path", request.getDescription(false));

        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body(response);
    }

    /**
     * Handle validation errors
     */
    @ExceptionHandler(IllegalArgumentException.class)
    public ResponseEntity<Map<String, Object>> handleValidationException(
            IllegalArgumentException ex, WebRequest request) {

        logger.warn("Validation error: {}", ex.getMessage());

        Map<String, Object> response = new HashMap<>();
        response.put("error", "Validation Error");
        response.put("message", ex.getMessage());
        response.put("timestamp", LocalDateTime.now());

        return ResponseEntity.badRequest().body(response);
    }

    /**
     * Handle all other exceptions
     */
    @ExceptionHandler(Exception.class)
    public ResponseEntity<Map<String, Object>> handleGenericException(
            Exception ex, WebRequest request) {

        logger.error("Unexpected error: {}", ex.getMessage(), ex);

        Map<String, Object> response = new HashMap<>();
        response.put("error", "Internal Server Error");
        response.put("message", "An unexpected error occurred. Please try again later.");
        response.put("timestamp", LocalDateTime.now());

        return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(response);
    }
}

/**
 * Custom exception for AI service errors
 */
class AIServiceException extends RuntimeException {
    public AIServiceException(String message) {
        super(message);
    }

    public AIServiceException(String message, Throwable cause) {
        super(message, cause);
    }
}
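
Note that this handler only fires if something actually throws AIServiceException. A hedged sketch of how the service layer could propagate failures to it, assuming AIServiceException is made public so other packages can import it:

// Hypothetical ChatService variant that propagates failures instead of
// returning a fallback string; GlobalExceptionHandler then maps the
// exception to an HTTP 503 response
public String generateResponseOrThrow(String userMessage) {
    try {
        return chatClient
            .prompt()
            .user(userMessage)
            .call()
            .content();
    } catch (Exception e) {
        throw new AIServiceException("AI backend call failed", e);
    }
}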

Enhanced Chat Service with Metrics

// Enhanced version of ChatService with monitoring (a drop-in replacement for the Step 4 class)
package com.sundrymind.springaitutorial.service;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.stereotype.Service;

import java.time.LocalDateTime;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

@Service
public class ChatService {

    private static final Logger logger = LoggerFactory.getLogger(ChatService.class);

    private final ChatClient chatClient;

    // Metrics for monitoring
    private final AtomicInteger totalRequests = new AtomicInteger(0);
    private final AtomicInteger successfulRequests = new AtomicInteger(0);
    private final AtomicInteger failedRequests = new AtomicInteger(0);
    private final AtomicLong totalResponseTime = new AtomicLong(0);
    private final LocalDateTime startTime = LocalDateTime.now();

    public ChatService(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder
            .defaultSystem("""
                You are a helpful AI assistant created with Spring AI and Ollama.
                Keep your responses concise but informative.
                Be friendly and professional in your interactions.
                If you're unsure about something, it's okay to say so.
                """)
            .defaultAdvisors(new MessageChatMemoryAdvisor(new InMemoryChatMemory()))
            .build();

        logger.info("ChatService initialized successfully");
    }

    /**
     * Generate response with comprehensive error handling and metrics
     */
    public String generateResponse(String userMessage) {
        long startTime = System.currentTimeMillis();
        totalRequests.incrementAndGet();

        try {
            // Validate input
            validateMessage(userMessage);

            logger.debug("Processing message: {}", truncateMessage(userMessage));

            // Generate response
            String response = chatClient
                .prompt()
                .user(userMessage)
                .call()
                .content();

            // Validate response
            if (response == null || response.trim().isEmpty()) {
                throw new RuntimeException("AI service returned empty response");
            }

            // Update metrics
            long responseTime = System.currentTimeMillis() - startTime;
            totalResponseTime.addAndGet(responseTime);
            successfulRequests.incrementAndGet();

            logger.debug("Response generated successfully in {}ms", responseTime);
            return response;

        } catch (Exception e) {
            failedRequests.incrementAndGet();
            logger.error("Error generating response: {}", e.getMessage(), e);

            // Return contextual error message based on exception type
            // (guard against a null exception message to avoid an NPE here)
            String errorDetail = e.getMessage() == null ? "" : e.getMessage().toLowerCase();
            if (errorDetail.contains("connection")) {
                return "I'm having trouble connecting to my AI service. Please try again in a moment.";
            } else if (errorDetail.contains("timeout")) {
                return "I'm taking longer than usual to respond. Please try a shorter message or try again later.";
            } else {
                return "I apologize, but I'm experiencing technical difficulties right now. Please try again.";
            }
        }
    }

    /**
     * Generate response with conversation context
     */
    public String generateResponseWithContext(String userMessage, String conversationId) {
        long startTime = System.currentTimeMillis();
        totalRequests.incrementAndGet();

        try {
            validateMessage(userMessage);
            validateConversationId(conversationId);

            logger.debug("Processing contextual message for conversation: {}", conversationId);

            String response = chatClient
                .prompt()
                .user(userMessage)
                .advisors(advisorSpec -> advisorSpec
                    .param(MessageChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY, conversationId))
                .call()
                .content();

            if (response == null || response.trim().isEmpty()) {
                throw new RuntimeException("AI service returned empty response");
            }

            long responseTime = System.currentTimeMillis() - startTime;
            totalResponseTime.addAndGet(responseTime);
            successfulRequests.incrementAndGet();

            logger.debug("Contextual response generated in {}ms", responseTime);
            return response;

        } catch (Exception e) {
            failedRequests.incrementAndGet();
            logger.error("Error generating contextual response: {}", e.getMessage(), e);
            return "I'm having trouble maintaining our conversation context. Please try starting a new conversation.";
        }
    }

    /**
     * Comprehensive health check
     */
    public boolean isServiceHealthy() {
        try {
            long startTime = System.currentTimeMillis();

            String testResponse = chatClient
                .prompt()
                .user("Hello")
                .call()
                .content();

            long responseTime = System.currentTimeMillis() - startTime;

            boolean isHealthy = testResponse != null &&
                               !testResponse.trim().isEmpty() &&
                               responseTime < 10000; // Less than 10 seconds

            logger.debug("Health check completed: {} ({}ms)", isHealthy, responseTime);
            return isHealthy;

        } catch (Exception e) {
            logger.warn("Health check failed: {}", e.getMessage());
            return false;
        }
    }

    /**
     * Get service metrics
     */
    public ServiceMetrics getMetrics() {
        // totalResponseTime only accumulates successful calls, so average over those
        long avgResponseTime = successfulRequests.get() > 0 ?
            totalResponseTime.get() / successfulRequests.get() : 0;

        return new ServiceMetrics(
            totalRequests.get(),
            successfulRequests.get(),
            failedRequests.get(),
            avgResponseTime,
            startTime
        );
    }

    // Helper methods

    private void validateMessage(String message) {
        if (message == null || message.trim().isEmpty()) {
            throw new IllegalArgumentException("Message cannot be empty");
        }
        if (message.length() > 2000) {
            throw new IllegalArgumentException("Message too long. Maximum 2000 characters allowed.");
        }
    }

    private void validateConversationId(String conversationId) {
        if (conversationId == null || conversationId.trim().isEmpty()) {
            throw new IllegalArgumentException("Conversation ID cannot be empty");
        }
    }

    private String truncateMessage(String message) {
        return message.length() > 50 ? message.substring(0, 50) + "..." : message;
    }

    /**
     * Data class for service metrics
     */
    public static class ServiceMetrics {
        private final int totalRequests;
        private final int successfulRequests;
        private final int failedRequests;
        private final long averageResponseTime;
        private final LocalDateTime startTime;

        public ServiceMetrics(int totalRequests, int successfulRequests, int failedRequests,
                            long averageResponseTime, LocalDateTime startTime) {
            this.totalRequests = totalRequests;
            this.successfulRequests = successfulRequests;
            this.failedRequests = failedRequests;
            this.averageResponseTime = averageResponseTime;
            this.startTime = startTime;
        }

        // Getters
        public int getTotalRequests() { return totalRequests; }
        public int getSuccessfulRequests() { return successfulRequests; }
        public int getFailedRequests() { return failedRequests; }
        public long getAverageResponseTime() { return averageResponseTime; }
        public LocalDateTime getStartTime() { return startTime; }
        public double getSuccessRate() {
            return totalRequests > 0 ? (double) successfulRequests / totalRequests * 100 : 0;
        }
    }
}
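
With getMetrics() available, the placeholder /stats endpoint from Step 5 can report real numbers. A minimal sketch of the reworked controller method (Jackson serializes ServiceMetrics through its getters):

// Reworked /api/chat/stats endpoint backed by the service's counters
@GetMapping("/stats")
public ResponseEntity<ChatService.ServiceMetrics> getStats() {
    return ResponseEntity.ok(chatService.getMetrics());
}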

Step 9: Configuration for Different Environments

Environment-Specific Configurations

Create separate configuration files for different environments:

Development (application-dev.yml):

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        model: phi3:mini # Faster model for development
        options:
          temperature: 0.8
          num-predict: 500
          timeout: 30s

logging:
  level:
    com.sundrymind.springaitutorial: DEBUG
    org.springframework.ai: DEBUG
    org.springframework.web: DEBUG

server:
  port: 8080

# Development-specific settings
debug: true
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics

Production (application-prod.yml):

spring:
  ai:
    ollama:
      base-url: ${OLLAMA_BASE_URL:http://localhost:11434}
      chat:
        options:
          model: ${AI_MODEL:llama3.2}
          temperature: 0.6
          num-predict: 1000 # Ollama's token limit (the equivalent of max tokens)

logging:
  level:
    com.sundrymind.springaitutorial: INFO
    org.springframework.ai: WARN
    org.springframework.web: WARN

server:
  port: ${PORT:8080}

# Production security and monitoring
management:
  endpoints:
    web:
      exposure:
        include: health,info
  endpoint:
    health:
      show-details: when-authorized

Using Environment Variables

For production deployments, use environment variables:

# Set environment variables
export SPRING_PROFILES_ACTIVE=prod
export OLLAMA_BASE_URL=http://your-ollama-server:11434
export AI_MODEL=llama3.2
export PORT=8080

# Run the application
java -jar target/spring-ai-tutorial-1.0.0.jar

Step 10: Alternative AI Providers

Groq Integration

Groq provides fast cloud-based AI inference. Here’s how to set it up:

  1. Get Groq API Key:

  2. Configuration:


spring:
  ai:
    openai: # Groq uses OpenAI-compatible API
      api-key: ${GROQ_API_KEY}
      base-url: https://api.groq.com/openai/v1
      chat:
        options:
          model: mixtral-8x7b-32768 # Fast, high-quality model
          temperature: 0.7
          max-tokens: 1000
  3. Set Environment Variable:
export GROQ_API_KEY=your_groq_api_key_here

Hugging Face Integration

<!-- Add Hugging Face dependency -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-huggingface-spring-boot-starter</artifactId>
    <version>${spring-ai.version}</version>
</dependency>

Configuration:

spring:
  ai:
    huggingface:
      api-key: ${HUGGINGFACE_API_KEY}
      chat:
        model: microsoft/DialoGPT-large

Common Issues and Comprehensive Troubleshooting

Ollama Issues

Issue 1: Ollama Service Not Running

Error: Connection refused to http://localhost:11434

Solutions:

# Check if Ollama is running
ps aux | grep ollama

# Start Ollama service
ollama serve

# Check if port is available
netstat -an | grep 11434

Issue 2: Model Download Issues

Error: failed to pull model: network error

Solutions:

# Check internet connection
ping ollama.ai

# Try different model
ollama pull phi3:mini

# Clear cache and retry
rm -rf ~/.ollama/models
ollama pull llama3.2

Issue 3: Memory Issues

Error: failed to load model: not enough memory

Solutions:

  • Use smaller model: ollama pull phi3:mini
  • Close other applications to free RAM
  • Increase swap space (Linux/Mac)
  • Use cloud-based alternative (Groq)

Spring Boot Issues

Issue 1: Auto-configuration Failures

Error: Consider defining a bean of type 'ChatClient' in your configuration

Solutions:

  • Ensure Spring AI BOM is in dependencyManagement
  • Check Spring Boot version is 3.2.0+
  • Verify Ollama starter dependency is included

Issue 2: Template Resolution Issues

Error: Template might not exist or might not be accessible

Solutions:

  • Check file path: src/main/resources/templates/index.html
  • Verify Thymeleaf dependency
  • Ensure proper file encoding (UTF-8)

Issue 3: CORS Issues

Error: CORS policy: No 'Access-Control-Allow-Origin' header

Solutions:

// Add global CORS configuration
@Configuration
public class WebConfig implements WebMvcConfigurer {
    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/api/**")
                .allowedOrigins("*") // restrict to your real front-end origins in production
                .allowedMethods("GET", "POST", "PUT", "DELETE");
    }
}

Performance Issues

Issue 1: Slow Response Times

Symptom: Chat responses taking 30+ seconds

Solutions:

// Optimize ChatClient configuration
@Configuration
public class ChatConfig {
    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
            .defaultOptions(OllamaOptions.builder()
                .model("llama3.2:1b")  // Use smaller model
                .temperature(0.7)
                .numPredict(512)  // Limit response length
                .build())
            .build();
    }
}

Additional optimizations:

  • Verify GPU acceleration is active (Ollama detects and uses supported GPUs automatically)
  • Increase client-side read timeouts (see the HttpClientConfig example below)
  • Implement connection pooling
  • Cache frequent responses

Issue 2: High Memory Usage

Symptom: Application consuming excessive RAM

Solutions:

# application.yml
spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: phi3:mini # Smaller model
          num-ctx: 2048 # Reduced context window

JVM optimizations:

# Limit heap size
java -Xmx2g -Xms1g -jar your-app.jar

# Use G1 garbage collector
java -XX:+UseG1GC -jar your-app.jar

Issue 3: Connection Timeouts

Error: Read timeout executing GET http://localhost:11434/api/chat

Solutions:

@Configuration
public class HttpClientConfig {
    @Bean
    public RestTemplate restTemplate() {
        // SimpleClientHttpRequestFactory exposes both timeouts directly
        SimpleClientHttpRequestFactory factory = new SimpleClientHttpRequestFactory();
        factory.setConnectTimeout(30000);  // 30 seconds
        factory.setReadTimeout(60000);     // 60 seconds
        return new RestTemplate(factory);
    }
}

Security Issues

Issue 1: API Key Exposure

Warning: Hardcoded API keys in source code

Solutions:

# application-dev.yml (for development)
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY:}

# Use environment variables
export OPENAI_API_KEY=your-key-here

Issue 2: Unsecured Endpoints

// Secure your chat endpoints
@RestController
@RequestMapping("/api/chat")
@PreAuthorize("hasRole('USER')")
public class SecureChatController {

    private final ChatClient chatClient;

    public SecureChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @PostMapping
    @PreAuthorize("@rateLimitService.isAllowed(authentication.name)")
    public String chat(@RequestBody String message, Authentication auth) {
        // Rate limiting and authentication required
        return chatClient.prompt(message).call().content();
    }
}

Issue 3: Input Validation

// Validate and sanitize user input
@Service
public class InputValidationService {

    public String sanitizeInput(String input) {
        if (input == null || input.trim().isEmpty()) {
            throw new IllegalArgumentException("Input cannot be empty");
        }

        // Remove potentially harmful content
        String sanitized = input.replaceAll("[<>\"'&]", "");

        // Limit length
        if (sanitized.length() > 1000) {
            sanitized = sanitized.substring(0, 1000);
        }

        return sanitized;
    }
}
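
Wiring the validator into an endpoint is straightforward. A sketch follows; the ValidatedChatController class is hypothetical, but the chatClient.prompt(...).call().content() chain matches the calls used throughout this tutorial:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/validated-chat")
public class ValidatedChatController {

    private final InputValidationService validator;
    private final ChatClient chatClient;

    public ValidatedChatController(InputValidationService validator, ChatClient chatClient) {
        this.validator = validator;
        this.chatClient = chatClient;
    }

    @PostMapping
    public String chat(@RequestBody String message) {
        // Sanitize before the text ever reaches the model
        String safe = validator.sanitizeInput(message);
        return chatClient.prompt(safe).call().content();
    }
}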

Production Deployment Issues

Issue 1: Docker Container Issues

# Complete Dockerfile for production
FROM openjdk:21-jdk-slim

# Install curl (not included in slim images), then Ollama
RUN apt-get update && apt-get install -y curl ca-certificates \
    && curl -fsSL https://ollama.ai/install.sh | sh \
    && rm -rf /var/lib/apt/lists/*

# Copy application
COPY target/spring-ai-chat-*.jar app.jar

# Expose ports
EXPOSE 8080 11434

# Start both services
CMD ["sh", "-c", "ollama serve & java -jar app.jar"]

Issue 2: Load Balancing with Multiple Instances

# docker-compose.yml for scaled deployment
version: "3.8"
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama

  app:
    build: .
    ports:
      - "8080-8082:8080"
    depends_on:
      - ollama
    environment:
      - SPRING_AI_OLLAMA_BASE_URL=http://ollama:11434
    deploy:
      replicas: 3

volumes:
  ollama_data:

Issue 3: Health Checks and Monitoring

// Custom health indicator
@Component
public class OllamaHealthIndicator implements HealthIndicator {

    private final ChatClient chatClient;

    public OllamaHealthIndicator(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @Override
    public Health health() {
        try {
            // Simple health check: time a trivial prompt round trip
            long start = System.currentTimeMillis();
            chatClient.prompt("Hello").call().content();
            return Health.up()
                .withDetail("ollama", "Available")
                .withDetail("response-time-ms", System.currentTimeMillis() - start)
                .build();
        } catch (Exception e) {
            return Health.down()
                .withDetail("ollama", "Unavailable")
                .withDetail("error", e.getMessage())
                .build();
        }
    }
}

Testing Issues

Issue 1: Integration Test Failures

// Robust integration testing
@SpringBootTest
@TestPropertySource(properties = {
    "spring.ai.ollama.base-url=http://localhost:11434",
    "spring.ai.ollama.chat.options.model=llama3.2:1b"
})
class ChatServiceIntegrationTest {

    @Autowired
    private ChatService chatService;

    @Test
    @Timeout(30) // Prevent hanging tests
    void testChatResponse() {
        // Skip if Ollama not available
        assumeTrue(isOllamaAvailable());

        String response = chatService.chat("Hello");
        assertThat(response).isNotEmpty();
    }

    private boolean isOllamaAvailable() {
        try {
            RestTemplate rest = new RestTemplate();
            rest.getForEntity("http://localhost:11434/api/tags", String.class);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}

Issue 2: Mocking AI Services

// Mock the ChatClient fluent API for unit tests; deep stubs mock the whole chain
@MockBean(answer = Answers.RETURNS_DEEP_STUBS)
private ChatClient chatClient;

@Test
void testChatServiceWithMock() {
    // Stub prompt(...).call().content() in one expression
    when(chatClient.prompt(anyString()).call().content())
        .thenReturn("Mocked response");

    String result = chatService.chat("Test message");
    assertEquals("Mocked response", result);
}

Advanced Configuration and Best Practices

Production-Ready Configuration

# application-prod.yml
spring:
  ai:
    ollama:
      base-url: ${OLLAMA_URL:http://localhost:11434}
      chat:
        options:
          model: ${AI_MODEL:llama3.2}
          temperature: 0.7
          num-predict: 1024 # Ollama's token limit (the equivalent of max tokens)

  # Database configuration for chat history
  datasource:
    url: jdbc:postgresql://localhost:5432/chatdb
    username: ${DB_USER}
    password: ${DB_PASSWORD}

  # Redis for caching (Spring Boot 3 uses spring.data.redis.*)
  data:
    redis:
      host: ${REDIS_HOST:localhost}
      port: 6379
      password: ${REDIS_PASSWORD:}

# Logging configuration
logging:
  level:
    org.springframework.ai: WARN # raise to DEBUG only when troubleshooting
    com.your.package: INFO
  pattern:
    file: "%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n"
  file:
    name: logs/spring-ai-chat.log

# Actuator endpoints
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: always
  prometheus:
    metrics:
      export:
        enabled: true

Caching Strategy

The code below defines a Spring service that caches chat responses in Redis for 30 minutes using @Cacheable (skipping very short responses via the unless condition), with a method to manually evict the cache.

@Service
public class CachedChatService {

    @Autowired
    private ChatClient chatClient;

    @Cacheable(value = "chatCache", key = "#message", unless = "#result.length() < 10")
    public String getCachedResponse(String message) {
        return chatClient.prompt(message).call().content();
    }

    @CacheEvict(value = "chatCache", allEntries = true)
    public void clearCache() {
        // Manual cache clearing
    }
}

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        // Inject the auto-configured connection factory instead of creating one here
        return RedisCacheManager.builder(connectionFactory)
            .cacheDefaults(cacheConfiguration())
            .build();
    }

    private RedisCacheConfiguration cacheConfiguration() {
        return RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofMinutes(30))
            .disableCachingNullValues()
            .serializeKeysWith(RedisSerializationContext.SerializationPair
                .fromSerializer(new StringRedisSerializer()))
            .serializeValuesWith(RedisSerializationContext.SerializationPair
                .fromSerializer(new GenericJackson2JsonRedisSerializer()));
    }
}

Rate Limiting

The code below implements a per-user rate limiter backed by Redis, allowing up to 100 requests per hour. It relies on Redis's atomic INCR so concurrent requests are counted correctly.

package com.sundrymind.springaitutorial.service;

import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Component;

import java.time.Duration;

@Component
public class RateLimitService {

    private final RedisTemplate<String, String> redisTemplate;
    private final int maxRequests = 100; // per hour

    public RateLimitService(RedisTemplate<String, String> redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public boolean isAllowed(String userId) {
        String key = "rate_limit:" + userId;

        // INCR is atomic, so concurrent requests cannot slip past the limit
        Long count = redisTemplate.opsForValue().increment(key);

        // First request in this window: start the one-hour expiry
        if (count != null && count == 1) {
            redisTemplate.expire(key, Duration.ofHours(1));
        }

        return count != null && count <= maxRequests;
    }
}
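
The SecureChatController shown earlier calls this bean through SpEL; you can also invoke it directly. A sketch with a hypothetical controller, assuming Spring Security supplies the Principal:

import java.security.Principal;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/limited-chat")
public class RateLimitedChatController {

    private final RateLimitService rateLimitService;
    private final ChatClient chatClient;

    public RateLimitedChatController(RateLimitService rateLimitService, ChatClient chatClient) {
        this.rateLimitService = rateLimitService;
        this.chatClient = chatClient;
    }

    @PostMapping
    public ResponseEntity<String> chat(@RequestBody String message, Principal principal) {
        // Reject with 429 once the hourly budget for this user is spent
        if (!rateLimitService.isAllowed(principal.getName())) {
            return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
                    .body("Rate limit exceeded. Please try again later.");
        }
        return ResponseEntity.ok(chatClient.prompt(message).call().content());
    }
}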

Comprehensive Monitoring

The code below defines a Spring component that tracks the number of chat requests and measures their response time using Micrometer metrics.

package com.sundrymind.springaitutorial.service;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.stereotype.Component;

import java.util.function.Supplier;

@Component
public class ChatMetrics {

    private final Counter chatRequestCounter;
    private final Timer chatResponseTimer;

    public ChatMetrics(MeterRegistry meterRegistry) {
        this.chatRequestCounter = Counter.builder("chat.requests.total")
                .description("Total number of chat requests")
                .register(meterRegistry);
        this.chatResponseTimer = Timer.builder("chat.response.time")
                .description("Chat response time")
                .register(meterRegistry);
    }

    public String timedChatCall(String message, Supplier<String> chatCall) {
        chatRequestCounter.increment();
        // Timer.record(Supplier) times the call without forcing a checked exception
        return chatResponseTimer.record(chatCall);
    }
}

How it works:

  • @Component — makes this class a Spring-managed bean.
  • MeterRegistry — from Micrometer, used to register metrics.
  • Counter and Timer — track the total number of chat requests and the time taken for chat responses.
  • timedChatCall() — increments the counter and times the execution of a Supplier<String> chat call (like a call to OpenAI, Ollama, etc.).

Example Usage

@Autowired
private ChatMetrics chatMetrics;

public String handleChat(String userInput) {
    return chatMetrics.timedChatCall(userInput, () -> openAiService.callLLM(userInput));
}

Get the complete code

Explore the full Maven project for this Spring AI Ollama integration on GitHub:

[Spring AI + Ollama Integration Code]

Includes:
✅ Pre-configured application.yml
✅ Ready-to-run Spring Boot project
✅ Ollama model loading examples

Frequently Asked Questions

Can I use multiple Ollama models simultaneously in Spring AI?

Yes! Ollama can serve several models at once; for example, configure separate chat and embedding models in application.properties or yml:
spring.ai.ollama.chat.options.model=llama2
spring.ai.ollama.embedding.options.model=mistral
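
If you need two chat models side by side (rather than a chat/embedding split), you can also build separate ChatClient beans programmatically. A sketch, assuming Spring AI 1.0's OllamaOptions builder and the auto-configured, prototype-scoped ChatClient.Builder; the bean names are illustrative:

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MultiModelConfig {

    // Each injection point receives its own builder instance,
    // so the two clients can pin different default models
    @Bean
    public ChatClient llamaChatClient(ChatClient.Builder builder) {
        return builder
                .defaultOptions(OllamaOptions.builder().model("llama3.2").build())
                .build();
    }

    @Bean
    public ChatClient phiChatClient(ChatClient.Builder builder) {
        return builder
                .defaultOptions(OllamaOptions.builder().model("phi3:mini").build())
                .build();
    }
}

Inject the one you need with @Qualifier("llamaChatClient") or @Qualifier("phiChatClient").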

How to secure Ollama endpoints with Spring AI?

If exposing Ollama via REST:

  • Use Spring Security to protect your chat API (API keys, basic auth, etc.), as in the sketch below.
  • Bind Ollama to localhost (the default) and avoid exposing its port publicly.
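
As a sketch of the first point, here is a minimal Spring Security setup that requires authentication on the chat API. It assumes spring-boot-starter-security is on the classpath; HTTP Basic stands in for whatever scheme you actually use:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.Customizer;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
@EnableWebSecurity
public class SecurityConfig {

    @Bean
    public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http
            // The chat API requires an authenticated caller; everything else stays open
            .authorizeHttpRequests(auth -> auth
                .requestMatchers("/api/chat/**").authenticated()
                .anyRequest().permitAll())
            .httpBasic(Customizer.withDefaults())
            .csrf(csrf -> csrf.ignoringRequestMatchers("/api/**"));
        return http.build();
    }
}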

Can I fine-tune an Ollama model for Spring AI?

Yes. You can customize a model (system prompt, parameters, adapters) with Ollama’s Modelfile and reference it in Spring AI:
ollama create my-model -f Modelfile
In application.properties or yml:
spring.ai.ollama.chat.options.model=my-model

Conclusion

This comprehensive guide has covered the complete setup, configuration, troubleshooting, and deployment of a local LLM with Spring AI and Ollama.
This foundation gives you everything needed to integrate AI capabilities into existing Spring applications or build new AI-first features.
The combination of Spring AI’s familiar patterns with powerful language models opens up countless possibilities. Whether you’re building chatbots, content generators, or intelligent data processors, this setup provides a solid starting point.

Remember to experiment with different models and configurations to find what works best for your specific use case. The AI landscape evolves rapidly, but with Spring AI handling the integration complexity, you can focus on building great user experiences.

Key takeaways:

  1. Start Simple: Begin with basic setup and gradually add complexity
  2. Monitor Everything: Implement proper logging, metrics, and health checks
  3. Security First: Never expose API keys, validate inputs, and implement rate limiting
  4. Performance Matters: Use appropriate models, caching, and connection pooling
  5. Test Thoroughly: Include both unit and integration tests
  6. Plan for Scale: Design for horizontal scaling and load balancing

Remember that AI applications require careful consideration of resource usage, security implications, and user experience. Regular monitoring and optimization are essential for production success.

This Spring AI Ollama local LLM example showed how easy it is to run models offline and use them with Spring AI. Try it yourself and share your results in the comments!
