Designing a Distributed Rate Limiter Service with Spring Boot and Redis

May 7, 2026

Repository for Reference: https://github.com/Jain1shh/RateLimiterService

Rate limiting becomes unavoidable once APIs start handling real traffic. Authentication systems need brute-force protection, public endpoints need abuse prevention, and expensive operations need request control. Instead of embedding throttling logic into every backend separately, this project treats rate limiting as an independent infrastructure service.


The Core Idea

Most backend applications eventually introduce request throttling somewhere inside controllers, middleware, or API gateways.

A typical implementation usually starts small.

if (requestCount > limit) {
    return 429; // HTTP 429 Too Many Requests
}

Over time, this logic spreads across multiple services.

Auth Service       → own limiter logic
Payment Service    → own limiter logic
Analytics Service  → own limiter logic

Every service ends up duplicating the same expiration handling, request tracking, retry logic, and logging infrastructure.

The goal of this project was to separate throttling completely from business applications and expose it as a reusable service.

Client
   │
   ▼
Backend Application
   │
   ▼
RateLimiter Service
   │
 Allow / Deny

Any backend can call the service before processing requests.

This keeps application logic clean while centralizing request control into a single system.


Architecture Overview

The service follows a fairly simple distributed architecture.

Client Request
      │
      ▼
┌─────────────────────────┐
│   Backend Application   │
└────────────┬────────────┘
             │
             ▼
┌─────────────────────────┐
│   RateLimiter Service   │
└────────────┬────────────┘
             │
     ┌───────┴────────┐
     ▼                ▼
┌──────────────┐   ┌──────────────┐
│    Redis     │   │    MySQL     │
│ (Counters)   │   │ (Audit Logs) │
└──────────────┘   └──────────────┘

The request flow looks like this:

1. Client hits backend API
2. Backend calls RateLimiter Service
3. Service increments and checks the Redis counter
4. Service responds allowed / denied
5. Request gets logged asynchronously

The backend service itself never deals with:

- distributed counters
- expiration windows
- concurrency handling
- request tracking

All of that becomes infrastructure responsibility instead of application responsibility.


Why Redis Fits Perfectly Here

Request throttling fundamentally depends on counters.

Those counters need to be:

- extremely fast
- concurrency safe
- temporary

Redis solves all three very efficiently.

The core operation behind the limiter is essentially:

INCR counter_key

Redis guarantees atomicity for this operation because it executes commands one at a time.

Even under heavy concurrent traffic:

- no explicit locks are required
- race conditions on the counter are avoided
- counters remain consistent

Another important feature is TTL support.

192.168.1.1:/login → expires in 60 seconds

Redis automatically removes expired keys after the request window finishes.

That removes the need for cleanup schedulers or cron jobs.

Since Redis operates in memory, the latency overhead remains extremely small.
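The counter-plus-TTL idea can be sketched in a few lines of Java. This is a minimal illustration, not the service's actual code: `CounterStore`, `InMemoryStore`, and `allow` are hypothetical names, and the in-memory store only imitates the increment and expiry behavior so the example runs without a Redis server.

```java
import java.util.HashMap;
import java.util.Map;

/** Fixed-window limiter sketch; the in-memory store is a stand-in for Redis. */
class FixedWindowLimiterSketch {

    /** Hypothetical abstraction over the two Redis commands the limiter needs. */
    interface CounterStore {
        long incr(String key);                 // mirrors Redis INCR
        void expire(String key, long seconds); // mirrors Redis EXPIRE
    }

    /** Illustration-only store with a manually advanced clock. */
    static class InMemoryStore implements CounterStore {
        private final Map<String, Long> counts = new HashMap<>();
        private final Map<String, Long> expiresAt = new HashMap<>();
        long nowSeconds = 0;

        public synchronized long incr(String key) {
            Long deadline = expiresAt.get(key);
            if (deadline != null && nowSeconds >= deadline) {
                // Drop the key lazily once its window has passed.
                counts.remove(key);
                expiresAt.remove(key);
            }
            return counts.merge(key, 1L, Long::sum);
        }

        public synchronized void expire(String key, long seconds) {
            expiresAt.put(key, nowSeconds + seconds);
        }
    }

    /** Returns true while the caller is within maxReq for the current window. */
    static boolean allow(CounterStore store, String clientKey, int maxReq, int resetInSeconds) {
        long count = store.incr(clientKey);
        if (count == 1) {
            // The first request of a window sets the expiry.
            store.expire(clientKey, resetInSeconds);
        }
        return count <= maxReq;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        for (int i = 1; i <= 4; i++) {
            boolean ok = allow(store, "192.168.1.1:/login", 3, 300);
            System.out.println("request " + i + " allowed: " + ok);
        }
    }
}
```

One caveat when porting this to real Redis: INCR and EXPIRE are two separate commands, so a crash between them can leave a counter without a TTL. Running both inside a Lua script or a MULTI/EXEC transaction is a common way to close that gap.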


Request Windows and Counter Isolation

One of the more important design choices is how requests are grouped.

The system identifies counters using a clientKey.

Example:

192.168.1.1:/login

This key becomes the Redis counter key itself.

Different key construction strategies completely change the behavior of the limiter.

Strategy          Example                Effect
IP only           192.168.1.1            Shared limit across all routes
IP + Route        192.168.1.1:/login     Independent route limits
API Key + Route   apikey_abc:/payments   Per-client API throttling
User ID + Route   user_123:/upload       Logged-in user throttling

Note: Replace / in the URI with - before generating the clientKey.

Example:

/api/users → -api-users and /auth/login → -auth-login

This prevents conflicts and keeps Redis keys clean and consistent.

The recommended approach is:

IP + Route

because every endpoint receives an isolated request window.

Example:

192.168.1.1:/login
192.168.1.1:/analytics
192.168.1.1:/shorten

A spike on /analytics no longer affects /login.

This becomes especially useful in systems where different APIs require different levels of protection.
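As a small illustration, an IP + Route key could be built like this on the calling side. The class and method names are hypothetical. The sketch applies the slash replacement from the note earlier in this section, whereas the examples in this article show the raw route.

```java
/** Sketch of "IP + Route" key construction; names are illustrative only. */
class ClientKeySketch {

    /** Joins IP and route with ':' and replaces '/' in the route with '-'. */
    static String ipAndRoute(String ip, String route) {
        return ip + ":" + route.replace('/', '-');
    }

    public static void main(String[] args) {
        System.out.println(ipAndRoute("192.168.1.1", "/login"));
        System.out.println(ipAndRoute("192.168.1.1", "/api/users"));
    }
}
```

Swapping the first argument for an API key or user ID gives the other strategies from the table without changing the key format.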


Per-Route Rate Limits

Not every endpoint should behave the same way.

Authentication routes typically need aggressive throttling.

Public read endpoints can tolerate larger request volumes.

Examples:

POST /login
    maxReq = 3
    resetInSeconds = 300

POST /shorten
    maxReq = 5
    resetInSeconds = 60

GET /analytics
    maxReq = 50
    resetInSeconds = 60
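On the calling backend, limits like these could live in a simple lookup table. The sketch below is one possible shape, not something taken from the repository: `RouteLimit`, `limitFor`, and the fallback values are assumptions made for illustration.

```java
import java.util.Map;

/** Caller-side table of per-route limits; all names here are hypothetical. */
class RouteLimitsSketch {

    record RouteLimit(int maxReq, int resetInSeconds) {}

    // The three routes and values listed above.
    static final Map<String, RouteLimit> LIMITS = Map.of(
            "POST /login", new RouteLimit(3, 300),
            "POST /shorten", new RouteLimit(5, 60),
            "GET /analytics", new RouteLimit(50, 60));

    // Assumed fallback for routes without an explicit entry.
    static final RouteLimit DEFAULT = new RouteLimit(20, 60);

    static RouteLimit limitFor(String method, String path) {
        return LIMITS.getOrDefault(method + " " + path, DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println(limitFor("POST", "/login"));
        System.out.println(limitFor("GET", "/unknown"));
    }
}
```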

The service does not hardcode these limits. The caller passes them on every check through query parameters.

POST /api/rate-limit/check

Example:

POST /api/rate-limit/check?clientKey=user:/login&maxReq=5&resetInSeconds=60

This keeps the limiter flexible without embedding hardcoded business rules inside the service itself.
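A backend written in Java could assemble that call with the JDK's built-in HTTP client. This is a sketch under assumptions: the base URL `http://localhost:8080` and the class name are placeholders, and the `clientKey` is URL-encoded because it contains `:` and `/`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Builds the POST request for the check endpoint; names are illustrative. */
class CheckRequestSketch {

    static HttpRequest build(String baseUrl, String clientKey, int maxReq, int resetInSeconds) {
        // Encode the key so reserved characters survive inside the query string.
        String query = "clientKey=" + URLEncoder.encode(clientKey, StandardCharsets.UTF_8)
                + "&maxReq=" + maxReq
                + "&resetInSeconds=" + resetInSeconds;
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/rate-limit/check?" + query))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Assumed local address; adjust to wherever the service is deployed.
        HttpRequest request = build("http://localhost:8080", "user:/login", 5, 60);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.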


API Design

The API surface remains intentionally small.

Check Rate Limit

POST /api/rate-limit/check

Parameters:

Parameter        Purpose
clientKey        Unique request identifier
maxReq           Maximum requests allowed
resetInSeconds   Window duration (seconds)

Allowed response:

{
  "allowed": true,
  "remainingRequests": 4,
  "resetInSeconds": 60
}

Rejected response:

{
  "allowed": false,
  "remainingRequests": 0,
  "resetInSeconds": 60
}

The backend application only cares about:

- whether the request is allowed
- how many requests remain
- when the counter resets

The internal counter implementation remains hidden behind the service boundary.
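To show how little the caller needs, here is one way a backend might turn the response into its own HTTP reply. The `Decision` record mirrors the three JSON fields. The helper names are hypothetical, and the sketch assumes `resetInSeconds` can be used as the value of a `Retry-After` header.

```java
/** Maps a limiter response onto the caller's own HTTP reply; illustrative only. */
class DecisionSketch {

    /** Mirrors the JSON fields returned by the check endpoint. */
    record Decision(boolean allowed, int remainingRequests, int resetInSeconds) {}

    /** Status the backend returns to its client: 200 to proceed, 429 to reject. */
    static int statusFor(Decision d) {
        return d.allowed() ? 200 : 429;
    }

    /** Retry-After value for rejected requests, or null when the request is allowed. */
    static String retryAfter(Decision d) {
        return d.allowed() ? null : String.valueOf(d.resetInSeconds());
    }

    public static void main(String[] args) {
        Decision rejected = new Decision(false, 0, 60);
        System.out.println(statusFor(rejected) + " Retry-After: " + retryAfter(rejected));
    }
}
```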


Asynchronous Audit Logging

Request counters are temporary.

Audit logs are not.

The service stores request history inside MySQL for:

- debugging
- analytics
- monitoring
- abuse investigation

Example log:

{
  "clientKey": "192.168.1.1:/login",
  "allowed": false,
  "remainingReq": 0,
  "timestamp": "2026-05-07T11:21:51"
}

The logs are written asynchronously using Spring's @Async.

This is important because synchronous database writes would increase request latency unnecessarily.

The service responds immediately while logs are persisted separately in the background.
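The fire-and-forget pattern can be shown without a Spring context. The sketch below deliberately swaps `@Async` for a plain `CompletableFuture` on a dedicated executor, and an in-memory list stands in for the MySQL table. All names are hypothetical. In the real service the equivalent would be an `@Async` method on a Spring bean with async support enabled.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Framework-free sketch of background audit logging; not the service's code. */
class AsyncAuditSketch {

    record AuditEntry(String clientKey, boolean allowed, int remainingReq) {}

    // Stand-in for the MySQL audit table.
    static final List<AuditEntry> STORE = new CopyOnWriteArrayList<>();

    // Single daemon thread so logging never keeps the JVM alive.
    static final ExecutorService LOG_POOL = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "audit-log");
        t.setDaemon(true);
        return t;
    });

    /** Schedules the write and returns immediately. */
    static CompletableFuture<Void> logAsync(AuditEntry entry) {
        return CompletableFuture.runAsync(() -> STORE.add(entry), LOG_POOL);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> pending = logAsync(new AuditEntry("192.168.1.1:/login", false, 0));
        System.out.println("response can be sent before the log write completes");
        pending.join(); // only so the demo can print the stored entry
        System.out.println(STORE);
    }
}
```

The trade-off of this design is that a log entry can be lost if the process stops before the background write finishes. That is usually acceptable for audit data that is not on the request's critical path.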


Documentation Dashboard

The service also exposes its own documentation UI.

GET /api/rate-limit

The dashboard includes:

- endpoint documentation
- request examples
- response examples
- integration examples
- testing commands
- architecture overview

The UI is rendered using Thymeleaf templates inside Spring Boot.

Instead of relying entirely on external documentation, the service becomes self-documenting.


Containerized Deployment

The entire stack runs through Docker Compose.

docker compose up --build

This starts:

- Spring Boot application
- Redis
- MySQL

Containerization removes dependency setup problems and keeps environments reproducible.

A multi-stage Docker build is used to reduce final image size by separating the Maven build stage from the runtime stage.


Health Monitoring

The service exposes:

GET /actuator/health

This endpoint becomes useful for:

- Docker health checks
- Kubernetes readiness probes
- uptime monitoring
- orchestration systems

Example:

{
  "status": "UP"
}


Closing Thoughts

Rate limiting initially appears to be a small backend feature.

In distributed systems, it becomes a much deeper infrastructure concern involving:

- atomic counters
- distributed consistency
- concurrency handling
- expiration windows
- request isolation
- scalable deployment

Treating throttling as a dedicated microservice creates cleaner application boundaries and makes the limiter reusable across multiple backend systems.

Instead of scattering request control logic everywhere, the entire responsibility becomes centralized behind a lightweight HTTP service.


Tech Stack

Layer              Technology
Backend            Spring Boot
Request Counting   Redis
Audit Logs         MySQL
ORM                Spring Data JPA
Async Processing   Spring @Async
Containerization   Docker