Repository for Reference: https://github.com/Jain1shh/RateLimiterService
Rate limiting becomes unavoidable once APIs start handling real traffic. Authentication systems need brute-force protection, public endpoints need abuse prevention, and expensive operations need request control. Instead of embedding throttling logic into every backend separately, this project treats rate limiting as an independent infrastructure service.
The Core Idea
Most backend applications eventually introduce request throttling somewhere inside controllers, middleware, or API gateways.
A typical implementation usually starts small.
if (requestCount > limit) {
    return 429;
}
Over time, this logic spreads across multiple services.
Auth Service → own limiter logic
Payment Service → own limiter logic
Analytics Service → own limiter logic
Every service ends up duplicating the same concerns: expiration handling, request tracking, retry logic, and logging infrastructure.
The goal of this project was to separate throttling completely from business applications and expose it as a reusable service.
      Client
         │
         ▼
Backend Application
         │
         ▼
RateLimiter Service
         │
    Allow / Deny
Any backend can call the service before processing requests.
This keeps application logic clean while centralizing request control into a single system.
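The call-before-processing pattern can be sketched in a few lines. This is a minimal illustration, not code from the repository: `RateLimitGuard` is a hypothetical name, and the `Predicate<String>` stands in for the HTTP call a real backend would make to the RateLimiter Service.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical backend-side guard: asks the limiter first, then runs the handler. */
final class RateLimitGuard {
    // Stand-in for the HTTP call to the RateLimiter Service (an assumption for this sketch).
    private final Predicate<String> limiter;

    RateLimitGuard(Predicate<String> limiter) {
        this.limiter = limiter;
    }

    /** Returns a 429 status line when the limiter denies, otherwise the handler's result. */
    String handle(String clientKey, Supplier<String> handler) {
        if (!limiter.test(clientKey)) {
            return "429 Too Many Requests";
        }
        return handler.get();
    }

    public static void main(String[] args) {
        // Demo policy (made up): deny every key that targets /login.
        RateLimitGuard guard = new RateLimitGuard(key -> !key.endsWith("/login"));
        System.out.println(guard.handle("192.168.1.1:/analytics", () -> "200 OK")); // 200 OK
        System.out.println(guard.handle("192.168.1.1:/login", () -> "200 OK"));     // 429 Too Many Requests
    }
}
```

The business handler never sees a counter or a window; it only runs when the limiter says yes.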
Architecture Overview
The service follows a fairly simple distributed architecture.
         Client Request
                │
                ▼
   ┌─────────────────────────┐
   │   Backend Application   │
   └────────────┬────────────┘
                │
                ▼
   ┌─────────────────────────┐
   │   RateLimiter Service   │
   └────────────┬────────────┘
                │
        ┌───────┴────────┐
        ▼                ▼
┌──────────────┐ ┌──────────────┐
│    Redis     │ │    MySQL     │
│  (Counters)  │ │ (Audit Logs) │
└──────────────┘ └──────────────┘
The request flow looks like this:
1. Client hits backend API
2. Backend calls RateLimiter Service
3. Redis counter is checked
4. Service responds allowed / denied
5. Request gets logged asynchronously
The backend service itself never deals with:
- distributed counters
- expiration windows
- concurrency handling
- request tracking
All of that becomes infrastructure responsibility instead of application responsibility.
Why Redis Fits Perfectly Here
Request throttling fundamentally depends on counters.
Those counters need to be:
- extremely fast
- concurrency safe
- temporary
Redis solves all three very efficiently.
The core operation behind the limiter is essentially:
INCR counter_key
Redis executes each command atomically, so concurrent increments on the same key never interleave.
Even under heavy concurrent traffic:
- no explicit locks are required
- race conditions are avoided
- counters remain consistent
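To make the atomicity point concrete, here is a small JDK-only sketch. It does not talk to Redis: `AtomicCounterDemo` is a hypothetical in-memory stand-in whose `incr` behaves like an atomic increment, so the effect of many concurrent callers can be observed without a server.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

/** In-memory stand-in for an atomic increment; not the service's actual storage layer. */
final class AtomicCounterDemo {
    private final ConcurrentHashMap<String, Long> counters = new ConcurrentHashMap<>();

    /** Atomically adds 1 to the counter for key and returns the new value. */
    long incr(String key) {
        return counters.merge(key, 1L, Long::sum);
    }

    /** Fires threads * perThread increments at one key and returns the final count. */
    static long hammer(int threads, int perThread) {
        AtomicCounterDemo store = new AtomicCounterDemo();
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    store.incr("192.168.1.1:/login");
                }
                done.countDown();
            }).start();
        }
        try {
            done.await(); // wait until every worker has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.counters.get("192.168.1.1:/login");
    }

    public static void main(String[] args) {
        System.out.println(hammer(8, 1_000)); // prints 8000
    }
}
```

No increment is lost even though the callers take no lock themselves, which is the property the limiter relies on.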
Another important feature is TTL support.
192.168.1.1:/login → expires in 60 seconds
Redis automatically removes expired keys after the request window finishes.
That removes the need for cleanup schedulers or cron jobs.
Since Redis operates in memory, the latency overhead remains extremely small.
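The interaction between the counter and its TTL can be sketched without Redis. The class below is an illustrative in-memory model, not the repository's implementation: `ExpiringCounterStore` and its method names are assumptions, and the clock is injected so the example can fast-forward time. It assumes the TTL is attached when the key is first created, which is one common way to get a fixed window.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** In-memory sketch of a counter whose key expires, mimicking an increment plus a TTL. */
final class ExpiringCounterStore {
    private static final class Entry {
        final long count;
        final long expiresAtMillis;
        Entry(long count, long expiresAtMillis) {
            this.count = count;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final LongSupplier clock; // injectable so the demo can fast-forward time

    ExpiringCounterStore(LongSupplier clock) {
        this.clock = clock;
    }

    /** Increments key; the TTL is set only when the key is created (first hit of a window). */
    synchronized long incr(String key, long ttlSeconds) {
        long now = clock.getAsLong();
        Entry current = entries.get(key);
        if (current == null || current.expiresAtMillis <= now) {
            current = new Entry(0, now + ttlSeconds * 1000); // missing or expired: new window
        }
        Entry next = new Entry(current.count + 1, current.expiresAtMillis);
        entries.put(key, next);
        return next.count;
    }

    /** Whole seconds (rounded up) until key expires, or 0 if absent or already expired. */
    synchronized long secondsLeft(String key) {
        Entry e = entries.get(key);
        long now = clock.getAsLong();
        if (e == null || e.expiresAtMillis <= now) {
            return 0;
        }
        return (e.expiresAtMillis - now + 999) / 1000;
    }

    public static void main(String[] args) {
        long[] now = {0};
        ExpiringCounterStore store = new ExpiringCounterStore(() -> now[0]);
        System.out.println(store.incr("192.168.1.1:/login", 60)); // 1
        System.out.println(store.incr("192.168.1.1:/login", 60)); // 2
        now[0] = 60_000; // the 60-second window has elapsed
        System.out.println(store.incr("192.168.1.1:/login", 60)); // 1
    }
}
```

With real Redis the increment and the expiry are separate commands, so a production limiter has to decide how to keep the pair consistent; this sketch sidesteps that with a single synchronized method.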
Request Windows and Counter Isolation
One of the more important design choices is how requests are grouped.
The system identifies counters using a clientKey.
Example:
192.168.1.1:/login
This key becomes the Redis counter key itself.
Different key construction strategies completely change the behavior of the limiter.
| Strategy | Example | Effect |
|---|---|---|
| IP only | 192.168.1.1 | Shared limit across all routes |
| IP + Route | 192.168.1.1:/login | Independent route limits |
| API Key + Route | apikey_abc:/payments | Per-client API throttling |
| User ID + Route | user_123:/upload | Logged-in user throttling |
Note: Replace / in the URI with - before generating the clientKey.
Example: /api/users → -api-users and /auth/login → -auth-login
This prevents conflicts and keeps Redis keys clean and consistent.
The recommended approach is IP + Route, because every endpoint receives an isolated request window.
Example:
192.168.1.1:/login
192.168.1.1:/analytics
192.168.1.1:/shorten
A spike on /analytics no longer affects /login.
This becomes especially useful in systems where different APIs require different levels of protection.
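Key construction is simple enough to centralize in one helper. The sketch below is hypothetical (`ClientKeys.of` is not a name from the repository) and applies the slash-replacement rule from the note above; the identity argument can be an IP, an API key, or a user ID depending on the chosen strategy.

```java
/** Hypothetical helper for building clientKey values; the name and shape are assumptions. */
final class ClientKeys {
    private ClientKeys() {}

    /** Joins an identity and a route with ':', replacing every '/' in the route with '-'. */
    static String of(String identity, String route) {
        return identity + ":" + route.replace('/', '-');
    }

    public static void main(String[] args) {
        System.out.println(of("192.168.1.1", "/api/users")); // 192.168.1.1:-api-users
        System.out.println(of("user_123", "/auth/login"));   // user_123:-auth-login
    }
}
```

Switching strategy then only changes what the caller passes as the identity, not how keys are formatted.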
Per-Route Rate Limits
Not every endpoint should behave the same way.
Authentication routes typically need aggressive throttling.
Public read endpoints can tolerate larger request volumes.
Examples:
POST /login
    maxReq = 3
    resetInSeconds = 300
POST /shorten
    maxReq = 5
    resetInSeconds = 60
GET /analytics
    maxReq = 50
    resetInSeconds = 60
The service accepts these limits dynamically through query parameters, so each caller supplies the policy for the route it is protecting.
POST /api/rate-limit/check
Example:
POST /api/rate-limit/check?clientKey=user:/login&maxReq=5&resetInSeconds=60
This keeps the limiter flexible without embedding hardcoded business rules inside the service itself.
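Since the limits travel with each request, the calling backend needs somewhere to keep them. One possible arrangement, sketched here with hypothetical names (`RoutePolicies`, `Policy`, `checkPath`), is a small lookup table that mirrors the example limits above and a helper that builds the query string. The fallback policy is an assumption made for the sketch.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Caller-side sketch: per-route limits plus the query string for the check endpoint. */
final class RoutePolicies {
    static final class Policy {
        final int maxReq;
        final int resetInSeconds;
        Policy(int maxReq, int resetInSeconds) {
            this.maxReq = maxReq;
            this.resetInSeconds = resetInSeconds;
        }
    }

    // Assumed fallback for routes without an explicit entry; not taken from the article.
    private static final Policy DEFAULT = new Policy(50, 60);

    // Values mirror the per-route examples above.
    private static final Map<String, Policy> POLICIES = Map.of(
            "POST /login", new Policy(3, 300),
            "POST /shorten", new Policy(5, 60),
            "GET /analytics", new Policy(50, 60));

    static Policy forRoute(String methodAndPath) {
        return POLICIES.getOrDefault(methodAndPath, DEFAULT);
    }

    /** Builds the path and query for the check endpoint, URL-encoding the clientKey. */
    static String checkPath(String clientKey, Policy policy) {
        return "/api/rate-limit/check?clientKey="
                + URLEncoder.encode(clientKey, StandardCharsets.UTF_8)
                + "&maxReq=" + policy.maxReq
                + "&resetInSeconds=" + policy.resetInSeconds;
    }

    public static void main(String[] args) {
        // prints /api/rate-limit/check?clientKey=192.168.1.1%3A-login&maxReq=3&resetInSeconds=300
        System.out.println(checkPath("192.168.1.1:-login", forRoute("POST /login")));
    }
}
```

Encoding the key keeps characters such as `:` from being misread, whichever key strategy is in use.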
API Design
The API surface remains intentionally small.
Check Rate Limit
POST /api/rate-limit/check
Parameters:
| Parameter | Purpose |
|---|---|
| clientKey | Unique request identifier |
| maxReq | Maximum requests allowed |
| resetInSeconds | Window duration |
Allowed response:
{
  "allowed": true,
  "remainingRequests": 4,
  "resetInSeconds": 60
}
Rejected response:
{
  "allowed": false,
  "remainingRequests": 0,
  "resetInSeconds": 60
}
The backend application only cares about:
- whether the request is allowed
- how many requests remain
- when the counter resets
The internal counter implementation remains hidden behind the service boundary.
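For illustration, the three response fields can be derived from a single counter value. The class below is a sketch under stated assumptions, not the service's code: `RateLimitDecision` is a hypothetical name, and it assumes the counter is incremented before it is compared with `maxReq`, so the first request in a window sees a count of 1.

```java
/** Sketch of how the three response fields could be derived from a counter value. */
final class RateLimitDecision {
    final boolean allowed;
    final long remainingRequests;
    final long resetInSeconds;

    private RateLimitDecision(boolean allowed, long remainingRequests, long resetInSeconds) {
        this.allowed = allowed;
        this.remainingRequests = remainingRequests;
        this.resetInSeconds = resetInSeconds;
    }

    /**
     * @param count       counter value after the current request was counted
     * @param maxReq      maximum requests allowed in the window
     * @param secondsLeft seconds until the counter key expires
     */
    static RateLimitDecision from(long count, long maxReq, long secondsLeft) {
        boolean allowed = count <= maxReq;
        long remaining = Math.max(0, maxReq - count); // never report a negative remainder
        return new RateLimitDecision(allowed, remaining, secondsLeft);
    }

    /** Renders the same field names as the example responses above. */
    String toJson() {
        return "{\"allowed\": " + allowed
                + ", \"remainingRequests\": " + remainingRequests
                + ", \"resetInSeconds\": " + resetInSeconds + "}";
    }

    public static void main(String[] args) {
        System.out.println(from(1, 5, 60).toJson()); // {"allowed": true, "remainingRequests": 4, "resetInSeconds": 60}
        System.out.println(from(6, 5, 60).toJson()); // {"allowed": false, "remainingRequests": 0, "resetInSeconds": 60}
    }
}
```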
Asynchronous Audit Logging
Request counters are temporary.
Audit logs are not.
The service stores request history inside MySQL for:
- debugging
- analytics
- monitoring
- abuse investigation
Example log:
{
  "clientKey": "192.168.1.1:/login",
  "allowed": false,
  "remainingReq": 0,
  "timestamp": "2026-05-07T11:21:51"
}
The logs are written asynchronously using Spring's @Async.
This is important because synchronous database writes would increase request latency unnecessarily.
The service responds immediately while logs are persisted separately in the background.
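The fire-and-forget idea can be shown with the JDK alone. The sketch below is an analogue, not the service's code: it uses a plain `ExecutorService` in place of Spring's `@Async`, an in-memory list in place of the MySQL table, and a hypothetical class name `AsyncAuditLogger`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** JDK-only analogue of fire-and-forget audit logging. */
final class AsyncAuditLogger {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final List<String> sink = new CopyOnWriteArrayList<>(); // stand-in for the MySQL table

    /** Queues the entry and returns immediately; the write happens on the worker thread. */
    void log(String clientKey, boolean allowed, long remainingReq) {
        worker.execute(() ->
                sink.add(clientKey + " allowed=" + allowed + " remainingReq=" + remainingReq));
    }

    /** Stops accepting entries, waits up to five seconds for queued writes, returns what was stored. */
    List<String> drain() {
        worker.shutdown();
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.copyOf(sink);
    }

    public static void main(String[] args) {
        AsyncAuditLogger logger = new AsyncAuditLogger();
        logger.log("192.168.1.1:/login", false, 0); // returns before the entry is stored
        System.out.println(logger.drain());
    }
}
```

The trade-off is the usual one for background writes: the caller gets its answer sooner, but entries still queued when the process dies are lost.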
Documentation Dashboard
The service also exposes its own documentation UI.
GET /api/rate-limit
The dashboard includes:
- endpoint documentation
- request examples
- response examples
- integration examples
- testing commands
- architecture overview
The UI is rendered using Thymeleaf templates inside Spring Boot.
Instead of relying entirely on external documentation, the service becomes self-documenting.
Containerized Deployment
The entire stack runs through Docker Compose.
docker compose up --build
This starts:
- Spring Boot application
- Redis
- MySQL
Containerization removes dependency setup problems and keeps environments reproducible.
A multi-stage Docker build is used to reduce final image size by separating the Maven build stage from the runtime stage.
Health Monitoring
The service exposes:
GET /actuator/health
This endpoint becomes useful for:
- Docker health checks
- Kubernetes readiness probes
- uptime monitoring
- orchestration systems
Example:
{
  "status": "UP"
}
Closing Thoughts
Rate limiting initially appears to be a small backend feature.
In distributed systems, it becomes a much deeper infrastructure concern involving:
- atomic counters
- distributed consistency
- concurrency handling
- expiration windows
- request isolation
- scalable deployment
Treating throttling as a dedicated microservice creates cleaner application boundaries and makes the limiter reusable across multiple backend systems.
Instead of scattering request control logic everywhere, the entire responsibility becomes centralized behind a lightweight HTTP service.
Tech Stack
| Layer | Technology |
|---|---|
| Backend | Spring Boot |
| Request Counting | Redis |
| Audit Logs | MySQL |
| ORM | Spring Data JPA |
| Async Processing | Spring @Async |
| Containerization | Docker |