V9 Rate Limits

This page defines the default rate limits for the backend services. Limits are defined per context, so each application area can be throttled independently.

What are rate limits?

Rate limiting is a mechanism used to protect an application from excessive traffic by controlling how many requests can be processed over a period of time. It helps prevent system overload, abuse, and performance degradation by limiting request rates, controlling traffic bursts, or restricting the number of concurrent operations.

When a limit is reached, requests may either be placed in a queue and processed later or be rejected with an HTTP 429 (Too Many Requests) response.

Different rate limiting strategies are available depending on the scenario:
- Fixed Window limits requests within a fixed time period.
- Sliding Window provides smoother traffic control using a moving time window.
- Token Bucket allows short bursts while enforcing a sustainable average rate.
- Concurrency limits the number of requests that can execute at the same time.

All queued requests in this implementation are processed using an OldestFirst approach.

Rate Limiting Configuration

Global Behaviour

Rejected requests return HTTP 429 (Too Many Requests).
Response trailer includes:
- error_detail: too many requests
Queue processing order for all policies:
- OldestFirst

Built-in Rate Limiter Policies

Policy Name	Type	Purpose	Configuration
fixed	Fixed Window	Allows a fixed number of requests within a time period. Counter resets at the end of each window.	`PermitLimit=100`, `Window=20s`, `QueueLimit=50`
sliding	Sliding Window	Similar to Fixed Window but uses a moving time window for smoother traffic control.	`PermitLimit=25`, `Window=9s`, `SegmentsPerWindow=3`, `QueueLimit=10`
token	Token Bucket	Uses tokens that are consumed by requests and replenished over time. Allows short bursts of traffic.	`TokenLimit=50`, `TokensPerPeriod=1`, `ReplenishmentPeriod=5s`, `AutoReplenishment=true`, `QueueLimit=10`
concurrency	Concurrency	Limits the number of requests that can execute simultaneously.	`PermitLimit=2`, `QueueLimit=3`
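
To make the "fixed" policy concrete, the following is a minimal Python sketch of a fixed-window counter using the `PermitLimit=100` and `Window=20s` values from the table. The class and method names are hypothetical and chosen for illustration only; the sketch ignores queueing and is not the limiter the backend actually runs.

```python
from dataclasses import dataclass


@dataclass
class FixedWindowLimiter:
    """Illustrative fixed-window counter (assumed names, no queue)."""
    permit_limit: int = 100        # PermitLimit from the "fixed" policy
    window_seconds: float = 20.0   # Window from the "fixed" policy
    window_start: float = 0.0
    used: int = 0

    def try_acquire(self, now: float) -> bool:
        # Once the current window has elapsed, move to the window that
        # contains `now` and reset the counter.
        if now - self.window_start >= self.window_seconds:
            elapsed_windows = int((now - self.window_start) // self.window_seconds)
            self.window_start += elapsed_windows * self.window_seconds
            self.used = 0
        if self.used < self.permit_limit:
            self.used += 1
            return True
        return False


limiter = FixedWindowLimiter()
# 120 requests arrive in the first window, then 120 more in the second window.
first_window = sum(limiter.try_acquire(now=1.0) for _ in range(120))
second_window = sum(limiter.try_acquire(now=21.0) for _ in range(120))
print(first_window, second_window)  # 100 100
```

Each window admits at most 100 requests, and the counter resets as soon as the next 20-second window begins.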

Limiter Type Comparison

Limiter Type	What It Limits	Example
Fixed Window	Number of requests in a fixed time period	Allow 100 requests every 20 seconds
Sliding Window	Number of requests in a continuously moving time period	Allow 25 requests within any rolling 9-second period
Token Bucket	Requests based on available tokens that refill over time	Allow a burst of up to 50 requests, then refill 1 token every 5 seconds
Concurrency	Number of requests running simultaneously	Allow only 2 imports to execute at the same time
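
The difference between a fixed and a sliding window is easiest to see with segments. The Python sketch below models the "sliding" policy values (`PermitLimit=25`, `Window=9s`, `SegmentsPerWindow=3`) as three 3-second segments. It is a simplified illustration with hypothetical names, it ignores queueing, and the real limiter may differ in detail.

```python
from collections import deque


class SlidingWindowLimiter:
    """Illustrative segmented sliding window (assumed names, no queue)."""

    def __init__(self, permit_limit=25, window_seconds=9.0, segments=3):
        self.permit_limit = permit_limit
        self.segment_seconds = window_seconds / segments
        # One counter per segment; the rightmost entry is the current segment.
        self.counts = deque([0] * segments, maxlen=segments)
        self.current_segment = 0

    def _advance(self, now):
        segment_index = int(now // self.segment_seconds)
        steps = min(segment_index - self.current_segment, len(self.counts))
        # Each elapsed segment pushes out the oldest counter, which returns
        # the permits that were used in that segment.
        for _ in range(steps):
            self.counts.append(0)
        self.current_segment = max(self.current_segment, segment_index)

    def try_acquire(self, now):
        self._advance(now)
        if sum(self.counts) < self.permit_limit:
            self.counts[-1] += 1
            return True
        return False


limiter = SlidingWindowLimiter()
at_0s = sum(limiter.try_acquire(0.0) for _ in range(20))    # segment 0
at_4s = sum(limiter.try_acquire(4.0) for _ in range(20))    # segment 1
at_7s = sum(limiter.try_acquire(7.0) for _ in range(20))    # segment 2
at_10s = sum(limiter.try_acquire(10.0) for _ in range(20))  # segment 3
print(at_0s, at_4s, at_7s, at_10s)  # 20 5 0 20
```

The 20 permits used at 0s only become available again at 10s, once their segment has left the 9-second window, so capacity returns gradually instead of all at once at a window boundary.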

Token Bucket Business Policies

Configuration values are sourced from application settings, with the defaults shown below.

Policy Name	Configuration Section	Purpose	Token Limit	Tokens / Period	Replenishment Period	Queue Limit
api-policy	`ApiRateLimitPolicy:*`	General API request throttling. Allows small bursts while protecting the API from excessive traffic.	60	10	10s	10
import-policy	`ImportRateLimitPolicy:*`	Supports high-volume import operations while preventing imports from overwhelming the system.	1,000	200	10s	20
signify-signing-policy	`SigningRateLimitPolicy:*`	Designed for high-throughput document signing workloads.	4,000	1,000	5s	10
signify-email-policy	`EmailRateLimitPolicy:*`	Controls email sending throughput and protects downstream email providers.	6,000	1,000	10s	50
signify-sms-policy	`SMSRateLimitPolicy:*`	Limits SMS traffic to avoid overwhelming SMS gateways and third-party providers.	1,000	100	10s	10
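
The defaults in the table imply a sustained rate (tokens per period divided by the period) and a time to refill an empty bucket. The Python sketch below works through that arithmetic. The helper names are hypothetical, and the refill time assumes that no requests consume tokens while the bucket refills.

```python
import math

# Defaults copied from the table above:
# policy name -> (token limit, tokens per period, replenishment period in seconds)
POLICIES = {
    "api-policy": (60, 10, 10),
    "import-policy": (1000, 200, 10),
    "signify-signing-policy": (4000, 1000, 5),
    "signify-email-policy": (6000, 1000, 10),
    "signify-sms-policy": (1000, 100, 10),
}


def sustained_rate(tokens_per_period, period_seconds):
    """Average requests per second once the initial burst is used up."""
    return tokens_per_period / period_seconds


def seconds_to_refill(token_limit, tokens_per_period, period_seconds):
    """Time for an empty, idle bucket to become full again."""
    periods = math.ceil(token_limit / tokens_per_period)
    return periods * period_seconds


for name, (limit, per_period, period) in POLICIES.items():
    rate = sustained_rate(per_period, period)
    refill = seconds_to_refill(limit, per_period, period)
    print(f"{name}: {rate:g} req/s sustained, {refill}s to refill from empty")
```

With these defaults the sustained rates are 1, 20, 200, 100, and 10 requests per second respectively, and the refill times are 60, 50, 20, 60, and 100 seconds.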

What This Means in Practice

For example, the api-policy for API requests will:

- Handle a burst of up to 60 requests immediately.
- Recover at a rate of 10 requests every 10 seconds (approximately 1 request per second on average).
- Queue up to 10 additional requests while waiting for tokens to become available.
- Reject further requests with HTTP 429 when both the bucket and queue are full.

This configuration is useful because it allows short traffic spikes while still protecting the API from sustained high request volumes. The other business policies defined above use the same token bucket algorithm, each with its own limits.
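
The api-policy behaviour can be sketched as a small simulation. The Python below is an illustration under stated assumptions, not the production limiter: the class name and methods are hypothetical, the bucket is assumed to start full, and replenishment is triggered manually instead of by a 10-second timer.

```python
from collections import deque


class TokenBucket:
    """Illustrative token bucket with a bounded FIFO queue (assumed names)."""

    def __init__(self, token_limit=60, tokens_per_period=10, queue_limit=10):
        self.token_limit = token_limit
        self.tokens_per_period = tokens_per_period
        self.queue_limit = queue_limit
        self.tokens = token_limit   # assumption: the bucket starts full
        self.queue = deque()        # waiting request ids, oldest on the left

    def request(self, request_id):
        # New arrivals never jump ahead of requests that are already queued.
        if self.tokens > 0 and not self.queue:
            self.tokens -= 1
            return "executed"
        if len(self.queue) < self.queue_limit:
            self.queue.append(request_id)
            return "queued"
        return "rejected (429)"

    def replenish(self):
        """Call once per replenishment period; returns the requests released."""
        self.tokens = min(self.token_limit, self.tokens + self.tokens_per_period)
        released = []
        while self.queue and self.tokens > 0:
            released.append(self.queue.popleft())  # oldest first
            self.tokens -= 1
        return released


bucket = TokenBucket()
outcomes = [bucket.request(i) for i in range(75)]  # 75 requests arrive at once
executed = outcomes.count("executed")
queued = outcomes.count("queued")
rejected = outcomes.count("rejected (429)")
print(executed, queued, rejected)  # 60 10 5

released = bucket.replenish()      # one 10-second period later
print(released)                    # [60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
```

A burst of 75 requests results in 60 executing immediately, 10 waiting in the queue, and 5 being rejected. The first replenishment adds 10 tokens, which are consumed by the 10 queued requests in arrival order.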

Queue Behaviour

Scenario	Result
Permit available	Request executes immediately
Permit unavailable, queue has space	Request waits in queue
Permit unavailable, queue full	Request rejected with HTTP 429
Multiple requests queued	Processed in `OldestFirst` order
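
The four scenarios in the table can be traced with the "concurrency" policy values (`PermitLimit=2`, `QueueLimit=3`). The Python sketch below is illustrative only; the class, its methods, and the request names are hypothetical.

```python
from collections import deque


class ConcurrencyLimiter:
    """Illustrative concurrency limiter with an OldestFirst queue (assumed names)."""

    def __init__(self, permit_limit=2, queue_limit=3):
        self.permit_limit = permit_limit
        self.queue_limit = queue_limit
        self.running = set()
        self.queue = deque()   # oldest waiting request on the left

    def submit(self, name):
        if len(self.running) < self.permit_limit:
            self.running.add(name)
            return "executes immediately"
        if len(self.queue) < self.queue_limit:
            self.queue.append(name)
            return "waits in queue"
        return "rejected with HTTP 429"

    def complete(self, name):
        """Release a permit and hand it to the oldest queued request, if any."""
        self.running.remove(name)
        if self.queue:
            next_name = self.queue.popleft()
            self.running.add(next_name)
            return next_name
        return None


limiter = ConcurrencyLimiter()
results = {name: limiter.submit(name) for name in ["A", "B", "C", "D", "E", "F"]}
for name, result in results.items():
    print(name, result)

promoted = limiter.complete("A")
print(promoted)  # C
```

Requests A and B execute immediately, C, D, and E wait in the queue, and F is rejected. When A finishes, C is promoted first because it has been waiting the longest.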

Summary

All rate limiters:

- Return HTTP 429 when requests are rejected.
- Include the trailer error_detail: too many requests.
- Process queued requests using OldestFirst ordering.

Business-specific policies use the Token Bucket algorithm and are configurable through application settings.
