V9 Rate Limits

This page defines the default rate limits for the backend services. The limits are grouped by context so that each area of the application can be controlled separately.
 What are rate limits? 
Rate limiting is a mechanism used to protect an application from excessive traffic by controlling how many requests can be processed over a period of time. It helps prevent system overload, abuse, and performance degradation by limiting request rates, controlling traffic bursts, or restricting the number of concurrent operations. When a limit is reached, requests are either placed in a queue and processed later or rejected with an HTTP 429 (Too Many Requests) response.

Different rate limiting strategies are available depending on the scenario:

- Fixed Window limits requests within a fixed time period.
- Sliding Window provides smoother traffic control by using a moving time window.
- Token Bucket allows short bursts while enforcing a sustainable average rate.
- Concurrency limits the number of requests that can execute at the same time.

All queued requests in this implementation are processed in OldestFirst order.
 Rate Limiting Configuration 
 Global Behaviour 
 
- Rejected requests return HTTP 429 (Too Many Requests).
- The response trailer includes:
  error_detail: too many requests
- Queue processing order for all policies: OldestFirst.
 
 
 
 
 Built-in Rate Limiter Policies 
 
 
 
 
 
| Policy Name | Type | Purpose | Configuration |
|---|---|---|---|
| fixed | Fixed Window | Allows a fixed number of requests within a time period. The counter resets at the end of each window. | PermitLimit=100, Window=20s, QueueLimit=50 |
| sliding | Sliding Window | Similar to Fixed Window, but uses a moving time window for smoother traffic control. | PermitLimit=25, Window=9s, SegmentsPerWindow=3, QueueLimit=10 |
| token | Token Bucket | Uses tokens that are consumed by requests and replenished over time. Allows short bursts of traffic. | TokenLimit=50, TokensPerPeriod=1, ReplenishmentPeriod=5s, AutoReplenishment=true, QueueLimit=10 |
| concurrency | Concurrency | Limits the number of requests that can execute simultaneously. | PermitLimit=2, QueueLimit=3 |
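
The fixed policy's behaviour can be illustrated with a minimal Python sketch of a fixed window counter using its default values (PermitLimit=100, Window=20s). This is not the backend's implementation: the class FixedWindowCounter is hypothetical, time is passed in explicitly as seconds, and queueing (QueueLimit=50) is left out, so try_acquire only reports whether a permit is available in the current window.

```python
from dataclasses import dataclass


@dataclass
class FixedWindowCounter:
    """Hypothetical fixed window counter; not the backend's actual limiter."""
    permit_limit: int = 100   # PermitLimit
    window: float = 20.0      # Window, in seconds
    window_start: float = 0.0
    count: int = 0

    def try_acquire(self, now: float) -> bool:
        # Once the current window has elapsed, move to the window containing
        # `now` (aligned to the window length) and reset the counter.
        if now >= self.window_start + self.window:
            self.window_start = now - (now - self.window_start) % self.window
            self.count = 0
        if self.count < self.permit_limit:
            self.count += 1
            return True
        # No permit left in this window; a real limiter would queue or reject.
        return False


limiter = FixedWindowCounter()
first_window = [limiter.try_acquire(now=1.0) for _ in range(101)]
print(first_window.count(True))        # 100 permits granted in the first window
print(limiter.try_acquire(now=20.0))   # True: a new window has started
```

Because the counter resets all at once, a client can use 100 permits just before a window ends and another 100 just after the next one starts, which is the kind of burst the sliding policy smooths out.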
 
 
 
 
 
 Limiter Type Comparison 
 
 
 
| Limiter Type | What It Limits | Example |
|---|---|---|
| Fixed Window | Number of requests in a fixed time period | Allow 100 requests every 20 seconds |
| Sliding Window | Number of requests in a continuously moving time period | Allow 25 requests within a rolling 9-second period that advances in 3-second segments |
| Token Bucket | Requests, based on available tokens that refill over time | Allow a burst of up to 50 requests, then 1 additional request every 5 seconds |
| Concurrency | Number of requests running simultaneously | Allow only 2 imports to execute at the same time |
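
The Sliding Window row can be made concrete with a minimal Python sketch of a segmented sliding window, assuming the sliding policy defaults (PermitLimit=25, Window=9s, SegmentsPerWindow=3). The class SlidingWindowCounter is hypothetical and omits queueing. The 9-second window is split into three 3-second segments; each time the window advances by one segment, the requests counted in the segment that dropped out become available again.

```python
from collections import deque


class SlidingWindowCounter:
    """Hypothetical segmented sliding window; not the backend's actual limiter."""

    def __init__(self, permit_limit=25, window=9.0, segments_per_window=3):
        self.permit_limit = permit_limit
        self.segments_per_window = segments_per_window
        self.segment_length = window / segments_per_window  # 3 seconds by default
        # Each entry is (segment_index, requests_counted_in_that_segment).
        self.segments = deque()

    def _used_in_window(self, now):
        current = int(now // self.segment_length)
        oldest_allowed = current - self.segments_per_window + 1
        # Drop segments that have slid out of the window.
        while self.segments and self.segments[0][0] < oldest_allowed:
            self.segments.popleft()
        return current, sum(count for _, count in self.segments)

    def try_acquire(self, now):
        current, used = self._used_in_window(now)
        if used >= self.permit_limit:
            return False
        if self.segments and self.segments[-1][0] == current:
            index, count = self.segments[-1]
            self.segments[-1] = (index, count + 1)
        else:
            self.segments.append((current, 1))
        return True


limiter = SlidingWindowCounter()
print(sum(limiter.try_acquire(1.0) for _ in range(26)))  # 25
print(limiter.try_acquire(9.0))  # True: the first segment has left the window
```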
 
 
 
 Token Bucket Business Policies 
 Configuration values are sourced from application settings, with the defaults shown below. 
 
 
 
 
 
| Policy Name | Configuration Section | Purpose | Token Limit | Tokens per Period | Replenishment Period | Queue Limit |
|---|---|---|---|---|---|---|
| api-policy | ApiRateLimitPolicy:* | General API request throttling. Allows small bursts while protecting the API from excessive traffic. | 60 | 10 | 10s | 10 |
| import-policy | ImportRateLimitPolicy:* | Supports high-volume import operations while preventing imports from overwhelming the system. | 1,000 | 200 | 10s | 20 |
| signify-signing-policy | SigningRateLimitPolicy:* | Designed for high-throughput document signing workloads. | 4,000 | 1,000 | 5s | 10 |
| signify-email-policy | EmailRateLimitPolicy:* | Controls email sending throughput and protects downstream email providers. | 6,000 | 1,000 | 10s | 50 |
| signify-sms-policy | SMSRateLimitPolicy:* | Limits SMS traffic to avoid overwhelming SMS gateways and third-party providers. | 1,000 | 100 | 10s | 10 |
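
The defaults in the table determine each policy's burst capacity and long-run throughput. The following Python sketch derives both, assuming tokens are replenished automatically at the end of every replenishment period (the table does not state AutoReplenishment for these policies). The TokenBucketSettings class is hypothetical and does not reflect how the settings are bound from the configuration sections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucketSettings:
    """Hypothetical container for one policy's defaults from the table."""
    name: str
    token_limit: int
    tokens_per_period: int
    replenishment_period_s: float
    queue_limit: int

    @property
    def sustained_rate_per_s(self) -> float:
        # Long-run throughput once the initial burst has been spent.
        return self.tokens_per_period / self.replenishment_period_s

    @property
    def refill_time_s(self) -> float:
        # Time to refill an empty bucket, counted in whole replenishment periods.
        periods = -(-self.token_limit // self.tokens_per_period)  # ceiling division
        return periods * self.replenishment_period_s


DEFAULTS = [
    TokenBucketSettings("api-policy", 60, 10, 10, 10),
    TokenBucketSettings("import-policy", 1_000, 200, 10, 20),
    TokenBucketSettings("signify-signing-policy", 4_000, 1_000, 5, 10),
    TokenBucketSettings("signify-email-policy", 6_000, 1_000, 10, 50),
    TokenBucketSettings("signify-sms-policy", 1_000, 100, 10, 10),
]

for p in DEFAULTS:
    print(f"{p.name}: burst {p.token_limit}, "
          f"~{p.sustained_rate_per_s:g} req/s sustained, "
          f"empty-to-full in {p.refill_time_s:g}s")
```

With these defaults the sustained rates work out to 1 request per second for api-policy, 20 for import-policy, 200 for signify-signing-policy, 100 for signify-email-policy, and 10 for signify-sms-policy.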
 
 
 
 
 
 What This Means in Practice 
For example, the api-policy for API requests can:

- Handle a burst of up to 60 requests immediately.
- Recover at a rate of 10 requests every 10 seconds (an average of 1 request per second, replenished in batches of 10).
- Queue up to 10 additional requests while waiting for tokens to become available.
- Reject further requests with HTTP 429 once the bucket is out of tokens and the queue is full.

This configuration allows short traffic spikes while still protecting the API from sustained high request volumes. The other business policies follow the same token bucket behaviour, each with its own token limit, replenishment rate, and queue limit.
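
The api-policy behaviour described above can be reproduced with a minimal Python sketch of a token bucket with a FIFO queue. It is an illustration only: the TokenBucket class, its method names, and the rule that tokens are added in whole periods are assumptions chosen to mirror the api-policy defaults (TokenLimit=60, TokensPerPeriod=10, ReplenishmentPeriod=10s, QueueLimit=10), not the backend's code.

```python
from collections import deque


class TokenBucket:
    """Hypothetical token bucket with an OldestFirst queue; not the backend's limiter."""

    def __init__(self, token_limit=60, tokens_per_period=10, period=10.0, queue_limit=10):
        self.token_limit = token_limit
        self.tokens_per_period = tokens_per_period
        self.period = period
        self.queue_limit = queue_limit
        self.tokens = token_limit   # the bucket starts full
        self.queue = deque()        # waiting request ids, oldest at the left
        self.last_refill = 0.0

    def _replenish(self, now):
        # Add tokens for every full period that has elapsed, capped at the limit.
        periods = int((now - self.last_refill) // self.period)
        if periods > 0:
            self.tokens = min(self.token_limit,
                              self.tokens + periods * self.tokens_per_period)
            self.last_refill += periods * self.period

    def drain_queue(self, now):
        """Replenish, then hand tokens to queued requests oldest first."""
        self._replenish(now)
        started = []
        while self.queue and self.tokens > 0:
            self.tokens -= 1
            started.append(self.queue.popleft())
        return started

    def submit(self, request_id, now):
        """Return 'executed', 'queued' or 'rejected (429)'."""
        self.drain_queue(now)  # queued requests claim new tokens first
        if self.tokens > 0 and not self.queue:
            self.tokens -= 1
            return "executed"
        if len(self.queue) < self.queue_limit:
            self.queue.append(request_id)
            return "queued"
        return "rejected (429)"


bucket = TokenBucket()
results = [bucket.submit(i, now=0.0) for i in range(75)]
print(results.count("executed"), results.count("queued"), results.count("rejected (429)"))
print(bucket.drain_queue(now=10.0))  # the queued requests start in arrival order
```

Sending 75 requests at once yields 60 executed, 10 queued and 5 rejected; ten seconds later, one replenishment period adds 10 tokens and queued requests 60 to 69 start in arrival order.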
 Queue Behaviour 
 
 
 
| Scenario | Result |
|---|---|
| Permit available | Request executes immediately |
| Permit unavailable, queue has space | Request waits in queue |
| Permit unavailable, queue full | Request rejected with HTTP 429 |
| Multiple requests queued | Processed in OldestFirst order |
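
The queue behaviour in this table can be traced with a minimal Python sketch of a concurrency limiter, using the built-in concurrency policy defaults (PermitLimit=2, QueueLimit=3). The ConcurrencyLimiter class and its arrive/complete methods are hypothetical; they only illustrate how a released permit is handed to the oldest queued request.

```python
from collections import deque


class ConcurrencyLimiter:
    """Hypothetical concurrency limiter with an OldestFirst queue.

    Defaults mirror the built-in concurrency policy: PermitLimit=2, QueueLimit=3.
    Not the backend's actual implementation.
    """

    def __init__(self, permit_limit=2, queue_limit=3):
        self.permit_limit = permit_limit
        self.queue_limit = queue_limit
        self.running = set()
        self.queue = deque()  # oldest request at the left

    def arrive(self, request_id):
        # New requests may not overtake requests that are already queued.
        if len(self.running) < self.permit_limit and not self.queue:
            self.running.add(request_id)
            return "executes immediately"
        if len(self.queue) < self.queue_limit:
            self.queue.append(request_id)
            return "waits in queue"
        return "rejected with HTTP 429"

    def complete(self, request_id):
        """Release a permit and start the oldest queued request, if any."""
        self.running.remove(request_id)
        if self.queue:
            next_id = self.queue.popleft()  # OldestFirst
            self.running.add(next_id)
            return next_id
        return None


limiter = ConcurrencyLimiter()
for request_id in ["a", "b", "c", "d", "e", "f"]:
    print(request_id, limiter.arrive(request_id))
print("started after a completes:", limiter.complete("a"))
```

Of six requests a to f, a and b execute, c, d and e wait in the queue, and f is rejected; when a completes, its permit goes to c, the oldest queued request.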
 
 
 
 Summary 
All rate limiters:

- Return HTTP 429 (Too Many Requests) when requests are rejected.
- Include the trailer error_detail: too many requests in the rejection response.
- Process queued requests in OldestFirst order.

Business-specific policies use the Token Bucket algorithm and can be configured at the application level through application settings.