Rate limiting protects your APIs from overuse by limiting how often each user can call the API. Once rate limiting is enabled, each user is limited to a fixed number of requests per second.
Rate Limiting Algorithms

There are various algorithms for rate limiting, each with its own benefits and drawbacks. Let's review each of them so you can pick the best one for your needs.
#1 Leaky Bucket
Leaky bucket (related to token bucket) is an algorithm that provides a simple, intuitive approach to rate limiting via a queue, which you can think of as a bucket holding the requests. When a request is registered, it is appended to the end of the queue. At a regular interval, the first item in the queue is processed. This is also known as a first in, first out (FIFO) queue. If the queue is full, then additional requests are discarded (or leaked).
The advantage of this algorithm is that it smooths out bursts of requests and processes them at an approximately average rate. It's also easy to implement on a single server or load balancer, and it is memory efficient for each user given the limited queue size.
However, a burst of traffic can fill up the queue with old requests and starve more recent requests of processing. It also provides no guarantee that requests are processed in a fixed amount of time. Additionally, if you load balance servers for fault tolerance or increased throughput, you must use a policy to coordinate and enforce the limit between them. We will come back to the challenges of distributed environments later.
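To make the queueing behavior concrete, here is a minimal single-process sketch of a leaky bucket in Python. The names (`LeakyBucket`, `capacity`, `leak_rate`) are illustrative assumptions, not from any particular library, and a production limiter would also need the cross-server coordination discussed above.

```python
import time
from collections import deque

class LeakyBucket:
    """FIFO queue that drains ("leaks") at a fixed rate. Illustrative sketch."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # maximum queue length (bucket size)
        self.leak_rate = leak_rate  # requests processed per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def allow(self, request) -> bool:
        self._leak()
        if len(self.queue) >= self.capacity:
            return False            # bucket is full: discard the request
        self.queue.append(request)  # append to the end of the queue
        return True

    def _leak(self):
        # Process as many items from the front as the elapsed time allows.
        now = time.monotonic()
        n = int((now - self.last_leak) * self.leak_rate)
        if n > 0:
            for _ in range(min(n, len(self.queue))):
                self.queue.popleft()  # first in, first out
            self.last_leak = now
```

Because the queue holds at most `capacity` items per user, memory stays bounded, which is the efficiency advantage noted above.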
#2 Fixed Window

In a fixed window algorithm, a window size of n seconds (typically using human-friendly values such as 60 or 3600 seconds) is used to track the rate. Each incoming request increments the counter for the window. If the counter exceeds a threshold, the request is discarded. The windows are typically defined by the floor of the current timestamp, so 12:00:03 with a 60-second window length would fall in the 12:00:00 window.
The advantage of this algorithm is that it ensures more recent requests get processed without being starved by old ones. However, a single burst of traffic near the boundary of a window can result in twice the rate of requests being processed, because it permits requests in both the current and the next window within a short span. Additionally, if many consumers wait for a reset window, for example at the top of the hour, they may stampede your API at the same time.
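Here is a hedged in-memory sketch of a fixed window counter in Python. The names (`FixedWindowLimiter`, `limit`) are assumptions for illustration; a real deployment would typically keep the counters in a shared store and expire stale windows.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per consumer in fixed time windows. Illustrative sketch."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        # (consumer, window_start) -> count; stale keys are never evicted
        # here, so a real implementation would expire old windows.
        self.counters = defaultdict(int)

    def allow(self, consumer: str) -> bool:
        # Floor the timestamp to the window, e.g. 12:00:03 with a
        # 60-second window falls into the 12:00:00 window.
        window_start = int(time.time()) // self.window * self.window
        key = (consumer, window_start)
        if self.counters[key] >= self.limit:
            return False  # counter exceeded the threshold: discard
        self.counters[key] += 1
        return True
```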
#3 Sliding Log
Sliding log rate limiting involves tracking a timestamped log of each consumer's requests. These logs are usually stored in a hash set or table that is sorted by time. Logs with timestamps beyond a threshold are discarded. When a new request comes in, we calculate the sum of logs to determine the request rate. If the request would exceed the threshold rate, it is held.
The advantage of this algorithm is that it does not suffer from the boundary conditions of fixed windows; the rate limit is enforced precisely. Also, because the sliding log is tracked for each consumer, you don't get the stampede effect that challenges fixed windows. However, it can be very expensive to store an unlimited number of logs for every request. It's also expensive to compute, because each request requires calculating a summation over the consumer's prior requests, potentially across a cluster of servers. As a result, it does not scale well to handle large bursts of traffic or denial-of-service attacks.
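The following Python sketch shows the idea with an in-memory per-consumer log (again, the names are illustrative). It makes the cost visible: the consumer's full request history inside the window must be stored, and walked on every call.

```python
import time
from collections import defaultdict, deque

class SlidingLogLimiter:
    """Tracks a timestamped log per consumer. Illustrative sketch."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(deque)  # consumer -> timestamps, oldest first

    def allow(self, consumer: str) -> bool:
        now = time.monotonic()
        log = self.logs[consumer]
        # Discard log entries with timestamps beyond the threshold.
        while log and log[0] <= now - self.window:
            log.popleft()
        # The number of remaining entries is the rate over the window.
        if len(log) >= self.limit:
            return False  # the request would exceed the threshold rate
        log.append(now)
        return True
```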
#4 Sliding Window
This is a hybrid approach that combines the low processing cost of the fixed window algorithm with the improved boundary conditions of the sliding log. Like the fixed window algorithm, we track a counter for each fixed window. Next, we account for a weighted value of the previous window's request rate based on the current timestamp to smooth out bursts of traffic. For example, if the current window is 25% through, then we weight the previous window's count by 75%. The relatively small number of data points needed to track per key allows us to scale and distribute across large clusters.
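Here is a minimal Python sketch of the weighting, under the same illustrative-naming caveat as the sketches above. Note that each consumer needs only a window start and two counters, which is the small per-key footprint that makes this approach practical to distribute.

```python
import time

class SlidingWindowLimiter:
    """Weights the previous window's count by how much of it still
    overlaps the sliding window. Illustrative in-memory sketch."""

    def __init__(self, limit: int, window_seconds: int = 60):
        self.limit = limit
        self.window = window_seconds
        self.state = {}  # consumer -> (window_start, current, previous)

    def allow(self, consumer: str) -> bool:
        now = time.time()
        window_start = int(now) // self.window * self.window
        start, current, previous = self.state.get(
            consumer, (window_start, 0, 0))
        if window_start > start:
            # Roll over: the old current count becomes the previous count,
            # or zero if more than one full window has elapsed.
            previous = current if window_start - start == self.window else 0
            current, start = 0, window_start
        # e.g. 25% into the window: weight the previous count by 75%.
        elapsed = (now - start) / self.window
        estimated_rate = previous * (1 - elapsed) + current
        if estimated_rate >= self.limit:
            return False
        self.state[consumer] = (start, current + 1, previous)
        return True
```

For instance, with the window 25% elapsed and a previous count of 80, the estimated rate is 80 × 0.75 plus the current window's count, matching the example above.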
We recommend the sliding window approach because it gives you the flexibility to scale rate limiting with good performance. The rate windows are an intuitive way to present rate limit data to API consumers. It also avoids the starvation problem of leaky bucket and the bursting problems of fixed window implementations.