Introduction To Gunicorn Architecture Design

Gunicorn is a very common WSGI server used when deploying Django applications. Because its setup is so simple, though, its internals are often overlooked. In this post, we'll cover its basic design and how it works, along with some best practices and recommendations for using it.

Gunicorn is based on the pre-fork worker model. This means that there is a central master process that forks a set of worker processes up front and then manages them. The master never knows anything about individual clients; all requests and responses are handled completely by the worker processes. Because each worker is a separate process, pre-forking can be used when you have libraries that are NOT thread-safe. It also means that if an error happens while a request is being handled, it only affects the worker process handling that request and not the entire server.
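To make the pre-fork idea concrete, here is a minimal sketch in plain Python. It is not Gunicorn's code: the names `worker_loop`, `read_all`, and `prefork_demo` are made up for illustration, the "response" is just the worker's PID, and it assumes a Unix-like system where `os.fork` is available. The parent creates the listening socket, forks the workers, and never calls `accept` itself.

```python
import os
import signal
import socket


def worker_loop(listener):
    """Runs inside a forked worker: accept, respond, close, one client at a time."""
    while True:
        conn, _addr = listener.accept()
        with conn:
            # A real worker would parse HTTP and call the WSGI app here.
            conn.sendall(str(os.getpid()).encode())


def read_all(sock):
    """Read from sock until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(64)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def prefork_demo(num_workers=3, num_requests=6):
    """Fork workers that share one listening socket, then send them requests.

    Returns (worker_pids, pids_that_answered).
    """
    # The socket is created BEFORE forking, so every worker inherits it.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    address = listener.getsockname()

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child process: serve until killed, never return to the caller.
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        worker_pids.append(pid)

    # For this demo only, the parent also plays the role of the clients.
    answered = []
    try:
        for _ in range(num_requests):
            with socket.create_connection(address, timeout=5) as client:
                answered.append(int(read_all(client).decode()))
    finally:
        # Demo cleanup; a real master would shut its workers down gracefully.
        for pid in worker_pids:
            os.kill(pid, signal.SIGKILL)
        for pid in worker_pids:
            os.waitpid(pid, 0)
        listener.close()
    return worker_pids, answered


if __name__ == "__main__":
    pids, served_by = prefork_demo()
    print("workers:", pids)
    print("requests were answered by:", served_by)
```

Every PID in the second list belongs to a worker, never to the parent, which is the property the pre-fork model relies on.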

Master process

The master process is a simple loop that manages a list of worker processes. It does so by listening for specific signals. TTIN and TTOU tell the master to increase or decrease the number of running workers, while CHLD indicates that a child process has terminated, in which case the master restarts the failed worker.
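The signal handling described above can be sketched roughly as follows. This is a simplified illustration, not Gunicorn's implementation: the `MasterState` class and its attributes are hypothetical, and the lower bound of one worker is an assumption of this sketch. The signal numbers come from Python's standard `signal` module on Unix-like systems.

```python
import signal


class MasterState:
    """Tracks how many workers a master process wants to keep running."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.reap_needed = False  # set when a child has exited

    def handle_signal(self, signum, frame=None):
        if signum == signal.SIGTTIN:
            # TTIN: run one more worker.
            self.num_workers += 1
        elif signum == signal.SIGTTOU:
            # TTOU: run one fewer worker (this sketch never goes below one).
            self.num_workers = max(1, self.num_workers - 1)
        elif signum == signal.SIGCHLD:
            # CHLD: a worker exited; the main loop should reap and replace it.
            self.reap_needed = True

    def install(self):
        """Register the handler so the OS delivers these signals to it."""
        for sig in (signal.SIGTTIN, signal.SIGTTOU, signal.SIGCHLD):
            signal.signal(sig, self.handle_signal)
```

With a running Gunicorn master, the equivalent action from a shell is sending the signal to the master's PID, for example `kill -TTIN <master pid>` to add a worker.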

Workers

A worker is a process responsible for handling incoming client requests. Each worker operates independently of the others and handles one or more requests at a time, depending on its type. Besides the default synchronous worker, there are worker types that use other concurrency models, such as threads or asynchronous event loops.

The number of workers can be configured based on the expected traffic to the web application. Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second. The recommendation is to start with `(2 x $num_cores) + 1`, where $num_cores is the number of cores available on the system/processor.
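As a small sketch, the formula can be computed in a Gunicorn configuration file (a Python file, conventionally `gunicorn.conf.py`), where `workers` is the setting for the number of worker processes. The helper name `suggested_workers` is made up for this example, and the result is a starting point to tune, not a fixed rule.

```python
import multiprocessing


def suggested_workers(num_cores):
    """Starting point from the formula (2 x num_cores) + 1."""
    return (2 * num_cores) + 1


# Gunicorn reads the module-level name `workers` from its config file.
workers = suggested_workers(multiprocessing.cpu_count())
```

On a 4-core machine this gives 9 workers, which falls inside the 4-12 range mentioned above.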

Sync Workers

The default and most basic worker type is the synchronous worker, which handles a single request at a time. Even so, a small pool of sync workers can serve several hundred requests per second, as long as each request is quick. Sync workers are best suited to applications that do not need a high level of concurrency and whose requests are limited by computation rather than by long waits on I/O.

When a client sends a request to the Gunicorn server, the connection is accepted by one of the idle sync workers waiting on the shared listening socket (the master is not involved), and that worker then processes the request synchronously. The sync worker holds onto the client's connection until it has finished processing the request and returned a response.

Sync workers are not well suited to handling high levels of concurrent requests, as each one can only handle a single request at a time. This is why it is important to keep your application fast: things like slow queries and file uploads can hold a worker for longer than they should, slowing down the application as concurrent requests pile up.
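A rough back-of-the-envelope calculation shows why slow requests hurt so much. The function below is an idealized estimate, not a benchmark: it assumes every request takes the same time, that load is spread evenly, and that there is no other overhead. The name `max_requests_per_second` and the example numbers are chosen purely for illustration.

```python
def max_requests_per_second(workers, ms_per_request):
    """Upper bound on throughput when each worker serves one request at a time."""
    return workers * 1000 / ms_per_request


# 9 sync workers, each request taking 50 ms:
fast = max_requests_per_second(workers=9, ms_per_request=50)

# The same 9 workers when a slow query pushes each request to 2 seconds:
slow = max_requests_per_second(workers=9, ms_per_request=2000)

print(fast, slow)
```

Under these assumptions the ceiling drops from 180 requests per second to 4.5, and any requests arriving faster than that have to wait.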

For more details about the other worker types, see the Gunicorn documentation.

Conclusion

Gunicorn is a popular WSGI server for deploying Django applications, thanks to its simple setup and basic design. A good starting point for the number of workers is the simple formula `(2 x $num_cores) + 1`, adjusted for the expected amount and type of requests. With that, we can expect Gunicorn to handle hundreds of requests per second with only a handful of workers. However, things like slow queries, file uploads, long polling, etc., can cause a worker to hold onto a client's connection for a long time, leading to a slowdown in the application. Therefore, it is important to keep the application fast and efficient to ensure it can handle concurrent requests without impacting the user experience.