Tomcat Server Threading Model
Introduction#
Apache Tomcat uses a multi-threaded model to handle concurrent client requests. An acceptor thread accepts incoming connections and hands them to worker threads drawn from a thread pool. Reusing pooled threads instead of creating a new one for every request avoids thread-creation overhead and improves performance.
Tomcat has offered several connector (I/O) modes:
- BIO (Blocking I/O): One thread is tied to each connection (legacy model, removed from recent Tomcat versions).
- NIO (Non-Blocking I/O): Event-driven, so fewer threads can watch many connections (default in modern Tomcat).
- NIO2 (Asynchronous I/O): Uses Java's asynchronous channel API for fully asynchronous I/O and high scalability.
- APR (Apache Portable Runtime): Uses native OS-level I/O through the Tomcat Native library, which can give slightly better performance on some platforms.
By configuring attributes like maxThreads, minSpareThreads, and acceptCount in server.xml, developers can optimize the threading model of Tomcat for high-performance applications.
Tomcat Threading Model#
Tomcat is a widely used open-source servlet container, and Spring Boot web applications embed it by default. It processes incoming HTTP requests with a thread pool, referred to here as TomcatExecutor. When a request arrives, a worker thread is taken from the pool to process it and is returned to the pool once the response is complete. Reusing threads this way improves performance and reduces resource overhead.
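The thread reuse described above can be illustrated with a plain JDK thread pool. This is only a sketch under assumptions: the class name `WorkerPoolReuseDemo` and its method are hypothetical, and a fixed-size `ExecutorService` stands in for Tomcat's own pool, which is sized and managed differently.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Tomcat or Spring Boot.
class WorkerPoolReuseDemo {

    // Submits `requests` simulated requests to a pool of `poolSize` workers
    // and returns the names of the distinct threads that handled them.
    static Set<String> handle(int poolSize, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> workers = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < requests; i++) {
            // Each "request" just records which worker thread served it.
            pool.execute(() -> workers.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return workers;
    }

    public static void main(String[] args) {
        Set<String> workers = handle(4, 100);
        System.out.println(workers.size() + " threads served 100 requests: " + workers);
    }
}
```

However many requests are submitted, the number of distinct worker threads never exceeds the pool size, which is the saving that pooling provides.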
Implementation:#
- Controller Class (Handles HTTP Requests)
- Service Class (Handles Business Logic with Asynchronous Calls)
CompletableFuture is used for asynchronous, non-blocking programming in Java, improving performance by running tasks in parallel. It enhances efficiency by freeing up threads and supports chaining (thenApply, thenCompose) and exception handling (exceptionally, handle). It is well suited to parallel execution of independent tasks, making multiple API/database calls, and handling long-running operations efficiently.
For further information about CompletableFuture, you can check out this website: https://www.codingshuttle.com/blogs/a-comprehensive-guide-to-java-completable-future/
- Data Class (Model for Student Information)
- Thread Configuration (Manages Thread Pool). Here I am using TaskScheduler and ThreadPoolTaskExecutor
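As a rough illustration of the CompletableFuture chaining and error handling mentioned in the list above, here is a minimal sketch. It is not the article's original service code: the class `StudentLookupChain`, the method `findName`, and the returned strings are hypothetical, and a plain JDK `ExecutorService` stands in for Spring's ThreadPoolTaskExecutor so the example runs without Spring.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example; names and values are illustrative only.
class StudentLookupChain {

    // Stand-in for a slow lookup; a non-positive id simulates a failure.
    static String findName(int id) {
        if (id <= 0) {
            throw new IllegalArgumentException("invalid id: " + id);
        }
        return "student-" + id;
    }

    static String describe(int id) {
        // Plain JDK pool used here in place of a Spring ThreadPoolTaskExecutor.
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            return CompletableFuture
                    .supplyAsync(() -> findName(id), executor)   // run the lookup on the pool
                    .thenApply(String::toUpperCase)               // transform the result
                    .thenCompose(name -> CompletableFuture.supplyAsync(
                            () -> name + " (enrolled)", executor)) // chain a second async step
                    .exceptionally(ex -> "unknown student")      // fallback if any step failed
                    .join();                                      // wait for the final value
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(7));
        System.out.println(describe(-1));
    }
}
```

`exceptionally` only runs when an earlier stage failed; `handle` could be used instead when the same callback should see both the value and the exception.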
Tomcat Blocking Flow (Synchronous Processing)#
In the blocking model, a request ties up its worker thread while slow application logic runs, for example while waiting for a database call or processing a long-running task.
Here, the assigned worker thread is blocked, and thus:#
- The thread cannot handle any other request until the job in hand is complete.
- If many threads are blocked at the same time, new requests end up queued or rejected outright.
- This causes performance bottlenecks and slower response times.
- Asynchronous processing (for example, Spring's @Async or reactive programming) keeps worker threads free and improves scalability; faster database queries also shorten the time each thread stays blocked.

Implementation:#
To dispatch work to threads dynamically, we need a dedicated class.
Synchronous request processing (blocking).
Output:#
After calling the API:


- .get() waits for each method to complete before moving to the next one.
- The methods execute sequentially, not in parallel.
- Total execution time = 2s + 2s + 2s = 6 seconds.
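The sequential behaviour summarized above can be reproduced with a small sketch. It is not the article's original controller: the class `SequentialGetDemo`, the task names, and the `delayMillis` parameter are assumptions made for illustration, and each simulated call sleeps instead of doing real work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; task names are illustrative only.
class SequentialGetDemo {

    // Simulates a slow service call by sleeping.
    static String slowCall(String name, long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    // Runs three calls one after another and returns the elapsed milliseconds.
    static long runSequentially(long delayMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        long start = System.nanoTime();
        try {
            // Each get() blocks the caller before the next task is even started.
            CompletableFuture.supplyAsync(() -> slowCall("details", delayMillis), executor).get();
            CompletableFuture.supplyAsync(() -> slowCall("marks", delayMillis), executor).get();
            CompletableFuture.supplyAsync(() -> slowCall("attendance", delayMillis), executor).get();
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("elapsed ms: " + runSequentially(2000));
    }
}
```

Even though three pool threads are available, only one is ever busy, so the total time is roughly the sum of the three delays.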
Tomcat Async (Non-blocking) Flow#
In an asynchronous (non-blocking) flow, when a request involves an async operation (such as @Async or DeferredResult), the worker thread delegates the task to a callback thread managed by an executor (thread pool).
This is how it works:#
- A worker thread first handles the request.
- Asynchronous operations are sent to a callback thread for execution.
- The Tomcat worker thread now becomes free to keep servicing other requests.
- When the async operation is complete, the response is forwarded to the client.
This design improves scalability: a long-running task no longer blocks a worker thread, so Tomcat can process more concurrent requests with the same pool.
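The hand-off between a worker thread and a callback thread can be simulated without a servlet container. The following is a sketch under assumptions, not how Tomcat or Spring implement it internally: `AsyncHandOffDemo` is a hypothetical class, a single-thread executor plays the role of one Tomcat worker, and a second pool plays the role of the callback executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical simulation; not Tomcat or Spring code.
class AsyncHandOffDemo {

    // Simulates a slow operation such as a remote call.
    static String slowTask(int id, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response-" + id;
    }

    // Serves `requests` requests with ONE worker thread; returns elapsed milliseconds.
    static long serve(int requests, long taskMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();      // stand-in for a Tomcat worker
        ExecutorService callbacks = Executors.newFixedThreadPool(requests); // stand-in for the async executor
        long start = System.nanoTime();
        try {
            List<CompletableFuture<String>> responses = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                int id = i;
                CompletableFuture<String> response = new CompletableFuture<>();
                // The worker only hands the slow task off and returns immediately.
                worker.execute(() -> CompletableFuture
                        .supplyAsync(() -> slowTask(id, taskMillis), callbacks)
                        .whenComplete((value, error) -> {
                            if (error != null) {
                                response.completeExceptionally(error);
                            } else {
                                response.complete(value); // "send" the response later
                            }
                        }));
                responses.add(response);
            }
            responses.forEach(CompletableFuture::join); // wait until every response is ready
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            worker.shutdown();
            callbacks.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("3 requests, 1 worker, elapsed ms: " + serve(3, 2000));
    }
}
```

Because the single worker never waits for the slow task, the three requests finish in about the time of one task rather than three.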

Implementation:#
To dispatch work to threads dynamically, we need a dedicated class.
Handles requests asynchronously.
Output:#
After calling the API:


**CompletableFuture.allOf()**
Instead of blocking on .get() after each call, we should:
- Start all async tasks simultaneously.
- Wait for all of them to complete before proceeding.
- Total Time Taken: ~2 seconds instead of ~6
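The three steps above might look like the following sketch. As before, this is an illustration rather than the article's original code: `ParallelAllOfDemo`, the task names, and `delayMillis` are hypothetical, and an explicit three-thread pool is assumed so that all tasks can really run at the same time.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; task names are illustrative only.
class ParallelAllOfDemo {

    // Simulates a slow service call by sleeping.
    static String slowCall(String name, long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    // Starts three calls at once and returns the elapsed milliseconds.
    static long runInParallel(long delayMillis) {
        ExecutorService executor = Executors.newFixedThreadPool(3);
        long start = System.nanoTime();
        try {
            // Start all tasks before waiting on any of them.
            CompletableFuture<String> details =
                    CompletableFuture.supplyAsync(() -> slowCall("details", delayMillis), executor);
            CompletableFuture<String> marks =
                    CompletableFuture.supplyAsync(() -> slowCall("marks", delayMillis), executor);
            CompletableFuture<String> attendance =
                    CompletableFuture.supplyAsync(() -> slowCall("attendance", delayMillis), executor);

            // Wait once, for all of them together.
            CompletableFuture.allOf(details, marks, attendance).join();

            // These join() calls return immediately: every future is already complete.
            String combined = details.join() + "," + marks.join() + "," + attendance.join();
            System.out.println(combined);
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("elapsed ms: " + runInParallel(2000));
    }
}
```

The pool size matters here: with fewer threads than tasks, some tasks would queue behind others and the total time would grow accordingly.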
Default Config for Tomcat#
Tomcat comes with the following default threading configurations:
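- Maximum Threads: 200 (limits concurrent request handling).
- Minimum Idle Threads: 10 (ensures a minimum number of ready worker threads).
- Accept Queue Size (acceptCount): 100 by default. This is the backlog of incoming connections that may wait once Tomcat is at capacity; setting it too high lets requests pile up and increases latency and memory use.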
Customization in Spring Boot (application.properties)#
server.tomcat.threads.max=200
server.tomcat.threads.min-spare=10
server.tomcat.accept-count=100
If the accept queue grows beyond accept-count, new connections are refused (or time out) at the TCP level, so the client does not receive an HTTP response from Tomcat.
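To get a feel for what happens when both the workers and the waiting queue are full, the saturation can be imitated with a bounded JDK thread pool. This is only an analogy under stated assumptions: Tomcat's accept queue is a connection backlog managed by the operating system, not a Java queue, and `BoundedPoolDemo` with its parameters is a hypothetical class written for this sketch.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical analogy; Tomcat itself does not reject requests this way.
class BoundedPoolDemo {

    // Submits `requests` tasks that all stay busy and counts how many are rejected.
    static int countRejected(int maxThreads, int queueCapacity, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity)); // bounded waiting queue
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    // Every task blocks until released, so workers never become free.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // workers busy and queue full
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejected(2, 3, 10));
    }
}
```

With 2 workers and 3 queue slots, only 5 of 10 simultaneous tasks can be held; the rest are turned away, which is why the thread and queue limits should be sized against the expected load.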
Spring Boot Thread Safety#
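Spring beans are singletons by default, which means the same Controller, Service, and Repository instances are shared across the multiple threads that handle concurrent user requests. Spring does not make these beans thread-safe automatically; they are safe only when they hold no mutable shared state or guard it properly.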
Thread Safety Best Practices#
- Design components to be stateless: don't keep user-dependent data in shared beans.
- Pass user state through method parameters and local variables: no shared mutable fields.
It is safe for repositories to hold read-only configuration such as database URL, username, and password because these values are not user-specific and do not change between requests.
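The difference between a stateful and a stateless shared component can be sketched as follows. The classes are hypothetical plain Java objects rather than real Spring beans, used here only to show why per-user data belongs in parameters and not in fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Unsafe when shared: the field can be overwritten by another thread
// between the assignment and the read.
class StatefulGreetingService {
    private String currentUser; // shared mutable state

    String greet(String user) {
        currentUser = user;
        return "Hello, " + currentUser; // may contain another thread's user
    }
}

// Safe to share: all per-user data lives in parameters and local variables.
class StatelessGreetingService {
    String greet(String user) {
        return "Hello, " + user;
    }
}

// Hypothetical driver that calls one shared stateless instance from many threads.
class ThreadSafetyDemo {

    // Returns how many concurrent calls produced a greeting for the wrong user.
    static int countWrongGreetings(int requests) {
        StatelessGreetingService service = new StatelessGreetingService();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                String user = "user-" + i;
                Callable<Boolean> check = () -> service.greet(user).equals("Hello, " + user);
                results.add(pool.submit(check));
            }
            int wrong = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    wrong++;
                }
            }
            return wrong;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("wrong greetings: " + countWrongGreetings(1000));
    }
}
```

The stateless version cannot mix up users because no data outlives a single call; the stateful version may work in light testing and still fail intermittently under concurrent load.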
Conclusion#
This article gave an overview of Tomcat's multi-threading model and how worker threads from a thread pool handle incoming requests. It covered the difference between blocking and non-blocking processing, how async operations affect overall performance, and which thread configuration settings can be tuned for better throughput.