StatefulSet in Kubernetes
Introduction
Managing stateless applications in Kubernetes is straightforward with Deployments, but stateful applications such as databases, message brokers, and distributed systems need more: stable identities, per-pod storage, and ordered rollout. StatefulSet is the Kubernetes workload resource designed for these cases.
In this blog, we’ll explore what a StatefulSet is, how it differs from a Deployment, and how to create and use one.
Understanding Stateful and Stateless Applications
Stateless Applications
A stateless application does not retain any information between requests. Each request is handled independently, without relying on past interactions.
Examples of Stateless Applications:
- Web servers (e.g., Nginx, Apache)
- Microservice APIs (RESTful APIs)
These applications do not require persistent storage because they don’t need to maintain data across restarts.
Stateful Applications
A stateful application, on the other hand, retains information, or "state," across sessions. This means that data from previous interactions is stored and can influence future requests. Stateful applications are often more complex because they rely on persistent storage to maintain their state.
Examples of Stateful Applications:
- Databases (e.g., PostgreSQL, MySQL, MongoDB)
- Message brokers (e.g., Kafka, RabbitMQ)
- Distributed systems (e.g., Zookeeper, Elasticsearch)
For such applications, Kubernetes provides StatefulSet, ensuring that storage, network identities, and pod ordering remain consistent.
What is StatefulSet in Kubernetes?
Kubernetes is best known for managing stateless services. A StatefulSet is the Kubernetes workload resource for stateful applications, such as databases and messaging queues, that require stable network identities, dedicated persistent storage per pod, and ordered deployment and scaling. Unlike stateless applications, stateful applications maintain internal state or data that must be preserved when a pod is restarted or rescheduled.
When Should You Use StatefulSet?
Use StatefulSet when:
- Your application requires stable network identities (e.g., databases).
- You need ordered deployment and scaling (e.g., Kafka, Zookeeper).
- Your app requires persistent storage per pod (e.g., PostgreSQL, MySQL).
Why Not Use a Deployment Instead of a StatefulSet?
While Deployments work well for stateless applications, they are not ideal for stateful workloads like databases. This is because:
- Pods in a Deployment do not retain a stable identity, whereas a StatefulSet gives each pod a persistent network identity and its own storage.
- A Deployment creates and removes pods in no guaranteed order, whereas a StatefulSet creates and deletes pods in a controlled order, which is crucial for applications that rely on consistent state, like databases or distributed systems.
If your application needs stable storage, ordered scaling, or persistent network identity, a StatefulSet is the right choice. Otherwise, a Deployment is sufficient for most stateless applications like web servers or APIs.
Feature | Deployment | StatefulSet |
---|---|---|
Pod Names | Randomly assigned | Fixed, sequential (e.g., `postgres-0`, `postgres-1`) |
Network Identity | Ephemeral | Stable |
Storage | Shared or ephemeral | Persistent, per-pod |
Scaling | Pods scale in any order | Pods scale in a defined order |
Use Case | Web apps, APIs | Databases, message brokers |
Example: When to Use Deployment vs. StatefulSet
Use Deployment (Stateless Application)
Imagine you are deploying a web application that serves user requests. It doesn’t store data locally because it relies on an external database. Here, a Deployment is ideal because:
- Pods can scale up or down in any order.
- They do not require persistent storage.
- Any pod can handle incoming requests since there is no unique identity requirement.
Example: A Node.js backend API that connects to an external database.
Use StatefulSet (Stateful Application)
Now, suppose you need to deploy a PostgreSQL database. Each database instance needs:
1. A stable hostname to form a cluster or replication setup
In a Deployment, every pod gets a generated name like `pod-xyz-1234`, and that name changes whenever the pod is replaced, for example after a crash, a reschedule, or a scaling event. This can be a problem for applications that need to communicate reliably with specific instances, such as database clusters or message brokers.
Why is this important?
- In a database cluster (e.g., PostgreSQL with replication), the primary database must always know where its replicas are.
- If the pod name keeps changing (like in a Deployment), the primary can’t keep track of its replicas.
- StatefulSet ensures each pod has a predictable, fixed hostname, like `postgres-0`, `postgres-1`, etc.
Example:
- A PostgreSQL replica set: The primary database (`postgres-0`) must know its replicas (`postgres-1`, `postgres-2`).
- If we used a Deployment instead, pod names would change every time a pod was replaced, breaking the replication setup.
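The naming scheme can be sketched in a few lines of shell. This only illustrates the pattern `<statefulset>-<ordinal>.<service>.<namespace>.svc.cluster.local`; the governing Service name `postgres`, the `default` namespace, and the default `cluster.local` cluster domain are assumptions, not values taken from this article.

```shell
# Print the stable DNS name of every pod in a StatefulSet.
# Arguments: StatefulSet name, headless Service name, namespace, replica count.
# Assumes the default cluster domain "cluster.local".
stable_dns_names() {
  statefulset="$1"; service="$2"; namespace="$3"; replicas="$4"
  ordinal=0
  while [ "$ordinal" -lt "$replicas" ]; do
    echo "${statefulset}-${ordinal}.${service}.${namespace}.svc.cluster.local"
    ordinal=$((ordinal + 1))
  done
}

# Hypothetical example values: 3 replicas of "postgres" in "default".
stable_dns_names postgres postgres default 3
```

Because these names depend only on the StatefulSet name and the ordinal, a replica can be configured with the primary's address once and keep using it after any pod is recreated.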
2. Persistent storage to ensure data is not lost when a pod restarts
By default, a pod's local storage (the container filesystem and any `emptyDir` volumes) is ephemeral: when the pod is deleted because of a scale-down, a failure, or a reschedule, that data is deleted with it. This is fine for stateless apps but a disaster for databases or other applications that store data.
Why is this important?
- If a database pod restarts and loses its storage, all stored data would be lost.
- StatefulSet ensures that each pod gets its own PersistentVolumeClaim (PVC), bound to a Persistent Volume (PV) that remains even if the pod is restarted or rescheduled.
Example:
- Suppose you have a Postgres database running in Kubernetes.
- If you deploy it with a Deployment and no persistent volume, and the pod crashes or is rescheduled, the replacement pod starts with fresh storage, losing all previous data.
- But with a StatefulSet, Kubernetes ensures that the same storage (Persistent Volume) is reattached to the replacement pod.
Figure: StatefulSet in Kubernetes
3. A specific order of scaling (e.g., primary DB first, then replicas)
Scaling stateful applications is different from scaling stateless ones. In a Deployment, new pods can be created in any order, which is fine for web servers but problematic for databases.
Why is this important?
- Many databases and message brokers have a leader-follower model, where one instance is the primary and others are replicas.
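- The primary usually has to start before the replicas to ensure a consistent state.
- A StatefulSet creates pods in ordinal order (`-0`, then `-1`, and so on) and by default waits for each pod to be ready before starting the next, so an application that treats ordinal 0 as the primary gets it started first.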
Example:
- A Kafka cluster consists of multiple brokers.
- If we scale up a Kafka cluster randomly (as in a Deployment), it can cause issues in leader election and partitioning.
- StatefulSet ensures the brokers start in a structured way, keeping the cluster stable.
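The ordering rule can be modelled with two small shell functions. This is a sketch of the default ordered behavior only (ascending ordinals on scale-up, descending on scale-down); it does not talk to a cluster, and the StatefulSet name `kafka` is a hypothetical example.

```shell
# Order in which pods are created when scaling from $2 up to $3 replicas.
scale_up_order() {
  name="$1"; ordinal="$2"; target="$3"
  while [ "$ordinal" -lt "$target" ]; do
    echo "${name}-${ordinal}"
    ordinal=$((ordinal + 1))
  done
}

# Order in which pods are terminated when scaling from $2 down to $3 replicas.
scale_down_order() {
  name="$1"; ordinal=$(($2 - 1)); target="$3"
  while [ "$ordinal" -ge "$target" ]; do
    echo "${name}-${ordinal}"
    ordinal=$((ordinal - 1))
  done
}

scale_up_order kafka 0 3     # prints kafka-0, kafka-1, kafka-2 (one per line)
scale_down_order kafka 3 1   # prints kafka-2, kafka-1 (one per line)
```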
Deploying Nginx StatefulSet in Kubernetes
Let’s deploy an Nginx StatefulSet with persistent storage.
Step 1: Create a Headless Service
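The manifest for this step is not shown in the text. A minimal sketch of what it might look like follows, written to a file from the shell. The Service name `headless-service` comes from this article; the file name, the `app: nginx` label, and the port name are assumptions.

```shell
# Write a headless Service manifest to a file (file name is an assumption).
cat > headless-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: headless-service
spec:
  clusterIP: None        # "None" is what makes the Service headless
  selector:
    app: nginx           # assumed label; must match the StatefulSet's pods
  ports:
    - name: web
      port: 80
EOF
```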
Why Headless Service?
- With `clusterIP: None`, the Service gets no virtual IP of its own, and each pod gets a unique DNS entry instead.
- Example: `mystatefulset-0.headless-service`, `mystatefulset-1.headless-service`.
Step 2: Create the StatefulSet
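The StatefulSet manifest is also missing from the text. The sketch below is reconstructed from the details this article does give: the name `mystatefulset`, three Nginx replicas, the Service `headless-service`, and a volume named `data` mounted at `/data`. The file name, the `app: nginx` label, the unpinned image, and the use of `1Gi` for the "1GB" request are assumptions, and the cluster is assumed to have a default StorageClass.

```shell
# Write a StatefulSet manifest to a file (file name is an assumption).
cat > statefulset.yaml <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mystatefulset
spec:
  serviceName: headless-service   # governing headless Service
  replicas: 3
  selector:
    matchLabels:
      app: nginx                  # assumed label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx            # tag left unpinned in this sketch
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:           # one PVC is created per pod from this template
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
EOF
```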
What Happens Here?
- Three Nginx pods (`mystatefulset-0`, `mystatefulset-1`, `mystatefulset-2`) are created sequentially.
- Each pod gets a stable hostname (for example, `mystatefulset-0.headless-service`).
- Each pod gets a separate 1GB persistent volume (`data`).
Step 3: Deploy the StatefulSet
Apply the YAML files:
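The commands are not included in the text. Assuming the two manifests were saved as `headless-service.yaml` and `statefulset.yaml` (hypothetical file names), they could be applied like this:

```shell
# Create the headless Service first so the pods' DNS records can be published.
kubectl apply -f headless-service.yaml
kubectl apply -f statefulset.yaml
```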
Verify the running pods:
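One way to do this, assuming the pods carry the label `app=nginx` (an assumption; adjust the selector to whatever labels your manifest uses):

```shell
# List the pods that belong to the StatefulSet.
kubectl get pods -l app=nginx
```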
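Expected output: three pods named `mystatefulset-0`, `mystatefulset-1`, and `mystatefulset-2`, each in the `Running` state.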
Step 4: Verify StatefulSet Behavior
Check Pod Hostnames
Each pod gets a stable hostname:
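A short loop can confirm this, assuming three replicas and that the container image ships the `hostname` command:

```shell
# Ask each pod for its own hostname.
for i in 0 1 2; do
  kubectl exec "mystatefulset-$i" -- hostname
done
```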
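Expected output: each pod reports its own name (`mystatefulset-0`, `mystatefulset-1`, `mystatefulset-2`) as its hostname.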
Check Persistent Storage
Each pod has its own volume mounted at `/data`.
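One way to check the mount, assuming the image includes `df`:

```shell
# Show the filesystem backing /data in the first pod.
kubectl exec mystatefulset-0 -- df -h /data
```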
Step 5: Scale the StatefulSet
Scale Up
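The scale command is not shown in the text. A sketch, assuming the goal is five replicas (which matches the two new pod names mentioned next):

```shell
# Grow the StatefulSet from 3 to 5 replicas.
kubectl scale statefulset mystatefulset --replicas=5
```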
This will create mystatefulset-3 and mystatefulset-4 in order.
Scale Down
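The target replica count is not stated in the text. This sketch assumes 3, which is consistent with the later step that scales back up to 5 and creates `mystatefulset-3` and `mystatefulset-4` again:

```shell
# Shrink the StatefulSet back to 3 replicas (assumed target).
kubectl scale statefulset mystatefulset --replicas=3
```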
Pods are removed in reverse ordinal order: `mystatefulset-4` first, then `mystatefulset-3`. Scaling back to 3 replicas leaves `mystatefulset-0`, `mystatefulset-1`, and `mystatefulset-2` running.
Step 6: Verify Persistent Storage for Each Pod
Each pod in the StatefulSet has its own Persistent Volume (PV).
Let's check it step by step.
- List the Persistent Volume Claims (PVCs)
Run the following command to see the PVCs created for each pod:
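The command is missing from the text; the standard way to list claims in the current namespace is:

```shell
# List all PersistentVolumeClaims in the current namespace.
kubectl get pvc
```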
Expected output: one PVC per pod, each in the `Bound` state.
What this means:
- Each pod (`mystatefulset-0`, `mystatefulset-1`, etc.) has a separate 1GB Persistent Volume (PV).
- The PVC names are generated from the volume claim name and the pod name (`data-mystatefulset-0`, `data-mystatefulset-1`).
- Each PVC is bound to a unique PV.
- Check Storage Inside a Pod
Now, let's log into a pod and check if its `/data` directory is persistent.
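The command itself is missing from the text. One way to open a shell in the first pod, assuming the image provides `/bin/sh`:

```shell
# Open an interactive shell inside mystatefulset-0.
kubectl exec -it mystatefulset-0 -- /bin/sh
```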
Inside the container, run
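The original commands are not shown. A minimal sketch that writes a marker file to the volume and reads it back; the file name `/data/test.txt` and its contents are hypothetical:

```shell
# Run these inside the container, not on your workstation.
echo "hello from mystatefulset-0" > /data/test.txt
cat /data/test.txt
```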
Expected output: the contents you just wrote under `/data`.
- Restart the Pod & Verify Data Persistence
Now, let’s delete the pod and check if the data is still there
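The delete command is not shown in the text; for the first pod it would be:

```shell
# Delete the pod; the StatefulSet controller recreates it under the same name.
kubectl delete pod mystatefulset-0
```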
Wait for it to restart, then log in again:
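A sketch of this check, assuming a marker file such as `/data/test.txt` (a hypothetical name) was written to the volume before the pod was deleted:

```shell
# Watch until mystatefulset-0 is Running again, then press Ctrl+C.
kubectl get pods --watch

# Read the marker file back from the recreated pod.
kubectl exec mystatefulset-0 -- cat /data/test.txt
```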
Expected output: the data written before the pod was deleted.
Why is the data still there?
- Because each pod gets a dedicated Persistent Volume (PV).
- Even if the pod is deleted, its PVC and the underlying volume remain and are reattached when the StatefulSet recreates the pod.
Step 7: Scale StatefulSet & Check New Storage
Scale Up to 5 Pods
Verify New PVCs
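Neither command is shown in the text. Both parts of this step can be sketched together:

```shell
# Scale up to 5 replicas, then list the claims again.
kubectl scale statefulset mystatefulset --replicas=5
kubectl get pvc
```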
Expected output: two additional PVCs, `data-mystatefulset-3` and `data-mystatefulset-4`.
What this confirms:
- New pods (`mystatefulset-3`, `mystatefulset-4`) also get their own 1GB Persistent Volume automatically.
Step 8: Delete the StatefulSet
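The delete command is missing from the text; for the StatefulSet used in this walkthrough it would be:

```shell
# Delete the StatefulSet and its pods.
kubectl delete statefulset mystatefulset
```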
Important! By default, deleting the StatefulSet does not delete its PersistentVolumeClaims or the Persistent Volumes bound to them.
To delete the storage:
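The command is not shown in the text. A sketch that deletes the claims by name, assuming the StatefulSet was scaled to five replicas at some point so that claims 0 through 4 exist:

```shell
# Delete the per-pod claims. Whether the underlying PersistentVolumes are
# removed as well depends on each volume's reclaim policy.
kubectl delete pvc \
  data-mystatefulset-0 data-mystatefulset-1 data-mystatefulset-2 \
  data-mystatefulset-3 data-mystatefulset-4
```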
Conclusion
In this blog, we explored StatefulSets in Kubernetes and their significance for managing stateful applications like databases and distributed systems. We covered how they ensure stable network identities, persistent storage, and ordered scaling, making them essential for workloads that require data consistency. By following the provided example, you can deploy and manage StatefulSets effectively in Kubernetes environments.