StatefulSet in Kubernetes

Introduction#

StatefulSets are essential when running stateful applications in Kubernetes. Managing stateless applications is straightforward with Deployments, but stateful applications such as databases, message brokers, and distributed systems need stable identities and storage that a Deployment does not provide. StatefulSet is the Kubernetes workload resource designed to manage such workloads.

In this blog, we’ll explore what StatefulSet is, how it differs from Deployments, and how to use it.

Understanding Stateful and Stateless Applications#

Stateless Applications#

A stateless application does not retain any information between requests. Each request is handled independently, without relying on past interactions.

Examples of Stateless Applications:

Web servers (e.g., Nginx, Apache)
Microservice APIs (RESTful APIs)

These applications do not require persistent storage because they don’t need to maintain data across restarts.

Stateful Applications#

A stateful application, on the other hand, retains information, or "state," across sessions. This means that data from previous interactions is stored and can influence future requests. Stateful applications are often more complex because they rely on persistent storage to maintain their state.

Examples of Stateful Applications:

Databases (e.g., PostgreSQL, MySQL, MongoDB)
Message brokers (e.g., Kafka, RabbitMQ)
Distributed systems (e.g., Zookeeper, Elasticsearch)

For such applications, Kubernetes provides StatefulSet, ensuring that storage, network identities, and pod ordering remain consistent.

What is StatefulSet in Kubernetes?#

Kubernetes is best known for managing stateless services, but it also supports stateful ones. A StatefulSet is the Kubernetes workload resource designed to manage stateful applications, such as databases, messaging queues, and other applications that require stable network identities, unique persistent storage, and ordered deployment and scaling. Unlike stateless applications, stateful applications maintain internal state or data that needs to be preserved across pod restarts or rescheduling.

When Should You Use StatefulSet?#

Use StatefulSet when:

Your application requires stable network identities (e.g., databases).
You need ordered deployment and scaling (e.g., Kafka, Zookeeper).
Your app requires persistent storage per pod (e.g., PostgreSQL, MySQL).

Why Not Use a Deployment Instead of a StatefulSet?#

While Deployments work well for stateless applications, they are not ideal for stateful workloads like databases. This is because:

Pods in a Deployment do not retain a stable identity, whereas StatefulSets provide persistent network identities and storage per pod.
Deployments create and remove pods in no particular order, but StatefulSets ensure that pods are created and deleted in a controlled order, which is crucial for applications that rely on consistent state, like databases or distributed systems.

If your application needs stable storage, ordered scaling, or persistent network identity, a StatefulSet is the right choice. Otherwise, a Deployment is sufficient for most stateless applications like web servers or APIs.

Feature	Deployment	StatefulSet
Pod Names	Randomly assigned	Fixed, sequential (e.g., `postgres-0`, `postgres-1`)
Network Identity	Ephemeral	Stable
Storage	Shared or ephemeral	Persistent, per-pod
Scaling	Pods scale in any order	Pods scale in a defined order
Use Case	Web apps, APIs	Databases, message brokers

Example: When to Use Deployment vs. StatefulSet#

Use Deployment (Stateless Application)#

Imagine you are deploying a web application that serves user requests. It doesn’t store data locally because it relies on an external database. Here, a Deployment is ideal because:

Pods can scale up or down in any order.
They do not require persistent storage.
Any pod can handle incoming requests since there is no unique identity requirement.

Example: A Node.js backend API that connects to an external database.

Use StatefulSet (Stateful Application)#

Now, suppose you need to deploy a PostgreSQL database. Each database instance needs:

1. A stable hostname to form a cluster or replication setup.#

In a Deployment, every pod gets a generated name like pod-xyz-1234, and whenever a pod is replaced (after a failure, a rollout, or rescheduling) the new pod gets a different name and IP address. This can be a problem for applications that need to communicate reliably with specific instances, such as database clusters or message brokers.

Why is this important?

In a database cluster (e.g., PostgreSQL with replication), the primary database must always know where its replicas are.
If the pod name keeps changing (like in a Deployment), the primary can’t keep track of its replicas.
StatefulSet ensures each pod has a predictable, fixed hostname, like postgres-0, postgres-1, etc.

Example:

A PostgreSQL replication setup: The primary database (postgres-0) must know its replicas (postgres-1, postgres-2).
If we used a Deployment instead, pod names would change every time they restarted, breaking the replication setup.

2. Persistent storage to ensure data is not lost when a pod restarts.#

By default, a pod's storage is ephemeral: the container filesystem and emptyDir volumes are discarded when the pod is deleted (for example when it is replaced after a failure or removed during a scale-down). This is fine for stateless apps but a disaster for databases or applications that store data.

Why is this important?

If a database pod restarts and loses its storage, all stored data would be lost.
StatefulSet ensures that each pod gets its own PersistentVolumeClaim (PVC), and the Persistent Volume (PV) bound to it remains even if the pod restarts or is rescheduled.

Example:

Suppose you have a Postgres database running in Kubernetes.
If you deploy it with a Deployment and no persistent volume, and the pod crashes or is rescheduled, the replacement pod will start with fresh storage, losing all previous data.
But with a StatefulSet, Kubernetes ensures that the same storage (Persistent Volume) is attached to the new pod.

Figure: StatefulSet in Kubernetes

3. A specific order of scaling (e.g., primary DB first, then replicas).#

Scaling in stateful applications is different from stateless ones. In a Deployment, new pods can be created in any order, which is fine for web servers but problematic for databases.

Why is this important?

Many databases and message brokers have a leader-follower model, where one instance is the primary and others are replicas.
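In such setups the primary usually has to start before the replicas to ensure a consistent state.
StatefulSet creates pods one at a time in ordinal order (postgres-0, then postgres-1, and so on), so an application that treats pod 0 as the primary can rely on it starting first. Kubernetes itself does not know which pod is the primary; that role is assigned by the application.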

Example:

A Kafka cluster consists of multiple brokers.
If we scale up a Kafka cluster randomly (as in a Deployment), it can cause issues in leader election and partitioning.
StatefulSet ensures the brokers start in a structured way, keeping the cluster stable.
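
The ordering guarantee can be illustrated without a cluster. The sketch below is a plain shell function (the name `scale_order` is hypothetical, not a kubectl feature) that prints the order in which a StatefulSet using the default `OrderedReady` pod management policy would create or remove pods when its replica count changes. It only models the ordering; the real controller also waits for each pod to be Running and Ready before moving on to the next one.

```shell
# Print the order in which pods are created or removed when a StatefulSet
# named $1 goes from $2 replicas to $3 replicas (OrderedReady policy assumed).
scale_order() {
  name="$1"; from="$2"; to="$3"
  if [ "$to" -gt "$from" ]; then
    # Scale up: lowest new ordinal first.
    i="$from"
    while [ "$i" -lt "$to" ]; do
      echo "create ${name}-${i}"
      i=$((i + 1))
    done
  else
    # Scale down: highest ordinal first.
    i=$((from - 1))
    while [ "$i" -ge "$to" ]; do
      echo "delete ${name}-${i}"
      i=$((i - 1))
    done
  fi
}

scale_order kafka 0 3   # create kafka-0, then kafka-1, then kafka-2
scale_order kafka 3 1   # delete kafka-2, then kafka-1
```

The StatefulSet name `kafka` is only an example; any name produces the same pattern of ordinals.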

Deploying Nginx StatefulSet in Kubernetes#

Let’s deploy an Nginx StatefulSet with persistent storage.

Step 1: Create a Headless Service#


apiVersion: v1
kind: Service
metadata:
  name: headless-service
spec:
  clusterIP: None  # This makes it a headless service
  selector:
    app: myapp
  ports:
    - port: 4000
      targetPort: 80

Why Headless Service?

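Setting clusterIP: None means the Service gets no virtual IP, so DNS resolves the Service name directly to the pod IPs. Combined with the StatefulSet's serviceName field, each pod also gets its own stable DNS entry.
Example: mystatefulset-0.headless-service, mystatefulset-1.headless-service.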
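
The short names above are relative to the pod's namespace. As a rough sketch, the fully qualified names follow the pattern `<pod>.<service>.<namespace>.svc.<cluster domain>`. The helper below (a hypothetical function, not part of kubectl) prints them, assuming the `default` namespace and the commonly used `cluster.local` cluster domain; both may differ in your cluster.

```shell
# Print the per-pod DNS names for a StatefulSet governed by a headless Service.
# Arguments: StatefulSet name, Service name, replica count,
#            namespace (assumed "default"), cluster domain (assumed "cluster.local").
pod_dns_names() {
  set_name="$1"; service="$2"; replicas="$3"
  namespace="${4:-default}"; domain="${5:-cluster.local}"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${set_name}-${i}.${service}.${namespace}.svc.${domain}"
    i=$((i + 1))
  done
}

pod_dns_names mystatefulset headless-service 3
```

Clients in the same namespace can normally use the shorter `mystatefulset-0.headless-service` form, since the namespace and cluster domain are added through the pod's DNS search path.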

Step 2: Create the StatefulSet#


apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mystatefulset
spec:
  selector:
    matchLabels:
      app: myapp
  serviceName: headless-service  # Uses headless service for stable DNS
  replicas: 3  # Number of pods
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - name: data
          mountPath: /data  # Persistent storage mount path
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard  # StorageClass named "standard"; omit this field to use the cluster's default class
      resources:
        requests:
          storage: 1Gi  # One 1Gi PersistentVolumeClaim per pod

What Happens Here?

Three Nginx pods (mystatefulset-0, mystatefulset-1, mystatefulset-2) are created sequentially.
Each pod gets a stable hostname (mystatefulset-0.headless-service).
Each pod gets its own 1Gi persistent volume (data).

Step 3: Deploy the StatefulSet#

Apply the YAML files:


kubectl apply -f headless-service.yaml
kubectl apply -f mystatefulset.yaml

Verify the running pods:


kubectl get pods

Expected output:


NAME                 READY   STATUS    RESTARTS   AGE
mystatefulset-0      1/1     Running   0          30s
mystatefulset-1      1/1     Running   0          20s
mystatefulset-2      1/1     Running   0          10s
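
Because the pods start one after another, the list may be incomplete if you check too early. Instead of polling `kubectl get pods`, you can block until the rollout finishes. The wrapper below is a small sketch; the function name and the 120s default timeout are assumptions for this walkthrough, while `kubectl rollout status` and its `--timeout` flag are standard kubectl.

```shell
# Block until every replica of the StatefulSet is ready, or fail after a timeout.
# The function name and the 120s default are assumptions for this walkthrough.
wait_for_statefulset() {
  name="$1"; timeout="${2:-120s}"
  kubectl rollout status "statefulset/${name}" --timeout="${timeout}"
}

# Usage (requires access to a cluster):
#   wait_for_statefulset mystatefulset
```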

Step 4: Verify StatefulSet Behavior#

Check Pod Hostnames

Each pod gets a stable hostname:


kubectl exec -it mystatefulset-0 -- hostname
kubectl exec -it mystatefulset-1 -- hostname
kubectl exec -it mystatefulset-2 -- hostname

Expected output:


mystatefulset-0
mystatefulset-1
mystatefulset-2
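
If you prefer a single check over three separate commands, the comparison can be scripted. This is a minimal sketch with a hypothetical function name; it assumes pods 0 to replicas-1 are already Running and that the image provides a `hostname` binary, as the nginx image used here does.

```shell
# Check that each pod's hostname matches its pod name.
# The function name is hypothetical; pods 0..replicas-1 must be Running.
check_hostnames() {
  name="$1"; replicas="$2"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    pod="${name}-${i}"
    actual="$(kubectl exec "$pod" -- hostname)" || return 1
    if [ "$actual" = "$pod" ]; then
      echo "ok: $pod"
    else
      echo "mismatch: $pod reports $actual"
      return 1
    fi
    i=$((i + 1))
  done
}

# Usage (requires access to a cluster):
#   check_hostnames mystatefulset 3
```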

Check Persistent Storage

Each pod has its own volume mounted at /data.


kubectl exec -it mystatefulset-0 -- ls /data

Step 5: Scale the StatefulSet#

Scale Up


kubectl scale statefulset mystatefulset --replicas=5

This will create mystatefulset-3 and mystatefulset-4 in order.

Scale Down


kubectl scale statefulset mystatefulset --replicas=2

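This will remove mystatefulset-4, then mystatefulset-3, then mystatefulset-2, keeping mystatefulset-0 and mystatefulset-1. The PVCs of the removed pods are kept by default, so their data is available again if you scale back up.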

Step 6: Verify Persistent Storage for Each Pod#

Each pod in the StatefulSet has its own Persistent Volume (PV).

Let's check it step by step.

List the Persistent Volume Claims (PVCs)

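Run the following command to see the PVCs created for each pod. The output below assumes three replicas; if you scaled up to five in Step 5, data-mystatefulset-3 and data-mystatefulset-4 are listed as well, because PVCs are kept by default when a StatefulSet scales down: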


kubectl get pvc

Expected output:


NAME                   STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS
data-mystatefulset-0   Bound    pvc-xxxxx   1Gi        RWO            standard
data-mystatefulset-1   Bound    pvc-yyyyy   1Gi        RWO            standard
data-mystatefulset-2   Bound    pvc-zzzzz   1Gi        RWO            standard

What this means:

Each pod (mystatefulset-0, mystatefulset-1, etc.) has a separate 1Gi Persistent Volume (PV).
The PVC names are generated automatically from the volumeClaimTemplate name, the StatefulSet name, and the pod ordinal (data-mystatefulset-0, data-mystatefulset-1).
Each PVC is bound to a unique PV.
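
The naming rule is simple enough to reproduce by hand, which helps when you need to find or pre-create a claim for a specific pod. The function below is an illustrative sketch (its name is hypothetical) that builds the names from the pattern `<template name>-<StatefulSet name>-<ordinal>`.

```shell
# Print the PVC names a StatefulSet creates from one volumeClaimTemplate.
# Pattern: <template name>-<StatefulSet name>-<ordinal>
pvc_names() {
  template="$1"; set_name="$2"; replicas="$3"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${template}-${set_name}-${i}"
    i=$((i + 1))
  done
}

pvc_names data mystatefulset 3
```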

Check Storage Inside a Pod

Now, let's log into a pod and check if its /data directory is persistent.


kubectl exec -it mystatefulset-0 -- /bin/sh

Inside the container, run:


ls -l /data
echo "Hello from mystatefulset-0" > /data/testfile.txt
cat /data/testfile.txt
exit

Expected output:


Hello from mystatefulset-0

Restart the Pod & Verify Data Persistence

Now, let’s delete the pod and check if the data is still there:


kubectl delete pod mystatefulset-0

Wait for the pod to be recreated and become Ready, then read the file again:


kubectl exec mystatefulset-0 -- cat /data/testfile.txt

Expected output:


Hello from mystatefulset-0

Why is the data still there?

Because each pod gets a dedicated Persistent Volume (PV).
Even if the pod is deleted, its storage remains and is reattached when the pod restarts.
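
The manual check above can also be scripted. The following is a sketch under a few assumptions: the function name is hypothetical, the file is the `/data/testfile.txt` written earlier, and the retry limit of 10 attempts is arbitrary. It reads the file, deletes the pod, waits for the recreated pod to become Ready, and compares the contents.

```shell
# Delete a StatefulSet pod and confirm a file on its volume survives.
# Function name, default path, and retry limit are assumptions for this sketch.
verify_persistence() {
  pod="$1"
  file="${2:-/data/testfile.txt}"
  before="$(kubectl exec "$pod" -- cat "$file")" || return 1
  kubectl delete pod "$pod" || return 1
  # The controller recreates the pod under the same name; retry until it is Ready.
  attempts=0
  until kubectl wait --for=condition=Ready "pod/${pod}" --timeout=60s; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 10 ]; then
      return 1
    fi
    sleep 2
  done
  after="$(kubectl exec "$pod" -- cat "$file")" || return 1
  [ "$before" = "$after" ]
}

# Usage (requires access to a cluster):
#   verify_persistence mystatefulset-0 && echo "data survived"
```

The retry loop is there because `kubectl wait` can fail with a "not found" error in the short window between the old pod disappearing and the controller creating its replacement.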

Step 7: Scale StatefulSet & Check New Storage#

Scale Up to 5 Pods


kubectl scale statefulset mystatefulset --replicas=5

Verify New PVCs


kubectl get pvc

Expected output:


NAME                   STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS
data-mystatefulset-0   Bound    pvc-xxxxx   1Gi        RWO            standard
data-mystatefulset-1   Bound    pvc-yyyyy   1Gi        RWO            standard
data-mystatefulset-2   Bound    pvc-zzzzz   1Gi        RWO            standard
data-mystatefulset-3   Bound    pvc-aaaaa   1Gi        RWO            standard
data-mystatefulset-4   Bound    pvc-bbbbb   1Gi        RWO            standard

What this confirms:

New pods (mystatefulset-3, mystatefulset-4) also get their own 1Gi Persistent Volume automatically.

Step 8: Delete the StatefulSet#


kubectl delete statefulset mystatefulset

Important! Deleting the StatefulSet does not delete its PVCs (or the Persistent Volumes bound to them) by default.

To delete the storage:


kubectl delete pvc -l app=myapp

Conclusion#

In this blog, we explored StatefulSets in Kubernetes and their significance for managing stateful applications like databases and distributed systems. We covered how they ensure stable network identities, persistent storage, and ordered scaling, making them essential for workloads that require data consistency. By following the provided example, you can deploy and manage StatefulSets effectively in Kubernetes environments.
