Spring Boot HandBook

    StatefulSet in Kubernetes

    Introduction#

    Managing stateless applications in Kubernetes is straightforward with Deployments. Stateful applications such as databases, message brokers, and distributed systems need more: stable identities, per-pod storage, and ordered rollout. StatefulSet is the Kubernetes workload resource designed for them.

    In this blog, we’ll explore what a StatefulSet is, how it differs from a Deployment, and how to create and use one.

    Understanding Stateful and Stateless Applications#

    Stateless Applications#

    A stateless application does not retain any information between requests. Each request is handled independently, without relying on past interactions.

    Examples of Stateless Applications:

    • Web servers (e.g., Nginx, Apache)
    • Microservice APIs (e.g., RESTful APIs)

    These applications do not require persistent storage because they don’t need to maintain data across restarts.

    Stateful Applications#

    A stateful application, on the other hand, retains information, or "state," across sessions. This means that data from previous interactions is stored and can influence future requests. Stateful applications are often more complex because they rely on persistent storage to maintain their state.

    Examples of Stateful Applications:

    • Databases (e.g., PostgreSQL, MySQL, MongoDB)
    • Message brokers (e.g., Kafka, RabbitMQ)
    • Distributed systems (e.g., ZooKeeper, Elasticsearch)

    For such applications, Kubernetes provides StatefulSet, ensuring that storage, network identities, and pod ordering remain consistent.

    What is StatefulSet in Kubernetes?#

    Kubernetes is best known for running stateless services, but it also supports stateful ones. A StatefulSet is the workload resource for applications such as databases and message queues that need stable network identities, dedicated persistent storage per pod, and ordered deployment and scaling. Unlike stateless applications, these keep internal state or data that must survive pod restarts and rescheduling.

    When Should You Use StatefulSet?#

    Use StatefulSet when:

    • Your application requires stable network identities (e.g., databases).
    • You need ordered deployment and scaling (e.g., Kafka, ZooKeeper).
    • Your app requires persistent storage per pod (e.g., PostgreSQL, MySQL).

    Why Not Use a Deployment Instead of a StatefulSet?#

    While Deployments work well for stateless applications, they are not ideal for stateful workloads like databases. This is because:

    • Pods in a Deployment are interchangeable and get a new name each time they are replaced, whereas StatefulSets provide persistent network identities and storage per pod.
    • Deployments create and remove pods in no guaranteed order, but StatefulSets create and delete pods in a controlled order, which is crucial for applications that rely on consistent state, like databases or distributed systems.

    If your application needs stable storage, ordered scaling, or persistent network identity, a StatefulSet is the right choice. Otherwise, a Deployment is sufficient for most stateless applications like web servers or APIs.

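    | Feature          | Deployment             | StatefulSet                                      |
    |------------------|------------------------|--------------------------------------------------|
    | Pod Names        | Randomly assigned      | Fixed, sequential (e.g., postgres-0, postgres-1) |
    | Network Identity | Ephemeral              | Stable                                           |
    | Storage          | Shared or ephemeral    | Persistent, per-pod                              |
    | Scaling          | Pods scale in any order | Pods scale in a defined order                   |
    | Use Case         | Web apps, APIs         | Databases, message brokers                       |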

    Example: When to Use Deployment vs. StatefulSet#

    Use Deployment (Stateless Application)#

    Imagine you are deploying a web application that serves user requests. It doesn’t store data locally because it relies on an external database. Here, a Deployment is ideal because:

    • Pods can scale up or down in any order.
    • They do not require persistent storage.
    • Any pod can handle incoming requests since there is no unique identity requirement.

    Example: A Node.js backend API that connects to an external database.
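
    As a rough illustration of this stateless case, the sketch below writes a minimal Deployment manifest for such an API. The name node-api, the image example/node-api:1.0, the port, the file name, and the DATABASE_URL value are all placeholders chosen for this example, not taken from a real project.

```shell
# Write a minimal Deployment manifest for a stateless API.
# All names, the image, and the database URL are hypothetical placeholders.
cat > node-api-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-api
spec:
  replicas: 3                # Any replica can serve any request
  selector:
    matchLabels:
      app: node-api
  template:
    metadata:
      labels:
        app: node-api
    spec:
      containers:
        - name: node-api
          image: example/node-api:1.0   # Placeholder image
          ports:
            - containerPort: 3000
          env:
            - name: DATABASE_URL        # State lives in the external database
              value: postgres://db.example.internal:5432/app
EOF

# Apply it to a cluster (requires kubectl and cluster access):
#   kubectl apply -f node-api-deployment.yaml
```

    Note that the manifest has no volumes and no per-pod identity: every replica is interchangeable, which is exactly what a Deployment assumes.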

    Use StatefulSet (Stateful Application)#

    Now, suppose you need to deploy a PostgreSQL database. Each database instance needs:

    1. A stable hostname to form a cluster or replication setup.#

    In a Deployment, every pod gets a generated name with a random suffix, such as myapp-7d9c5b8f6d-x2k4q, and a replacement pod created after a crash, reschedule, or scaling event gets a new name. This can be a problem for applications that need to communicate reliably with specific instances, such as database clusters or message brokers.

    Why is this important?

    • In a database cluster (e.g., PostgreSQL with replication), the primary database must always know where its replicas are.
    • If the pod name keeps changing (like in a Deployment), the primary can’t keep track of its replicas.
    • StatefulSet ensures each pod has a predictable, fixed hostname, like postgres-0, postgres-1, etc.

    Example:

    • A PostgreSQL replication setup: The primary database (postgres-0) must know its replicas (postgres-1, postgres-2).
    • If we used a Deployment instead, pod names would change every time they restarted, breaking the replication setup.
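
    Because the names are predictable, a startup script can work out a pod's role from its own hostname. The sketch below assumes the convention that ordinal 0 is the primary; Kubernetes does not assign these roles itself, and the helper names ordinal_of and role_of are made up for this example.

```shell
# Derive a pod's ordinal from its StatefulSet hostname (e.g. postgres-2 -> 2).
ordinal_of() {
  printf '%s\n' "${1##*-}"
}

# Hypothetical convention: ordinal 0 is the primary, all others are replicas.
# Kubernetes does not assign these roles; the application or its init script does.
role_of() {
  if [ "$(ordinal_of "$1")" -eq 0 ]; then
    echo primary
  else
    echo replica
  fi
}

for host in postgres-0 postgres-1 postgres-2; do
  echo "$host: $(role_of "$host")"
done
```

    Inside a running pod, the same check could be made with role_of "$(hostname)".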

    2. Persistent storage to ensure data is not lost when a pod restarts.#

    By default, a pod's storage is ephemeral: when the pod is deleted (because of a crash, rescheduling, or scaling down), anything written to the container filesystem is lost with it. This is fine for stateless apps but a disaster for databases or applications that store data.

    Why is this important?

    • If a database pod restarts and loses its storage, all stored data would be lost.
    • StatefulSet ensures that each pod gets its own PersistentVolumeClaim (PVC) bound to a Persistent Volume (PV), which remains even if the pod restarts.

    Example:

    • Suppose you have a Postgres database running in Kubernetes.
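    • If you deploy it with a Deployment and no persistent volume, and the pod crashes or is rescheduled, the new pod will start with fresh storage, losing all previous data. Attaching a volume to a Deployment helps only partially, because all replicas then share the same claim.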
    • But with a StatefulSet, Kubernetes ensures that the same storage (Persistent Volume) is attached to the new pod.

    3. A specific order of scaling (e.g., primary DB first, then replicas).#

    Scaling in Stateful applications is different from stateless ones. In a Deployment, new pods can be created in any order, which is fine for web servers but problematic for databases.

    Why is this important?

    • Many databases and message brokers have a leader-follower model, where one instance is the primary and others are replicas.
    • The primary must always start before the replicas to ensure a consistent state.
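    • With the default settings, a StatefulSet creates pods one at a time in ordinal order (0, 1, 2, ...) and waits for each to be Running and Ready before starting the next, so an application that treats pod 0 as the primary can rely on it starting before the replicas.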

    Example:

    • A Kafka cluster consists of multiple brokers.
    • If we scale up a Kafka cluster randomly (as in a Deployment), it can cause issues in leader election and partitioning.
    • StatefulSet ensures the brokers start in a structured way, keeping the cluster stable.

    Deploying Nginx StatefulSet in Kubernetes#

    Let’s deploy an Nginx StatefulSet with persistent storage.

    Step 1: Create a Headless Service#

    apiVersion: v1
    kind: Service
    metadata:
      name: headless-service
    spec:
      clusterIP: None   # This makes it a headless service
      selector:
        app: myapp
      ports:
        - port: 4000
          targetPort: 80

    Why Headless Service?

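    • With clusterIP: None, the Service gets no virtual IP. DNS resolves directly to the pod IPs, and each StatefulSet pod that references this Service through serviceName gets its own DNS entry.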
    • Example: mystatefulset-0.headless-service, mystatefulset-1.headless-service.
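
    The naming pattern behind these entries can be sketched as a small shell function. It assumes the default namespace and the default cluster domain cluster.local (some clusters are configured differently), and pod_dns_names is a helper invented for this example.

```shell
# Print the DNS name of every pod in a StatefulSet behind a headless service.
# Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>
# Assumes the default cluster domain "cluster.local"; some clusters use another.
pod_dns_names() {
  sts="$1"; svc="$2"; ns="$3"; replicas="$4"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${sts}-${i}.${svc}.${ns}.svc.cluster.local"
    i=$((i + 1))
  done
}

pod_dns_names mystatefulset headless-service default 3
```

    Pods in the same namespace can usually use the short form shown in the bullet above; the fully qualified name is what other namespaces would use.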

    Step 2: Create the StatefulSet#

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: mystatefulset
    spec:
      selector:
        matchLabels:
          app: myapp
      serviceName: headless-service   # Uses headless service for stable DNS
      replicas: 3                     # Number of pods
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
            - name: nginx
              image: nginx
              ports:
                - containerPort: 80
              volumeMounts:
                - name: data
                  mountPath: /data    # Persistent storage mount path
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: standard   # Assumes a StorageClass named "standard" exists (e.g., on minikube); omit this line to use the cluster default
            resources:
              requests:
                storage: 1Gi   # 1Gi Persistent Volume per pod

    What Happens Here?

    • Three Nginx pods (mystatefulset-0, mystatefulset-1, mystatefulset-2) are created sequentially.
    • Each pod gets a stable hostname (mystatefulset-0) and a matching DNS name (mystatefulset-0.headless-service).
    • Each pod gets its own 1Gi persistent volume, claimed through the data template.

    Step 3: Deploy the StatefulSet#

    Apply the YAML files:

    kubectl apply -f headless-service.yaml
    kubectl apply -f mystatefulset.yaml

    Verify the running pods:

    kubectl get pods

    Expected output:

    NAME              READY   STATUS    RESTARTS   AGE
    mystatefulset-0   1/1     Running   0          30s
    mystatefulset-1   1/1     Running   0          20s
    mystatefulset-2   1/1     Running   0          10s

    Step 4: Verify StatefulSet Behavior#

    Check Pod Hostnames

    Each pod gets a stable hostname:

    kubectl exec -it mystatefulset-0 -- hostname
    kubectl exec -it mystatefulset-1 -- hostname
    kubectl exec -it mystatefulset-2 -- hostname

    Expected output:

    mystatefulset-0
    mystatefulset-1
    mystatefulset-2

    Check Persistent Storage

    Each pod has its own volume mounted at /data.

    kubectl exec -it mystatefulset-0 -- ls /data

    Step 5: Scale the StatefulSet#

    Scale Up

    kubectl scale statefulset mystatefulset --replicas=5

    This will create mystatefulset-3 and mystatefulset-4 in order.

    Scale Down

    kubectl scale statefulset mystatefulset --replicas=2

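    This will remove mystatefulset-4, then mystatefulset-3, then mystatefulset-2, keeping mystatefulset-0 and mystatefulset-1. The PVCs of the removed pods are kept by default, so their data is still there if you scale up again.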
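
    The ordering rule for both directions can be modeled in a few lines of shell. This is only a sketch of the default ordered behavior, not something that talks to a cluster, and scale_plan is a helper name made up for this example.

```shell
# Print the order in which a StatefulSet creates or removes pods when
# scaling from <current> to <target> replicas (default ordered behavior).
scale_plan() {
  sts="$1"; current="$2"; target="$3"
  i="$current"
  while [ "$i" -lt "$target" ]; do      # scale up: lowest new ordinal first
    echo "create ${sts}-${i}"
    i=$((i + 1))
  done
  i="$current"
  while [ "$i" -gt "$target" ]; do      # scale down: highest ordinal first
    i=$((i - 1))
    echo "delete ${sts}-${i}"
  done
}

scale_plan mystatefulset 3 5
scale_plan mystatefulset 5 2
```

    The first call lists mystatefulset-3 and mystatefulset-4 as creations; the second lists deletions from mystatefulset-4 down to mystatefulset-2.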

    Step 6: Verify Persistent Storage for Each Pod#

    Each pod in the StatefulSet has its own Persistent Volume (PV).

    Let's check it step by step.

    1. List the Persistent Volume Claims (PVCs)

    Run the following command to see the PVCs created for each pod:

    kubectl get pvc

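    Expected output (if you followed Step 5, data-mystatefulset-3 and data-mystatefulset-4 are also listed, because PVCs are kept when a StatefulSet is scaled down):

    NAME                   STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS
    data-mystatefulset-0   Bound    pvc-xxxxx   1Gi        RWO            standard
    data-mystatefulset-1   Bound    pvc-yyyyy   1Gi        RWO            standard
    data-mystatefulset-2   Bound    pvc-zzzzz   1Gi        RWO            standard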

    What this means:

    • Each pod (mystatefulset-0, mystatefulset-1, etc.) has a separate 1Gi Persistent Volume (PV).
    • The PVC names are generated automatically as <volumeClaimTemplate name>-<pod name> (data-mystatefulset-0, data-mystatefulset-1).
    • Each PVC is bound to a unique PV.
    2. Check Storage Inside a Pod

    Now, let's log into a pod and check if its /data directory is persistent.

    kubectl exec -it mystatefulset-0 -- /bin/sh

    Inside the container, run

    ls -l /data
    echo "Hello from mystatefulset-0" > /data/testfile.txt
    cat /data/testfile.txt
    exit

    Expected output:

    Hello from mystatefulset-0
    3. Restart the Pod & Verify Data Persistence

    Now, let’s delete the pod and check if the data is still there:

    kubectl delete pod mystatefulset-0

    Wait for it to restart, then log in again:

    kubectl exec -it mystatefulset-0 -- /bin/sh
    cat /data/testfile.txt

    Expected output:

    Hello from mystatefulset-0

    Why is the data still there?

    • Because each pod gets a dedicated Persistent Volume (PV).
    • Even if the pod is deleted, its storage remains and is reattached when the StatefulSet recreates the pod under the same name.

    Step 7: Scale StatefulSet & Check New Storage#

    Scale Up to 5 Pods

    kubectl scale statefulset mystatefulset --replicas=5

    Verify New PVCs

    kubectl get pvc

    Expected output:

    NAME                   STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS
    data-mystatefulset-0   Bound    pvc-xxxxx   1Gi        RWO            standard
    data-mystatefulset-1   Bound    pvc-yyyyy   1Gi        RWO            standard
    data-mystatefulset-2   Bound    pvc-zzzzz   1Gi        RWO            standard
    data-mystatefulset-3   Bound    pvc-aaaaa   1Gi        RWO            standard
    data-mystatefulset-4   Bound    pvc-bbbbb   1Gi        RWO            standard

    What this confirms:

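    • New pods (mystatefulset-3, mystatefulset-4) also get their own 1Gi PersistentVolumeClaim automatically. If claims with those names already exist from an earlier scale-up, they are reused rather than recreated.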

    Step 8: Delete the StatefulSet#

    kubectl delete statefulset mystatefulset

    Important! Deleting the StatefulSet does not delete its PVCs by default, so the Persistent Volumes and their data are kept.

    To delete the storage:

    kubectl delete pvc -l app=myapp
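
    If you would rather have this cleanup happen automatically, newer Kubernetes versions offer a persistentVolumeClaimRetentionPolicy field on the StatefulSet spec. The sketch below assumes your cluster supports that field (older versions do not) and only writes a patch file; the file name is a placeholder.

```shell
# Write a merge patch that asks Kubernetes to delete the PVCs when the
# StatefulSet itself is deleted, while keeping them on scale-down.
# Assumption: the cluster supports spec.persistentVolumeClaimRetentionPolicy.
cat > pvc-retention-patch.yaml <<'EOF'
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # Remove PVCs when the StatefulSet is deleted
    whenScaled: Retain    # Keep PVCs of pods removed by scaling down
EOF

# Apply before deleting the StatefulSet (requires kubectl and cluster access):
#   kubectl patch statefulset mystatefulset --type merge --patch-file pvc-retention-patch.yaml
```

    Deleting data automatically is a deliberate trade-off: the default of keeping the claims protects against accidental loss, so treat this setting with care for real databases.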

    Conclusion#

    In this blog, we explored StatefulSets in Kubernetes and their significance for managing stateful applications like databases and distributed systems. We covered how they ensure stable network identities, persistent storage, and ordered scaling, making them essential for workloads that require data consistency. By following the provided example, you can deploy and manage StatefulSets effectively in Kubernetes environments.

    Last updated on Feb 20, 2025