Self Hosted Kubernetes Part 4 - Persistent Volumes

A comprehensive guide to Self Hosted Kubernetes Part 4 - Persistent Volumes


In your journey into distributed computing, you will find that you often need persistent data. Pods are disposable: you should design your deployments so that a pod can die at any time without affecting availability. One way we do this is by persisting data across pod lifecycles. Kubernetes' solution to this problem is the Persistent Volume, sometimes referred to simply as a PV. These are not to be confused with Persistent Volume Claims (PVCs). A PVC is generally the resource you define in YAML for a deployment; it is the resource that connects a volume to a pod. When a PVC is created, Kubernetes provisions (or binds) a corresponding PV to back it.
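To make that concrete, here is a minimal sketch of a PVC; the name, storage class, and size are placeholders you would adjust for your own cluster.

# Minimal example PVC -- name, storage class, and size are placeholders
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # placeholder: use a StorageClass your cluster provides
  resources:
    requests:
      storage: 5Gi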

In the Kubernetes ecosystem there are many providers and types of Persistent Volumes. You define the types of PVs that your cluster supports in a StorageClass resource. In that resource you can also define the default values a volume of that type should have (how many replicas, and so on).
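As a rough sketch, a StorageClass looks like the following. The provisioner and parameter keys shown here follow Longhorn's CSI driver (which we install later in this post); other backends use different provisioner names and parameters, so check your provisioner's documentation.

# Example StorageClass -- provisioner and parameters follow Longhorn's CSI driver
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: replicated-storage
provisioner: driver.longhorn.io   # other backends use their own provisioner name
parameters:
  numberOfReplicas: "3"           # default replica count for volumes of this class
  staleReplicaTimeout: "2880"
reclaimPolicy: Delete
allowVolumeExpansion: true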

There are many types of StorageClasses. Some, like AWS's Elastic Block Store (EBS), are specific to a cloud provider.

On many occasions you will also find a need to share data across many pods. This is when you have multiple pods/containers using the same PVC, and it is referred to as a multi-mount. Some StorageClasses support this feature, while others do not. We will need a StorageClass that supports multi-mounts later when we set up the Traefik ingress.

Persistent Volumes

Most apps you run will need persistent volumes. At the core of almost all distributed systems is efficient sharing of data, and one of the simplest ways to do this in Kubernetes is with persistent volumes. These allow us to share filesystems between containers. One very important thing to note is that not all storage classes are the same. Storage classes refer to different types of persistent volumes, and you may want one type of volume for one service and a different type for another.

The main difference you will need to decide on a case-by-case basis is which access mode you need. For the most part, this is a decision between ReadWriteOnce and ReadWriteMany. ReadWriteOnce means the volume can only be mounted read-write by a single node (and, in practice, a single pod) at a time. ReadWriteMany means the volume can be mounted by multiple pods at once.
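For example, a claim that several pods need to mount simultaneously would request ReadWriteMany. The storage class name below is a placeholder for whichever multi-mount-capable class your cluster provides.

# Example PVC requesting a multi-mount volume -- names and size are placeholders
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets
spec:
  accessModes:
    - ReadWriteMany              # multiple pods can mount this claim at the same time
  storageClassName: nfs          # placeholder: any StorageClass that supports multi-mounts
  resources:
    requests:
      storage: 10Gi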

I would recommend running Longhorn and longhorn-nfs. Longhorn is maintained by Rancher and provides some of the creature comforts of persistent volume solutions on hosted platforms (like EBS). Longhorn provides a pretty web interface where you can manage volumes, set up backup and snapshot schedules, and perform maintenance when necessary. Longhorn only supports ReadWriteOnce, which can be very limiting depending on your application. For my needs, I use longhorn-nfs. longhorn-nfs creates a Longhorn volume that backs a completely separate NFS storage class. This gives us an easy-to-set-up storage class that can do multi-mounts, and since the underlying volume is just a Longhorn volume, we can also set up snapshots and backups like any other Longhorn volume.

Longhorn

Longhorn is a super powerful StorageClass that lives entirely inside your cluster. What I mean by that is that Longhorn has all the features of a cloud storage class, but it exists fully within your cluster. This has some advantages and some disadvantages. On the one hand, Longhorn is perfect for getting a cloud-like StorageClass when you're self-hosting. On the other hand, your data is still dependent on your cluster's stability. If your cluster becomes unrecoverably corrupted or broken, you may not be able to recover the data you had in Longhorn. This is why you should schedule external backups of all your critical data.

Longhorn does have a pretty nice UI where you can manage your PVs. You can schedule snapshots and backups of volumes.

TODO: fillin longhorn ui image

You can install Longhorn either via the Rancher UI or through Helm. Installing via Helm is significantly simpler: all you need to do is add the Longhorn Helm repo and install it with your options.

# Add and update repo
helm repo add longhorn https://charts.longhorn.io
helm repo update

# Install via helm 3
kubectl create namespace longhorn-system
helm install longhorn longhorn/longhorn --namespace longhorn-system

Once this is deployed, you can verify that it created a StorageClass (StorageClasses are cluster-scoped, so no namespace flag is needed):

kubectl get sc
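With the StorageClass in place, a workload can claim a Longhorn-backed volume like the sketch below; the name, namespace, and size are just examples, while the storage class name matches the one the Helm chart creates by default.

# Example PVC backed by Longhorn -- name, namespace, and size are placeholders
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce              # Longhorn volumes are single-mount
  storageClassName: longhorn     # the StorageClass created by the Helm chart
  resources:
    requests:
      storage: 20Gi

Longhorn will dynamically provision a matching PV and attach it to whichever node the pod that uses this claim lands on.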