Kubernetes
A multi-node container orchestration technology that emphasizes high availability
and performance. It took me a long time to actually take the first step in
learning how kubernetes works. It's for the most part similar to docker compose
in that they both use yaml to declare containers and services, but it takes
container management to the next level. In contrast to docker, kubernetes
requires you to understand almost every aspect of what you're creating. The
kubernetes distribution I started with was k3s.
Why I left
This wasn't something I ever expected to do once I learned kubernetes, but
there were issues that bothered me a lot while running it. I had to hack
around coredns due to issues with its DNS resolvers. Running a multi-node
setup meant constantly checking up on two computers rather than one. The local
path provisioner also broke constantly, leaving folders behind or sometimes not
creating them at all if its runtime class was not synced correctly by argocd.
Ultimately the extra overhead and high availability weren't needed anymore in
my mind, and going back to docker would save me a lot of headache and
maintenance for the server. Learning it all was still incredible and I am so
happy I took the time; I learned so much from the experience. Going back to
docker compose with all of this newfound knowledge, I was able to build an
equally self-sufficient setup without all of the issues I ran into with swarm
and kubernetes.
Apps
Deployments
The most used type of resource in Kubernetes. A Deployment defines a
ReplicaSet, meaning a pod (or service in docker world) that will be replicated
across nodes based on selectors, that is then deployed onto the cluster. This
was the first resource I looked into, as it was the closest thing I understood
to services in docker. There are various parts to a deployment, but essentially
it's the definition of a Pod plus a ReplicaSet. There is the possibility of
creating a single Pod on its own, but it loses the ability to be highly
available, as in there is no auto recovery in case the pod goes down.
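A minimal Deployment manifest might look like the sketch below (the app name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical name
spec:
  replicas: 2                  # the ReplicaSet keeps two pods running
  selector:
    matchLabels:
      app: example-app         # must match the pod template labels below
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: web
          image: nginx:1.25
          ports:
            - containerPort: 80
              name: http
```

If a pod dies, the ReplicaSet notices the replica count dropped and schedules a replacement, which is the auto recovery a bare Pod lacks.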
DaemonSets
Like Deployments it's a high-level abstraction over a Pod, but with a
different purpose: a DaemonSet is replicated onto every node matched by its
selector. The expectation is that if the node selectors match N nodes, then
there will be N replicas. The only real use I found for this was adding log
aggregation for all deployments running on the cluster.
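As a sketch, a DaemonSet for a hypothetical log-collection agent could look like this; the node selector decides which nodes get a copy:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector          # hypothetical log-aggregation agent
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      nodeSelector:
        kubernetes.io/os: linux    # one replica lands on every matching node
      containers:
        - name: agent
          image: fluent/fluent-bit:2.2
```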
CronJobs
This was very cool to learn in kubernetes, as it allowed me to get really
creative with running automated tasks. This type of abstraction deploys a pod
to be executed at a specific time of day, and the pod is automatically cleaned
up after its execution. In addition, it allows for retries of the cron job in
the case of failures. One really cool automation was to create cron jobs for
each deployment running in the cluster to automatically create restic backups
for persistent volumes.
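A sketch of such a backup job (schedule, image, and paths are illustrative; the restic repository settings and credentials are omitted):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restic-backup          # hypothetical name
spec:
  schedule: "0 3 * * *"        # run daily at 03:00
  jobTemplate:
    spec:
      backoffLimit: 3          # retry up to three times on failure
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: restic/restic:0.16.4
              args: ["backup", "/data"]   # repo/credentials would come from env/secrets
```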
Resources
Persistent Volumes
This took some time to wrap my head around, as it's a move away from the
simple file and folder mounts of docker. A persistent volume is a file or
folder that is declared as a mountable volume for the cluster to use. In order
to define a persistent volume you need to define the type of filesystem, of
which there are many [different
types](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) in
kubernetes. I've only really used local, nfs, and hostPath when creating a set
or job. You can also define read/write policies to limit how many sets can
mount the volume and whether it can be modified. You must also define a node
selector to indicate that the volume can only be created on specific nodes.
Persistent volumes exist to allow the cluster to automatically clean up volumes
once they are no longer in use. This is controlled by the [reclaim
policy](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming).
Using nfs requires installing a driver into the cluster.
I recommend not using very specific node selectors, as it became an issue when
migrating persistent volumes from one node to another. Node selectors are
immutable, so once declared, a PV can't be moved without deleting it
completely. This was insanely annoying with argocd.
When defining a reclaim policy I have always used retain as I did not want to lose any data after the volume no longer has claims.
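A sketch of a local persistent volume tying these pieces together (name, path, and hostname are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: appdata-pv             # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce            # the read/write policy offered to claims
  persistentVolumeReclaimPolicy: Retain   # keep the data once the claim is gone
  local:
    path: /mnt/appdata         # hypothetical path on the node
  nodeAffinity:                # required for local volumes, and immutable
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1       # hypothetical node name
```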
Persistent Volume Claims
Now here is where the confusion starts with persistent volumes. In order to
use a PV within a set or job, a persistent volume claim must be declared to
claim the volume. The claim's purpose is to ensure that it requests resources
that are less than or equal to what the persistent volume provides. This covers
storage requests as well as read/write access modes. With local and hostPath
volumes there is no enforcement of storage size (this might specifically be a
k3s thing). Claims are also an easy way for the cluster to know when the
persistent volume is ready to be handled by the reclaim policy setting.
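A matching claim sketch (the PV name is hypothetical); the request has to fit inside what the volume offers:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: appdata-pvc            # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce            # must be one of the modes the PV offers
  resources:
    requests:
      storage: 5Gi             # must be <= the PV's capacity
  volumeName: appdata-pv       # bind to a specific, hypothetical PV
```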
DNS Config
In the majority of cases I don't think I actually needed to learn about DNS
config for Pods, but due to some weird network performance issues when first
setting up Kubernetes clusters, I needed to tinker with this setting. It lets
you set the primary DNS servers that the Pod will use as well as override
other dns options. An option I played around with a lot is ndots, which sets
how many dots a name needs before an absolute query is triggered ahead of
iterating through the list of search domains. Reducing it to one in Pods
improved network performance, as there are four different search domains
defined by default.
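The ndots tweak can be sketched in a pod spec like this (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example                # hypothetical name
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1"             # names with at least one dot skip the search domains
  containers:
    - name: app
      image: nginx:1.25
```

To fully replace the resolvers as well, dnsConfig.nameservers can be combined with dnsPolicy "None".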
Config Maps
A resource that allows defining static files or environment variables that
can be mounted into a container. I originally only used it when needing to set
up a reusable set of environment variables, but over time I took advantage of
its ability to mount files into a pod. It made it really easy to avoid
mounting a persistent volume just to modify a single file, and there was also
no need to ssh into the node that contained the persistent volume to make an
update.
One caveat around ConfigMap volume mounts is that a data change does not
trigger a reload of the container, so making an update to the file required
manual intervention.
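A minimal sketch of the file-mount use, with hypothetical names:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config             # hypothetical name
data:
  app.conf: |                  # the file's contents live in the manifest
    max_connections = 100
---
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: config
          mountPath: /etc/app  # app.conf shows up as /etc/app/app.conf
  volumes:
    - name: config
      configMap:
        name: app-config
```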
Secrets
A resource that allows mounting sensitive information into pods in a secure
manner. At first I never actually found a reason to use secrets, as I just
used sensitive data directly in environment variables without much concern.
But as I became more comfortable with kubernetes I got a better understanding
of how to set this up to provide better security for my kubernetes repo. An
initial finding was that it was tedious to get everything set up initially,
but I eventually learned about Kubernetes secret operators that auto-generate
kubernetes secrets from a secrets manager. Using Bitwarden's SM Operator I was
able to automatically create and define all secrets used in the cluster.
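A plain kubernetes Secret, whether written by hand or generated by an operator, boils down to something like this sketch (names and value are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials         # hypothetical name
type: Opaque
stringData:                    # write plain text here; it is stored base64-encoded
  POSTGRES_PASSWORD: changeme  # example value only
```

A container can then pull the value in through valueFrom.secretKeyRef in its env list instead of hardcoding it.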
Role Based Access Control
A collection of resources that defines a Pod's ability to interact with the kubernetes cluster API. Only used in a couple of Pods in the cluster but its concept was pretty easy to understand.
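As a sketch, granting a pod read-only access to pods in its namespace takes a Role, a RoleBinding, and a service account (all names here are hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader             # hypothetical name
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: app-sa               # the pod must run under this service account
    namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```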
Services
A resource that defines what ports are open for a particular pod to allow
inter-cluster communication. By default, unlike docker, apps are not able to
communicate with one another; they can only communicate with containers within
the same pod. To allow connections across the cluster you need to define a
ClusterIP service that opens up ports to the rest of the cluster. These ports
are defined in the container definition and can be referenced by either a
number or a string representing the name of the port. In order to bind a
service to a Pod, its selectors must match the labels of the app. There are
also LoadBalancer services that allow binding to the host port of the node.
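A ClusterIP service sketch (names are placeholders; targetPort by name assumes the container port is named http):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app            # hypothetical name
spec:
  type: ClusterIP              # internal only; LoadBalancer would bind host ports
  selector:
    app: example-app           # must match the pod's labels
  ports:
    - port: 80                 # port exposed to the rest of the cluster
      targetPort: http         # container port, by name or number
```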
Operators
When searching for how to appropriately hide sensitive information from
existing directly in manifest files, I came across operators and secrets
managers. Operators are essentially pods that have permissions to
auto-generate kubernetes resources based on a request in a manifest from a
deployed pod.
Security Context
I never saw the need to utilize this other than granting high-level access
for the vpn or media pods that need access to network rules or the iGPU on a
node. But recently I looked more into avoiding running pods as root, and
security context is how you can enforce this. In addition, it is good to
ensure all files created in volumes use the correct permissions. I decided to
set a default at the pod level so that the user id is 1000 and the group id is
100, though certain containers like postgres or anything from
linuxserver/hotio need to run as root.
For postgres I wanted to look into how to avoid this, as it forced all files
to be created as root, whereas the other containers would fix the permissions
based on the PGID and PUID environment variables. To fix that I needed to
create an init container to chown the files created by postgres before it
executed, and this allowed it to run as the specified user.
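The postgres workaround can be sketched roughly like this (names, paths, and the claim are hypothetical; the password is an example only):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-example       # hypothetical name
spec:
  securityContext:
    runAsUser: 1000            # default user for all containers in the pod
    runAsGroup: 100
    fsGroup: 100               # files created in volumes get this group
  initContainers:
    - name: fix-perms
      image: busybox:1.36
      command: ["chown", "-R", "1000:100", "/var/lib/postgresql/data"]
      securityContext:
        runAsUser: 0           # only the init container runs as root, to chown
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  containers:
    - name: postgres
      image: postgres:16
      env:
        - name: POSTGRES_PASSWORD
          value: changeme      # example only; a Secret would be used in practice
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: postgres-pvc   # hypothetical claim
```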
Noteworthy Pods
CoreDNS
A DNS service that is automatically deployed with k3s and handles all lookups
originating from pods. Originally I had no need to really learn about this,
but due to the network performance issues with pods I spent some time
understanding how the config file works. It runs through some logic where it
checks the url being looked up against the cluster domain and finds which
services match it. If there is no match, the lookup is forwarded to the node
it's running on. That latter part turned out to be a major issue in the
performance investigation; after removing the passthrough to the node and
replacing it with google DNS, I no longer saw errors or timeouts in lookups.
Currently I have an application set that overrides the coredns default config,
as there's no way to make a change to its config map and keep it persistent.
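The change amounts to swapping the forward target in the Corefile. A trimmed-down sketch (the real k3s default carries more plugins):

```
.:53 {
    errors
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # the default is: forward . /etc/resolv.conf (the node's resolvers)
    forward . 8.8.8.8 8.8.4.4
    cache 30
}
```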
Local Path Provisioner
This service allows kubernetes to automatically create directories for
persistent volume claims on the node where they are requested. This is called
dynamic provisioning. It comes built in with k3s, but for the longest time I
avoided using it due to its delete behavior, where it would clean up and
delete the directory on the node. Instead I manually created Persistent
Volumes and Persistent Volume Claims to override this. After getting
frustrated with needing to create new folders for every new deployment, I
decided to look into how I could utilize it.
The Local Path Provisioner default configuration can be overridden similar to
coredns, where you edit a config map that the deployment listens to. That
config map is where you define the default directory, or node path, that will
be used by the provisioner, and its setup and teardown instructions, which are
stored as bash scripts. I updated it to use the standard /kubernetes-appdata
folder and updated the setup script to set the correct user and group
permissions. In addition I disabled the teardown script so it does not
actually delete the folder.
I then created a new StorageClass that would automatically create the folder
based on an annotation key given to the pvc, and by using the new StorageClass
with a pvc I was able to automatically set up all the data needed for new
deployments. One caveat was that I needed to retain the old configuration for
daemonsets that need persistent volumes, due to the provisioner not being able
to work with ReadWriteMany access modes.
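A sketch of a custom StorageClass for the provisioner (the class name is made up; rancher.io/local-path is the provisioner k3s ships with):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path-retain      # hypothetical name
provisioner: rancher.io/local-path
reclaimPolicy: Retain          # keep the directory when the claim is deleted
volumeBindingMode: WaitForFirstConsumer
```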
K8s Declaratives
Manifests
Manifests are the default way of defining kubernetes resources to be deployed
into a cluster. The file format used is yaml and the structure must adhere to
the resource definition.
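Even the smallest resource follows the same top-level shape (the name is a placeholder):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: example                # hypothetical name
```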
Commands:
- Apply manifest to cluster:
kubectl apply -f (folder|file)
- Remove resources in manifest from cluster:
kubectl delete -f (folder|file)
Kustomize
Using manifests I found myself duplicating configuration multiple times when
writing deployments and other resources, so I went looking for tooling that
could help reduce the duplication. Kustomize was an interesting tool to use,
as it allows pooling multiple templates and modifiers in order to build the
resources for an application. But I found that I was spending so much time
trying to get these modifiers to work that I ended up creating even more
configuration just to match the expected behavior.
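For reference, a minimal kustomization.yaml pooling a couple of manifests might look like this sketch (file names and labels are hypothetical):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml            # plain manifests to pull in
  - service.yaml
commonLabels:
  app: example-app             # a modifier applied across all resources
images:
  - name: nginx
    newTag: "1.25"             # another modifier: pin/override the image tag
```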
Commands:
- Execute and apply to cluster:
kubectl apply -k (folder)
- Remove resources:
kubectl delete -k (folder)
Helm
The next tool I tried for deduplicating configuration files was helm, and it
was actually very well suited for the job. It did exactly what I was trying to
accomplish with kustomize, but it also gave me the flexibility of naming and
defining exactly how I wanted each resource to be generated based on data
defined in a values file. In addition, I could lint helm templates to help
validate that the configuration I wrote was in fact correct and wouldn't hit
unexpected paths. One thing I disliked about helm was that an app needed to be
installed from a repo, and a chart configuration was needed to apply it in a
certain way.
Commands:
- Test helm templates:
helm lint
- Generate manifests:
helm template . --name-template (app_name) -f (values_file)
cdk8s
I found cdk8s while reading about how helm was not a great tool for
auto-generating manifest files from base customizations. There were also
frustrations expressed with helm's many formatting issues, and with yaml
templating not being the best way to build at that level. Using cdk8s you are
able to use code to define how manifests are generated, and doing so allows
for a much cleaner deduplication flow. Compared to helm it was a much more
enjoyable experience and allows for better customization without needing to
write 'custom' settings everywhere.
It can also import resource definitions from what is installed and available
in the cluster and type-check them when synthing, so it comes with built-in
linting. You can also define resources using helm charts and repos, so I was
easily able to use them without manually installing them into the cluster,
letting cdk8s handle all of it when synthing and applying.
Commands:
- Generate manifests:
cdk8s synth
When using renovate with cdk8s I needed to set up custom regex matchers to
upgrade the image versions defined in python files. In addition, I could
specify post-upgrade commands to run when executing renovate upgrades, which
removed my need to manually intervene on PRs, as it would update the code and
regenerate the manifests afterwards.