Monitoring
cAdvisor
This was the perfect service to capture metrics about my docker containers. One
thing that was kind of crazy to deal with was the resource usage of cadvisor
itself, so I had to find some arguments to add to the container to limit the
amount of stats capturing that it does.
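As a rough illustration of the kind of arguments involved, here is a minimal Python sketch that assembles a cadvisor argument list. The specific flags and values are assumptions picked for illustration (they are documented cadvisor runtime options, but not necessarily the ones I ended up using), and the helper name `cadvisor_args` is hypothetical.

```python
import shlex


def cadvisor_args(
    housekeeping_interval="30s",
    disabled_metrics=("percpu", "sched", "tcp", "udp", "disk", "diskIO", "process"),
):
    """Build a list of cadvisor flags that reduce how much it collects.

    All values here are illustrative assumptions, not a recommended config.
    """
    return [
        # Collect stats less often than the default to cut CPU usage.
        f"--housekeeping_interval={housekeeping_interval}",
        # Only report on docker containers, not every cgroup on the host.
        "--docker_only=true",
        # Do not attach every container label to every metric series.
        "--store_container_labels=false",
        # Skip metric groups that are not needed for a resource dashboard.
        "--disable_metrics=" + ",".join(disabled_metrics),
    ]


if __name__ == "__main__":
    # Print the flags as a single shell-safe string, e.g. to paste into a
    # container's command.
    print(shlex.join(cadvisor_args()))
```

The resulting list can be passed as the container command; tuning the interval and the disabled metric groups is where most of the savings would come from.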
Gatus
Getting a bit frustrated with uptime-kuma's slow performance when dealing with
many endpoints, I decided to look into alternatives that would be better
optimized for my needs. I found gatus, which seemed promising: with a simple
configuration setup and an actual database behind it, I was pretty excited to
get it to work. Since I wanted a more declarative way of setting up monitors,
gatus was a great solution. Instead of going through authentik passthrough to
check if applications were running, I used direct checks against their docker
containers. This allowed for a simple setup without needing to worry about the
complicated setup I had with uptime-kuma.
Grafana
A service that I have included in my setup throughout all of the transitions
between virtualization technologies, grafana has been the absolute best way to
centralize logging and metrics for my cluster/services. The difference in usage
between docker and kubernetes was the aggregator tools that were used to
power visualizations. With docker I used telegraf and influxdb to handle
log gathering and time series data to create a custom resource monitor and log
viewer. The log viewing capabilities were terrible due to the format in which
influxdb would keep logs, but it was amazing for the resource monitor. With
kubernetes I moved to tools that are better suited for log gathering, as
resource monitoring was not much of a concern. Using promtail as a log
collector agent and loki as a log tagger and centralized storage for logs gave
me a simple-to-use dashboard to view all logs generated by my containers. This
allowed better oversight of any deployment, pod, or daemonset.
Got rid of this since komodo could give me the same functionality at a high
level.
Update (4/17/26)
I actually added this back since komodo only gave a real-time snapshot of
metrics for my containers. Combined with cadvisor I could get a better
understanding of my container resource usage, especially since I was looking to
limit resource consumption for all of my services. I had way too many instances
of something using way more resources than I expected.
Healthchecks
Another monitoring service, but specifically for cron jobs. Healthchecks
allows for better tracking to ensure scheduled tasks continue to be executed,
and it notifies me if a task has not checked in. All that is required of a task
is that it registers with healthchecks when it starts, when it ends, and if
it fails. I currently use it for handling all container appdata backup jobs and
for alerting on the subtitle sync service that I created. What has been really
great is that with a certain api key I can allow a task to auto-create a new
monitor, e.g. when a new service is added to my cluster and needs to be backed
up, so that I no longer need to create monitors manually. The only manual task
that I do now is prettifying the name in healthchecks. In addition, I can send
logs related to the completion or failure of a job to allow for quick debugging
of what may have happened.
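The start / end / fail flow described above can be sketched in Python roughly as follows. This is a minimal sketch, not my actual backup script: it assumes the healthchecks slug-style ping URLs (`<base>/<ping-key>/<slug>`, with `/start` and `/fail` suffixes and `?create=1` to auto-create a missing monitor), and the base URL, ping key, slug, and the helper names `ping_url`, `send_ping`, and `run_monitored` are all hypothetical placeholders.

```python
import urllib.request


def ping_url(base, ping_key, slug, signal="", create=True):
    """Build a slug-style ping URL; signal is "", "start", or "fail"."""
    url = f"{base.rstrip('/')}/{ping_key}/{slug}"
    if signal:
        url += f"/{signal}"
    if create:
        # Assumption: ask healthchecks to create the monitor if it is missing.
        url += "?create=1"
    return url


def send_ping(url, body=""):
    """POST the ping; the request body is stored as the log for that ping."""
    req = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def run_monitored(job, slug, base, ping_key, send=send_ping):
    """Run job() and report start, success, or failure to healthchecks."""
    send(ping_url(base, ping_key, slug, "start"))
    try:
        output = job()
    except Exception as exc:
        # Report the failure along with a short log line, then re-raise.
        send(ping_url(base, ping_key, slug, "fail"), f"{type(exc).__name__}: {exc}")
        raise
    # A ping with no suffix marks the job as finished successfully.
    send(ping_url(base, ping_key, slug), str(output))
    return output
```

A backup job would then be wrapped as something like `run_monitored(backup_appdata, "appdata-backup", base, ping_key)`, where `backup_appdata` stands in for whatever function does the actual work. The `send` parameter is injectable so the flow can be exercised without touching the network.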
Loki
A service built by the grafana team that is responsible for post-processing
and storing log information from agents. When I first configured this I stored
all logs locally, and with that I had insanely poor performance when trying to
view them in grafana. It also crashed vscode a couple of times when attempting
to open the folder in the editor. Eventually I moved it to vultr and stored the
logs in object storage, and it has been so fast when viewing log information
now. It took some time to properly configure the tagging system, but I used
pre-configured settings for kubernetes logs that made ingestion really easy.
Got rid of this since komodo could give me the same functionality at a high
level.
Update (4/17/26)
Added this back to keep some historical logging, especially since I have docker auto rotate logs now. I use it mostly to monitor my snapraid and smart script logs.
Promtail
Another service built by the grafana team that reads and pushes logs from
local nodes or servers to a service like loki. This is the only service that I
currently run as a daemonset as it would need to be deployed on every node to
scrape kubernetes logs. This was really easy to configure having copied a
configuration for kubernetes scraping.
Got rid of this since komodo could give me the same functionality at a high
level.
Update (4/17/26)
Added this back to keep some historical logging, especially since I have docker auto rotate logs now. I use it mostly to monitor my snapraid and smart script logs.
SpeedTest Tracker
I forgot when exactly, but there was a point in time where it seemed like the
wifi was acting up and speeds were terrible, so I decided to try and get a
better understanding of when it happens. To do this I wanted to run speedtests
every so often on the network, both in the VPN and on the router, to see how
well the network is performing. Setting this up was really straightforward
and almost a set-it-and-forget-it setup. I just set up thresholds (which took
some time to figure out) to get a notice when internet speeds were really bad.
Setting this up to work in the vpn was tricky due to the network dependency on
the qbittorrent instance, but I was able to figure this out with recreator.
Uptime Kuma
Deprecated 5/29/25
Another very cool monitoring tool that I use, uptime-kuma continuously pings exposed services to ensure they are up and running. If anything does go down, I get notified through discord about which application is not running. I also use it to ensure that there is proper authentication middleware for some of the services I run in case something like authentik or authelia goes down. I can also create status pages that I can share to show users if there is an issue with any service. The only downside to this service is the need to manually add a new check for every new public service, so it was not the best solution for me, as that was something I pretty easily forgot to do when setting up new services.
Uptime Robot
This isn't really a self-hosted service, but it was a free solution for me to monitor my public services in case they ever went down. This was extremely useful in knowing if the server went down, because all the monitoring I have for it is hosted on the server itself. It automatically sends me a discord notification if any service goes down, and it keeps track of uptime for my services as well.