Monitoring

Alloy

I found Alloy while looking to reduce the number of containers handling my monitoring needs. The stack had grown to 7 services, and Alloy turned up in my search for a way to consolidate. It's another Grafana service that is basically a drop-in replacement for both cAdvisor and Promtail, which let me reduce those two containers to one. It's an all-in-one collector for many different types of data.

cAdvisor

warning

Deprecated 4/26/26

I ended up removing this since Alloy completely replaced it.

This was the perfect service for capturing metrics about my Docker containers. The one thing that was hard to deal with was cAdvisor's own resource usage, so I had to find some arguments to add to the container to limit how many stats it captures.

Apprise

Central notification relay used across the stack. Services POST to Apprise's /notify endpoint and Apprise fans out to Discord (and any other configured backends). The stack lives at compose/apprise/ on docker-net at http://apprise:8000.

Services running in another container's network namespace (for example speedtest-torrent-tracker in qbittorrent's VPN namespace) cannot resolve the Docker service name apprise. Give Apprise a Hotio-style hostname: apprise.internal and point those instances at http://apprise.internal:8000/notify instead.
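As a rough sketch of what a service's POST to Apprise might look like, the helper below builds the request for either address. The two URLs come from the text above; the JSON field names (`title`, `body`, and the optional `urls`) are an assumption about how this Apprise instance is set up, and the function names are hypothetical.

```python
import json
import urllib.request

# Endpoints taken from the text above.
APPRISE_URL = "http://apprise:8000/notify"
# For services inside another container's network namespace.
APPRISE_INTERNAL_URL = "http://apprise.internal:8000/notify"


def build_notification(title, body, endpoint=APPRISE_URL, urls=None):
    """Build (but do not send) a POST request for Apprise's /notify endpoint."""
    # Assumption: the endpoint accepts a JSON body with "title" and "body".
    payload = {"title": title, "body": body}
    if urls:
        # Assumption: a stateless setup passes the target URLs with each request.
        payload["urls"] = urls
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A service on docker-net would call `send(build_notification("Backup failed", "details"))`, while one inside the VPN namespace would pass `endpoint=APPRISE_INTERNAL_URL`.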

Gatus

Getting frustrated with uptime-kuma's slow performance when monitoring many endpoints pushed me toward Gatus. Configuration lives in builds/gatus/config.yaml as declarative YAML, state is stored in PostgreSQL, and alerts go to Discord. The stack runs at compose/gatus/ on docker-net and is reachable at gatus.dripdrop.pro.

Each service gets two checks:

  • Protected — HTTPS through Traefik; expects 401 and a body match for Authelia, confirming forward auth is still enforcing access.
  • Internal — direct HTTP to the container on docker-net; expects 200 (or service-specific API status endpoints where applicable, e.g. Bazarr, Radarr, Sonarr, Prowlarr).

Adding a new monitored service means adding both endpoint blocks to the config. This replaced the older Uptime Kuma setup that relied on Authentik passthrough checks and manual UI configuration.
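To illustrate the pair of checks, here is a minimal sketch that generates both endpoint blocks for one service. The key names (`name`, `url`, `conditions`) and the condition strings follow Gatus's documented syntax as I understand it; the exact body pattern, the service name, the hostnames, and the port are placeholders and not taken from the real config.

```python
import json


def gatus_endpoints(name, public_url, internal_url, internal_status=200):
    """Return the protected and internal endpoint blocks for one service."""
    protected = {
        "name": f"{name} (protected)",
        "url": public_url,
        # Forward auth should reject an unauthenticated request.
        "conditions": [
            "[STATUS] == 401",
            "[BODY] == pat(*Authelia*)",  # assumed body pattern
        ],
    }
    internal = {
        "name": f"{name} (internal)",
        "url": internal_url,
        # Direct request to the container on docker-net.
        "conditions": [f"[STATUS] == {internal_status}"],
    }
    return [protected, internal]


if __name__ == "__main__":
    # Hypothetical service; the output is JSON, which is also valid YAML.
    blocks = gatus_endpoints(
        "radarr", "https://radarr.example.com", "http://radarr:7878"
    )
    print(json.dumps({"endpoints": blocks}, indent=2))
```

Generating the pair from one function keeps the two checks from drifting apart, although hand-writing both blocks in the config works just as well.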

Grafana

Grafana is a service I have kept in my setup through every transition between virtualization technologies, and it has been the absolute best way to centralize logging and metrics for my cluster and services. The main difference between my Docker and Kubernetes usage was the aggregator tools that powered the visualizations.

With Docker I used Telegraf and InfluxDB to gather logs and time series data for a custom resource monitor and log viewer. The log viewing was terrible because of the format in which InfluxDB keeps logs, but it was amazing for the resource monitor.

With Kubernetes I moved to tools better suited to log gathering, since resource monitoring was not much of a concern. Using Promtail as the log collection agent and Loki to tag and centrally store logs gave me a simple dashboard for viewing all logs generated by my containers. This allowed better oversight of any deployment, pod, or daemonset.

I got rid of this since Komodo could give me the same functionality at a high level.

Update (4/17/26)

I actually added this back since Komodo only gave a real-time snapshot of metrics for my containers. Combined with cAdvisor, I could get a better understanding of my container resource usage, especially since I was looking to limit resource consumption for all of my services. I had way too many instances of something using far more resources than I expected.

Healthchecks

Another monitoring service, but specifically for cron jobs. Healthchecks lets me track that scheduled tasks keep running and notifies me if a task has not checked in. All a task has to do is report to Healthchecks when it starts, when it ends, and if it fails. I currently use it for all container appdata backup jobs and for alerting on the subtitle sync service that I created. What has been really great is that, with a certain API key, a task can auto-create its own monitor, e.g. when I add a new service to my cluster that needs to be backed up, so I no longer need to create monitors manually. The only manual task left is prettifying the name in Healthchecks. In addition, I can send logs related to the completion or failure of a job to allow for quick debugging of what may have happened.
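The start, end, and fail signals could be wired up roughly as below. This is a sketch under assumptions: it uses Healthchecks' slug-based ping URLs with a ping key and the `create=1` query parameter for auto-creating a check, and the base URL, key, and slug are placeholders for a self-hosted instance.

```python
from urllib.parse import quote

# Path suffix for each lifecycle event; a plain ping signals success.
EVENT_SUFFIX = {"start": "/start", "success": "", "fail": "/fail"}


def ping_url(base, ping_key, slug, event="success", create=True):
    """Build the ping URL a job would request for the given event."""
    if event not in EVENT_SUFFIX:
        raise ValueError(f"unknown event: {event!r}")
    url = (
        f"{base.rstrip('/')}/{quote(ping_key, safe='')}"
        f"/{quote(slug, safe='')}{EVENT_SUFFIX[event]}"
    )
    if create:
        # Assumption: create=1 asks Healthchecks to create a missing check.
        url += "?create=1"
    return url


def job_pings(base, ping_key, slug):
    """Return the start, success, and fail URLs for one job."""
    return {event: ping_url(base, ping_key, slug, event) for event in EVENT_SUFFIX}


if __name__ == "__main__":
    # Placeholder values; a real job would request these URLs over HTTP.
    for event, url in job_pings(
        "https://hc.example.com/ping", "PING_KEY", "backup-radarr"
    ).items():
        print(event, url)
```

A backup script would request the `start` URL before it runs and then either the `success` or the `fail` URL afterwards, sending the job's log output in the request body for the debugging use case described above.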

Loki

A service built by the Grafana team that is responsible for post-processing and storing log data from agents. When I first configured this I stored all logs locally, which gave me insanely poor performance when viewing them in Grafana, and it also crashed VS Code a couple of times when I tried to open the folder in the editor. Eventually I moved the storage to Vultr object storage, and viewing logs has been very fast since. It took some time to properly configure the tagging system, but I used pre-configured settings for Kubernetes logs that made ingestion really easy.

I got rid of this since Komodo could give me the same functionality at a high level.

Update (4/17/26)

I added this back to keep some historical logging, especially since I have Docker auto-rotate logs now. I use it mostly to monitor my SnapRAID and SMART script logs.

Mimir

Prometheus lacked a proper way to store metrics in remote storage on its own and relied on running another service like Thanos, so I was looking for a way to consolidate the two. I came across another Grafana service that was perfect for the job and had feature parity with Prometheus, so I had to make literally no changes to any of my Grafana dashboards for it to work.

Prometheus

warning

Deprecated 4/26/26

Dropped in favor of Mimir.

This is a service primarily used for metrics monitoring that I've heard of in so many places, both at work and in self-hosted services. It's essentially the de facto service for capturing and storing metrics. It does its job pretty well, but one thing I didn't quite like was that on its own it could not store data in any remote storage without the help of another service like Thanos or Mimir.

Promtail

warning

Deprecated 4/26/26

Another service built by the Grafana team that reads logs from local nodes or servers and pushes them to a service like Loki. This is the only service that I currently run as a daemonset, as it needs to be deployed on every node to scrape Kubernetes logs. It was really easy to configure since I copied an existing configuration for Kubernetes scraping.

I got rid of this since Komodo could give me the same functionality at a high level.

Update (4/17/26)

I added this back to keep some historical logging, especially since I have Docker auto-rotate logs now. I use it mostly to monitor my SnapRAID and SMART script logs.

SpeedTest Tracker

I forget when exactly, but at some point the Wi-Fi seemed to be acting up and speeds were terrible, so I decided to get a better understanding of when it happens. To do this I wanted to run speed tests every so often on the network, both through the VPN and through the router, to see how well the network is performing. Setting this up was really straightforward and almost a set-it-and-forget-it setup. I just set up thresholds (which took some time to figure out) to get a notice when internet speeds were really bad. Getting this to work in the VPN was tricky due to the network dependency on the qbittorrent instance, but I was able to figure this out with recreator.

For Apprise notifications, the base instance (speedtest-tracker on docker-net) can reach Apprise at http://apprise:8000/notify. The torrent/VPN instance (speedtest-torrent-tracker) runs in qbittorrent's network namespace, so Docker service names like apprise do not resolve there. Give Apprise a Hotio-style hostname: apprise.internal and configure the torrent instance with http://apprise.internal:8000/notify instead (see Hotio qBittorrent WireGuard docs).

Uptime Kuma

warning

Deprecated 5/29/25

Another very cool monitoring tool that I used, Uptime Kuma continuously pings exposed services to ensure they are up and running. If anything goes down, I get a Discord notification telling me which application is not running. I also used it to verify that the authentication middleware is in place for some of the services I run, in case something like Authelia goes down. I could also create status pages to share with users to show whether there is an issue with any service. The only downside is the need to manually add a new check for every new public service, which made it a poor fit for me, since it was something I easily forgot to do when setting up new services.

Uptime Robot

This isn't really a self-hosted service, but it was a free way for me to monitor whether my public services ever went down. It was extremely useful for knowing when the server itself went down, because all the other monitoring I have for it is hosted on the server. It automatically sends me a Discord notification if any service goes down, and it keeps track of uptime for my services as well.