Monitoring
Alloy
I found this service while looking to reduce the number of containers I had running
to handle all of my monitoring needs. The stack had grown to 7 services, and Alloy
turned up in my search for a way to consolidate. It's another Grafana service that
is basically a drop-in replacement for both cAdvisor and Promtail, allowing me
to reduce those two containers to just one. It's an all-in-one collector for many
different types of data.
cAdvisor
Deprecated 4/26/26
I ended up removing this since it was completely replaced by Alloy.
This was the perfect service to capture metrics about my Docker containers. One
thing that was kind of crazy to deal with was the resource usage of cAdvisor
itself, so I had to find some arguments to add to the container to limit the
amount of stats it captures.
Apprise
Central notification relay used across the stack. Services POST to Apprise's
/notify endpoint and Apprise fans out to Discord (and any other configured
backends). The stack lives at compose/apprise/ on docker-net at
http://apprise:8000.
Services running in another container's network namespace (for example
speedtest-torrent-tracker in qbittorrent's VPN namespace) cannot resolve the
Docker service name apprise. Give Apprise a Hotio-style
hostname: apprise.internal and point those instances at
http://apprise.internal:8000/notify instead.
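As a rough illustration of what "POST to /notify" looks like from a service's side, here is a minimal Python sketch. The two URLs come from the notes above; the function names and the JSON fields (title, body, type) are assumptions about a typical Apprise API payload, and whether extra fields such as notification URLs or a config key are required depends on how the Apprise instance is configured.

```python
import json
import urllib.request

# Base URLs taken from the notes above; everything else here is illustrative.
APPRISE_URL = "http://apprise:8000/notify"
APPRISE_VPN_URL = "http://apprise.internal:8000/notify"


def build_notification(title, body, in_vpn_namespace=False, notify_type="info"):
    """Build (but do not send) a POST request for Apprise's /notify endpoint."""
    # Containers sharing qbittorrent's network namespace cannot resolve
    # "apprise", so they use the apprise.internal hostname instead.
    url = APPRISE_VPN_URL if in_vpn_namespace else APPRISE_URL
    payload = json.dumps({"title": title, "body": body, "type": notify_type})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_notification(request, timeout=10):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting the build step from the send step keeps the URL selection easy to check without a running Apprise container.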
Gatus
Getting frustrated with uptime-kuma's slow performance when monitoring many
endpoints pushed me toward Gatus. Configuration lives in builds/gatus/config.yaml
as declarative YAML, state is stored in PostgreSQL, and alerts go to Discord.
The stack runs at compose/gatus/ on docker-net and is reachable at
gatus.dripdrop.pro.
Each service gets two checks:
- Protected: HTTPS through Traefik; expects 401 and a body match for Authelia, confirming forward auth is still enforcing access.
- Internal: direct HTTP to the container on docker-net; expects 200 (or service-specific API status endpoints where applicable, e.g. Bazarr, Radarr, Sonarr, Prowlarr).
Adding a new monitored service means adding both endpoint blocks to the config. This replaced the older Uptime Kuma setup that relied on Authentik passthrough checks and manual UI configuration.
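The two-checks-per-service pattern can be sketched as a small Python helper that produces both endpoint blocks as dictionaries mirroring the YAML. This is an illustration only, not the contents of the actual config: the helper name, the group names, the 5m interval, the subdomain-per-service URL pattern, and the exact body pattern are all assumptions.

```python
def gatus_endpoints(name, port, domain="dripdrop.pro", internal_path="/"):
    """Return the protected and internal Gatus endpoint blocks for one service."""
    protected = {
        "name": f"{name} (protected)",
        "group": "protected",
        # Assumed pattern: each service is published as <name>.<domain>.
        "url": f"https://{name}.{domain}",
        "interval": "5m",
        "conditions": [
            # Forward auth should reject an unauthenticated request...
            "[STATUS] == 401",
            # ...and the response body should come from Authelia.
            "[BODY] == pat(*Authelia*)",
        ],
    }
    internal = {
        "name": f"{name} (internal)",
        "group": "internal",
        # Direct HTTP to the container by its Docker service name.
        "url": f"http://{name}:{port}{internal_path}",
        "interval": "5m",
        "conditions": ["[STATUS] == 200"],
    }
    return [protected, internal]
```

Calling the helper with a service name and port (and a status path for services that expose one) yields both blocks at once, which makes it harder to add one check and forget the other.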
Grafana
A service that I have kept in my setup throughout all of the transitions
between virtualization technologies, Grafana has been the absolute best way to
centralize logging and metrics for my cluster and services. The difference in
usage between Docker and Kubernetes was the aggregator tools used to
power the visualizations. With Docker I used Telegraf and InfluxDB to handle
log gathering and time series data to create a custom resource monitor and log
viewer. The log viewing capabilities were terrible due to
the format in which InfluxDB stored logs, but it was amazing for the
resource monitor. With Kubernetes I moved to tools that are better suited
for log gathering, as resource monitoring was not much of a concern. Using
Promtail as a log collector agent and Loki as a log tagger and centralized
storage for logs made for a simple-to-use dashboard to view all logs
generated by my containers. This allowed better oversight of any deployment,
pod, or daemonset.
I got rid of this since Komodo could give me the same functionality at a high
level.
Update (4/17/26)
I actually added this back since Komodo only gave a real-time snapshot of
metrics for my containers. Combined with cAdvisor I could get a better
understanding of my container resource usage, especially since I was looking to
limit resource consumption for all of my services. I had way too many instances
of something using way more resources than I expected.
Healthchecks
Another monitoring service, but specifically for cron jobs. Healthchecks
lets me track that scheduled tasks continue to run
and notifies me if a task has not checked in. All that is required of a task is
that it pings Healthchecks when it starts, when it ends, and if
it fails. I currently use it for all container appdata backup jobs and
for alerting on the subtitle sync service that I created. What has been really
great is that with a certain API key a task can auto-create a new monitor,
e.g. when I add a new service to my cluster that needs to be backed up, so
I no longer need to create them manually. The only manual task left is
prettifying the name in Healthchecks. In addition, I can send logs related
to the completion or failure of a job to allow for quick debugging of what may
have happened.
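The start / end / fail check-ins can be sketched in Python as below. This assumes slug-based ping URLs of the form <ping endpoint>/<ping key>/<slug>, with /start and /fail suffixes and ?create=1 to auto-create a missing check; the function names, the hc.example.com host, and the slug are hypothetical, and the real backup jobs may do this differently (for example with curl).

```python
import urllib.request

# Suffix appended to the ping URL for each lifecycle event.
EVENT_SUFFIXES = {"start": "/start", "success": "", "fail": "/fail"}


def ping_url(ping_endpoint, ping_key, slug, event="success", create=True):
    """Build the ping URL for one event of one job."""
    if event not in EVENT_SUFFIXES:
        raise ValueError(f"unknown event: {event}")
    url = f"{ping_endpoint.rstrip('/')}/{ping_key}/{slug}{EVENT_SUFFIXES[event]}"
    # create=1 asks Healthchecks to create the check if the slug is new.
    return url + "?create=1" if create else url


def build_ping(ping_endpoint, ping_key, slug, event="success", log_text=""):
    """Build (but do not send) a ping; log_text is sent as the request body."""
    return urllib.request.Request(
        ping_url(ping_endpoint, ping_key, slug, event),
        data=log_text.encode("utf-8"),
        method="POST",
    )


def send_ping(request, timeout=10):
    """Send the ping and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A backup job would send a "start" ping before it runs, then either a "success" or a "fail" ping afterwards, passing the tail of its log as log_text so the output shows up next to the event.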
Loki
A service built by the Grafana team that is responsible for post-processing
and storing log information from agents. When I first configured this
I stored all logs locally, and with that I had very poor performance when
trying to view them in Grafana; it also crashed VS Code a couple of times
when I attempted to open the folder in the editor. Eventually I moved the logs
to object storage on Vultr, and viewing log information has been much faster
since. It took some time to properly configure the tagging system, but I
used pre-configured settings for Kubernetes logs that made ingestion really
easy.
I got rid of this since Komodo could give me the same functionality at a high
level.
Update (4/17/26)
Added this back to keep some historical logging, especially since I have docker auto rotate logs now. I use it mostly to monitor my snapraid and smart script logs.
Mimir
Since Prometheus on its own lacked a proper way to store metrics in remote
storage and relied on running another service like Thanos, I was looking for a
way to consolidate them. I came across another Grafana service that was perfect
for the job and had feature parity with Prometheus, so I had to make literally no
changes to any of my Grafana dashboards in order for it to work.
Prometheus
Deprecated 4/26/26
Dropped in favor of Mimir.
This is a service primarily used for metrics monitoring that I've heard
of in so many places, both at work and in self-hosted setups. It's essentially the
de facto service for capturing and storing metrics. It does its job pretty well, but
one thing I didn't quite like was that on its own it could not store data in
any remote storage unless it had the help of another service like Thanos or Mimir.
Promtail
Deprecated 4/26/26
Another service built by the Grafana team that reads logs from
local nodes or servers and pushes them to a service like Loki. This is the only
service that I currently run as a daemonset, as it needs to be deployed on every
node to scrape Kubernetes logs. This was really easy to configure, having copied
a configuration for Kubernetes scraping.
I got rid of this since Komodo could give me the same functionality at a high
level.
Update (4/17/26)
Added this back to keep some historical logging, especially since I have docker auto rotate logs now. I use it mostly to monitor my snapraid and smart script logs.
SpeedTest Tracker
I forget when exactly, but there was a point in time where it seemed like the
Wi-Fi was acting up and speeds were terrible, so I decided to try and get a
better understanding of when it happens. To do this I wanted to run
speed tests every so often on the network, both through the VPN and through the
router, to see how well the network is performing. Setting this up was really
straightforward and almost set-it-and-forget-it. I just set up thresholds (which
took some time to figure out) to get a notice when internet speeds were really bad.
Getting this to work in the VPN was tricky due to the network dependency on
the qbittorrent instance, but I was able to figure this out with recreator.
For Apprise notifications, the base instance (speedtest-tracker on docker-net)
can reach Apprise at http://apprise:8000/notify. The torrent/VPN instance
(speedtest-torrent-tracker) runs in qbittorrent's network namespace, so Docker
service names like apprise do not resolve there. Give Apprise a Hotio-style
hostname: apprise.internal and configure the torrent instance with
http://apprise.internal:8000/notify instead (see Hotio qBittorrent WireGuard docs).
Uptime Kuma
Deprecated 5/29/25
Another very cool monitoring tool that I used, uptime-kuma continuously pings
exposed services to ensure they are up and running. If anything goes down,
I get notified through Discord about which application is not running. I
also used it to ensure that there is proper authentication middleware for some
of the services I run in case something like Authelia goes down.
I could also create status pages to share with users to show if there is an
issue with any service. The only downside to this service is the need to manually
add a new check for every new public service, so it was not the best solution
for me, as that was something I easily forgot to do when setting up new services.
Uptime Robot
This isn't really a self-hosted service, but it was a free solution for monitoring whether my public services ever went down. This was extremely useful for knowing if the server itself went down, because all the other monitoring I have for it is hosted on the server itself. It automatically sends me a Discord notification if any service goes down, and it keeps track of uptime for my services as well.