I’ve been setting up and testing Prometheus and Grafana for about a week now, since that seems to be the universally accepted solution for self-hosted monitoring. But I’m starting to question why it is so accepted. On top of Prometheus not seeming useful on its own (needing Grafana to visualize and Alertmanager for alerts), it feels like with each thing I want to monitor I have to spin up another Docker container to export/gather the data. There are other options like LibreNMS that seem to have all that built into one container. So what does this Prometheus/Grafana stack have that other monitoring services don’t? Is it really worth having to set up each of these specialized exporters and dashboards? Or am I mistaken that it’s the main solution everyone uses? Are you using something different for monitoring?
One of the reasons you see these as separate is the modularity you get with Grafana: you don’t always use it with Prometheus; sometimes it’s paired with ELK, sometimes with InfluxDB and Telegraf. If you intend to set it up outside of the “typical” setup, you start to really appreciate one piece doing one thing, and doing it well.
Prometheus and Grafana … seems to be the universally accepted solution for self-hosted monitoring
Not exactly. There are many ways to do this. Most of us just use this solution because it’s easily scalable, well documented, and probably what we’re already using at work.
all built into one container
It’s nice to separate data sources from the dashboards and alerting platforms. It’s scalable, extremely lightweight, and gives you more options.
On top of Prometheus not seeming useful on its own …
Yeah, that’s just not always true. Maybe for you, in your use case.
Installing a Prometheus node exporter gives you an easily accessible HTTP endpoint serving metrics in a simple plain-text format that can be used however you like. Modularity is a good thing. Being able to swap parts in and out with other parts is a good thing.
If you haven’t figured it out yet, there is no single correct answer here; use what fits your needs. While I have a dashboard set up in Grafana, it’s not my main use case. Since the data is available from the node exporters on all my hardware, I wrote my own alerting scripts and automations in Python.
That’s the beauty of modularity and standards when self hosting.
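As a rough illustration of that kind of home-grown alerting, here is a minimal sketch (not the commenter’s actual scripts) that reads a node exporter’s text output and flags filesystems that are nearly full. The URL with port 9100 is the node exporter default; the 10% threshold and the function names are assumptions made up for the example, and the parser assumes samples carry no trailing timestamp.

```python
import re
import urllib.request

# Matches label pairs such as mountpoint="/" inside the {...} part of a sample.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:9100/metrics", timeout=5):
    """Download the raw plain-text metrics from an exporter endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Turn exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        series, _, raw_value = line.rpartition(" ")
        name, _, label_part = series.partition("{")
        labels = dict(LABEL_RE.findall(label_part))
        samples.append((name, labels, float(raw_value)))
    return samples


def low_disk_mounts(samples, min_free_ratio=0.10):
    """Return mountpoints whose available space is below min_free_ratio."""
    size, avail = {}, {}
    for name, labels, value in samples:
        mount = labels.get("mountpoint")
        if name == "node_filesystem_size_bytes":
            size[mount] = value
        elif name == "node_filesystem_avail_bytes":
            avail[mount] = value
    return sorted(
        mount
        for mount, total in size.items()
        if total > 0 and mount in avail and avail[mount] / total < min_free_ratio
    )
```

Calling `low_disk_mounts(parse_metrics(fetch_metrics()))` from a cron job and piping the result into whatever notifier you already use would be one way to wire it up.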
Grafana and Prometheus are great if you have numeric things you want to monitor. CPU usage, RAM, disks, throughput, etc. You can then do lots of things with these numbers, mainly compare them to your other systems or alert when they go out of bounds.
However, I very much prefer Zabbix for my home network monitoring as this is not so fixated on numbers but can easily work with e.g. error messages in logfiles and alert on those. Or I can regularly check a website for new firmware versions and alert once the latest version changes. There are also lots of ready-to-use templates available from their Community Hub.
Standardization is amazing: one data source and one graphing engine, and now you can overlay different metrics from different systems and have very customized dashboards.
I use InfluxDB plus Grafana
I use Prometheus and Grafana at home because I use it at work, so I’m familiar with it
Separate components that each do one thing, and do it well, are good. Extra containers are basically free.
- The exporters provide the metrics. They can be standalone executables like the node exporter, or they can be built into apps themselves easily, since it’s just HTTP. It’s trivial to add metrics to just about anything without needing extra ports. Its protocol is also easier and more efficient than SNMP.
- Prometheus scrapes those metrics and stores them in its database. In other apps that’d be the role something like PostgreSQL has: you don’t really use it directly, but it’s no less important.
- Grafana is the frontend you slap in front of Prometheus to actually display your metrics.
- Alertmanager takes the alerts that fire on those metrics and sends out the notifications. It’s separate because if your Prometheus box goes down, how are you gonna be alerted of that?
All four of those can be swapped with something else equivalent and it all still works. Don’t like the UI? Replace Grafana. Don’t like Prometheus? There’s VictoriaMetrics and InfluxDB.
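To make the “it’s just HTTP” point about exporters concrete, here is a minimal sketch of an app exposing one counter on `/metrics` using only the Python standard library. The metric name `myapp_requests_total`, the port, and the `Counter` class are made up for the example; a real app would more likely use the official `prometheus_client` library than hand-roll the text format like this.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """A tiny thread-safe counter that renders itself in the text format."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, amount=1):
        with self._lock:
            self._value += amount

    def render(self):
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} counter\n"
            f"{self.name} {value}\n"
        )


# Hypothetical metric for illustration; the app would call REQUESTS.inc() as it works.
REQUESTS = Counter("myapp_requests_total", "Requests handled by the hypothetical app.")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = REQUESTS.render().encode("utf-8")
        self.send_response(200)
        # Content type used by the Prometheus text exposition format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, answering scrapes on the given port (8000 is arbitrary)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Pointing a Prometheus scrape job at that port is then all the “exporter” there is.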
It looks silly on a small scale, but it scales up very well. Couple hundred VMs per Prometheus install, node exporters on every VM and a single Grafana cluster to visualize the data for the whole infrastructure at once.
That makes it all well liked in enterprise which means there are exporters for damn near anything (even the Lemmy server has a built-in exporter I can scrape with Prometheus), which in turn makes it the easy solution for self-hosters too, and here we are.
I feel like it’s easier to set up than some of the all-in-one solutions I’ve used previously, despite being several components. They’re all components that basically just work out of the box.
I’ve been using Zabbix for years now. Does what I need it to do.
LibreNMS has a very different purpose from your other monitoring options - it’s network monitoring at a large scale, not a generic data storage / data visualization platform. If your goal is to monitor your selfhosted servers and services, this is going to be an odd fit and you’ll probably struggle against it.
Better fits for an out-of-the-box monitoring setup would be CheckMK or Zabbix.
These other “stacks” for monitoring are a little more bespoke. To cover it briefly:
Grafana is popular because it is a fantastic visualization platform. The backend data storage is pluggable.
There are many options for data storage, all of which are a little different. Graphite is push-based, and the StatsD compatibility makes it super simple to push your own metrics into it. Prometheus is pull-based. And InfluxDB is more of a time-series database.
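For contrast with Prometheus scraping an endpoint, a push-based setup has the app send each data point itself. A minimal sketch, assuming a StatsD daemon on its usual UDP port 8125 and Graphite’s Carbon plaintext listener on TCP port 2003; the hosts, metric names, and function names are placeholders:

```python
import socket
import time


def statsd_counter_line(name, value=1):
    """Format a StatsD counter increment, e.g. 'home.doorbell.presses:1|c'."""
    return f"{name}:{value}|c"


def graphite_line(path, value, timestamp=None):
    """Format one sample for Graphite's plaintext protocol: 'path value ts'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def push_statsd(line, host="localhost", port=8125):
    """Fire-and-forget a StatsD line over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


def push_graphite(line, host="localhost", port=2003):
    """Send a plaintext sample to Carbon over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

With push, the sender has to know where the collector lives; with pull, the collector has to know where every target lives. That is the practical difference between the two models.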
Part of self hosting is to decide yourself what you want or need.
I am very happy with Beszel (https://github.com/henrygd/beszel) as it is enough for my use case.
That being said, compatibility is huge in the GrafProm stack. A lot of software has Prometheus-compatible endpoints which can then be visualised with Grafana.
Want to know how many requests are hitting your server? Count Diamond blocks mined per player on a Minecraft server? Want to track your weight and workout time? Or do you want to count yellow cars driving by your house? Grafana & Prometheus got you.
The number one reason is that Grafana is king of the open source operational dashboards. Grafana works with so many backends and has worked so well for so long that it’s hard to beat.
Then, when you start considering metric collection and storage, setting up a node exporter and blackbox exporter covers 80% of your use cases. There are scaling and security advantages to Prometheus’ pull architecture too.
I share your annoyance with having to roll out multiple services, but I recently bundled them all together into a Docker Compose file that I had been considering sharing publicly. If you can wait a couple of weeks I can share that.
One other thought is that separating out the development of the exporter components means that the teams with real expertise in the service being monitored can collect the best metrics, rather than a monitoring project making a half-assed metric collector for a service they have never used or managed.
Also, Grafana has built-in alerting, so Alertmanager can be skipped in some cases.
I was asking myself the same. Since everyone talks about these, I used them until I discovered CheckMK and others. Now I’m no longer using Grafana and Prometheus…
'Cos the bros don’t deem Icinga cool enough.