Monitoring stack with Grafana, Prometheus, and Loki.
## Overview
The monitoring stack at dashboard.spacemusic.tv provides metrics, logs, and alerting for all SpaceMusic services.
| Component | Image | Purpose |
|---|---|---|
| Grafana | `grafana/grafana:latest` | Visualization and dashboards |
| Prometheus | `prom/prometheus:latest` | Metrics collection (30-day retention) |
| Loki | `grafana/loki:latest` | Log aggregation |
| Promtail | `grafana/promtail:latest` | Log shipping (Docker + `/var/log`) |
| cAdvisor | `gcr.io/cadvisor/cadvisor:latest` | Container metrics |
| node-exporter | `prom/node-exporter:latest` | Host system metrics |
## Access
URL: dashboard.spacemusic.tv
Authentication: Authentik OIDC (native Grafana OAuth integration). Role mapping by Authentik group:
| Authentik Group | Grafana Role |
|---|---|
| `spacemusic-admins` | Admin |
| `spacemusic-studio` | Editor |
| All others | Viewer |
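The table above amounts to a first-match rule. In Grafana's generic OAuth settings this kind of mapping is usually written as a JMESPath expression in `role_attribute_path`; the Python sketch below only mirrors the intended precedence so it can be reasoned about and tested. It assumes the OIDC token carries a list of Authentik group names, and that a user in both groups should get the higher role (Admin); neither detail is stated above, and `grafana_role` is a hypothetical helper name.

```python
from typing import Iterable

# Order matters: the first matching group wins.
# Assumption: a user in both groups should receive the higher role (Admin).
GROUP_ROLE_PRECEDENCE: list[tuple[str, str]] = [
    ("spacemusic-admins", "Admin"),
    ("spacemusic-studio", "Editor"),
]
DEFAULT_ROLE = "Viewer"  # "All others" in the table


def grafana_role(groups: Iterable[str]) -> str:
    """Return the Grafana role for a user's Authentik group memberships."""
    member_of = set(groups)
    for group, role in GROUP_ROLE_PRECEDENCE:
        if group in member_of:
            return role
    return DEFAULT_ROLE


if __name__ == "__main__":
    print(grafana_role(["spacemusic-studio"]))
    print(grafana_role(["spacemusic-studio", "spacemusic-admins"]))
    print(grafana_role([]))
```

Keeping the precedence in an ordered list, rather than a dict lookup, makes the "highest role wins" choice explicit when a user belongs to several groups.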
## Provisioned Dashboards
All dashboards are provisioned from JSON files in the repository and auto-synced every 30 seconds. UI edits do not persist; make changes in the JSON files and commit them to git.
| Dashboard | UID | Description |
|---|---|---|
| SpaceMusic Server (Home) | `home` | Portal with polystat service map, quick health metrics, recent logs |
| Server Overview | `server-overview` | CPU, RAM, disk usage, network I/O from node-exporter |
| Docker Fleet | `docker-fleet` | Per-container CPU, memory, network, disk I/O from cAdvisor |
| LiveKit Streaming | `livekit-streaming` | Active rooms, participants, track bandwidth, connection quality |
| Uptime & Alerts | `uptime-alerts` | Service state timeline, response times, SSL certificate expiry |
| Storage (MinIO) | `storage` | Bucket usage, API requests, disk utilization |
| Relay (Centrifugo) | `relay` | WebSocket connections, messages/sec, channel activity |
| Auth (Authentik) | `auth` | Request rate, latency, status codes, flow timing, DB queries, memory |
| API Gateway | `api-gateway` | Health status, container metrics, request logs |
| I/O Hub | `io-hub` | Placeholder (service not yet deployed) |
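Because these dashboards exist only as files in git, one way to spot a dashboard that failed to provision is to compare the UIDs in the table above with what Grafana actually serves. The sketch below assumes you have already fetched the JSON body of Grafana's `GET /api/search?type=dash-db` endpoint (an authenticated request, for example with a service account token, which is not covered here); `EXPECTED_UIDS` and `missing_dashboards` are hypothetical names.

```python
import json

# UIDs listed in the "Provisioned Dashboards" table.
EXPECTED_UIDS = {
    "home", "server-overview", "docker-fleet", "livekit-streaming", "uptime-alerts",
    "storage", "relay", "auth", "api-gateway", "io-hub",
}


def missing_dashboards(search_results: list[dict]) -> set[str]:
    """Return expected UIDs that do not appear as dashboards in a /api/search response."""
    # Folders are also returned by /api/search; count only dashboard entries.
    found = {item.get("uid") for item in search_results if item.get("type") == "dash-db"}
    return EXPECTED_UIDS - found


if __name__ == "__main__":
    # Hypothetical, trimmed response body from GET /api/search?type=dash-db
    body = '[{"uid": "home", "title": "SpaceMusic Server (Home)", "type": "dash-db"}]'
    print(sorted(missing_dashboards(json.loads(body))))
```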
## Datasources
| Datasource | UID | URL | Purpose |
|---|---|---|---|
| Prometheus | `prometheus` | `http://prometheus:9090` | Metrics (default) |
| Loki | `loki` | `http://loki:3100` | Logs |
| Infinity | `infinity` | n/a | REST API queries (JSON/CSV/XML) |
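The datasource URLs above are Docker-internal hostnames, so they resolve only from containers on the same Docker network as Prometheus and Loki. As a sketch under that assumption, the helpers below issue the same kind of instant query Grafana sends to Prometheus, using Prometheus's `GET /api/v1/query` HTTP API; `instant_query_url` and `instant_query` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Datasource URL from the table; only resolvable inside the Docker network.
PROMETHEUS_URL = "http://prometheus:9090"


def instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for Prometheus's instant-query endpoint (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(promql: str, base_url: str = PROMETHEUS_URL) -> list:
    """Run an instant query and return the `data.result` list from the response."""
    with urlopen(instant_query_url(promql, base_url), timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body.get('error')}")
    return body["data"]["result"]
```

For example, `instant_query("up")` returns one series per scrape target, with a value of 1 for targets whose last scrape succeeded and 0 otherwise.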
## Prometheus Scrape Targets
Prometheus collects metrics from all SpaceMusic services:
| Job | Target | Notes |
|---|---|---|
| `prometheus` | `localhost:9090` | Self-scrape |
| `node_exporter` | `node-exporter:9100` | Host CPU, RAM, disk, network |
| `cadvisor` | `cadvisor:8080` | Container metrics (filtered to key metrics) |
| `livekit` | `172.17.0.1:7889` | Host network, reached via Docker gateway IP |
| `minio` | `spacemusic-minio:9000` | Bearer token auth |
| `centrifugo` | `spacemusic-centrifugo:8000` | WebSocket relay metrics |
| `authentik` | `authentik-server:9300` | SSO service metrics |
| `kuvasz` | `spacemusic-kuvasz:8080` | Uptime monitoring metrics (bearer token auth) |
**Adding a new scrape target:** If the new target references a container on an external Docker network, restart Prometheus fully with `docker compose restart prometheus`. A `SIGHUP` configuration reload is not enough if DNS lookup of the new container name fails on first resolution.
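To confirm that every job in the scrape targets table is actually being scraped (for instance after the restart described above), you can check Prometheus's `GET /api/v1/targets` endpoint. Its `data.activeTargets` entries carry each target's `labels` (including `job` and `instance`), its `health` (`up`, `down` or `unknown`) and `lastError`. The sketch below only filters an already-parsed response; `unhealthy_targets` and the sample data are hypothetical.

```python
from typing import Any


def unhealthy_targets(targets_response: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Return (job, instance, lastError) for each active target whose health is not "up"."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (
            target.get("labels", {}).get("job", "?"),
            target.get("labels", {}).get("instance", "?"),
            target.get("lastError", ""),
        )
        for target in active
        if target.get("health") != "up"
    ]


if __name__ == "__main__":
    # Hypothetical, trimmed response body; real entries contain more fields.
    sample = {
        "status": "success",
        "data": {
            "activeTargets": [
                {"labels": {"job": "minio", "instance": "spacemusic-minio:9000"},
                 "health": "up", "lastError": ""},
                {"labels": {"job": "kuvasz", "instance": "spacemusic-kuvasz:8080"},
                 "health": "down", "lastError": "example error text"},
            ]
        },
    }
    for job, instance, error in unhealthy_targets(sample):
        print(f"{job} ({instance}): {error}")
```

An empty result means every active target reported `up` on its last scrape.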
## Plugins
| Plugin | Purpose |
|---|---|
| `grafana-polystat-panel` | Hexagon service map on the home dashboard |
| `grafana-clock-panel` | Live server clock widget |
| `yesoreyeram-infinity-datasource` | Query REST APIs as datasources |
## Logs
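Promtail ships all Docker container logs and `/var/log` system logs to Loki. Query them in Grafana Explore with LogQL; for example, this query returns `spacemusic-api` log lines whose JSON `status` field is 400 or higher:

```logql
{container_name="spacemusic-api"} | json | status >= 400
```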
The home dashboard includes a recent logs panel for quick troubleshooting.
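Outside Explore, the same LogQL can be sent to Loki's `GET /loki/api/v1/query_range` HTTP endpoint, which accepts `query`, `start` and `end` (Unix epoch nanoseconds), `limit` and `direction`. The sketch below only builds the request URL; it assumes it runs in a container that can resolve `loki` on the monitoring Docker network, and `query_range_url` is a hypothetical helper name.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Datasource URL from the Datasources table; only resolvable inside the Docker network.
LOKI_URL = "http://loki:3100"
NS_PER_MINUTE = 60 * 1_000_000_000


def query_range_url(
    logql: str,
    minutes: int = 15,
    limit: int = 100,
    base_url: str = LOKI_URL,
    now_ns: Optional[int] = None,
) -> str:
    """Build a Loki query_range URL covering the last `minutes` minutes, newest first."""
    end = now_ns if now_ns is not None else time.time_ns()
    start = end - minutes * NS_PER_MINUTE
    params = {
        "query": logql,
        "start": start,
        "end": end,
        "limit": limit,
        "direction": "backward",  # newest log lines first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    print(query_range_url('{container_name="spacemusic-api"} | json | status >= 400'))
```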
## Editing Dashboards
Dashboards are provisioned with `disableDeletion: true` and `allowUiUpdates: false`. To make changes:

- Edit the JSON file in `services/dashboard/spacemusic-dashboard/grafana/dashboards/`.
- Commit and push to `main`.
- GitHub Actions deploys the change; Grafana picks it up within 30 seconds.
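Since a change only takes effect after GitHub Actions deploys it, a malformed file is cheaper to catch before committing. The sketch below is a hypothetical pre-commit or CI check, not part of the existing pipeline: it assumes every `*.json` file under the dashboards directory is a dashboard, and verifies that each one parses and has a non-empty `uid` that no other file uses. `read_dashboards` and `validate_dashboards` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator

DASHBOARD_DIR = Path("services/dashboard/spacemusic-dashboard/grafana/dashboards")


def read_dashboards(directory: Path) -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every JSON file under `directory`, including subfolders."""
    for path in sorted(directory.rglob("*.json")):
        yield str(path), path.read_text(encoding="utf-8")


def validate_dashboards(files: Iterable[tuple[str, str]]) -> list[str]:
    """Return human-readable problems; an empty list means every file passed."""
    problems: list[str] = []
    seen: dict[str, str] = {}  # uid -> first file that declared it
    for name, text in files:
        try:
            dashboard = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: invalid JSON ({exc})")
            continue
        uid = dashboard.get("uid") if isinstance(dashboard, dict) else None
        if not uid:
            problems.append(f"{name}: missing 'uid'")
        elif uid in seen:
            problems.append(f"{name}: duplicate uid {uid!r} (also in {seen[uid]})")
        else:
            seen[uid] = name
    return problems
```

Run from the repository root, `validate_dashboards(read_dashboards(DASHBOARD_DIR))` returns an empty list when every file passes; a CI step could fail whenever it does not. Separating file reading from validation keeps the checks testable without touching the filesystem.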