GeoServer Cloud Monitoring with Prometheus + Grafana

This guide explains how to monitor your GeoServer Cloud deployment using Prometheus and Grafana.

Overview

GeoServer Cloud applications expose Spring Boot Actuator endpoints on port 8081, including:

  • /actuator/health - Health check endpoints
  • /actuator/metrics - Micrometer metrics
  • /actuator/prometheus - Prometheus-formatted metrics
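
For reference, /actuator/prometheus returns metrics in the Prometheus plain-text exposition format. The exact metrics and label values differ per service; the sample below is illustrative, not taken from a real scrape:

# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.23731968E8
http_server_requests_seconds_count{method="GET",outcome="SUCCESS",status="200",uri="/ows"} 42.0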

The monitoring stack includes:

  • Prometheus - Time-series database for metrics collection (port 9091)
  • Grafana - Visualization and dashboards (port 3000)

Purpose and Scope

This monitoring setup provides basic observability for local development and serves as a starting point for building production monitoring infrastructure.

What this is:

  • Simple Prometheus/Grafana stack for development environments
  • Basic dashboard showing essential metrics (JVM, HTTP, service health)
  • Foundation and reference implementation for custom observability solutions
  • Example of Eureka-based service discovery integration

What this is NOT:

  • Production-ready monitoring (no alerting, persistence config, security hardening)
  • Comprehensive dashboard suite (intentionally kept simple)
  • Replacement for enterprise observability platforms

For production deployments, you should:

  • Customize dashboards for your specific needs and SLAs
  • Configure alerting rules and notification channels (a minimal example follows this list)
  • Implement persistent storage with appropriate retention policies
  • Integrate with your existing observability stack (Datadog, New Relic, etc.)
  • Add security (authentication, TLS, network policies)
  • Consider distributed tracing (OpenTelemetry, Jaeger, Zipkin)
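
As a minimal sketch of the alerting item above (this stack ships no rules; the file name and thresholds are hypothetical, and the file must be referenced from a rule_files: entry in the Prometheus configuration):

# alert-rules.yml (hypothetical)
groups:
  - name: geoserver-cloud
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} has been down for more than 2 minutes"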

Quick Start

1. Start GeoServer Cloud with Monitoring

Add the monitoring.yml file when starting your environment:

cd compose

# For pgconfig backend with monitoring:
./pgconfig -f monitoring.yml up -d

# For datadir backend with monitoring:
./datadir -f monitoring.yml up -d

This starts all GeoServer Cloud services plus Prometheus and Grafana.
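
To confirm everything came up, list the containers. This assumes the wrapper script forwards arbitrary docker compose subcommands, as it does for up, down, and logs elsewhere in this guide:

./pgconfig -f monitoring.yml ps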

2. Access the Monitoring Tools

  • Grafana: http://localhost:3000 (default credentials: admin / admin)
  • Prometheus: http://localhost:9091

3. View Pre-configured Dashboard

In Grafana:

  1. Navigate to Dashboards → GeoServer Cloud Overview
  2. This basic starter dashboard shows:
    • Service availability (up/down status)
    • JVM heap memory usage
    • CPU usage per service
    • HTTP request rate
    • HTTP request duration
    • JVM thread count

Note: This dashboard is intentionally simple and serves as a starting point. You should customize it for your specific monitoring needs, add alerting rules, and create additional dashboards as required.

Dashboard Preview

(Screenshot: GeoServer Cloud Overview dashboard in Grafana)

The dashboard provides basic observability metrics for development and debugging. Customize it to suit your production monitoring requirements.

Available Metrics

All Spring Boot services expose these metric categories:

JVM Metrics

  • jvm_memory_used_bytes - Memory usage by area (heap, non-heap)
  • jvm_memory_max_bytes - Maximum memory
  • jvm_threads_live_threads - Thread count
  • jvm_gc_pause_seconds - Garbage collection pauses
  • jvm_classes_loaded_classes - Loaded class count

System Metrics

  • system_cpu_usage - System CPU usage (0-1)
  • process_cpu_usage - Process CPU usage (0-1)
  • system_load_average_1m - System load average

HTTP Metrics

  • http_server_requests_seconds_count - Request count
  • http_server_requests_seconds_sum - Total request duration
  • http_server_requests_seconds_max - Maximum request duration

Labeled by: service, uri, method, status, outcome
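
These labels make per-endpoint and per-status breakdowns straightforward. For example, the 5xx error rate per service (using the labels listed above):

sum by (service) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))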

Tomcat Metrics

  • tomcat_threads_current_threads - Current thread count
  • tomcat_threads_busy_threads - Busy threads
  • tomcat_sessions_active_current_sessions - Active sessions

Database Connection Pool (HikariCP)

  • hikaricp_connections_active - Active connections
  • hikaricp_connections_idle - Idle connections
  • hikaricp_connections_pending - Pending connection requests
  • hikaricp_connections_max - Maximum pool size
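
A useful derived signal is pool saturation; both gauges share the same pool label, so the division matches series one-to-one:

# Fraction of the pool in use (near 1.0 means connection requests are about to queue)
hikaricp_connections_active / hikaricp_connections_max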

GeoServer-specific Metrics

  • geoserver_metrics_* - Custom GeoServer metrics (if enabled)

RabbitMQ Metrics

The RabbitMQ management plugin exposes metrics at http://rabbitmq:15692/metrics
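
The shipped Prometheus configuration already scrapes this endpoint (see the RabbitMQ section below); if you need to reproduce the job elsewhere, a static scrape config along these lines works:

- job_name: 'rabbitmq'
  static_configs:
    - targets: ['rabbitmq:15692']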

Exploring Metrics

Using Prometheus UI

  1. Go to http://localhost:9091
  2. Click Status → Targets to see all scraped services
  3. Click Graph to explore metrics with PromQL

Example queries:

# Services that are down
up == 0

# Average request rate per service
rate(http_server_requests_seconds_count[5m])

# 95th percentile request duration
histogram_quantile(0.95, rate(http_server_requests_seconds_bucket[5m]))

# JVM heap usage percentage
jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"} * 100
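
Note that histogram_quantile needs the http_server_requests_seconds_bucket series, which Spring Boot does not publish by default. If the buckets are missing, enable them per service with this standard Spring Boot property (whether GeoServer Cloud already sets it is not covered here):

management:
  metrics:
    distribution:
      percentiles-histogram:
        http.server.requests: true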

Creating Custom Dashboards

In Grafana:

  1. Click + → Create Dashboard
  2. Add panels with Prometheus queries
  3. Use the Prometheus data source (pre-configured)

Checking Actuator Endpoints Directly

You can also access actuator endpoints directly from the host:

# Check health of a specific service (services don't expose 8081 to host by default)
# Note: container name depends on your backend (pgconfig, datadir, etc)
docker exec -it gscloud_dev_pgconfig-wms-1 curl localhost:8081/actuator/health

# Or access via the gateway (if configured to proxy actuator endpoints)
curl http://localhost:9090/geoserver/cloud/wms/actuator/health

Scaling Services

The monitoring setup uses Eureka-based service discovery to find all service replicas automatically! 🎉

How It Works

All GeoServer Cloud services register themselves with the Eureka discovery service. Prometheus:

  1. Queries the Eureka service registry every 30 seconds
  2. Discovers all instances of each service, including all scaled replicas
  3. Scrapes metrics from each instance independently
  4. Labels each instance with service name, instance ID, and hostname

This works perfectly with Docker Compose scaling because Eureka tracks every individual container that registers.

How to Scale

Scale any service to multiple replicas:

./pgconfig -f monitoring.yml up -d --scale wms=3 --scale wfs=2

Prometheus will automatically:

  • Discover all 3 WMS replicas from Eureka
  • Discover both WFS replicas from Eureka
  • Scrape metrics from each replica independently
  • Label each with unique instance_id

Viewing Scaled Services

In Prometheus:

  1. Go to Status → Targets
  2. You'll see one target per replica (e.g., 3 WMS targets if scaled to 3)
  3. Each target has labels: service, instance_id, hostname, application
  4. Go to Status → Service Discovery to see Eureka discovery in action

In Grafana:

  • Metrics are collected from all replicas simultaneously
  • Use sum by (service) to aggregate across all replicas of a service
  • Use instance_id label to filter or view specific replicas
  • Use hostname to identify individual containers

Understanding the Service Availability Dashboard:

The "Service Availability (Replica Count)" table shows:

  • Service: The service name (wms, wfs, wcs, etc.)
  • Replicas UP: Number of healthy replicas running
  • Replicas DOWN: Number of unhealthy replicas (if any)

When you scale WMS to 3 replicas, you'll see:

Service | Replicas UP | Replicas DOWN
wms     | 3           | 0

This is the correct behavior! Each replica is an independent instance that:

  • Runs in its own container
  • Registers independently with Eureka
  • Has its own health status
  • Can fail independently

Prometheus tracks each replica separately, allowing you to:

  • Monitor individual replica health
  • Detect partial failures (e.g., 2/3 replicas healthy)
  • View metrics per replica or aggregated

Example Queries:

Total request rate across all WMS replicas:

sum(rate(http_server_requests_seconds_count{service="wms"}[5m]))

Request rate per WMS replica:

rate(http_server_requests_seconds_count{service="wms"}[5m])

Memory usage of a specific replica:

jvm_memory_used_bytes{service="wms", instance_id="wms-service:172.18.0.5:8080", area="heap"}

Count of healthy replicas per service:

count by (service) (up{job="geoserver-cloud-services"} == 1)

Configuration

Prometheus Configuration

The setup includes two Prometheus configurations:

  1. prometheus-eureka.yml (Default, Recommended)

    • Uses Eureka service discovery
    • Automatically discovers all replicas
    • Works perfectly with scaling
    • Requires Eureka discovery service to be running
  2. prometheus.yml (Fallback)

    • Uses DNS-based service discovery (tasks.<service>)
    • Limited replica discovery in Docker Compose
    • Use if Eureka is disabled
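
For reference, a DNS-based job in prometheus.yml generally follows this shape (service names are illustrative; the shipped file may differ):

- job_name: 'geoserver-dns'
  dns_sd_configs:
    - names: ['tasks.wms', 'tasks.wfs']
      type: A
      port: 8081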

To switch configurations, edit compose/monitoring.yml and change the volume mount:

volumes:
  # For Eureka-based discovery (default):
  - ./prometheus-eureka.yml:/etc/prometheus/prometheus.yml:ro

  # OR for DNS-based discovery:
  - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro

To customize, edit the active configuration file:

  • Adjust scrape intervals (default: 15s)
  • Add more static targets
  • Configure alerting rules
  • Modify relabeling rules
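
For example, to tighten the scrape interval and add an extra static target (values are illustrative):

global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'extra-target'
    static_configs:
      - targets: ['my-host:8081']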

Grafana Configuration

  • Datasources: compose/grafana/provisioning/datasources/
  • Dashboards: compose/grafana/provisioning/dashboards/

Add .json dashboard files to the dashboards directory - they'll be automatically loaded.
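
For reference, a Grafana datasource provisioning file follows this general shape (a sketch; the shipped file may differ in details, and Prometheus is addressed on its internal port 9090, not the host port 9091):

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true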

Customizing Credentials

Set environment variables before starting:

export GRAFANA_USER=myuser
export GRAFANA_PASSWORD=mypassword
./pgconfig -f monitoring.yml up -d

Monitoring Additional Components

PostgreSQL

To monitor PostgreSQL, add the postgres_exporter:

  1. Uncomment the postgres section in compose/monitoring.yml
  2. Add this service:

  postgres-exporter:
    image: prometheuscommunity/postgres-exporter:latest
    environment:
      DATA_SOURCE_NAME: "postgresql://geoserver:geoserver@geodatabase:5432/geoserver?sslmode=disable"
    ports:
      - "9187:9187"
    depends_on:
      - geodatabase

  3. Uncomment the postgres job in compose/prometheus.yml
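
The commented-out job should resemble this sketch (the exact job name and labels may differ):

- job_name: 'postgres'
  static_configs:
    - targets: ['postgres-exporter:9187']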

RabbitMQ

RabbitMQ metrics are already configured if the management plugin is enabled (it is by default).

Stopping the Monitoring Stack

# For pgconfig backend:
./pgconfig -f monitoring.yml down

# For datadir backend:
./datadir -f monitoring.yml down

To also remove volumes (WARNING: deletes all data including Prometheus metrics and Grafana dashboards):

./pgconfig -f monitoring.yml down -v
# or
./datadir -f monitoring.yml down -v

Troubleshooting

Not Seeing All Replicas in Prometheus (Eureka Discovery)

If you're using Eureka discovery but only see one instance per service:

  1. Check Eureka service registry:

    • Go to http://localhost:8761 (Eureka console)
    • Verify all service instances are registered
    • Look for multiple instances of the scaled service
  2. Verify services are registering:

    # Check if WMS instances are registered
    curl http://localhost:8761/eureka/apps/WMS-SERVICE | grep -o "<instance>.*</instance>"
    
  3. Check Prometheus service discovery:

    • Go to http://localhost:9091 and open Status → Service Discovery
    • Verify that Eureka-discovered targets appear for each scaled service

  4. Check Prometheus logs:

    ./pgconfig -f monitoring.yml logs prometheus | grep -i eureka
    

Using DNS Discovery Instead

If Eureka discovery isn't working or you prefer DNS:

  1. Edit compose/monitoring.yml
  2. Change the volume mount to use prometheus.yml instead of prometheus-eureka.yml
  3. Restart: ./pgconfig -f monitoring.yml restart prometheus

Note: DNS discovery has limitations - see the configuration section for details.

Services showing as "Down" in Prometheus

  1. Check if actuator endpoints are accessible:

    docker exec -it gscloud_dev_pgconfig-wms-1 curl localhost:8081/actuator/prometheus
    
  2. Check Prometheus logs:

    ./pgconfig -f monitoring.yml logs prometheus
    
  3. Verify services are on the same Docker network:

    docker network inspect gscloud_dev_pgconfig_default
    

Grafana Dashboard Not Loading

  1. Check Grafana logs:

    ./pgconfig -f monitoring.yml logs grafana
    
  2. Verify Prometheus datasource:

    • Go to Configuration → Data Sources
    • Test the Prometheus connection

High Memory Usage

The monitoring tools themselves consume resources. Adjust limits in compose/monitoring.yml:

deploy:
  resources:
    limits:
      memory: 256M  # Reduce if needed

Additional Resources