The Cost Metrics Aggregator is a Go-based application for collecting and aggregating cost-related metrics from Kubernetes clusters, focusing on node vCPU utilization and pod CPU usage for subscription purposes. It stores data in a PostgreSQL database with partitioned tables for efficient time-series management. The application is deployed on OpenShift with automated image builds via Quay.io and supports local development with Podman.
- Collects node metrics (e.g., core count) and pod metrics (e.g., CPU usage and request seconds) from clusters.
- Stores data in PostgreSQL with UUID-based identifiers and range-partitioned tables for time-series data.
- Aggregates daily node and pod metrics for efficient querying (e.g., total hours and effective core seconds).
- Manages database partitions with automated creation and deletion via OpenShift CronJobs.
- Provides RESTful API endpoints to upload metrics and query node and pod data.
- Deploys on OpenShift with a dedicated PostgreSQL instance and secrets.
- Supports local development with Podman and `podman-compose` for testing and debugging.
- OpenShift Deployment:
  - OpenShift cluster (v4.x) with admin access.
  - Quay.io account with permissions to push to `quay.io/chambridge/cost-metrics-aggregator`.
  - GitHub repository (`chambridge/cost-metrics-aggregator`) with push access.
  - `kubectl` installed locally.
- Local Development:
  - Go 1.20 or higher.
  - Podman and `podman-compose` installed.
  - `make` for using the `Makefile`.
  - A storage class (e.g., `standard`) available in OpenShift for PostgreSQL persistence (if deploying locally with OpenShift).
```
.
├── Containerfile            # Container build configuration
├── Makefile                 # Build, test, and deployment tasks
├── podman-compose.yaml      # Local development services (app, database)
├── go.mod                   # Go module dependencies
├── api/
│   ├── handlers/            # Handlers for API requests
│   └── router.go            # Router for endpoint management
├── cmd/server/main.go       # Application entry point
├── internal/
│   ├── config/              # Server configuration
│   ├── db/migrations/       # SQL migrations (e.g., 0001_init.up.sql)
│   └── processor/           # CSV processing logic
├── scripts/                 # Go scripts for partition management
│   ├── create_partitions.go
│   └── drop_partitions.go
└── deploy/                  # OpenShift manifests
    ├── namespace.yml
    ├── cost-metrics-db-secret.yml
    ├── postgres-deployment.yml
    ├── deployment.yml
    ├── service.yml
    ├── route.yml
    ├── cronjob-create-partitions.yml
    └── cronjob-drop-partitions.yml
```
The database schema (`internal/db/migrations/0001_init.up.sql`) defines the following tables:

- `clusters`: Stores cluster metadata with UUID `id` and `name`.
- `nodes`: Stores node metadata with UUID `id`, `cluster_id`, `name`, `identifier`, and `type`.
- `node_metrics`: Stores time-series node metrics with UUID `id`, `node_id`, `timestamp`, `core_count`, and `cluster_id`, partitioned monthly by `timestamp`.
- `node_daily_summary`: Aggregates daily node metrics by `node_id`, `date`, and `core_count`, storing `total_hours`.
- `pods`: Stores pod metadata with UUID `id`, `cluster_id`, `node_id`, `name`, `namespace`, and `component`.
- `pod_metrics`: Stores time-series pod metrics with UUID `id`, `pod_id`, `timestamp`, `pod_usage_cpu_core_seconds`, `pod_request_cpu_core_seconds`, `node_capacity_cpu_core_seconds`, and `node_capacity_cpu_cores`, partitioned monthly by `timestamp`.
- `pod_daily_summary`: Aggregates daily pod metrics by `pod_id` and `date`, storing `max_cores_used`, `total_pod_effective_core_seconds`, and `total_hours`.
All `id` columns use UUIDs (via `gen_random_uuid()`). The `node_metrics` and `pod_metrics` tables are partitioned for performance.
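To make the daily rollups concrete, here is a minimal sketch of aggregating hourly `pod_metrics` rows into the `pod_daily_summary` columns. The aggregation rules are assumptions, not the repository's actual logic: `total_hours` counts hourly samples, `max_cores_used` is peak hourly usage divided by 3600 seconds, and effective core seconds take the larger of usage and request per hour.

```go
package main

import "fmt"

// HourlyPodMetric mirrors a subset of one pod_metrics row (column names from the schema above).
type HourlyPodMetric struct {
	UsageCoreSeconds   float64 // pod_usage_cpu_core_seconds
	RequestCoreSeconds float64 // pod_request_cpu_core_seconds
}

// DailySummary holds the pod_daily_summary columns described above.
type DailySummary struct {
	MaxCoresUsed              float64
	TotalEffectiveCoreSeconds float64
	TotalHours                int
}

// summarize rolls one day of hourly rows into a summary. The exact rules here
// are illustrative assumptions; see internal/processor for the real logic.
func summarize(rows []HourlyPodMetric) DailySummary {
	var s DailySummary
	for _, r := range rows {
		s.TotalHours++ // assume one row per hour
		if cores := r.UsageCoreSeconds / 3600; cores > s.MaxCoresUsed {
			s.MaxCoresUsed = cores
		}
		eff := r.UsageCoreSeconds
		if r.RequestCoreSeconds > eff {
			eff = r.RequestCoreSeconds // max(usage, request) per hour
		}
		s.TotalEffectiveCoreSeconds += eff
	}
	return s
}

func main() {
	rows := []HourlyPodMetric{
		{UsageCoreSeconds: 1800, RequestCoreSeconds: 3600},
		{UsageCoreSeconds: 7200, RequestCoreSeconds: 3600},
	}
	fmt.Printf("%+v\n", summarize(rows))
}
```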
```shell
git clone https://github.com/chambridge/cost-metrics-aggregator.git
cd cost-metrics-aggregator
```

Create a `./db.env` file for the application:

```shell
echo "DATABASE_URL=postgres://costmetrics:costmetrics@db:5432/costmetrics?sslmode=disable" > ./db.env
echo "POD_LABEL_KEYS=label_rht_comp" >> ./db.env
```

- `DATABASE_URL`: Matches the PostgreSQL service in `podman-compose.yaml`.
- `POD_LABEL_KEYS`: Defines pod labels for filtering (e.g., `label_rht_comp`).
Use the Makefile to start the application and PostgreSQL database:

```shell
make compose-up
```

This:

- Builds the application image using the `Containerfile`.
- Starts the `app` (aggregator) and `db` (PostgreSQL) services.
- Applies migrations from `internal/db/migrations` to initialize the database schema.

Verify services are running:

```shell
podman ps
```

Expected output includes the `aggregator` and `aggregator-db` containers.
Execute unit tests to verify the application logic:

```shell
make test
```

This runs tests in all packages, including CSV processing for node and pod metrics.
Generate a test tar.gz file containing a `manifest.json` and sample CSV files for the previous 24 hours:

```shell
make generate-test-upload
```

Upload the generated test file to the application:

```shell
make upload-test
```

The `generate-test-upload` target creates a `test_upload.tar.gz` file with a manifest and two CSV files, each containing hourly metrics data compatible with the application's ingestion endpoint. The `upload-test` target sends this file to `http://localhost:8080/api/ingres/v1/upload`. Ensure the application is running before uploading.
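For building such a payload programmatically, the sketch below assembles an in-memory tar.gz with a manifest and a CSV, using only the standard library. The manifest fields and CSV columns are placeholders, not the exact format the ingestion endpoint validates:

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
)

// buildUpload assembles an in-memory tar.gz like the one
// `make generate-test-upload` produces: a manifest.json plus CSV files.
func buildUpload(files map[string][]byte) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	for name, data := range files {
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(data); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	payload, err := buildUpload(map[string][]byte{
		"manifest.json": []byte(`{"cluster_id":"test"}`), // placeholder manifest content
		"node.csv":      []byte("interval_start,core_count\n"),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("tar.gz bytes:", len(payload))
	// The archive could then be POSTed to
	// http://localhost:8080/api/ingres/v1/upload (the form field name is not
	// shown in this document).
}
```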
💡 Tip: Substitute `start_date` and `end_date` with the current date (in `YYYY-MM-DD` format) to ensure you query data from the current month's partition.
Query node metrics:

```shell
curl "http://localhost:8080/api/metrics/v1/nodes?start_date=2025-05-17&end_date=2025-05-17"
```

Query pod metrics:

```shell
curl "http://localhost:8080/api/metrics/v1/pods?start_date=2025-05-17&end_date=2025-05-17&namespace=test"
```

Connect to the PostgreSQL database to inspect data:
```shell
podman exec -it aggregator-db psql -U costmetrics -d costmetrics
```

List tables and partitions:

```sql
\dt+ node_metrics*
\dt+ pod_metrics*
```

Query summaries:

```sql
SELECT * FROM node_daily_summary WHERE date = '2025-05-17';
SELECT * FROM pod_daily_summary WHERE date = '2025-05-17';
```

Shut down and remove containers:

```shell
make compose-down
```

Build and push the application image to Quay.io:

```shell
make build
podman build -t quay.io/chambridge/cost-metrics-aggregator:latest .
podman push quay.io/chambridge/cost-metrics-aggregator:latest
```
1. Create the `cost-metrics` namespace:

   ```shell
   kubectl apply -f deploy/namespace.yml
   ```

2. Update `deploy/cost-metrics-db-secret.yml` with a base64-encoded `DATABASE_URL`:
   - Format: `postgres://<username>:<password>@postgres:5432/costmetrics`
   - Example: Encode `postgres://costmetrics:costmetrics@postgres:5432/costmetrics` using `echo -n "<url>" | base64`.

3. Deploy PostgreSQL and the secret:

   ```shell
   kubectl apply -f deploy/cost-metrics-db-secret.yml -n cost-metrics
   kubectl apply -f deploy/postgres-deployment.yml -n cost-metrics
   ```

4. Deploy the application:

   ```shell
   kubectl apply -f deploy/deployment.yml -n cost-metrics
   kubectl apply -f deploy/service.yml -n cost-metrics
   kubectl apply -f deploy/route.yml -n cost-metrics
   ```

5. Deploy CronJobs for partition management:

   ```shell
   kubectl apply -f deploy/cronjob-create-partitions.yml -n cost-metrics
   kubectl apply -f deploy/cronjob-drop-partitions.yml -n cost-metrics
   ```
1. Check pod status:

   ```shell
   kubectl get pods -n cost-metrics -l app=postgres
   kubectl get pods -n cost-metrics -l app=cost-metrics-aggregator
   ```

2. Verify the database schema:

   ```shell
   kubectl exec -it <postgres-pod-name> -n cost-metrics -- psql -U costmetrics -d costmetrics -c "\dt+ node_metrics*"
   kubectl exec -it <postgres-pod-name> -n cost-metrics -- psql -U costmetrics -d costmetrics -c "\dt+ pod_metrics*"
   ```

3. Check application logs:

   ```shell
   kubectl logs -l app=cost-metrics-aggregator -n cost-metrics
   ```

4. Verify CronJob execution:

   ```shell
   kubectl get jobs -n cost-metrics
   kubectl logs <job-pod-name> -n cost-metrics
   ```
- Creation: The `create_partitions.go` script (run by an initContainer and `cronjob-create-partitions`) creates `node_metrics` and `pod_metrics` partitions covering the previous and next 90 days.
- Deletion: The `drop_partitions.go` script (run by `cronjob-drop-partitions`) drops partitions older than 90 days.
- Schedule: Both CronJobs run monthly on the 1st at midnight (`0 0 1 * *`).
- `POST /api/ingres/v1/upload`: Uploads a tar.gz file containing `manifest.json` and CSV files (e.g., `node.csv`) for metric ingestion.
- `GET /api/metrics/v1/nodes`: Queries node metrics (e.g., core count, total hours) with optional filters (`start_date`, `end_date`, `cluster_id`, `cluster_name`, `node_type`).
- `GET /api/metrics/v1/pods`: Queries pod metrics (e.g., max cores used, effective core seconds, total hours) with optional filters (`start_date`, `end_date`, `cluster_id`, `namespace`, `component`).
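A small client-side sketch of calling the nodes endpoint with the documented filters; the base URL and parameter values are examples, and the response schema is not specified in this document:

```go
package main

import (
	"fmt"
	"net/url"
)

// nodesQueryURL builds a GET URL for the nodes endpoint using the filters
// documented above. Unset filters are simply omitted.
func nodesQueryURL(base, start, end, nodeType string) string {
	q := url.Values{}
	q.Set("start_date", start)
	q.Set("end_date", end)
	if nodeType != "" {
		q.Set("node_type", nodeType)
	}
	return base + "/api/metrics/v1/nodes?" + q.Encode()
}

func main() {
	u := nodesQueryURL("http://localhost:8080", "2025-05-17", "2025-05-17", "worker")
	fmt.Println(u)
	// resp, err := http.Get(u) // then decode the JSON body per the server's schema
}
```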
- Local Development:
  - Container Failures: Check `podman logs aggregator` or `podman logs aggregator-db` for errors.
  - Database Connectivity: Ensure `./db.env` has the correct `DATABASE_URL` and the `db` service is running.
  - CSV Processing Errors: Verify the CSV format and `interval_start` timestamps (`2006-01-02 15:04:05 +0000 MST`).
- OpenShift Deployment:
  - Build Failures: Check Quay.io build logs for missing dependencies or network issues.
  - Migration Errors: Verify `DATABASE_URL` in `cost-metrics-db-secret.yml` and the PostgreSQL pod logs.
  - CronJob Failures: Check job logs for script errors or database permission issues.
- Metrics Issues:
  - Query `node_daily_summary` or `pod_daily_summary` to verify `total_hours`:

    ```sql
    SELECT * FROM node_daily_summary WHERE date = '2025-05-17';
    SELECT * FROM pod_daily_summary WHERE date = '2025-05-17';
    ```
- Submit pull requests to `chambridge/cost-metrics-aggregator`.
- Update `internal/db/migrations/` for schema changes and `internal/processor/` for metric processing logic.
- Add tests in the relevant packages (e.g., `internal/processor`) for node and pod metric aggregation.
- Test locally with `make compose-up` and `make test` before pushing to Quay.io.