Cutting AI infra spend for a claims-automation platform
We instrumented the insurer’s training and inference stack, moved batch jobs to spot capacity, and introduced cost guardrails the ML team actually uses day-to-day.
Challenge
An Australian insurer had invested heavily in AI for claims triage and fraud detection, but cloud costs were compounding faster than the business case assumed. Finance wanted a spend reset; the ML team did not want the pace of model releases to slow down.
Existing monitoring told them what they had spent – not which jobs were wasteful, and not what to do about it.
Approach
We instrumented both the training stack (PyTorch + Kubeflow) and the online inference stack (Triton) with per-job cost attribution, tying every run back to a model, a team, and a business use case.
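The core of per-job attribution is simple: every run carries labels for its model, team, and use case, and spend is rolled up from GPU-hours. A minimal sketch of that idea (the rates, model names, and field layout below are illustrative, not the insurer's actual schema):

```python
from dataclasses import dataclass

# Assumed hourly GPU prices, keyed by instance GPU type (illustrative only).
GPU_HOURLY_RATE_AUD = {"a100": 5.50, "t4": 0.80}

@dataclass
class RunCost:
    """One training or inference job, tagged for cost attribution."""
    model: str          # which model produced/used this run
    team: str           # owning ML team
    use_case: str       # business use case, e.g. claims triage
    gpu_type: str
    gpu_count: int
    duration_hours: float

    @property
    def cost_aud(self) -> float:
        # Cost = hourly rate x GPUs x wall-clock hours.
        return GPU_HOURLY_RATE_AUD[self.gpu_type] * self.gpu_count * self.duration_hours

job = RunCost("fraud-detect-v3", "ml-fraud", "fraud_detection", "a100", 4, 2.5)
print(f"{job.model} ({job.team}): ${job.cost_aud:.2f}")  # 4 GPUs x 2.5 h x $5.50
```

In practice the labels came from Kubeflow run metadata and the rates from the cloud billing export; the point is that every dollar maps to a named model and team, not just an account.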
We moved nightly batch retraining and offline evaluation onto spot capacity with automatic checkpoint-and-restart, and introduced right-sizing guidance for serving replicas based on observed traffic.
We shipped a set of guardrails – alerts on runaway jobs, default per-project quotas, budget-based throttles – that integrate with the ML team's existing Jira and Slack workflows.
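A budget-based throttle reduces to a small, predictable decision rule: allow below a soft threshold, warn as the quota nears, block new submissions once it is exhausted. A sketch of that rule (quotas, project names, and thresholds are made up for illustration; the production version fed its decisions into Slack alerts and Jira tickets):

```python
# Hypothetical monthly GPU budgets per project, in AUD.
PROJECT_QUOTA_AUD = {"claims-triage": 20_000, "fraud-detect": 35_000}

def guardrail_decision(project: str, month_to_date_aud: float,
                       warn_frac: float = 0.8) -> str:
    """Decide what to do with a new job submission for `project`."""
    quota = PROJECT_QUOTA_AUD[project]
    if month_to_date_aud >= quota:
        return "throttle"  # block new jobs until the budget resets or is raised
    if month_to_date_aud >= warn_frac * quota:
        return "warn"      # alert the team before the hard stop
    return "allow"

print(guardrail_decision("claims-triage", 17_500))  # warn (87.5% of quota)
print(guardrail_decision("fraud-detect", 36_000))   # throttle (over budget)
```

Keeping the rule this dumb is deliberate: teams can predict exactly when a throttle fires, which is what made the guardrails something the ML team actually tolerated day-to-day.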
Outcome
GPU spend fell 42% in the first quarter post-rollout, with no regression in model release cadence and no customer-facing latency impact.
The insurer’s FinOps council now uses the same dashboards we built for AI workloads across the rest of their cloud estate.
Let's build what's next
Share your challenge – AI, data, or infrastructure. We'll scope your project and put the right team on it.