Quant Hedge Fund Builds Software-defined GPU Data Center to Reduce Cloud Dependency
Challenge
In quantitative research and trading, data and compute resources are everything. Quantitative analysts (quants) must rapidly develop, test and refine AI-powered models trained on large data sets. When those models signal a trading opportunity, firms must execute quickly to capture it.
Complex mathematical models give firms a competitive edge, but they also put enormous strain on IT infrastructure running behind the scenes.
This was the situation facing a global hedge fund that relies on cutting-edge models for quantitative trading.
The firm had built its AI-driven trading operations entirely in the cloud. While this strategy served the firm well initially, the boom in large language models (LLMs) and generative AI (GenAI) created new infrastructure challenges.
Demand for GPUs surged, driving up costs and straining availability, with lead times becoming increasingly unpredictable. Meanwhile, the firm became more intent on protecting its proprietary models and reducing the risk of data exfiltration.
What was once a straightforward infrastructure decision had become a complex balancing act between cost, speed and security.
To stay competitive, the firm decided to augment its public cloud footprint with an on-premises setup, investing more than $100 million in building a private, GPU-based data center from the ground up.
The firm wanted its new greenfield environment to give quants the same seamless experience they had come to expect from the cloud, along with greater speed and reliability, lower costs and secure control over intellectual property.
Free from legacy systems and technical debt, the greenfield build gave the firm a unique opportunity to get automation right from day one.
That's where WWT came in.
Solution
WWT engaged early to help the hedge fund define and execute an automation strategy tailored to the unique demands of quantitative trading. The goal: design a high-performance computing (HPC) cluster capable of supporting GPU-intensive workloads with full automation and real-time visibility.
Working alongside the firm's IT teams, WWT led high-level design and hands-on engineering of a software-defined environment, aligning automation workflows to what the business needed to succeed.
Automating core use cases
To start, we helped stakeholders define minimum viable product (MVP) use cases, focusing on common, high-touch workflows that were slowing teams down. These included:
- Provisioning environments for quants to run GPU-intensive research and modeling workloads
- Deploying infrastructure for new trading strategies with preloaded configurations
- Onboarding new server racks automatically without manual setup
- Re-imaging environments between strategy cycles or team handoffs
- Enforcing security policies automatically across all environments
- Providing developers with sandbox environments for testing and tuning
Each use case was mapped to internal user needs — such as fast access to compute, low operational overhead and minimal friction between teams — and then translated into automated workflows.
These workflows covered everything from operating system deployment and network setup to GPU scheduling, access control and policy enforcement.
Every environment in the new data center is deployed using infrastructure as code, with built-in scans and policy checks protecting proprietary models and data at every step. This approach eliminates configuration drift and reduces risk, giving the firm's IT teams a repeatable, auditable way to deploy infrastructure with speed and confidence.
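As a simplified illustration of what such a codified workflow can look like, the sketch below models a provisioning request that must pass policy checks before any operating system deployment, network attachment, GPU scheduling or access-control step runs. The resource names, thresholds and steps are assumptions for illustration only, not the firm's actual tooling.

```python
"""Minimal sketch of an automated provisioning pipeline with a policy gate.

Every name, check and step here is an illustrative assumption.
"""

from dataclasses import dataclass, field


@dataclass
class EnvironmentRequest:
    """A quant team's request for a GPU research environment."""
    team: str
    gpu_count: int
    image: str            # base OS image to deploy
    network_zone: str     # isolated network segment for the workload
    tags: dict = field(default_factory=dict)


def policy_checks(req: EnvironmentRequest) -> list[str]:
    """Return a list of policy violations; an empty list means compliant."""
    violations = []
    if req.network_zone not in {"research", "sandbox"}:
        violations.append(f"network zone '{req.network_zone}' is not approved")
    if req.gpu_count > 64:
        violations.append("requests above 64 GPUs require manual approval")
    if "owner" not in req.tags:
        violations.append("missing 'owner' tag required for auditing")
    return violations


def provision(req: EnvironmentRequest) -> None:
    """Run the workflow: OS deployment, network setup, GPU scheduling, access controls."""
    violations = policy_checks(req)
    if violations:
        # Fail fast: nothing is deployed if any policy check fails.
        raise RuntimeError("policy check failed: " + "; ".join(violations))

    # Each step below would call the firm's actual automation tooling;
    # here they are placeholders that simply describe the action taken.
    steps = [
        f"deploy OS image '{req.image}'",
        f"attach environment to network zone '{req.network_zone}'",
        f"reserve {req.gpu_count} GPUs with the cluster scheduler",
        f"apply access controls for team '{req.team}'",
    ]
    for step in steps:
        print(f"[{req.team}] {step}")


if __name__ == "__main__":
    provision(EnvironmentRequest(
        team="alpha-research",
        gpu_count=16,
        image="ubuntu-22.04-gpu",
        network_zone="research",
        tags={"owner": "quant-platform"},
    ))
```

Because the policy gate runs before any resource is touched, a non-compliant request is rejected with an auditable reason rather than producing a partially configured, drifting environment.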
When new GPU server racks arrive, they are automatically discovered and configured through code. That same level of automation extends to internal teams, who can request fully provisioned environments through a self-service catalog, with everything configured and ready to go in minutes.
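Rack onboarding can follow the same pattern. The sketch below shows the general shape of a zero-touch flow under assumed inputs: the discovery source, rack names and per-node actions are hypothetical placeholders, not the firm's implementation.

```python
"""Illustrative sketch of zero-touch rack onboarding (all names hypothetical)."""

from dataclasses import dataclass


@dataclass
class Rack:
    serial: str
    node_count: int


def discover_new_racks() -> list[Rack]:
    # In a real environment, discovery would come from BMC/DHCP events or an
    # inventory API; a static list stands in for newly detected hardware here.
    return [Rack(serial="RK-1042", node_count=8)]


def onboard(rack: Rack) -> None:
    """Apply the same baseline to every rack so configuration never drifts."""
    print(f"registering rack {rack.serial} in inventory")
    for node in range(rack.node_count):
        print(f"  node {node}: apply firmware baseline, image OS, join GPU cluster")
    print(f"rack {rack.serial}: run health checks and mark schedulable")


if __name__ == "__main__":
    for rack in discover_new_racks():
        onboard(rack)
```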
Monitoring a dynamic landscape
Monitoring the new high-performance environment posed unique challenges. Newer storage technologies, strict thermal thresholds, dense high-throughput networking and energy-intensive GPU workloads created a constantly shifting performance landscape.
To preserve system health and efficiency, the firm needed deep visibility into power and cooling telemetry, network throughput, and resource utilization across clusters.
We helped design and implement a modern observability platform tailored to these demands. The platform gives teams continuous insight into system health, job performance and resource allocation. It also includes built-in logic to trigger automated alerts and remediation.
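To make the alerting-and-remediation idea concrete, the sketch below evaluates a few assumed telemetry readings against illustrative thresholds for GPU temperature and utilization. The metric names, limits and remediation actions are placeholders rather than the platform's actual rules.

```python
"""Sketch of threshold-based alerting and automated remediation.

Metrics, thresholds and actions are assumptions for illustration; a real
platform would pull live telemetry from the monitoring stack.
"""

from dataclasses import dataclass
from typing import Callable


@dataclass
class Reading:
    node: str
    metric: str      # e.g. "gpu_temp_c" or "gpu_util_pct"
    value: float


def throttle_jobs(r: Reading) -> str:
    return f"{r.node}: pausing new job placement until temperature recovers"


def flag_rebalance(r: Reading) -> str:
    return f"{r.node}: flagging sustained low utilization for rebalancing"


# Each rule pairs a metric with a breach test and a remediation action.
RULES: list[tuple[str, Callable[[float], bool], Callable[[Reading], str]]] = [
    ("gpu_temp_c", lambda v: v > 85.0, throttle_jobs),
    ("gpu_util_pct", lambda v: v < 10.0, flag_rebalance),
]


def evaluate(readings: list[Reading]) -> list[str]:
    """Return the remediation actions triggered by the current telemetry."""
    actions = []
    for r in readings:
        for metric, breached, remedy in RULES:
            if r.metric == metric and breached(r.value):
                actions.append(remedy(r))
    return actions


if __name__ == "__main__":
    sample = [
        Reading("gpu-node-17", "gpu_temp_c", 88.5),
        Reading("gpu-node-03", "gpu_util_pct", 4.2),
    ]
    for action in evaluate(sample):
        print(action)
```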
The result is infrastructure that behaves like a product: fast to deploy, easy to consume and fully aligned with users' work habits.
Scaling through a shared delivery model
The engagement follows a build-operate-transfer model, with WWT leading initial planning and development before progressively transitioning major implementation workstreams (infrastructure automation, observability, Linux engineering, data mirroring and storage configuration) to the client.
While the client took ownership of day-to-day execution, WWT continued to provide core program strategy, architectural oversight and hands-on support through to full completion.
This shared delivery model has helped the client accelerate progress while building internal confidence and capability. More importantly, it has given the firm a platform that can scale as quickly as its trading strategies evolve.
Results
The automation-first strategy positioned the firm to realize measurable gains in performance, security and cost efficiency.
By replacing manual provisioning with infrastructure as code, the firm dramatically accelerated access to compute resources. Internal teams can now spin up GPU-powered environments in minutes instead of days.
GPU utilization also stands to improve significantly. With automated provisioning, orchestration and observability in place, the firm is better able to allocate resources, reduce idle cycles and track performance trends. These capabilities are critical to maximizing the firm's investment in private, GPU-based infrastructure.
The addition of an on-premises environment also reduces cloud concentration risk. By moving away from exclusive reliance on public cloud providers, the firm now has greater control over GPU availability and provisioning timelines, avoiding delays in model development.
Security posture also improves with every deployment. Infrastructure is provisioned through code with embedded policy enforcement and continuous scanning, reducing the risk of human error and misconfiguration. Combined with deeper system visibility and improved access controls, these safeguards are expected to significantly reduce the risk of data exfiltration.
Altogether, these improvements drive infrastructure efficiency, reduce operational overhead and support smarter hardware investment.
And because the client co-built the platform alongside WWT, it now has the internal ownership and capability to scale infrastructure at the pace and scale a global quantitative trading firm demands.