In this article

It's an IT manager's never-ending challenge: how to optimize data center resources to maximize performance, minimize CapEx/OpEx and realize a healthy return on investment (ROI) — all to stay competitive and meet the ever-increasing demand for faster and more efficient data processing. Solving the "performance" part of the equation requires making the most efficient use of every hardware and software asset, and that includes accelerators.

For those unfamiliar, a hardware accelerator offloads specific compute functions from a general-purpose processor, freeing up that processor for other tasks while decreasing latency and increasing throughput. Common acceleration devices include graphics processing units (GPUs), field programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs).

In a conventional usage model, all or part of a GPU is assigned to a single virtual machine (VM) for a task-specific purpose. Even if only a fraction of its acceleration capability is used, the remainder cannot be shared. Another common way of "sharing" GPU acceleration, especially in high performance computing, is to move users on and off the accelerated machines. However, that can create costly overhead related to human scheduling and resource shuffling.

As a result, GPU capacity can go underutilized for a variety of reasons. For example, one VM might use 10 percent of its assigned GPU's capacity, leaving the other 90 percent idle and out of reach of other acceleration-starved VMs. Or, to avoid the hassle of filing a separate request, users may unnecessarily extend their reservation, holding on to that GPU for their next set of computations despite this underutilization.

But that's all changing now as our technology partner VMware introduces a novel solution that abstracts acceleration into a shareable resource, maximizing utilization while reducing total cost of ownership. The pooled resource solution currently supports GPU-based devices and the CUDA API; development to extend support to FPGAs and other hardware acceleration cards and APIs is underway.

Explore the benefits of a new, flexible alternative to yesterday's dedicated-accelerator model: vSphere Bitfusion.

Bringing resource pooling to acceleration technology

Think of it in terms of the first virtualized applications: a physical server ran an OS, with an application installed within that OS. As virtualization matured, WWT and other technology providers began deploying multiple virtual systems, each running its own operating system and applications, on a single physical server. We pooled CPU, memory and networking resources — and now we're doing the same thing with accelerators.

Essentially, vSphere Bitfusion is doing for acceleration what vSphere did for processing: providing the ability to abstract pooled resources in an "on demand" delivery model for greater flexibility, improved utilization and reduced cost. This acceleration solution supports a range of intense compute workflows, including artificial intelligence (AI) and machine learning (ML), as well as high performance computing (HPC). It functions as a transparent layer across AI frameworks, cloud environments, networks, VMs, containers and more.

Here's how it works: vSphere Bitfusion abstracts the acceleration hardware from the underlying infrastructure, placing all available devices into a large capacity pool of shared, network-accessible resources — no longer isolated per-server. Bitfusion's flexible acceleration card scheduler eliminates gate-keeping of GPUs. Users who require all or part of a GPU are no longer impacted by those who require a smaller slice. This enables more use cases to benefit from acceleration, positively affecting TCO and ROI.
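To make the pooling idea concrete, here is a deliberately simplified sketch of fractional GPU scheduling. It is not Bitfusion's actual scheduler; the class names (`PooledGPU`, `FractionalScheduler`) and the first-fit placement policy are illustrative assumptions. The point it demonstrates is the one above: a small slice and a whole-GPU job can coexist in the same pool instead of each claiming a dedicated device.

```python
from dataclasses import dataclass

@dataclass
class PooledGPU:
    """One network-accessible GPU in the shared pool (hypothetical model)."""
    name: str
    capacity: float = 1.0   # 1.0 = one whole GPU
    allocated: float = 0.0  # fraction currently handed out

    def free(self) -> float:
        return self.capacity - self.allocated

class FractionalScheduler:
    """Toy first-fit scheduler: place each request on the first GPU with
    enough free capacity, so small slices and whole-GPU jobs coexist."""

    def __init__(self, gpus):
        self.gpus = gpus

    def allocate(self, fraction: float):
        for gpu in self.gpus:
            if gpu.free() >= fraction:
                gpu.allocated += fraction
                return gpu.name
        return None  # pool exhausted: caller queues or retries

    def release(self, name: str, fraction: float):
        # Returning capacity to the pool is what keeps utilization high.
        for gpu in self.gpus:
            if gpu.name == name:
                gpu.allocated = max(0.0, gpu.allocated - fraction)

pool = FractionalScheduler([PooledGPU("gpu-0"), PooledGPU("gpu-1")])
print(pool.allocate(0.1))  # a small slice lands on gpu-0
print(pool.allocate(1.0))  # a whole-GPU job still fits on gpu-1
print(pool.allocate(0.5))  # shares gpu-0 with the small slice
```

In the dedicated model, those three requests would have required three GPUs; here they fit on two, which is the utilization gain the pooled approach is after.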

vSphere Bitfusion architecture

With vSphere Bitfusion, an agent informs the guest OS that a full GPU is installed and intercepts any API calls made to this GPU. These calls are forwarded to an available GPU within the cluster, and only the required amount of that GPU is allocated to accelerate the task, with results returned to the guest OS as if the GPU were local. This allows multiple VMs to simultaneously and efficiently use a single GPU.
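The interception pattern described above can be sketched in a few lines. This is not Bitfusion's implementation (which intercepts CUDA calls at the library level and ships them over the network); everything here, including the `InterceptingAgent` class and the toy `remote_backend`, is a hypothetical stand-in meant only to show the shape of the idea: the caller invokes what looks like a local GPU API, while the work is dispatched to a pooled device and the result returned transparently.

```python
class InterceptingAgent:
    """Hypothetical sketch of a client-side agent: every attribute lookup
    looks like a local GPU API call, but each call is forwarded to a
    dispatcher that runs it on a pooled device elsewhere."""

    def __init__(self, dispatch):
        # dispatch: callable (gpu_id, api_name, args) -> result,
        # standing in for the network transport to a GPU server.
        self._dispatch = dispatch

    def __getattr__(self, api_name):
        def intercepted(*args, gpu_id="gpu-0"):
            # Forward the intercepted call; the caller never knows
            # the device is remote.
            return self._dispatch(gpu_id, api_name, args)
        return intercepted

# A toy in-process "remote" backend standing in for a networked GPU server.
def remote_backend(gpu_id, api, args):
    if api == "vector_add":
        a, b = args
        return [x + y for x, y in zip(a, b)]
    raise NotImplementedError(api)

agent = InterceptingAgent(remote_backend)
# The guest code calls what looks like a local accelerator API...
result = agent.vector_add([1, 2, 3], [10, 20, 30])
# ...but the computation actually ran in the (simulated) pool.
print(result)
```

Because the interception happens below the application, the guest's frameworks and tools need no changes, which is what lets the remote GPU behave "as if local."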

The vSphere Bitfusion solution also eliminates the need for GPUs to be physically located in the server running the accelerated VM. This solves a host of problems, especially for users who might not have their VM or data in a location accessible to the acceleration servers. What's more, removing this locality requirement opens up the possibility of GPU acceleration for locations that lack sufficient power and/or cooling for the cards.

The result: GPU utilization is dramatically increased, and the range of GPU-accelerated tasks can be extended to more VM users. By maximizing utilization of all accelerator resources, the organization can expedite workflows at a reduced cost for a stronger ROI.

Speeding development while reducing CapEx/OpEx

The benefits of pooled acceleration should be apparent to anyone who's been tasked with allocating a limited hardware resource — or anyone who's had to wait for that resource. With vSphere Bitfusion, unused resources are automatically released back into the pool, transparent to the user, so they can benefit others.

Imagine the scenario of an IT manager needing to equip half a dozen remote data scientists with high-performance virtual machines — all six virtual boxes would require significant acceleration capability on a server large enough to host them. But rather than invest thousands of dollars in separate GPU devices, the Bitfusion solution enables all six boxes to draw acceleration from the same resource, averting a significant expenditure on dedicated hardware for each. Edge deployments benefit as well: they can streamline their local hardware footprint by drawing on acceleration resources that were previously impractical to reach.

In addition to that reduction in total cost of ownership, vSphere Bitfusion enables organizations to scale up or down flexibly, without the hassle and expense of constantly acquiring accelerators or re-assigning them and their users. Increased access to acceleration makes all kinds of previously impractical use cases feasible: resources can be extended to any device or application that would benefit from even a minuscule amount of acceleration, where in the past such an allocation would have had to be justified with a dedicated server.

But perhaps most important, vSphere Bitfusion enables IT to derive maximum utilization from its acceleration resources. Unlike the traditional model, in which one VM consumed an entire GPU no matter how little of it was actually used while other VMs went without, everyone can now draw freely from the pool for dramatically expedited workflows.

Expanding the vSphere Bitfusion solution with Intel® FPGAs

As previously mentioned, vSphere Bitfusion currently supports only GPU-based accelerators. However, VMware and WWT are collaborating with Intel to widen the solution to devices that use OpenCL and OPAE API calls, such as Intel® FPGA accelerators.

Intel FPGAs are versatile accelerators that can be re-configured for specific environments and workflows even after deployment: their logic gates, memory stores and input/output wiring can be rearranged for almost any given operation. FPGAs once required specialized development skills to program, but the newest generations of Intel FPGA accelerators are now preferred over GPUs in many cases for their flexibility and modest power requirements — valuable advantages for the vSphere Bitfusion solution, especially in environments where power draw and cooling are significant concerns.

We take a forward-looking approach to vSphere Bitfusion, using an architecture that's already equipped to test the abstraction of FPGA-based acceleration solutions as soon as the software supports it.

vSphere Bitfusion solves the data center acceleration challenge

Our focus is on helping customers maximize data center efficiency, envisioning the most highly performant virtualized environments to deliver greater resource utilization and ROI. Toward that goal, vSphere Bitfusion promises to be a significant game-changer, introducing acceleration to the existing array of pooled resources — compute, memory, storage, OS and more — adding even more value with the imminent addition of versatile, power-efficient FPGAs.

We are currently building out a next-gen high performance computing environment in the Advanced Technology Center (ATC) to demonstrate technologies like pooled accelerator cards, with workshops and labs planned to share even more insight into their capabilities. The days of battling over limited acceleration capacity are over, and a new era of maximized, easily shared resource pools has begun.