For HPC Cloud, The Underlying Hardware Will Always Matter


Decades of cloud collaborations and optimizations have brought solid performance and cost efficiencies to the cloud.

Those who require high performance computing (HPC) resources tend to set their own rules when it comes to systems. Their machines are highly tuned for blazing fast computation and communication, with software stacks optimized to match, so in selecting systems these users have little in common with the average enterprise running database or transactional applications.

It stands to reason that HPC users carry these habits wherever they run, including on public cloud resources. The number of HPC applications running in cloud environments has grown steadily, and the scale at which they operate is now growing too, driven by increased data volumes and the addition of AI/ML to the workload mix.

A large contingent of ordinary enterprise cloud users believe that a major benefit of the cloud is not having to think about the underlying infrastructure. In fact, understanding that infrastructure is critical to unlocking the value and optimal performance of a cloud deployment. HPC application owners need in-depth insight even more and, therefore, a trusted hardware platform with co-design and portability built in from the ground up and solidified through long-running cloud provider partnerships.

These HPC users understand co-design and optimization, and they know what specific enhancements to both ISV and open source codes can yield when those codes are tuned for particular hardware. In other words, the standard lift-and-shift approach to cloud migration is not an option. The need for blazing fast performance with complex parallel codes means fine-tuning hardware and software. That’s critical for performance and for cost optimization, says Amy Leeland, director of hyperscale cloud software and solutions at Intel.

“Software in the cloud isn’t always set by default to use Intel CPU extensions or embedded accelerators for optimal performance, even though it is so important to have the right software stack and optimizations to unlock the potential of a platform, even on a public cloud,” she explains.
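One way to act on this is to check which instruction-set extensions a cloud instance actually exposes before choosing a build or a library. The following is a minimal sketch, assuming a Linux instance where `/proc/cpuinfo` is readable; the `WANTED_FLAGS` shortlist and the helper names `parse_cpu_flags` and `report` are illustrative assumptions, not part of any Intel or cloud provider tool.

```python
from pathlib import Path

# Hypothetical shortlist of x86 extensions an HPC code might care about,
# spelled the way the Linux kernel reports them in /proc/cpuinfo.
WANTED_FLAGS = ("avx2", "avx512f", "amx_tile")


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' (x86) or 'Features' (Arm) line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()


def report(cpuinfo_text: str, wanted=WANTED_FLAGS) -> dict:
    """Map each wanted extension name to whether the CPU advertises it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        for name, present in report(path.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this sketch assumes Linux")
```

A flag being present only means the hardware offers the extension; whether a given application or library uses it still depends on how that software was built and configured.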

Silicon Choice Has Cost/Performance Impact

These distinctions matter even more as the set of compute options available in the public cloud keeps growing, from rival CPUs offered by large chipmakers to cloud-native processor types based on architectures like Arm, which do not have the same long history of optimization, use, and broad adoption across multiple HPC application areas.

“HPC customers need to know what silicon their workloads are running on, as this will affect their ability to use infrastructure elsewhere, whether on-prem or in other private or public clouds,” Leeland says. Ultimately, these infrastructure decisions can have a significant performance and cost impact and, in some cases, can even lead to cloud lock-in, especially with CSPs that have their own proprietary architectures (the Arm-based Graviton processors from AWS, for instance).
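At its simplest, knowing the silicon starts with asking the runtime which architecture a job has landed on and picking a matching build. The sketch below uses Python's standard `platform.machine()`; the `ARCH_ALIASES` table, the `builds/` directory layout, and the function names are hypothetical, chosen only for illustration.

```python
import platform

# Hypothetical mapping from platform.machine() strings to build labels.
ARCH_ALIASES = {
    "x86_64": "x86-64",
    "amd64": "x86-64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Map a raw machine string to a build label, or 'unknown'."""
    return ARCH_ALIASES.get(machine.strip().lower(), "unknown")


def build_dir(machine: str, root: str = "builds") -> str:
    """Return the (hypothetical) directory holding binaries for this architecture."""
    arch = normalize_arch(machine)
    if arch == "unknown":
        raise ValueError(f"no build available for architecture {machine!r}")
    return f"{root}/{arch}"


if __name__ == "__main__":
    # Report the architecture of the machine this script runs on.
    print(normalize_arch(platform.machine()))
```

The architecture string alone does not distinguish one vendor's Arm SoC from another's, which is where the portability concern raised here begins.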

“With the variety and requirements of HPC, portability is key. Arm-based platforms, including AWS Graviton, are trying to get everything ported, they’re working up the stack and migrating workloads and software. But that doesn’t mean that what gets migrated to Graviton will work on another vendor’s Arm SoC,” she adds.

In short, HPC in the cloud requires a stable foundation with years (decades even) of optimizations to balance high performance with cost efficiency. That platform needs to support the many environments HPC jobs run across (on-prem, hybrid, and on public cloud resources) and therefore needs to be portable, stable, and tuned for the application types HPC shops value.

Value In The HPC Software Ecosystem

HPC users understand Intel platforms well from decades of on-prem use. One glance at the Top 500 list of the world’s most powerful supercomputers shows the vast majority are based on Intel processors. This same dominance carries over to cloud environments, where Intel-based instances are the norm and are found in every region and availability zone across all major cloud providers.

“We have a rich HPC software ecosystem based on decades of work, particularly around HPC open source, standards, and work with ISVs. For over a decade we’ve been the top corporate contributor to the Linux kernel and in the top five open source contributors to the Open Source Standards Committee,” Leeland explains.

“We are always testing to make sure all software is optimized out of the box on Intel Xeon SP processors. This means companies can choose to be anywhere – on prem, at the edge, moving into the cloud and out again.” Unlike less established HPC ecosystems such as Arm-based efforts, the Intel ecosystem has portability and flexibility built in alongside performance optimizations.

No other compute platform has a longer history than Intel on AWS, the world’s first and largest cloud platform. With fifteen years of collaboration between the EC2 and Xeon teams, a range of HPC-optimized instances over those years, and specific enhancements on native platforms such as AWS ParallelCluster, Intel and the cloud are matched out of the box for enterprise use and well-honed for performance-aware HPC shops.

The outcome of this long-lasting partnership, built on continuous honing of hardware and software and an emphasis on standards and portability, is that HPC in the cloud can consistently depend on Intel platforms for scale, performance, and ecosystem reach. At the end of the day, all of this has a net effect on the true running costs over time as well as the option to shift infrastructure patterns as needs – and workloads – demand.

Sponsored by Intel.