Northern Data takes HPC to a new level of cost and energy efficiency with AMD technology

AMD EPYC™ CPUs and AMD Instinct™ GPUs enable more scalable and power-efficient processing to bring AI or ML to more customers

The high-performance data center market has been expanding rapidly, and U.S. brands have tended to be the predominant providers. Northern Data intends to change that. The multi-billion-dollar European tech provider aims to offer HPC cloud services such as machine learning (ML) to a wider audience. Designed for HPC workloads and modern software architectures, Northern Data’s data centers are more than 90 percent powered by green energy.

To achieve this goal, Northern Data relies on technology partners that can deliver the required performance at optimal cost efficiency. Together with GIGABYTE, Northern Data found that AMD EPYC and AMD Instinct technologies delivered the scale and affordability that the company’s customers demanded. Northern Data also plans to provide a local service compliant with GDPR, which is an increasing concern for European customers.

Faster, scalable and more affordable ML training

“We are going to obtain some market share from the hyperscalers for AI, rendering, and other HPC applications,” explains Michel Boutouil, General Manager of Northern Data Software GmbH. “We bring affordable HPC applications to the market, because HPC is still expensive for several reasons, which prevents small and medium-sized enterprises from entering this field.” This led Northern Data to both AMD EPYC processors and AMD Instinct GPUs, which together delivered excellent scalability and lower power consumption.

“AMD processors are of high performance and power efficiency,” says Boutouil. “When we evaluated the AMD Instinct MI50s, they performed particularly well in large clusters, and their power consumption is also very low.”

Northern Data also appreciated the direct dialogue with AMD engineers to optimize the AMD Instinct GPUs’ performance. “We are in close collaboration with the AMD team. We are confident the AMD product will be a great choice for our customers for ML applications, especially at large scales.”

To confirm the performance of the AMD Instinct GPUs, Northern Data performed a series of tests, starting with a TensorFlow implementation aimed at training ML models. Northern Data measured how many seconds an AMD Instinct MI50 GPU took to train a model on 300,000 images compared with a typical instance from a major cloud service provider. The cloud instance took close to 9,000 seconds, whereas the MI50 took well under 5,000 seconds. Northern Data stated that this would considerably lower the cost of training ML systems, making a full training cycle much more cost-efficient than before. “On the compared instance, for example, it would cost €1,987 ($2,339) to run one experiment,” says Loesch. “We can do it with AMD solutions for €1,100 ($1,295). So that’s almost half the price.”

This could be the difference between employing ML and finding it not cost-effective. “Typically, in machine learning, you would run several experiments with each team deploying several experiments a day,” says Loesch. “You want to have a scalable solution. If you are spending two thousand euros on an experiment when you do 20 a day, it can get quite expensive very fast.”
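The arithmetic behind that remark can be sketched in a few lines of Python. This is only an illustration: the two per-experiment prices and the figure of 20 experiments a day come from the quotes above, while the function and variable names are hypothetical and the assumption that every experiment costs the same is a simplification.

```python
# Figures quoted in the article (euros per experiment).
COST_COMPARED_EUR = 1987   # compared cloud instance
COST_AMD_EUR = 1100        # AMD-based setup
EXPERIMENTS_PER_DAY = 20   # the "20 a day" mentioned in the quote

def daily_cost(cost_per_experiment, experiments_per_day):
    """Total daily spend, assuming every experiment costs the same."""
    return cost_per_experiment * experiments_per_day

def saving_fraction(baseline, alternative):
    """Fraction of the baseline cost that the alternative saves."""
    return (baseline - alternative) / baseline

compared_daily = daily_cost(COST_COMPARED_EUR, EXPERIMENTS_PER_DAY)
amd_daily = daily_cost(COST_AMD_EUR, EXPERIMENTS_PER_DAY)
saving = saving_fraction(COST_COMPARED_EUR, COST_AMD_EUR)

print(f"Compared instance: EUR {compared_daily:,} per day")
print(f"AMD-based setup:   EUR {amd_daily:,} per day")
print(f"Saving per experiment: {saving:.1%}")
```

Under these assumptions the gap between the two options grows linearly with the number of experiments, which is why a per-experiment saving matters so much for teams that iterate many times a day.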

Linear scaling with AMD Instinct GPUs

More and more companies are now using AI and ML like this on a regular basis. “Social media will be a big one,” says Boutouil. “For example, companies that deploy these sorts of technologies would use this for research. Natural Language Processing (NLP) is also quite big, which we’re going to evaluate in the future as well.”

Northern Data expects its AMD technology to deliver these capabilities to a much wider range of clients. “With NLP becoming more and more expensive, we could give AI startups a chance to be competitive against the bigger players, which have millions to blow on training.”

Further underlining the possibilities, Northern Data found AMD Instinct GPUs scaled in a brilliantly linear fashion, with TensorFlow ResNet50 on AMD Instinct MI50 GPUs improving from around 400 images per second with one GPU to around 2,250 images per second with eight, almost exactly the eight times performance you would hope for. Similar scalability was shown when running a Blender® 3D render in a virtual machine on the AMD Instinct MI50 GPUs.

Even better was the power efficiency, which more than doubled the amount of work completed per watt with eight GPUs compared to one. This is because eight GPUs can be installed on a single AMD EPYC CPU-powered server without losing any bandwidth, thanks to the 128 PCI Express® 4.0 lanes per single socket and up to 160 per dual socket. To handle this demanding workload, the GIGABYTE G292-Z20 was chosen for its efficient design and topology, which allowed maximum throughput for the AMD Instinct MI50. For a 2U GPU-dense chassis, thermals can be a challenge, but this server excelled without throttling performance. “For distributing training workloads, the extremely large bandwidth towards the GPUs really helps,” says Loesch. These positive results led Northern Data to place a huge order for 4,366 single-socket GIGABYTE servers powered by AMD EPYC 7402P processors, each one equipped with eight AMD Instinct MI50 GPUs, for a grand total of 34,928 accelerators. Of these servers, around 2,000 have been deployed so far.
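The order totals above can be checked with a short Python sketch. The server and GPU counts come from the article; the variable names are hypothetical, and because the deployed figure is given only as "around 2,000", the derived numbers are rough estimates rather than reported values.

```python
# Figures quoted in the article.
SERVERS_ORDERED = 4366
GPUS_PER_SERVER = 8
SERVERS_DEPLOYED_APPROX = 2000  # "around 2,000", so treat as an estimate

# Total accelerators in the full order.
total_gpus = SERVERS_ORDERED * GPUS_PER_SERVER

# Rough estimate of accelerators already in service.
deployed_gpus_approx = SERVERS_DEPLOYED_APPROX * GPUS_PER_SERVER

# Approximate share of the order that has been deployed.
deployed_share = SERVERS_DEPLOYED_APPROX / SERVERS_ORDERED

print(f"Accelerators ordered:  {total_gpus:,}")
print(f"Accelerators deployed: about {deployed_gpus_approx:,}")
print(f"Share deployed:        about {deployed_share:.0%}")
```

The first figure reproduces the 34,928 accelerators stated in the text; the other two simply follow from the approximate deployment count.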

Low costs and more speed for new industries

Northern Data is already seeing its investment deliver outstanding results. “We managed to lower the power consumption about 30 to 40 percent for a comparable workload versus other cloud platforms,” says Boutouil. “The reduction in carbon footprint is especially important to us.

“In addition, we really like the flexibility of the open-source approach. Furthermore, the low costs empower more people who previously weren’t really capable of utilizing the technology due to economic factors. We can deploy amazingly fast data centers, and we are able to scale quickly to run large GPU clusters. This combination also leads us into new industries, for example healthcare, biotech or MedTech companies, which have a need for strong GDPR assurances. The carbon footprint is especially important to us, and the lower power consumption really helps with that.”

“We see our data centers not as a customer data center, we see IT as our customer, and we try to build a perfect setup that works together,” concludes Boutouil. “We worked with colleagues from GIGABYTE and AMD to find the optimal setup for the servers, and the optimal density because space is money, and we buy as little space as we need. We like the service AMD provides, as well as the flexibility, the open-source approach, the way of thinking, and of course, the excellent and cost-effective technology. That is why we chose AMD.”

To find out more about Northern Data, please visit the Northern Data website.
