The Massive AI Supercomputer Behind ChatGPT Running at Microsoft

By now, you have likely heard about ChatGPT. While conversational AI tools have been around for many years, in the past five or so years they have become far more advanced.

Most of us have likely used the ‘talk and type’ feature on our phones, in Google Docs, and in other messaging applications. That technology has now advanced to the point where it can take dictation for, and even help write, complete documents.

It’s not all that new, either. Many private citizens and companies use devices that you can simply ask to play a song or dim the lights, and smart appliances and security systems have been around for years.

ChatGPT

ChatGPT initially ran on a Microsoft Azure supercomputing infrastructure powered by NVIDIA GPUs and built by Microsoft specifically for OpenAI. A system like this costs hundreds of millions of dollars.

Of course, the cost doesn’t end there. These systems need constant monitoring and upkeep. There are many moving parts, and because much of the hardware is new, components fail regularly.

That is common with anything new, and certainly not surprising when you consider the scope and magnitude of this system. Following the success of ChatGPT, Microsoft dramatically upgraded the OpenAI infrastructure early this year.

Microsoft is not new to AI. From automatic spell check and correction to language translation to letting a virtual assistant start your coffee maker, these tools have only kept advancing.

Once these AI capabilities started to improve, the company used its expertise in high-performance computing to scale up infrastructure across its Azure cloud, allowing customers to use its AI tools to build, train, and serve custom AI applications.

Microsoft was also able to use more powerful graphics processing units (GPUs). Because these GPUs could handle more complex AI workloads, the company began to see the potential for much larger AI models.

These larger models could learn the nuances of language so well that they were able to tackle many different language tasks at once. Of course, these larger models also had limitations.

The bigger the model, the more data it requires, and more data means the model can be trained for longer, which makes it more accurate. What was needed was bigger infrastructure that could run for longer.
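
To get a feel for why “bigger and longer” matters, a common rule of thumb estimates the compute needed to train a dense transformer model at roughly 6 × parameters × training tokens (in floating-point operations). The short Python sketch below applies that approximation to a couple of illustrative model sizes; the numbers are examples, not figures from Microsoft or OpenAI.

```python
# Back-of-the-envelope estimate of training compute.
# Common approximation: training FLOPs ~= 6 * parameters * training tokens.
# The model sizes and token counts below are illustrative examples only.

def training_flops(parameters: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6 * parameters * tokens

examples = {
    "1B parameters, 100B tokens": (1e9, 100e9),
    "175B parameters, 300B tokens": (175e9, 300e9),
}

for name, (params, toks) in examples.items():
    print(f"{name}: ~{training_flops(params, toks):.2e} FLOPs")
```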

Azure

Microsoft Azure is essentially a cloud computing platform. Like other cloud platforms, Azure provides access to, management of, and development of applications and services through global data centers.

It offers a wide range of capabilities, including software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Azure supports many programming languages, tools, and frameworks, including both Microsoft-specific and third-party software and systems.
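
To make the IaaS side a little more concrete, here is a minimal Python sketch using the Azure SDK for Python (the azure-identity and azure-mgmt-resource packages) to list the resource groups in a subscription. It is only an illustration of how Azure resources can be managed programmatically; the subscription ID is a placeholder.

```python
# Minimal sketch: listing resource groups with the Azure SDK for Python.
# Requires: pip install azure-identity azure-mgmt-resource
# The subscription ID below is a placeholder.

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<your-subscription-id>"  # placeholder

# DefaultAzureCredential picks up credentials from the environment,
# a managed identity, or an interactive login.
credential = DefaultAzureCredential()
client = ResourceManagementClient(credential, subscription_id)

# Enumerate the resource groups visible in this subscription.
for group in client.resource_groups.list():
    print(group.name, group.location)
```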

The Azure infrastructure is what allowed Microsoft to build bigger and better machines that could run for longer. That, in turn, meant the models could be trained on the volumes of data they needed in order to provide AI services.

These new and improved Azure AI supercomputing technologies allow Microsoft to accelerate breakthroughs in AI and deliver on the promise of large language models.

Next Step for Azure

Back in 2019, Microsoft and OpenAI entered a partnership, which continues today, to collaborate on new Azure AI supercomputing technologies. The goal was to accelerate breakthroughs in AI, deliver on the promise of large language models, and help ensure AI’s benefits are shared broadly.

They worked together to build supercomputing resources in Azure that were designed and dedicated to allow OpenAI to train an expanding suite of increasingly powerful AI models.

This infrastructure included thousands of NVIDIA AI-optimized GPUs linked together in a high-throughput, low-latency network based on NVIDIA Quantum InfiniBand communications for high-performance computing.
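
To give a sense of how training software actually uses a GPU cluster like this, here is a generic PyTorch sketch of data-parallel training with the NCCL backend, the communication layer that typically runs over NVLink and InfiniBand. This is an illustration of the general pattern, not OpenAI’s or Microsoft’s actual training code, and the tiny model and script name are placeholders.

```python
# Generic multi-GPU training skeleton using PyTorch DistributedDataParallel.
# NCCL is the backend that typically communicates over NVLink/InfiniBand.
# Launch with: torchrun --nproc_per_node=8 train.py   (placeholder script name)

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model; a real workload would be a large transformer.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):  # placeholder training loop
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced across GPUs here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```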

Enormous amounts of text were fed into the system, pass after pass, to build up enough understanding of language to make almost anything possible. The more data the system is trained on, the better it can function.


Azure Features

Now, Azure offers the ND H100 v5 VM, available on demand in sizes ranging from eight to thousands of NVIDIA H100 GPUs interconnected by NVIDIA Quantum-2 InfiniBand networking. Customers will see significantly faster performance for AI models than on the previous-generation ND A100 v4 VMs, thanks to innovative technologies like the following (a short sketch for checking the GPU layout from inside such a VM appears after the list):

  • 8x NVIDIA H100 Tensor Core GPUs interconnected via next-gen NVSwitch and NVLink 4.0
  • 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand per GPU, with 3.2 Tb/s per VM in a non-blocking fat-tree network
  • NVSwitch and NVLink 4.0 with 3.6 TB/s bisectional bandwidth between the 8 local GPUs within each VM
  • 4th Gen Intel Xeon Scalable processors
  • PCIe Gen5 host-to-GPU interconnect with 64 GB/s bandwidth per GPU
  • 16 channels of 4800 MHz DDR5 DIMMs
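
If you were logged into one of these VMs, a quick way to sanity-check the GPU layout is the small PyTorch sketch below. It simply reports the devices CUDA can see; it does not depend on any Azure-specific API.

```python
# Quick sanity check of the GPUs visible inside a GPU VM.
# Requires PyTorch built with CUDA support.

import torch

if torch.cuda.is_available():
    count = torch.cuda.device_count()
    print(f"Visible GPUs: {count}")  # expect 8 on an 8x H100 VM
    for i in range(count):
        props = torch.cuda.get_device_properties(i)
        print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GiB")
else:
    print("No CUDA devices visible.")
```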


To quote Mustafa Suleyman, CEO, Inflection:
“Our focus on conversational AI requires us to develop and train some of the most complex large language models. Azure’s AI infrastructure provides us with the necessary performance to efficiently process these models reliably at a huge scale. We are thrilled about the new VMs on Azure and the increased performance they will bring to our AI development efforts.”


AI is Here to Stay

Although we are hearing more and more about ChatGPT and AI, it is still in its infancy. There have been outages and failures along the way, but that is to be expected.

However, as the technology continues to advance, you can expect only better and faster AI and ChatGPT. Azure now offers more services that make AI supercomputers accessible to customers for model training, and Azure OpenAI Service gives customers access to the power of large-scale generative AI models.
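
As an illustration of how a customer might use that service, here is a minimal Python sketch that calls Azure OpenAI Service through the openai package’s AzureOpenAI client. The endpoint, API version, and deployment name are placeholders you would replace with values from your own Azure OpenAI resource.

```python
# Minimal sketch of calling Azure OpenAI Service from Python.
# Requires: pip install openai
# Endpoint, API version, and deployment name below are placeholders.

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # placeholder; use a version your resource supports
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # the deployment created in your Azure resource
    messages=[{"role": "user", "content": "Summarize what Azure OpenAI Service does."}],
)

print(response.choices[0].message.content)
```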

Microsoft Azure offers services that meet multiple compliance standards, including ISO/IEC 27001 and HIPAA. You can find a detailed and up-to-date list of these services on the Microsoft Azure Trust Center Compliance page.

It’s moving quickly into the mainstream, and it won’t be long before most businesses, and even many individuals, are using ChatGPT for many of their daily needs.

Watch This Video

For more in-depth information, please watch the following video:

What runs ChatGPT? Inside Microsoft’s AI supercomputer | Featuring Mark Russinovich
