The cloud wins the AI infrastructure debate by default

As artificial intelligence (AI) takes the world by storm, an old debate is reigniting: should businesses self-host AI tools or rely on the cloud? For example, Sid Premkumar, founder of AI startup Lytix, recently shared his analysis of self-hosting an open-source AI model, suggesting it could be cheaper than using Amazon Web Services (AWS).

Premkumar’s blog post, detailing a cost comparison between running the Llama-3 8B model on AWS and self-hosting the hardware, has sparked a lively discussion reminiscent of the early days of cloud computing, when businesses weighed the pros and cons of on-premises infrastructure versus the emerging cloud model.

Premkumar’s analysis suggested that while running the model on AWS would cost roughly $17.93 per million tokens, self-hosting could bring that cost down to a small fraction of the figure, albeit with a break-even period of around 5.5 years. However, this cost comparison overlooks a crucial factor: the total cost of ownership (TCO). It’s a debate we’ve seen before during “The Great Cloud Wars,” where the cloud computing model emerged victorious despite initial skepticism.

The question remains: will on-premises AI infrastructure make a comeback, or will the cloud dominate once again?

A closer look at Premkumar’s analysis 

Premkumar’s blog post provides a detailed breakdown of the costs associated with self-hosting the Llama-3 8B model. He compares the cost of running the model on an AWS instance with 4 Nvidia Tesla T4 GPUs, 192GB of memory, and 48 vCPUs (the g4dn.12xlarge configuration) to the cost of self-hosting a similar hardware configuration.

According to Premkumar’s calculations, running the model on AWS would cost approximately $2,816.64 per month, assuming full utilization. With the model able to process around 157 million tokens per month, this translates to a cost of $17.93 per million tokens.
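
The per-token figure is simply the monthly bill divided by the monthly token volume. The sketch below reproduces that arithmetic; the helper name is hypothetical, and the 720-hour (30-day) month is an assumption, chosen because the quoted $2,816.64 works out to exactly $3.912 per hour over 720 hours. Rounding the volume to an even 157 million gives about $17.94 rather than the quoted $17.93, which suggests the analysis used a slightly larger token count.

```python
# Hypothetical helper: converts a monthly bill and a monthly token volume
# into dollars per one million tokens.
def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_month: float) -> float:
    if tokens_per_month <= 0:
        raise ValueError("tokens_per_month must be positive")
    return monthly_cost_usd / tokens_per_month * 1_000_000


# Assumption: a 30-day (720-hour) month at full utilization.
HOURLY_RATE_USD = 3.912          # implied by $2,816.64 / 720 hours
HOURS_PER_MONTH = 24 * 30

aws_monthly_cost = HOURLY_RATE_USD * HOURS_PER_MONTH   # about 2,816.64 USD
tokens_per_month = 157_000_000                         # "around 157 million"

print(f"monthly cost: ${aws_monthly_cost:,.2f}")
print(f"cost: ${cost_per_million_tokens(aws_monthly_cost, tokens_per_month):.2f} per million tokens")
```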

In contrast, Premkumar estimates that self-hosting the hardware would require an upfront investment of around $3,800 for 4 Nvidia Tesla T4 GPUs and an additional $1,000 for the rest of the system. Factoring in energy costs of approximately $100 per month, the self-hosted solution could process the same 157 million tokens at a cost of just $0.000000636637738 per token, or roughly $0.64 per million tokens. That figure reflects energy alone; it does not include the upfront hardware investment.
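
Dividing the self-hosted monthly costs by the same token volume shows what the per-token figure contains: $100 of energy over roughly 157 million tokens is about $0.64 per million tokens, and the hardware purchase does not enter it. The sketch below also spreads the $4,800 hardware cost over a 36-month service life; that period is an assumption for illustration, not part of Premkumar’s analysis, and the helper name is hypothetical.

```python
# Hypothetical helper: dollars per one million tokens for a given monthly cost.
def cost_per_million_tokens(monthly_cost_usd: float, tokens_per_month: float) -> float:
    return monthly_cost_usd / tokens_per_month * 1_000_000


TOKENS_PER_MONTH = 157_000_000     # "around 157 million", as in the AWS case
ENERGY_PER_MONTH_USD = 100.0       # Premkumar's energy estimate
HARDWARE_USD = 3_800 + 1_000       # four T4 GPUs plus the rest of the system

# Marginal cost once the hardware is already bought: energy only.
energy_only = cost_per_million_tokens(ENERGY_PER_MONTH_USD, TOKENS_PER_MONTH)

# Assumption (not from the analysis): straight-line amortization over 36 months.
AMORTIZATION_MONTHS = 36
amortized_monthly = HARDWARE_USD / AMORTIZATION_MONTHS + ENERGY_PER_MONTH_USD
with_hardware = cost_per_million_tokens(amortized_monthly, TOKENS_PER_MONTH)

print(f"energy only:          ${energy_only:.2f} per million tokens")
print(f"with amortized capex: ${with_hardware:.2f} per million tokens")
```

Under these assumptions the energy-only cost is about $0.64 per million tokens, and amortizing the hardware raises it to about $1.49, both at full utilization.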

While this may seem like a compelling argument for self-hosting, it’s important to note that Premkumar’s analysis assumes 100% utilization of the hardware, which is rarely the case in real-world scenarios. Additionally, the self-hosted approach would require a break-even period of around 5.5 years to recoup the initial hardware investment, during which time newer, more powerful hardware may have already emerged.
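
The utilization caveat can be made concrete. Owned hardware costs roughly the same each month whether it is busy or idle, so its cost per token rises in inverse proportion to utilization. The sketch below treats the whole monthly cost as fixed (a simplification, since energy use drops somewhat when GPUs idle) and assumes the $4,800 of hardware is amortized over a hypothetical 36-month life; the function name is illustrative.

```python
def effective_cost_per_million(
    fixed_monthly_usd: float,
    capacity_tokens_per_month: float,
    utilization: float,
) -> float:
    """Cost per million tokens when only `utilization` of capacity is used.

    Assumes the monthly cost is fixed regardless of load.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_served = capacity_tokens_per_month * utilization
    return fixed_monthly_usd / tokens_served * 1_000_000


# Assumptions: $4,800 hardware amortized over 36 months plus $100/month energy,
# and a capacity of about 157 million tokens per month.
FIXED_MONTHLY_USD = 4_800 / 36 + 100
CAPACITY = 157_000_000

for u in (1.0, 0.5, 0.25, 0.1):
    cost = effective_cost_per_million(FIXED_MONTHLY_USD, CAPACITY, u)
    print(f"utilization {u:>4.0%}: ${cost:.2f} per million tokens")
```

Halving utilization doubles the per-token cost; at 10% utilization the same hardware costs ten times as much per token as it does at full load.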

A familiar debate 

In the early days of cloud computing, proponents of on-premises infrastructure made many passionate and compelling arguments. They cited the security and control of keeping data in-house, the potential cost savings of investing in their own hardware, better performance for latency-sensitive tasks, the flexibility of customization, and the desire to avoid vendor lock-in.

Today, advocates of on-premises AI infrastructure are singing a similar tune. They argue that for highly regulated industries like healthcare and finance, the compliance and control of on-premises is preferable. They believe investing in new, specialized AI hardware can be more cost-effective in the long run than ongoing cloud fees, especially for data-heavy workloads. They cite the performance benefits for latency-sensitive AI tasks, the flexibility to customize infrastructure to their exact needs, and the need to keep data in-house for residency requirements.

The cloud’s winning hand

Despite these arguments, on-premises AI infrastructure simply cannot match the cloud’s advantages.

Here’s why the cloud is still poised to win:

  1. Unbeatable cost efficiency: Cloud providers like AWS, Microsoft Azure, and Google Cloud offer unmatched economies of scale. When considering the TCO – including hardware costs, maintenance, upgrades, and staffing – the cloud’s pay-as-you-go model is undeniably more cost-effective, especially for businesses with variable or unpredictable AI workloads. The upfront capital expenditure and ongoing operational costs of on-premises infrastructure simply can’t compete with the cloud’s cost advantages.
  2. Access to specialized skills: Building and maintaining AI infrastructure requires niche expertise that is costly and time-consuming to develop in-house. Data scientists, AI engineers, and infrastructure specialists are in high demand and command premium salaries. Cloud providers have these resources readily available, giving businesses immediate access to the skills they need without the burden of recruiting, training, and retaining an in-house team.
  3. Agility in a fast-paced field: AI is evolving at a breakneck pace, with new models, frameworks, and techniques emerging constantly. Enterprises need to focus on creating business value, not on the cumbersome task of procuring hardware and building physical infrastructure. The cloud’s agility and flexibility allow businesses to quickly spin up resources, experiment with new approaches, and scale successful initiatives without being bogged down by infrastructure concerns.
  4. Robust security and stability: Cloud providers have invested heavily in security and operational stability, employing teams of experts to ensure the integrity and reliability of their platforms. They offer features like data encryption, access controls, and real-time monitoring that most organizations would struggle to replicate on-premises. For businesses serious about AI, the cloud’s enterprise-grade security and stability are a necessity.
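
The pay-as-you-go argument in the first point can be framed as a threshold. If an on-demand instance is billed only for the hours it runs, its monthly cost scales with usage, while owned hardware carries a fixed monthly TCO. Dividing that TCO by the cost of a fully used cloud instance gives the fraction of the month below which the cloud is cheaper. This is a minimal sketch assuming billing proportional to hours used; the class and function names and the 36-month amortization are hypothetical, and the zero defaults for maintenance and staffing are placeholders, not estimates.

```python
from dataclasses import dataclass


@dataclass
class SelfHostedCosts:
    """Monthly TCO components for owned hardware (all USD)."""
    hardware_upfront: float
    amortization_months: float
    energy: float
    maintenance: float = 0.0   # placeholder: supply a real estimate
    staffing: float = 0.0      # placeholder: supply a real estimate

    def monthly_tco(self) -> float:
        return (
            self.hardware_upfront / self.amortization_months
            + self.energy
            + self.maintenance
            + self.staffing
        )


def cloud_break_even_usage(self_hosted: SelfHostedCosts, cloud_full_month_usd: float) -> float:
    """Fraction of the month an on-demand instance can run before it costs
    more than owning equivalent hardware. A value >= 1 means the cloud is
    cheaper even at full utilization."""
    return self_hosted.monthly_tco() / cloud_full_month_usd


# Hardware and energy figures from Premkumar's analysis; 36-month life assumed.
costs = SelfHostedCosts(hardware_upfront=4_800, amortization_months=36, energy=100)
threshold = cloud_break_even_usage(costs, cloud_full_month_usd=2_816.64)
print(f"cloud is cheaper below {threshold:.1%} monthly usage")
```

With hardware and energy alone, the threshold is about 8%; every dollar of maintenance or staffing added to the self-hosted side raises it, which is why the full TCO, not the hardware price, decides the comparison.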

The financial reality of AI infrastructure 

Beyond these advantages, there’s a stark financial reality that further tips the scales in favor of the cloud. AI infrastructure is significantly more expensive than traditional cloud computing resources. The specialized hardware required for AI workloads, such as high-performance GPUs from Nvidia and TPUs from Google, comes with a hefty price tag.

Only the largest cloud providers have the financial resources, unit economics, and risk tolerance to purchase and deploy this infrastructure at scale. They can spread the costs across a vast customer base, making it economically viable. For most enterprises, the upfront capital expenditure and ongoing costs of building and maintaining a comparable on-premises AI infrastructure would be prohibitively expensive.

Also, the pace of innovation in AI hardware is relentless. Nvidia, for example, releases new generations of GPUs every few years, each offering significant performance improvements over the previous generation. Enterprises that invest in on-premises AI infrastructure risk immediate obsolescence as newer, more powerful hardware hits the market. They would face a brutal cycle of upgrading and discarding expensive infrastructure, sinking costs into depreciating assets. Few enterprises have the appetite for such a risky and costly approach.

Data privacy and the rise of privacy-preserving AI 

As businesses grapple with the decision between cloud and on-premises AI infrastructure, another critical factor to consider is data privacy. With AI systems relying on vast amounts of sensitive user data, ensuring the privacy and security of this information is paramount.

Traditional cloud AI services have faced criticism for their opaque privacy practices, lack of real-time visibility into data usage, and potential vulnerabilities to insider threats and privileged access abuse. These concerns have led to a growing demand for privacy-preserving AI solutions that can deliver the benefits of cloud-based AI without compromising user privacy.

Apple’s recently announced Private Cloud Compute (PCC) is a prime example of this new breed of privacy-focused AI services. PCC extends Apple’s industry-leading on-device privacy protections to the cloud, allowing requests from Apple devices to be handled by more powerful cloud-based models while maintaining the privacy and security users expect.

PCC achieves this through a combination of custom hardware, a hardened operating system, and unprecedented transparency measures. By using personal data exclusively to fulfill user requests and never retaining it, enforcing privacy guarantees at a technical level, eliminating privileged runtime access, and providing verifiable transparency into its operations, PCC sets a new standard for protecting user data in cloud AI services.

As privacy-preserving AI solutions like PCC gain traction, businesses will have to weigh the benefits of these services against the potential cost savings and control offered by self-hosting. While self-hosting may provide greater flexibility and potentially lower costs in some scenarios, the robust privacy guarantees and ease of use offered by services like PCC may prove more valuable in the long run, particularly for businesses operating in highly regulated industries or those with strict data privacy requirements.

The edge case

The only potential dent in the cloud’s armor is edge computing. For latency-sensitive applications like autonomous vehicles, industrial IoT, and real-time video processing, edge deployments can be critical. However, even here, public clouds are making significant inroads.

As edge computing evolves, it’s likely that we will see more utility cloud computing models emerge. Public cloud providers like AWS with Outposts, Azure with Stack Edge, and Google Cloud with Anthos are already deploying their infrastructure to the edge, bringing the power and flexibility of the cloud closer to where data is generated and consumed. This forward deployment of cloud resources will enable businesses to leverage the benefits of edge computing without the complexity of managing on-premises infrastructure.

The verdict 

While the debate over on-premises versus cloud AI infrastructure will no doubt rage on, the cloud’s advantages are still compelling. The combination of cost efficiency, access to specialized skills, agility in a fast-moving field, robust security, and the rise of privacy-preserving AI services like Apple’s PCC makes the cloud the clear choice for most enterprises looking to harness the power of AI.

Just as in “The Great Cloud Wars,” the cloud is already poised to emerge victorious in the battle for AI infrastructure dominance. It’s just a matter of time. While self-hosting AI models may appear cost-effective on the surface, as Premkumar’s analysis suggests, the true costs and risks of on-premises AI infrastructure are far greater than they first appear. The cloud’s unparalleled advantages, combined with the emergence of privacy-preserving AI services, make it the clear winner in the AI infrastructure debate. As businesses navigate the exciting but uncertain waters of the AI revolution, betting on the cloud is still the surest path to success.