Intel’s Death and Potential Revival

2024-12-09 Author: Ben Thompson

Stratechery


In 1980 IBM, under pressure from its customers to provide computers for personal use, not just mainframes, set out to create the IBM PC; given the project’s low internal priority but high external demand they decided to outsource two critical components: Microsoft would provide the DOS operating system, which would run on the Intel 8088 processor.

The original IBM PC

Those two deals would shape the computing industry for the following 27 years. Given that the point of the personal computer was to run applications, the operating system that provided the APIs for those applications would have unassailable lock-in, leading to Microsoft’s dominance with first DOS and then Windows, which was backwards compatible.

The 8088 processor, meanwhile, was a low-cost variant of the 8086 processor; up to that point most new processors came with their own instruction set, but the 8088 and 8086 used the same instruction set, which became the foundation of Intel processors going forward. That meant that the 286 and 386 processors that followed were backwards compatible with the 8088 in the IBM PC; in other words, Intel, too, had lock-in, and not just with MS-DOS: while the majority of applications leveraged operating system-provided APIs, it was much more common at that point in computing history to leverage lower level APIs, including calling on the processor instruction set directly. This was particularly pertinent for things like drivers, which powered all of the various peripherals a PC required.

Intel’s CISC Moat

The 8086 processor that undergirded the x86 instruction set was introduced in 1978, when memory was limited, expensive, and slow; that’s why the x86 used Complex Instruction Set Computing (CISC), which combined multiple steps into a single instruction. The price of this complexity was the necessity of microcode, dedicated logic that translated CISC instructions into their component steps so they could actually be executed.

The same year that IBM cut those deals, however, was the year that David Patterson and a team at Berkeley started work on what became known as the RISC-1 processor, which took an entirely different approach: Reduced Instruction Set Computing (RISC) replaced the microcode-focused transistors with registers, i.e. memory that operated at the same speed as the processor itself, and filled them with simple instructions that corresponded directly to transistor functionality. This would, in theory, allow for faster computing with the same number of transistors, but memory access was still expensive and more likely to be invoked given the greater number of instructions necessary to do anything, and programs and compilers needed to be completely reworked to take advantage of the new approach.
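To make the contrast concrete, here is a toy sketch of the two approaches. The instruction names (`ADD_MEM`, `LOAD`, `ADD`, `STORE`) and the lookup table are invented for illustration; they are not real x86 or RISC-I encodings, and real microcode is far more involved. The point is only that one complex instruction is translated into several simple steps before anything executes, while a RISC program would be written in those simple steps to begin with.

```python
# Toy model only: the instruction names below are hypothetical and do not
# correspond to real x86 or RISC-I encodings.

# A CISC-style program: one instruction adds a register into a memory location.
CISC_PROGRAM = [("ADD_MEM", "addr0", "r1")]

# A microcode-like table that expands a complex instruction into simple steps.
MICROCODE = {
    "ADD_MEM": lambda addr, reg: [
        ("LOAD", "tmp", addr),   # bring the memory value into a register
        ("ADD", "tmp", reg),     # do the arithmetic register-to-register
        ("STORE", addr, "tmp"),  # write the result back to memory
    ],
}


def expand(program):
    """Translate complex instructions into their component steps."""
    steps = []
    for op, *args in program:
        if op in MICROCODE:
            steps.extend(MICROCODE[op](*args))
        else:
            steps.append((op, *args))
    return steps


def run(steps, registers, memory):
    """Execute simple (RISC-like) steps against registers and memory."""
    for op, dst, src in steps:
        if op == "LOAD":
            registers[dst] = memory[src]
        elif op == "ADD":
            registers[dst] += registers[src]
        elif op == "STORE":
            memory[dst] = registers[src]
        else:
            raise ValueError(f"unknown simple instruction: {op}")
    return registers, memory


steps = expand(CISC_PROGRAM)
registers, memory = run(steps, {"r1": 7, "tmp": 0}, {"addr0": 5})
print(len(steps), memory["addr0"])  # 3 12
```

In this sketch the single CISC-style instruction becomes three simple ones, which is the trade-off described above: less translation logic, but more instructions to fetch.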

Intel, more than anyone, realized that this would be manageable in the long run. “Moore’s Law”, the observation that the number of transistors in an integrated circuit doubles every two years, was coined by Gordon Moore, their co-founder and second CEO; the implication for instruction sets was that increased software complexity and slow hardware would be solved through ever faster chips, and those chips could get even faster if they were simplified RISC designs. That is why most of the company wanted, in the mid-1980s, to abandon x86 and its CISC instruction set for RISC.

There was one man, however, who interpreted the Moore’s Law implications differently, and that was Pat Gelsinger; he led the development of the 486 processor and was adamant that Intel stick with CISC, as he explained in an oral history at the Computer Museum:

Gelsinger: We had a mutual friend that found out that we had Mr. CISC working as a student of Mr. RISC, the commercial versus the university, the old versus the new, teacher versus student. We had public debates of John and Pat. And Bear Stearns had a big investor conference, a couple thousand people in the audience, and there was a public debate of RISC versus CISC at the time, of John versus Pat.

And I start laying out the dogma of instruction set compatibility, architectural coherence, how software always becomes the determinant of any computer architecture being developed. “Software follows instruction set. Instruction set follows Moore’s Law. And unless you’re 10X better and John, you’re not 10X better, you’re lucky if you’re 2X better, Moore’s Law will just swamp you over time because architectural compatibility becomes so dominant in the adoption of any new computer platform.” And this is when x86– there was no server x86. There’s no clouds at this point in time. And John and I got into this big public debate and it was so popular.

Brock: So the claim wasn’t that the CISC could beat the RISC or keep up to what exactly but the other overwhelming factors would make it the winner in the end.

Gelsinger: Exactly. The argument was based on three fundamental tenets. One is that the gap was dramatically overstated and it wasn’t an asymptotic gap. There was a complexity gap associated with it but you’re going to make it leap up and that the CISC architecture could continue to benefit from Moore’s Law. And that Moore’s Law would continue to carry that forward based on simple ones, number of transistors to attack the CISC problems, frequency of transistors. You’ve got performance for free. And if that gap was in a reasonable frame, you know, if it’s less than 2x, hey, in a Moore’s Law’s term that’s less than a process generation. And the process generation is two years long. So how long does it take you to develop new software, porting operating systems, creating optimized compilers? If it’s less than five years you’re doing extraordinary in building new software systems. So if that gap is less than five years I’m going to crush you John because you cannot possibly establish a new architectural framework for which I’m not going to beat you just based on Moore’s Law, and the natural aggregation of the computer architecture benefits that I can bring in a compatible machine. And, of course, I was right and he was wrong.
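Gelsinger’s arithmetic can be roughly sketched as follows. The only inputs are the figures he cites: a rival advantage of at most 2X, a two-year process generation, and five years to port software. The assumption that performance doubles cleanly with every generation is a simplification of mine, not a claim about real chips, and the function names are illustrative.

```python
import math

# Figures taken from the quote above; the clean-doubling model is an assumption.
DOUBLING_PERIOD_YEARS = 2.0  # "the process generation is two years long"
PORTING_YEARS = 5.0          # optimistic time to build a new software ecosystem


def moores_law_gain(years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Performance multiple gained by simply waiting `years`."""
    return 2 ** (years / doubling_period)


def years_to_close_gap(gap, doubling_period=DOUBLING_PERIOD_YEARS):
    """Years of doubling needed to erase a rival's `gap`-times advantage."""
    return doubling_period * math.log2(gap)


# A 2X advantage is erased in one process generation.
print(years_to_close_gap(2.0))                    # 2.0
# The incumbent's gain while the challenger is still porting software.
print(round(moores_law_gain(PORTING_YEARS), 2))   # 5.66
# A 10X advantage would outlast the porting window.
print(round(years_to_close_gap(10.0), 2))         # 6.64
```

Under these assumptions a 2X edge disappears well before new software could be ready, while a 10X edge would not, which is consistent with where Gelsinger set the bar.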

Intel would, over time, create more RISC-like processors, switching out microcode for micro-ops processing units that dynamically generated RISC-like instructions from CISC-based software that maintained backwards compatibility; Gelsinger was right that no one wanted to take the time to rewrite all of the software that assumed an x86 instruction set when Intel processors were getting faster all of the time, and far out-pacing RISC alternatives thanks to Intel’s manufacturing prowess.

That, though, turned out to be Intel’s soft underbelly; while the late Intel CEO Paul Otellini claimed that he turned down the iPhone processor contract because of price, Tony Fadell, who led the creation of the iPod and iPhone hardware, told me in a Stratechery Interview that the real issue was Intel’s obsession with performance and neglect of efficiency.

The new dimension that always came in with embedded computing was always the power element, because on battery-operated devices, you have to rethink how you do your interrupt structures, how you do your networking, how you do your memory. You have to think about so many other parameters when you think about power and doing enough processing effectively, while having long battery life. So everything for me was about long, long battery life and why do we do what we do? David Tupman was on the team, the iPod team with me, he would always say every nanocoulomb was sacred, and we would go after that and say, “Okay, where’s that next coulomb? Where are we going to go after it?” And so when you take that microscopic view of what you’re building, you look at the world very differently.

For me, when it came to Intel at the time, back in the mid-2000s, they were always about, “Well, we’ll just repackage what we have on the desktop for the laptop and then we’ll repackage that again for embedding.” It reminded me of Windows saying, “I’m going to do Windows and then I’m going to do Windows Mobile and I’m going to do Windows embedded.” It was using those same cores and kernels and trying to slim them down.

I was always going, “Look, do you see how the iPhone was created? It started with the guts of the iPod, and we grew up from very little computing, and very little space, and we grew into an iPhone, and added more layers to it.” But we weren’t taking something big and shrinking it down. We were starting from the bottom up and yeah, we were taking Mac OS and shrinking it down, but we were significantly shrinking it down. Most people don’t want to take those real hard cuts to everything because they’re too worried about compatibility. Whereas if you’re just taking pieces and not worrying about compatibility, it’s a very different way of thinking about how building and designing products happens.

This is why I was so specific with that “27 year” reference above; Apple’s 2007 launch of the iPhone marked the end of both Microsoft and Intel’s dominance, and for the same reason. The shift to efficiency as the top priority meant that you needed to rewrite everything; that, by extension, meant that Microsoft’s API and Intel’s x86 instruction set were no longer moats but millstones. On the operating system side Apple stripped macOS to the bones and rebuilt it for efficiency; that became iOS, and the new foundation for apps; on the processor side Apple used processors based on the ARM instruction set, which was RISC from the beginning. Yes, that meant a lot of things had to be rewritten, but here the rewriting wasn’t happening by choice, but by necessity.

This leads, as I remarked to Fadell in that interview, to a rather sympathetic interpretation of Microsoft and Intel’s failure to capture the mobile market; neither company had a chance. They were too invested in the dominant paradigm at the time, and thus unable to start from scratch; by the time they realized their mistake, Apple, Android, and ARM had already won.

Intel’s Missed Opportunity

It was their respective responses to missing mobile that saved Microsoft, and doomed Intel. For the first seven years of the iPhone both companies refused to accept their failure, and tried desperately to leverage what they viewed as their unassailable advantages: Microsoft declined to put its productivity applications on iOS or Android, trying to get customers to adopt Windows Mobile, while Intel tried to bring its manufacturing prowess to bear to build processors that were sufficiently efficient while still being x86 compatible.

It was in 2014 that their paths diverged: Microsoft named Satya Nadella its new CEO, and his first public decision was to launch Office on iPad. This was a declaration of purpose: Microsoft would no longer be defined by Windows, and would instead focus on Azure and the cloud; no, that didn’t have the software lock-in of Windows — particularly since a key Azure decision was shifting from Windows servers to Linux — but it was a business that met Microsoft’s customers where they were, and gave the company a route to participating in the massive business opportunities enabled by mobile (given that most apps are in fact cloud services), and eventually, AI.

The equivalent choice for Intel would have been to start manufacturing ARM chips for 3rd parties, i.e. becoming a foundry instead of an integrated device manufacturer (IDM); I wrote that they should do exactly that in 2013:

It is manufacturing capability, on the other hand, that is increasingly rare, and thus, increasingly valuable. In fact, today there are only four major foundries: Samsung, GlobalFoundries, Taiwan Semiconductor Manufacturing Company, and Intel. Only four companies have the capacity to build the chips that are in every mobile device today, and in everything tomorrow.

Massive demand, limited suppliers, huge barriers to entry. It’s a good time to be a manufacturing company. It is, potentially, a good time to be Intel. After all, of those four companies, the most advanced, by a significant margin, is Intel. The only problem is that Intel sees themselves as a design company, come hell or high water.

Making chips for other companies would have required an overhaul of Intel’s culture and processes for the sake of what was then a significantly lower margin opportunity; Intel wasn’t interested, and proceeded to make a ton of money building server chips for the cloud.

In fact, though, the company was already fatally wounded. Mobile meant volume, and as the cost of new processes skyrocketed, the need for volume to leverage those costs skyrocketed as well. It was TSMC that met the moment, with Apple’s assistance: the iPhone maker would buy out the first year of every new process advancement, giving TSMC the confidence to invest, and eventually surpass Intel. That, in turn, benefited AMD, Intel’s long-time rival, which now fabbed its chips at TSMC: AMD not only had better processor designs but, for the first time, had a better process, leading to huge gains in the data center. All of that low-level work on ARM, meanwhile, helped make ARM in PCs and in the data center viable, putting further pressure on Intel’s core markets.
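The volume point is simple amortization, sketched below. Every number here is a made-up placeholder chosen for round arithmetic, not an actual figure for Intel, TSMC, or any real process node; only the shape of the relationship matters.

```python
def fixed_cost_per_chip(process_cost, chips_sold):
    """Share of a process's up-front cost carried by each chip sold."""
    return process_cost / chips_sold


# Hypothetical placeholder values, not real industry figures.
old_process_cost = 1_000_000_000   # up-front cost of an older process
new_process_cost = 4_000_000_000   # up-front cost of a newer process
pc_scale_volume = 100_000_000      # chips sold at PC-like volume
mobile_scale_volume = 1_000_000_000  # chips sold at mobile-like volume

print(fixed_cost_per_chip(old_process_cost, pc_scale_volume))      # 10.0
print(fixed_cost_per_chip(new_process_cost, pc_scale_volume))      # 40.0
print(fixed_cost_per_chip(new_process_cost, mobile_scale_volume))  # 4.0
```

With these placeholder inputs, the same volume carries four times the burden once the process cost quadruples, while ten times the volume more than absorbs the increase.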

AI was the final blow: not only did Intel not have a competitive product, it also did not have a foundry through which it could have benefitted from the exploding demand for AI chips; making matters worse is the fact that data center spending on GPUs is coming at the expense of traditional server chips, Intel’s core market.

Intel’s Death

The fundamental flaw with Pat Gelsinger’s 2021 return to Intel and his IDM 2.0 plan is that it was a decade too late. Gelsinger’s plan was to become a foundry, with Intel as its first-best customer. The former was the way to participate in mobile and AI and gain the volume necessary to push technology forward, which Intel has always done better than anyone else (EUV was the exception to the rule that Intel invents and introduces every new advance in processor technology); the latter was the way to fund the foundry and give it guaranteed volume.

Again, this is exactly what Intel should have done a decade ago, while TSMC was still in their rear-view mirror in terms of process technology, and when its products were still dominant in PCs and the data center. By the time Gelsinger came on board, though, it was already too late: Intel’s process was behind, its product market share was threatened on all of the fronts noted above, and high-performance ARM processors had been built by TSMC for years (which meant a big advantage in terms of pre-existing IP, design software, etc.). Intel brought nothing to the table as a foundry other than being a potential second source to TSMC, which, to make matters worse, has dramatically increased its investment in leading edge nodes to absorb that skyrocketing demand. Intel’s products, meanwhile, are either non-competitive (because they are made by Intel) or not-very-profitable (because they are made by TSMC), which means that Intel is simply running out of cash.

Given this, you can make the case that Gelsinger was never the right person for the job; shortly after he took over I wrote in Intel Problems that the company needed to be split up, but he told me in a 2022 Stratechery Interview that he — and the board — weren’t interested in that:

So last week, AMD briefly passed Intel in market value, and I think Nvidia did a while ago, and neither of these companies build their own chips. It’s kind of like an inverse of the Jerry Sanders quote about “Real men have fabs!” When you were contemplating your strategy for Intel as you came back, how much consideration was there about going the same path, becoming a fabless company and leaning into your design?

PG: Let me give maybe three different answers to that question, and these become more intellectual as we go along. The first one was I wrote a strategy document for the board of directors and I said if you want to split the company in two, then you should hire a PE kind of guy to go do that, not me. My strategy is what’s become IDM 2.0 and I described it. So if you’re hiring me, that’s the strategy and 100% of the board asked me to be the CEO and supported the strategy I laid out, of which this is one of the pieces. So the first thing was all of that discussion happened before I took the job as the CEO, so there was no debate, no contemplation, et cetera, this is it.

Fast forward to last week, and the Intel board — which is a long-running disaster — is no longer on board, firing Gelsinger in the process. And, to be honest, I noted a couple of months ago that Gelsinger’s plan probably wasn’t going to work without a split and a massive cash infusion from the U.S. government, far in excess of the CHIPS Act.

That, though, doesn’t let the board off the hook: not only are they abandoning a plan they supported, their ideas for moving Intel forward are fundamentally wrong. Chairman Frank Yeary, who has inexplicably been promoted despite being present for the entirety of the Intel disaster, said in Intel’s press release about Gelsinger’s departure:

While we have made significant progress in regaining manufacturing competitiveness and building the capabilities to be a world-class foundry, we know that we have much more work to do at the company and are committed to restoring investor confidence. As a board, we know first and foremost that we must put our product group at the center of all we do. Our customers demand this from us, and we will deliver for them. With MJ’s permanent elevation to CEO of Intel Products along with her interim co-CEO role of Intel, we are ensuring the product group will have the resources needed to deliver for our customers. Ultimately, returning to process leadership is central to product leadership, and we will remain focused on that mission while driving greater efficiency and improved profitability.

Intel’s products are irrelevant to the future; that’s the fundamental foundry problem. If x86 still mattered, then Intel would be making enough money to fund its foundry efforts. Moreover, prospective Intel customers are wary that Intel — as it always has — will favor itself at the expense of its customers; the board is saying that is exactly what they want to do.

In fact, it is Intel’s manufacturing that must be saved. This is a business that yes, needs billions upon billions of dollars in funding, but it not only has a market as a TSMC competitor, but also the potential to lead that market in the long run. Moreover, Intel foundry existing is critical to national security: currently the U.S. is completely dependent on TSMC and Taiwan and all of the geopolitical risk that entails. That means it will fall on the U.S. government to figure out a solution.

Saving Intel

Last month, in A Chance to Build, I explained how tech has modularized itself over the decades, with hardware — including semiconductor fabrication — largely being outsourced to Asia, while software is developed in the U.S. The economic forces undergirding this modularization, including the path dependency from the past sixty years, will be difficult to overcome, even with tariffs.

Apple could not only not manufacture an iPhone in the U.S. because of cost, it also can’t do so because of capability; that capability is downstream of an ecosystem that has developed in Asia and a long learning curve that China has traveled and that the U.S. has abandoned. Ultimately, though, the benefit to Apple has been profound: the company has the best supply chain in the world, centered in China, that gives it the capability to build computers on an unimaginable scale with maximum quality for not that much money at all. This benefit has extended to every tech company, whether they make their own hardware or not. Software has to run on something, whether that be servers or computers or phones; hardware is software’s most essential complement.

The inverse may be the key to American manufacturing: software as hardware’s grantor of viability through integration. This is what Tesla did: the company is deeply integrated from software down through components, and builds vehicles in California (of course it has an even greater advantage with its China factory).

This is also what made Intel profitable for so long: the company’s lock-in was predicated on software, which allowed for massive profit margins that funded all of that innovation and leading edge processes in America, even as every other part of the hardware value chain went abroad. And, by extension, the reason why a product focus is a dead end for the company is because nothing is preserving x86 other than the status quo.

It follows, then, that if the U.S. wants to make Intel viable, it ideally will not just give out money, but also a point of integration. To that end, consider this report from Reuters:

A U.S. congressional commission on Tuesday proposed a Manhattan Project-style initiative to fund the development of AI systems that will be as smart or smarter than humans, amid intensifying competition with China over advanced technologies. The bipartisan U.S.-China Economic and Security Review Commission stressed that public-private partnerships are key in advancing artificial general intelligence, but did not give any specific investment strategies as it released its annual report.

To quote the report’s recommendation directly:

The Commission recommends:

Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as systems that are as good as or better than human capabilities across all cognitive domains and would surpass the sharpest human minds at every task. Among the specific actions the Commission recommends for Congress:

Provide broad multiyear contracting authority to the executive branch and associated funding for leading artificial intelligence, cloud, and data center companies and others to advance the stated policy at a pace and scale consistent with the goal of U.S. AGI leadership; and

Direct the U.S. secretary of defense to provide a Defense Priorities and Allocations System “DX Rating” to items in the artificial intelligence ecosystem to ensure this project receives national priority.

The problem with this proposal is that spending the money via “public-private partnerships” will simply lock in the current paradigm; I explained in A Chance to Build:

Software runs on hardware, and here Asia dominates. Consider AI:

Chip design, a zero marginal cost activity, is done by Nvidia, a Silicon Valley company.

Chip manufacturing, a minimal marginal cost activity that requires massive amounts of tacit knowledge gained through experience, is done by TSMC, a Taiwanese company.

An AI system contains multiple components beyond the chip, many if not most of which are manufactured in China, or other countries in Asia.

Final assembly generally happens outside of China due to U.S. export controls; Foxconn, for example, assembles many of its systems in Mexico.

AI is deployed mostly by U.S. companies, and the vast majority of application development is done by tech companies and startups, primarily in Silicon Valley.

The fact that the U.S. is the bread in the AI sandwich is no accident: those are the parts of the value chain where marginal cost is non-existent and where the software talent has the highest leverage. Similarly, it’s no accident that the highest value add in terms of hardware happens in Asia, where expertise has been developing for fifty years. The easiest — and by extension, most low-value — aspect is assembly, which can happen anywhere labor is cheap.

Given this, if the U.S. is serious about AGI, then the true Manhattan Project — doing something that will be very expensive and not necessarily economically rational — is filling in the middle of the sandwich. Saving Intel, in other words.

Start with the fact that we know that leading AI model companies are interested in dedicated chips; OpenAI is reportedly working on its own chip with Broadcom, after flirting with the idea of building its own fabs. The latter isn’t viable for a software company in a world where TSMC exists, but it is for the U.S. government if it’s serious about domestic capabilities continuing to exist. The same story applies to Google, Amazon, Microsoft, and Meta.

To that end, the U.S. government could fund an independent Intel foundry — spin out the product group along with the clueless board to Broadcom or Qualcomm or private equity — and provide price support for model builders to design and buy their chips there. Or, if the U.S. government wanted to build the whole sandwich, it could directly fund model builders — including one developed in-house — and dictate that they not just use but deeply integrate with Intel-fabricated integrated chips (it’s not out of the question that a fully integrated stack might actually be the optimal route to AGI).

It would, to be sure, be a challenge to keep such an effort out of the clutches of the federal bureaucracy and the dysfunction that has befallen the U.S. defense industry. It would be essential to give this effort the level of independence and freedom that the original Manhattan Project had, with compensation packages to match; perhaps this would be a better use of Elon Musk’s time — himself another model builder — than DOGE?

This could certainly be bearish for Nvidia, at least in the long run. Nvidia is a top priority for TSMC, and almost certainly has no interest in going anywhere else; that’s also why it would be self-defeating for a U.S. “Manhattan Project” to simply fund the status quo, which is Nvidia chips manufactured in Taiwan. Competition is ok, though; the point isn’t to kill TSMC, but to stand up a truly domestic alternative (i.e. not just a fraction of non-leading edge capacity in Arizona). Nvidia for its part deserves all of the success it is enjoying, but government-funded alternatives would ultimately manifest for consumers and businesses as lower prices for intelligence.

This is all pretty fuzzy, to be clear. What does exist, however, is a need — domestically sourced and controlled AI, which must include chips — and a company, in Intel, that is best placed to meet that need, even as it needs a rescue. Intel lost its reason to exist, even as the U.S. needs it to exist more than ever; AI is the potential integration point to solve both problems at the same time.
