Cloudflare on the Edge

2021-05-11 | Author: Ben Thompson | Stratechery


Matthew Prince, at the end of his prepared remarks after Cloudflare’s recent earnings report, related a story from the company’s earliest days:

Back in 2010, right before Cloudflare’s first Board meeting and our launch, I got some advice from one of our early investors. He said running a company is a bit like flying an airplane. You want to make sure it’s well maintained at all times. And that when you’re flying, you keep the wheel steady and the nose 10 degrees above the horizon. That’s stuck with me, and we’ve designed Cloudflare for consistent and disciplined execution. That shows in quarters like the one we just had.

What is most important of all, though, is the destination that airplane is headed for.

TechCrunch Disrupt

The launch Prince referred to happened at TechCrunch Disrupt 2010; the entire video is worth a watch, but three highlights stand out. First, Prince, despite a three-minute technical delay, did an excellent job of laying out Cloudflare’s core value proposition.

Second, Prince, a graduate of Harvard Business School, explicitly invoked HBS Professor Clayton Christensen while answering a question about competition.

The most memorable moment of the presentation, though, was Prince’s response to a seemingly anodyne question about when companies might grow out of Cloudflare’s offering.

Despite the audacity of Prince’s answer (“Our vision is that we’re going to power the Internet”), the company’s list of competitors in its 2019 S-1 seemed rather aspirational, in both breadth and scale:

Our current and potential future competitors include a number of different types of companies, including:

  • On-premise hardware network vendors, such as Cisco Systems Inc., F5 Networks, Inc., Check Point Software Technologies Ltd., FireEye, Inc., Imperva, Inc., Palo Alto Networks, Inc., Juniper Networks, Inc., and Riverbed Technology, Inc.;
  • Point-cloud solution vendors, including cloud security vendors such as Zscaler, Inc. and Cisco Systems Inc. through Umbrella (formerly known as OpenDNS), content delivery network vendors such as Akamai Technologies, Inc., Limelight Networks, Inc., Fastly, Inc., and Verizon Communications Inc. through Edgecast, domain name system vendors services such as Oracle Corporation through DYN, NeuStar, Inc., and UltraDNS Corporation, and cloud SD-WAN vendors; and
  • Traditional public cloud vendors, such as Amazon.com, Inc. through Amazon Web Services, Alphabet Inc. through Google Cloud Platform, Microsoft Corporation through Azure, and Alibaba Group Holding Limited through Alibaba Cloud.

The first two categories make sense; after all, Cloudflare’s value proposition from the beginning was speed and security, so of course it would grow up to compete with network and security vendors. It is that last bullet point, though, that even now raises eyebrows: Cloudflare’s big quarter brought in $138 million in revenue; AWS, over the same period, made $150 million a day.

Cloudflare Disrupt

To understand why Cloudflare sees public cloud vendors as competitors it helps to go back to what made Cloudflare disruptive; Christensen wrote in The Innovator’s Dilemma:

Occasionally, however, disruptive technologies emerge: innovations that result in worse product performance, at least in the near-term. Ironically, in each of the instances studied in this book, it was disruptive technology that precipitated the leading firms’ failure. Disruptive technologies bring to a market a very different value proposition than had been available previously. Generally, disruptive technologies underperform established products in mainstream markets. But they have other features that a few fringe (and generally new) customers value. Products based on disruptive technologies are typically cheaper, simpler, smaller, and, frequently, more convenient to use.

That was basically Prince’s value proposition: Cloudflare’s CDN would be cheaper (free), simpler (just change DNS servers), smaller (only 5 servers to start), and more convenient (ridiculously easy!). And Cloudflare’s customers were definitely fringe.

What Cloudflare had in its favor, though, was the most potent advantage on the Internet: the service, much like Google a decade earlier with its link-based ranking system, got better with use. This was because Cloudflare paired its content delivery network with DDoS protection. The latter was extremely attractive to websites; it gave Cloudflare an in with ISPs, who valued the protection, that it could use to build point-of-presence servers around the world; and, critically, it gave Cloudflare better and better data about how traffic flowed around the world (improving its service) even as it improved its CDN capabilities.

Cloudflare’s focus on security-for-free also meant its CDN was built on general-purpose hardware from the beginning; from the S-1:

To achieve the level of efficiency needed to compete with hardware appliances required us to invent a new type of platform. That platform needed to be built on commodity hardware. It needed to be architected so any server in any city that made up Cloudflare’s network could run every one of our services. It also needed the flexibility to move traffic around to serve our highest paying customers from the most performant locations while serving customers who paid us less, or even nothing at all, from wherever there was excess capacity.

As time went on, those general-purpose machines were used for more and more offerings beyond a CDN and DDoS protection; HHHypergrowth has a fantastic overview of everything Cloudflare is working on, and the article is daunting in length precisely because Cloudflare’s portfolio is so vast. It is Cloudflare Workers, though, that put the big cloud players in Cloudflare’s competitive set.

Cloudflare Workers

Cloudflare launched Workers seven years after the company’s launch at Disrupt; from the introductory blog post:

Cloudflare is about to go through a similar transition [as programmable CPUs]. At its most basic level, Cloudflare is an HTTP cache that runs in 117 locations worldwide (and growing). The HTTP standard defines a fixed feature set for HTTP caches. Cloudflare, of course, does much more, such as providing DNS and SSL, shielding your site against attacks, load balancing across your origin servers, and so much else.

But, these are all fixed functions. What if you want to load balance with a custom affinity algorithm? What if standard HTTP caching rules aren’t quite right, and you need some custom logic to boost your cache hit rate? What if you want to write custom WAF rules tailored for your application?

You want to write code.

We can keep adding features forever, but we’ll never cover every possible use case this way. Instead, we’re making Cloudflare’s edge network programmable. We provide servers in 117+ locations around the world — you decide how to use them.
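To make “custom logic to boost your cache hit rate” concrete, here is a minimal TypeScript sketch of the kind of rule a developer might write once the edge is programmable. It normalizes a request URL into a cache key by dropping tracking parameters, sorting the remaining ones, and discarding the fragment, so that URLs differing only in those details share one cache entry. The function name `cacheKey` and the list of ignored parameters are hypothetical, and wiring the function into an actual Worker’s fetch handler and Cloudflare’s cache APIs is deliberately not shown.

```typescript
// Hypothetical custom caching rule: URLs that differ only in tracking
// parameters, parameter order, or fragment should map to one cache key.
const IGNORED_PARAM_PREFIXES: string[] = ["utm_"]; // assumption, for illustration
const IGNORED_PARAM_NAMES: Set<string> = new Set(["fbclid", "gclid"]); // assumption

function cacheKey(rawUrl: string): string {
  const url = new URL(rawUrl);

  // Snapshot the keys first so that deleting does not disturb iteration.
  const keys: string[] = Array.from(url.searchParams.keys());
  for (const key of keys) {
    const ignored =
      IGNORED_PARAM_NAMES.has(key) ||
      IGNORED_PARAM_PREFIXES.some((prefix) => key.startsWith(prefix));
    if (ignored) {
      url.searchParams.delete(key);
    }
  }

  // Order-insensitive: ?b=2&a=1 and ?a=1&b=2 become the same key.
  url.searchParams.sort();
  // Fragments are not sent to the server, so they should not split the cache.
  url.hash = "";
  return url.toString();
}

// Both variants collapse to "https://example.com/page?a=1&b=2".
console.log(cacheKey("https://example.com/page?utm_source=mail&b=2&a=1#top"));
console.log(cacheKey("https://example.com/page?a=1&b=2&gclid=xyz"));
```

A fixed-function cache cannot know which query parameters matter to a particular site; that is exactly the gap the announcement describes, and a few lines of customer code close it.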

Workers were extremely limited in functionality to start: just a bit of stateless JavaScript code running in a V8 isolate, but as close to users as possible. In 2018 Cloudflare added a key-value store, giving Workers access to highly distributed, eventually consistent data storage; in 2020 the company introduced Workers Unbound, dramatically expanding Workers’ capabilities, and Durable Objects, which store not only data but also state, providing a single source of truth. Once again Cloudflare’s network comes to the rescue:

When using Durable Objects, Cloudflare automatically determines the Cloudflare datacenter that each object will live in, and can transparently migrate objects between locations as needed. Traditional databases and stateful infrastructure usually require you to think about geographical “regions”, so that you can be sure to store data close to where it is used.

Thinking about regions can often be an unnatural burden, especially for applications that are not inherently geographical. With Durable Objects, you instead design your storage model to match your application’s logical data model. For example, a document editor would have an object for each document, while a chat app would have an object for each chat. There is no problem creating millions or billions of objects, as each object has minimal overhead.

In Cloudflare’s example of a chat app, every individual conversation is an object, and that object is moved as close to the participants as possible; two people chatting in the U.S. would utilize a Durable Object in a U.S. data center, for example, while two in Europe would use one there. There is a bit of additional latency, but less than there might be with a centralized cloud provider. That’s ok, though, because the real advantage of Workers isn’t what Cloudflare thought it was.
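The object-per-chat model can be sketched in a few lines of TypeScript. This is a toy placement rule assumed purely for illustration (the source does not describe how Cloudflare actually chooses a location): each chat is one object, placed in whichever data center minimizes the worst-case distance to its participants. The data center names and coordinates, the function `placeChatObject`, and the use of straight-line distance in degrees as a stand-in for network latency are all hypothetical.

```typescript
// Toy model of "one object per chat, placed near its participants".
// Straight-line distance in degrees stands in for real network latency
// (it ignores the curvature of the Earth and the date line).
interface Point {
  lat: number;
  lon: number;
}

interface DataCenter {
  id: string; // hypothetical identifier
  location: Point;
}

function distance(a: Point, b: Point): number {
  const dLat = a.lat - b.lat;
  const dLon = a.lon - b.lon;
  return Math.sqrt(dLat * dLat + dLon * dLon);
}

// Pick the data center whose farthest participant is closest (minimax).
function placeChatObject(participants: Point[], centers: DataCenter[]): DataCenter {
  if (participants.length === 0 || centers.length === 0) {
    throw new Error("need at least one participant and one data center");
  }
  let best = centers[0];
  let bestCost = Number.POSITIVE_INFINITY;
  for (const dc of centers) {
    const cost = Math.max(...participants.map((p) => distance(p, dc.location)));
    if (cost < bestCost) {
      best = dc;
      bestCost = cost;
    }
  }
  return best;
}

// Hypothetical data centers.
const centers: DataCenter[] = [
  { id: "us-east", location: { lat: 39.0, lon: -77.0 } },
  { id: "eu-central", location: { lat: 50.1, lon: 8.7 } },
  { id: "ap-southeast", location: { lat: 1.3, lon: 103.8 } },
];

// New York and Chicago chatting; London and Paris chatting.
const usChat = placeChatObject([{ lat: 40.7, lon: -74.0 }, { lat: 41.9, lon: -87.6 }], centers);
const euChat = placeChatObject([{ lat: 51.5, lon: -0.1 }, { lat: 48.9, lon: 2.35 }], centers);
console.log(usChat.id, euChat.id); // us-east eu-central
```

Because each conversation is its own small object, the placement decision is made per chat rather than per application, which is why there is no region to pick up front.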

Public Cloud Economics

The economics of public clouds are very straightforward: it makes far more sense for Amazon or Microsoft or Google to build and maintain data centers all over the world and rent out capacity than it does for companies for whom data centers are not their core competency to duplicate their efforts at a sub-scale level. It’s so compelling I labeled the current state The End of the Beginning:

This last point gets at why the cloud and mobile, which are often thought of as two distinct paradigm shifts, are very much connected: the cloud meant applications and data could be accessed from anywhere; mobile made the I/O layer available anywhere. The combination of the two make computing continuous.

[Image: The Evolution of Computing]

What is notable is that the current environment appears to be the logical endpoint of all of these changes: from batch-processing to continuous computing, from a terminal in a different room to a phone in your pocket, from a tape drive to data centers all over the globe. In this view the personal computer/on-premises server era was simply a stepping stone between two ends of a clearly defined range.

While this view of the omnipresent cloud is true for end users, the story is a bit more complicated for developers; if you want to set up a new instance you need to first select a region. AWS, for example, has twenty-five regions around the world:

[Image: map of AWS regions]

Once you choose a region your actual app is geographically contained in that region. In theory that limitation gives an advantage to Cloudflare Workers; Prince wrote in a blog post:

Since we’re unlikely to make the speed of light any faster, the ability for any developer to write code and have it run across our entire network means we will always have a performance advantage over legacy, centralized computing solutions — even those that run in the “cloud.” If you have to pick an “availability zone” for where to run your application, you’re always going to be at a performance disadvantage to an application built on a platform like Workers that runs everywhere Cloudflare’s network extends.

The truth, though, is that this performance advantage doesn’t matter very much for most applications. Stratechery’s podcast service runs in the US East (Ohio) region, for example, and that doesn’t make a noticeable difference for me, despite the fact that I’m halfway around the world. Prince admitted as much:

But let’s be real a second. Only a limited set of applications are sensitive to network latency of a few hundred milliseconds. That’s not to say under the model of a modern major serverless platform network latency doesn’t matter, it’s just that the applications that require that extra performance are niche…People who talk a lot about edge computing quickly start talking about IoT and driverless cars. Embarrassingly, when we first launched the Workers platform, I caught myself doing that all the time.

Indeed, for almost all applications the public clouds were good enough, and again, the economics made any other choice a bad idea.

The Edge Opportunity

Earlier this year, in the wake of January 6, I wrote Internet 3.0 and the Beginning of (Tech) History; after raising the arguments from The End of the Beginning I noted:

In the case of the Internet, we are at the logical endpoint of technological development; here, though, the impasse is not the nature of man, but the question of sovereignty, and the potential re-liberation of megalothymia is the likely refusal by people, companies, and countries around the world to be lorded over by a handful of American giants.

As long as economics were all that mattered, we would only ever have the centralized cloud providers; the “limited set of applications” that needed minimal latency could pay a bit more to run on the AWS edge locations shown in blue on the maps above. The point of that article, though, is that economics weren’t the only thing that mattered: going forward, politics would be even more important.

Prince had the same realization; the blog post I have been quoting is entitled The Edge Computing Opportunity: It’s Not What You Think, and the chief benefits Prince cites are very much about politics:

Most computing resources that run on cloud computing platforms, including serverless platforms, are created by developers who work at companies where compliance is a foundational requirement. And, up until now, that’s meant ensuring that platforms follow government regulations like GDPR (European privacy guidelines) or have certifications providing that they follow industry regulations such as PCI DSS (required if you accept credit cards), FedRamp (US government procurement requirements), ISO27001 (security risk management), SOC 1/2/3 (Security, Confidentiality, and Availability controls), and many more.

But there’s a looming new risk of regulatory requirements that legacy cloud computing solutions are ill-equipped to satisfy. Increasingly, countries are pursuing regulations that ensure that their laws apply to their citizens’ personal data. One way to ensure you’re in compliance with these laws is to store and process data of a country’s citizens entirely within the country’s borders.

The EU, India, and Brazil are all major markets that have or are currently considering regulations that assert legal sovereignty over their citizens’ personal data. China has already imposed data localization regulations on many types of data. Whether you think that regulations that appear to require local data storage and processing are a good idea or not — and I personally think they are bad policies that will stifle innovation — my sense is the momentum behind them is significant enough that they are, at this point, likely inevitable. And, once a few countries begin requiring data sovereignty, it will be hard to stop nearly every country from following suit.

This potential reality presents a big problem for Amazon, Microsoft, and Google: what scales on their side is the cloud as a whole, from management to interface to purchasing; individual developers are meant to stay in their regions. Yes, all three companies guarantee that data in one region won’t go elsewhere, but it’s a development nightmare: you have to maintain different apps with different data stores in different regions.

Cloudflare, meanwhile, can use the same capabilities that seamlessly move Durable Objects to the nearest data center to comply with local data sovereignty laws at a granular level; from the announcement of Jurisdictional Restrictions for Durable Objects:

Durable Objects, currently in limited beta, already make it easy for customers to manage state on Cloudflare Workers without worrying about provisioning infrastructure. Today, we’re announcing Jurisdictional Restrictions for Durable Objects, which ensure that a Durable Object only stores and processes data in a given geographical region. Jurisdictional Restrictions make it easy for developers to build serverless, stateful applications that not only comply with today’s regulations, but can handle new and updated policies as new regulations are added…

By setting restrictions at a per-object level, it becomes easy to ensure compliance without sacrificing developer productivity. Applications running on Durable Objects just need to identify the jurisdictional rules a given Object should follow and set the corresponding rule at creation time. Gone is the need to run multiple clusters of infrastructure across cloud provider regions to stay compliant — Durable Objects are both globally accessible and capable of partitioning state with no infrastructure overhead.
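The per-object rule described above can be modeled in a small TypeScript sketch. It is not Cloudflare’s API: the types, the data center list, and the `createObject` and `migrate` helpers are hypothetical. What it illustrates is the idea in the announcement: the jurisdiction is recorded once, when the object is created, and every later placement decision, including migration toward new users, is limited to data centers that satisfy it.

```typescript
// Hypothetical model of per-object jurisdictional restrictions.
interface Point {
  lat: number;
  lon: number;
}

interface DataCenter {
  id: string;
  jurisdictions: string[]; // e.g. ["eu"]; empty means no special tag
  location: Point;
}

interface StoredObject {
  name: string;
  jurisdiction?: string; // fixed at creation time
  home: DataCenter;
}

// Straight-line distance in degrees, a stand-in for latency.
function distance(a: Point, b: Point): number {
  return Math.hypot(a.lat - b.lat, a.lon - b.lon);
}

// Only data centers that satisfy the object's rule are candidates.
function eligible(centers: DataCenter[], jurisdiction?: string): DataCenter[] {
  return jurisdiction === undefined
    ? centers
    : centers.filter((dc) => dc.jurisdictions.includes(jurisdiction));
}

function nearest(to: Point, candidates: DataCenter[]): DataCenter {
  if (candidates.length === 0) {
    throw new Error("no data center satisfies this jurisdiction");
  }
  return candidates.reduce((best, dc) =>
    distance(to, dc.location) < distance(to, best.location) ? dc : best
  );
}

function createObject(name: string, creator: Point, centers: DataCenter[], jurisdiction?: string): StoredObject {
  return { name, jurisdiction, home: nearest(creator, eligible(centers, jurisdiction)) };
}

// Moving toward a new user never leaves the allowed set.
function migrate(obj: StoredObject, towards: Point, centers: DataCenter[]): StoredObject {
  return { ...obj, home: nearest(towards, eligible(centers, obj.jurisdiction)) };
}

// Hypothetical data centers and users.
const centers: DataCenter[] = [
  { id: "frankfurt", jurisdictions: ["eu"], location: { lat: 50.1, lon: 8.7 } },
  { id: "virginia", jurisdictions: [], location: { lat: 38.9, lon: -77.0 } },
  { id: "singapore", jurisdictions: [], location: { lat: 1.35, lon: 103.8 } },
];
const newYork: Point = { lat: 40.7, lon: -74.0 };
const singaporeUser: Point = { lat: 1.3, lon: 103.9 };

const euDoc = createObject("eu-customer-doc", newYork, centers, "eu");
const openDoc = createObject("open-doc", newYork, centers);
console.log(euDoc.home.id, openDoc.home.id); // frankfurt virginia
console.log(migrate(euDoc, singaporeUser, centers).home.id); // frankfurt
console.log(migrate(openDoc, singaporeUser, centers).home.id); // singapore
```

The same code path serves restricted and unrestricted objects; compliance is one extra argument at creation time rather than a separate deployment per region.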

Durable Objects are not, in and of themselves, going to kill the public clouds; what they represent, though, is an entirely new way of building infrastructure (from the edge in, as opposed to from the data center out) that is perfectly suited to a world where politics matters more than economics.

Internet 3.0

I actually already covered Cloudflare’s differentiated approach, albeit in passing and by accident. Back in March, I interviewed Prince while writing Moderation in Infrastructure; one thing that stood out to me was how his response to Internet fragmentation differed from those of Microsoft President Brad Smith and Google Cloud CEO Thomas Kurian:

Smith:

I think, it’s a reflection of the fact that if you’re a global technology business, most of the time, it is far more efficient and legally compliant to operate a global model than to have different practices and standards in different countries, especially when you get to things that are so complicated. It’s very hard to have content moderators make decisions about individual pieces of content under one standard, but to try to do it and say, “Well, okay, we’ve evaluated this piece of content and it can stay up in the US but go down in France.” Then you add these additional layers of complexity that add both cost and the risk of non-compliance which creates reputational risk.

Kurian:

So far, we have tried to get to what’s common, and the reality is, Ben, it’s super hard on a global basis to design software that behaves differently in different countries. It is super difficult. And at the scale at which we’re operating and the need for privacy, for example, it has to be software and systems that do the monitoring. You cannot assume that the way you’re going to enforce ToS and AUPs is by having humans monitor everything, I mean we have so many customers at such a large scale. And so that’s probably the most difficult thing is saying virtual machines behave one way in Canada, and a different way in the United States, and a third way…I mean that’s super complicated.

Prince:

Everywhere in the world, governments have some political legitimacy, and they certainly have a lot more political legitimacy than I do…It’s important that we comply with the laws in each jurisdiction in which we operate. We should help our customers comply with the laws in each jurisdiction we operate…Germany can set whatever rules they want for Germany, but it has to be the rules inside of Germany.

And you can manage that okay. You can manage on a per country basis. You feel good about that?

Sure. I mean, for us, that’s easy. And then we can provide that to our customers as a function of what we’re doing. But I think that if you could say, German rules don’t extend beyond Germany and French rules don’t extend beyond France and Chinese rules don’t extend beyond China and that you have some human rights floor that’s in there.

Right. But given the nature of the internet, isn’t that the whole problem? Because, anyone in Germany can go to any website outside of Germany.

That’s the way it used to be, I’m not sure that’s going to be the way it’s going to be in the future. Because, there’s a lot of atoms under all these bits and there’s an ISP somewhere, or there’s a network provider somewhere that’s controlling how that flows and so I think that, that we have to follow the law in all the places that are around the world and then we have to hold governments responsible to the rule of law, which is transparency, consistency, accountability. And so, it’s not okay to just say something disappears from the internet, but it is okay to say due to German law it disappeared from the internet. And if you don’t like it, here’s who you complain to, or here’s who you kick out of office so you do whatever you do. And if we can hold that, we can let every country have their own rules inside of that, I think that’s what keeps us from slipping to the lowest common denominator.

The quotes aren’t perfectly comparable — you can read the full interviews to get the context — but it makes sense that Microsoft and Google (and presumably Amazon) would be very concerned about a world where individual countries make their own laws about what can be put on the Internet, or even seen. Theirs are services predicated on the superior economics that come from centralization; Cloudflare, on the other hand, is already doing all of its computing on the edge — data sovereignty rules are simply a variable. It’s “easy”.

This is why the direction of Cloudflare’s metaphorical plane is so fascinating: Cloudflare’s current addressable market of enterprise security and networking is significant, particularly as remote work has laid bare the problems with traditional approaches; the destination with outsized upside, though, is Internet 3.0, and the resulting need for a service that routes around obstacles created not by nuclear war, but by sovereign governments.[1]

  1. As I explained last week, I use “Internet 3.0” instead of Web3 because the world beyond centralization isn’t only going to be about crypto; Cloudflare’s potential is a great example.
