In our previous discussion, FinOps Sucks, FinArch Works, we argued that the cloud should be viewed as a single supercomputer. With the widespread adoption of high-performance networks, the bandwidth between cloud nodes now surpasses the local disk read/write speeds of the previous generation of physical servers. From this point on, the cloud should no longer be seen as a cluster of virtual machines but as a supercomputer with virtually unlimited resources. (This is in fact a consensus among cloud computing practitioners worldwide.) Many practitioners, however, understand this only superficially. How you view the cloud is the key factor in how you design cloud-based software.
Treating the cloud as a single computer versus treating it as a cluster of virtual machines makes a huge difference. Let’s explore these differences.
In a distributed cluster, services are deployed; in a single operating system, programs are installed. Services are always running, while programs are triggered by events. Here are the details:
Services remain online once deployed, while programs start only when triggered. Services focus on runtime performance without concern for startup speed, whereas programs need to start quickly. For example, your grep command shouldn’t take 5 seconds to warm up.
Services are stateful and expose their functionality through interfaces; programs are stateless and operate via input and output.
Even with Kubernetes, services scale only proportionally: each replica needs the same fixed slice of resources. Programs, by contrast, can request resources on demand, with no requirement that allocations be uniform.
Services aim to optimize single-instance performance as they occupy static resources, while programs aim to minimize total resource usage over time since they don’t persistently consume resources.
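The contrast above can be sketched in Python. This is a toy illustration, not a real deployment pattern; the class and function names are mine:

```python
import time

# A service stays resident: it pays a one-time warm-up cost and then
# serves requests from long-lived state it holds in memory.
class Service:
    def __init__(self):
        time.sleep(0.01)          # stand-in for loading caches, pools, etc.
        self.cache = {}           # state lives as long as the service does

    def handle(self, key, value):
        self.cache[key] = value   # stateful: later calls see earlier ones
        return len(self.cache)

# A program is event-triggered and stateless: it starts fast, turns its
# input into output, and releases all resources when it exits.
def program(event):
    return {"doubled": event["n"] * 2}
```

The service optimizes per-request performance against resources it holds permanently; the program optimizes total resource-time, since it holds nothing between runs.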
Maintaining a set of distributed services versus maintaining a program on an operating system differs significantly:
For cross-cluster applications, we maximize commonality so that deployment requires no code changes. Across operating systems, we use the different APIs each OS provides and encapsulate the logic that handles the differences. For example, high-performance network programs use epoll on Linux, kqueue on BSD, and I/O completion ports on Windows.
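Python's standard `selectors` module is a concrete instance of this encapsulation: `DefaultSelector` transparently picks epoll on Linux, kqueue on BSD/macOS, and a plain `select` fallback elsewhere, so the program's logic never mentions the OS. A minimal sketch:

```python
import selectors
import socket

def echo_once():
    # DefaultSelector chooses the best mechanism the OS offers:
    # epoll on Linux, kqueue on BSD/macOS, select() as a fallback.
    sel = selectors.DefaultSelector()
    a, b = socket.socketpair()
    a.setblocking(False)
    sel.register(a, selectors.EVENT_READ)

    b.send(b"ping")               # make socket `a` readable
    data = b""
    for key, _ in sel.select(timeout=1):
        data = key.fileobj.recv(16)

    sel.close()
    a.close()
    b.close()
    return data
```

The same source runs unchanged on all three platforms; only the hidden backend differs.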
In the cluster view, VMs are like servers and are treated as physical servers hosting multiple services. Viewed as processes, however, each VM should host a single service with minimal configuration.
From a process perspective, AWS Lambda functions start quickly but typically have lower hardware specs. VMs should be seen as processes with more compute and memory resources, stronger local storage, and longer runtimes.
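AWS Lambda's programming model makes the "program, not service" shape explicit: a handler is a plain function the runtime loads on a cold start and invokes once per event. A minimal sketch (the event shape here is illustrative):

```python
# Standard Lambda handler signature: (event, context). There is no
# server loop to write or keep warm; the runtime calls this function
# once per triggering event and may discard the process afterwards.
def handler(event, context):
    n = event.get("n", 0)
    return {"statusCode": 200, "body": n * n}
```

A VM-as-process runs the same way conceptually, just with more compute, memory, and local storage, and a longer lifetime per invocation.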
Code is constantly updated. Upgrading a running service involves preparatory and follow-up work, whereas upgrading a program is simpler, akin to upgrading software on an OS. Since programs have limited lifecycles, simply starting the next run with the new code completes the upgrade.
Services must secure their interface calls, often by restricting access to specific IPs, a simplistic approach prone to leaks as deployments change. A short-lived program, by contrast, secures itself and the permissions on the data it depends on.
Services require extensive code-level isolation to handle multiple users sharing the same service, ensuring one user doesn’t monopolize resources. In contrast, each user’s application operates as its own process, with inherent data and resource isolation provided by the cloud OS.
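A toy sketch of that process-level isolation, using a fresh interpreter per task as a stand-in for running each user's job as its own cloud process (the function name is mine):

```python
import subprocess
import sys

def run_isolated(snippet):
    # Each task runs in its own OS process: its memory, globals, and
    # open resources are invisible to every other user's task, so no
    # application-level isolation code is needed.
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout.strip()
```

Contrast this with a shared service, where the same guarantee requires careful quota, tenancy, and locking logic in application code.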
Clusters manage assets and resource-to-business relationships using CMDB. However, viewing the cloud as an OS means businesses can’t provide static server states for CMDB management. Resource usage is dynamic, tracked through logs and billing services.
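With no static asset inventory to record, usage is reconstructed after the fact from metering records. A toy aggregation over a hypothetical record shape (job id, seconds used, price per second):

```python
from collections import defaultdict

def cost_by_job(records):
    # records: iterable of (job_id, seconds_used, price_per_second),
    # the kind of dynamic usage a billing or metering log reports.
    totals = defaultdict(float)
    for job_id, seconds, price in records:
        totals[job_id] += seconds * price
    return dict(totals)
```

The mapping from resources to business lines falls out of the log itself, rather than being maintained by hand in a CMDB.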
Services typically monopolize the server/container’s resources. In an OS, even a daemon program uses minimal resources, only allocating what’s needed on demand.
Abstracting and analogizing are steps we use to understand new concepts through our experiences. Our cognitive abilities determine how we abstract and analogize.
Given that the operating system is one of the greatest abstractions in the history of computer science, designing cloud systems from the "cloud is the new computer" perspective fully leverages cloud elasticity and native services while minimizing resource waste. The CloudFirst architecture proposed in What is a CloudFirst architecture? is based on this abstraction. Interestingly, the cloud is not just a computer but a NUMA-architecture computer: best practices for using the cloud resemble programming on a NUMA-architecture supercomputer. We'll delve into more details in later articles in this series.