In our previous discussion, FinOps Sucks, FinArch Works, we argued that the cloud should be viewed as a single supercomputer. With the widespread adoption of high-performance networks, the bandwidth between cloud nodes now surpasses the local disk read/write speeds of the previous generation of physical servers. From this point on, the cloud should no longer be seen as a cluster of virtual machines but as a supercomputer with virtually unlimited resources. (This is in fact a consensus among cloud computing practitioners worldwide.) Many practitioners, however, understand this only superficially. How you view the cloud is the key factor in how you design cloud-based software.
Treating the cloud as a single computer versus treating it as a cluster of virtual machines makes a huge difference. Let’s explore these differences.
In a distributed cluster, services are deployed; in a single operating system, programs are installed. Services are always running, while programs are triggered by events. Here are the details:
Services remain online once deployed, while programs start only when triggered. Services focus on runtime performance without concern for startup speed, whereas programs need to start quickly. For example, your grep command shouldn’t take 5 seconds to warm up.
Services are stateful and expose their functionality through interfaces; programs are stateless and operate via input and output.
Even with Kubernetes, services scale only proportionally: each replica needs the same fixed slice of resources. Programs, by contrast, can request resources on demand, with no requirement that allocations be uniform.
Services aim to optimize single-instance performance as they occupy static resources, while programs aim to minimize total resource usage over time since they don’t persistently consume resources.
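The contrast above can be sketched in Python. This is a toy illustration, not a real deployment pattern; the class and function names are mine:

```python
import time

# A service stays resident: it pays a one-time warm-up cost and then
# serves requests from long-lived state it holds in memory.
class Service:
    def __init__(self):
        time.sleep(0.01)          # stand-in for loading caches, pools, etc.
        self.cache = {}           # state lives as long as the service does

    def handle(self, key, value):
        self.cache[key] = value   # stateful: later calls see earlier ones
        return len(self.cache)

# A program is event-triggered and stateless: it starts fast, turns its
# input into output, and releases all resources when it exits.
def program(event):
    return {"doubled": event["n"] * 2}
```

The service optimizes per-request performance against resources it holds permanently; the program optimizes total resource-time, since it holds nothing between runs.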
Maintaining a set of distributed services versus maintaining a program on an operating system differs significantly:
For cross-cluster applications, we maximize commonality so that deployment requires no code changes. Across operating systems, we use the different APIs each OS provides and encapsulate the logic that handles the differences. For example, high-performance network programs use epoll on Linux, kqueue on BSD, and I/O completion ports on Windows.
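Python's standard `selectors` module is a concrete instance of this encapsulation: `DefaultSelector` transparently picks epoll on Linux, kqueue on BSD/macOS, and a plain `select` fallback elsewhere, so the program's logic never mentions the OS. A minimal sketch:

```python
import selectors
import socket

def echo_once():
    # DefaultSelector chooses the best mechanism the OS offers:
    # epoll on Linux, kqueue on BSD/macOS, select() as a fallback.
    sel = selectors.DefaultSelector()
    a, b = socket.socketpair()
    a.setblocking(False)
    sel.register(a, selectors.EVENT_READ)

    b.send(b"ping")               # make socket `a` readable
    data = b""
    for key, _ in sel.select(timeout=1):
        data = key.fileobj.recv(16)

    sel.close()
    a.close()
    b.close()
    return data
```

The same source runs unchanged on all three platforms; only the hidden backend differs.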
In the cluster view, VMs are like servers and are treated as physical servers hosting multiple services. Viewed as processes, however, each VM should host a single service with minimal configuration.
From a process perspective, AWS Lambda functions start quickly but typically have lower hardware specs. VMs should be seen as processes with more compute and memory resources, stronger local storage, and longer runtimes.
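AWS Lambda's programming model makes the "program, not service" shape explicit: a handler is a plain function the runtime loads on a cold start and invokes once per event. A minimal sketch (the event shape here is illustrative):

```python
# Standard Lambda handler signature: (event, context). There is no
# server loop to write or keep warm; the runtime calls this function
# once per triggering event and may discard the process afterwards.
def handler(event, context):
    n = event.get("n", 0)
    return {"statusCode": 200, "body": n * n}
```

A VM-as-process runs the same way conceptually, just with more compute, memory, and local storage, and a longer lifetime per invocation.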
Code is constantly updated. Upgrading a running service involves preparatory and follow-up work, whereas upgrading a program is simpler, akin to upgrading software on an OS. Since programs have limited lifecycles, simply starting the next run with the new code completes the upgrade.
Services must secure their interface calls, often by restricting access to specific IPs, a simplistic approach prone to leaks as deployments change. A short-lived program, by contrast, secures itself and the permissions on the data it depends on.
Services require extensive code-level isolation to handle multiple users sharing the same service, ensuring one user doesn’t monopolize resources. In contrast, each user’s application operates as its own process, with inherent data and resource isolation provided by the cloud OS.
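A toy sketch of that process-level isolation, using a fresh interpreter per task as a stand-in for running each user's job as its own cloud process (the function name is mine):

```python
import subprocess
import sys

def run_isolated(snippet):
    # Each task runs in its own OS process: its memory, globals, and
    # open resources are invisible to every other user's task, so no
    # application-level isolation code is needed.
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout.strip()
```

Contrast this with a shared service, where the same guarantee requires careful quota, tenancy, and locking logic in application code.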
Clusters manage assets and resource-to-business relationships using CMDB. However, viewing the cloud as an OS means businesses can’t provide static server states for CMDB management. Resource usage is dynamic, tracked through logs and billing services.
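With no static asset inventory to record, usage is reconstructed after the fact from metering records. A toy aggregation over a hypothetical record shape (job id, seconds used, price per second):

```python
from collections import defaultdict

def cost_by_job(records):
    # records: iterable of (job_id, seconds_used, price_per_second),
    # the kind of dynamic usage a billing or metering log reports.
    totals = defaultdict(float)
    for job_id, seconds, price in records:
        totals[job_id] += seconds * price
    return dict(totals)
```

The mapping from resources to business lines falls out of the log itself, rather than being maintained by hand in a CMDB.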
Services typically monopolize the server/container’s resources. In an OS, even a daemon program uses minimal resources, only allocating what’s needed on demand.
Abstracting and analogizing are steps we use to understand new concepts through our experiences. Our cognitive abilities determine how we abstract and analogize.
Given that the operating system is one of the greatest abstractions in the history of computer science, designing cloud systems from the "cloud is the new computer" perspective fully leverages cloud elasticity and native services while minimizing resource waste. The CloudFirst architecture proposed in What is a CloudFirst architecture? is based on this abstraction. Interestingly, the cloud is not just a computer but a NUMA-architecture computer: best practices for using the cloud resemble programming on a NUMA-architecture supercomputer. We'll delve into more details in later articles in this series.