(This figure was obtained via a Google search; it originally came from VMware.)
Virtualization refers to the practice of running multiple virtual machines on a single physical computer. Logically, each virtual machine has its own virtualized CPU, memory, storage, and networking. Through virtualization the underlying hardware resources can be utilized more efficiently, and applications can run on the same physical computer, each in its own runtime environment, isolated from one another.
There exist different levels of virtualization, for example hardware virtualization and software virtualization. Hardware virtualization provides a virtual computer by simulating the underlying hardware, and the virtual computer is capable of running a full copy of an operating system. Within hardware virtualization there are different implementations: full virtualization (simulating a complete set of the underlying hardware, so that most operating systems can run on top of the virtual machine without modification), partial virtualization (simulating only some key hardware components; operating systems might need modifications to run in such an environment), and paravirtualization (not simulating the underlying hardware at all, but sharing it through a virtual machine monitor; most operating systems need modifications to run in such an environment). Virtualization on the software level usually refers to the practice of providing multiple isolated runtime environments on top of a single operating system instance, and is often called container technology.
Within hardware virtualization, most modern technologies (such as VMware, Xen, and KVM) are a combination of full virtualization and paravirtualization. Virtual machines provided by hardware virtualization usually run a full copy of an operating system, so there exist large numbers of similar (or even identical) processes and memory pages on the same host machine. Memory pages with the same content can be consolidated by techniques such as KSM, but there is so far no good method to handle similar (or even identical) processes. Therefore hardware virtualization is usually referred to as heavy-weight virtualization, and the number of virtual machines that can run on a single host machine is relatively limited.
With software virtualization, the overhead of running multiple operating system instances does not exist. Therefore software virtualization is usually referred to as light-weight virtualization, and the number of virtual runtime environments that can be present on a single host machine is relatively large. For example, in theory Solaris can support 8000 containers on a single operating system instance (the actual number of supported containers is limited by hardware resources and system workload). Similarly, LXC on Linux can easily provide a large number of virtualized runtime environments.
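The density difference between heavy-weight and light-weight virtualization can be sketched with a back-of-the-envelope memory model. All of the numbers below are illustrative assumptions (a full guest OS idling around 512 MB, a container costing only a few MB of bookkeeping), not measurements:

```python
# Hypothetical model: how many isolated environments fit on one host,
# given the fixed per-guest memory overhead of each approach.
# All numbers are assumed for illustration, not measured.

def max_guests(host_ram_mb, per_guest_overhead_mb, per_app_ram_mb):
    """Guests that fit when each guest pays a fixed overhead plus app RAM."""
    return host_ram_mb // (per_guest_overhead_mb + per_app_ram_mb)

HOST_RAM = 64 * 1024          # a 64 GB host (assumption)
APP_RAM = 256                 # the application itself needs 256 MB (assumption)

# A full guest OS copy might idle around ~512 MB; a container's extra
# bookkeeping is closer to a few MB (assumed values).
vms = max_guests(HOST_RAM, 512, APP_RAM)        # heavy-weight
containers = max_guests(HOST_RAM, 4, APP_RAM)   # light-weight

print(vms, containers)   # containers pack far more densely
```

Even this crude model shows why a container host can support several times as many environments as a hypervisor host running full OS copies.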
In terms of virtualization technologies, most companies in China seem to focus more on hardware virtualization, and deploy it in development and production environments. Taobao (a subsidiary of Alibaba Inc) was one of the first to study and deploy software virtualization in a production environment. Its experience showed that replacing Xen with cgroups can result in better resource utilization.
For a specific application scenario, the decision between hardware virtualization and software virtualization should depend on whether the end user needs control over the operating system (for example, kernel upgrades). If the end user only needs control over the runtime environment (as in various App Engine services), software virtualization might be the better choice.
For those who want to know more about virtualization technologies, the VMware white paper Understanding Full Virtualization, Paravirtualization, and Hardware Assist is a great reference.
Generally speaking, the number of users with direct access to virtualization technology is very small. On Linux, the users with virtual machine life cycle management privileges are usually those with libvirt access. In a company or other organization, these users are usually the system administrators.
B. Virtualization Management
In the early days, virtualization technologies solved the problem of providing multiple isolated runtime environments on a single physical computer. When the number of physical computers is small, system administrators can manually log in to different servers to carry out virtual machine life cycle management tasks. When the number of physical computers grows large, some kind of scripting or application is needed to increase the degree of automation and relieve system administrators from tedious work. Applications that enable system administrators to manage multiple physical and virtual computers from a single location are called virtualization management tools. Such tools can usually accomplish the following tasks: (1) manage the life cycles of multiple virtual machines on multiple physical computers; (2) query and monitor all physical and virtual computers; and (3) establish a mapping between the names of virtual machines and the actual virtual machine instances on different computers, making virtual machine identification and management easier. On Linux, virt-manager is a simple virtualization management tool. In the VMware product family, VMware vSphere is a powerful virtualization management tool.
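Task (3) above, the name-to-instance mapping, can be sketched as a tiny inventory structure. The class and method names here are hypothetical, not taken from any real management tool:

```python
# Minimal sketch of a virtualization management tool's inventory:
# a name -> host mapping so a VM can be located without logging
# in to every server. Names are illustrative assumptions.

class Inventory:
    def __init__(self):
        self._vms = {}                 # vm name -> host name

    def register(self, host, vm):
        """Record that `vm` runs on physical computer `host`."""
        self._vms[vm] = host

    def locate(self, vm):
        """Which physical computer runs this VM (None if unknown)?"""
        return self._vms.get(vm)

    def vms_on(self, host):
        """All VM names currently mapped to `host`."""
        return sorted(v for v, h in self._vms.items() if h == host)

inv = Inventory()
inv.register("host-01", "web-1")
inv.register("host-01", "web-2")
inv.register("host-02", "db-1")
print(inv.locate("db-1"))     # host-02
print(inv.vms_on("host-01"))  # ['web-1', 'web-2']
```

A real tool would populate this mapping by querying each hypervisor (for example via libvirt) rather than by manual registration.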
Virtualization management tools are direct extensions of virtualization technology. The purpose of a simple virtualization management tool is to rescue system administrators from the tedious, repetitive work induced by a growing number of physical and virtual machines. At this level, the scope of a virtualization management tool is usually limited to a single cluster. In many cases, the virtualization management tool needs a username and password for each physical computer in order to perform virtual machine life cycle management. To make management easier, the system administrator might need to set up a common management account on all physical computers in the cluster.
Virtualization management tools provide convenience for the system administrators, but do not delegate virtual machine life cycle management rights to other users.
C. Data Center Virtualization
In a data center, system administrators need to look after a large amount of different hardware and applications. Compared to a small cluster, the complexity of a data center is significantly higher, and a simple virtualization management tool is no longer capable of satisfying system administrators' needs. Therefore data center virtualization management software was developed to meet these new challenges. On the hardware layer, such software introduces the concept of "resource pools" to reorganize hardware resources, where a pool is usually a group of servers with similar configuration and purpose. Computing resources are now exposed to the end user in the form of virtual infrastructure, rather than as separate servers. On the software layer, it defines different roles for system administrators and regular end users, or more fine-grained role-based access control (RBAC) based on the needs of a specific application scenario. System administrators have the right to manage all physical servers and virtual machines, but usually do not interfere with virtual machines that are running normally. Regular end users can only carry out virtual machine life cycle management tasks within the resource pools assigned to them, and do not have the right to manage the physical servers. In the extreme case, regular end users can only see the resource pools assigned to them, without any knowledge of the pools' underlying details.
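The access rule described above reduces to a small predicate: administrators may act on any pool, while a regular user may only manage VMs inside the pools assigned to them. This is a hedged sketch with made-up role and pool names, not the RBAC model of any particular product:

```python
# Sketch of the RBAC rule from the text: admins can manage everything,
# regular users only the resource pools assigned to them.
# Role names and pool names are illustrative assumptions.

ROLE_ADMIN, ROLE_USER = "admin", "user"

def may_manage(role, assigned_pools, target_pool):
    """May a principal with `role` manage VMs in `target_pool`?"""
    if role == ROLE_ADMIN:
        return True                      # admins see all pools
    return target_pool in assigned_pools # users only their own pools

# An admin may touch any pool; a user only an assigned one.
assert may_manage(ROLE_ADMIN, set(), "prod-pool")
assert may_manage(ROLE_USER, {"dev-pool"}, "dev-pool")
assert not may_manage(ROLE_USER, {"dev-pool"}, "prod-pool")
```

Real products refine this with per-action permissions (create, power off, delete), but the pool-scoping check is the core idea.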
Before data center virtualization, creating and managing virtual machines was usually carried out by system administrators. In data center virtualization software, RBAC delegates virtual machine life cycle management rights to so-called "regular users", thereby relieving the pressure on system administrators (to some degree). However, for security reasons not all employees in a company can have such a "regular user" account, which is usually assigned to managers or team leads. It is safe to assume that in data center virtualization the life cycle of virtual machines is still managed centrally.
Data center virtualization management software is a further extension of virtualization management tools. It solves the problem of system complexity introduced by the increasing number of hardware devices and applications. When specific physical hardware is presented in the form of an abstract "resource pool", managers only need to worry about the size, workload, and health status of the various resource pools, while end users only need to know about the status of the pool assigned to them. Only system administrators need to know the configuration, workload, and status of each and every physical server. However, with the concept of resource pools, all physical devices can be reorganized in a relatively logical way, which makes the system administrators' lives easier.
Modern data center virtualization management software usually provides many IT operations automation functionalities, including (1) fast deployment of a number of identical or similar runtime environments based on virtual machine templates; (2) monitoring, reporting, notification, and accounting; and (3) high availability, dynamic workload management, backup, and recovery. Some data center virtualization management software even provides open APIs that allow system administrators to develop and integrate additional functionalities based on the actual application scenario.
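Functionality (1), template-based deployment, can be sketched as stamping out N instances from a template description. The template fields and naming scheme are invented for illustration; a real product would clone disk images and inject per-instance settings:

```python
# Sketch of template-based fast deployment: N similar runtime
# environments from one template. Fields and names are assumptions.

import copy

TEMPLATE = {"os": "linux", "vcpus": 2, "memory_mb": 2048}

def deploy_from_template(template, count, name_prefix="app"):
    """Create `count` VM descriptions, each a copy of the template
    with a unique name stamped in."""
    instances = []
    for i in range(count):
        vm = copy.deepcopy(template)     # each instance gets its own copy
        vm["name"] = f"{name_prefix}-{i:02d}"
        instances.append(vm)
    return instances

fleet = deploy_from_template(TEMPLATE, 3)
print([vm["name"] for vm in fleet])  # ['app-00', 'app-01', 'app-02']
```

The value of templates is exactly this uniformity: every instance starts from the same known-good configuration, so drift between environments is eliminated at creation time.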
In the VMware product family, VMware vCenter is a powerful data center virtualization management product. Other good options include ConVirt, XenServer, Oracle VM, and OpenQRM.
D. Cloud Computing
Cloud computing is a further abstraction of data center virtualization. In cloud computing management software, we still have different roles such as cloud managers and regular users, with different access rights associated with each role. Cloud managers have the right to manage all physical servers and virtual machines, but usually do not interfere with virtual machines running normally. Regular users can carry out virtual machine life cycle management tasks through a web browser, or through programs that talk to the cloud via web services.
In cloud computing, virtual machine life cycle management rights are fully delegated to regular users. However, it also hides the concepts of resource pools and physical servers from regular users, who can obtain computing resources without needing to know anything about the underlying physical infrastructure. On the surface, cloud computing is simply a way of providing computing resources remotely, similar to Amazon EC2/S3. In fact, cloud computing represents a change in computing resource management: end users no longer need the help of system administrators to obtain and manage computing resources.
For cloud managers, delegating virtual machine life cycle management rights to regular users does not take them off the hook; rather, they now have more trouble to handle. In traditional IT infrastructure, each application has its own computing resources, and troubleshooting is relatively easy because physical isolation exists between applications. When upgrading to cloud computing, multiple applications might share the same underlying physical infrastructure, and troubleshooting becomes difficult when multiple applications compete for resources. Therefore, cloud managers usually expect a full set of data center virtualization management functionalities in cloud computing management software. For cloud managers, critical functionalities include (1) monitoring, reporting, notification, and accounting; (2) high availability, dynamic workload management, backup, and recovery; and (3) live migration, which can be used in troubleshooting or local maintenance.
We can see that from virtualization to cloud computing, the degree of encapsulation of physical resources increases, while virtual machine life cycle management rights are gradually delegated.
In the VMware product family, VMware vCloud is a cloud computing management product. Other cloud computing management software includes OpenStack, OpenNebula, Eucalyptus, and CloudStack. Although these are all cloud computing management software, they differ significantly in functionality, which can be traced to differences in their design. OpenNebula and CloudStack were originally designed as data center virtualization management software, and therefore have a good set of data center virtualization management functionalities. When the concept of cloud computing became popular, OpenNebula added OCCI and Amazon EC2 support, while CloudStack provided an additional Amazon EC2-compatible module called CloudBridge (integrated into CloudStack since version 4.0). In contrast, Eucalyptus and OpenStack were designed as Amazon EC2-compatible cloud computing management software, and are not yet as capable in terms of data center virtualization management functionalities. Between the two, Eucalyptus has some first-mover advantage, since it has already realized the importance of data center virtualization management functionalities based on feedback from the market.
E. Private Cloud and Public Cloud
The "cloud computing" described in section D is only a narrow definition: Amazon EC2-like cloud computing. Broader definitions usually refer to the various practices of obtaining and utilizing computing resources (such as compute and storage) remotely, which include both the data center virtualization described in section C and the cloud computing described in section D. In both cases, computing resources are provided to the end user in the form of virtual machines, and the end user does not need any knowledge of the underlying physical infrastructure. If the scope of a cloud platform is to provide service within a corporation, it can be called a "private cloud". If its scope is to provide service to the public, it can be called a "public cloud". Generally speaking, a private cloud emphasizes the ability to create virtual machines with different configurations (such as the number of vCPUs, memory, and storage), because it needs to satisfy the needs of different applications within the enterprise. In contrast, public cloud service providers do not have much knowledge about the applications running on top of their platforms, so they tend to provide standardized virtual machine products with fixed configurations, and end users can only purchase virtual machines with these fixed configurations.
For public cloud service providers, the business model is similar to Amazon EC2, so most of them will choose a cloud computing management software as described in section D. For private cloud operators, the decision should be made according to the computing resource management model within the enterprise. If the enterprise wishes to retain central management of computing resources and delegate virtual machine life cycle management rights only to managers and team leads, a data center virtualization management software as described in section C is more appropriate. However, if the enterprise wishes to delegate virtual machine life cycle management rights to end users, then a cloud computing management software as described in section D is more appropriate.
Traditionally, people think that a private cloud should be built on hardware owned by the enterprise and inside a data center managed by the enterprise. However, as hardware vendors join the game, the border between private cloud and public cloud is becoming blurred. Recently Rackspace announced private cloud services where customers can choose between their own hardware and data center or hardware and a data center owned by Rackspace. Oracle also announced private cloud services that are owned and managed by Oracle. With such a new business model, a private cloud for a particular customer might be just an isolated resource pool of a public cloud service provider (you got private cloud in my public cloud). For the public cloud service provider, its public cloud service infrastructure might in turn be part of its own bigger infrastructure (private cloud), or even a resource pool from a hardware vendor's infrastructure (you got public cloud in my private cloud).
For customers, it is financially reasonable to use a private cloud provided by a cloud service provider. The CapEx needed for data center construction and hardware purchasing is converted into OpEx, while the precious cash can be used to cultivate more business opportunities. Even if, in the long term, the total cost of such a private cloud exceeds that of alternatives based on self-owned data centers and hardware, the return from new business might be greater than the cost delta between the two options. In the extreme case, even if the company is not successful in the end, the owners do not have to stare at a pile of newly purchased hardware and cry. Unless the real estate market grows rapidly in the short term, a failing company usually won't feel sorry for not having built its own data center. (I should mention that for a company that has been running long enough, it is still feasible to earn money through real estate. For example, before Sun Microsystems was acquired by Oracle, it successfully made one of its financial reports look much better by selling one of its major engineering campuses.)
Then, what is the role of hardware vendors in this game? When the customer's CapEx becomes OpEx, wouldn't it take hardware vendors longer to collect payment?
In 1865 William Jevons (1835-1882), a British economist, wrote a book entitled "The Coal Question", in which he presented data on the depletion of coal reserves yet, seemingly paradoxically, an increase in the consumption of coal in England throughout most of the 19th century. He theorized that significant improvements in the efficiency of the steam engine had increased the utility of energy from coal and, in effect, lowered the price of energy, thereby increasing consumption. This is known as the Jevons paradox: as technological progress increases the efficiency of resource utilization, consumption of that resource will increase. During the past 150 years, similar over-consumption has been observed in many other areas, such as major industrial materials, transportation, energy, and food.
The core value of public cloud services is that fixed assets (such as servers, networking equipment, and storage) that previously had to be purchased with huge budgets by end users now become public resources that are charged by usage. Virtualization technologies improve efficiency and, in effect, lower the price of computing resources, which will eventually increase the consumption of computing resources. Once we understand this logic, we can understand why HP launched HP Cloud Services in a hurry on top of OpenStack while OpenStack was still immature for commercial deployment. HP Cloud Services might not be able to save HP in the next round of competition, but HP will certainly lose if it does not even join the competition. Similarly, we can understand why Oracle has now become a cloud computing evangelist while it was sniffing at cloud computing two years ago. When Oracle acquired Sun Microsystems in 2009, it suddenly became one of the major players in the hardware market. At that time the concept of cloud computing was relatively new, and Oracle's response to cloud computing showed that it had not yet become familiar with its new role. Now that cloud computing is a lot more than just a new concept, it would be very silly if Oracle, as one of the major hardware vendors, did not pursue its share in the game.
According to the Jevons paradox, over-consumption is a result of price decreases. Then, how should cloud computing resources be priced?
Currently, most public cloud service providers set price tags according to the configuration of the virtual machines. Take Amazon EC2 for example: its Medium virtual machine (3.75 GB memory, 2 ECUs, 410 GB storage, $0.16 per hour) is twice as large, and twice as expensive, as its Small virtual machine (1.7 GB memory, 1 ECU, 160 GB storage, $0.08 per hour). Newcomers to the competition, such as HP Cloud Services, Grand Cloud (in China), and Aliyun (in China), seem to be copying Amazon EC2's pricing strategy. The problem is that when the size of the virtual machine gets larger (with more computing resources such as vCPUs, memory, and storage), the performance of the virtual machine does not increase in the same proportion. A number of performance tests on Amazon EC2, HP Cloud Services, Grand Cloud, and Aliyun suggested that for a wide range of applications the performance-to-price ratio of virtual machines actually decreases as the size of the virtual machine increases. It is safe to say that such a pricing strategy will not encourage users to consume more computing resources.
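The effect can be put in numbers. The prices below are the Amazon EC2 figures quoted above; the relative performance of the Medium tier (1.7x rather than 2x) is an assumed value standing in for the sublinear scaling the performance tests observed:

```python
# Configuration-based pricing vs. measured performance: price doubles
# with size, but if performance grows sublinearly, performance per
# dollar falls. The relative_perf figures are assumptions.

sizes = {
    "small":  {"price_per_hour": 0.08, "relative_perf": 1.0},
    "medium": {"price_per_hour": 0.16, "relative_perf": 1.7},  # not 2.0
}

def perf_per_dollar(tier):
    """Performance-to-price ratio of a virtual machine tier."""
    t = sizes[tier]
    return t["relative_perf"] / t["price_per_hour"]

print(perf_per_dollar("small"))   # 12.5
print(perf_per_dollar("medium"))  # 10.625 -- worse value per dollar
```

Under these assumptions the larger instance is strictly worse value, which is exactly why such pricing discourages scaling up.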
It might be more reasonable to determine the price of virtual machines according to their performance. For example, a soap manufacturer sells its products in two different packages; the smaller package has one piece and the bigger package has two. Customers are willing to buy the bigger package not because it looks bigger, but because it can do twice the work of the smaller one. Similarly, virtual machine products from the same public cloud service provider should maintain a similar performance-to-price ratio. The problem is that different applications have different requirements for processor, memory, and storage resources, which results in significant differences in the performance-versus-configuration curve. Therefore, in public cloud there is a need for a comprehensive virtual machine performance evaluation suite, which can be used to evaluate the overall performance of a virtual machine rather than just one of its components such as processor, memory, or storage. Based on such a comprehensive benchmark framework, we can compare not only virtual machine products from one public cloud service provider, but also virtual machine products across different public cloud service providers.
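One common way to build such a comprehensive score is to combine per-component benchmark results into a single figure with a weighted geometric mean, as SPEC-style suites do. The weights and sub-scores below are made-up assumptions, purely to show the mechanics:

```python
# Sketch of a composite VM benchmark score: a weighted geometric mean
# of normalized per-component results (CPU, memory, storage).
# Weights and sub-scores are illustrative assumptions.

import math

def composite_score(subscores, weights):
    """Weighted geometric mean of normalized sub-benchmark scores
    (each sub-score is relative to a reference machine = 1.0)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights sum to 1
    return math.exp(sum(w * math.log(subscores[k])
                        for k, w in weights.items()))

weights = {"cpu": 0.4, "memory": 0.3, "storage": 0.3}
vm_a = {"cpu": 2.0, "memory": 1.5, "storage": 1.2}  # vs. reference VM

print(round(composite_score(vm_a, weights), 3))
```

The geometric mean has the useful property that doubling every sub-score doubles the composite, so a provider could price tiers in proportion to this single number and keep the performance-to-price ratio flat across sizes.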
F. Open Source
In recent years, we have been observing a rule in the information industry: when a proprietary solution becomes successful in the market, one or more followers with similar functionalities or services will quickly appear, either open source or proprietary. (The opposite case, where an open source solution comes before proprietary followers, is rare.) In operating systems, Linux has become as good as, and even better than, Unix, and is overtaking Unix's market share. In virtualization, Xen and KVM are now comparable to VMware's solutions, and are nibbling at VMware's market share. In cloud computing, the proprietary solution Enomaly appeared after Amazon EC2, followed by the open source Eucalyptus and OpenStack. At the same time, traditionally proprietary vendors are showing a friendlier attitude toward open source projects and the open source community. For example, Microsoft established a subsidiary called Microsoft Open Technologies in April, with the goal of promoting investments in interoperability, open standards, and open source software.
The business environment today is a lot different from the 1980s, when the Free Software Movement started. In fact, since the terminology "open source" was coined in 1998 in connection with Netscape's source release, to differentiate it from free software, open source has become a new business model for software R&D, marketing, and sales, rather than the opposite of proprietary software. Compared to the traditional proprietary business model, the open source business model exhibits the following characteristics:
(1) In the initial phase, use buzzwords such as open source and free software to gain the attention of potential customers and business partners. For potential customers, the interest is the possibility of getting (part of) the functionality of the competing proprietary software, free or at a relatively low price. For business partners, the interest might be that they can sell an enhanced version of the open source software (such as an enterprise edition), provide solutions based on the open source software, or use the open source software to promote sales of their own products.
(2) In the growth phase, major R&D resources usually come from the founding members (businesses) that initiated the project and their business partners. It is true that there are independent contributors who contribute code out of personal interest; however, the number of such individual contributors is relatively small. People promoting open source software use the phrase "developed by the community" frequently. In fact, during the past 10 years, the major R&D resources for most, if not all, major open source projects have come from enterprise partners. However, some open source projects intentionally downplay the importance of enterprise partners, even misleading the audience into believing that individual contributors constitute the major part of the above-mentioned community.
(3) In the harvest phase, founding members (businesses) and their partners might sell enhanced versions of the open source software, or solutions based on it. Although other vendors can also sell similar products or services, the major contributors to the software obviously have more authority and reputation in the market. Regarding how businesses can profit from open source software, Marten Mickos (currently the CEO of Eucalyptus Systems Inc) said during his tenure as the CEO of MySQL (in 2007) that success in open source requires you to serve (1) those who spend time to save money, and (2) those who spend money to save time. From a financial point of view, success means that revenue from software sales and services should exceed the expense of R&D and marketing. In that sense, some users are able to use open source software for free because (1) their usage is in itself a kind of participation in the open source project, which helps the marketing of the software and, in some cases, its testing and bug fixing; and (2) the paying customers might also be paying for those who are not paying.
Then why are open source solutions usually cheaper than their proprietary competitors? Generally speaking, proprietary solutions opened a whole new area from nothing and faced many challenges in market research, product design, engineering, marketing, and sales. Open source solutions, as followers, can take the proprietary solution as a reference in market research and product design, and even take advantage of the proprietary solution's previous work in opening up the market. In terms of R&D effort, open source solutions usually appear several years after proprietary solutions become successful; during that period, technological advancements in related areas lower the bar for entering the competition. Furthermore, open source solutions might have some outstanding features that are far better than those of proprietary solutions, but generally speaking the functionality, user experience, stability, and reliability of open source solutions might not be as good. This is why open source solutions often promote price advantages such as "30% of the price, 80% of the functionality". Beyond price advantages, the ability to add customized functionalities is very attractive to some customers.
In China, IT companies are usually the ones willing to spend time to save money, while traditional (non-IT) companies are the ones willing to spend money to save time. It should be mentioned that most traditional non-IT companies in China do not care about open source, but a lot of them are very interested in the ability to make customizations.
Open source as a business model is not morally loftier than the traditional proprietary business model. Similarly, it is not appropriate to make moral judgments about different approaches to open source practice. In the initial phase of the OpenStack project, Rackspace made public announcements saying that "OpenStack is the only fully open source cloud computing software available in the market". Competing open source projects such as CloudStack, Eucalyptus, and OpenNebula were labeled "not truly open source", either because they had an additional enterprise version (based on the open source version) which was not open source, or a more advanced installation package (based on 100% open source smaller packages) that was provided to paying customers only. (Both Eucalyptus and CloudStack had separate enterprise versions until April 2012. OpenNebula maintains a Pro version, built from all open source components, for paying customers.) Similar advertisements continued for almost two years, until Rackspace launched its own OpenStack-based Rackspace Private Cloud software, which is very similar to OpenNebula Pro in nature. The major difference is that Rackspace Private Cloud software is free for all to download, while OpenNebula Pro is provided to paying customers only. The catch is that when the number of nodes exceeds 20 in Rackspace Private Cloud software, cloud administrators need to seek help from Rackspace, probably generating leads for fee-based customer support. Leaving aside the question of whether the code that sets the 20-node limit is open source, it is very difficult to explain from a moral perspective why Rackspace, as a founding member of the OpenStack project, adds functionality to limit the usage of OpenStack. It is quite reasonable, however, if we look at such practice from a business perspective.
It is fair to say that during the past two years the measures taken by the OpenStack project in R&D, marketing, and community building have been outstanding examples of the open source business model.
As mentioned before, there might exist multiple competing open source projects in a particular application area. For example, in the broader sense of cloud computing we have the Amazon EC2-like CloudStack, Eucalyptus, OpenNebula, and OpenStack, plus other options such as ConVirt, XenServer, Oracle VM, and OpenQRM. For a particular application scenario, how can a business decide among so many open source options? In my experience, the software selection process can be divided into three phases: requirement analysis, technical analysis, and business analysis.
(1) During requirement analysis, we need to determine the real needs of the project and why a cloud solution is needed. In China, many decision makers' understanding of cloud computing stops at "improving efficiency, lowering operation cost, providing convenience". They do not realize that most open source solutions can already satisfy such requirements in one way or another. Furthermore, many decision makers refer to VMware vCenter when talking about functionality requirements, and do not want to discuss why they need a specific functionality. Therefore, it is very important to investigate the actual application scenario in detail, understand whether it is a data center virtualization management project or an Amazon EC2-like cloud computing project, and explore functionality requirements as much as possible. In some cases, both data center virtualization and Amazon EC2-like solutions can satisfy the customer's needs; then it is up to the salesperson to steer the customer toward their own solution (a technique called expectation management). By carrying out requirement analysis, we can filter out a significant portion of the available options.
(2) During technical analysis, compare the reference architectures of the open source solutions, with a focus on how difficult it would be to implement each reference architecture in the actual application scenario. Then compare the solutions in terms of functionality, separating must-have functionalities from good-to-have ones. Furthermore, we can also compare ease of installation and configuration, user experience, documentation, and customizability. By carrying out technical analysis, we can rank the open source solutions and remove the lowest-ranked one from the list.
(3) During business analysis, determine whether the decision maker is willing to pay for an open source solution. If yes, this is a "spend money to save time" scenario; if not, it is a "spend time to save money" scenario. For those willing to spend time to save money, the open source community is the major source of technical support, so the activeness of the corresponding community is a very important reference point. For those willing to spend money to save time, who usually rely on service providers for technical support, it is very important to know the reputation of the service provider and whether service is readily available locally; the activeness of the open source community is less important in such scenarios.
In China, for application scenarios where decision makers are willing to spend money to save time, CloudStack and Eucalyptus are relatively better options. These two projects started relatively early, have better stability and reliability, enjoy a good reputation in the industry, and have teams in China providing support and services. We are seeing some startup teams in China trying to provide OpenStack-based solutions, but these teams are too young and still need time to accumulate the necessary experience. The Sina App Engine (SAE) team has a lot of first-hand experience with OpenStack, but it does not yet have permission to provide support and services to other commercial customers. There are also some teams in China working with OpenNebula, but they are too small to provide support and services to others in the short term.
For application scenarios where decision makers are willing to spend time to save money, CloudStack and OpenStack are better options, because their user and developer communities appear to be more active. Between these two, CloudStack offers more functionality and has more success stories, which makes it the better choice in the short term. In the long term, OpenStack is becoming more popular, but other options are making progress too. It would be very difficult for any one software package to dominate in the coming three years. From a business perspective, I would say that CloudStack and Eucalyptus will move faster than the others.
G. Additional Notes
Some friends would like me to add more information about China. Frankly speaking, I do not have enough data to elaborate on this topic. Liming Liu recently posted a blog entry that serves as a very good reference. The blog entry can be accessed from this URL, but it is in Chinese.
Regarding the activeness of different open source projects, readers can refer to my recent blog post CY12-Q3 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack for more information. Regarding performance testing of public cloud service providers, readers can refer to my other blog post HP Cloud Services Performance Tests.
All the figures in this blog entry were found via Google search. Many of the concepts mentioned here came from Wikipedia, with modifications by the author.