Virtualization, Cloud Computing, Open Source and Beyond…

By , October 15, 2012 10:12 pm

A. Virtualization

(This figure was found through a Google search; it originally came from VMWare.)

Virtualization refers to the practice of simulating multiple virtual machines on a single physical computer. Logically, each virtual machine has its own virtualized CPU, memory, storage, and networking. Through virtualization the underlying hardware resources can be utilized more efficiently: applications can run on the same physical computer, yet each has its own runtime environment that is isolated from the others.

There exist different levels of virtualization, for example hardware virtualization and software virtualization. Hardware virtualization means providing a virtual computer by simulating the underlying hardware, so that the virtual computer is capable of running a full copy of an operating system. Within hardware virtualization there exist different implementations, such as full virtualization (simulating a full set of the underlying hardware, such that most operating systems can run on top of the virtual machine without modification), partial virtualization (simulating only some key hardware components, so operating systems might need modification to run in such an environment) and paravirtualization (not simulating the underlying hardware, but rather sharing it through a virtual machine manager, so most operating systems need modification to run in such an environment). Virtualization at the software level usually refers to the practice of providing multiple isolated runtime environments on top of a single operating system instance, and it is often called container technology.

With hardware virtualization, most modern virtualization technologies (such as VMWare, Xen and KVM) are a combination of full virtualization and paravirtualization. Virtual machines provided by hardware virtualization technologies usually run a full copy of an operating system, therefore there exist a large number of similar (or even identical) processes and memory pages on the same host machine. Currently, memory pages with the same content can be consolidated by techniques such as KSM, but there is so far no good method to handle similar (or even identical) processes. Therefore hardware virtualization is usually referred to as heavy-weight virtualization, and the number of virtual machines that can run on a single host machine is relatively limited.
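To make the idea of consolidating identical memory pages more concrete, here is a toy sketch in Python. It is not how KSM is implemented in the Linux kernel; the function name `deduplicate_pages`, the 4 KiB page size, and the use of a content hash are all assumptions made for illustration.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes, for illustration only


def deduplicate_pages(vm_memories):
    """Count pages before and after merging pages with identical content.

    vm_memories maps a (hypothetical) VM name to its memory image as bytes.
    Returns a tuple (total_pages, unique_pages).
    """
    seen = set()
    total = 0
    for image in vm_memories.values():
        for offset in range(0, len(image), PAGE_SIZE):
            page = image[offset:offset + PAGE_SIZE]
            total += 1
            # A content hash stands in for the byte-by-byte comparison
            # that a real implementation would perform before merging.
            seen.add(hashlib.sha256(page).digest())
    return total, len(seen)


# Two toy "virtual machines" that share one identical all-zero page.
vms = {
    "vm1": bytes(PAGE_SIZE) + b"a" * PAGE_SIZE,
    "vm2": bytes(PAGE_SIZE) + b"b" * PAGE_SIZE,
}
total, unique = deduplicate_pages(vms)
print(f"{total} pages in total, {unique} after merging identical ones")
```

The sketch only counts how many pages could be shared. A real mechanism must also handle writes to a shared page (copy-on-write), which is omitted here.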

With software virtualization, the overhead of running multiple operating system instances does not exist. Therefore software virtualization is usually referred to as light-weight virtualization, and the number of virtual runtime environments that can be present on a single host machine is relatively large. For example, in theory Solaris can support 8000 containers on a single operating system instance (the actual number of supported containers is limited by hardware resources and system workload). Similarly, LXC on Linux can easily provide a large number of virtualized runtime environments.

In terms of virtualization technologies, most companies in China seem to focus more on hardware virtualization, and deploy hardware virtualization in development and production environments. Taobao (a subsidiary of Alibaba Inc) is one of the first to study and deploy software virtualization in a production environment. Their experience showed that replacing Xen with cgroup could result in better resource utilization.

For a specific application scenario, the decision between hardware virtualization and software virtualization should depend on whether the end user needs control over the operating system (such as kernel upgrades). If the end user only needs control over the runtime environment (as in various App Engine services), software virtualization might be a better choice.

For those who want to know more about virtualization technologies, the VMWare white paper Understanding Full Virtualization, Paravirtualization, and Hardware Assist is a great reference.

Generally speaking, the number of users that can access virtualization technology is very small. On the Linux operating system, the user with virtual machine life cycle privileges is usually the user with libvirt access. In a company or other organization, these users are usually system administrators.

B. Virtualization Management

In the early days, virtualization technologies solved the problem of providing multiple isolated runtime environments on a single physical computer. When the number of physical computers is small, system administrators can manually log in to different servers to carry out virtual machine life cycle management tasks. When the number of physical computers becomes large, some kind of script or application is needed to increase the degree of automation and relieve system administrators from tedious work. Applications that enable system administrators to manage multiple physical and virtual computers from a single location are called virtualization management tools. Such tools can usually accomplish the following tasks: (1) manage the life cycles of multiple virtual machines on multiple physical computers; (2) query and monitor all physical and virtual computers; and (3) establish a mapping between the names of virtual machines and the actual virtual machine instances on different computers, such that virtual machine identification and management becomes easier. On the Linux operating system, VirtManager is a simple virtualization management tool. Among the VMWare product family, VMWare vSphere is a powerful virtualization management tool.
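The three tasks listed above can be sketched as a small in-memory registry. This is only an illustration: the class names `VirtualMachine` and `Inventory` are hypothetical, and a real tool would call the hypervisor on each host rather than just flipping a state field.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    name: str
    host: str
    state: str = "stopped"


@dataclass
class Inventory:
    """Hypothetical registry of virtual machines across several hosts."""
    vms: dict = field(default_factory=dict)

    def register(self, name, host):
        # Task (3): map a human-readable name to a concrete instance.
        if name in self.vms:
            raise ValueError(f"duplicate VM name: {name}")
        self.vms[name] = VirtualMachine(name, host)

    def start(self, name):
        # Task (1): life cycle management (only the state is tracked here).
        self.vms[name].state = "running"

    def stop(self, name):
        self.vms[name].state = "stopped"

    def running_by_host(self):
        # Task (2): query all hosts from a single location.
        summary = {}
        for vm in self.vms.values():
            if vm.state == "running":
                summary.setdefault(vm.host, []).append(vm.name)
        return summary


inventory = Inventory()
inventory.register("web01", "hostA")
inventory.register("db01", "hostB")
inventory.start("web01")
print(inventory.running_by_host())
```

Keeping virtual machine names unique across the whole registry is a design choice of this sketch; it is what lets an administrator refer to a machine without remembering which host it lives on.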

Virtualization management tools are direct extensions of virtualization technology. The purpose of a simple virtualization management tool is to rescue system administrators from the tedious, repetitive work induced by an increasing number of physical and virtual machines. At this level, the scope of a virtualization management tool is usually limited to a cluster. In many cases, the virtualization management tool needs the user name and password to access different physical computers in order to perform virtual machine life cycle management. To make the management work easier, the system administrator might need to set up a common management user for all physical computers in the cluster.

Virtualization management tools provide convenience for the system administrators, but do not delegate virtual machine life cycle management rights to other users.

C. Data Center Virtualization

In a data center, system administrators need to look after a large amount of different hardware and applications. Compared to a small cluster, the complexity of a data center is significantly greater. A simple virtualization management tool is no longer capable of satisfying the needs of system administrators. Therefore people developed data center virtualization management software to meet these new challenges. On the hardware layer, data center virtualization management software created the concept of “resource pools” to reorganize hardware resources, where a pool is usually a group of servers with similar configuration and purpose. Computing resources are now exposed to the end user in the form of virtual infrastructure, rather than separate servers. On the software layer, data center virtualization software created different roles for system administrators and regular end users, or more fine-grained role based access control (RBAC) based on the needs of a specific application scenario. System administrators have the right to manage all the physical servers and virtual machines, but usually do not interfere with virtual machines that are running normally. Regular end users can only carry out virtual machine life cycle management tasks within the resource pools that are assigned to them, and do not have the right to manage the physical servers. In the extreme case, regular end users can only see the resource pools that are assigned to them, without any knowledge of the details of those pools.
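The split between administrators and regular users described above can be sketched as follows. The class `DataCenter`, the exception `PermissionDenied`, and the user and pool names are hypothetical and exist only to illustrate the access rule; real products implement RBAC with far more detail.

```python
class PermissionDenied(Exception):
    """Raised when a user acts outside the pools assigned to them."""


class DataCenter:
    def __init__(self, admins):
        self.admins = set(admins)
        self.pools = {}        # pool name -> set of VM names
        self.assignments = {}  # user name -> set of pool names

    def add_pool(self, pool):
        self.pools.setdefault(pool, set())

    def assign(self, user, pool):
        if pool not in self.pools:
            raise KeyError(f"unknown pool: {pool}")
        self.assignments.setdefault(user, set()).add(pool)

    def visible_pools(self, user):
        # Administrators see every pool; regular users see only their own.
        if user in self.admins:
            return sorted(self.pools)
        return sorted(self.assignments.get(user, set()))

    def create_vm(self, user, pool, vm_name):
        # Life cycle operations are allowed only inside visible pools.
        if pool not in self.visible_pools(user):
            raise PermissionDenied(f"{user} may not use pool {pool}")
        self.pools[pool].add(vm_name)


dc = DataCenter(admins=["root"])
dc.add_pool("dev")
dc.add_pool("prod")
dc.assign("alice", "dev")
dc.create_vm("alice", "dev", "build-01")
print(dc.visible_pools("alice"))
print(dc.visible_pools("root"))
```

Note that regular users never see host names in this sketch, only pool names, which mirrors the “extreme case” in the paragraph above.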

Before data center virtualization technology, creating and managing virtual machines was usually carried out by system administrators. In data center virtualization software, virtual machine life cycle management rights are delegated through RBAC to so-called “regular users”, which relieves the pressure on system administrators (to some degree). However, for security reasons not all employees in a company can have such a “regular user” account, which is usually assigned to managers or team leads. It is safe to assume that in data center virtualization the life cycle of virtual machines is still managed centrally.

Data center virtualization management software is a further extension of virtualization management tools. It solves the problem of system complexity introduced by the increasing number of hardware devices and applications. When specific physical hardware is presented in the form of an abstracted “resource pool”, managers only need to worry about the size, workload, and health status of the various resource pools, while end users only need to know about the status of the resource pool that is assigned to them. Only system administrators need to know by heart the configuration, workload and status of each and every physical server. However, with the concept of resource pools, all physical devices can be reorganized in a relatively logical way, which makes the life of system administrators easier.

Modern data center virtualization management software usually provides a lot of IT ops automation functionality. This includes (1) fast deployment of a number of identical or similar runtime environments based on virtual machine templates, (2) monitoring, reporting, notification, and accounting, and (3) high availability, dynamic workload management, backup and recovery. Some data center virtualization management software even provides open APIs that allow system administrators to develop and integrate additional functionality based on the actual application scenarios.

Among the VMWare product family, VMWare vCenter is a powerful data center virtualization management product. Other good data center virtualization management software includes Convirt, XenServer, Oracle VM and OpenQRM.

D. Cloud Computing

Cloud computing is a further abstraction of data center virtualization. In cloud computing management software, we still have different roles such as cloud managers and regular users, with different access rights associated with different roles. Cloud managers have the right to manage all the physical servers and virtual machines, but usually do not interfere with virtual machines that are running normally. Regular users can carry out virtual machine life cycle management tasks through a web browser, or through computer programs that talk to the cloud via web services.

In cloud computing, virtual machine life cycle management rights are fully delegated to regular users. At the same time, cloud computing hides the concepts of resource pools and physical servers from regular users. Regular users are capable of obtaining computing resources without needing to know about the underlying physical infrastructure. On the surface, cloud computing seems to be simply a way of providing computing resources remotely, similar to Amazon EC2/S3. In fact, cloud computing represents a change in computing resource management: end users no longer need the help of system administrators to obtain and manage computing resources.

For cloud managers, delegating virtual machine life cycle management rights to regular users does not take them off the hot seat. Rather, they now have more trouble to handle. In traditional IT infrastructure, each application has its own computing resources, and troubleshooting is relatively easy because physical isolation exists between applications. After upgrading to cloud computing, multiple applications might share the same underlying physical infrastructure, and troubleshooting becomes difficult when multiple applications compete for resources. Therefore, cloud managers usually expect a full set of data center virtualization management functionality in cloud computing management software. For cloud managers, critical functionality includes (1) monitoring, reporting, notification, and accounting, (2) high availability, dynamic workload management, backup and recovery, and (3) live migration, which can be used in troubleshooting or local maintenance.

We can see that from virtualization to cloud computing, the degree of encapsulation of physical resources increases, while virtual machine life cycle management rights are gradually delegated.

Among the VMWare product family, VMWare vCloud is a cloud computing management product. Other cloud computing management software includes OpenStack, OpenNebula, Eucalyptus and CloudStack. Although OpenStack, OpenNebula, Eucalyptus and CloudStack are all cloud computing management software, they differ significantly in functionality, which can be traced to differences in their design. Originally OpenNebula and CloudStack were designed to be data center virtualization management software, therefore they have a good set of data center virtualization management functionality. When the concept of cloud computing became popular, OpenNebula added OCCI and Amazon EC2 support, while CloudStack provided an additional Amazon EC2 compatible module called CloudBridge (CloudBridge has been integrated into CloudStack since version 4.0). In contrast, Eucalyptus and OpenStack were designed to be Amazon EC2 compatible cloud computing management software, and they are not yet as capable in terms of data center virtualization management functionality. Between Eucalyptus and OpenStack, Eucalyptus has some first-mover advantage, since its developers have realized the importance of data center virtualization management functionality based on feedback from the market.

E. Private Cloud and Public Cloud

The so-called “cloud computing” described in section D is only a narrow definition, that is, Amazon EC2-like cloud computing. Broader definitions of cloud computing usually refer to the various practices of obtaining and utilizing computing resources (such as compute and storage) remotely, which includes both data center virtualization as described in section C and cloud computing as described in section D. In both cases, computing resources are provided to the end user in the form of virtual machines, and the end user does not need to have any knowledge of the underlying physical infrastructure. If the scope of a cloud platform is to provide service within a corporation, then it can be called a “private cloud”. If the scope of a cloud platform is to provide service to the public, then it can be called a “public cloud”. Generally speaking, a private cloud emphasizes the ability to create virtual machines with different configurations (such as the number of vCPUs, memory and storage), because it needs to satisfy the needs of different applications within the enterprise. In contrast, public cloud service providers do not have much knowledge about the applications running on top of their platforms, therefore they tend to provide standardized virtual machine products with fixed configurations, and end users can only purchase virtual machines with these fixed configurations.

For public cloud service providers, the business model is similar to Amazon EC2. Therefore, most of them will choose to use cloud computing management software as described in section D. For private cloud service providers, the decision should be made according to the computing resource management model within the enterprise. If the enterprise still wishes to manage computing resources centrally, and to delegate virtual machine life cycle management rights only to managers and team leaders, data center virtualization management software as described in section C is more appropriate. However, if the enterprise wishes to delegate virtual machine life cycle management rights to the end user, then cloud computing management software as described in section D is more appropriate.

Traditionally, people think that a private cloud should be built upon hardware owned by the enterprise and inside a data center managed by the enterprise. However, as hardware vendors join the game, the border between private cloud and public cloud is becoming blurred. Recently Rackspace announced private cloud services where customers can choose between their own hardware and data center, or hardware and a data center owned by Rackspace. Oracle also announced private cloud services that are owned by Oracle and managed by Oracle. With such a new business model, a private cloud for a particular customer might be just an isolated resource pool of a public cloud service provider (you got private cloud in my public cloud). For the public cloud service provider, its public cloud service infrastructure might in turn be part of its own bigger infrastructure (private cloud), or even a resource pool from a hardware vendor’s infrastructure (you got public cloud in my private cloud).

For the customers, it is financially reasonable to use a private cloud provided by a cloud service provider. This means the CapEx needed for data center construction and hardware purchasing can be converted into OpEx, while the precious cash can be used to cultivate more business opportunities. Even if in the long term the total cost of working with this kind of private cloud is higher than alternatives based on a self-owned data center and hardware, the return from new business might be greater than the cost difference between the two options. In the extreme case, even if the company is not successful in the end, company owners don’t need to look at a large amount of newly purchased hardware and cry. Unless the real estate market grows rapidly in the short term, a failing company usually won’t feel sorry for not building its own data center. (Ahh, I should mention that for a company that has been running long enough, it is still feasible to earn money through real estate. For example, before Sun Microsystems Inc was acquired by Oracle, it did successfully make one of its financial reports look much better by selling one of its major engineering campuses.)
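The trade-off between up-front CapEx and recurring OpEx can be illustrated with a simple break-even calculation. All figures below are made up for the sake of the example, and the function name `break_even_month` is hypothetical; the model ignores depreciation, financing costs, and the value of the cash that is freed up.

```python
def break_even_month(capex, own_monthly, hosted_monthly):
    """Return the first month in which the hosted (OpEx) option has cost
    more in total than owning the hardware (CapEx plus running cost).

    Returns None if the hosted option never becomes more expensive.
    """
    if hosted_monthly <= own_monthly:
        return None
    month = 0
    own_total, hosted_total = capex, 0
    while hosted_total <= own_total:
        month += 1
        own_total += own_monthly
        hosted_total += hosted_monthly
    return month


# Hypothetical figures: 1,200,000 up front plus 20,000 per month to own,
# versus 60,000 per month for a hosted private cloud.
month = break_even_month(capex=1_200_000, own_monthly=20_000, hosted_monthly=60_000)
print(f"hosted option becomes more expensive in month {month}")
```

With these assumed numbers the hosted option only becomes the more expensive one after about two and a half years, which is the window in which the freed-up cash has to earn its keep.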

Then, what is the role of hardware vendors in this game? When the customer’s CapEx becomes OpEx, wouldn’t it take more time for hardware vendors to collect payment?

In 1865 William Jevons (1835-1882), a British economist, wrote a book entitled “The Coal Question”, in which he presented data on the depletion of coal reserves yet, seemingly paradoxically, an increase in the consumption of coal in England throughout most of the 19th century. He theorized that significant improvements in the efficiency of the steam engine had increased the utility of energy from coal and, in effect, lowered the price of energy, thereby increasing consumption. This is known as the Jevons paradox, the principle that as technological progress increases the efficiency of resource utilization, consumption of that resource will increase. During the past 150 years, similar over-consumption has been observed in many other areas such as major industrial materials, transportation, energy, and the food industry.
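One simple way to see when the paradox appears is a constant-elasticity demand model. This is a textbook-style simplification and not something taken from Jevons' book: assume demand for useful work is proportional to its price raised to the power of minus the elasticity, and that the price of useful work falls in proportion to the efficiency gain. Resource consumption then changes by a factor of gain ** (elasticity - 1). The function name below is hypothetical.

```python
def resource_use_multiplier(efficiency_gain, elasticity):
    """Factor by which resource consumption changes after an efficiency gain.

    Assumes constant-elasticity demand for useful work and a fixed price
    per unit of the underlying resource (illustrative assumptions only).
    """
    # Demand for useful work grows by gain ** elasticity, while each unit
    # of work now needs 1 / gain as much of the resource.
    return efficiency_gain ** (elasticity - 1)


for elasticity in (0.5, 1.0, 1.5):
    factor = resource_use_multiplier(efficiency_gain=2.0, elasticity=elasticity)
    print(f"elasticity {elasticity}: resource use changes by a factor of {factor:.2f}")
```

Under these assumptions total consumption of the resource rises only when the elasticity is greater than one, that is, when demand is sensitive enough to price.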

The core value of public cloud service is that fixed assets (such as servers, networking equipment, and storage), which end users previously had to purchase with a huge budget, now become public resources that are charged by usage. Virtualization technologies improve the efficiency and, in effect, lower the price of computing resources, which will eventually increase the consumption of computing resources. When we understand this logic, we can understand why HP launched HP Cloud Services in a hurry on top of OpenStack while OpenStack is still immature for commercial deployment. It is true that HP Cloud Services might not be able to save HP in the next round of competition, but HP will certainly lose if it does not even join the competition. Similarly, we can understand why Oracle has now become a cloud computing evangelist, while it was sneering at cloud computing two years ago. When Oracle acquired Sun Microsystems Inc in 2009, it suddenly became one of the major players in the hardware market. At that time the concept of cloud computing was relatively new, and Oracle’s response to cloud computing showed that it had not yet become familiar with its new role. Now that cloud computing is a lot more than just a new concept, it would be very silly if Oracle, as one of the major hardware vendors, did not want to pursue its share in the game.

According to the Jevons paradox, over-consumption is a result of a price decrease. Then, how should cloud computing resources be priced?

Currently, most public cloud service providers set prices according to the configuration of the virtual machines. Take Amazon EC2 for example: its Medium virtual machine (3.75 GB memory, 2 ECUs, 410 GB storage, $0.16 per hour) is twice as large, and twice as expensive, as its Small virtual machine (1.7 GB memory, 1 ECU, 160 GB storage, $0.08 per hour). Newcomers to the competition, such as HP Cloud Services, Grand Cloud (in China), and Aliyun (in China), seem to be copying Amazon EC2’s pricing strategy. The problem is, when the size of the virtual machine gets larger (with more computing resources such as vCPU, memory and storage), the performance of the virtual machine does not increase by the same proportion. A number of performance tests on Amazon EC2, HP Cloud Services, Grand Cloud, and Aliyun suggested that for a wide range of applications the performance-to-price ratio of virtual machines actually decreases as the size of the virtual machine increases. It is safe to say that such a pricing strategy will not encourage users to use more computing resources.
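The arithmetic behind this observation can be sketched as follows. The configurations and hourly prices are the Amazon EC2 figures quoted above; the benchmark scores are hypothetical placeholders, chosen only to show how a performance-to-price ratio falls when performance grows more slowly than price.

```python
# Configurations and prices as quoted in the text above.
instances = {
    "Small": {"memory_gb": 1.7, "ecu": 1, "storage_gb": 160, "usd_per_hour": 0.08},
    "Medium": {"memory_gb": 3.75, "ecu": 2, "storage_gb": 410, "usd_per_hour": 0.16},
}

# Hypothetical benchmark scores (NOT measured results): the Medium instance
# is assumed to be 1.7 times as fast as the Small one, not 2 times.
benchmark_scores = {"Small": 100.0, "Medium": 170.0}


def price_per_ecu(spec):
    """Hourly price divided by the number of ECUs."""
    return spec["usd_per_hour"] / spec["ecu"]


def performance_per_dollar(score, spec):
    """Benchmark score obtained per dollar spent per hour."""
    return score / spec["usd_per_hour"]


for name, spec in instances.items():
    ratio = performance_per_dollar(benchmark_scores[name], spec)
    print(f"{name}: ${price_per_ecu(spec):.2f} per ECU-hour, "
          f"{ratio:.1f} benchmark points per dollar-hour")
```

Priced by configuration, both instance types cost the same per ECU. Measured by the assumed performance, however, the larger instance delivers less per dollar, which is the effect described in the paragraph above.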

It might be more reasonable to determine the price of virtual machines according to their performance. For example, suppose a soap manufacturer sells its products in two different packages: the smaller package has one piece and the bigger package has two pieces. Customers are willing to buy the bigger package not because it looks bigger, but because it can do twice the work of the smaller package. Similarly, virtual machine products from the same public cloud service provider should maintain a similar performance-to-price ratio. The problem is, different applications have different requirements for processor, memory and storage resources, which results in significant differences in the performance-configuration curve. Therefore, in public cloud there is a need for a comprehensive virtual machine performance evaluation suite, which can be used to evaluate the overall performance of a virtual machine rather than just one of its components such as processor, memory or storage. Based on such a comprehensive benchmark framework, we can compare not only virtual machine products from one public cloud service provider, but also virtual machine products across different public cloud service providers.
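As a rough idea of what an overall score could look like, the sketch below combines per-component results into one number using a weighted geometric mean of ratios against a baseline machine, a common way to aggregate benchmark ratios. The component names, weights, and measurements are all hypothetical; choosing weights that reflect a real application mix is exactly the hard part mentioned above.

```python
import math


def composite_score(measured, baseline, weights):
    """Weighted geometric mean of measured/baseline ratios.

    measured and baseline map a component name to a benchmark result
    (higher is better); weights maps the same names to their importance.
    """
    total_weight = sum(weights.values())
    log_sum = 0.0
    for component, weight in weights.items():
        ratio = measured[component] / baseline[component]
        log_sum += weight * math.log(ratio)
    return math.exp(log_sum / total_weight)


# Hypothetical results for a baseline VM and a larger VM.
baseline = {"cpu": 100.0, "memory": 100.0, "storage": 100.0}
larger = {"cpu": 200.0, "memory": 180.0, "storage": 110.0}

# Hypothetical weights for a storage-heavy application.
weights = {"cpu": 1.0, "memory": 1.0, "storage": 2.0}

score = composite_score(larger, baseline, weights)
print(f"composite score relative to baseline: {score:.2f}")
```

Dividing such a score by the hourly price gives a performance-to-price figure that can be compared across instance sizes and across providers, provided the same benchmark suite and weights are used everywhere.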

F. Open Source

In recent years, we have been observing a pattern in the information industry. When a proprietary solution becomes successful in the market, one or more followers, either open source or proprietary, quickly appear with similar functionality or services. (The opposite case, where open source solutions come before proprietary followers, is rare.) In operating systems, Linux has become as good as, and even better than, Unix, and has overtaken Unix in market share. In virtualization, Xen and KVM are now comparable to VMWare solutions, and are nibbling at VMWare’s market share. In cloud computing, the proprietary solution Enomaly appeared after Amazon EC2, followed by the open source Eucalyptus and OpenStack. At the same time, traditionally proprietary vendors are showing a more friendly attitude towards open source projects and the open source community. For example, Microsoft established a subsidiary called Microsoft Open Technologies in April 2012, with the goal of promoting investment in interoperability, open standards, and open source software.

The business environment today is a lot different from the 1980s, when the Free Software Movement was started. In fact, since the term “open source” was coined in 1998, in the wake of Netscape’s decision to release its browser source code, to differentiate the approach from free software, open source has become a new business model for software R&D, marketing, and sales, rather than the opposite of proprietary software. Compared to the traditional proprietary business model, the open source business model exhibits the following characteristics:

(1) In the initial phase, buzzwords such as open source and free software are used to gain the attention of potential customers and business partners. For potential customers, the interest lies in the possibility of getting (part of) the functionality of the competing proprietary software, either free or at a relatively low price. For business partners, the interest might be that they can sell an enhanced version of the open source software (such as an enterprise version), provide solutions based on the open source software, or use the open source software to promote the sales of their own products.

(2) In the growth phase, major R&D resources usually come from the founding members (businesses) that initiated the project and their business partners. It is true that there are independent contributors who contribute code out of personal interest; however, the number of such individual contributors is relatively small. People promoting open source software frequently use the phrase “developed by community”. In fact, during the past 10 years, the major R&D resources of most, if not all, major open source projects have come from enterprise partners. However, some open source projects intentionally downplay the importance of enterprise partners, and even mislead the audience into believing that individual contributors constitute the major part of the above-mentioned community.

(3) In the harvest phase, the founding members (businesses) and their partners might sell enhanced versions of the open source software, or solutions based on the open source software. Although other vendors can also sell similar products or services, major contributors to the software obviously have more authority and reputation in the market. Regarding how businesses can make a profit from open source software, Marten Mickos (currently the CEO of Eucalyptus Systems Inc) said during his tenure as the CEO of MySQL (in 2007) that success in open source requires you to serve (1) those who spend time to save money, and (2) those who spend money to save time. From a financial point of view, success means that revenue from software sales and services should exceed the expense of R&D and marketing. In that sense, some users are able to use open source software for free because (1) their usage is in itself a kind of participation in the open source project, which helps the marketing of the software and, in some cases, its testing and bug fixing, and (2) the paying customers might also be paying for those who are not paying.

Then why are open source solutions usually cheaper than proprietary competitors? Generally speaking, proprietary solutions opened a whole new area from nothing and experienced many challenges in market research, product design, engineering, marketing and sales. Open source solutions, as followers of the proprietary solution, can take the proprietary solution as a reference in market research and product design, and even take advantage of the proprietary solution’s previous work in opening the market. In terms of R&D effort, open source solutions usually appear several years after proprietary solutions became successful. During that period, technology advancements in related areas lower the bar for entering the competition. Furthermore, open source solutions might have some outstanding features that are far better than those of proprietary solutions, but generally speaking the functionality, user experience, stability, and reliability of open source solutions might not be as good as those of proprietary solutions. This is why open source solutions often promote price advantages such as “30% of the price, 80% of the functionality”. Apart from price advantages, the ability to add customized functionality is very attractive for some customers.

In China, IT companies are usually those who are willing to spend time to save money, while traditional (non-IT) companies are those who are willing to spend money to save time. It should be mentioned that most of the traditional non-IT companies in China do not care about open source, but a lot of them are very interested in the ability to make customizations.

Open source as a new business model is not morally more lofty than the traditional proprietary business model. Similarly, it is not appropriate to make moral judgements about different approaches to open source practice. In the initial phase of the OpenStack project, Rackspace made public announcements saying that “OpenStack is the only fully open source cloud computing software available in the market”. Competing open source projects such as CloudStack, Eucalyptus, and OpenNebula were labeled as “not truly open source”, either because they had an additional enterprise version (based on the open source version) which was not open source, or a more advanced installation package (based on 100% open source smaller packages) that was provided to paying customers only. (Both Eucalyptus and CloudStack had separate enterprise versions until April 2012. OpenNebula maintains a Pro version with all open source components for paying customers.) Similar advertisements continued for almost 2 years, until Rackspace launched its own OpenStack-based Rackspace Private Cloud software, which is very similar to OpenNebula Pro in nature. The major difference is that Rackspace Private Cloud software is free to download for all, while OpenNebula Pro is provided to paying customers only. The problem is, when the number of nodes exceeds 20 in Rackspace Private Cloud software, cloud administrators need to seek help from Rackspace, probably generating leads for fee-based customer support. Let’s leave aside for the time being the question of whether the code that sets the 20-node limit is open source or not. It is very difficult to explain from a moral perspective why Rackspace, as a founding member of the OpenStack project, adds functionality to limit the usage of OpenStack. Rather, it is quite reasonable if we look at such a practice from a business perspective. It is fair to say that during the past two years the measures taken by the OpenStack project in R&D, marketing and community have been outstanding examples of the open source business model.

As mentioned before, there might exist multiple competing open source projects in a particular area of application. For example, in the broader sense of cloud computing we have the Amazon EC2-like CloudStack, Eucalyptus, OpenNebula, and OpenStack, and other options such as Convirt, XenServer, Oracle VM, and OpenQRM. For a particular application scenario, how can a business make a decision among so many open source options? In my experience, the software selection process can be divided into 3 different phases: requirement analysis, technical analysis, and business analysis.

(1) During requirement analysis, we need to determine the real needs of the project and why a cloud solution is needed. In China, many decision makers’ understanding of cloud computing stops at “improve efficiency, lower operation cost, provide convenience”. They do not realize that most open source solutions can already satisfy such requirements in one way or another. Furthermore, many decision makers refer to VMWare vCenter when talking about functionality requirements, and do not want to discuss why they need a specific functionality. Therefore, it is very important to investigate the actual application scenario in detail, understand whether this is a data center virtualization management project or an Amazon EC2-like cloud computing project, and explore functionality requirements as much as possible. In some cases, both data center virtualization and Amazon EC2-like solutions can satisfy the needs of the customer, and then it is up to the sales person to steer the customers towards their own solution (such a technique is called expectation management). By carrying out requirement analysis, we can filter out a significant portion of the options available.

(2) During technical analysis, compare the reference architectures of the open source solutions, with a focus on how difficult it would be to implement each reference architecture in the actual application scenario. Then compare the solutions in terms of functionality, and separate must-have functions from good-to-have ones. Furthermore, we can compare ease of installation and configuration, user experience, documentation, and customization. Technical analysis lets us rank the open source solutions and remove the last one from the list.

(3) During business analysis, determine whether the decision maker is willing to pay for open source solutions. If yes, this is a “spend money to save time” scenario. If not, this is a “spend time to save money” scenario. For those who are willing to spend time to save money, the open source community is the main place to seek technical support, so the activeness of the corresponding community is a very important reference. Those who are willing to spend money to save time usually rely on service providers for technical support. It is therefore very important to know the reputation of the service provider and whether service is readily available locally. The activeness of the open source community is less important in such a scenario.
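
The technical analysis in phase (2) is essentially a scoring exercise, and it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criteria weights, the feature names and every score below are hypothetical, and the options are placeholders rather than evaluations of real projects. Options that miss a must-have feature are filtered out first, the rest are ranked by weighted score, and the last one is removed from the list.

```python
# Hypothetical weights for the criteria named in phase (2); they sum to 1.
WEIGHTS = {
    "architecture_fit": 0.4,
    "features": 0.3,
    "installation": 0.1,
    "documentation": 0.1,
    "customization": 0.1,
}


def shortlist(candidates, must_have):
    """Rank candidates by weighted score and drop the lowest-ranked one.

    candidates: {name: {"features": set of feature names,
                        "scores": {criterion: score from 0 to 10}}}
    must_have:  set of features every acceptable option has to provide
    """
    # Must-have features act as a hard filter, not as part of the score.
    eligible = {
        name: c for name, c in candidates.items()
        if must_have <= c["features"]
    }
    # Weighted sum over the comparison criteria.
    totals = {
        name: sum(WEIGHTS[k] * c["scores"][k] for k in WEIGHTS)
        for name, c in eligible.items()
    }
    ranked = sorted(totals, key=totals.get, reverse=True)
    # Remove the last one from the list, as long as a choice remains.
    return ranked[:-1] if len(ranked) > 1 else ranked


def flat(score):
    """Helper for the example: the same score on every criterion."""
    return {k: score for k in WEIGHTS}


# Placeholder options with made-up scores, not real project evaluations.
example = {
    "option_a": {"features": {"ha", "api"}, "scores": flat(8)},
    "option_b": {"features": {"ha", "api"}, "scores": flat(6)},
    "option_c": {"features": {"ha", "api"}, "scores": flat(4)},
    "option_d": {"features": {"api"}, "scores": flat(10)},
}
print(shortlist(example, must_have={"ha", "api"}))
```

In this made-up example option_d scores highest but lacks a must-have feature, so it never reaches the ranking; option_c is then dropped as the lowest-ranked of the remaining three. Treating must-have features as a filter rather than as one more weighted criterion is a design choice of this sketch.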

In China, for application scenarios where the customer is willing to spend money to save time, CloudStack and Eucalyptus are the better options. These two projects started relatively early, have better stability and reliability, have a good reputation in the industry, and have teams in China to provide support and services. We are seeing some startup teams in China trying to provide OpenStack-based solutions. However, these teams are too young and still need time to accumulate the necessary experience. The Sina App Engine (SAE) team has a lot of first-hand experience with OpenStack, but it does not yet have permission to provide support and services to other commercial customers. There are also some teams in China working with OpenNebula, but they are too small to provide support and services to others in the short term.

For application scenarios where the customer is willing to spend time to save money, CloudStack and OpenStack are the better options, because their user and developer communities seem to be more active. Of the two, CloudStack offers more functionality and has more success stories, which makes it the better choice in the short term. In the long term OpenStack is becoming more popular, but the other options are making progress too. It would be very difficult for any one software package to dominate in the coming three years. I would say that from a business perspective, CloudStack and Eucalyptus will move faster than the others.

G. Additional Notes

Some friends would like me to add more information about China. Frankly speaking, I don’t have enough data to elaborate on this topic. Liming Liu recently posted a blog entry that serves as a very good reference. The blog entry can be accessed from this URL, but it is in Chinese.

Regarding the activeness of different open source projects, readers can refer to my recent blog post CY12-Q3 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack for more information. Regarding performance testing on public cloud service providers, readers can refer to my other blog post HP Cloud Services Performance Tests.

All the figures in this blog entry came from Google search. Many of the concepts mentioned in this blog entry came from Wikipedia, with modifications by the author.

 

Virtualization, Cloud Computing, Open Source and Beyond

By , October 12, 2012 9:43 am

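I took the opportunity of the National Day holiday to write this long article, which pulls together my personal views on every layer from virtualization to cloud computing. The main topics are virtualization, virtualization management, data center virtualization, cloud computing, public and private clouds, and open source. Everything in this article is the author’s personal opinion and does not represent the views of any company. Discussion is welcome.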

A. Virtualization

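Virtualization refers to the ability to simulate multiple virtual machines on a single physical machine. Logically, each virtual machine has its own processor, memory, disk, and network interface. Virtualization improves the utilization of hardware resources and allows multiple applications to run on the same physical machine, each in its own isolated runtime environment.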

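Virtualization exists at different levels, for example hardware-level virtualization and software-level virtualization. Hardware virtualization means obtaining an environment similar to a real computer by simulating hardware, one that can run a complete operating system. Within hardware virtualization there are several implementation approaches: full virtualization (simulating an almost complete set of real hardware, so that most operating systems can run in the environment without any modification), partial virtualization (simulating only key computing components or instruction sets, so that an operating system may need some modification to run), and paravirtualization (not simulating hardware devices; each virtual machine has its own runtime environment and shares the underlying hardware through a virtual machine manager, and most operating systems need modification to run). Software-level virtualization usually means providing multiple isolated virtual runtime environments on top of a single operating system instance, and is often called container technology.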

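At the hardware virtualization level, modern virtualization technologies are usually a hybrid of full virtualization and paravirtualization. Common technologies such as VMWare, Xen and KVM all support both. Virtual machines provided through hardware virtualization usually each run a complete operating system, so a single host carries a large number of identical or similar processes and memory pages, which causes noticeable performance overhead. At present, techniques such as KSM can identify and merge memory pages with identical content, but there is no effective means of optimizing the large number of identical or similar processes. Hardware virtualization is therefore often called heavyweight virtualization, and the number of virtual machines that can run concurrently on one host is quite limited. At the software virtualization level, all virtual environments on a host share a single operating system instance, so there is no overhead from running multiple operating system instances. Software virtualization is therefore often called lightweight virtualization, and the number of virtual runtime environments that can run concurrently on one host is far less constrained. Taking Containers on Solaris as an example, one Solaris instance can in theory support up to 8000 Containers (the number that can actually run depends on system resources and load). Similarly, LXC on Linux can easily support a considerable number of virtual runtime environments on a single host.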

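In the virtualization field, companies in China are more interested in hardware virtualization and mostly use it in both development and production environments. Taobao was among the first in China to study and apply software virtualization; their experience on the main Taobao site shows that replacing Xen with cgroup improves resource utilization. Whether to choose hardware or software virtualization in an actual application scenario should depend mainly on whether end users need full control over the operating system (for example, upgrading the kernel). If end users only need control over the runtime environment (for example, the various App Engine services at the PaaS layer), software virtualization may offer better value for money. Software virtualization is also a better choice for scenarios that provide horizontal scaling for a single application.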

For technical staff who need an in-depth understanding of virtualization, the VMWare white paper “Understanding Full Virtualization, Paravirtualization, and Hardware Assist” is a very good reference.

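Generally speaking, the number of users who can use virtualization technology directly is small. On Linux, for example, the users who can manage virtual machine life cycles are generally those with permission to access libvirt. In a company or other organization, these users are usually system administrators.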

B. Virtualization Management

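Early virtualization technology solved the problem of providing multiple independent runtime environments on one physical machine. When the number of physical machines to manage is small, a system administrator can log in to each one manually to manage virtual machine life cycles (resource allocation, start, shutdown, and so on). When the number is large, scripts or programs are needed to automate life cycle management. Software whose purpose is to manage and schedule large amounts of physical and virtual computing resources is called a virtualization management tool. Such a tool lets a system administrator perform the following tasks from a single place: (1) manage the life cycle of virtual machines on different physical machines; (2) query and even monitor all physical and virtual machines; (3) maintain a mapping between virtual machine names and virtual machine instances, making virtual machines easier to identify and manage. VirtManager on Linux is a simple virtualization management tool. In the VMWare product family, VMWare vSphere is a powerful virtualization management tool.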

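Virtualization management tools are a natural extension of virtualization technology. A simple virtualization management tool solves the problem of workload growing as the number of physical machines grows. At this level, virtualization management usually appears together with the concept of a cluster. A virtualization management tool often needs life cycle management privileges on each physical machine (for example, the user name and password of an account with access to libvirt). Within a cluster it may be convenient to set up one management user that is valid across the whole cluster. Virtualization management thus makes life easier for system administrators, but it does not delegate virtual machine life cycle management to other users.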

C. Data Center Virtualization

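At the data center level, system administrators face a large number of different types of hardware and applications. Compared with a small cluster, system complexity is much higher. Simple virtualization management tools can no longer meet administrators’ needs, so various data center virtualization management systems have grown out of them. On the hardware side, a data center virtualization management system reorganizes hardware resources by dividing them into resource pools (a resource pool is usually a cluster) and exposes computing resources to users as virtual infrastructure. On the software side, it introduces two different roles, system administrator and ordinary user, or even finer-grained role-based access control (RBAC) where the application scenario requires it. System administrators have management rights over all physical and virtual machines in the data center, but generally do not interfere with virtual machines that are running normally. Ordinary users can perform virtual machine life cycle operations only within the resource pools they have rights to, and have no control over physical machines. In the extreme case, an ordinary user can see only the resource pools assigned to them and knows nothing about the physical machines that make up those pools.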

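Before data center virtualization, creating a virtual machine had to be done by a system administrator. In a data center virtualization management system, role-based access control delegates virtual machine life cycle management to so-called “ordinary users”, which relieves administrators to some extent. For security reasons, however, not every employee can have such an “ordinary user” account; generally these accounts are given only to team leaders. Up to and including the data center virtualization level, then, virtual machine life cycles are still centrally managed.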

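A data center virtualization management system is a further extension of virtualization management tools; it addresses the complexity brought by growth in the scale of hardware and applications. Once specific physical devices are abstracted into resource pools, executives only need to know the size, load and health of each pool, and end users only need to know the size, load and health of the pools assigned to them. Only system administrators still need to know the configuration, load and faults of every physical device, but the resource pool concept also logically reorganizes and classifies all the devices, which makes the administrators’ work easier.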

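Modern data center virtualization management systems often provide many features that help automate operations. These include (1) rapid deployment of a series of identical or similar runtime environments from templates; (2) monitoring, reporting, alerting and accounting; and (3) high availability, dynamic load balancing, backup and recovery, and so on. Some relatively open systems even expose open APIs so that administrators can develop extensions for their own scenarios and processes.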

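In the VMWare product family, VMWare vCenter is data center virtualization management software. Other data center virtualization management software worth recommending includes Convirt, XenServer, Oracle VM and OpenQRM.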

D. Cloud Computing

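Cloud computing is a further layer of encapsulation on top of data center virtualization. Cloud management software likewise needs two (or more) roles, cloud administrator and ordinary user, with different permissions. Administrators have management rights over all physical and virtual machines in the data center, but generally do not interfere with virtual machines that are running normally. Ordinary users can manage virtual machine life cycles on a self-service basis through a browser, or automatically by writing programs that call web services.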

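At the cloud computing level, virtual machine life cycle management is fully delegated to genuinely ordinary users, while concepts such as resource pools and physical machines are hidden from them. Ordinary users can obtain computing resources without knowing anything about the physical resources behind them. On the surface, cloud computing appears to be simply the provision of computing resources in a manner compatible with Amazon EC2/S3. In essence, the model for managing computing resources has changed: end users can obtain and manage computing resources on their own, without help from a system administrator.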

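For the cloud administrator, delegating virtual machine life cycle management to end users does not reduce the workload. On the contrary, there are now bigger headaches. In a traditional IT architecture, each application usually has its own set of computing resources; applications are physically isolated and problems are relatively easy to diagnose. After moving to the cloud computing model, multiple applications may share one set of resources and compete for them, so diagnosis becomes harder. Cloud administrators therefore usually want cloud management software with reasonably complete data center virtualization management features. The features that matter most to them are (1) monitoring, reporting, alerting and accounting; (2) high availability, dynamic load balancing, backup and recovery; and (3) live migration, which can be used for local load adjustment and fault diagnosis.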

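Clearly, from virtualization to cloud computing, physical resources are encapsulated to an ever greater degree, and virtual machine life cycle management is progressively delegated.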

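In the VMWare product family, VMWare vCloud is cloud management software. Other cloud management software worth recommending includes OpenStack, OpenNebula, Eucalyptus and CloudStack. Although all four are cloud management software, their features differ considerably, and the differences stem from different design philosophies. OpenNebula and CloudStack were originally designed as data center virtualization management software and therefore have fairly complete features in that area. After the cloud computing concept took off, OpenNebula added OCCI and EC2 interfaces, and CloudStack provided an additional component called CloudBridge (included by default from CloudStack 4.0 onward), thereby achieving compatibility with Amazon EC2. Eucalyptus and OpenStack were designed top-down as cloud management software with Amazon EC2 as the prototype; they considered EC2 compatibility from the start (OpenStack also added its own extensions), but still lack some data center virtualization management features. Of the two, Eucalyptus started earlier and is clearly stronger than OpenStack in data center virtualization management.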

E. Private Clouds and Public Clouds

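The cloud computing described in section D is cloud computing only in the narrow sense, that is, cloud computing similar to Amazon EC2. In the broad sense, cloud computing can refer to any practice of accessing physical or virtual computers over a network and using their computing resources, including both the cloud computing of section D and the data center virtualization of section C. What the two have in common is that the provider supplies computing resources in the form of virtual machines, and users need not know the actual physical resources behind them. If a cloud platform serves only the inside of an organization, it can be called a “private cloud”; if it serves the public, it can be called a “public cloud”. Generally, a private cloud serves different departments (or applications) within an organization and emphasizes flexibility in scheduling virtual resources (for example, end users can specify a virtual machine’s processor, memory and disk configuration); a public cloud serves the public and emphasizes standardization of virtual resources (for example, the provider offers only a limited number of virtual machine types, each with a fixed processor, memory and disk configuration, and end users can only choose the type closest to their needs).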

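A public cloud provider’s business model resembles that of Amazon EC2, so public cloud providers should usually choose the cloud management software described in section D. A private cloud provider should choose software according to how computing resources are managed within the organization. If resources are managed centrally and virtual machine life cycle management is delegated only to department managers or team leaders, the data center virtualization management system described in section C is the right choice. If life cycle management is to be delegated to the end users who actually need computing resources, the cloud management software described in section D is the right choice.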

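Traditionally, a private cloud was thought of as built on an enterprise’s own data center and its own hardware. But since hardware vendors joined the ranks of cloud providers, the boundary between private and public clouds has become increasingly blurred. With Rackspace’s private cloud service, customers can use their own data center and hardware or rent Rackspace’s. Oracle recently went a step further and proposed a private cloud service that is “Owned by Oracle, Managed by Oracle”. Under this new business model, the customer’s dedicated private cloud is merely a resource pool within the provider’s public cloud, relatively isolated from other customers (you got private cloud in my public cloud). And for the provider, the infrastructure used to deliver public cloud services may be merely one resource pool within its own infrastructure (private cloud), or even a resource pool within a hardware vendor’s own infrastructure (private cloud) (you got public cloud in my private cloud).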

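For the customer, a private cloud service based on the provider’s data center and hardware makes financial sense. It turns the capital expenditure (CapEx) of building a data center and buying hardware into operating expenditure (OpEx) paid in installments, and the precious cash can serve as working capital for expanding the business. Even if the total cost of such a private cloud is higher in the long run than building a data center and buying hardware, the return from using the extra cash for business expansion may exceed the cost difference between the two options. In the extreme case, even if the business ultimately fails, there is no need to mourn a pile of newly purchased hardware. Unless the real estate market picks up sharply in a short time, a company on the verge of closing usually does not regret not having built its own data center. (It should be pointed out that for a company that operates over a long period, it is entirely possible to profit from real estate. Before Sun was acquired by Oracle, it once turned a loss into a profit on its financial statements by selling off property.)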

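So what role do hardware vendors play in this game? When the user’s capital expenditure (CapEx) becomes operating expenditure (OpEx) paid in installments, doesn’t the hardware vendor have to wait longer to collect payment?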

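In 1865, the British economist William Jevons (1835-1882) wrote a book titled The Coal Question. Jevons described a seemingly paradoxical phenomenon: improvements in steam engine efficiency raised the energy conversion rate of coal; the higher conversion rate lowered the price of energy; and the lower price in turn increased coal consumption. This phenomenon is known as the Jevons paradox. Its core idea is that higher resource efficiency leads to lower prices and ultimately to greater use of the resource. Over the past 150 years, the Jevons paradox has been borne out in many fields, including major industrial raw materials, transportation, energy and the food industry.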

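The core value of public cloud services is turning servers, storage, networking and other hardware from self-purchased fixed assets into a public resource billed by usage. Virtualization raises the utilization of computing resources, which lowers their price and will ultimately increase their use. Once this logic is understood, it is clear why HP decisively joined the OpenStack camp and was first to launch an OpenStack-based public cloud service while OpenStack was still immature. Cloud computing may not save a tottering HP, but without cloud computing HP’s days would probably be numbered. Likewise, it is clear why Oracle went from scoffing at cloud computing to practicing it. After acquiring Sun, Oracle became a world-leading hardware provider overnight. The cloud computing concept was just emerging then, and Oracle’s dismissive attitude showed that it had not yet adjusted to its changed position. Now that cloud computing has moved from hype to real-world practice, if Oracle as a major hardware vendor did not intend to take a share of it, that would be a truly slow reflex.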

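According to the Jevons paradox, for users a lower price is the precondition for greater usage. So how should cloud computing resources be priced?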

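At present, most public cloud providers price their virtual machine products by configuration. Take Amazon EC2: the configuration of its Medium virtual machine (3.75 GB memory, 2 ECU, 410 GB storage, USD 0.16 per hour) is about twice that of the Small virtual machine (1.7 GB memory, 1 ECU, 160 GB storage, USD 0.08 per hour), and its price is also twice that of the Small. The recently launched HP Cloud Services, and Shanda Cloud (盛大云) and Aliyun (阿里云) in China, essentially copy Amazon EC2’s pricing approach. The problem is that when the configuration goes up, performance does not go up proportionally. A series of performance tests on Amazon EC2, HP Cloud Services, Shanda Cloud and Aliyun show that for many types of application, price-performance actually declines as the configuration increases. Such a pricing strategy clearly does not encourage users to consume more computing resources.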
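
The price-performance argument can be made concrete with a small calculation. The sketch below uses the two Amazon EC2 hourly prices quoted above (USD 0.08 for Small, USD 0.16 for Medium); the benchmark scores are hypothetical and only stand in for “Medium is faster than Small, but less than twice as fast”.

```python
def performance_per_dollar(benchmark_score, hourly_price_usd):
    """Benchmark score obtained per US dollar of hourly price."""
    return benchmark_score / hourly_price_usd


# Hourly prices quoted in the text for Amazon EC2.
SMALL_PRICE = 0.08
MEDIUM_PRICE = 0.16

# Hypothetical benchmark scores: Medium is assumed to be 1.6 times as fast
# as Small, i.e. faster, but not twice as fast.
small_score = 100.0
medium_score = 160.0

small_value = performance_per_dollar(small_score, SMALL_PRICE)
medium_value = performance_per_dollar(medium_score, MEDIUM_PRICE)

print(f"Small:  {small_value:.0f} points per dollar")
print(f"Medium: {medium_value:.0f} points per dollar")
```

Under these assumed scores the Small instance delivers 1250 points per dollar and the Medium only 1000. Whenever the price doubles and the measured performance grows by less than a factor of two, the larger instance is the worse deal per dollar.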

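Pricing by performance may be a more appropriate approach. For example, a brand of soap comes in a small pack with one bar and a large pack with two bars. Customers are willing to pay double for the large pack because it washes twice as many clothes, not because it looks bigger. By the same token, different virtual machine products from the same public cloud provider should keep their price-performance at the same level as far as possible. The difficulty is that different types of application have very different demands for processor, memory and storage, and their “performance versus configuration” curves differ as well. The public cloud field therefore needs a framework for comprehensive evaluation of virtual machine performance, whose result expresses a virtual machine’s overall processing capacity rather than any one of processor, memory or storage. With such a framework, products can be compared not only within one provider but also across providers.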

F. Open Source

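In recent years we have observed a pattern in information technology. When a closed source solution succeeds in the market, one or more open source or closed source followers offering similar functionality (or services) quickly appear. (Cases where open source software comes first and closed source software then competes with it are rare.) In operating systems, Linux gradually matched and even surpassed the technical level of Unix and then displaced it in the market. In virtualization, Xen and KVM closely followed VMWare’s technical development, made breakthroughs of their own, and are gradually eroding VMWare’s market share. In cloud computing, Enomaly was first to offer a closed source solution modeled on Amazon EC2, soon followed by open source solutions represented by Eucalyptus and OpenStack. At the same time, the attitude of traditionally closed source vendors toward open source projects and communities is changing. For example, Microsoft, hostile to open source projects for many years, set up a subsidiary called Microsoft Open Technologies in April this year to advance its investment in openness, including interoperability, open standards and open source software.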

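Today’s business environment is quite different from the 1980s, when the Free Software Movement had just begun. Since NetScape first put forward the term Open Source in 1998, open source has become a new model for developing, promoting and selling software, rather than an alternative opposed to commercial software. Compared with the traditional closed source business model, the open source business model has the following characteristics: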

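(1) In the seeding phase, keywords such as open source software or free software are used to attract potential customers and partners. For potential customers, choosing open source software means getting (some of) the functionality of closed source software for free or at low cost. For partners, the interest may lie in selling an enhanced version based on the open source software (such as an enterprise edition), providing solutions based on it, or the boost its success may give to sales of their own products.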

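(2) In the growth phase, the main developers come from the company that started the project and from its corporate partners. There are individual developers who contribute code purely out of interest, but they are relatively few. Promotional material for open source software often contains phrases such as “developed by the XYZ community”. Over the past 10 years, the main development force in the various “communities” has always come from a very small number of corporate partners. Yet some open source projects, intentionally or not, play down the importance of corporate partners in their publicity, and even mislead audiences into thinking that the community consists mainly of individual developers.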

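(3) In the harvest phase, the project founder and main partners can obtain financial returns by selling enhanced versions or providing solutions. Other vendors can offer similar products or services, but the main participants in an open source project usually carry more weight and authority in the market. On the question of making money from open source, Marten Mickos (CEO of Eucalyptus) pointed out while he was CEO of MySQL that to succeed with open source software, you need to serve (A) those who are willing to spend time to save money, and (B) those who are willing to spend money to save time. If a company has succeeded with open source, its returns from sales and services around the open source software should at least exceed its investment in development and promotion. Clearly, some users can use open source software for free partly because their participation lowers the cost of developing and promoting it, and partly because paying users pay more for it.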

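Why, then, are solutions based on open source software usually cheaper than their closed source competitors? Generally, closed source software, as the pioneer in a field, faces great challenges in market research, product design, development and testing, and marketing and sales. Open source software, as the follower, has the closed source product as a success story for market research and as a reference template for product design, and benefits from the market the closed source product has opened up. On the development side, open source software appears somewhat later than the closed source product, and technical progress in the intervening period markedly lowers the barrier to entry. In addition, open source software may surpass closed source software in certain features, but overall its completeness, usability, stability and reliability are somewhat behind. Solutions based on open source software therefore usually adopt a marketing approach of “80% of the functionality of the closed source software at 30% of its price”. Beyond that, the customizability of open source based solutions is particularly attractive to some customers.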

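In the Chinese business environment, IT companies (or Internet companies) are usually willing to spend time to save money, while non-IT companies (or traditional industries) are usually willing to spend money to save time. It should be noted that non-IT companies in China often do not care whether software is open source, but care a great deal about the customizability of open source software.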

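Open source as a new business model does not stand on higher moral ground than the traditional closed source model. By the same token, it is not appropriate to pass moral judgment on different open source practices. In the seeding phase of the OpenStack project, Rackspace’s publicity claimed that OpenStack was “the world’s only truly open source IaaS system”. Open source projects with similar functionality, such as CloudStack, Eucalyptus and OpenNebula, were classified by Rackspace as “not truly open source projects” because they kept a partly closed source enterprise edition (before April 2012, both CloudStack and Eucalyptus released a fully open source community edition and a partly closed source enterprise edition; after April 2012, Eucalyptus announced that it was going fully open source, and CloudStack became fully open source after being acquired by Citrix and donated to the Apache Foundation) or an automated installation package offered only to paying customers (OpenNebula Pro is an automated installation package with enhanced features, but all of its components are open source). Such publicity continued for nearly two years, until Rackspace launched the OpenStack-based Rackspace Private Cloud software, an automated package similar in nature to OpenNebula Pro. OpenNebula Pro is provided only to paying users, whereas anyone can download and use Rackspace Private Cloud software for free. The catch is that when the number of managed nodes exceeds 20 servers, the user has to turn to Rackspace for help (that is, buy the necessary technical support). Let us set aside for now the question of whether the code that limits the node count to 20 servers is open source. That the founder and main contributor of an open source project adds a feature to its repackaged distribution that restricts the software’s use is hard to explain on moral grounds, but perfectly normal on business grounds. Over the past two years, the measures the OpenStack project has taken in development, promotion and community building are a classic example of the open source business model.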

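As mentioned earlier, a given field often has several competing open source projects. For cloud computing in the broad sense, besides the familiar CloudStack, Eucalyptus, OpenNebula and OpenStack, there are many other options such as Convirt, XenServer, Oracle VM and OpenQRM. For a particular application scenario, how should one choose among so many open source options? In my experience, the selection process can be divided into three phases: requirement analysis, technical analysis and business analysis.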

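(1) In the requirement analysis phase, dig into the real purpose of adopting cloud computing for the specific application scenario. In China, many decision makers’ understanding of cloud computing stops at “improve resource utilization, reduce operating cost, provide more convenience”, without realizing that this list is already basic functionality offered by most open source software. In addition, many decision makers by default take the full feature set of VMWare vCenter as the requirement for open source software, without considering whether the project needs those features. It is therefore essential to investigate the specific scenario, classify it clearly as either data center virtualization or cloud computing in the narrow sense, and further explore the project’s specific functional requirements. In many cases both can satisfy the customer’s overall needs, and the salesperson’s job is then to steer the customer’s specific requirements in a direction favorable to their own offering. We call this technique expectation management. Requirement analysis, by settling the classification of the scenario, filters out some of the options.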

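(2) In the technical analysis phase, first compare the reference architectures of the open source options, focusing on the difficulties of implementing each reference architecture in the specific scenario. Next, compare the options feature by feature, treating must-have features and good-to-have features differently. Beyond that, one can evaluate ease of installation and configuration, usability of specific features, completeness of documentation, and scope for custom development. Technical analysis yields a scored ranking of the options, on the basis of which the lowest-scoring option can be eliminated.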

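(3) In the business analysis phase, it must be established whether the decision maker is willing to pay for an open source solution. If not, the project is a “spend time to save money” scenario; otherwise it is a “spend money to save time” scenario. In a spend-time-to-save-money scenario, technical support comes mainly from the open source community, so the project’s community activeness is an important reference. In a spend-money-to-save-time scenario, technical support comes mainly from service providers, so the focus should be on the provider’s standing in the industry and its local service capability, and community activeness matters little.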

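In China’s cloud computing market (in the narrow sense), for customers willing to pay, CloudStack and Eucalyptus are the options to consider first. Both projects started relatively early, have better stability and reliability, have considerable influence in the industry, and have teams in China that can provide support and services. Meanwhile, some startup teams in China have begun to offer OpenStack-based solutions, but they can hardly accumulate the necessary hands-on experience in a short time, and the experienced Sina SAE team has not yet opened a business providing technical support to outside parties. Some organizations in China also use OpenNebula, but they are unlikely to be able to provide technical services to third parties in the near term. For customers willing to spend time, CloudStack and OpenStack have a clear advantage, because their communities are more active. Of the two, CloudStack has richer functionality and more enterprise customers and success stories, and may be the better choice in the short term. In the long run, OpenStack-based solutions will become more and more popular, but other solutions are also making steady technical and market progress, so it is unlikely that one will dominate within the next three years. From a purely business standpoint, CloudStack and Eucalyptus may have a somewhat greater chance of success.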

G. Additional Notes

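Some friends asked me to add something on the current state of cloud computing in China. Frankly, I do not yet have enough data, so I will not expand on it here. Liming Liu (Sina Weibo @刘黎明3000) recently published an article titled 《点评阿里云盛大云代表的云计算IaaS产业》 (a review of the cloud IaaS industry as represented by Aliyun and Shanda Cloud), which is worth consulting.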

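For a comparison of community activeness across open source projects, see my recent blog post “CY12-Q3 Community Activeness Comparison: OpenStack, OpenNebula, Eucalyptus, CloudStack”. In addition, in the post “HP Cloud Services Performance Tests” I put forward a preliminary method for evaluating the performance of public clouds.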

All figures in this article came from Google search. In addition, some of the conceptual content was adapted from the relevant Wikipedia entries.

CY12-Q3 Community Activeness Comparison: OpenStack, OpenNebula, Eucalyptus, CloudStack

By , October 2, 2012 11:16 am

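This article supplements and updates the earlier post “CY12-Q2 Community Activeness Comparison: OpenStack, OpenNebula, Eucalyptus, CloudStack”. Readers interested in its content can contact me by email or on Sina Weibo (@qyjohn_).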

An English version of this article is published at the same time; see the post CY12-Q3 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack.

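The purpose of this article is to analyze and compare the community activeness of the OpenStack, OpenNebula, Eucalyptus and CloudStack projects using raw data from their forums and mailing lists. The main raw data are the monthly numbers of discussion topics, posts, and participants (email addresses or user accounts) on the four projects’ official forums and mailing lists since 2009. To obtain the data, I wrote a Java program that automatically downloads all forum and mailing list content from the four projects’ websites and extracts the data I need. The extracted data are loaded into a MySQL database for statistical analysis, and the charts are generated with LibreOffice.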

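In the CY12-Q3 analysis we added the long-neglected data sources https://answers.launchpad.net/openstack and http://lists.openstack.org/pipermail/*/. These two sources contain a large volume of data and have a considerable effect on the results.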

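In addition, after we published the CY12-Q2 report, some readers pointed out that the data from the incubator-cloudstack-dev mailing list might be problematic, since the list contains messages generated automatically by JIRA. In the CY12-Q3 analysis we set up a filter that automatically excludes every message whose subject contains the identifier “[jira]”.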

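Figures 1 and 2 show the monthly number of discussion topics and posts for the four projects. It can be seen that:

(1) the volume of discussion about OpenStack and CloudStack is far greater than that about Eucalyptus and OpenNebula;

(2) over the past three months, the volumes of discussion about Eucalyptus and OpenNebula have been at the same level, with only very slight differences.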

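Generally speaking, the more replies a topic receives, the deeper the discussion of that topic. A forum or mailing list that has only opening posts and no replies has a very low level of participation. The average ratio of posts to topics therefore reflects how engaged a community is; here we call it the Participation Ratio.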

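For a long time the OpenStack project’s participation ratio was far higher than that of the other three. Over the past six months, however, CloudStack’s participation ratio has been growing steadily. At present CloudStack and OpenStack have the highest participation ratios, close to 4; OpenNebula and Eucalyptus come next, close to 3.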

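Figure 4 shows the total number of people participating in forum or mailing list discussions each month for the four projects. CloudStack and OpenStack have far more active users than OpenNebula and Eucalyptus. Over the past three months, however, the number of active users of both CloudStack and OpenStack has declined somewhat.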

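It is worth noting that although CloudStack has slightly fewer active users than OpenStack, the two projects have roughly the same number of topics and posts. In other words, CloudStack users are more active in the community than OpenStack users.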

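Accumulated community population (community population for short) is the total number of users and developers who have ever taken part in discussions through a forum or mailing list. (It excludes community members who registered for a forum or mailing list but never took part in public discussion.) These people have used the product to some degree, but they are not necessarily still active users.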

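Figure 5 shows the community population growth of the four projects. Eucalyptus still leads, but OpenStack is catching up fast. The OpenStack community population can be expected to overtake Eucalyptus in CY12-Q4. In the CY12-Q2 report we predicted that CloudStack’s community population would soon exceed OpenNebula’s. In fact it took CloudStack only one month to do so.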

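Comparing the CY12-Q3 and CY12-Q2 reports shows that OpenStack’s population growth curve has changed considerably. This is because we added https://answers.launchpad.net/openstack and http://lists.openstack.org/pipermail/*/ as data sources. Both contain a large volume of data and therefore have a considerable effect on the results. Note that although launchpad answers and the OpenStack mailing lists share one user database, the user names they display differ. A substantial share of OpenStack users may therefore have been counted twice. We took some preliminary steps to remove duplicates when preparing the data, but there is still considerable room for improvement. A rough estimate is that the real population of the OpenStack community is about 85% of the figure shown in the chart above.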

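The community population data for CloudStack and Eucalyptus may also contain some double counting. We checked the raw data and did find some instances. For both projects, however, the level of double counting is low and has no noticeable effect on the results.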

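Figure 6 shows the number of new community members added each month for the four projects. Over the past three months, the CloudStack and OpenStack community populations grew at roughly the same rate.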

Compared with CloudStack and OpenStack, the Eucalyptus and OpenNebula community populations grew more slowly.

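Figure 7 combines Figure 4 and Figure 6. The solid lines show the number of people participating in forum or mailing list discussions each month, and the dashed lines show the number of people newly joining each month.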

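For OpenStack and OpenNebula, new members make up about 30% of the month’s active users; for CloudStack and Eucalyptus the figure is about 50%. Leaving aside the size of the community population, OpenStack and OpenNebula can be considered “stickier” than CloudStack and Eucalyptus.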

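For each project, community population growth moves broadly in step with the number of active users in the month. That is, there is some degree of positive correlation between population growth and community activeness. This also suggests that both may be event-driven: a new release, a technical conference or a marketing campaign may each boost population growth and community activeness.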

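Figure 8 shows, for each of the four projects, the community population, the number of active users in the past quarter, and the number of active users in the past month. It can be seen that:

(1) Eucalyptus has the largest community population, followed by OpenStack, CloudStack and OpenNebula;

(2) over the past quarter, OpenStack had the largest active population, followed by CloudStack, Eucalyptus and OpenNebula;

(3) over the past month, OpenStack had the largest active population, followed by CloudStack, Eucalyptus and OpenNebula.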

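From time to time I hear something like: “You talk way too much. Can’t you just tell me which project in this field is the most active?” I agree that this is a very important question, and I suspect many others want to ask it too. I guess they have not asked me because they know that I do not know the answer.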

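I have long been looking for a single parameter to express the relative activeness of a community. It should be some combination of the following parameters:

(1) total posts in the month, which represents the scale of discussion;

(2) participation ratio in the month, which represents the number of replies each question receives;

(3) active users in the quarter, which represents the likelihood of getting help from the community (long term);

(4) active users in the month, which represents the likelihood of getting help from the community (short term).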

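In this analysis we take the average across the communities as the reference data and compare each community’s data with it. We call the sum, over all parameters, of the ratio of a community’s value to the average its “community activeness index”. The project with the highest index can be considered the most active.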

As Figure 9 shows, the OpenStack project currently has the highest community activeness index (leading by a clear margin), followed by CloudStack, Eucalyptus and OpenNebula.

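The concept of a community activeness index described above is still very primitive and leaves a lot of room for improvement. Its merit is that it replaces traditional impressionistic statements such as “I think”, “I believe” and “I guess” with quantitative analysis. In future reports we will publish a quarterly community activeness index ranking for OpenStack, OpenNebula, Eucalyptus and CloudStack based on this concept. We will also adjust the algorithm when necessary, for example by adding or removing parameters or changing the weights of some parameters.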

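For many cloud computing practitioners, the rapid rise of the CloudStack project over the past six months came as a surprise. We therefore interviewed Sheng Liang, CTO of Cloud Platforms at Citrix, by email. Sheng Liang explained CloudStack’s success on the community side as follows: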

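“Apache CloudStack has flourished under the Apache Software Foundation which kept us from having to waste efforts coming up with a new open source governance model. Developers have responded well to the Apache Way with contributions flowing in from our rapidly growing community of over 35,000 individuals. We also are pleased with the organic way technology providers and open source projects are integrating their software with Apache CloudStack. Leadership of the project has also shifted from Citrix to a number of other individual committers who have been driving an aggressive development schedule. The upcoming 4.0 release is very exciting as it’s the first major release under Apache including code from numerous production users of CloudStack who developed features based on their experience running live cloud computing environments. Anecdotally we are seeing CloudStack deployments popping up everywhere from financial institutions and gaming companies to universities (we understand a CloudStack cluster even helped crunch research data for the Higgs Boson discovery). I am sure the excitement around CloudStack will continue given the incredible progress in under six short months.”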

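For end users, competition among vendors means more choice and better functionality. The cloud computing market is still developing and is far from mature. Competition in this field can be expected to become even fiercer.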

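A PDF version of the slides related to this article can be downloaded from here. If you redistribute this content, please keep the author information.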

CY12-Q3 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack

By , October 2, 2012 10:59 am

This article is an updated version of my previous article CY12-Q2 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack. Readers who are interested in further discussion are welcome to contact me via email at the above-mentioned address.

A Chinese version of this article is published at the same time, which can be found at CY12-Q3 OpenStack, OpenNebula, Eucalyptus, CloudStack社区活跃度比较.

The objective of this article is to compare the OpenStack, OpenNebula, Eucalyptus and CloudStack user and developer communities, based on the communications between community members in the form of mailing lists or public forum discussions. The data being discussed include the total number of topics (threads), messages (posts), and participants (unique email addresses or registered members). To obtain this data, a Java program was written to retrieve all the forum posts and mailing list messages into a MySQL database for further processing. The analysis results are presented in the form of graphs generated by LibreOffice.

In CY12-Q3, we are adding the long-neglected https://answers.launchpad.net/openstack and http://lists.openstack.org/pipermail/*/ to the analysis. It turns out that these two sources contain a huge amount of data, which has a significant impact on the analysis results.

Also, when the CY12-Q2 report was published, some people questioned the inclusion of the incubator-cloudstack-dev mailing list. This particular mailing list contains a lot of messages that are automatically generated by JIRA. In CY12-Q3, we set up a filter to reject all messages with the identifier “[jira]” in the subject.
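
A filter of this kind can be sketched in Python as follows. The original analysis was done with a Java program that is not shown in this article, so this is only an illustration: the record layout (a dict with a "subject" key) is hypothetical, and matching the tag case-insensitively is an assumption.

```python
JIRA_TAG = "[jira]"


def drop_jira_messages(messages):
    """Return only the messages whose subject does not contain "[jira]".

    messages: iterable of dicts with a "subject" key (a hypothetical record
    layout). Matching is case-insensitive, which is also an assumption.
    """
    return [m for m in messages if JIRA_TAG not in m["subject"].lower()]


# Made-up sample subjects for illustration only.
sample = [
    {"subject": "[jira] Created: (EXAMPLE-1) made-up issue"},
    {"subject": "Re: how do I add a second zone?"},
]
print(drop_jira_messages(sample))
```

Only the second sample message survives the filter. Applying the filter before counting topics, posts and participants keeps automatically generated traffic out of all three numbers.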

Figures 1 and 2 show the monthly number of topics (threads) and posts (messages). It can be seen that

(1) the volume of OpenStack and CloudStack related discussions is much higher than that of Eucalyptus and OpenNebula; and

(2) the Eucalyptus and OpenNebula clubs are exhibiting similar behaviors, with only minor differences.

Generally speaking, the number of replies to a specific topic represents the attention it receives and the depth of discussion on that topic. When the number of master posts (the original posts that start a topic) exceeds the number of replies, it is safe to conclude that participation in the forum or mailing list is very low. Therefore, the ratio between “the number of posts” and “the number of topics” represents the participation rate of an online community. In this study we call this ratio the Participation Ratio.
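
As a minimal sketch, the Participation Ratio is a single division. The function name and the handling of a month with no topics are assumptions of this example, and the numbers in the usage line are made up.

```python
def participation_ratio(posts, topics):
    """Average number of posts per topic in a given month."""
    if topics == 0:
        return 0.0  # no topics at all: treat participation as zero
    return posts / topics


# Made-up numbers: 400 posts spread over 100 topics in one month.
print(participation_ratio(posts=400, topics=100))
```

With these made-up numbers the ratio is 4.0, meaning each topic drew four posts on average.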

In the past the OpenStack project had a much higher participation ratio than the others. However, the participation ratio of CloudStack is climbing steadily. Currently CloudStack and OpenStack have the best participation ratios, close to 4. OpenNebula and Eucalyptus have similar participation ratios, close to 3.

Figure 4 shows the number of monthly participants of the four projects being discussed. It can be seen that CloudStack and OpenStack have many more active participants than OpenNebula and Eucalyptus. However, during the past 3 months, the number of participants for both CloudStack and OpenStack has decreased slightly.

It should be noted that although CloudStack has somewhat fewer active participants than OpenStack, the volume of discussion (in terms of monthly number of threads and messages) of the two projects is on the same level. This indicates that the active members in the CloudStack club are talking more than those in the OpenStack club (on average).

Accumulated Community Population refers to the total number of users and developers who have participated in forum or mailing list discussions. (This number does not include those who have registered for discussion forums or mailing lists but have never participated in any open discussions.) These are people who have tested or used a specific product for a while, but they are not necessarily active users at present.
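
The relationship between monthly participants, new members and the accumulated population can be sketched with Python sets. This is an illustration only: the input layout (a chronological list of months, each with the set of participant identifiers seen that month) is an assumption, and the identifiers in the example are made up.

```python
def population_growth(monthly_participants):
    """Compute per-month participants, new members and accumulated population.

    monthly_participants: list of (month, set_of_participant_ids) tuples in
    chronological order (a hypothetical layout for this sketch).
    Returns a list of (month, participants, new_members, accumulated) tuples.
    """
    seen = set()
    rows = []
    for month, ids in monthly_participants:
        new_members = ids - seen  # first appearance in any discussion
        seen |= new_members       # the accumulated population never shrinks
        rows.append((month, len(ids), len(new_members), len(seen)))
    return rows


# Made-up identifiers for illustration only.
history = [
    ("2012-07", {"alice", "bob"}),
    ("2012-08", {"bob", "carol"}),
    ("2012-09", {"carol"}),
]
for row in population_growth(history):
    print(row)
```

Because a participant is identified only by the displayed name or address, the same person appearing under two identifiers is counted twice, which is the double-counting problem discussed for the OpenStack data sources.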

Figure 5 shows the accumulated community populations of the four projects being discussed. The Eucalyptus project still has the biggest population, but OpenStack is quickly catching up. The OpenStack population is expected to exceed that of Eucalyptus in CY12-Q4. In our CY12-Q2 report we predicted that the CloudStack population would exceed the OpenNebula population within a very short period. It took CloudStack only a month to accomplish that!

If you compare the CY12-Q3 report with the CY12-Q2 report, you will find that the population curve for OpenStack has changed a lot. This is due to the inclusion of the https://answers.launchpad.net/openstack and http://lists.openstack.org/pipermail/*/ data sources. It should be noted that launchpad answers and the mailing list share the same registration database, but display different names for the same person. Therefore, it is very possible that a large number of users were counted twice in the OpenStack population. We have carried out some basic de-duplication to eliminate obvious duplicates, but there is still a lot of room for improvement. A rough estimate is that the real OpenStack population is about 85% of the numbers shown in this analysis.

There might be a certain level of duplication in the community populations of CloudStack and Eucalyptus. We did look into the data and found some duplicates. However, the level of duplication is so small for both projects that it has little impact on the analysis results.

Figure 6 shows the monthly population growth of the four projects being discussed. During the past 3 months, the populations of OpenStack and CloudStack have been growing at about the same pace.

The populations of Eucalyptus and OpenNebula are growing at relatively slow paces, as compared to that of CloudStack and OpenStack.

Figure 7 is a combination of Figure 4 and Figure 6. The solid lines represent the monthly participants, while the dashed lines represent the monthly new members.

For OpenStack and OpenNebula, around 30% of their monthly participants are new members. For CloudStack and Eucalyptus, around 50% of their monthly participants are new members. Leaving aside the size of each community, this indicates that the OpenStack and OpenNebula communities are more “sticky” than the CloudStack and Eucalyptus communities.

For each of the projects being discussed, the monthly population growth is somewhat “synchronous” with its monthly participants. That is to say, the population growth of a community is somewhat related to the “activeness” of the community. This also suggests that both the population growth and the “activeness” of a community might be event-driven. A new software release, a technical conference, or a marketing event might be the cause of growth in the population and “activeness” of the respective community.

Figure 8 shows the total community population, active participants of the past quarter, and active participants of the past month, of the four projects being discussed. It can be seen that

(1) Eucalyptus has the largest total population, followed by OpenStack, CloudStack, and OpenNebula;

(2) OpenStack has the largest active population during the past quarter, followed by CloudStack, Eucalyptus, and OpenNebula;

(3) OpenStack has the largest active population during the past month, followed by CloudStack, Eucalyptus, and OpenNebula.

Occasionally I come across people saying “Hey, you are talking too much! Why don’t you tell me which one is THE most active project in this area?” I agree that this is an important question, and I guess there are many more who do not ask simply because they know that I don’t know the answer.

For quite some time I have been looking for a magic number to indicate the “relative activeness” of a community as compared to the alternatives. This magic number should be a combination of the following parameters:

(1) monthly messages, which represents the volume of the discussions;

(2) participation ratio, which represents the average number of answers to a question;

(3) active population of the past quarter, which represents the possibility to get help from the community in the long term; and

(4) active population of the past month, which represents the possibility to get help from the community in the short term.

In this analysis, we take the average value of each parameter across the four communities as the reference data set, and compare the corresponding parameter of each community with it. For each community we divide each parameter by its reference value, and we call the sum of these ratios the “community activeness index” of the community. Now we can say that the project with the highest “community activeness index” is THE most active project in this area.
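
The index can be sketched in Python as below. This is a reading of the description above, not the original program: equal weights for all parameters are assumed, the parameter and project names are placeholders, and the numbers in the example are made up.

```python
def activeness_index(metrics):
    """Sum, per project, of each parameter divided by the cross-project average.

    metrics: {project: {parameter: value}}; every project must provide the
    same parameters, and each average must be non-zero.
    """
    params = list(next(iter(metrics.values())))
    averages = {
        p: sum(m[p] for m in metrics.values()) / len(metrics) for p in params
    }
    return {
        name: sum(m[p] / averages[p] for p in params)
        for name, m in metrics.items()
    }


# Made-up numbers for two placeholder projects, not data from this report.
example_metrics = {
    "project_a": {"monthly_messages": 300, "participation_ratio": 4.0},
    "project_b": {"monthly_messages": 100, "participation_ratio": 2.0},
}
print(activeness_index(example_metrics))
```

A project that sits exactly at the average on every parameter scores one point per parameter, so with the four parameters listed above an “average” community would have an index of 4. Because each parameter is divided by its own average, parameters measured on very different scales (thousands of messages versus a ratio near 4) contribute comparably.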

As can be seen from Figure 9, OpenStack is currently THE most active project (with obvious advantage), followed by CloudStack, Eucalyptus, and OpenNebula.

The above-mentioned concept of “community activeness index” is still very primitive, with a lot of room for improvement. However, it is an attempt to replace the old-fashioned “I think”, “I believe” and “I guess” practices with quantitative analysis. In our future community analysis, we will continue to use this concept to provide a quarterly ranking for OpenStack, OpenNebula, Eucalyptus, and CloudStack. Improvements to the algorithm (such as adding or removing parameters, or changing the weights of different parameters) will be made when necessary.

For many cloud computing professionals, the dramatic growth achieved by the CloudStack project during the past 6 months was quite unexpected. Therefore we conducted an email interview with Sheng Liang, the CTO of Cloud Platforms at Citrix. Below is Sheng’s explanation for CloudStack’s success in building a highly active open source community:

“Apache CloudStack has flourished under the Apache Software Foundation which kept us from having to waste efforts coming up with a new open source governance model. Developers have responded well to the Apache Way with contributions flowing in from our rapidly growing community of over 35,000 individuals. We also are pleased with the organic way technology providers and open source projects are integrating their software with Apache CloudStack. Leadership of the project has also shifted from Citrix to a number of other individual committers who have been driving an aggressive development schedule. The upcoming 4.0 release is very exciting as it’s the first major release under Apache including code from numerous  production users of CloudStack who developed features based on their experience running live cloud computing environments. Anecdotally we are seeing CloudStack deployments popping up everywhere from financial institutions and gaming companies to universities (we understand a CloudStack cluster even helped crunch research data for the Higgs Boson discovery).  I am sure the excitement around CloudStack will continue given the incredible progress in under six short months.”

From an end-user’s perspective, it is good to see the competition heating up because that means more choices with better quality. Cloud computing is still an evolving market that is highly immature, and we expect more competition to come in the future.

For your convenience, a PDF version of this presentation can be downloaded from here. Please kindly keep the author information if you want to redistribute the content.
