CY14-Q1 Community Activity Comparison: OpenStack, OpenNebula, Eucalyptus, CloudStack

By , April 14, 2014 4:19 pm

01

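This article supplements and updates the earlier article "CY13-Q4 Community Activity Comparison: OpenStack, OpenNebula, Eucalyptus, CloudStack". Readers interested in its content can contact me by email or on Sina Weibo (@qyjohn_).

An English version of this article is published at the same time; see the post "CY14-Q1 Community Analysis: OpenStack vs OpenNebula vs Eucalyptus vs CloudStack".

During this quarter, Eucalyptus shut down its user-facing online forum (Engage) and directed all user discussions to its Google group. This change had a significant impact on its user discussion traffic.

This community activity comparison project started in CY11-Q4, and this is the tenth quarterly report published so far. The views expressed in this article are entirely the author's own and not those of any current or previous employer of the author.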

02

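The purpose of this article is to analyze and compare the community activity of the OpenStack, OpenNebula, Eucalyptus and CloudStack projects using raw data from their forums and mailing lists. The main raw data are the monthly numbers of discussion topics, posts, and participants (email addresses or user accounts) in the official forums and mailing lists of the four projects since 2009. To obtain these data, I wrote a Java program that automatically downloads all forum and mailing list messages from the websites of the four projects and extracts the data I need. The extracted data are imported into a MySQL database for statistical analysis, and the charts are generated with LibreOffice.

Over the past few years, some of the early forums and mailing lists have been retired, and their data can no longer be accessed. Fortunately, the MySQL database created when this project started, together with the quarterly reports published so far, allows us to carry out a complete analysis of each project.

For projects with multiple membership systems (such as a forum and a mailing list), we took extensive measures to eliminate double counting of members (the same person being counted as two or more different members).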

03

01

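Figures 1 and 2 show the monthly numbers of discussion topics and posts for the four projects. It can be seen that:

(1) During the past 12 months, OpenStack-related and CloudStack-related discussions were at about the same level, and Eucalyptus-related and OpenNebula-related discussions were at about the same level;

(2) During the past 12 months, the volume of OpenStack-related and CloudStack-related discussions was much higher than that of Eucalyptus-related and OpenNebula-related discussions.

01

Generally speaking, the more replies a topic receives, the deeper the discussion of that topic. If a forum or mailing list has only master posts and no replies, participation in that community is very low. The average ratio of "number of posts / number of topics" therefore reflects the level of participation in a community, and for now we call it the Participation Ratio.

As Figure 3 shows, during the past 12 months the participation ratios of CloudStack and Eucalyptus were relatively high, close to 4, while those of OpenStack and OpenNebula were relatively low, close to 2.

We have also noticed that the concept of participation ratio is somewhat controversial. Some people think that a lower post-to-topic ratio indicates that a community can resolve problems quickly: questions are answered in a short time, so a problem does not need many posts. Others think that a higher post-to-topic ratio may mean that an argument has broken out in a community, and that the argument may have drifted away from the community's topics and scope. In any case, the name "participation ratio" does reflect some of our own subjective views and weakens the objectivity of this report to some extent. Because we have not yet found a more suitable name, we continue to use it in this report (readers are welcome to suggest a better name for this parameter).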

01

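Figure 4 shows the total number of people who participated in forum or mailing list discussions each month for the four projects. The number of active users of OpenStack is much larger than that of the other three projects. The number of active users of CloudStack is also clearly larger than that of OpenNebula and Eucalyptus. During the past 12 months, the numbers of active users of CloudStack and OpenStack grew steadily (100% growth for OpenStack and 50% growth for CloudStack), while the numbers of active users of Eucalyptus and OpenNebula showed essentially no growth.

It is worth noting that although CloudStack has somewhat fewer active users than OpenStack, the two projects have roughly the same numbers of topics and posts (see Figures 1 and 2).

01

Accumulated community population (community population for short) is the total number of users and developers who have ever participated in discussions through a forum or mailing list. (It does not include members who registered with a forum or mailing list but never took part in any public discussion.) These people have used the product to some degree, but that does not mean they are still active users. Figure 5 shows the growth of the community populations of the four projects. The community populations of OpenStack and Eucalyptus are far ahead, while those of CloudStack and OpenNebula are relatively small.

The problem is that, after so many years of development of open source IaaS software, the long-term accumulated community population means less and less. On the one hand, some early users may have switched camps several times; on the other hand, some early forums and mailing lists have reached the end of their life. From the point of view of community activity, we believe the accumulated community population of the most recent 6 or 12 months may be meaningful, but extending it without limit back to the Jurassic era may strip this parameter of practical value.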

01

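Several readers have suggested that we analyze the development of the mainstream open source IaaS projects based on their git activity. We noticed that CloudStack, Eucalyptus, OpenNebula and OpenStack all use git for version control, so we carried out some simple analysis of their development based on git log data. Note that for OpenStack, the data source includes the openstack project (57 sub-projects) and the openstack-infra project (33 sub-projects) hosted on github.com.

In the CY13-Q1 report we used the command git log to obtain log data. Starting from CY13-Q2, we use the command git log --no-merges.

It is worth mentioning that git is a distributed version control system. With git, developers usually work in their own local repositories. When a developer performs a commit, the corresponding code change is recorded only in the local repository, and it is not reflected in the main repository until the developer performs a push. Many developers tend to accumulate a number of commits before pushing. As a result, some recent commits are not reflected in our statistics. In our experience, the number of commits is underestimated by about 50% for the most recent month and by about 20% for the month before that.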

01

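Figure 10 shows the monthly number of commit operations for the four projects. Overall, the commit frequency of OpenStack far exceeds that of the other three projects. This is because the data source for OpenStack includes 90 sub-projects in total. The commit frequency of CloudStack is also clearly higher than that of Eucalyptus and OpenNebula. Compared with OpenNebula, the commit frequency of Eucalyptus is not low either, but it fluctuates considerably and shows a fairly clear batch-update pattern. The commit frequency of OpenNebula is relatively low, at about 200 commits per month on average.

01

Figure 11 shows the monthly number of commits for the sub-projects of OpenStack. Overall, the commit frequency of the Nova sub-project is relatively high, about three times that of the other sub-projects. It is worth noting that although the commit frequencies of the sub-projects differ, their time-series curves are largely consistent, with peaks and troughs occurring at about the same time. This suggests that although the OpenStack sub-projects are relatively independent, they follow the same or similar development plans and schedules. OpenStack can therefore be considered to do a fairly good job of managing and coordinating its sub-projects.

01

Figure 12 shows the monthly number of people committing code to the four projects. Overall, the number of OpenStack committers far exceeds that of the other three projects and keeps growing rapidly. The number of CloudStack committers has also grown, but relatively slowly. The numbers of Eucalyptus and OpenNebula committers are relatively small and showed essentially no growth during the past 12 months.

01

Figure 13 shows the monthly number of people committing code to the sub-projects of OpenStack. Overall, the number of Nova committers is relatively large, about three times that of the other sub-projects.

01

People usually identify the institute a contributor belongs to by the email address the contributor uses. Although this method has significant flaws (for example, some institutes encourage employees to contribute to open source projects in a personal capacity), it can still reveal to some extent how much different institutes contribute to an open source project. Figure 14 shows the monthly number of email domains committing code to the four projects. Overall, the number of domains committing to OpenStack far exceeds that of the other three projects and keeps growing rapidly. The number of domains committing to CloudStack has also grown, but relatively slowly. The numbers of domains committing to Eucalyptus and OpenNebula are relatively small and showed essentially no growth during the past 12 months.

01

Figure 15 shows the monthly number of domains committing code to the sub-projects of OpenStack. Overall, the number of domains committing to Nova is relatively large, about three times that of the other sub-projects.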

The following table lists, by email domain, the institutes that made the most commits to CloudStack, Eucalyptus, OpenNebula and OpenStack during CY14-Q1, along with their percentage of commits. Eucalyptus and OpenNebula are open source projects dominated by a single institute, while CloudStack and OpenStack are open source projects built by multiple institutes working together. For CloudStack, the influence of Citrix is still very visible: email addresses directly from citrix.com and cloud.com account for 44% of commits (up 3% compared with CY13-Q4). For OpenStack, Red Hat contributed 15% and IBM contributed 11%, followed by Mirantis (5%), Rackspace (4%), HP (4%), Suse (3%), Enovance (2%) and Huawei (1%).

CloudStack               Eucalyptus               OpenNebula            OpenStack
Domain             %     Domain             %     Domain          %     Domain         %
citrix.com         40    eucalyptus.com     80    opennebula.org  92    redhat.com     15
gmail.com          19    gmail.com          19    c12g.com        5     gmail.com      15
apache.org         13    fedoraproject.org  1     cuesoft.eu      1     ibm.com        11
clogeny.com        5     -                  -     -               -     openstack.org  7
shubergphilis.com  5     -                  -     -               -     mirantis.com   5
cloud.com          4     -                  -     -               -     rackspace.com  4
leaseweb.com       3     -                  -     -               -     hp.com         4
netapp.com         1     -                  -     -               -     suse.de        3
betterservers.com  1     -                  -     -               -     enovance.com   2
cloudops.com       0.6   -                  -     -               -     huawei.com     1

The following table lists, by email domain, the institutes that made the most commits to each OpenStack sub-project during CY14-Q1, along with their percentage of commits.

Cinder               Glance               Horizon              Keystone
Domain         %     Domain         %     Domain         %     Domain         %
redhat.com     16    ibm.com        20    gmail.com      23    ibm.com        27
huawei.com     12    gmail.com      13    redhat.com     11    dstanek.com    12
gmail.com      12    enovance.com   9     hp.com         11    redhat.com     10
ibm.com        9     mirantis.com   7     sheep.art.pl   10    metacloud.com  8
openstack.org  8     rackspace.com  7     intel.com      7     gmail.com      7
solidfire.com  7     huawei.com     6     da.jp.nec.com  5     dreamhost.com  5
netapp.com     4     dmllr.de       5     mirantis.org   4     openstack.org  3
hp.com         4     yahoo.com      5     openstack.org  3     huawei.com     3
ebaysf.com     2     hp.com         3     enovance.com   3     hp.com         2
dmllr.de       2     openstack.org  2     cloudwatt.com  2     mirantis.com   2

 

Nova                 Neutron                Swift
Domain         %     Domain           %     Domain             %
ibm.com        21    gmail.com        17    swiftstack.com     25
redhat.com     20    openstack.org    14    redhat.com         18
gmail.com      11    mirantis.com     10    gmail.com          17
vmware.com     6     nicira.com       6     enovance.com       8
rackspace.com  5     redhat.com       5     not.mn             6
hp.com         4     ibm.com          4     brim.net           4
intel.com      3     da.jp.nec.com    4     kotori.zaitcev.us  3
huawei.com     2     cisco.com        3     rackspace.com      2
stillhq.com    2     unitedstack.com  3     hgst.com           2
openstack.org  2     hp.com           2     intel.com          1

01

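Accumulated developer population is the total number of developers who have ever committed code to a project. Figure 16 shows the growth of the developer populations of the four projects. OpenStack has the largest accumulated developer population, about 10 times that of CloudStack in second place.

01

Accumulated contributing institutes is the number of institutes that have ever committed code to a project. Figure 17 shows the growth of the accumulated number of contributing institutes for the four projects. OpenStack has the largest number, about 5 times that of CloudStack and Eucalyptus. OpenNebula has relatively few contributing institutes.

A PDF version of the slides for this article can be downloaded from http://www.qyjohn.net/wp-content/uploads/2014/04/CY14-Q1-IaaS-Community-Analysis.pdf. If you redistribute the content of this article, please keep the author information.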

CY14-Q1 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack

By , April 14, 2014 3:46 pm

01

This article is an updated version of my previous article, CY13-Q4 Community Analysis: OpenStack vs OpenNebula vs Eucalyptus vs CloudStack. Readers who are interested in further discussion are welcome to contact me via email.

A Chinese version of this article is published at the same time, which can be found at CY14-Q1 OpenStack, OpenNebula, Eucalyptus, CloudStack社区活跃度比较.

This community analysis project was initiated in CY11-Q4, and this particular report is the 10th quarterly report being published since. The opinion presented in this report belongs strictly to the author rather than any current or previous employer of the author.

02

The objective of this quarterly report is to compare the OpenStack, OpenNebula, Eucalyptus and CloudStack user and developer communities, based on the communications between community members on mailing lists and public forums. The data being discussed include the monthly numbers of topics (threads), messages (posts), and participants (unique email addresses or registered members). To obtain these data, a Java program was written to retrieve all the forum posts and mailing list messages into a MySQL database for further processing. The analysis results are presented as graphs generated with LibreOffice.
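The aggregation described above can be sketched in a few lines. This is only an illustration in Python, not the Java program or the MySQL schema used for the report. The record layout `(month, thread_id, author)`, the function name `monthly_stats`, and the rule that a topic is counted in the month of its first message are all assumptions made for this example.

```python
from collections import defaultdict


def monthly_stats(messages):
    """Aggregate message records into monthly topics, posts and participants.

    `messages` is an iterable of (month, thread_id, author) tuples, where
    month is a 'YYYY-MM' string. This record layout is hypothetical.
    """
    first_month = {}            # thread_id -> month of its earliest message
    posts = defaultdict(int)    # month -> number of messages
    people = defaultdict(set)   # month -> unique authors (case-insensitive)

    for month, thread_id, author in messages:
        posts[month] += 1
        people[month].add(author.strip().lower())
        # 'YYYY-MM' strings sort chronologically, so string comparison works.
        if thread_id not in first_month or month < first_month[thread_id]:
            first_month[thread_id] = month

    topics = defaultdict(int)   # month -> number of topics started that month
    for month in first_month.values():
        topics[month] += 1

    return {
        m: {"topics": topics[m], "posts": posts[m], "participants": len(people[m])}
        for m in sorted(posts)
    }
```

Lower-casing the author string only removes the most trivial duplicates. Matching one person across a forum account and a mailing list address needs additional rules that are not shown here.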

Eucalyptus EOL’ed its technical support forum (Engage) during this quarter and directed all user discussions to its Google group. This had a significant impact on its discussion traffic.

During the past several years, some of the early forums and mailing lists were EOL’ed and are no longer accessible. The MySQL database that was built at the beginning of this project, together with the previous editions of this quarterly report, makes it possible to analyze each project from its beginning.

For projects with multiple membership systems (such as a forum and a mailing list), extensive efforts were carried out to eliminate membership double counting (counting one person twice or more in the statistics).

03

04

Figures 1 and 2 show the monthly numbers of topics (threads) and posts (messages). It can be seen that:

(1) During the past 12 months, OpenStack-related and CloudStack-related discussions were approximately on the same level, while Eucalyptus-related and OpenNebula-related discussions were approximately on the same level.

(2) During the past 12 months, the volume of OpenStack-related and CloudStack-related discussions was much higher than that of Eucalyptus-related and OpenNebula-related discussions.

05

Generally speaking, the number of replies to a topic reflects the attention it receives and the depth of the discussion. When the number of master posts (the original posts that start topics) exceeds the number of replies, it is safe to conclude that participation in the forum or mailing list is very low. Therefore, the ratio between “the number of posts” and “the number of topics” reflects the participation rate of an online community. In this study we call this ratio the Participation Ratio.

As can be seen from Figure 3, during the past 12 months the participation ratios of CloudStack and Eucalyptus were relatively high, close to 4, while the participation ratios of OpenStack and OpenNebula were relatively low, slightly above 2.

We do notice that the concept of “participation ratio” has generated some disagreement. Some people think that a lower “post-to-thread ratio” reflects the ability to resolve problems in a very short time, so that only a few messages are needed per topic. Others think that a higher “post-to-thread ratio” might indicate that the community is caught in a flame war, during which a large portion of the posts might be off-topic. In any case, we agree that calling this parameter “participation ratio” reflects our own opinion to some extent and undermines the objectivity of this report. However, because we have not found a better name for this parameter, we will keep using it for the time being. (Dear readers, you are more than welcome to contribute a better name for this parameter.)
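The participation ratio defined above is a plain division of posts by topics. The sketch below, in Python, shows it for a single period and for a trailing window of months. The function names and the sample figures in the usage note are made up for illustration and are not taken from the report's data.

```python
def participation_ratio(posts, topics):
    """Posts divided by topics; None when there are no topics."""
    if topics == 0:
        return None
    return posts / topics


def trailing_ratio(monthly, months=12):
    """Participation ratio over the last `months` entries.

    `monthly` is a chronologically ordered list of (posts, topics) pairs.
    Totals are summed before dividing, so busy months weigh more than
    quiet ones (an assumption; averaging monthly ratios is another option).
    """
    recent = monthly[-months:]
    total_posts = sum(p for p, _ in recent)
    total_topics = sum(t for _, t in recent)
    return participation_ratio(total_posts, total_topics)
```

For example, a hypothetical month with 400 posts in 100 topics gives `participation_ratio(400, 100) == 4.0`, meaning each topic received three replies on average in addition to its master post.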

06

Figure 4 shows the number of monthly participants of the four projects being discussed. It can be seen that the number of active participants of OpenStack is much higher than that of the other three projects. The number of active participants of CloudStack is also significantly higher than that of OpenNebula and Eucalyptus. During the past 12 months, the numbers of active participants for OpenStack and CloudStack grew steadily (100% growth for OpenStack and 50% growth for CloudStack), while the numbers of active participants for Eucalyptus and OpenNebula exhibited no significant growth.

It should be noted that although the number of active participants of the CloudStack project is somewhat smaller than that of the OpenStack project, both projects have approximately the same amount of discussions (as shown in Figure 1 and Figure 2).

07

Accumulated Community Population refers to the total number of users and developers who have participated in forum or mailing list discussions. (This number does not include those who have registered with discussion forums or mailing lists but have never participated in any open discussion.) These are people who have tested or used a specific product for a while, but they are not necessarily active users today.

Figure 5 shows the growth of the accumulated community populations of these 4 projects. Currently OpenStack has the largest accumulated community population, followed by Eucalyptus, CloudStack, and OpenNebula.

The problem is that, after years of changes, a long-term (such as 4 to 5 years) accumulated community population might not be a good reference for community activeness. Some of the early members of one community might have switched to other communities (probably more than once), and some of the early community media (such as mailing lists and forums) might have been EOL’ed. From a community analysis point of view, it might be better to count the accumulated community population of the past 6 to 12 months, because extending the range back to the age of the dinosaurs makes this parameter meaningless.
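The difference between an all-time accumulated population and one limited to recent months can be made concrete with a small sketch. The Python below is an illustration only. The input layout, one set of participant identifiers per month, and the `window` parameter are assumptions for this example.

```python
def accumulated_population(monthly_participants, window=None):
    """Count unique participants up to each month.

    `monthly_participants` is a chronologically ordered list of
    (month, set_of_participant_ids) pairs (a hypothetical layout).
    With window=None every earlier month is included; with window=N
    only the most recent N months (including the current one) count.
    """
    result = []
    for i, (month, _) in enumerate(monthly_participants):
        start = 0 if window is None else max(0, i - window + 1)
        seen = set()
        for _, people in monthly_participants[start:i + 1]:
            seen |= people
        result.append((month, len(seen)))
    return result
```

With `window=None` the count can only grow, which is why a long-lived project with many former users keeps a large number. With `window=6` or `window=12` the count drops once early members stop posting.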

08

We are seeing an increasing number of suggestions to analyze the git activity of these open source IaaS projects. We also noticed that all four projects use git as the version control system for their source code. Starting from our CY13-Q1 report, we have done some basic analysis based on the git log data. It should be noted that for the OpenStack project, the data source includes all the sub-projects under openstack (57 sub-projects) and openstack-infra (33 sub-projects) on github.com.

In our CY13-Q1 report, we used “git log” to obtain log information. Starting from CY13-Q2, we use “git log --no-merges” to obtain log information.

It should be pointed out that git is a distributed version control system. With git, developers work with their own local repositories. When a developer executes a commit operation, the code changes are made to the local repository and are not reflected in the master repository until the commits are pushed to and merged with the master repository. It is common for developers to accumulate many commits before they feel comfortable making a push. Therefore, some of the recent commits might not be counted in this analysis. Based on our observations, the number of commits is underestimated by about 50% for the most recent month and by about 20% for the month before that.
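As an illustration of how such log data can be turned into monthly figures, here is a minimal Python sketch. It is not the tooling used for this report. It assumes the log is produced with `git log --no-merges --date=short --pretty=format:%ad|%ae`, so that each line holds an author date and an author email separated by `|`. The function names are hypothetical.

```python
import subprocess

# Assumed invocation: one line per non-merge commit, "YYYY-MM-DD|email".
GIT_LOG_CMD = ["git", "log", "--no-merges", "--date=short",
               "--pretty=format:%ad|%ae"]


def read_git_log(repo_path):
    """Run git log in `repo_path` and return its text output."""
    done = subprocess.run(GIT_LOG_CMD, cwd=repo_path, capture_output=True,
                          text=True, check=True)
    return done.stdout


def commits_per_month(log_text):
    """Count commits per 'YYYY-MM' month."""
    counts = {}
    for line in log_text.splitlines():
        if "|" not in line:
            continue  # skip blank or malformed lines
        date, _email = line.split("|", 1)
        counts[date[:7]] = counts.get(date[:7], 0) + 1
    return counts


def contributors_per_month(log_text):
    """Count unique author emails per 'YYYY-MM' month."""
    people = {}
    for line in log_text.splitlines():
        if "|" not in line:
            continue
        date, email = line.split("|", 1)
        people.setdefault(date[:7], set()).add(email.strip().lower())
    return {month: len(emails) for month, emails in people.items()}
```

A call such as `commits_per_month(read_git_log("/path/to/repo"))` would give the monthly commit counts for one repository. For a project made up of many repositories, the per-repository counts would have to be added together.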

09

Figure 10 shows the monthly number of commit operations for these four projects. Generally speaking, the commit frequency of OpenStack is much higher than that of the other three projects. This is because the data source for OpenStack includes a total of 90 sub-projects, far more than the other three projects. The commit frequency of CloudStack is also significantly higher than that of Eucalyptus and OpenNebula. Compared with OpenNebula, Eucalyptus also committed more frequently, but with significant fluctuations from month to month, which looks like typical batch-commit behavior. The commit frequency of the OpenNebula project is relatively low, with an average of about 200 commits per month.

10

Figure 11 shows the monthly number of commit operations for the sub-projects of OpenStack. Generally speaking, the commit frequency of the Nova sub-project is about 3 times as high as that of the other sub-projects. It should be noted that although the commit frequencies of these sub-projects differ, they exhibit similar time-series curves, and their highs and lows occur in the same periods. This indicates that although these sub-projects are relatively independent, they work around the same development plan and the same release schedule. This suggests that the OpenStack project is well organized in terms of sub-project management.

11

Figure 12 shows the monthly number of contributors (identified by unique email addresses) for these projects. Generally speaking, the number of OpenStack contributors is much higher than the other three projects, and is growing rapidly. The number of CloudStack contributors also exhibits some growth, but the growth is relatively slow. The number of Eucalyptus and OpenNebula contributors is relatively small, and does not exhibit growth during the past 12 months.

12

Figure 13 shows the monthly number of contributors (identified by unique github.com accounts) for the sub-projects of OpenStack. It can be seen that the number of Nova contributors is about 3 times that of the other sub-projects.

13

People usually try to identify the institute to which a contributor belongs by his/her email address. Admittedly, this method is inherently flawed (different institutes have different policies regarding contributions to open source projects, and some institutes even encourage their employees to contribute with their personal accounts), but this parameter can still be used to show the contributions of certain institutes to certain open source projects. Figure 14 shows the monthly number of unique institutes (identified by the domain name of the contributor’s email address) contributing to these projects. We can see that the number of contributing institutes for OpenStack is much larger than that of the other three projects, and is growing rapidly. The number of contributing institutes for CloudStack is also growing, but at a relatively slow pace. The number of contributing institutes for Eucalyptus and OpenNebula is relatively small, and does not exhibit any growth during the past 12 months.
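Grouping commits by email domain is straightforward to sketch. The Python below is an illustration, not the code behind the figures in this report. The function names are hypothetical, and the sample addresses in the usage note are made up.

```python
from collections import Counter


def domain_of(email):
    """Return the lower-cased part of an email address after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()


def domain_shares(author_emails):
    """Percentage of commits per email domain, largest share first.

    `author_emails` holds one author email per commit. No attempt is made
    to merge related domains or to attribute personal addresses to an
    employer, which is the flaw of this method noted in the text.
    """
    counts = Counter(domain_of(e) for e in author_emails)
    total = sum(counts.values())
    return [(d, round(100.0 * n / total, 1)) for d, n in counts.most_common()]
```

For four hypothetical commits from `a@citrix.com`, `b@citrix.com`, `c@gmail.com` and `d@apache.org`, the first entry returned is `("citrix.com", 50.0)`. The number of distinct domains in a month is simply the length of the returned list.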

14

Figure 15 shows the monthly number of contributing institutes for the sub-projects of OpenStack. It can be seen that the number of contributing institutes for Nova is about 3 times that of the other sub-projects.

The following table lists the institutes that made the most contributions to these projects during CY14-Q1, ranked by the number of commit operations, along with their percentage of commit operations. It can be seen that both Eucalyptus and OpenNebula are open source projects dominated by a single institute, while CloudStack and OpenStack are open source projects with contributions from multiple institutes. For the CloudStack project, the influence of Citrix is still quite obvious: 44% of the commits come from accounts belonging to citrix.com and cloud.com (a 3% increase compared with CY13-Q4). For the OpenStack project, redhat.com contributed 15% of the commits and ibm.com contributed 11%, followed by mirantis.com (5%), rackspace.com (4%), hp.com (4%), suse.de (3%), enovance.com (2%), and huawei.com (1%).

CloudStack               Eucalyptus               OpenNebula            OpenStack
Domain             %     Domain             %     Domain          %     Domain         %
citrix.com         40    eucalyptus.com     80    opennebula.org  92    redhat.com     15
gmail.com          19    gmail.com          19    c12g.com        5     gmail.com      15
apache.org         13    fedoraproject.org  1     cuesoft.eu      1     ibm.com        11
clogeny.com        5     -                  -     -               -     openstack.org  7
shubergphilis.com  5     -                  -     -               -     mirantis.com   5
cloud.com          4     -                  -     -               -     rackspace.com  4
leaseweb.com       3     -                  -     -               -     hp.com         4
netapp.com         1     -                  -     -               -     suse.de        3
betterservers.com  1     -                  -     -               -     enovance.com   2
cloudops.com       0.6   -                  -     -               -     huawei.com     1

The following table lists those institutes that make the most contributions to the sub-projects of OpenStack during CY14-Q1, along with the percentage of their commit operations.

Cinder               Glance               Horizon              Keystone
Domain         %     Domain         %     Domain         %     Domain         %
redhat.com     16    ibm.com        20    gmail.com      23    ibm.com        27
huawei.com     12    gmail.com      13    redhat.com     11    dstanek.com    12
gmail.com      12    enovance.com   9     hp.com         11    redhat.com     10
ibm.com        9     mirantis.com   7     sheep.art.pl   10    metacloud.com  8
openstack.org  8     rackspace.com  7     intel.com      7     gmail.com      7
solidfire.com  7     huawei.com     6     da.jp.nec.com  5     dreamhost.com  5
netapp.com     4     dmllr.de       5     mirantis.org   4     openstack.org  3
hp.com         4     yahoo.com      5     openstack.org  3     huawei.com     3
ebaysf.com     2     hp.com         3     enovance.com   3     hp.com         2
dmllr.de       2     openstack.org  2     cloudwatt.com  2     mirantis.com   2

 

Nova                 Neutron                Swift
Domain         %     Domain           %     Domain             %
ibm.com        21    gmail.com        17    swiftstack.com     25
redhat.com     20    openstack.org    14    redhat.com         18
gmail.com      11    mirantis.com     10    gmail.com          17
vmware.com     6     nicira.com       6     enovance.com       8
rackspace.com  5     redhat.com       5     not.mn             6
hp.com         4     ibm.com          4     brim.net           4
intel.com      3     da.jp.nec.com    4     kotori.zaitcev.us  3
huawei.com     2     cisco.com        3     rackspace.com      2
stillhq.com    2     unitedstack.com  3     hgst.com           2
openstack.org  2     hp.com           2     intel.com          1

15

Accumulated Developer Population refers to the total number of developers who have contributed code to a particular project (as reflected in git commits). Figure 16 shows the growth of the accumulated developer populations of these 4 projects. Currently OpenStack has the largest accumulated developer population, about 10 times that of CloudStack, a distant second.

16

Accumulated Contributing Organizations refers to the total number of organizations (as reflected in unique domain names associated with developer email addresses) that have contributed code to a particular project (as reflected in git commits). Figure 17 shows the growth of the accumulated contributing organizations of these 4 projects. Currently OpenStack has the largest number of contributing organizations, about 5 times that of CloudStack and Eucalyptus. OpenNebula has the smallest number of contributing organizations.

For your convenience, a PDF version of this presentation can be downloaded from here. Please kindly keep the author information if you want to redistribute the content.

A Side Note:

My family recently relocated from China to Australia. I am currently located in Sydney, with full-time employment permissions. I am looking for job opportunities in Sydney, preferably related to cloud computing. I am an AWS Certified Solutions Architect (Associate Level) and an AWS Certified Developer (Associate Level), with extensive hands-on experience working with various private and public cloud platforms. I am particularly strong in market analysis, technology evangelism, and customer-facing technical services. You can refer to my LinkedIn profile for details about my professional experience, as well as comments from my previous employers. You can always reach me via email (qjiang@ieee.org) for a discussion.

 

Data Source on the Economics of Computing Resource Market

By , April 2, 2014 6:22 pm

This blog post was created to provide additional reference material for “Infrastructure-as-a-Service from a Business and Energy Consumption Perspective”, an article authored by Qingye Jiang, Young Choon Lee, and Albert Y. Zomaya at the School of Information Technologies in the University of Sydney.

A wide range of data sources was used in this research, including Gartner’s quarterly reports on worldwide server shipments, official AWS blog entries regarding price reductions and resource usage, Rackspace’s annual financial reports, performance and power consumption figures from the Top500 supercomputer list, as well as some other web pages and publicly accessible presentations. The majority of these data sources are not peer-reviewed academic articles but publicly accessible URLs. Since the number of URLs to be cited would overwhelm the limited reference list, we created this dedicated blog entry, with links to all the data sources used in the article. The authors will also update this blog entry as new data sources related to the topic become available, which should be helpful for further research on similar subjects.

[Gartner Quarterly Report on Worldwide Server Shipments]

2006-Q1:

http://www.gartner.com/newsroom/id/493001

2006-Q2:

http://www.gartner.com/newsroom/id/495891

2006-Q3:

http://www.gartner.com/newsroom/id/498468

2006-Q4:

http://www.gartner.com/newsroom/id/501405

2007-Q1:

http://www.gartner.com/newsroom/id/506336

2007-Q2:

http://www.gartner.com/newsroom/id/513509

2007-Q3:

http://www.gartner.com/newsroom/id/550315

2007-Q4:

http://www.gartner.com/newsroom/id/608710

2008-Q1:

http://www.gartner.com/newsroom/id/713321

2008-Q2:

http://www.gartner.com/newsroom/id/745516

2008-Q3:

http://www.gartner.com/newsroom/id/823712

2008-Q4:

http://www.gartner.com/newsroom/id/905914

2009-Q1:

http://www.gartner.com/newsroom/id/1000326

2009-Q2:

http://www.gartner.com/newsroom/id/1161313

2009-Q3:

http://www.gartner.com/newsroom/id/1238521

2009-Q4:

http://www.gartner.com/newsroom/id/1307717

2010-Q1:

http://www.gartner.com/newsroom/id/1375038

2010-Q2:

http://www.gartner.com/newsroom/id/1426834

2010-Q3:

http://www.gartner.com/newsroom/id/1479923

2010-Q4:

http://www.gartner.com/newsroom/id/1561014

2011-Q1:

http://www.gartner.com/newsroom/id/1700715

2011-Q2:

http://www.gartner.com/newsroom/id/1776714

2011-Q3:

http://www.gartner.com/newsroom/id/1859415

2011-Q4:

http://www.gartner.com/newsroom/id/1935717

2012-Q1:

http://www.gartner.com/newsroom/id/2031115

2012-Q2:

http://www.gartner.com/newsroom/id/2139315

2012-Q3:

http://www.gartner.com/newsroom/id/2254815

2012-Q4:

http://www.gartner.com/newsroom/id/2351518

2013-Q1:

http://www.gartner.com/newsroom/id/2497015

2013-Q2:

http://www.gartner.com/newsroom/id/2580515

2013-Q3:

http://www.gartner.com/newsroom/id/2632515

2013-Q4:

http://www.gartner.com/newsroom/id/2671315

[AWS Price Reduction History, S3 Growth, and EC2 Server Growth]

2008-10-09

http://aws.typepad.com/aws/2008/09/new-release-of.html

2009-10-27

http://aws.typepad.com/aws/2009/10/amazon-ec2-now-an-even-better-value.html

2009-12-08

http://aws.typepad.com/aws/2009/12/aws-price-reductions.html

2010-11-01

http://aws.typepad.com/aws/2010/11/what-can-i-say-another-amazon-s3-price-reduction.html

2012-02-06

http://aws.typepad.com/aws/2012/02/amazon-s3-price-reduction.html

2012-03-05

http://aws.typepad.com/aws/2012/03/dropping-prices-again-ec2-rds-emr-and-elasticache.html

2012-10-31

http://aws.typepad.com/aws/2012/10/new-ec2-second-generation-standard-instances-and-price-reductions-1.html

2012-11-29

http://aws.typepad.com/aws/2012/11/amazon-s3-price-reduction-december-1-2012.html

2013-02-01

http://aws.typepad.com/aws/2013/02/ec2s-m3-global-reduced-ec2-bandwidth.html

2013-11-05

http://aws.typepad.com/aws/2013/11/prices-reduced-for-ec2s-m3-second-generation-standard-instances.html

2014-01-21

http://aws.typepad.com/aws/2014/01/aws-update-new-m3-features-reduced-ebs-prices-reduced-s3-prices.html

2012-04-05

http://aws.typepad.com/aws/2012/04/amazon-s3-905-billion-objects-and-650000-requestssecond.html

2013-04-18

http://gigaom.com/2013/04/18/amazon-s3-goes-exponential-now-stores-2-trillion-objects/

[Rackspace Financial and Server Data]

2009:

http://www.getfilings.com/sec-filings/100226/RACKSPACE-HOSTING-INC_10-K/

2011:

http://www.sec.gov/Archives/edgar/data/1107694/000110769412000010/rax1231201110-k.htm

2012:

http://www.sec.gov/Archives/edgar/data/1112920/000119312513132408/d446382d10k.htm

2013:

http://biz.yahoo.com/e/140303/rax10-k.html

[SoftLayer Financial and Server Data]

SoftLayer Company Overview (2009.12):

SoftLayer Media Kit (2009 Q1)

SoftLayer Presentation for Telx Cbx (2010.06)

SoftLayer Presentation at Stifel Nicolaus Event (2011.02)

SoftLayer Presentation (2012.06)

[CERN Computing Center and LHC Grid]

CERN Agile Infrastructure – Road to Production

CERN Infrastructure Evolution V2

OpenStack in Production

http://openstack-in-production.blogspot.ch/2013/09/a-tale-of-3-openstack-clouds-50000.html

Operating Dedicated Data Centers – Is It Cost-Effective?

http://indico.cern.ch/event/247864/material/slides/0?contribId=60&sessionId=1

[NASA JPL]

Making IT Rain with Cloud Computing

[Estimations on the Size of AWS]

http://www.cloudscaling.com/blog/cloud-computing/amazons-ec2-generating-220m-annually/

http://huanliu.wordpress.com/2012/03/13/amazon-data-center-size/

[Supercomputer Performance and Power Consumption]

CDC 6600

http://en.wikipedia.org/wiki/CDC_6600

CDC 7600

http://en.wikipedia.org/wiki/CDC_7600

Cray 1

http://en.wikipedia.org/wiki/Cray-1

GF11

http://dl.acm.org/citation.cfm?id=327139

Cray 2

http://en.wikipedia.org/wiki/Cray-2

Numerical Wind Tunnel

http://en.wikipedia.org/wiki/Numerical_Wind_Tunnel_(Japan)

ASCI Red

http://en.wikipedia.org/wiki/ASCI_Red

ASCI White

http://en.wikipedia.org/wiki/ASCI_White

ASCI Q

http://top500.org/system/6359

Earth Simulator

http://www.top500.org/system/167148

BlueGene/L

http://www.top500.org/system/175171

JUGENE (BlueGene/P)

http://www.top500.org/system/176321

Cray XT5

http://www.top500.org/system/176208

Jaguar (XT5)

http://www.top500.org/system/176544

Tianhe-1A

http://www.top500.org/system/176929

K Computer

http://www.top500.org/system/177232

Titan (XK7)

http://www.top500.org/system/177975

Tianhe-2

http://www.top500.org/system/177999

 

CY13-Q4 Community Activity Comparison: OpenStack, OpenNebula, Eucalyptus, CloudStack

By , January 2, 2014 9:39 pm

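This article supplements and updates the earlier article "CY13-Q3 Community Activity Comparison: OpenStack, OpenNebula, Eucalyptus, CloudStack". Readers interested in its content can contact me by email or on Sina Weibo (@qyjohn_).

An English version of this article is published at the same time; see the post "CY13-Q4 Community Analysis: OpenStack vs OpenNebula vs Eucalyptus vs CloudStack".

This community activity comparison project started in CY11-Q4, and this is the ninth quarterly report published so far. Although the author briefly worked at the company Eucalyptus from October 2012 to July 2013, the views expressed in this article are entirely the author's own and not those of any current or previous employer of the author.

01

The purpose of this article is to analyze and compare the community activity of the OpenStack, OpenNebula, Eucalyptus and CloudStack projects using raw data from their forums and mailing lists. The main raw data are the monthly numbers of discussion topics, posts, and participants (email addresses or user accounts) in the official forums and mailing lists of the four projects since 2009. To obtain these data, I wrote a Java program that automatically downloads all forum and mailing list messages from the websites of the four projects and extracts the data I need. The extracted data are imported into a MySQL database for statistical analysis, and the charts are generated with LibreOffice.

Over the past few years, some of the early forums and mailing lists have been retired, and their data can no longer be accessed. Fortunately, the MySQL database created when this project started, together with the quarterly reports published so far, allows us to carry out a complete analysis of each project.

For projects with multiple membership systems (such as a forum and a mailing list), we took extensive measures to eliminate double counting of members (the same person being counted as two or more different members).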

03

04

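Figures 1 and 2 show the monthly numbers of discussion topics and posts for the four projects. It can be seen that:

(1) During the past 12 months, OpenStack-related and CloudStack-related discussions were at about the same level, and Eucalyptus-related and OpenNebula-related discussions were at about the same level;

(2) During the past 12 months, the volume of OpenStack-related and CloudStack-related discussions was much higher than that of Eucalyptus-related and OpenNebula-related discussions.

05

Generally speaking, the more replies a topic receives, the deeper the discussion of that topic. If a forum or mailing list has only master posts and no replies, participation in that community is very low. The average ratio of "number of posts / number of topics" therefore reflects the level of participation in a community, and for now we call it the Participation Ratio.

As Figure 3 shows, during the past 12 months the participation ratios of CloudStack and Eucalyptus were relatively high, close to 4, while those of OpenStack and OpenNebula were relatively low, close to 2.

We have also noticed that the concept of participation ratio is somewhat controversial. Some people think that a lower post-to-topic ratio indicates that a community can resolve problems quickly: questions are answered in a short time, so a problem does not need many posts. Others think that a higher post-to-topic ratio may mean that an argument has broken out in a community, and that the argument may have drifted away from the community's topics and scope. In any case, the name "participation ratio" does reflect some of our own subjective views and weakens the objectivity of this report to some extent. Because we have not yet found a more suitable name, we continue to use it in this report (readers are welcome to suggest a better name for this parameter).

06

Figure 4 shows the total number of people who participated in forum or mailing list discussions each month for the four projects. The number of active users of OpenStack is much larger than that of the other three projects. The number of active users of CloudStack is also clearly larger than that of OpenNebula and Eucalyptus. During the past 12 months, the numbers of active users of CloudStack and OpenStack grew steadily (100% growth for OpenStack and 50% growth for CloudStack), while the numbers of active users of Eucalyptus and OpenNebula showed essentially no growth.

It is worth noting that although CloudStack has somewhat fewer active users than OpenStack, the two projects have roughly the same numbers of topics and posts (see Figures 1 and 2).

07

Accumulated community population (community population for short) is the total number of users and developers who have ever participated in discussions through a forum or mailing list. (It does not include members who registered with a forum or mailing list but never took part in any public discussion.) These people have used the product to some degree, but that does not mean they are still active users. Figure 5 shows the growth of the community populations of the four projects. The community populations of OpenStack and Eucalyptus are far ahead, while those of CloudStack and OpenNebula are relatively small.

The problem is that, after so many years of development of open source IaaS software, the long-term accumulated community population means less and less. On the one hand, some early users may have switched camps several times; on the other hand, some early forums and mailing lists have reached the end of their life. From the point of view of community activity, we believe the accumulated community population of the most recent 6 or 12 months may be meaningful, but extending it without limit back to the Jurassic era may strip this parameter of practical value.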

08

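Figure 6 shows the number of new community members added each month for the four projects. During the past six months, the community populations of CloudStack and OpenStack grew at about the same rate.

Compared with CloudStack and OpenStack, the community populations of Eucalyptus and OpenNebula grew more slowly.

09

Figure 7 combines Figure 4 and Figure 6. The solid lines show the number of people participating in forum or mailing list discussions each month, and the dashed lines show the number of people joining the forums or mailing lists each month.

During the past 12 months, new members accounted for about 30% of the monthly active users of OpenStack and CloudStack, and about 50% for OpenNebula and Eucalyptus. Leaving aside the size of the community population, the OpenStack and CloudStack communities can be considered stickier than the OpenNebula and Eucalyptus communities.

10

Figure 8 shows, for each of the four projects, the community population, the number of active users during the past quarter, and the number of active users during the past month. It can be seen that:

(1) OpenStack has the largest community population, followed by Eucalyptus, CloudStack and OpenNebula;

(2) During the past quarter, OpenStack had the largest active population, followed by CloudStack, Eucalyptus and OpenNebula;

(3) During the past month, OpenStack had the largest active population, followed by CloudStack, Eucalyptus and OpenNebula.

We also calculated the ratio of this quarter's active population to the accumulated community population. The ratio is 32.4% for OpenStack, 21.3% for CloudStack, 10.5% for OpenNebula and 4.8% for Eucalyptus. Clearly, a considerable share of OpenStack and CloudStack community members chose to stay, while most Eucalyptus community members chose to leave. This is consistent with what we observed in Figure 7 (the OpenStack and CloudStack communities are stickier than the OpenNebula and Eucalyptus communities).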

11

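In the CY12-Q3 analysis we first introduced a parameter called the “Community Activeness Index”. Since CY13-Q3 it has been defined as a combination of the following parameters:

(1) total posts this quarter, representing the volume of discussion;

(2) participation ratio this quarter, representing the number of replies each question receives;

(3) active users this quarter, representing the likelihood of getting help from the community (in the long term).

In this analysis we take the average value of each parameter across the communities as the reference, and compare each community's values with that reference. For each community, the sum of the ratios of its parameters to the corresponding averages is its “Community Activeness Index”. The project with the highest index can be regarded as the most active.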
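The index described above can be sketched in a few lines of Python. This is only one reading of the description: each parameter is divided by its average across the communities, and the resulting ratios are summed without weights. The function name `activeness_index`, the project names, and all the numbers below are hypothetical and are not data from this report.

```python
def activeness_index(metrics):
    """Return {project: sum over parameters of value / cross-project average}.

    metrics: {project: {parameter: value}}; every project is expected to
    provide the same set of parameters.
    """
    projects = list(metrics)
    params = list(next(iter(metrics.values())))
    averages = {
        p: sum(metrics[proj][p] for proj in projects) / len(projects)
        for p in params
    }
    return {
        proj: sum(metrics[proj][p] / averages[p] for p in params)
        for proj in projects
    }


# Hypothetical quarterly figures for two made-up projects.
metrics = {
    "project_a": {"posts": 3000, "participation_ratio": 2.0, "active_users": 600},
    "project_b": {"posts": 1000, "participation_ratio": 4.0, "active_users": 200},
}
index = activeness_index(metrics)
print(index)
```

With this definition a project that sits exactly at the average on every parameter scores 3, and the indexes of all projects always sum to the number of parameters times the number of projects.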

As Figure 9 shows, OpenStack currently has the highest “Community Activeness Index” (leading by a clear margin), followed by CloudStack, OpenNebula, and Eucalyptus.

12

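Several readers have suggested that we analyze the development of the mainstream open source IaaS projects based on git activity. CloudStack, Eucalyptus, OpenNebula, and OpenStack all use git for version control, so we carried out some simple analysis based on git log data. Note that for OpenStack the data source includes the openstack project (57 sub-projects) and the openstack-infra project (33 sub-projects) hosted on github.com.

In the CY13-Q1 report we used git log to obtain the log data. Since CY13-Q2 we have used git log --no-merges instead.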
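For readers who want to reproduce this kind of count, the Python sketch below runs `git log --no-merges` and tallies commits and distinct author emails per month. It is not the tooling used for this report. It assumes Python 3.7 or later and a `git` executable on the PATH; the helper names `read_git_log` and `monthly_commit_stats` and the `email|date` output format are choices made for this example.

```python
import subprocess
from collections import defaultdict


def read_git_log(repo_path):
    """Return one 'author-email|YYYY-MM-DD' line per non-merge commit."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--no-merges",
         "--pretty=format:%ae|%ad", "--date=short"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def monthly_commit_stats(lines):
    """Return {month: (commit count, distinct author emails)}."""
    commits = defaultdict(int)
    authors = defaultdict(set)
    for line in lines:
        if "|" not in line:
            continue  # skip blank or malformed lines
        email, date = line.rsplit("|", 1)
        month = date[:7]  # 'YYYY-MM'
        commits[month] += 1
        authors[month].add(email.strip().lower())
    return {m: (commits[m], len(authors[m])) for m in sorted(commits)}


# Invented sample in the same format that read_git_log() produces.
sample_lines = [
    "alice@example.com|2014-03-02",
    "bob@example.org|2014-03-15",
    "alice@example.com|2014-03-20",
    "carol@example.com|2014-02-11",
]
stats = monthly_commit_stats(sample_lines)
print(stats)
```

For a real repository, `monthly_commit_stats(read_git_log("/path/to/clone"))` would produce the same structure. Covering a project with many sub-projects means repeating this for each clone and merging the results.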

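It is worth noting that git is a distributed version control system. Developers usually work in a local repository, and a commit records the change only in that local repository. The change does not reach the main repository until the developer pushes it. Many developers prefer to accumulate a number of commits before pushing, so some recent commits are missing from our statistics. In our experience, the commit count for the most recent month is underestimated by about 50%, and the count for the month before it by about 20%.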

13

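Figure 10 shows the number of commits per month for each of the four projects. Overall, OpenStack commits far more frequently than the other three projects, because its data source covers 90 sub-projects in total. CloudStack also commits noticeably more often than Eucalyptus and OpenNebula. Compared with OpenNebula, Eucalyptus does not commit infrequently either, but its counts fluctuate considerably, with the clear signature of batch updates. OpenNebula commits less frequently, averaging about 200 commits per month.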

14

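Figure 11 shows the number of commits per month for the sub-projects of OpenStack. Overall, the Nova sub-project commits most frequently, about three times as often as the other sub-projects. Notably, although commit frequencies differ across sub-projects, their time-series curves are largely consistent, with peaks and troughs falling in the same periods. This suggests that the sub-projects, although relatively independent, follow the same or similar development plans and schedules. OpenStack can be said to manage and coordinate its sub-projects fairly well.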

15

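Figure 12 shows the number of people committing code per month for each of the four projects. Overall, OpenStack has far more committers than the other three projects and continues to grow rapidly. The number of CloudStack committers has also grown, but more slowly. Eucalyptus and OpenNebula have relatively few committers, with essentially no growth over the past 12 months.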

16

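Figure 13 shows the number of people committing code per month for the sub-projects of OpenStack. Overall, the Nova sub-project has the most committers, about three times as many as the other sub-projects.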

17

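A contributor's organization is commonly identified by the email address the contributor uses. Although this method has significant flaws (for example, some organizations encourage employees to contribute to open source projects in a personal capacity), it still reveals to some extent how much different organizations contribute to an open source project. Figure 14 shows the number of distinct email domains committing code to each of the four projects per month. Overall, OpenStack has far more committing domains than the other three projects and continues to grow rapidly. The number of committing domains for CloudStack has also grown, but more slowly. Eucalyptus and OpenNebula have relatively few committing domains, with essentially no growth over the past 12 months.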
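The domain-based attribution described above amounts to taking the part of each author email after the `@` and counting commits per domain. A minimal Python sketch follows. The helper names `domain_of` and `domain_shares` are hypothetical, the sample addresses are invented, and no attempt is made to merge several domains that belong to one organization.

```python
from collections import Counter


def domain_of(email):
    """Return the lower-cased part after the last '@' (the whole string if there is none)."""
    return email.rsplit("@", 1)[-1].strip().lower()


def domain_shares(commit_emails):
    """Return [(domain, percent of commits)], largest share first.

    commit_emails: one author email per commit.
    """
    counts = Counter(domain_of(e) for e in commit_emails)
    total = sum(counts.values())
    return [(domain, 100.0 * n / total) for domain, n in counts.most_common()]


# Invented sample: three commits from one domain, one from another.
sample = [
    "dev1@example.com", "dev2@example.com", "dev3@Example.com",
    "someone@example.org",
]
shares = domain_shares(sample)
print(shares)
```

The number of distinct committing domains in a period is simply `len(shares)` for the commits of that period.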

18

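Figure 15 shows the number of domains committing code per month for the sub-projects of OpenStack. Overall, the Nova sub-project has the most committing domains, about three times as many as the other sub-projects.

The table below lists, by email domain, the organizations that made the most commits to CloudStack, Eucalyptus, OpenNebula, and OpenStack during CY13-Q4, along with their percentage of commits. Eucalyptus and OpenNebula are open source projects dominated by a single organization, while CloudStack and OpenStack are projects on which multiple organizations collaborate. For CloudStack, the influence of Citrix is still very visible: addresses directly from citrix.com and cloud.com account for 41% of commits (down 7% from CY13-Q3). For OpenStack, RedHat contributed 16%, IBM 10%, Rackspace, Mirantis, HP, and Suse 4% each, and eNovance 2%. It is worth mentioning that Huawei contributed 1%.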

 CloudStack              Eucalyptus              OpenNebula           OpenStack
 Domain             %    Domain             %    Domain          %    Domain         %
 citrix.com         39   eucalyptus.com     77   opennebula.org  98   redhat.com     16
 gmail.com          20   gmail.com          21   cuesoft.eu      0.6  gmail.com      16
 apache.org         15   fedoraproject.org  1    -               -    ibm.com        10
 clogeny.com        6    -                  -    -               -    openstack.org  8
 shubergphilis.com  5    -                  -    -               -    rackspace.com  4
 leaseweb.com       3    -                  -    -               -    mirantis.com   4
 cloud.com          2    -                  -    -               -    hp.com         4
 netapp.com         1    -                  -    -               -    suse.de        4
 betterservers.com  1    -                  -    -               -    enovance.com   2
 cloudops.com       0.6  -                  -    -               -    huawei.com     1

The table below lists, by email domain, the organizations that made the most commits to each OpenStack sub-project during CY13-Q4, along with their percentage of commits.

 Cinder  Glance  Horizon  Keystone
 Domain  %  Domain  %  Domain  %  Domain  %
 redhat.com  16  ibm.com  20  gmail.com  23  ibm.com  25
 huawei.com  13  gmail.com  15  redhat.com  12  dstanek.com  15
 gmail.com  11  enovance.com  10  hp.com  12  redhat.com  11
 openstack.org  9  mirantis.com  8  sheep.art.pl  10  gmail.com  7
 ibm.com  8  yahoo.com  7  da.jp.nec.com  6  metacloud.com  7
 solidfire.com  7  rackspace.com  7  intel.com  6  dreamhost.com  7
 netapp.com  4  dmllr.de  6  openstack.org  4  openstack.org  4
 hp.com  3  openstack.org  4  mirantis.com  3  huawei.com  3
 dmllr.de  2  mxa.nes.nec.co.jp  4  enovance.com  3  hp.com  2
 ebaysf.com  2  hp.com  3  ibm.com  1  ifca.unican.es  1

 

 Nova  Neutron  Swift
 Domain  %  Domain  %  Domain  %
 ibm.com  23  openstack.org  18  swiftstack.com  24
 redhat.com  23  gmail.com  17  gmail.com  19
 gmail.com  11  mirantis.com  8  redhat.com  19
 rackspace.com  4  nicira.com  7  not.mn  7
 vmware.com  4  redhat.com  7  enovance.com  5
 hp.com  3  ibm.com  4  brim.net  4
 intel.com  3  unitedstack.com  4  kotori.zaitcev.us  3
 openstack.org  2  da.jp.nec.com  3  hgst.com  2
 stillhq.com  2  cisco.com  3  rackspace.com  2
 codestud.com  1  enovance.com  2  weirdlooking.com  1

19

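Accumulated developer population is the total number of developers who have ever committed code to a project. Figure 16 shows the growth of the developer population for the four projects. OpenStack has the largest accumulated developer population, about 10 times that of second-place CloudStack.

We also calculated the ratio of developers active in the last quarter to the accumulated developer population: 37.1% for OpenStack, 35.4% for CloudStack, 25.6% for OpenNebula, and 22.7% for Eucalyptus.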

20

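Accumulated contributing organizations is the number of organizations that have ever committed code to a project. Figure 17 shows the growth of the number of accumulated contributing organizations for the four projects. OpenStack has the most, about 5 times as many as CloudStack and Eucalyptus. OpenNebula has relatively few accumulated contributing organizations.

We also calculated the ratio of organizations active in the last quarter to the accumulated number of contributing organizations: 32.6% for OpenStack, 26.2% for CloudStack, 24.0% for OpenNebula, and 20% for Eucalyptus.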

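A PDF version of the slides accompanying this article can be downloaded from here. If you redistribute this content, please retain the author information.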

 

CY13-Q4 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack

By , January 2, 2014 9:39 pm

This article is an updated version of my previous article CY13-Q3 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack. Readers who are interested in further discussion are welcome to contact me via email at the above-mentioned address.

A Chinese version of this article is published at the same time, which can be found at CY13-Q4 OpenStack, OpenNebula, Eucalyptus, CloudStack社区活跃度比较.

This community analysis project was initiated in CY11-Q4, and this report is the 9th quarterly report published since then. The author served briefly as the Director of Customer Success in China for Eucalyptus Systems Inc between October 2012 and July 2013. However, the opinions presented in this report belong strictly to the author rather than to any current or previous employer of the author.

01

The objective of this quarterly report is to compare the OpenStack, OpenNebula, Eucalyptus and CloudStack user and developer communities, based on the communications between community members in the form of mailing lists or public forum discussions. The data being discussed include the total number of topics (threads), messages (posts), and participants (unique email addresses or registered members). To obtain the above-mentioned data, a Java program was written to retrieve all the forum posts and mailing list messages into a MySQL database for further processing. The analysis results are presented in the form of graphs generated by LibreOffice.

During the past several years, some of the early forums and mailing lists became EOL’ed and were no longer accessible. The MySQL database that was built at the beginning of this project, as well as the previous versions of this quarterly report, make it possible to carry out analysis since the beginning of each project.

For projects with multiple membership systems (such as a forum and a mailing list), extensive efforts were carried out to eliminate membership double counting (counting one person twice or more in the statistics).

03

04

Figures 1 and 2 represent the monthly number of topics (threads) and posts (messages). It can be seen that

(1) During the past 12 months, OpenStack-related discussions and CloudStack-related discussions were approximately on the same level, while Eucalyptus-related discussions and OpenNebula-related discussions were approximately on the same level.

(2) During the past 12 months, the volume of OpenStack and CloudStack related discussions was much higher than that of Eucalyptus and OpenNebula.

05

Generally speaking, the number of replies to a specific topic represents the attention being received, and the depth of discussion for that particular topic. When the number of master posts (the original post that started a particular topic) is more than the number of replies, it is safe to conclude that the participation of the forum or mailing list is very low. Therefore, the ratio between “the number of  posts” and “the number of topics” represents the participation rate of an online community. In this study we call this ratio the Participation Ratio.

As can be seen from Figure 3, during the past 12 months the participation ratios of CloudStack and Eucalyptus were relatively high, close to 4; the participation ratios of OpenStack and OpenNebula were relatively low, a little higher than 2.

We do notice that the concept of “participation ratio” generated some disagreements. Some people think that a lower “post-to-thread ratio” represents the ability to resolve problems in a very short time, so that only a very limited number of posts is needed. Some people think that a higher “post-to-thread ratio” might be an indicator that the community is caught in a flame war, during which a large portion of the posts might be off-topic. In any case, we agree that calling this parameter “participation ratio” reflects our own opinion to some degree and undermines the objectivity of this report. However, because we have not found a better name for this parameter, we will keep using it for the time being. (Dear readers, you are more than welcome to contribute a better name for this parameter.)

06

Figure 4 shows the number of monthly participants of the four projects being discussed. It can be seen that the number of active participants of OpenStack is much higher than the other three projects. The number of active participants of CloudStack is also significantly higher than OpenNebula and Eucalyptus. During the past 12 months, the number of active participants for OpenStack and CloudStack were growing steadily (100% growth for OpenStack, and 50% growth for CloudStack), while the number of active participants for Eucalyptus and OpenNebula exhibited no significant growth.

It should be noted that although the number of active participants of the CloudStack project is somewhat smaller than that of the OpenStack project, both projects have approximately the same amount of discussions (as shown in Figure 1 and Figure 2).

07

Accumulated Community Population refers to the total number of users and developers who have participated in forum or mailing list discussions. (This number does not include those who have registered on discussion forums or mailing lists but have never participated in any open discussions.) These are people who have tested or used a specific product for a while, but who are not necessarily active users today.

Figure 5 shows the growth of the accumulated community populations of these 4 projects. Currently OpenStack has the largest accumulated community population, followed by Eucalyptus, CloudStack, and OpenNebula.

The problem is that, after years of changes, a long-term (such as 4 to 5 years) accumulated community population might not be a good reference for community activeness. Some of the early members of one community might have switched to other communities (and probably more than once), and some of the early community media (such as mailing lists and forums) might have become EOL’ed. From a community analysis point of view, it might be better to count the accumulated community population of the past 6 to 12 months, while extending the range back to the age of the dinosaurs will make this parameter meaningless.

08

Figure 6 shows the monthly population growth of the four projects being discussed. During the past 6 months, the population of OpenStack has been growing much faster than those of the other three projects. CloudStack is also exhibiting significant growth, but not as fast as OpenStack.

The populations of Eucalyptus and OpenNebula are growing at relatively slow paces, as compared to CloudStack and OpenStack.

09

Figure 7 is a combination of Figure 4 and Figure 6. The solid lines represent the monthly participants, while the dashed lines represent the monthly new members.

During the past 12 months, for OpenStack and CloudStack, around 30% of their monthly participants are new members.  For OpenNebula and Eucalyptus, around 50% of their monthly participants are new members. This indicates OpenStack and CloudStack communities are becoming more “sticky” than OpenNebula and Eucalyptus communities.

10

Figure 8 shows the total community population, active participants of the past quarter, and active participants of the past month, of the four projects being discussed. It can be seen that

(1) OpenStack has the largest total population, followed by Eucalyptus, CloudStack, and OpenNebula;

(2) OpenStack has the largest active population during the past quarter, followed by CloudStack, Eucalyptus, and OpenNebula;

(3) OpenStack has the largest active population during the past month, followed by CloudStack, Eucalyptus, and OpenNebula.

We also calculated the ratio of  active population during the past quarter over total population. The result is 32.4% for OpenStack, 21.3% for CloudStack, 10.5% for OpenNebula, and 4.8% for Eucalyptus. Obviously a significant portion of OpenStack and CloudStack users choose to stay, while the majority of Eucalyptus users decide to leave. Such observation is in accordance with what we saw in Figure 7 - OpenStack and CloudStack communities are more “sticky” than OpenNebula and Eucalyptus communities.

11

In our CY12-Q3 report, we introduced the concept of “Community Activeness Index”. Starting from CY13-Q3, we made some minor revisions to this parameter, and it is now the combination of the following parameters:

(1) quarterly messages, which represents the volume of the discussions;

(2) quarterly participation ratio, which represents the average number of answers to a question; and

(3) active population of the past quarter, which represents the possibility to get help from community in the long term.

In this analysis, we choose the average values of these parameters as the reference data set, and compare the corresponding parameters of each community with the reference data set. Then we call the sum of the relative values of a community the “community activeness index” of the community. Now we can say the project with the highest “community activeness index” is THE most active project in this area.

As can be seen from Figure 9, OpenStack is currently THE most active project (with obvious advantage), followed by CloudStack, OpenNebula, and Eucalyptus.

12

We are seeing an increasing number of suggestions to analyze the git activity of these open source IaaS projects. We also noticed that all four projects use git as the version control system for their source code. Starting from our CY13-Q1 report, we have carried out some basic analysis based on the git log data. It should be noted that for the OpenStack project, the data source includes all the sub-projects under openstack (57 sub-projects) and openstack-infra (33 sub-projects) on github.com.

In our CY13-Q1 report, we used “git log” to obtain log information. Starting from CY13-Q2, we use “git log --no-merges” to obtain log information.

It should be pointed out that git is a distributed version control system. With git, developers work with their own local repositories. When a developer executes a commit operation, the code changes are made to the local repository, and will not be reflected in the master repository until such commits are pushed to and merged with the master repository. It is common practice for developers to accumulate many commits before they feel comfortable making a push. Therefore, some of the recent commits might not get counted in this analysis. Based on our observations, the number of commits is underestimated by about 50% for the previous month, and by about 20% for the month before.
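As a rough illustration of what these percentages imply, the Python sketch below scales an observed commit count up to an estimate of the eventual count. It assumes that “underestimated by 50%” means the observed count is 50% below the eventual count, which is one possible reading; this report does not apply such a correction to its figures. The function name `corrected_commits` and the sample counts are hypothetical.

```python
def corrected_commits(observed, underestimation):
    """Estimate the eventual commit count from an observed one.

    underestimation: the fraction of the eventual count that is assumed
    to be missing so far, e.g. 0.5 for the most recent month.
    """
    if not 0 <= underestimation < 1:
        raise ValueError("underestimation must be in [0, 1)")
    return observed / (1.0 - underestimation)


# Invented observed counts for the two most recent months.
latest = corrected_commits(100, 0.5)    # assumes half of the commits are not yet visible
previous = corrected_commits(160, 0.2)  # assumes a fifth of the commits are not yet visible
print(latest, previous)
```

Under this reading, both sample months would end up at about 200 commits once all local commits have been pushed.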

13

Figure 10 shows the monthly number of commit operations for these four projects. Generally speaking, the commit frequency of OpenStack is much higher than the commit frequencies of the other three projects. This is because the data source for OpenStack includes a total number of 90 sub-projects, which is far greater than the other three projects. The commit frequency of CloudStack is also significantly higher than Eucalyptus and OpenNebula. As compared to OpenNebula, Eucalyptus was also committing more frequently, but with significant fluctuations from month to month, which seems to be a typical batch-commit behavior. The commit frequency of the OpenNebula project is relatively small, with an average of 200 commits per month.

14

Figure 11 shows the monthly number of commit operations for the sub-projects of OpenStack. Generally speaking, the commit frequency of the Nova sub-project is about 3 times as high as that of the other sub-projects. It should be noted that although the commit frequencies of these sub-projects differ, they exhibit similar time-series curves, and their highs and lows occur in the same periods. This indicates that although these sub-projects are relatively independent, they work around the same development plan and the same release schedule. This is an indicator that the OpenStack project is well organized in terms of sub-project management.

15

Figure 12 shows the monthly number of contributors (identified by unique email addresses) for these projects. Generally speaking, the number of OpenStack contributors is much higher than the other three projects, and is growing rapidly. The number of CloudStack contributors also exhibits some growth, but the growth is relatively slow. The number of Eucalyptus and OpenNebula contributors is relatively small, and does not exhibit growth during the past 12 months.

16

Figure 13 shows the monthly number of contributors (identified by unique github.com accounts) for the sub-projects of OpenStack. It can be seen that the number of Nova contributors is about 3 times as big as the other sub-projects.

17

People usually try to identify the institute to which a contributor belongs by his/her email address. It is true that this method is inherently flawed (different institutes have different policies regarding contributing to open source projects, and some institutes even encourage their employees to contribute to open source projects with their personal accounts), but this parameter can still be used to show the contributions of certain institutes to certain open source projects. Figure 14 shows the monthly number of unique institutes (identified by the domain name of the contributor’s email address) contributing to these projects. We can see that the number of contributing institutes for OpenStack is much larger than for the other three projects, and is growing rapidly. The number of contributing institutes for CloudStack is also growing, but at a relatively slow pace. The number of contributing institutes for Eucalyptus and OpenNebula is relatively small, and does not exhibit any growth during the past 12 months.

18

Figure 15 shows the monthly number of contributing institutes to the sub-projects of OpenStack. It can be seen that the number of contributing institutes for Nova is about 3 times as big as the other sub-projects.

The following table lists the institutes that made the most contributions to these projects during CY13-Q4, according to the number of commit operations, along with the percentage of their commit operations. It can be seen that both Eucalyptus and OpenNebula are open source projects dominated by a single institute, while CloudStack and OpenStack are open source projects with contributions from multiple institutes. For the CloudStack project, the influence of Citrix is still quite obvious: over 41% of the commits come from accounts belonging to citrix.com and cloud.com (a 7% decrease as compared to CY13-Q3). For the OpenStack project, redhat.com contributed 16% of the commits, while ibm.com contributed 10% of the commits. Rackspace.com, mirantis.com, hp.com and suse.de contributed another 4% of the commits each. It should be pointed out that huawei.com (headquartered in China) contributed 1% of the commits.

 CloudStack              Eucalyptus              OpenNebula           OpenStack
 Domain             %    Domain             %    Domain          %    Domain         %
 citrix.com         39   eucalyptus.com     77   opennebula.org  98   redhat.com     16
 gmail.com          20   gmail.com          21   cuesoft.eu      0.6  gmail.com      16
 apache.org         15   fedoraproject.org  1    -               -    ibm.com        10
 clogeny.com        6    -                  -    -               -    openstack.org  8
 shubergphilis.com  5    -                  -    -               -    rackspace.com  4
 leaseweb.com       3    -                  -    -               -    mirantis.com   4
 cloud.com          2    -                  -    -               -    hp.com         4
 netapp.com         1    -                  -    -               -    suse.de        4
 betterservers.com  1    -                  -    -               -    enovance.com   2
 cloudops.com       0.6  -                  -    -               -    huawei.com     1

The following table lists those institutes that make the most contributions to the sub-projects of OpenStack during CY13-Q4, along with the percentage of their commit operations.

 Cinder  Glance  Horizon  Keystone
 Domain  %  Domain  %  Domain  %  Domain  %
 redhat.com  16  ibm.com  20  gmail.com  23  ibm.com  25
 huawei.com  13  gmail.com  15  redhat.com  12  dstanek.com  15
 gmail.com  11  enovance.com  10  hp.com  12  redhat.com  11
 openstack.org  9  mirantis.com  8  sheep.art.pl  10  gmail.com  7
 ibm.com  8  yahoo.com  7  da.jp.nec.com  6  metacloud.com  7
 solidfire.com  7  rackspace.com  7  intel.com  6  dreamhost.com  7
 netapp.com  4  dmllr.de  6  openstack.org  4  openstack.org  4
 hp.com  3  openstack.org  4  mirantis.com  3  huawei.com  3
 dmllr.de  2  mxa.nes.nec.co.jp  4  enovance.com  3  hp.com  2
 ebaysf.com  2  hp.com  3  ibm.com  1  ifca.unican.es  1

 

 Nova  Neutron  Swift
 Domain  %  Domain  %  Domain  %
 ibm.com  23  openstack.org  18  swiftstack.com  24
 redhat.com  23  gmail.com  17  gmail.com  19
 gmail.com  11  mirantis.com  8  redhat.com  19
 rackspace.com  4  nicira.com  7  not.mn  7
 vmware.com  4  redhat.com  7  enovance.com  5
 hp.com  3  ibm.com  4  brim.net  4
 intel.com  3  unitedstack.com  4  kotori.zaitcev.us  3
 openstack.org  2  da.jp.nec.com  3  hgst.com  2
 stillhq.com  2  cisco.com  3  rackspace.com  2
 codestud.com  1  enovance.com  2  weirdlooking.com  1

19

 

Accumulated Developer Population refers to the total number of developers who have contributed code to a particular project (as reflected in git commits). Figure 16 shows the growth of the accumulated developer populations of these 4 projects. Currently OpenStack has the largest accumulated developer population, about 10 times that of CloudStack, a distant second.

We also calculated the ratio of  active developers during the past quarter over accumulated developer population. The result is 37.1% for OpenStack, 35.4% for CloudStack, 25.6% for OpenNebula, and 22.7% for Eucalyptus.

20

Accumulated Contributing Organizations refers to the total number of organizations (as reflected in unique domain names associated with developer email addresses) who have contributed code to a particular project (as reflected in git commits). Figure 17 shows the growth of the accumulated contributing organizations of these 4 projects. Currently OpenStack has the largest number of contributing organizations, which is 5 times larger than CloudStack and Eucalyptus. OpenNebula has the smallest number of contributing organizations.

We also calculated the ratio of  active contributing organizations during the past quarter over accumulated contributing organizations. The result is 32.6% for OpenStack, 26.2% for CloudStack, 24.0% for OpenNebula, and 20% for Eucalyptus.

For your convenience, a PDF version of this presentation can be downloaded from here. Please kindly keep the author information if you want to redistribute the content.

 
