This article is an updated version of my previous article CY14-Q1 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack. Readers who are interested in further discussion are welcome to contact me via email at the above-mentioned address.
This community analysis project was initiated in CY11-Q4, and this report is the 11th quarterly report published since then (I skipped CY14-Q2, CY14-Q3 and CY14-Q4 for personal reasons). Traditionally I also publish a Chinese version of the report along with the English version. Unfortunately I don’t have the capacity to do a Chinese translation for this particular report. (My apologies to my friends back in China.)
It should be noted that the opinion presented in this report belongs strictly to the author rather than any current or previous employer of the author.
The objective of this quarterly report is to compare the OpenStack, OpenNebula, Eucalyptus and CloudStack user and developer communities, based on the communications between community members in the form of mailing lists or public forum discussions. The data being discussed include the total number of topics (threads), messages (posts), and participants (unique email addresses or registered members). To obtain the above-mentioned data, a Java program was written to retrieve all the forum posts and mailing list messages into a MySQL database for further processing. The analysis results are presented in the form of graphs generated by LibreOffice.
During the past several years, some of the early forums and mailing lists were EOL’ed and are no longer accessible. The MySQL database that was built at the beginning of this project, together with the previous versions of this quarterly report, makes it possible to carry out analysis from the beginning of each project.
For projects with multiple membership systems (such as a forum and a mailing list), extensive efforts were carried out to eliminate membership double counting (counting one person twice or more in the statistics).
There have been many significant changes in the open source IaaS community since my CY14-Q1 report. Eucalyptus was acquired by HP; Sheng Liang left Citrix and started RancherOS; CloudScaling was acquired by EMC; MetaCloud was acquired by Cisco; and eNovance and Inktank were acquired by Red Hat. And, sadly, Nebula shut down just a few days ago. These events have changed, and will continue to change, the landscape of the open source IaaS community.
It should be noted that in January 2015 the OpenNebula community moved to https://forum.opennebula.org, and the original mailing lists became inactive. This new data source has not been added to this analysis. As a result, the data presented here does not represent the actual status of the OpenNebula community. I will add the new data source to my next report.
Figures 1 and 2 show the monthly number of topics (threads) and posts (messages). It can be seen that during the past 12 months, OpenStack-related discussions continued to exhibit strong (close to linear) growth. CloudStack-related discussions were declining at a rapid rate. The volume of discussions around OpenNebula and Eucalyptus was still very small, and both exhibited a tendency to decline.
Generally speaking, the number of replies to a specific topic represents the attention being received, and the depth of discussion for that particular topic. When the number of master posts (the original post that started a particular topic) is more than the number of replies, it is safe to conclude that the participation of the forum or mailing list is very low. Therefore, the ratio between “the number of posts” and “the number of topics” represents the participation rate of an online community. In this study we call this ratio the Participation Ratio.
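As a minimal sketch of the definition above, the Participation Ratio can be computed as follows. The function name and the sample counts are hypothetical and only illustrate the arithmetic; the sketch also assumes that the post count includes the master post of each topic.

```python
def participation_ratio(posts: int, topics: int) -> float:
    """Number of posts divided by number of topics for one period.

    Assumption: `posts` counts master posts as well as replies, so
    posts = topics + replies and the ratio equals 1 + replies / topics.
    """
    if topics <= 0:
        raise ValueError("topics must be a positive integer")
    return posts / topics


# Hypothetical monthly (posts, topics) counts, for illustration only.
monthly_counts = {"month-1": (1200, 400), "month-2": (450, 300)}
ratios = {m: participation_ratio(p, t) for m, (p, t) in monthly_counts.items()}
```

Under that assumption, a ratio below 2 corresponds to the low-participation case described above, in which master posts outnumber replies.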
As can be seen from Figure 3, during the past 12 months the participation ratios of CloudStack and Eucalyptus were relatively high, at close to 4; the participation ratios of OpenStack and OpenNebula were lower, at a little above 3. (OpenNebula exhibited a significant decline in this respect during the past 12 months. The reason is that the OpenNebula community is moving to a new forum, https://forum.opennebula.org/, which is not yet included in this report. Thank you Tim Bell for pointing this out.)
Figure 4 shows the active participants of the four projects being discussed. It can be seen that the number of active participants of OpenStack is much higher than that of the other three projects. The number of active participants of CloudStack is also significantly higher than that of OpenNebula and Eucalyptus. Looking at the breakdown figures, the number of active participants for OpenStack was growing steadily, while the number of active participants for CloudStack, Eucalyptus and OpenNebula decreased significantly.
To understand the development activities of these four open source IaaS projects, we carry out git log analysis to extract information about contributing developers and organizations, as well as the frequency of commit activities. We take advantage of the fact that all four projects use git as the SCM for their source code. Therefore, we use “git log --no-merges” to obtain log information from the git repositories. The extracted log information was dumped into a MySQL database for further analysis. It should be noted that for the OpenStack project, the data source includes all the sub-projects under openstack (137 sub-projects) and openstack-infra (114 sub-projects) on github.com.
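The log extraction step can be sketched roughly as below. This is not the Java program used for the report; it is an illustrative Python equivalent. The `--no-merges` option and the `%H`, `%ae` and `%aI` format placeholders are standard git, while the `Commit` type, the function names, and the choice of `|` as a field separator are assumptions made for this example.

```python
import subprocess
from typing import List, NamedTuple


class Commit(NamedTuple):
    sha: str
    email: str
    date: str  # author date in strict ISO 8601 form


# Commit hash | author email | author date (ISO 8601).
LOG_FORMAT = "%H|%ae|%aI"


def parse_log(text: str) -> List[Commit]:
    """Parse the output of `git log --pretty=format:%H|%ae|%aI`."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The hash and the date never contain "|", so split from both ends.
        sha, rest = line.split("|", 1)
        email, date = rest.rsplit("|", 1)
        commits.append(Commit(sha, email.strip().lower(), date))
    return commits


def read_commits(repo_path: str) -> List[Commit]:
    """Run git in `repo_path` (a local clone) and return non-merge commits."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--no-merges",
         "--pretty=format:" + LOG_FORMAT],
        check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)
```

Each `Commit` could then be written as one row of a database table, which is the role MySQL plays in this analysis.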
It should be pointed out that git is a distributed version control system. With git, developers work with their own local repositories. When a developer executes a commit operation, the code changes are made to the local repository, and will not be reflected in the master repository until the commits are pushed to and merged into the master repository. It is common practice for developers to accumulate many commits before they feel comfortable making a push. Therefore, some of the recent commits might not be counted in this analysis. Based on our observations, the number of commits for the previous month is underestimated by about 50%, and the number of commits for the month before that by about 20%.
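To make the effect of this lag concrete, here is a rough sketch of how an observed count could be scaled up to an estimate of the eventual count. It assumes that "about 50% underestimation" means the observed count is about half of the eventual count; that reading, the function name, and the sample numbers are assumptions for illustration. The figures in this report are not adjusted in this way.

```python
# Approximate fraction of commits still missing, keyed by how many
# months back the month is (1 = previous month, 2 = the month before).
UNDERESTIMATION = {1: 0.50, 2: 0.20}


def estimate_final_commits(observed: int, months_back: int) -> float:
    """Scale an observed monthly commit count by the expected shortfall.

    Months older than those listed in UNDERESTIMATION are treated as complete.
    """
    missing = UNDERESTIMATION.get(months_back, 0.0)
    return observed / (1.0 - missing)
```

For example, under these assumptions 500 commits observed for the previous month would suggest roughly 1,000 commits once all pending work has been pushed and merged.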
Figure 10 shows the monthly number of commit operations for these four projects. Generally speaking, the commit frequency of OpenStack is much higher than the commit frequencies of the other three projects. This is because the data source for OpenStack includes a total number of 251 sub-projects, which is far greater than the other three projects. The commit frequency of CloudStack is slightly higher than Eucalyptus and OpenNebula.
Figure 11 shows the monthly number of commit operations for the 7 major sub-projects of OpenStack (Cinder, Glance, Horizon, Keystone, Nova, Neutron, Swift). Generally speaking, the commit frequency of the Nova sub-project is about 2 to 3 times as high as that of the other sub-projects. It should be noted that although the commit frequencies of these sub-projects differ, they exhibit similar time-series curves, and their highs and lows occur in the same periods. This indicates that although these sub-projects are relatively independent, they work around the same development plan and the same release schedule. This is an indicator that the OpenStack project is well organized in terms of sub-project management.
Figure 12 shows the monthly number of contributors (identified by unique email addresses) for these projects. Generally speaking, the number of OpenStack contributors is much higher than that of the other three projects. Looking at the breakdown figures, the number of active contributors for OpenStack was growing steadily, while the number of active contributors for CloudStack and Eucalyptus decreased significantly. For OpenNebula, the number of active contributors seemed to be quite stable, but the size of the whole developer community was relatively small.
Figure 13 shows the monthly number of contributors (identified by unique github.com accounts) for the 7 major sub-projects of OpenStack (Cinder, Glance, Horizon, Keystone, Nova, Neutron, Swift). During the past 12 months, the number of active contributors for Nova was decreasing, while the number of active contributors for Neutron, Horizon, and Cinder was increasing. There was not much change observed in Glance, Keystone, and Swift.
People usually try to identify the institute to which a contributor belongs by his/her email address. It is true that this method is inherently flawed (different institutes have different policies regarding contributions to open source projects, and some institutes even encourage their employees to contribute to open source projects with their personal accounts), but this parameter can still be used to show the contributions of certain institutes to certain open source projects. Figure 14 shows the monthly number of unique institutes (identified by the domain name of the contributor’s email address) contributing to these projects. We can see that the number of contributing institutes for OpenStack is much larger than that of the other three projects, and is growing rapidly. During the same period, the number of contributing institutes for CloudStack, Eucalyptus, and OpenNebula did not exhibit any growth. For CloudStack, the number of active contributing organizations seemed to be decreasing.
Figure 15 shows the monthly number of contributing institutes to the 7 major sub-projects of OpenStack (Cinder, Glance, Horizon, Keystone, Nova, Neutron, Swift). During the past 12 months, the number of active contributing organizations for Nova was decreasing, while the number of active contributing organizations for Neutron, Horizon, and Cinder was increasing. There was not much change observed in Glance, Keystone, and Swift.
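The domain-based identification of institutes can be sketched as below. The function names and the `(month, email)` input shape are assumptions for this example; the sketch deliberately does nothing about personal accounts, which is exactly the limitation noted above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def email_domain(address: str) -> str:
    """Return the part after the last '@', lower-cased.

    An address without '@' is returned unchanged (lower-cased), so
    malformed entries stay visible instead of being silently dropped.
    """
    return address.strip().lower().rpartition("@")[2]


def institutes_per_month(commits: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct email domains per month from (month, email) pairs."""
    domains_by_month = defaultdict(set)
    for month, email in commits:
        domains_by_month[month].add(email_domain(email))
    return {month: len(domains) for month, domains in domains_by_month.items()}
```

Two developers who commit from the same domain are counted as one institute, and one developer who uses both a company address and a personal address is counted as two.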
The following table lists the organizations that made the most contributions to these projects during CY15-Q1, according to the number of commit operations, along with the percentage of their commit operations. It can be seen that both Eucalyptus and OpenNebula are open source projects dominated by a single institute, while CloudStack and OpenStack are open source projects with contributions from multiple institutes. For the CloudStack project, the influence of Citrix has diminished sharply. In CY15-Q1, only 5.8% of the contributions came from citrix.com, compared with 44% in CY14-Q1 (combined contribution from citrix.com and cloud.com). For the OpenStack project, redhat.com contributed 7.3% of the commits and ibm.com contributed 5.0%, followed by mirantis.com (4.7%), hp.com (4.6%), rackspace.com (1.6%), intel.com (1.4%), yahoo-inc.com (1.2%), doughellmann.com (1.1%), and cisco.com (0.8%).
The following table lists the organizations that made the most contributions to the major sub-projects in OpenStack during CY15-Q1, according to the number of commit operations, along with the percentage of their commit operations.
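The per-organization percentages are simply each domain's share of the quarter's commits. A minimal sketch, assuming the input is one author email per commit; the function name and the sample addresses are hypothetical and are not data from this report.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def commit_share(emails: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (domain, commits, percent of all commits), largest first.

    `emails` holds one author email address per commit.
    """
    counts = Counter(e.strip().lower().rpartition("@")[2] for e in emails)
    total = sum(counts.values())
    return [(domain, n, round(100.0 * n / total, 1))
            for domain, n in counts.most_common()]


# Hypothetical commit authors, for illustration only.
sample = ["a@example.com"] * 3 + ["b@example.org"]
shares = commit_share(sample)
```

With the sample above, `shares` contains example.com at 75.0% and example.org at 25.0%.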
Accumulated Developer Population refers to the total number of developers who have contributed code to a particular project (as reflected in git commits). Figure 16 shows the growth of the accumulated developer populations of these 4 projects. Currently OpenStack has the largest accumulated developer population, which is about 10 times that of the distant number 2, CloudStack.
Accumulated Contributing Organizations refers to the total number of organizations (as reflected in unique domain names associated with developer email addresses) that have contributed code to a particular project (as reflected in git commits). Figure 17 shows the growth of the accumulated contributing organizations of these 4 projects. Currently OpenStack has the largest number of contributing organizations, which is 5 times larger than that of CloudStack. Eucalyptus and OpenNebula have only a very small number of contributing organizations.
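An accumulated count of this kind is a running total of identities seen for the first time. The sketch below assumes `(month, identity)` pairs with months written as "YYYY-MM" strings so that they sort chronologically; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def accumulated_population(commits: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each month to the number of distinct identities seen so far.

    `commits` yields (month, identity) pairs, where month is "YYYY-MM".
    """
    first_month = {}  # identity -> earliest month in which it appears
    months = set()
    for month, who in commits:
        months.add(month)
        if who not in first_month or month < first_month[who]:
            first_month[who] = month

    newcomers = Counter(first_month.values())
    result = {}
    total = 0
    for month in sorted(months):
        total += newcomers[month]  # a Counter returns 0 for a missing month
        result[month] = total
    return result
```

Passing developer email addresses as identities gives an accumulated developer population, while passing email domains gives an accumulated count of contributing organizations.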
For your convenience, a PDF version of this presentation can be downloaded from here. Please kindly keep the author information if you want to redistribute the content.
The Java program being used to dump git logs into MySQL database is now available on github:
Safe Harbor Statement
Qingye Jiang (John) is a Senior Member of IEEE. He is currently a full-time graduate student (Master of Philosophy) in the School of Information Technologies at the University of Sydney. His research interests include parallel and distributed computing, high performance computing, open source communities, as well as the impact of technology advancements on human society. This report is part of his ongoing research on the growth of open source communities (started in 2011).
Qingye Jiang (John) is at the same time a full-time employee of Amazon Web Services (AWS). However, this report is not part of his duties with AWS. The opinions presented in this report strictly belong to the author himself, and do not reflect the opinions of his employer.
If you want to quote this report, please refer to the author as “Qingye Jiang (John) from the University of Sydney”.
The author would like to thank the following people:
- Young Choon Lee (Lecturer, Macquarie University), Joseph Davis (Professor, University of Sydney), and Albert Y. Zomaya (Professor, University of Sydney), for their guidance and insightful discussions.
- Randy Bias (VP of Technology, EMC), for reminding me to come up with an updated version of this community analysis.