This article is an updated version of my previous article, CY13-Q3 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack. Readers who are interested in further discussion are welcome to contact me by email at the above-mentioned address.
A Chinese version of this article, published at the same time, can be found at CY13-Q4 OpenStack, OpenNebula, Eucalyptus, CloudStack社区活跃度比较.
This community analysis project was initiated in CY11-Q4, and this report is the 9th quarterly report published since then. The author served briefly as the Director of Customer Success in China for Eucalyptus Systems Inc between October 2012 and July 2013. However, the opinions presented in this report belong strictly to the author rather than to any current or previous employer of the author.
The objective of this quarterly report is to compare the OpenStack, OpenNebula, Eucalyptus and CloudStack user and developer communities, based on the communications between community members in the form of mailing lists or public forum discussions. The data being discussed include the total number of topics (threads), messages (posts), and participants (unique email addresses or registered members). To obtain these data, a Java program was written to retrieve all the forum posts and mailing list messages into a MySQL database for further processing. The analysis results are presented in the form of graphs generated by LibreOffice.
During the past several years, some of the early forums and mailing lists reached end of life (EOL) and are no longer accessible. The MySQL database that was built at the beginning of this project, together with the previous versions of this quarterly report, makes it possible to carry out analysis covering the period since the beginning of each project.
For projects with multiple membership systems (such as a forum and a mailing list), considerable effort was made to eliminate membership double counting (counting one person twice or more in the statistics).
Figures 1 and 2 show the monthly number of topics (threads) and posts (messages). It can be seen that
(1) During the past 12 months, OpenStack-related discussions and CloudStack-related discussions were at approximately the same level, while Eucalyptus-related discussions and OpenNebula-related discussions were at approximately the same level.
(2) During the past 12 months, the volume of OpenStack-related and CloudStack-related discussions was much higher than that of Eucalyptus and OpenNebula.
Generally speaking, the number of replies to a specific topic represents the attention being received, and the depth of discussion for that particular topic. When the number of master posts (the original post that started a particular topic) is more than the number of replies, it is safe to conclude that the participation of the forum or mailing list is very low. Therefore, the ratio between “the number of posts” and “the number of topics” represents the participation rate of an online community. In this study we call this ratio the Participation Ratio.
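As a minimal sketch of the ratio defined above, the snippet below divides the number of posts by the number of topics. The function name and the sample counts are hypothetical and are not taken from the data behind this report.

```python
def participation_ratio(posts: int, topics: int) -> float:
    """Return the number of posts per topic; 0.0 when there are no topics."""
    if topics == 0:
        return 0.0
    return posts / topics


# Hypothetical monthly (posts, topics) counts, for illustration only.
monthly = {"ProjectA": (4000, 1000), "ProjectB": (2100, 1000)}
ratios = {name: participation_ratio(p, t) for name, (p, t) in monthly.items()}
print(ratios)
```

A ratio close to 1 means that most topics received no reply at all, since the master post itself counts as one post.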
As can be seen from Figure 3, during the past 12 months the participation ratios of CloudStack and Eucalyptus were relatively high, at close to 4; the participation ratios of OpenStack and OpenNebula were relatively low, at a little higher than 2.
We do notice that the concept of “participation ratio” generated some disagreement. Some people think that a lower “post-to-thread ratio” represents the ability to resolve problems in a very short time, so that only a very limited number of replies are needed. Some people think that a higher “post-to-thread ratio” might indicate that the community is in the middle of a flame war, during which a large portion of the posts might be off-topic. We agree that calling this parameter “participation ratio” reflects our own opinion to some extent, and that it undermines the objectivity of this report. However, because we have not found a better name for this parameter, we will keep using it for the time being. (Dear readers, you are more than welcome to contribute a better name for this parameter.)
Figure 4 shows the number of monthly participants of the four projects being discussed. It can be seen that the number of active participants of OpenStack is much higher than that of the other three projects. The number of active participants of CloudStack is also significantly higher than that of OpenNebula and Eucalyptus. During the past 12 months, the number of active participants of OpenStack and CloudStack grew steadily (100% growth for OpenStack, and 50% growth for CloudStack), while the number of active participants of Eucalyptus and OpenNebula exhibited no significant growth.
It should be noted that although the number of active participants of the CloudStack project is somewhat smaller than that of the OpenStack project, both projects have approximately the same amount of discussions (as shown in Figure 1 and Figure 2).
Accumulated Community Population refers to the total number of users and developers who have participated in forum or mailing list discussions. (This number does not include those who have registered on discussion forums or mailing lists but have never participated in any open discussion.) These are people who have tested or used a specific product for a while, but who are not necessarily active users at present.
Figure 5 shows the growth of the accumulated community populations of these 4 projects. Currently OpenStack has the largest accumulated community population, followed by Eucalyptus, CloudStack, and OpenNebula.
The problem is that, after years of changes, a long-term (such as 4 to 5 years) accumulated community population might not be a good reference for community activeness. Some of the early members of one community might have switched to other communities (probably more than once), and some of the early community media (such as mailing lists and forums) might have reached end of life. From a community analysis point of view, it might be better to count the accumulated community population of the past 6 to 12 months; extending the range back to the dinosaur age makes this parameter meaningless.
Figure 6 shows the monthly population growth of the four projects being discussed. During the past 6 months, the population of OpenStack has been growing much faster than that of the other three projects. CloudStack is also exhibiting significant growth, but not as fast as OpenStack.
The populations of Eucalyptus and OpenNebula are growing at relatively slow paces, as compared to CloudStack and OpenStack.
Figure 7 is a combination of Figure 4 and Figure 6. The solid lines represent the monthly participants, while the dashed lines represent the monthly new members.
During the past 12 months, for OpenStack and CloudStack, around 30% of their monthly participants are new members. For OpenNebula and Eucalyptus, around 50% of their monthly participants are new members. This indicates OpenStack and CloudStack communities are becoming more “sticky” than OpenNebula and Eucalyptus communities.
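The share of new members among monthly participants can be sketched as follows. This is an illustration under our own assumptions: a participant counts as "new" in the first month in which his or her identifier appears, and the sample identifiers are made up.

```python
def new_member_share(monthly_participants):
    """Compute, per month, the share of participants seen for the first time.

    monthly_participants: list of (month, set_of_ids) in chronological order.
    Returns a list of (month, participants, new_members, share) tuples.
    """
    seen = set()
    rows = []
    for month, ids in monthly_participants:
        new = ids - seen  # identifiers never seen in any earlier month
        share = len(new) / len(ids) if ids else 0.0
        rows.append((month, len(ids), len(new), share))
        seen |= ids
    return rows


# Hypothetical participants, for illustration only.
sample = [
    ("2013-10", {"a@example.com", "b@example.com"}),
    ("2013-11", {"a@example.com", "c@example.com", "d@example.com", "e@example.com"}),
]
for row in new_member_share(sample):
    print(row)
```

A lower share means that more of the month's participants are returning members, which is what we loosely call "sticky" above.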
Figure 8 shows the total community population, active participants of the past quarter, and active participants of the past month, of the four projects being discussed. It can be seen that
(1) OpenStack has the largest total population, followed by Eucalyptus, CloudStack, and OpenNebula;
(2) OpenStack has the largest active population during the past quarter, followed by CloudStack, Eucalyptus, and OpenNebula;
(3) OpenStack has the largest active population during the past month, followed by CloudStack, Eucalyptus, and OpenNebula.
We also calculated the ratio of the active population during the past quarter to the total population. The result is 32.4% for OpenStack, 21.3% for CloudStack, 10.5% for OpenNebula, and 4.8% for Eucalyptus. A significant portion of OpenStack and CloudStack users choose to stay, while the majority of Eucalyptus users decide to leave. This observation is consistent with what we saw in Figure 7: OpenStack and CloudStack communities are more “sticky” than OpenNebula and Eucalyptus communities.
In our CY12-Q3 report, we introduced the concept of “Community Activeness Index”. Starting from CY13-Q3, we made a minor revision to this parameter, and it is now a combination of the following parameters:
(1) quarterly messages, which represents the volume of the discussions;
(2) quarterly participation ratio, which represents the average number of answers to a question; and
(3) active population of the past quarter, which represents the likelihood of getting help from the community in the long term.
In this analysis, we choose the average values of these parameters as the reference data set, and compare the corresponding parameters of each community with the reference data set. Then we call the sum of the relative values of a community the “community activeness index” of the community. Now we can say the project with the highest “community activeness index” is THE most active project in this area.
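The calculation described above can be sketched as follows. The sketch assumes that a "relative value" is the community's value divided by the average across all communities, and that the three parameters are summed with equal weight; the project names and numbers are hypothetical.

```python
def activeness_index(metrics):
    """Sum, for each project, its parameter values relative to the average.

    metrics: {project: {parameter: value}}, same parameters for every project.
    Returns {project: index}.
    """
    params = list(next(iter(metrics.values())).keys())
    # Reference data set: the average of each parameter across all projects.
    averages = {
        p: sum(m[p] for m in metrics.values()) / len(metrics) for p in params
    }
    return {
        name: sum(m[p] / averages[p] for p in params)
        for name, m in metrics.items()
    }


# Hypothetical quarterly figures, for illustration only.
sample = {
    "ProjectA": {"messages": 300, "participation_ratio": 3.0, "active_population": 150},
    "ProjectB": {"messages": 100, "participation_ratio": 1.0, "active_population": 50},
}
print(activeness_index(sample))
```

With three parameters, a community that sits exactly at the average on every parameter would score 3 under these assumptions.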
As can be seen from Figure 9, OpenStack is currently THE most active project (with obvious advantage), followed by CloudStack, OpenNebula, and Eucalyptus.
We are seeing an increasing number of suggestions to analyze the git activities of these open source IaaS projects. We also noticed that all four projects use git as the source code management (SCM) system for their source code. Starting from our CY13-Q1 report, we have done some basic analysis based on the git log data. It should be noted that for the OpenStack project, the data source includes all the sub-projects under openstack (57 sub-projects) and openstack-infra (33 sub-projects) on github.com.
In our CY13-Q1 report, we used “git log” to obtain log information. Starting from CY13-Q2, we use “git log --no-merges” to obtain log information.
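As an illustration of how monthly commit counts can be derived from “git log --no-merges”, here is a minimal sketch. It is not the program used for this report; the function names, the choice of the author date, and the date format are our own assumptions.

```python
import subprocess
from collections import Counter


def monthly_commit_counts(date_lines):
    """Count commits per month from date strings of the form YYYY-MM-DD."""
    return Counter(line.strip()[:7] for line in date_lines if line.strip())


def read_commit_dates(repo_path):
    """Return one author date (YYYY-MM-DD) per non-merge commit in repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--no-merges",
         "--pretty=format:%ad", "--date=short"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines()


# Hypothetical dates, for illustration only; with a real clone one would call
# monthly_commit_counts(read_commit_dates("/path/to/clone")).
sample_dates = ["2013-10-01", "2013-10-15", "2013-11-02"]
print(monthly_commit_counts(sample_dates))
```

Excluding merge commits avoids counting the same change once as the original commit and again as the merge that brought it into the master repository.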
It should be pointed out that git is a distributed version control system. With git, developers work with their own local repositories. When a developer executes a commit operation, the code changes are made to the local repository, and are not reflected in the master repository until the commits are pushed to and merged into the master repository. Developers commonly accumulate many commits before they feel comfortable making a push. Therefore, some of the recent commits might not be counted in this analysis. Based on our observations, the number of commits is underestimated by about 50% for the previous month, and by about 20% for the month before.
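One possible way to read these underestimation figures is sketched below. It assumes that "50% underestimation" means only half of the eventual commits are visible at the time of analysis; this is our interpretation for illustration, and the figures in this report are not adjusted in this way.

```python
# Months back from the analysis date -> fraction of commits assumed missing.
# These fractions follow the rough observations stated above.
UNDERESTIMATION = {1: 0.5, 2: 0.2}


def adjusted_commits(observed, months_back):
    """Estimate the eventual commit count from the currently observed count."""
    missing = UNDERESTIMATION.get(months_back, 0.0)
    return observed / (1.0 - missing)


# Hypothetical observed counts, for illustration only.
for months_back, observed in [(1, 100), (2, 80), (3, 90)]:
    print(months_back, observed, round(adjusted_commits(observed, months_back)))
```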
Figure 10 shows the monthly number of commit operations for these four projects. Generally speaking, the commit frequency of OpenStack is much higher than the commit frequencies of the other three projects. This is because the data source for OpenStack includes a total number of 90 sub-projects, which is far greater than the other three projects. The commit frequency of CloudStack is also significantly higher than Eucalyptus and OpenNebula. As compared to OpenNebula, Eucalyptus was also committing more frequently, but with significant fluctuations from month to month, which seems to be a typical batch-commit behavior. The commit frequency of the OpenNebula project is relatively small, with an average of 200 commits per month.
Figure 11 shows the monthly number of commit operations for the sub-projects of OpenStack. Generally speaking, the commit frequency of the Nova sub-project is about 3 times as high as that of the other sub-projects. It should be noted that although the commit frequencies of these sub-projects are different, they exhibit similar time-series curves, and their highs and lows occur in the same periods. This indicates that although these sub-projects are relatively independent, they work around the same development plan and the same release schedule. This is an indicator that the OpenStack project is well organized in terms of sub-project management.
Figure 12 shows the monthly number of contributors (identified by unique email addresses) for these projects. Generally speaking, the number of OpenStack contributors is much higher than the other three projects, and is growing rapidly. The number of CloudStack contributors also exhibits some growth, but the growth is relatively slow. The number of Eucalyptus and OpenNebula contributors is relatively small, and does not exhibit growth during the past 12 months.
Figure 13 shows the monthly number of contributors (identified by unique github.com accounts) for the sub-projects of OpenStack. It can be seen that the number of Nova contributors is about 3 times as big as the other sub-projects.
People usually try to identify the institute to which a contributor belongs by his/her email address. This method is flawed by nature (different institutes have different policies regarding contributing to open source projects, and some institutes even encourage their employees to contribute to open source projects with their personal accounts), but the parameter can still be used to show the contributions of certain institutes to certain open source projects. Figure 14 shows the monthly number of unique institutes (identified by the domain name of the contributor’s email address) contributing to these projects. We can see that the number of contributing institutes for OpenStack is much larger than for the other three projects, and is growing rapidly. The number of contributing institutes for CloudStack is also growing, but at a relatively slow pace. The number of contributing institutes for Eucalyptus and OpenNebula is relatively small, and does not exhibit any growth during the past 12 months.
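The identification of institutes by email domain can be sketched as follows. This is a simplified illustration rather than the program used for this report; the function name and the sample addresses are hypothetical, and addresses without a domain part are simply skipped.

```python
from collections import defaultdict


def monthly_organizations(commits):
    """Group contributor email domains by month.

    commits: iterable of (month, author_email) pairs.
    Returns {month: set of lower-cased email domains}.
    """
    orgs = defaultdict(set)
    for month, email in commits:
        _, sep, domain = email.rpartition("@")
        if sep and domain:  # skip malformed addresses without a domain part
            orgs[month].add(domain.lower())
    return dict(orgs)


# Hypothetical commit records, for illustration only.
sample = [
    ("2013-10", "alice@example.com"),
    ("2013-10", "bob@Example.com"),
    ("2013-10", "carol@example.org"),
    ("2013-11", "not-an-email"),
    ("2013-11", "dave@example.net"),
]
for month, domains in sorted(monthly_organizations(sample).items()):
    print(month, len(domains), sorted(domains))
```

As noted above, a contributor who commits from a personal account is attributed to the mail provider's domain rather than to the employer, so the counts are only approximate.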
Figure 15 shows the monthly number of contributing institutes to the sub-projects of OpenStack. It can be seen that the number of contributing institutes for Nova is about 3 times as big as the other sub-projects.
The following table lists the institutes that made the most contributions to these projects during CY13-Q4, according to the number of commit operations, along with the percentage of their commit operations. It can be seen that Eucalyptus and OpenNebula are each dominated by a single institute, while CloudStack and OpenStack receive contributions from multiple institutes. For the CloudStack project, the influence of Citrix is still quite obvious: over 41% of the commits come from accounts belonging to citrix.com and cloud.com (a 7% decrease compared to CY13-Q3). For the OpenStack project, redhat.com contributed 16% of the commits, while ibm.com contributed 10% of the commits. Rackspace.com, mirantis.com, hp.com and suse.de each contributed another 4% of the commits. It should be pointed out that huawei.com (headquartered in China) contributed 1% of the commits.
The following table lists the institutes that made the most contributions to the sub-projects of OpenStack during CY13-Q4, along with the percentage of their commit operations.
Accumulated Developer Population refers to the total number of developers who have contributed code to a particular project (as reflected in git commits). Figure 16 shows the growth of the accumulated developer populations of these 4 projects. Currently OpenStack has the largest accumulated developer population, which is about 10 times that of CloudStack, a distant second.
We also calculated the ratio of active developers during the past quarter over accumulated developer population. The result is 37.1% for OpenStack, 35.4% for CloudStack, 25.6% for OpenNebula, and 22.7% for Eucalyptus.
Accumulated Contributing Organizations refers to the total number of organizations (as reflected in unique domain names associated with developer email addresses) that have contributed code to a particular project (as reflected in git commits). Figure 17 shows the growth of the accumulated contributing organizations of these 4 projects. Currently OpenStack has the largest number of contributing organizations, which is 5 times larger than that of CloudStack or Eucalyptus. OpenNebula has the smallest number of contributing organizations.
We also calculated the ratio of active contributing organizations during the past quarter over accumulated contributing organizations. The result is 32.6% for OpenStack, 26.2% for CloudStack, 24.0% for OpenNebula, and 20% for Eucalyptus.
For your convenience, a PDF version of this presentation can be downloaded from here. Please kindly keep the author information if you want to redistribute the content.