This article is an updated version of my previous article, CY12-Q4 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack. Readers who are interested in further discussion are welcome to contact me by email at the above-mentioned address.
A Chinese version of this article, published at the same time, can be found at CY13-Q1 OpenStack, OpenNebula, Eucalyptus, CloudStack社区活跃度比较.
It should be noted that this community analysis project was initiated in CY11-Q4, and this report is the 6th quarterly report published since then. Although the author became an employee of Eucalyptus Systems Inc in October 2012, the opinions presented in this report belong strictly to the author rather than to the employer of the author. It should also be noted that the employer fully agreed that the author could continue this project from an independent perspective.
The objective of this article is to compare the OpenStack, OpenNebula, Eucalyptus and CloudStack user and developer communities, based on the communications between community members in the form of mailing list or public forum discussions. The data discussed include the total number of topics (threads), messages (posts), and participants (unique email addresses or registered members). To obtain these data, a Java program was written to retrieve all the forum posts and mailing list messages into a MySQL database for further processing. The analysis results are presented in the form of graphs generated by LibreOffice.
It should be noted that the Eucalyptus project moved from the original community mailing lists to a Google Groups based mailing list in mid-February. Such a change usually has some impact on the mailing list traffic. Also, starting from CY13-Q1 we are adding the users-cn mailing list for the CloudStack project to this analysis. This mailing list currently represents 3 to 5% of the CloudStack mailing list traffic. We will add other data sources to this analysis in the future, when they are large enough to have an impact.
Also, when the CY12-Q2 report was published, some people questioned the inclusion of the incubator-cloudstack-dev mailing list. This particular mailing list contains a lot of messages that are automatically generated by JIRA. In CY12-Q3, we set up a filter to reject all messages with the identifier “[jira]” in the subject. It should be noted that an increasing amount of technical discussion now takes place in the JIRA activities. However, at this point we have decided to keep filtering out messages from JIRA.
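A minimal sketch of such a subject filter is shown here, in Python rather than the Java program used for the actual analysis. The function names and sample subjects are hypothetical; the only detail taken from the report is that messages with “[jira]” in the subject are rejected. Matching without regard to case is an assumption.

```python
def is_jira_message(subject: str) -> bool:
    """Return True when the subject carries the "[jira]" identifier."""
    # Case-insensitive matching is an assumption, not something the report states.
    return "[jira]" in subject.lower()


def drop_jira(subjects):
    """Keep only the subjects that were not generated by JIRA."""
    return [s for s in subjects if not is_jira_message(s)]


# Hypothetical sample subjects, for illustration only.
sample = [
    "[jira] [Created] (CLOUDSTACK-0000) Example issue",
    "Re: [DISCUSS] Release schedule",
    "How do I add secondary storage?",
]
kept = drop_jira(sample)
```

A filter of this kind also drops replies to JIRA notifications, since their subjects still contain the identifier.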
Figures 1 and 2 show the monthly number of topics (threads) and posts (messages). It can be seen that
(1) the volume of OpenStack and CloudStack related discussions is much higher than that of Eucalyptus and OpenNebula; and
(2) during the past 6 months, Eucalyptus related discussions have been growing. Although the volume of Eucalyptus related discussions is still smaller than that of OpenStack and CloudStack, it has exceeded its own previous peak, reached three years ago.
Generally speaking, the number of replies to a specific topic represents the attention it receives and the depth of discussion on that topic. When the number of master posts (the original posts that start a topic) is larger than the number of replies, it is safe to conclude that participation in the forum or mailing list is very low. Therefore, the ratio between “the number of posts” and “the number of topics” represents the participation rate of an online community. In this study we call this ratio the Participation Ratio.
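As a small worked example of this definition (the function name and the numbers are hypothetical), the ratio is simply posts divided by topics. Because every topic contributes one master post, a community in which master posts outnumber replies always ends up with a ratio below 2.

```python
def participation_ratio(posts: int, topics: int) -> float:
    """Participation ratio = total posts / total topics for a given period."""
    if topics <= 0:
        raise ValueError("at least one topic is required")
    return posts / topics


# Hypothetical month: 100 topics that received 300 replies, i.e. 400 posts.
busy = participation_ratio(posts=400, topics=100)

# Hypothetical month: 100 topics that received only 50 replies, i.e. 150 posts.
quiet = participation_ratio(posts=150, topics=100)
```

Here `busy` is 4.0 and `quiet` is 1.5; subtracting 1 from the ratio gives the average number of replies per topic.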
In the past, the OpenStack project had a much higher participation ratio than the others. However, during the past 6 months, the participation ratios of CloudStack and Eucalyptus have been growing steadily, while the participation ratio of OpenStack has been decreasing gradually. Currently CloudStack and Eucalyptus have the highest participation ratios, both close to 4. OpenStack and OpenNebula have relatively low participation ratios, both close to 3.
Figure 4 shows the number of monthly participants of the four projects being discussed. It can be seen that CloudStack and OpenStack have many more active participants than OpenNebula and Eucalyptus. During the past 6 months, the number of active participants of CloudStack, Eucalyptus and OpenStack has been growing to various degrees.
It should be noted that although CloudStack has somewhat fewer active participants than OpenStack, the volume of discussion (in terms of the monthly number of threads and messages) of the two projects is on the same level. This indicates that, on average, the active members in the CloudStack club are talking more than those in the OpenStack club.
Accumulated Community Population refers to the total number of users and developers who have participated in forum or mailing list discussions. (This number does not include those who have registered with discussion forums or mailing lists but have never participated in any open discussion.) These are people who have tested or used a specific product for a while, but who are not necessarily active users at present.
In our CY12-Q3 report, we raised the problem of possible duplicated population counting due to the membership overlap between forums and mailing lists, and took some measures to eliminate duplicates. Starting from our CY12-Q4 report, further de-duplication measures were taken. It is safe to say that some degree of duplication still exists, but it does not have a significant impact on the analysis results.
Figure 6 shows the monthly population growth of the four projects being discussed. During the past 3 months, the populations of OpenStack and CloudStack have been growing at about the same pace.
The populations of Eucalyptus and OpenNebula have been growing relatively slowly compared with those of CloudStack and OpenStack.
Figure 7 is a combination of Figure 4 and Figure 6. The solid lines represent the monthly participants, while the dashed lines represent the monthly new members.
For OpenStack and OpenNebula, around 30% of their monthly participants are new members. For CloudStack and Eucalyptus, around 50% of their monthly participants are new members. A lower share of new members means that a larger share of each month's participants are returning members, which suggests that the OpenStack and OpenNebula communities are more “sticky” than the CloudStack and Eucalyptus communities.
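The share of new members can be derived from the monthly participant lists, as in the following sketch. It assumes that a “new member” is a participant whose address has not appeared in any earlier month and that the months are supplied in chronological order; the function name and the sample addresses are hypothetical.

```python
def new_member_share(monthly_participants):
    """For each month, return (month, participants, new members, share of new members).

    monthly_participants: list of (month, iterable of addresses), oldest month first.
    """
    seen = set()      # accumulated community population so far
    rows = []
    for month, participants in monthly_participants:
        current = set(participants)
        newcomers = current - seen          # never seen in an earlier month
        share = len(newcomers) / len(current) if current else 0.0
        rows.append((month, len(current), len(newcomers), share))
        seen |= current
    return rows


# Hypothetical data, for illustration only.
history = [
    ("2013-01", ["a@example.org", "b@example.org"]),
    ("2013-02", ["a@example.org", "b@example.org", "c@example.org", "d@example.org"]),
]
rows = new_member_share(history)
```

In this example the second month has 4 participants, 2 of whom are new, so its share of new members is 0.5. The size of `seen` after the last month corresponds to the accumulated community population.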
Figure 8 shows the total community population, active participants of the past quarter, and active participants of the past month, of the four projects being discussed. It can be seen that
(1) OpenStack has the largest total population, followed by Eucalyptus, CloudStack, and OpenNebula;
(2) OpenStack has the largest active population during the past quarter, followed by CloudStack, Eucalyptus, and OpenNebula;
(3) OpenStack has the largest active population during the past month, followed by CloudStack, Eucalyptus, and OpenNebula.
In our CY12-Q3 report, we introduced the concept of a “Community Activeness Index”. This magic number combines the following parameters:
(1) monthly messages, which represents the volume of the discussions;
(2) participation ratio, which represents the average number of messages per topic, that is, a question together with its answers;
(3) active population of the past quarter, which represents the possibility to get help from community in the long term; and
(4) active population of the past month, which represents the possibility to get help from the community in the short term.
In this analysis, we take the average value of each of these parameters as the reference data set, and express each parameter of a community relative to the corresponding reference value. The sum of these relative values is what we call the “community activeness index” of that community. The project with the highest “community activeness index” can then be regarded as THE most active project in this area.
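The calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the program used for the report: the reference value of each parameter is assumed to be its mean across the communities being compared, all four parameters are weighted equally, and the community names and numbers are hypothetical.

```python
from statistics import mean

# The four parameters listed above; the key names are hypothetical.
PARAMETERS = (
    "monthly_messages",
    "participation_ratio",
    "active_quarter",
    "active_month",
)


def activeness_index(communities):
    """Sum, per community, of each parameter divided by its reference value."""
    # Assumption: the reference is the mean of each parameter across communities.
    reference = {p: mean(c[p] for c in communities.values()) for p in PARAMETERS}
    return {
        name: sum(values[p] / reference[p] for p in PARAMETERS)
        for name, values in communities.items()
    }


# Hypothetical figures, for illustration only.
data = {
    "ProjectA": {"monthly_messages": 100, "participation_ratio": 4.0,
                 "active_quarter": 50, "active_month": 20},
    "ProjectB": {"monthly_messages": 300, "participation_ratio": 2.0,
                 "active_quarter": 150, "active_month": 60},
}
index = activeness_index(data)
```

Under these assumptions a community that matches the reference on every parameter scores exactly 4, and the indexes of all communities average to 4, so a score above 4 means above-average activity.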
As can be seen from Figure 9, OpenStack is currently THE most active project (with an obvious advantage), followed by CloudStack, Eucalyptus, and OpenNebula.
The above-mentioned concept of “community activeness index” is still very primitive, with a lot of room for improvement. However, it is an attempt to replace the old-fashioned “I think”, “I believe” and “I guess” practices with quantitative analysis. In our future community analysis, we will continue to use this concept to provide a quarterly ranking for OpenStack, OpenNebula, Eucalyptus, and CloudStack. Improvements to the algorithm (such as adding or removing parameters, or changing the weight of different parameters) will be made when necessary.
We are seeing an increasing number of suggestions to analyze the git activities of these open source IaaS projects. We also noticed that all four projects host their source code on github.com. So, in this CY13-Q1 report, we try to do some basic analysis based on the data from github.com. It should be noted that for the OpenStack project, the data source includes the Cinder, Glance, Horizon, Keystone, Nova, Quantum and Swift sub-projects hosted on github.com.
Figure 10 shows the monthly number of commit operations for these four projects. Generally speaking, the commit frequency of the OpenStack project is much higher than that of the others, with an average of 1000 commits per month (and a peak of 2000 commits in mid-2011). The commit frequency of the OpenNebula project is relatively low, with an average of 200 commits per month. The commit frequencies of CloudStack and Eucalyptus are similar, but the Eucalyptus project shows significant fluctuation, which seems to be typical batch-commit behavior.
Figure 11 shows the monthly number of commit operations for the sub-projects of OpenStack. Generally speaking, the commit frequency of the Nova sub-project is about 3 times as high as that of the other sub-projects. It should be noted that although the commit frequencies of these sub-projects differ, they exhibit similar time-series curves, and their highs and lows occur during the same periods. This indicates that although these sub-projects are relatively independent, they work around the same development plan and the same release schedule. This is an indicator that the OpenStack project is well organized in terms of sub-project management.
Figure 12 shows the monthly number of contributors (identified by unique github.com accounts) for these projects. Generally speaking, the number of OpenStack contributors is much higher than the other three projects, and is growing rapidly. The number of CloudStack contributors also exhibits some growth, but the growth is relatively slow. The number of Eucalyptus and OpenNebula contributors is relatively small, and does not exhibit any growth during the past 6 months.
Figure 13 shows the monthly number of contributors (identified by unique github.com accounts) for the sub-projects of OpenStack. It can be seen that the number of Nova contributors is about 3 times that of the other sub-projects.
People usually try to identify the institute to which a contributor belongs by his/her email address. It is true that this method is inherently flawed (different institutes have different policies regarding contributions to open source projects, and some institutes even encourage their employees to contribute to open source projects with their personal accounts), but this parameter can still be used to show the contributions of certain institutes to certain open source projects. Figure 14 shows the monthly number of unique institutes (identified by the domain name of the contributor’s email address) contributing to these projects. We can see that the number of contributing institutes for OpenStack is much larger than for the other three projects, and is growing rapidly. The number of contributing institutes for CloudStack is also growing, but at a relatively slow pace. The number of contributing institutes for Eucalyptus and OpenNebula is relatively small, and does not exhibit any growth during the past 6 months.
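Counting unique institutes by email domain can be sketched as follows. The input format, a list of (month, author email) pairs, and the sample addresses are hypothetical; the only rule taken from the text is that an institute is identified by the domain name of the email address.

```python
from collections import defaultdict


def institutes_per_month(commits):
    """Count the distinct email domains seen among commit authors in each month.

    commits: iterable of (month, author_email) pairs.
    """
    domains = defaultdict(set)
    for month, email in commits:
        # Everything after the last "@" is treated as the institute.
        domain = email.rsplit("@", 1)[-1].lower()
        domains[month].add(domain)
    return {month: len(found) for month, found in domains.items()}


# Hypothetical commit log, for illustration only.
log = [
    ("2013-01", "alice@example.com"),
    ("2013-01", "bob@example.com"),
    ("2013-01", "carol@example.org"),
    ("2013-02", "alice@example.com"),
]
counts = institutes_per_month(log)
```

Here January has three commits from two institutes. As noted above, contributors who use personal accounts are attributed to their mail provider rather than to their employer, which is the main weakness of this approach.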
Figure 15 shows the monthly number of contributing institutes for the sub-projects of OpenStack. It can be seen that the number of contributing institutes for Nova is about 3 times that of the other sub-projects.
The following table lists the institutes that made the most contributions to these projects during CY13-Q1, according to the number of commit operations on github.com, along with the percentage of their commit operations. It can be seen that both Eucalyptus and OpenNebula are open source projects dominated by a single institute, while CloudStack and OpenStack are open source projects with contributions from multiple institutes. For the CloudStack project, the influence of Citrix is quite obvious: over 45% of github.com commits come from accounts belonging to citrix.com and cloud.com. For the OpenStack project, it is not that easy to determine the influence of Rackspace, because the majority of the commits come from a code review system (review.openstack.org) with a single github.com account (firstname.lastname@example.org). However, we noticed that during CY13-Q1 redhat.com contributed 9% of the commits, while ibm.com (linux.vnet.ibm.com and us.ibm.com combined) contributed 7% of the commits. It is safe to say that the influence of Rackspace on the OpenStack project is gradually decreasing.
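The percentages in such a table can be computed along the lines of the following sketch, which merges related domains before counting. The domain groupings are the ones mentioned in the text (cloud.com counted with citrix.com, and the two ibm.com sub-domains combined); the function name, the alias table layout and the sample addresses are hypothetical.

```python
from collections import Counter

# Domain groupings mentioned in the text; the mapping structure itself is an assumption.
ALIASES = {
    "cloud.com": "citrix.com",
    "linux.vnet.ibm.com": "ibm.com",
    "us.ibm.com": "ibm.com",
}


def commit_share(author_emails, aliases=ALIASES):
    """Return {institute: percentage of commits}, largest contributor first."""
    counts = Counter()
    for email in author_emails:
        domain = email.rsplit("@", 1)[-1].lower()
        counts[aliases.get(domain, domain)] += 1
    total = sum(counts.values())
    return {inst: 100.0 * n / total for inst, n in counts.most_common()}


# Hypothetical commit authors, one entry per commit, for illustration only.
authors = [
    "dev1@citrix.com",
    "dev2@cloud.com",
    "dev3@us.ibm.com",
    "dev4@example.org",
]
share = commit_share(authors)
```

With this sample, citrix.com accounts for 50% of the commits, and ibm.com and example.org for 25% each. Commits that arrive through a shared account, such as the code review system mentioned above, cannot be attributed this way.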
The following table lists the institutes that made the most contributions to the sub-projects of OpenStack during CY13-Q1, along with the percentage of their commit operations.
For your convenience, a PDF version of this presentation can be downloaded from here. Please kindly keep the author information if you want to redistribute the content.