This article is an updated version of my previous article CY12-Q2 Community Analysis — OpenStack vs OpenNebula vs Eucalyptus vs CloudStack. Readers who are interested in further discussion are welcome to contact me by email at the above-mentioned address.
A Chinese version of this article, published at the same time, can be found at CY12-Q3 OpenStack, OpenNebula, Eucalyptus, CloudStack社区活跃度比较.
The objective of this article is to compare the OpenStack, OpenNebula, Eucalyptus and CloudStack user and developer communities, based on the communications between community members in the form of mailing lists or public forum discussions. The data being discussed include the total number of topics (threads), messages (posts), and participants (unique email addresses or registered members). To obtain this data, a Java program was written to retrieve all the forum posts and mailing list messages into a MySQL database for further processing. The analysis results are presented in the form of graphs generated by LibreOffice.
In CY12-Q3, we are adding the long-neglected https://answers.launchpad.net/openstack and http://lists.openstack.org/pipermail/*/ to the analysis. It turns out that these two sources contain a large amount of data that has a significant impact on the analysis results.
Also, when the CY12-Q2 report was published, some people questioned the inclusion of the incubator-cloudstack-dev mailing list, because it contains a lot of messages that are automatically generated by JIRA. In CY12-Q3, we set up a filter to reject all messages whose subject contains the identifier “[jira]”.
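The filter described above can be sketched in a few lines of Python. This is only an illustration and not the Java program used for the analysis; the message layout, the function names, and the sample subjects are made up, and matching the identifier case-insensitively is an assumption.

```python
def is_jira_notification(subject: str) -> bool:
    """Return True when the subject carries the "[jira]" identifier.

    Assumption: the match is case-insensitive.
    """
    return "[jira]" in subject.lower()


def filter_messages(messages):
    """Keep only the messages that are not JIRA notifications."""
    return [m for m in messages if not is_jira_notification(m["subject"])]


# Hypothetical sample messages, for illustration only.
sample = [
    {"subject": "[jira] [Created] (EXAMPLE-1) Example issue", "sender": "jira@example.org"},
    {"subject": "Re: How do I add a second zone?", "sender": "alice@example.org"},
    {"subject": "[JIRA] [Commented] (EXAMPLE-2) Another issue", "sender": "jira@example.org"},
]

kept = filter_messages(sample)
print(len(kept))
```

Only the message written by a person survives the filter; both automatically generated notifications are rejected.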
Figures 1 and 2 show the monthly number of topics (threads) and posts (messages). It can be seen that
(1) the volume of OpenStack and CloudStack related discussions is much higher than that of Eucalyptus and OpenNebula; and
(2) the Eucalyptus and OpenNebula communities exhibit similar behavior, with only minor differences.
Generally speaking, the number of replies to a topic reflects the attention the topic receives and the depth of its discussion. When the number of master posts (the original posts that start topics) exceeds the number of replies, it is safe to conclude that participation in the forum or mailing list is very low. Therefore, the ratio of “the number of posts” to “the number of topics” represents the participation rate of an online community. In this study we call this ratio the Participation Ratio.
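As a minimal sketch, the ratio is a single division. The function name and the counts below are invented for illustration, and the sketch assumes that the number of posts includes the master posts, so a ratio below 2 means there are fewer replies than master posts.

```python
def participation_ratio(posts: int, topics: int) -> float:
    """Number of posts divided by number of topics.

    Assumption: `posts` counts master posts as well as replies.
    """
    if topics <= 0:
        raise ValueError("topics must be a positive number")
    return posts / topics


# Hypothetical month: 400 messages spread over 100 threads.
ratio = participation_ratio(posts=400, topics=100)
print(ratio)  # 4.0
```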
In the past the OpenStack project had a much higher participation ratio than the others. However, the participation ratio of CloudStack is climbing steadily. Currently CloudStack and OpenStack have the highest participation ratios, both close to 4. OpenNebula and Eucalyptus have similar participation ratios, both close to 3.
Figure 4 shows the number of monthly participants of the four projects being discussed. It can be seen that CloudStack and OpenStack have far more active participants than OpenNebula and Eucalyptus. However, during the past 3 months, the number of participants has decreased slightly for both CloudStack and OpenStack.
It should be noted that although CloudStack has somewhat fewer active participants than OpenStack, the volume of discussion (in terms of monthly number of threads and messages) of the two projects is on the same level. This indicates that, on average, the active members of the CloudStack community are talking more than those of the OpenStack community.
Accumulated Community Population refers to the total number of users and developers who have participated in forum or mailing list discussions. (This number does not include those who have registered for discussion forums or mailing lists but have never participated in any open discussion.) These are people who have tested or used a specific product for a while, but who are not necessarily active users at present.
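One way to picture this definition is a running set of everyone who has ever posted. The sketch below is an illustration under stated assumptions rather than the program used for this report: posts are given as hypothetical (month, participant) pairs, and participants are identified by a trimmed, lower-cased email address.

```python
def population_by_month(posts):
    """Compute active, new, and accumulated participants per month.

    posts: iterable of (month, participant) pairs, month as 'YYYY-MM'.
    Assumption: a participant is identified by a lower-cased address.
    """
    by_month = {}
    for month, who in posts:
        by_month.setdefault(month, set()).add(who.strip().lower())

    seen = set()  # everyone who has posted at least once so far
    result = {}
    for month in sorted(by_month):
        active = by_month[month]
        new = active - seen  # first-time participants this month
        seen |= active
        result[month] = {
            "active": len(active),
            "new": len(new),
            "accumulated": len(seen),
        }
    return result


# Hypothetical posts, for illustration only.
sample = [
    ("2012-07", "alice@example.org"),
    ("2012-07", "bob@example.org"),
    ("2012-08", "Alice@example.org"),
    ("2012-08", "carol@example.org"),
    ("2012-09", "dave@example.org"),
]

stats = population_by_month(sample)
for month, row in stats.items():
    print(month, row)
```

The accumulated figure never decreases, because a participant who stops posting still counts toward the total.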
Figure 5 shows the accumulated community populations of the four projects being discussed. The Eucalyptus project still has the biggest population, but OpenStack is quickly catching up. It is expected that the OpenStack population will exceed that of Eucalyptus in CY12-Q4. In our CY12-Q2 report we predicted that the CloudStack population would exceed the OpenNebula population within a very short period. It took CloudStack only a month to accomplish that!
If you compare the CY12-Q3 report with the CY12-Q2 report, you will find that the population curve for OpenStack has changed a lot. This is due to the inclusion of the https://answers.launchpad.net/openstack and http://lists.openstack.org/pipermail/*/ data sources. It should be noted that Launchpad Answers and the mailing lists share the same registration database, but display different names for the same person. Therefore, it is very possible that a large number of users were counted twice in the OpenStack population. We have made some basic de-duplication efforts to eliminate the obvious duplicates, but there is still a lot of room for improvement. A rough estimate is that the real OpenStack population is about 85% of the numbers shown in this analysis.
There might be a certain level of duplication in the community populations of CloudStack and Eucalyptus as well. We did look into the data and found some duplicates. However, the level of duplication seems to be so small for both projects that it does not have much impact on the analysis results.
Figure 6 shows the monthly population growth of the four projects being discussed. During the past 3 months, the populations of OpenStack and CloudStack have been growing at the same pace.
The populations of Eucalyptus and OpenNebula are growing at relatively slow paces compared with those of CloudStack and OpenStack.
Figure 7 is a combination of Figure 4 and Figure 6. The solid lines represent the monthly participants, while the dashed lines represent the monthly new members.
For OpenStack and OpenNebula, around 30% of their monthly participants are new members. For CloudStack and Eucalyptus, around 50% of their monthly participants are new members. This indicates that the OpenStack and OpenNebula communities are more “sticky” than the CloudStack and Eucalyptus communities.
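The percentages above are simply the monthly new members divided by the monthly participants. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def new_member_share(new_members: int, participants: int) -> float:
    """Fraction of a month's participants who posted for the first time."""
    if participants <= 0:
        raise ValueError("participants must be a positive number")
    if not 0 <= new_members <= participants:
        raise ValueError("new_members must be between 0 and participants")
    return new_members / participants


# Hypothetical month: 100 participants, 30 of them first-time posters.
share = new_member_share(new_members=30, participants=100)
print(f"new: {share:.0%}, returning: {1 - share:.0%}")
```

A lower share of new members means that more of the month's participants are returning members, which is what “sticky” refers to here.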
For each of the projects being discussed, the monthly population growth is somewhat “synchronous” with its monthly participants. That is to say, the population growth of a community is somewhat related to the “activeness” of the community. This also suggests that both the population growth and the “activeness” of a community might be event-driven. A new software release, a technical conference, or a marketing event might be the cause of the growth in population and “activeness” of the respective community.
Figure 8 shows the total community population, active participants of the past quarter, and active participants of the past month, of the four projects being discussed. It can be seen that
(1) Eucalyptus has the largest total population, followed by OpenStack, CloudStack, and OpenNebula;
(2) OpenStack has the largest active population during the past quarter, followed by CloudStack, Eucalyptus, and OpenNebula;
(3) OpenStack has the largest active population during the past month, followed by CloudStack, Eucalyptus, and OpenNebula.
Occasionally I come across people saying “Hey, you are talking too much! Why don’t you tell me which one is THE most active project in this area?” I agree that this is an important question, and I guess there are many more who do not ask simply because they know that I don’t know the answer.
For quite some time I have been looking for a magic number to indicate the “relative activeness” of a community as compared to other alternatives. This magic number should be the combination of the following parameters:
(1) monthly messages, which represents the volume of the discussions;
(2) participation ratio, which represents the average number of answers to a question;
(3) active population of the past quarter, which represents the possibility to get help from the community in the long term; and
(4) active population of the past month, which represents the possibility to get help from the community in the short term.
In this analysis, we take the average values of these parameters as the reference data set and compare the corresponding parameters of each community against it. We call the sum of a community’s relative values its “community activeness index”. The project with the highest “community activeness index” can then be called THE most active project in this area.
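The calculation can be sketched as follows. This is one possible reading of the description and not necessarily the exact procedure behind Figure 9: it assumes that the reference value of each parameter is the mean across the projects being compared, that a relative value is the project’s value divided by that reference, and that all four parameters carry equal weight. The project names and numbers are placeholders.

```python
PARAMETERS = (
    "monthly_messages",
    "participation_ratio",
    "active_quarter",
    "active_month",
)


def activeness_index(metrics):
    """Sum of each project's parameters relative to the reference set.

    metrics: {project: {parameter: value}}
    Assumptions: reference = mean across projects, equal weights.
    """
    count = len(metrics)
    reference = {
        p: sum(m[p] for m in metrics.values()) / count for p in PARAMETERS
    }
    return {
        name: sum(m[p] / reference[p] for p in PARAMETERS)
        for name, m in metrics.items()
    }


# Placeholder projects with made-up numbers, for illustration only.
sample = {
    "A": {"monthly_messages": 3000, "participation_ratio": 4.0,
          "active_quarter": 600, "active_month": 300},
    "B": {"monthly_messages": 1000, "participation_ratio": 2.0,
          "active_quarter": 200, "active_month": 100},
}

index = activeness_index(sample)
for name, value in sorted(index.items(), key=lambda kv: -kv[1]):
    print(name, round(value, 2))
```

Under these assumptions a project that sits exactly at the average on every parameter scores 4, so an index above 4 means the project is more active than the average of the group.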
As can be seen from Figure 9, OpenStack is currently THE most active project (by a clear margin), followed by CloudStack, Eucalyptus, and OpenNebula.
The above-mentioned concept of “community activeness index” is still very primitive, with a lot of room for improvement. However, it is an attempt to replace the old-fashioned “I think”, “I believe” and “I guess” practices with quantitative analysis. In our future community analysis, we will continue to use this concept to provide a quarterly ranking for OpenStack, OpenNebula, Eucalyptus, and CloudStack. Improvements to the algorithm (such as adding or removing parameters, or changing the weight of different parameters) will be made when necessary.
For many cloud computing professionals, the dramatic growth achieved by the CloudStack project during the past 6 months was quite unexpected. Therefore we conducted an email interview with Sheng Liang, the CTO of Cloud Platforms at Citrix. Below is Sheng’s explanation for CloudStack’s success in building a highly active open source community:
“Apache CloudStack has flourished under the Apache Software Foundation which kept us from having to waste efforts coming up with a new open source governance model. Developers have responded well to the Apache Way with contributions flowing in from our rapidly growing community of over 35,000 individuals. We also are pleased with the organic way technology providers and open source projects are integrating their software with Apache CloudStack. Leadership of the project has also shifted from Citrix to a number of other individual committers who have been driving an aggressive development schedule. The upcoming 4.0 release is very exciting as it’s the first major release under Apache including code from numerous production users of CloudStack who developed features based on their experience running live cloud computing environments. Anecdotally we are seeing CloudStack deployments popping up everywhere from financial institutions and gaming companies to universities (we understand a CloudStack cluster even helped crunch research data for the Higgs Boson discovery). I am sure the excitement around CloudStack will continue given the incredible progress in under six short months.”
From an end-user’s perspective, it is good to see the competition heating up because that means more choices with better quality. Cloud computing is still an evolving and highly immature market, and we expect more competition to come in the future.
For your convenience, a PDF version of this presentation can be downloaded from here. Please keep the author information if you redistribute the content.