HP Cloud Services Performance Test

By , August 11, 2012 9:20 am

This article is a translation of my presentation at the OpenStack APAC Conference in Shanghai on August 10-11. You are welcome to comment on the work and data described in the slides.

As we all know, after two years of hard work, several commercial companies are now deploying OpenStack in production systems. Recently HP, followed by RackSpace, launched an OpenStack-based public IaaS service. I have been testing HP Cloud Services since April, and in this presentation I would like to share some of our test results with you.

The screen capture on this slide shows the web-based user interface of HP Cloud Services. In this test, all VMs were created through the web console, and all VMs reside in the az-1.region-a.geo-1 availability zone.

HP Cloud Services provides several versions of CentOS, Debian, Fedora and Ubuntu. In our test we used Ubuntu 11.04 64-bit Server Edition. This table lists the model, configuration and price of each VM product. For each VM model, at least 3 instances were created for comparison, and more were created when necessary. In total, 20 VMs were created during the test.

In this test, we used byte-unixbench to evaluate overall system performance, mbw to evaluate memory performance, iozone to evaluate disk IO performance, iperf to evaluate networking performance, pgbench to evaluate database performance, and Hadoop wordcount to evaluate the performance of a complex application.

During the test we did some basic data filtering. For each VM model, we only present data from the best-performing VM. For each benchmark, 10 runs were performed and their average was taken as the result.
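A minimal sketch of this filtering, assuming the scores of each VM are first averaged over its runs and the best average is then kept per model. The VM ids, the dictionary layout and the `higher_is_better` switch (needed for time-based results such as Hadoop wordcount, where lower is better) are hypothetical illustrations, not the scripts used in the test.

```python
from statistics import mean


def summarize(runs_by_vm, model_of, higher_is_better=True):
    """Pick the best VM per model for one benchmark.

    runs_by_vm: {vm_id: [score of run 1, ..., score of run N]}
    model_of:   {vm_id: model name}
    Returns {model name: (best vm_id, its average score)}.
    """
    best = {}
    for vm_id, runs in runs_by_vm.items():
        avg = mean(runs)  # average over the repeated runs of this VM
        model = model_of[vm_id]
        if model not in best:
            best[model] = (vm_id, avg)
            continue
        current = best[model][1]
        better = avg > current if higher_is_better else avg < current
        if better:
            best[model] = (vm_id, avg)
    return best


if __name__ == "__main__":
    runs = {"small-1": [100, 102, 98], "small-2": [80, 82, 81], "xsmall-1": [50, 52]}
    models = {"small-1": "Small", "small-2": "Small", "xsmall-1": "XSmall"}
    print(summarize(runs, models))
```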

 


First let’s take a look at the byte-unixbench results.

UnixBench is a benchmark suite with a long history, and its result is often referred to as the overall system performance. In theory, the result is affected by the CPU, memory, disk IO and operating system. According to our observations, however, it is affected more by the CPU than by anything else on the system under test. Therefore, we chose UnixBench results to indicate the CPU performance of the system under test.

On this graph there are two curves. The blue curve represents single-thread test results, while the pink one represents multi-thread test results. In multi-thread tests, the number of concurrent threads equals the number of vCPUs available on the system under test. The single-thread results are very close for all VM models, because they represent the performance of a single vCPU. In multi-thread tests, VMs with the same number of vCPUs exhibit similar performance, while the amount of memory does not have much impact. Furthermore, performance grows more slowly than the number of vCPUs: when the number of vCPUs doubles (a 100% increase), performance grows by only 50% (1.5x). This is consistent with what we observed on other IaaS services such as GrandCloud and Aliyun in China.
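To make the scaling observation concrete, the sketch below extrapolates it under one assumption that the measurements do not state: that every doubling of vCPUs multiplies multi-thread throughput by the same 1.5x. Under that assumption, 4 vCPUs give 2.25x the throughput of 1 vCPU, which is only about 56% of ideal linear scaling. The function names are illustrative.

```python
import math


def projected_speedup(vcpus, base_vcpus=1, gain_per_doubling=1.5):
    """Multi-thread speedup relative to a VM with base_vcpus vCPUs,
    ASSUMING each doubling of vCPUs multiplies throughput by gain_per_doubling."""
    doublings = math.log2(vcpus / base_vcpus)
    return gain_per_doubling ** doublings


def efficiency(vcpus, base_vcpus=1, gain_per_doubling=1.5):
    """Fraction of ideal linear scaling that the projected speedup achieves."""
    return projected_speedup(vcpus, base_vcpus, gain_per_doubling) / (vcpus / base_vcpus)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} vCPU(s): {projected_speedup(n):.2f}x, {efficiency(n):.0%} of linear")
```

Equivalently, throughput grows roughly as n to the power log2(1.5), about n^0.58, rather than proportionally to n.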

 


We used mbw to evaluate the memory performance of the VMs. mbw measures the "copy" memory bandwidth available to userspace programs. The orange, pink and blue curves represent the amount of memory copied per second using different copy methods. For every method, all VMs exhibit the same memory bandwidth regardless of model.
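The idea behind a userspace copy-bandwidth test can be illustrated with a few lines of Python: time a bulk copy of a large buffer and divide the size by the elapsed time. This is only a sketch of the concept. It is not mbw, and because of interpreter overhead its absolute numbers will not match the mbw results on the slides; the buffer size and repeat count are arbitrary choices.

```python
import time


def copy_bandwidth_mib_s(size_mib=64, repeats=5):
    """Rough userspace memory-copy bandwidth in MiB/s (best of `repeats`).

    Copies a size_mib source buffer into a preallocated destination buffer.
    Illustration only: this is NOT mbw and its numbers are not comparable.
    """
    src = bytearray(size_mib * 1024 * 1024)
    dst = bytearray(len(src))
    best = 0.0
    with memoryview(dst) as view:
        for _ in range(repeats):
            start = time.perf_counter()
            view[:] = src  # one bulk copy of the whole buffer, performed in C
            elapsed = time.perf_counter() - start
            if elapsed > 0:
                best = max(best, size_mib / elapsed)
    return best


if __name__ == "__main__":
    print(f"{copy_bandwidth_mib_s():.0f} MiB/s")
```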

All VMs provided by HP Cloud Services include two disks: one for the operating system (10 GB) and one for data storage (its size varies by VM model). This graph shows the disk IO performance of the OS disk. All VMs exhibit similar write performance, capped at 200 MB/s. However, there are significant differences in read performance: XSmall, Small and Medium VMs only reach 1 GB/s throughput, while Large, XLarge and XXLarge VMs reach 5 GB/s, five times faster.

Based on the fact that all VMs have the same write performance, it is unlikely that HP uses techniques such as cgroup blkio throttling or QEMU block IO throttling to limit VM disk IO. It is quite possible that HP uses different disk types for different VM models: VMs with low read performance might have their disk images on SATA or SAS disks, while VMs with high read performance might have their disk images on SSDs.

 

This graph shows the disk IO performance of the data disk. Again, all VMs exhibit similar write performance, capped at 200 MB/s, and again there are significant differences in read performance: XSmall, Small, Medium and Large VMs only reach 1 GB/s throughput, while XLarge and XXLarge VMs reach 5 GB/s, five times faster.

Let’s compare this graph with the previous one. Note that the Large VM shows high read performance on its OS disk but low read performance on its data disk. This might reflect a design strategy of the HP Cloud Services team: VMs with better configurations deserve better disk IO performance.

Note that this difference in disk IO performance is not mentioned in the HP Cloud Services documentation.

Now let’s take a look at the internal bandwidth between VMs. In the matrix shown above, data point (X, Y) represents the bandwidth between VM X and VM Y. The internal bandwidth is limited by the smaller of the two VMs. XSmall VMs only have 25 Mbps of bandwidth to any other VM, while Small VMs have 50 Mbps; Medium, Large, XLarge and XXLarge VMs have 100, 200, 400 and 650 Mbps respectively.
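The pattern in the matrix can be summarized as "each VM model has a fixed cap, and a pair of VMs gets the smaller of the two caps". The sketch below encodes that reading with the per-model caps listed above. It is our simplified model of the measurements, not a rule documented by HP, and individual measured cells may deviate from it.

```python
# Per-model internal bandwidth caps in Mbps, as measured with iperf in this test.
CAP_MBPS = {
    "XSmall": 25,
    "Small": 50,
    "Medium": 100,
    "Large": 200,
    "XLarge": 400,
    "XXLarge": 650,
}


def expected_bandwidth(model_x, model_y):
    """Expected VM-to-VM bandwidth: the smaller of the two per-model caps."""
    return min(CAP_MBPS[model_x], CAP_MBPS[model_y])


def bandwidth_matrix():
    """Rebuild the full (X, Y) matrix from the per-model caps."""
    models = list(CAP_MBPS)
    return {(x, y): expected_bandwidth(x, y) for x in models for y in models}


if __name__ == "__main__":
    print(expected_bandwidth("XSmall", "XXLarge"))  # limited by the XSmall VM
```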

As with disk IO performance, the difference in internal bandwidth is not mentioned in the HP Cloud Services documentation.

It is probably reasonable to set internal bandwidth limits. For example, some users might run malicious code that consumes bandwidth and degrades networking performance for all other users.

So the question becomes whether the above numbers are acceptable. Take the XSmall VM as an example: at 25 Mbps, copying a 2 GB file between two XSmall VMs takes more than 10 minutes (2 GB × 8 bits per byte ÷ 25 Mbps ≈ 640 seconds). Is this kind of speed acceptable to customers? Or, let’s ask ourselves whether any of us still has servers connected to a 100 Mbps switch.
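The transfer time follows from simple arithmetic. This sketch assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s) and a link that sustains its cap for the whole transfer, ignoring TCP and disk overhead, so real copies will be somewhat slower.

```python
def transfer_seconds(size_gb, link_mbps):
    """Ideal transfer time in seconds at a sustained link rate.

    Decimal units: 1 GB = 1000**3 bytes, 1 Mbps = 10**6 bits per second.
    Protocol and disk overhead are ignored.
    """
    bits = size_gb * 1000**3 * 8
    return bits / (link_mbps * 10**6)


if __name__ == "__main__":
    print(transfer_seconds(2, 25))  # → 640.0
    for rate in (25, 50, 100, 200, 400, 650):
        print(f"{rate:>3} Mbps: {transfer_seconds(2, rate) / 60:.1f} min for 2 GB")
```

At 25 Mbps the 2 GB copy needs 640 seconds, a little under 11 minutes; even at the 100 Mbps of an old switch it would take 160 seconds.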

By now we have evaluated the performance of the CPU, memory, disk IO and network bandwidth. Let’s take Hadoop as an example of a complex application and see how it performs on different VMs. The official Hadoop distribution includes some example applications, and many novice Hadoop users have tried the wordcount example. This application takes a directory on HDFS as its input, traverses all the files inside that directory, and counts the number of distinct words and the number of occurrences of each word. In this test we used a directory with 3 files as the input; each file is about 700 MB, for a total of about 2 GB. We ran the wordcount application on different VMs to analyze the same directory and recorded the time needed to finish the analysis as the test result. Obviously, the longer the analysis takes, the lower the VM performance.
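For readers who have not used the example, this is a single-machine Python sketch of what wordcount computes, written in map and reduce steps. It reads from a local directory rather than HDFS and runs sequentially, so it only illustrates the computation, not Hadoop's distributed execution. Splitting on whitespace roughly mirrors the tokenization of Hadoop's example; the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def map_lines(lines):
    """Map step: split each line on whitespace and count the words."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def word_count(directory):
    """Map every file in the directory, then reduce by merging the counters."""
    total = Counter()
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as f:
                total.update(map_lines(f))
    return total


if __name__ == "__main__":
    counts = word_count(".")
    print(len(counts), "distinct words")
    print(counts.most_common(10))
```

The per-file work is independent, which is why Hadoop can spread it across nodes; the final merge is cheap compared with reading the input.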

As the graph shows, Small VMs take significantly less time to finish the analysis than XSmall VMs. However, as the VM configuration goes further up, the time needed does not decrease further: Medium, Large, XLarge and XXLarge VMs take approximately the same amount of time to finish the same test. A possible explanation is that Hadoop wordcount is a disk-IO-intensive application. Because of disk IO limits, VMs with more vCPUs do not have much advantage in this kind of analysis.

On the previous slide we compared Hadoop wordcount performance across VM models. Now let’s take a look at the performance of a Hadoop cluster with multiple nodes. As the graph above shows, the time needed to finish the analysis decreased steadily as more nodes were added to the cluster, which indicates that cluster performance kept improving. A cluster of 2 XSmall VMs performs comparably to a single Small VM, and a cluster of 3 XSmall VMs performs significantly better than a single XXLarge VM.

As mentioned earlier, Hadoop wordcount is a disk-IO-intensive application. In a multi-node cluster, the disk IO load is distributed across multiple VMs, so the cluster performs better than a single VM with many vCPUs and a large amount of memory. This is consistent with what we observed on other IaaS services such as GrandCloud and Aliyun in China (both GrandCloud and Aliyun use the Xen hypervisor, while HP Cloud Services uses the KVM hypervisor). Therefore, when running Hadoop-like applications on public or private IaaS services, horizontal scaling should be considered for better performance.
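A deliberately simplified model, which is our assumption and not something measured in these tests, shows why adding nodes helps where adding vCPUs does not. Treat the job as a fixed amount of disk IO time plus CPU time that run one after the other. On a single VM, extra vCPUs divide only the CPU part; in a cluster where each node has its own disk of similar speed, both parts are divided. The `io_s` and `cpu_s` values in the usage lines are hypothetical.

```python
def vertical_time(io_s, cpu_s, vcpus):
    """One VM: CPU work splits across vCPUs, but disk IO time does not shrink."""
    return io_s + cpu_s / vcpus


def horizontal_time(io_s, cpu_s, nodes, vcpus_per_node=1):
    """Several VMs with their own disks: both IO and CPU work split across nodes."""
    return io_s / nodes + cpu_s / (nodes * vcpus_per_node)


if __name__ == "__main__":
    io_s, cpu_s = 600, 300  # hypothetical single-VM breakdown in seconds
    print("1 VM, 8 vCPUs:   ", vertical_time(io_s, cpu_s, 8))
    print("3 VMs, 1 vCPU:   ", horizontal_time(io_s, cpu_s, 3))
```

Real jobs overlap IO with computation and pay shuffle and network costs, so this only captures the direction of the effect, not its size.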

Note that in both single-node and multi-node tests, the data presented include only the time needed to run the wordcount application, not the time needed to copy the data from the local file system onto HDFS.

Finally, let’s take a look at database performance on HP Cloud Services. This graph shows pgbench results on different VMs, where the blue curve represents single-thread tests and the pink curve represents multi-thread tests. In multi-thread tests, the number of threads equals the number of vCPUs on the VM. The single-thread results are very similar, and they represent the performance of a single vCPU. In multi-thread tests, increases in both vCPUs and memory have a positive impact on performance, but the improvement is very limited. For example, compared to a Small VM, an XXLarge VM has 4 times the vCPUs and 16 times the memory, but its multi-thread pgbench performance is only 50% higher.
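The exact pgbench options used in these tests are not listed. As a hedged sketch, this is one way to express "threads equal to vCPUs" on the pgbench command line: `-i` initializes the standard tables, `-s` sets the scale factor, `-c` the number of clients, `-j` the number of worker threads and `-T` the run length in seconds. The database name `bench`, the scale factor of 10 and the 60-second duration are hypothetical.

```python
import os


def pgbench_init_cmd(dbname, scale=10):
    """Create and populate the standard pgbench tables at the given scale factor."""
    return ["pgbench", "-i", "-s", str(scale), dbname]


def pgbench_run_cmd(dbname, threads, duration_s=60):
    """One client (-c) per worker thread (-j), running for duration_s seconds (-T)."""
    return ["pgbench", "-c", str(threads), "-j", str(threads),
            "-T", str(duration_s), dbname]


if __name__ == "__main__":
    vcpus = os.cpu_count() or 1
    print(" ".join(pgbench_init_cmd("bench")))
    print(" ".join(pgbench_run_cmd("bench", 1)))      # single-thread run
    print(" ".join(pgbench_run_cmd("bench", vcpus)))  # one thread per vCPU
```

Each list can be passed directly to `subprocess.run(cmd, check=True)` on a machine with PostgreSQL installed.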

A database benchmark is a typical disk-IO-intensive application. Because of limits in disk IO performance, VMs with more vCPUs do not show much advantage in this test.

During the pgbench tests we observed a puzzling phenomenon: VMs of the same model showed significantly different single-thread pgbench performance. As this graph shows, on problematic VMs the single-thread pgbench performance is only 1/10 of that on normal VMs. It is reasonable to assume that VMs of the same model are not designed to show such large performance differences for the same application. Therefore we consider this phenomenon a defect.

Based on our tests, we have concluded the following:

1、The defect might occur on VMs of any model;
2、On a particular VM instance, test results are consistent;
3、The defect does not have a significant impact on byte-unixbench, mbw and iperf results. In other words, the vCPU, memory and networking on problematic VMs work normally.

To find the cause of the defect, we compared the disk IO performance obtained from iozone tests. Since PostgreSQL stores its data on the OS disk, we only compare results for the OS disk. On this graph, the blue curve represents disk write performance on normal VMs, while the pink curve represents defective VMs. Disk write performance on defective VMs is much worse than on normal VMs (approximately 1/10).

Next we take a closer look at disk read performance. Again, the blue curve represents normal VMs and the pink curve represents defective VMs. In both cases the disk read performance is similar.

As mentioned before, a database benchmark is a typical disk-IO-intensive application. We believe that the defective VMs might have configuration problems in the back-end storage system, and that these problems affect only disk write performance, not disk read performance.

In our tests we created a total of 20 VMs of different models, and the defect was observed on 7 of them. In other words, in the availability zone where we carried out our tests, the observed probability of getting a defective VM was 35%. Such a high defect rate is not acceptable for any production system. Therefore, I would not encourage anybody to deploy production systems on HP Cloud Services.
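The impact grows quickly for multi-VM deployments. The sketch below assumes, as a simplification, that each new VM is defective independently with the observed rate of 7/20. Under that assumption a three-VM deployment would include at least one defective VM about 73% of the time (1 - 0.65^3 ≈ 0.725). With only 20 VMs sampled, the true rate is uncertain, so these figures are indicative only.

```python
def p_at_least_one_defective(k, p=7 / 20):
    """Probability that at least one of k VMs is defective,
    ASSUMING each VM is defective independently with probability p."""
    return 1 - (1 - p) ** k


if __name__ == "__main__":
    for k in (1, 2, 3, 5, 10):
        print(f"{k:>2} VMs: {p_at_least_one_defective(k):.1%}")
```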

Also, if you create VMs automatically through the API, you will need to double-check whether each VM you get has this defect.
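Because the defect cut write throughput to about 1/10 while reads stayed normal, even a crude write test right after provisioning should expose it. This is a hedged sketch, not the iozone methodology used above: the 256 MiB test size, the 0.5 threshold and the idea of comparing against a baseline measured the same way on a known-good VM of the same model are our assumptions. The directory should be on the OS disk, where PostgreSQL kept its data in our tests.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory, size_mib=256):
    """Write size_mib MiB to a temporary file in `directory`, fsync it,
    and return the achieved write rate in MiB/s."""
    block = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mib):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to flush the data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mib / elapsed


def looks_defective(measured, baseline, ratio=0.5):
    """Flag a VM whose write rate is below ratio * the known-good baseline."""
    return measured < ratio * baseline


if __name__ == "__main__":
    rate = write_throughput_mib_s("/var/tmp")
    print(f"{rate:.0f} MiB/s")
```

A provisioning script could run this once per new VM and discard or re-create any instance that `looks_defective` flags.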

I would like to add that, given how high the defect rate is and how serious its impact on disk IO performance is, it should be relatively easy to find the problem with a small number of simple performance checks. We therefore have reason to believe that the HP Cloud Services team does not have a mechanism for regularly monitoring the complex infrastructure they are in charge of. It is very unfortunate that HP leaves such a critical task to its paying customers.

As said at the very beginning, HP is the first commercial company to launch an OpenStack-based public IaaS service. The defects we encountered in our tests are obviously not introduced by OpenStack itself. However, they imply that OpenStack still lacks certain critical functionality, such as system-level monitoring and notification. Because of this missing functionality, the HP Cloud Services team is not aware of the serious defects present in their system. Therefore, I maintain that OpenStack is not yet mature enough for commercial deployment.

Providing an IaaS service is a complex systems engineering task, which cannot be accomplished by simply grabbing an open source package and installing it. The combination of compute, storage and networking components, as well as the selection and configuration of the operating system, hypervisor and IaaS management tool, all affect the performance of the system. HP is a company with a strong R&D background, and it should know OpenStack well because it is one of the major corporate contributors. It is sad that none of these advantages could prevent HP from launching a defective IaaS product.

I personally believe that small-to-medium companies, research institutes and government agencies that lack hands-on experience in system integration and data center operations should consult professional cloud computing companies when building their private or hybrid IaaS. The future of open source IaaS software lies in service and support for private and hybrid IaaS systems. This is why companies like Piston, Nebula and RackSpace will continue to invest in OpenStack.

A year ago I published an article comparing the architecture, functionality, community and commercialization of open source IaaS projects. In that article I referred to a passage from a Buddhist sutra translated by Master Kumarajiva: attract with benefits, persuade with wisdom. Today, I still believe that this sutra represents the future of open source IaaS projects.

Thank you for taking the time to read this. I welcome comments on the work and data presented in this article, and I can always be reached at the email address shown on the slides.

 
