Flowers and Plants Planted Recently

By , August 30, 2012 11:13 am

The kids are about to go back to school, so here is an inventory of the flowers and plants we planted this summer.

Cotton rose hibiscus, areca palm, camellia, Chinese yew, royal poinciana, durian, white magnolia, cacao, coffee, canistel (egg-fruit), breadfruit, Surinam cherry, rambutan, large-leaved holly (kuding tea), star apple, Taiwan jujube, red bayberry, cherry, ginkgo, yellow-fleshed kiwifruit, black fig, red-fleshed fig.

 

A Typhoon Day, Showing Off the Kids

By , August 17, 2012 10:42 am

Marching boldly and proudly, off we go to catch the dogs.

Caught one. Whoa, so heavy!

Ha! Big sister caught one too.

Freeze! Nobody runs.

HP Cloud Service Performance Test

By , August 11, 2012 12:57 pm

HP Cloud Service Performance Test

This slide deck is the script I prepared for my talk at the OpenStack APAC Conference in Shanghai on August 10-11. The live presentation may differ slightly in wording, but the content is essentially the same. Comments and criticism are welcome.

【subject】

Hello everyone, my name is Qingye Jiang. As we all know, after two years of development the OpenStack project has entered the trial-commercial stage. HP was the first company to launch an OpenStack-based public cloud service, and RackSpace recently followed with its own. To some extent, these two IaaS services reflect the maturity of the OpenStack project. I have been running test projects on HP Cloud Service since April this year. I would like to take the opportunity of the OpenStack APAC Conference to share some of the data I collected on HP Cloud Service, and I welcome your criticism and advice.

(1 minute)

【test setup】

The screenshot on this slide shows the user interface of HP Cloud Service. In this test, all virtual machines were created through the browser-based user interface, and all of them reside in the az-1.region-a.geo-1 availability zone. HP Cloud Service provides several versions of CentOS, Debian, Fedora and Ubuntu; in our test we used Ubuntu 11.04 64-bit Server Edition. The table on this slide lists the models, configurations and prices of the virtual machines provided by HP Cloud Service. For each model, at least 3 instances were created for comparison, and more were created when necessary. A total of 20 virtual machines were created during the test.

In this test, we used byte-unixbench to evaluate vCPU performance, mbw for memory performance, iozone for disk IO performance, iperf for network performance, pgbench for database performance, and Hadoop wordcount for the performance of a complex application.

We did some basic data filtering in this test. For each VM model, we present the results from the best-performing instance. For each benchmark, we performed 10 repeated runs and averaged the results.
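The filtering procedure described above can be sketched in a few lines; `run_benchmark` is a hypothetical stand-in for invoking any of the tools listed earlier, not part of the actual test harness:

```python
def average_of_runs(run_benchmark, runs=10):
    """Repeat a benchmark and average the scores, as done in this test."""
    scores = [run_benchmark() for _ in range(runs)]
    return sum(scores) / len(scores)

def best_instance(instance_scores):
    """Of several instances of the same VM model, keep only the best score."""
    return max(instance_scores)

# Example with canned scores standing in for real benchmark runs:
scores = [average_of_runs(lambda s=s: s, runs=10) for s in (520.0, 498.0, 531.0)]
print(best_instance(scores))  # → 531.0
```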

(3 minutes)

【byte-unixbench】

First, let's look at the byte-unixbench results.

UnixBench is a benchmark suite with a long history, and its score reflects the overall performance of a host. In theory, the UnixBench score is directly related to the CPU, memory, storage and operating system of the host being tested. According to our observations, however, on modern computer systems the score is dominated by CPU processing power. We therefore use the UnixBench score here to represent the vCPU processing power of a virtual machine.

There are two curves on this chart: the blue curve represents the single-threaded results, and the red curve represents the multi-threaded results. In the multi-threaded tests, the number of threads equals the number of vCPUs on the virtual machine. As you can see, the single-threaded results are very close across all virtual machines, because they reflect the processing power of a single vCPU. In the multi-threaded tests, virtual machines with the same number of vCPUs produce essentially the same results, and the amount of memory has almost no effect. In addition, the score grows more slowly than the number of vCPUs: when the number of vCPUs doubles (2x), the score grows by only about 50% (1.5x). This is consistent with what we observed on other IaaS platforms such as GrandCloud and Aliyun.
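The sub-linear scaling observed above can be modeled with a simple speedup curve; the 1.5x-per-doubling factor and the base score below are taken from this test's measurements for illustration, not a general law:

```python
import math

def projected_score(base_score, vcpus, factor=1.5):
    """Project a multi-threaded UnixBench-style score from a 1-vCPU score,
    assuming each doubling of vCPUs multiplies the score by `factor`."""
    doublings = math.log2(vcpus)
    return base_score * factor ** doublings

base = 500.0  # hypothetical single-vCPU score
for n in (1, 2, 4, 8):
    print(n, round(projected_score(base, n), 1))
# → 1: 500.0, 2: 750.0, 4: 1125.0, 8: 1687.5
```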

(3 minutes)

【mbw】

We used mbw to test the memory performance of the virtual machines. mbw evaluates the "copy" memory bandwidth available to userspace programs. The orange, pink and blue curves on this chart represent the amount of memory that can be copied per second using three different copy methods. As you can see, for each method, the memory bandwidth is essentially the same across all VM models.

(1 minute)

【iozone – os disk】

Each virtual machine provided by HP Cloud Service has two disks: a system disk and a data disk. This chart shows the system disk throughput measured by iozone. As you can see, write performance is essentially the same across all VM models, capped at around 200 MB/s. Read performance, however, differs greatly: XSmall, Small and Medium VMs only reach about 1 GB/s, while Large, XLarge and XXLarge VMs reach about 5 GB/s, five times as fast.

Given that VMs of different models show similar write performance, it is unlikely that HP Cloud Service limits VM disk IO with cgroup blkio throttling or QEMU block throttling. The disk images of VMs with lower read performance are probably stored on SATA or SAS disks, while those of VMs with higher read performance are probably stored on SSDs.

(2 minutes)

【iozone – data disk】

This chart shows the data disk throughput measured by iozone. Again, write performance is essentially the same across all VM models, capped at around 200 MB/s, while read performance differs greatly: XSmall, Small, Medium and Large VMs only reach about 1 GB/s, while XLarge and XXLarge VMs reach about 5 GB/s, five times as fast.

Comparing this chart with the previous one, notice the Large VM on both: its system disk has the higher read performance, but its data disk has the lower read performance. The Large VM may reflect a design decision by the HP Cloud Service team: higher-end models deserve better disk IO performance on both the system disk and the data disk.

It should be pointed out that these differences in disk IO performance between VM models are not documented anywhere by HP Cloud Service.

(2 minutes)

【iperf】

Next, let's look at the internal network bandwidth of HP Cloud Service. In the matrix shown on this slide, data point (X, Y) represents the bandwidth between two virtual machines. As you can see, the internal bandwidth between two VMs is determined by the lower-end VM of the pair: an XSmall VM gets only 25 Mbps to any other VM, a Small VM 50 Mbps, a Medium VM 100 Mbps, a Large VM 200 Mbps, an XLarge VM 400 Mbps, and an XXLarge VM 650 Mbps.
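The pattern in the matrix, where each pair is capped by the smaller VM's limit, can be expressed directly; the per-model caps below are the numbers measured in this test:

```python
# Measured per-model internal bandwidth caps (Mbps), from this test.
CAPS = {"XSmall": 25, "Small": 50, "Medium": 100,
        "Large": 200, "XLarge": 400, "XXLarge": 650}

def pair_bandwidth(model_a, model_b):
    """Bandwidth between two VMs is limited by the lower-end VM of the pair."""
    return min(CAPS[model_a], CAPS[model_b])

print(pair_bandwidth("XSmall", "XXLarge"))  # → 25
print(pair_bandwidth("Large", "Medium"))    # → 100
```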

As with the disk IO differences mentioned earlier, these differences in internal bandwidth between VM models are not documented by HP Cloud Service either.

Limiting the internal bandwidth between virtual machines may be reasonable from certain perspectives. For example, some users might run malicious programs that consume large amounts of internal bandwidth, congesting the internal network and affecting other users on the same network.

Given that premise, the real question is whether these specific numbers are appropriate. Take the XSmall VM, which has the lowest internal bandwidth: copying a 2 GB file over the internal network actually takes around 5 to 8 minutes. Is that speed acceptable to customers? Or, let me ask everyone in the audience: do any of you still have servers connected to a 100 Mbps switch?

(4 minutes)

【hadoop wordcount single node】

Having analyzed the processor, memory, disk and network performance of HP Cloud Service, let's take Hadoop, a currently popular application, as an example and see how it performs on different virtual machines. The Hadoop distribution includes several example applications, among which the wordcount example has probably been run by many of you. It reads the files in an input directory and counts how many distinct words they contain and how many times each word appears. In this test I used three files of about 700 MB each, around 2 GB of test data in total. We ran the wordcount application on VMs of different models against this test data and recorded the time needed to finish the analysis. Obviously, the longer the test takes, the lower the VM's performance.
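For readers who have not run the example, a minimal local equivalent of what wordcount computes (without any of the MapReduce machinery) looks like this:

```python
from collections import Counter
from pathlib import Path

def wordcount(input_dir):
    """Count occurrences of each whitespace-separated word across all
    files in a directory, like Hadoop's wordcount example does on HDFS."""
    counts = Counter()
    for path in Path(input_dir).iterdir():
        counts.update(path.read_text().split())
    return counts

# Example on an in-memory string instead of real 700 MB files:
counts = Counter("the quick brown fox jumps over the lazy dog the end".split())
print(counts["the"])  # → 3
```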

As the chart shows, the Small VM finishes the same test in significantly less time than the XSmall VM. However, as the VM configuration goes further up, the time needed does not decrease further: the Medium, Large, XLarge and XXLarge VMs all finish the test in roughly the same time. A possible explanation is that Hadoop wordcount is a disk IO intensive application; constrained by disk IO performance, VMs with more vCPUs cannot exploit their advantage in computing power.

(3 minutes)

【hadoop wordcount multiple nodes】

On the previous slide we compared the Hadoop wordcount performance of individual VMs of different models. Now let's look at the performance of Hadoop clusters composed of multiple VMs. As the chart above shows, as the number of nodes in the cluster increases, the time needed to finish the test keeps decreasing, indicating that the overall processing power of the cluster keeps growing. A cluster of two XSmall VMs (2 x XSmall) performs close to a single Small VM, and a cluster of three XSmall VMs (3 x XSmall) clearly outperforms a single XXLarge VM.

As mentioned on the previous slide, Hadoop wordcount is a disk IO intensive application. In a cluster of multiple lower-end VMs, the disk IO pressure is spread across the machines, so the cluster outperforms the pseudo-cluster formed by a single higher-end VM. This is consistent with what I previously observed on GrandCloud and Aliyun. (GrandCloud and Aliyun use the Xen hypervisor, while HP Cloud Service uses KVM.) When running Hadoop-like applications on an IaaS platform, you should scale out wherever possible to get the best performance and price-performance ratio.

Note that the numbers above only include the time spent on the Hadoop wordcount computation itself; they do not include the time needed to copy the data files from the local file system into HDFS.

(3 minutes)

【pgbench】

Finally, let's look at database performance on HP Cloud Service. This chart shows pgbench results on different VMs: the blue curve is the single-threaded result and the pink curve is the multi-threaded result. In the multi-threaded tests, the number of threads equals the number of vCPUs. As you can see, the single-threaded results are very close across all VMs, because they reflect the processing power of a single vCPU. In the multi-threaded tests, adding vCPUs and memory does improve performance, but only to a very limited extent. For example, the XXLarge VM has 4 times the vCPUs and 16 times the memory of the Small VM, yet its multi-threaded pgbench performance is only 1.5 times that of the Small VM.

A database benchmark is a typical disk IO intensive application; constrained by disk IO performance, VMs with more vCPUs cannot exploit their advantage in computing power.

(2 minutes)

【defects – pgbench single thread】

During the pgbench tests we also discovered some puzzling phenomena. VMs of the same model showed very large differences in their single-threaded pgbench results. As this chart shows, on the problematic VMs the single-threaded pgbench performance is less than 1/10 of that of normal VMs. Common sense suggests that such a large performance gap between identically configured VMs running the same application cannot be intended behavior, so we classify this phenomenon as a defect.

Based on extensive testing, we summarized the following rules:

1. The defect can occur on VMs of any model;
2. On a given VM instance, the results are consistent (a VM is never normal at some times and abnormal at others);
3. The defect has no significant impact on the byte-unixbench, mbw and iperf results. In other words, on a defective VM the processor, memory and network appear to be configured normally.

(3 minutes)

【defects – iozone write results】

To pin down the cause of this defect, we compared the iozone results of the VMs. Since PostgreSQL stores its data on the system disk by default, we only compared the system disk results here. On this chart, the blue curve shows the disk write performance of a normal VM and the pink curve shows that of a defective VM. As you can see, the write performance of the defective VM is far below that of the normal VM, roughly 1/10 or even less.

(1 minute)

【defects – iozone read results】

Now let's look at the comparison of disk read performance.

On this chart, the blue curve shows the disk read performance of a normal VM and the pink curve shows that of a defective VM. As you can see, there is no essential difference in read performance between the defective VM and the normal VM.

As mentioned earlier, a database benchmark is a typical disk IO intensive application. We can infer that the storage system backing the defective VMs probably has a configuration problem, one that hurts disk write performance but has no significant effect on disk read performance.

(2 minutes)

【defect rate】

In this test we created a total of 20 VMs of various models, and the defect described above was observed on 7 of them. In other words, in the availability zone where we ran our tests, the probability that a newly created VM is defective is 35%. A defect rate this high is intolerable for any production system. Therefore, I personally would not recommend deploying any production system on HP Cloud Service.

Moreover, if you create VM instances on HP Cloud Service automatically through the API, be sure to verify every instance for this defect.

It is worth noting that this defect occurs at a very high rate and has a very visible impact on disk IO performance, so its root cause should be discoverable with simple tests. We therefore have good reason to believe that the HP Cloud Service development and operations teams did not perform even minimal testing on the product and service they launched. Instead, the task of testing HP Cloud Service was left to guinea pigs like me. What is more, while helping them find defects, I also had to pay them expensive service fees!

(3 minutes)

【conclusion】

As I mentioned at the beginning, HP was the first company to launch an OpenStack-based public cloud service. The defects we encountered during testing were obviously not caused by OpenStack itself, but they show that OpenStack still lacks the monitoring and alerting capabilities that a commercial system cannot do without. Because of this gap, the HP Cloud Service development and operations staff remained unaware of serious defects in their system for a long time. So I stand by my earlier view: the OpenStack project is not yet mature enough, and there is still a considerable distance between it and the requirements of a commercial system.

From another perspective, infrastructure as a service (IaaS) is a complex systems engineering effort; it is not something you can pull off by simply installing and configuring an open source package. The combinations of compute, storage and network hardware, and the selection and configuration of the operating system, hypervisor and virtualization management software, all have a huge impact on the performance of a cloud platform. Even a famous company like HP, with deep technical strength and hands-on involvement in OpenStack development, can ship major defects in a production system. In my personal view, small and medium-sized companies, research institutes and government agencies that lack experience in system integration and data center operations should actively partner with professional cloud computing companies when building their private or hybrid clouds, drawing on those companies' accumulated experience and lessons to avoid defects and improve performance. One could argue that the market opportunity for open source IaaS software lies in providing private and hybrid cloud support and services built on top of it. That is where the value of companies like Piston, Nebula and RackSpace lies.

A year ago I published a blog post titled "A Comparison of Open Source IaaS Software: Architecture, Functionality, Community, Commercialization and Beyond," whose final chapter quoted a line from the Vimalakirti Sutra as translated by Master Kumarajiva: "First attract them with desires, then lead them into the wisdom of the Buddha." A year later, I still believe this line perfectly captures the direction of open source IaaS software.

(4 minutes)

【thank you】

That concludes my presentation today. Thank you all for taking the time to attend, and please feel free to offer your criticism and corrections. If you need more material, including the slides and the script of this talk, you can find them on my blog.

Thanks again.

HP Cloud Services Performance Test

By , August 11, 2012 9:20 am

This article is a translation of my presentation at the OpenStack APAC Conference in Shanghai, held on August 10-11. You are welcome to comment on the work and data described in the slides.

As we all know, after two years of hard work on OpenStack, several commercial companies are now deploying it onto production systems. Recently HP, followed by RackSpace, launched OpenStack-based public IaaS services. I have been testing HP Cloud Services since April. In this presentation I would like to share some of our test results with you.

The screen capture shown on this slide is the web-based user interface of HP Cloud Services. In this test all VMs were created through the web console, and all of them reside in the az-1.region-a.geo-1 availability zone.

HP Cloud Services provides several versions of CentOS, Debian, Fedora and Ubuntu. In our test we used Ubuntu 11.04 64-bit Server Edition. This table lists the model, configuration and price of each VM product. For each VM model, at least 3 instances were created for comparison, and more were created when necessary. A total of 20 VMs were created during the test.

In this test, we used byte-unixbench to evaluate overall system performance, mbw to evaluate memory performance, iozone to evaluate disk IO performance, iperf to evaluate networking performance, pgbench to evaluate database performance, and Hadoop wordcount to evaluate the performance of a complex application.

During the test we did some basic data filtering. For each VM model, we only present data from the best-performing VM. For each benchmark, 10 runs were performed and the average was taken as the result.

 


First let’s take a look at the byte-unixbench results.

UnixBench is a benchmark suite with a long history. Its result is often taken to represent the overall performance of a host. In theory the result is affected by the CPU, memory, disk IO and operating system. According to our observations, however, on modern systems the result is affected more by the CPU than by anything else. Therefore, we use the UnixBench result to indicate the CPU performance of the system under test.

On this graph there are two curves. The blue curve represents the single-thread results, while the pink one represents the multi-thread results. In the multi-thread tests, the number of concurrent threads equals the number of vCPUs on the system under test. It can be seen that the single-thread results are very close for all VM models, because they represent the performance of a single vCPU. In the multi-thread tests, VMs with the same number of vCPUs exhibit similar performance, while the amount of memory has little impact. Furthermore, performance grows more slowly than the number of vCPUs: when the number of vCPUs grows by 100% (2x), performance only grows by about 50% (1.5x). This is in accordance with what we observed on other IaaS services such as GrandCloud and Aliyun in China.

 


We used mbw to evaluate the memory performance of the VMs. mbw determines the "copy" memory bandwidth available to userspace programs. The orange, pink and blue curves represent the amount of memory that can be copied per second using three different methods. It can be seen that for every method, all VMs exhibit essentially the same memory bandwidth regardless of model.
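What mbw measures can be approximated in a few lines of Python; this sketch times a large in-memory copy, though a C program like mbw gives far more precise numbers and the result is entirely machine-dependent:

```python
import time

def copy_bandwidth_mb_s(size_mb=256, repeats=4):
    """Rough userspace memory-copy bandwidth in MB/s:
    time several full copies of a large buffer and keep the fastest."""
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full copy of the buffer
        best = min(best, time.perf_counter() - start)
    return size_mb / best

print(f"{copy_bandwidth_mb_s():.0f} MB/s")  # machine-dependent
```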

All VMs provided by HP Cloud Services include two disks: one for the operating system (10 GB) and one for data storage (the size varies by VM model). This graph shows the disk IO performance of the OS disk. It can be seen that all VMs exhibit similar write performance, capped at about 200 MB/s. However, there are significant differences in read performance: XSmall, Small and Medium VMs can only reach about 1 GB/s of throughput, while Large, XLarge and XXLarge VMs can reach about 5 GB/s, five times as fast.

Based on the fact that all VMs have the same write performance, it is unlikely that HP uses techniques such as cgroup blkio throttling or QEMU block throttling to limit VM disk IO. It is quite possible that HP uses different disk types for different VM models: the disk images of VMs with low read performance might reside on SATA or SAS disks, while those of VMs with high read performance might reside on SSDs.
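For reference, this is roughly what cgroup-based throttling would look like if it were in use. The cgroup v1 blkio interface accepts lines of the form `<major>:<minor> <bytes/s>`; the device numbers and cgroup path below are hypothetical, chosen only to illustrate the 200 MB/s cap observed in the tests:

```python
def throttle_line(major, minor, mbytes_per_sec):
    """Build a cgroup v1 blkio throttle entry: '<major>:<minor> <bytes/s>'."""
    return f"{major}:{minor} {mbytes_per_sec * 1024 * 1024}"

# Hypothetical example: cap writes on device 252:0 at 200 MB/s.
entry = throttle_line(252, 0, 200)
print(entry)  # → "252:0 209715200"
# On a real host this line would be written to something like:
#   /sys/fs/cgroup/blkio/<vm-group>/blkio.throttle.write_bps_device
```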

 

This graph shows the disk IO of the data disk. Again, all VMs exhibit similar write performance, capped at about 200 MB/s, while read performance differs significantly: XSmall, Small, Medium and Large VMs can only reach about 1 GB/s of throughput, while XLarge and XXLarge VMs can reach about 5 GB/s, five times as fast.

Let's compare this graph with the previous one. Note that the OS disk of the Large VM shows high read performance, but its data disk shows low read performance. This might reflect a design strategy of the HP Cloud Services team: VMs with better configurations deserve better disk IO performance.

It should be noted that these differences in disk IO performance are not mentioned in the HP Cloud Services documentation.

Now let's take a look at the internal bandwidth between VMs. In the matrix shown above, data point (X, Y) represents the bandwidth between two VMs. It can be seen that the internal bandwidth is limited by the smaller VM: XSmall VMs have only 25 Mbps to any other VM, Small VMs 50 Mbps, and Medium, Large, XLarge and XXLarge VMs 100, 200, 400 and 650 Mbps respectively.

As in the disk IO scenario, the differences in internal bandwidth are not mentioned in the HP Cloud Services documentation either.

It is probably reasonable to set up internal bandwidth limits. For example, some users might run malicious code that consumes large amounts of bandwidth, degrading network performance for all users.

So the question becomes whether the numbers above are acceptable. Take the XSmall VM: it takes 5 to 8 minutes to copy a 2 GB file between two XSmall VMs. Is that kind of speed acceptable to customers? Or, let's all recall whether any of us still have servers connected to a 100 Mbps switch.

By now we have evaluated the performance of CPU, memory, disk IO and network bandwidth. Let's take Hadoop as an example of a complex application and see how it performs on different VMs. The official Hadoop distribution includes some example applications, and many novice Hadoop users have tried the wordcount example. It takes a directory on HDFS as input, traverses all the files in that directory, and counts the number of distinct words and the number of occurrences of each word. In this test we used a directory with 3 files as input; each file is about 700 MB, about 2 GB in total. We ran the wordcount application on different VMs against the same directory and recorded the time needed to finish the analysis as the result. Obviously, the more time it takes, the lower the VM's performance.

As can be seen from the graph, the Small VM takes significantly less time than the XSmall VM. However, as the VM configuration goes further up, the time needed does not decrease further: Medium, Large, XLarge and XXLarge VMs take approximately the same amount of time to finish the same test. A possible explanation is that Hadoop wordcount is a disk IO intensive application; because of disk IO limits, VMs with more vCPUs do not have much advantage in this kind of analysis.

On the previous slide we compared the Hadoop wordcount performance of VMs of different models. Now let's look at the performance of a Hadoop cluster with multiple nodes. As can be seen from the graph above, as more nodes are added to the cluster, the time needed to finish the analysis decreases, indicating better and better cluster performance. A cluster of 2 XSmall VMs performs comparably to a single Small VM, and a cluster of 3 XSmall VMs performs significantly better than a single XXLarge VM.

We just mentioned that Hadoop wordcount is a disk IO intensive application. In a cluster with multiple nodes, the disk IO pressure is distributed across multiple VMs, so the cluster outperforms a single VM with many vCPUs and a lot of memory. This is in accordance with what we observed on other IaaS services such as GrandCloud and Aliyun in China. (Both GrandCloud and Aliyun use the Xen hypervisor, while HP Cloud Services uses KVM.) Therefore, when running Hadoop-like applications on public or private IaaS services, horizontal scaling should be considered for better performance.

It should be noted that in both the single-node and multi-node tests, the data presented only includes the time needed to run the wordcount application, not the time needed to copy the data from the local file system onto HDFS.

Finally, let's take a look at database performance on HP Cloud Services. This graph shows pgbench results on different VMs, where the blue curve represents the single-thread tests and the pink curve the multi-thread tests. In the multi-thread tests, the number of threads equals the number of vCPUs on the VM. It can be seen that the single-thread results are very similar, as they represent the performance of a single vCPU. In the multi-thread tests, increases in both vCPUs and memory have a positive impact on performance, but the improvement is very limited. For example, compared to a Small VM, an XXLarge VM has 4 times the vCPUs and 16 times the memory, but its multi-thread pgbench performance is only 50% higher.

A database benchmark is a typical disk IO intensive application. Because of the limits in disk IO performance, VMs with more vCPUs do not show much advantage in this test.

During the pgbench tests we observed a puzzling phenomenon. VMs of the same model exhibit significant differences in single-thread pgbench performance. As shown in this graph, on problematic VMs the single-thread pgbench performance is only about 1/10 of that of normal VMs. It is reasonable to assume that such a large performance difference between VMs of the same model running the same application is not intended behavior. Therefore we consider this phenomenon a defect.

Based on our tests, we have concluded the following rules:

1. The defect might occur on VMs of any model;
2. On a particular VM instance, the test results are consistent;
3. The defect does not have a significant impact on the byte-unixbench, mbw and iperf results. In other words, the vCPUs, memory and networking of problematic VMs work normally.

To find the reason behind the defect, we compared the disk IO performance obtained from iozone. Since PostgreSQL stores its data on the OS disk by default, we only compared the results for the OS disks. On the graph shown here, the blue curve represents the disk write performance of normal VMs, while the pink curve represents that of defective VMs. It can be seen that the disk write performance of defective VMs is much worse than that of normal VMs, roughly 1/10 or less.

Now let's take a further look at disk read performance. Again, the blue curve represents normal VMs and the pink curve represents defective VMs. It can be seen that the disk read performance is similar in both cases.

As mentioned before, a database benchmark is a typical disk IO intensive application. We believe the defective VMs might have configuration problems in the back-end storage system, problems that affect disk write performance but not disk read performance.

In our tests we created a total of 20 VMs of different models, and the above-mentioned defect was observed on 7 of them. In other words, in the availability zone where we carried out our tests, the probability of getting a defective VM is 35%. Such a high defect rate is not acceptable for any production system. Therefore, I would not encourage anybody to deploy any production system onto HP Cloud Services.

Also, if you create VMs automatically through the API, you will need to double-check whether each VM you get has the defect.
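A screening step like the one suggested above could be sketched as follows; the threshold, the instance names, and the way the write throughput is obtained (e.g. a quick iozone or dd run on each new instance) are assumptions for illustration, not part of HP's API:

```python
def is_defective(write_mb_s, expected_mb_s=200.0, tolerance=0.2):
    """Flag an instance whose measured sequential write throughput is far
    below the expected ~200 MB/s cap (defective VMs showed ~1/10 of normal)."""
    return write_mb_s < expected_mb_s * tolerance

def screen_instances(measurements):
    """Map {instance name: measured write MB/s} to a keep/discard verdict."""
    return {name: ("discard" if is_defective(mb_s) else "keep")
            for name, mb_s in measurements.items()}

# Hypothetical measurements from a quick disk check on each new instance:
print(screen_instances({"vm-1": 198.0, "vm-2": 17.5, "vm-3": 205.0}))
# → {'vm-1': 'keep', 'vm-2': 'discard', 'vm-3': 'keep'}
```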

I would like to add that, considering how high the defect rate is and how serious its impact on disk IO performance is, it should be relatively easy to find the problem with a limited number of simple performance checks. Therefore, we have reason to believe that the HP Cloud Services team does not have a mechanism to carry out regular monitoring of the complex infrastructure they are in charge of. It is very unfortunate that HP defers such a critical assignment to its paying customers.

As said at the very beginning, HP is the first commercial company to launch an OpenStack-based public IaaS service. The defects we encountered in our tests were obviously not introduced by OpenStack itself. However, they imply that OpenStack still lacks certain critical functionality, such as system-level monitoring and notification. Because of this, the HP Cloud Services team was not aware of the serious defects present in their system. Therefore, I insist that OpenStack is still not mature enough for commercial deployments.

Providing an IaaS service is a complex systems engineering effort, which cannot be accomplished by simply grabbing an open source package and installing it. The combination of different compute, storage and networking components, as well as the selection and configuration of the operating system, hypervisor and IaaS management tools, all have an impact on the performance of the system. HP is a company with a strong R&D background and should have good knowledge of OpenStack, being one of its major corporate contributors. It is sad that none of these advantages could prevent HP from launching a defective IaaS product.

I personally believe that small-to-medium companies, research institutes and government agencies that lack hands-on experience in system integration and data center operations should consult with professional cloud computing companies when building their private or hybrid IaaS. The future of open source IaaS software lies in the services and support around private and hybrid IaaS systems. This is why companies like Piston, Nebula and RackSpace will continue to invest in OpenStack.

A year ago I published an article comparing the architecture, functionality, community and commercialization of open source IaaS projects. In that article I referred to a line from a Buddhist sutra translated by Master Kumarajiva: first attract with benefits, then persuade with wisdom. Today, I still believe this line represents the future of open source IaaS projects.

Thank you for taking the time to read this, and I welcome comments on the work and data presented in this article. I can always be reached via the email address shown on the slides.

 

Hainan Mangosteen

By , August 2, 2012 6:11 pm

The Hainan mangosteen is a large tree with elliptical leaves and spherical fruit. The fruit is pale green before ripening and gradually turns orange-yellow as it matures. The flesh is yellow, sour with a touch of sweetness and slightly astringent. Generally, a mangosteen tree needs to be more than 10 years old before it flowers and bears fruit. Our mangosteen tree is an old one transplanted from the wild at the end of 2007, and this year it bore fruit for the first time since the transplant, a dozen or so fruits in all.

The picture above shows the mangosteen that is sold everywhere on the market. Although it is marketed as Hainan mangosteen, it actually originated in Southeast Asia and was only recently introduced to Hainan. This fruit has a thick, dark brown rind and white flesh, sweet with a hint of sourness.
