Some Data on Aliyun’s New TeraSort Results

October 29, 2015, 6:14 AM

Aliyun recently announced a breakthrough in the TeraSort benchmark: they sorted 100 TB of data in 377 seconds. This is significantly faster than the previous world record of 23 minutes, set by Spark in 2014. Out of curiosity, I compiled some data on the clusters used by Yahoo (2013), Spark (2014) and Aliyun (2015) to see what improvements have been made.

| Vendor | Yahoo | Spark | Aliyun |
|---|---|---|---|
| Year | 2013 | 2014 | 2015 |
| Data Source | sortbenchmark.org | Spark | sortbenchmark.org |
| Single Node Configuration | | | |
| System | Dell R720xd | AWS EC2 i2.8xlarge | Unknown |
| CPU | Intel Xeon E5-2630 | Intel Xeon E5-2670 v2 | Intel Xeon E5-2630 / Intel Xeon E5-2650v2 |
| Total CPU Cores | 12 (2 Physical CPUs) | 32 (vCPU) | 12 or 16 (2 Physical CPUs) |
| Memory | 64 GB | 244 GB | 96 GB or 128 GB |
| Storage | 12 x 3 TB SATA | 8 x 800 GB SSD | 12 x 2 TB SATA |
| Single Disk Sequential Read Throughput (128 KB blocks) | 120 MB/s | 400 MB/s | 120 MB/s |
| Single Disk Sequential Write Throughput (128 KB blocks) | 120 MB/s | 400 MB/s | 120 MB/s |
| RAID0 Sequential Read Throughput (128 KB blocks) | 1,440 MB/s | 3,200 MB/s | 1,440 MB/s |
| RAID0 Sequential Write Throughput (128 KB blocks) | 1,440 MB/s | 3,200 MB/s | 1,440 MB/s |
| Networking | 10 Gbps | 10 Gbps | 10 Gbps |
| Cluster Configuration | | | |
| Number of Nodes | 2100 | 206 | 3377 |
| Number of CPU Cores | 25,200 | 6,592 | 41,496 |
| Total Memory | 134,400 GB | 50,264 GB | 331,968 GB |
| Total Sequential Read Throughput (128 KB blocks) | 3,024,000 MB/s | 659,200 MB/s | 4,862,880 MB/s |
| Total Sequential Write Throughput (128 KB blocks) | 3,024,000 MB/s | 659,200 MB/s | 4,862,880 MB/s |
| 100 TB Sorting Results | | | |
| Time | 72 minutes | 23 minutes | 377 seconds |
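
The cluster-level rows are simply the per-node rows multiplied by the node count. The sketch below recomputes them for the two homogeneous clusters. For Aliyun, whose nodes come in two sizes, it infers how many nodes of each size would be consistent with the totals. The function names are illustrative. The two-configuration split, and the pairing of 96 GB with 12-core nodes and 128 GB with 16-core nodes, are assumptions read off the table rather than figures stated in the post.

```python
# Per-node figures copied from the table above.
NODES = {
    "Yahoo": {"count": 2100, "cores": 12, "memory_gb": 64, "raid0_mb_s": 1440},
    "Spark": {"count": 206, "cores": 32, "memory_gb": 244, "raid0_mb_s": 3200},
}


def cluster_totals(node):
    """Scale one node's resources up to the whole (homogeneous) cluster."""
    count = node["count"]
    return {
        "cores": count * node["cores"],
        "memory_gb": count * node["memory_gb"],
        "throughput_mb_s": count * node["raid0_mb_s"],
    }


def infer_aliyun_mix(total_nodes=3377, total_cores=41496,
                     small_cores=12, large_cores=16):
    """Solve small + large = total_nodes, 12*small + 16*large = total_cores.

    Assumption (not stated in the article): every node is one of exactly
    two configurations.
    """
    extra_cores = total_cores - small_cores * total_nodes
    large, remainder = divmod(extra_cores, large_cores - small_cores)
    if remainder or not 0 <= large <= total_nodes:
        raise ValueError("totals are not consistent with a two-way mix")
    return total_nodes - large, large


totals = {name: cluster_totals(node) for name, node in NODES.items()}
small_nodes, large_nodes = infer_aliyun_mix()
# Cross-check against the memory row, assuming 96 GB on the 12-core nodes
# and 128 GB on the 16-core nodes.
aliyun_memory_gb = small_nodes * 96 + large_nodes * 128
```

Under these assumptions the split comes out to 3,134 twelve-core nodes and 243 sixteen-core nodes, and the memory cross-check reproduces the 331,968 GB shown in the table.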

The Aliyun cluster has 331,968 GB of memory in total, more than three times the size of the data to be sorted. This allows the entire data set to reside in memory, avoiding the performance impact of disk I/O during the sort. In their report, Aliyun described an “I/O dual buffering” technique, which allows data processing and disk I/O to proceed in parallel. The report also pointed out that “we ensure data are not buffered in OS page cache by running a data purge job that randomly reads from local file system before each benchmark run”. Even with a cold page cache, however, the data can be loaded into memory quickly at the beginning of the benchmark: with an aggregate sequential read throughput of 4,862,880 MB/s, reading 100 TB takes only around 20 seconds. The “Overlapped Execution” section of the report suggests that the abundance of memory may be playing a much greater role than the “I/O dual buffering” technique. This is very different from the Spark cluster, which has only 50,264 GB of memory (about half the size of the data set), so extensive disk I/O must occur as part of the sort.
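
The 20-second figure and the memory comparison follow directly from the table. Here is a minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000,000 MB) and assuming every disk reads sequentially at its peak rate in perfect parallel, which is a best case rather than a measurement. The names `load_seconds` and `memory_ratio` are illustrative.

```python
# Assumption: decimal units, so 100 TB = 100,000,000 MB.
DATA_MB = 100 * 1000 * 1000

# Cluster totals copied from the table above.
CLUSTERS = {
    "Yahoo": {"memory_gb": 134_400, "read_mb_s": 3_024_000},
    "Spark": {"memory_gb": 50_264, "read_mb_s": 659_200},
    "Aliyun": {"memory_gb": 331_968, "read_mb_s": 4_862_880},
}


def load_seconds(cluster):
    """Best-case time to read the whole data set once from disk."""
    return DATA_MB / cluster["read_mb_s"]


def memory_ratio(cluster):
    """Total cluster memory divided by the size of the data set."""
    return cluster["memory_gb"] * 1000 / DATA_MB


for name, cluster in CLUSTERS.items():
    print(f"{name}: load {load_seconds(cluster):.1f} s, "
          f"memory/data {memory_ratio(cluster):.2f}x")
```

A ratio above 1 means the whole data set could in principle fit in memory. Only the Spark cluster falls below 1, which is why it cannot avoid disk I/O during the sort.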

Based on the data above, it is fairly clear that Aliyun’s solution is better than Yahoo’s, given its obvious performance advantage. However, it is hard to say that Aliyun’s solution is better than Spark’s, given Aliyun’s obvious resource advantage (memory in particular).
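
One rough way to put the resource difference in numbers is to divide each cluster's sort rate by its node count. The sketch below does this with the figures from the table. The per-node metric is my own normalization, not something the benchmark reports, and it ignores the fact that the nodes differ greatly (SSD versus SATA, 244 GB versus 96 GB of memory), so treat it as a sanity check rather than a ranking.

```python
# Assumption: decimal units, so 100 TB = 100,000 GB.
DATA_GB = 100_000

# Run times and node counts copied from the table above.
RUNS = {
    "Yahoo": {"seconds": 72 * 60, "nodes": 2100},
    "Spark": {"seconds": 23 * 60, "nodes": 206},
    "Aliyun": {"seconds": 377, "nodes": 3377},
}


def sort_rates(run):
    """Return (cluster rate in GB/s, per-node rate in MB/s) for one run."""
    cluster_gb_s = DATA_GB / run["seconds"]
    per_node_mb_s = cluster_gb_s * 1000 / run["nodes"]
    return cluster_gb_s, per_node_mb_s


rates = {name: sort_rates(run) for name, run in RUNS.items()}
for name, (cluster_gb_s, per_node_mb_s) in rates.items():
    print(f"{name}: {cluster_gb_s:.1f} GB/s overall, "
          f"{per_node_mb_s:.1f} MB/s per node")
```

By this measure Aliyun has the highest overall rate, while Spark sorts the most data per node. That is consistent with the point that Aliyun's win over Spark depends heavily on the size of its cluster.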

Another important aspect is that Spark’s solution was deployed on Amazon EC2. This means that computation at this very large scale can be done at very low cost: researchers pay only for the computing resources they actually use, for as long as they use them. In Aliyun’s case, the cluster was a fixed asset owned by Aliyun. Given that Aliyun also positions itself as a public cloud service provider, would it be possible for them to run this benchmark on their public cloud offerings?

One Response to “Some Data on Aliyun’s New TeraSort Results”

  1. Satya Shrestha says:

    Nice article and good comparison.
