Some Data on Aliyun's New TeraSort Results
Published: 2015-10-29 06:14:36
Recently Aliyun announced a breakthrough in the TeraSort benchmark: they finished sorting 100 TB of data in 377 seconds. This is significantly faster than the previous world record of 23 minutes, set by Spark in 2014. Out of curiosity, I compiled some data on the clusters used by Yahoo (2013), Spark (2014), and Aliyun (2015) to see what improvements have been made.
| Vendor | Yahoo | Spark | Aliyun |
| Year | 2013 | 2014 | 2015 |
| Data Source | sortbenchmark.org | Spark | sortbenchmark.org |
| Single Node Configuration | | | |
| System | Dell R720xd | AWS EC2 i2.8xlarge | Unknown |
| CPU | Intel Xeon E5-2630 | Intel Xeon E5-2670 v2 | Intel Xeon E5-2630 or Intel Xeon E5-2650 v2 |
| Total CPU Cores | 12 (2 Physical CPUs) | 32 (vCPU) | 12 or 16 (2 Physical CPUs) |
| Memory | 64 GB | 244 GB | 96 GB or 128 GB |
| Storage | 12 x 3 TB SATA | 8 x 800 GB SSD | 12 x 2 TB SATA |
| Single Disk Sequential Read Throughput (128 KB blocks) | 120 MB/s | 400 MB/s | 120 MB/s |
| Single Disk Sequential Write Throughput (128 KB blocks) | 120 MB/s | 400 MB/s | 120 MB/s |
| RAID0 Sequential Read Throughput (128 KB blocks) | 1,440 MB/s | 3,200 MB/s | 1,440 MB/s |
| RAID0 Sequential Write Throughput (128 KB blocks) | 1,440 MB/s | 3,200 MB/s | 1,440 MB/s |
| Networking | 10 Gbps | 10 Gbps | 10 Gbps |
| Cluster Configuration | | | |
| Number of Nodes | 2100 | 206 | 3377 |
| Number of CPU Cores | 25,200 | 6,592 | 41,496 |
| Total Memory | 134,400 GB | 50,264 GB | 331,968 GB |
| Total Sequential Read Throughput (128 KB blocks) | 3,024,000 MB/s | 659,200 MB/s | 4,862,880 MB/s |
| Total Sequential Write Throughput (128 KB blocks) | 3,024,000 MB/s | 659,200 MB/s | 4,862,880 MB/s |
| 100 TB Sorting Results | | | |
| Time | 72 minutes | 23 minutes | 377 seconds |
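The Aliyun column lists two node types (12 or 16 cores, 96 GB or 128 GB) but only cluster-wide totals. As a rough consistency check, the sketch below works out how many nodes of each type the totals imply. The helper `solve_mix` is a hypothetical name introduced here for illustration, and the calculation assumes the cluster contains exactly these two node types and nothing else.

```python
# Figures copied from the Aliyun column of the table above.
NODES = 3377
TOTAL_CORES = 41_496
TOTAL_MEMORY_GB = 331_968


def solve_mix(total_nodes, total, small, large):
    """Split total_nodes into (n_small, n_large) so that
    n_small * small + n_large * large == total."""
    # Start from "every node is the small type" and attribute the
    # surplus to nodes of the large type.
    surplus = total - total_nodes * small
    n_large, remainder = divmod(surplus, large - small)
    if remainder != 0 or not 0 <= n_large <= total_nodes:
        raise ValueError("totals do not fit a mix of these two node types")
    return total_nodes - n_large, n_large


# The two solves are independent of each other. If they agree, that
# suggests the 12-core nodes carry 96 GB and the 16-core nodes 128 GB.
mix_from_cores = solve_mix(NODES, TOTAL_CORES, 12, 16)
mix_from_memory = solve_mix(NODES, TOTAL_MEMORY_GB, 96, 128)

print("node mix implied by core count:  ", mix_from_cores)
print("node mix implied by total memory:", mix_from_memory)

# The Yahoo and Spark clusters are uniform, so their totals are plain
# products of the per-node value and the node count.
yahoo_cores, yahoo_memory_gb = 2100 * 12, 2100 * 64
spark_vcpus, spark_memory_gb = 206 * 32, 206 * 244
print(yahoo_cores, yahoo_memory_gb, spark_vcpus, spark_memory_gb)
```

Both calculations give 3,134 of the smaller nodes and 243 of the larger ones, so the core and memory totals in the table are at least consistent with each other.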
The Aliyun cluster has 331,968 GB of memory in total, more than three times the 100 TB of data to be sorted. This allows the entire data set to reside in memory, avoiding the performance impact of disk I/O during the sort. In their report, Aliyun described an "I/O dual buffering" technique, which allows data processing and disk I/O to proceed in parallel. The report also pointed out that "we ensure data are not buffered in OS page cache by running a data purge job that randomly reads from local file system before each benchmark run". However, the data can still be loaded into memory quickly at the beginning of the benchmark: with 4,862,880 MB/s of aggregate sequential read throughput, reading 100 TB takes around 20 seconds. The "Overlapped Execution" section of the report implies that the abundance of memory might be playing a much greater role than the "I/O dual buffering" technique. This is very different from the Spark cluster, which has only 50,264 GB of memory, so extensive disk I/O must occur as part of the sorting benchmark.
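The "around 20 seconds" figure and the memory comparison can be reproduced from the table with a few lines of arithmetic. This is a minimal sketch, assuming 100 TB means 10^8 MB in decimal units and that every disk in the cluster reads at its full sequential rate at the same time, which is a best case. The function names are introduced here for illustration.

```python
# Aggregate figures copied from the table; 100 TB is taken as 10**8 MB
# (decimal units), which is an assumption about how the table counts.
DATA_MB = 100 * 1_000_000
DATA_GB = 100 * 1_000

# cluster -> (total sequential read throughput in MB/s, total memory in GB)
CLUSTERS = {
    "Yahoo": (3_024_000, 134_400),
    "Spark": (659_200, 50_264),
    "Aliyun": (4_862_880, 331_968),
}


def read_seconds(throughput_mb_s):
    """Best-case time to read the whole data set once at full disk speed."""
    return DATA_MB / throughput_mb_s


def fits_in_memory(memory_gb):
    """True if total memory exceeds the data size (ignores any overhead)."""
    return memory_gb > DATA_GB


for name, (throughput, memory) in CLUSTERS.items():
    print(f"{name}: {read_seconds(throughput):.1f} s to read, "
          f"memory/data = {memory / DATA_GB:.2f}, "
          f"fits in memory: {fits_in_memory(memory)}")
```

For Aliyun this gives roughly 20.6 seconds and a memory-to-data ratio of about 3.3, while the Spark cluster's memory covers only about half of the data set.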
Based on the data listed above, it is fairly convincing that Aliyun's solution is better than Yahoo's, given its obvious performance advantage. However, it is very hard to say that Aliyun's solution is better than Spark's, given Aliyun's obvious resource advantages (memory in particular).
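One way to put numbers on the resource argument is to divide each cluster's overall sorting rate by its node and core counts. The sketch below does only that, using the times and counts from the table. It assumes 100 TB is 10^8 MB, and it ignores real hardware differences (SSD versus SATA, vCPUs versus physical cores), so the output is a rough normalization rather than a ranking.

```python
DATA_MB = 100 * 1_000_000  # 100 TB in decimal MB (assumption)

# cluster -> (sort time in seconds, nodes, cores as listed in the table)
RESULTS = {
    "Yahoo": (72 * 60, 2100, 25_200),
    "Spark": (23 * 60, 206, 6_592),  # the table counts vCPUs here
    "Aliyun": (377, 3377, 41_496),
}


def sort_rate(seconds, units):
    """Data sorted per second, per unit (node or core), in MB/s."""
    return DATA_MB / seconds / units


per_node = {name: sort_rate(t, n) for name, (t, n, _) in RESULTS.items()}
per_core = {name: sort_rate(t, c) for name, (t, _, c) in RESULTS.items()}

for name in RESULTS:
    print(f"{name}: {per_node[name]:.1f} MB/s per node, "
          f"{per_core[name]:.2f} MB/s per core")
```

On these figures Spark sorts roughly 352 MB/s per node, Aliyun roughly 79 MB/s per node, and Yahoo roughly 11 MB/s per node.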
Another important aspect is that Spark's solution was deployed on top of Amazon EC2. This means that such very-large-scale computation can be done at a very low cost: researchers only pay for the computing resources they actually use, for the time they use them. In Aliyun's case, the cluster was a set of fixed assets owned by Aliyun. Given that Aliyun also considers itself a public cloud service provider, is it possible for them to run this benchmark on their public cloud offerings?
Nice article and good comparison.