Spark vs Hadoop – Updated Version

By , October 12, 2014 9:42 am

Databricks recently posted a blog entry, Spark Breaks Previous Large-Scale Sort Record, claiming a breakthrough in large-scale sorting. The blog post said:

The previous world record was 72 minutes, set by Yahoo using a Hadoop MapReduce cluster of 2100 nodes. Using Spark on 206 EC2 nodes, we completed the benchmark in 23 minutes. This means that Spark sorted the same data 3X faster using 10X fewer machines. All the sorting took place on disk (HDFS), without using Spark’s in-memory cache.

The Databricks blog post did not list the full configuration of the computing nodes. I spent some time reading through their test environment, as well as the test environment used by the Yahoo team, and came up with the following table with detailed information about each test:

| | Hadoop | Spark |
|---|---|---|
| Single Node Configuration | | |
| System | Dell R720xd | AWS EC2 i2.8xlarge |
| CPU | Intel Xeon E5-2630 | Intel Xeon E5-2670 v2 |
| Total CPU Cores | 12 (2 physical CPUs) | 32 (vCPU) |
| Memory | 64 GB | 244 GB |
| Storage | 12 x 3 TB SATA | 8 x 800 GB SSD |
| Single Disk Random Read IOPS (4 KB blocks) | ~50 (measured) | N/A |
| RAID0 Random Read IOPS (4 KB blocks) | 600 (max) | 365,000 (minimum) |
| Single Disk Random Write IOPS (4 KB blocks) | ~110 (measured) | N/A |
| RAID0 Random Write IOPS (4 KB blocks) | 1,320 (max) | 315,000 (minimum) |
| Single Disk Sequential Read Throughput (128 KB blocks) | 120 MB/s (measured) | 400 MB/s (measured) |
| Single Disk Sequential Write Throughput (128 KB blocks) | 120 MB/s (measured) | 400 MB/s (measured) |
| RAID0 Sequential Read Throughput (128 KB blocks) | 1,440 MB/s (estimated max) | 3,200 MB/s (measured) |
| RAID0 Sequential Write Throughput (128 KB blocks) | 1,440 MB/s (estimated max) | 2,200 MB/s (measured) |
| Networking | 10 Gbps | 10 Gbps |
| Cluster Configuration | | |
| Number of Nodes | 2,100 | 206 |
| Number of CPU Cores | 25,200 | 6,592 |
| Total Memory | 134,400 GB | 50,264 GB |
| Total Random Read IOPS (4 KB blocks) | 1,260,000 (max) | 75,190,000 (minimum) |
| Total Random Write IOPS (4 KB blocks) | 2,772,000 (max) | 64,890,000 (minimum) |
| Total Sequential Read Throughput (128 KB blocks) | 3,024,000 MB/s (estimated max) | 659,200 MB/s (estimated) |
| Total Sequential Write Throughput (128 KB blocks) | 3,024,000 MB/s (estimated max) | 453,200 MB/s (estimated) |
| 100 TB Sorting Results | | |
| Time | 72 minutes | 23 minutes |

It should be noted that the reference performance data for the 3 TB SATA drive is that of the Seagate Barracuda XT 3TB. The reference IOPS data for the 800 GB SSD drive is taken from the AWS documentation for the i2.8xlarge instance, while the single-disk and RAID0 throughput numbers come from a benchmark I ran under my own AWS account. (After reading the comments posted by rxin, I realized that I had mis-estimated the sequential IO performance of the i2.8xlarge instance, so I did some testing to get the new data and updated this post.)
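
For what it is worth, this kind of sequential-throughput measurement can be approximated with dd using 128 KB blocks. The post does not say which tool was actually used, and the device and file names below are placeholders of mine rather than the actual benchmark commands:

# Sequential read from the RAID0 array (device name is an assumption)
sudo dd if=/dev/md0 of=/dev/null bs=128k count=80000 iflag=direct
# Sequential write to a file on the array (path is an assumption)
sudo dd if=/dev/zero of=/mnt/raid0/testfile bs=128k count=80000 oflag=direct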

As we all know, large-scale sorting is a typical IO-intensive application. Because there is not enough memory to hold all the data to be sorted, the sorting has to be done in multiple batches: in each batch, data is read from disk for processing, and the result is written back to disk. If we observe the CPU utilization during the sorting process, we will see that most of the time the CPU is waiting for IO. With the configurations above, the Spark cluster has a great advantage over the Hadoop cluster in terms of random reads (IOPS, 60X) and random writes (IOPS, 24X). However, the Spark cluster does not match the Hadoop cluster in terms of sequential reads (throughput, 0.22X) or sequential writes (throughput, 0.15X). The performance of large-scale sorting depends more on sequential IO than on random IO. So although the Spark cluster uses SSDs and the Hadoop cluster uses SATA disks, the Spark cluster is not in an advantageous position in terms of storage. (I made a mistake in my original post by using the SSD performance data for the Intel SSD 910 Series, resulting in a significant over-estimation of the sequential IO performance of the AWS i2.8xlarge instance. This mistake has been corrected in this updated version.)
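
The ratios quoted above follow directly from the cluster totals in the table; a quick sanity check with bc:

echo "scale=3; 75190000 / 1260000" | bc   # random read IOPS, Spark vs Hadoop: ~60X
echo "scale=3; 64890000 / 2772000" | bc   # random write IOPS: ~24X
echo "scale=3; 659200 / 3024000" | bc     # sequential read throughput: ~0.22X
echo "scale=3; 453200 / 3024000" | bc     # sequential write throughput: ~0.15X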

In terms of CPU cores, the Spark cluster has about 1/4 of the CPU cores of the Hadoop cluster. However, the Spark cluster uses the Intel E5-2670 v2 while the Hadoop cluster uses the Intel E5-2630. I got the following CPU benchmark data from www.cpubenchmark.net. The per-core performance of the CPUs used in the two clusters is comparable. (In large-scale sorting, the pressure on the CPU is not as heavy as the pressure on IO.)

CPU Comparison

| | Hadoop | Spark |
|---|---|---|
| CPU | Intel Xeon E5-2630 | Intel Xeon E5-2670 v2 |
| Passmark | 14,033 | 22,134 |
| Number of Cores per CPU | 6 | 10 |
| Passmark per Core | 2,338 | 2,213 |

In terms of memory, the Spark cluster has a bit more than 1/3 of the memory of the Hadoop cluster (50,264 GB vs. 134,400 GB).

So, after revisiting this topic, I tend to believe that Spark does perform much better than Hadoop. (I apologize to the Databricks team for not carrying out a careful analysis when I posted the original message. Thank you rxin for pointing out my mistakes.)

King Gesar (格萨尔王)

By , October 2, 2014 6:41 pm

Alai's book from 2009. It is still a good book, but somehow it never feels as satisfying as 《尘埃落定》.

If you don’t like this government, it won’t last forever.

By , September 19, 2014 6:01 pm

“If you don’t like me, I won’t be here forever. If you don’t like this government, it won’t last forever.”

This is the best sentence I’ve heard regarding country, government, politicians, and people. It is so simple, and so true.

http://www.dailyrecord.co.uk/news/politics/if-you-dont-like-me-4264479

We Three (我们仨)

By , September 18, 2014 6:37 pm

I have read countless pieces of writing about parting and loss, but none has moved me the way this one by Yang Jiang does. Flying in her dreams to visit her gravely ill daughter; in the hospital, seeing Qian Zhongshu off one stretch of the road after another. Yang Jiang's prose is plain and restrained, yet the heavy melancholy, loneliness, sorrow, and helplessness come right off the page. Even though I had long prepared myself, when I read the part where the boatman and boatwoman took the things from the boat and left, the tears still fell at once.

I told Mimi that I will take good care of myself. When we are both old someday, it must be me who sees her off, not her who sees me off.

Pegasus / Montage workflow on Amazon Web Services

By , September 16, 2014 8:06 pm

I took some notes while going through Mats Rynge’s tutorial on “Pegasus / Montage workflow on Amazon Web Services“. The tutorial is officially available at the following URL, but the content is far from complete. I managed to finish this tutorial, and thought that my experience might be valuable for someone else out there in the dark.

https://confluence.pegasus.isi.edu/display/pegasus/2013+-+Montage+workflow+using+Amazon+Web+Services

Step 1. Launch an EC2 instance

In the Oregon region, launch an EC2 instance with AMI ami-f4e47cc4. It is recommended that you use the same security group for all EC2 instances to be launched in the Condor cluster. Also, in that security group, allow all traffic between EC2 instances that use the same security group.
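
If you prefer the command line over the web console, the instance can be launched with the AWS CLI along these lines (the instance type, key pair name, and security group ID below are placeholders of mine, not values from the tutorial):

aws ec2 run-instances --region us-west-2 --image-id ami-f4e47cc4 \
    --instance-type m3.xlarge --key-name keypair --security-group-ids sg-12345678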

SSH into the instance using the following command:

# Log into the instance (replace IP with the instance's public IP)
ssh -i keypair.pem montage@IP
# On the instance: generate a passwordless SSH key and authorize it for
# local scp transfers (used by the condor_pool site in Step 5)
ssh-keygen
cd .ssh
cat id_rsa.pub >> authorized_keys
cd ~

Step 2. Configure Condor

Edit /etc/condor/config.d/20_security.conf and update ALLOW_WRITE and ALLOW_READ to the IP range of your VPC. For example, if the IP range of your VPC is 172.31.0.0/16, you can set “ALLOW_WRITE = 172.31.*” and “ALLOW_READ = 172.31.*”. Then restart Condor with the following command:

sudo service condor restart
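
With the 172.31.0.0/16 example above, the relevant lines in /etc/condor/config.d/20_security.conf end up looking like this:

ALLOW_WRITE = 172.31.*
ALLOW_READ = 172.31.*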

Step 3. Create a Montage workflow

mkdir workflow
cd workflow
# Generate the abstract workflow (dag.xml) and the input image list for a
# 0.5 x 0.5 degree 2MASS j-band mosaic of M17
mDAG 2mass j M17 0.5 0.5 0.0002777778 . file://$PWD file://$PWD/inputs
# Generate the replica catalog (replica.catalog) for the input files
generate-montage-replica-catalog

Step 4. Copy the workflow files for Pegasus

cd ~/etc
cp ../workflow/replica.catalog .
cp ../workflow/dag.xml .

Step 5. Update site.xml with the following content

<?xml version="1.0" encoding="UTF-8"?>
<sitecatalog xmlns="http://pegasus.isi.edu/schema/sitecatalog" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://pegasus.isi.edu/schema/sitecatalog http://pegasus.isi.edu/schema/sc-4.0.xsd" version="4.0">
    <site  handle="local" arch="x86" os="LINUX">
        <directory type="shared-scratch" path="/home/montage/scratch">
            <file-server operation="all" url="file:///home/montage/scratch"/>
        </directory>
        <directory type="local-storage" path="/var/www/html/outputs">
            <file-server operation="all" url="file:///var/www/html/outputs"/>
        </directory>
        <profile namespace="env" key="SSH_PRIVATE_KEY">/home/montage/.ssh/id_rsa</profile>
    </site>
    <site  handle="condor_pool" arch="x86_64" os="LINUX">
        <directory type="shared-scratch" path="/home/montage/scratch">
            <file-server operation="all" url="scp://127.0.0.1/home/montage/scratch"/>
        </directory>
        <profile namespace="pegasus" key="style" >condor</profile>
        <profile namespace="condor" key="universe" >vanilla</profile>
        <profile namespace="env" key="MONTAGE_HOME" >/opt/montage/v3.3</profile>
    </site>
</sitecatalog>

Step 6. Plan the workflow

pegasus-plan --conf pegasus.conf --dax dag.xml
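
The tutorial does not show pegasus.conf itself. As a rough sketch, a minimal properties file for this setup might look like the following; the property names come from the Pegasus documentation, but the values, in particular the transformation catalog path, are my assumptions rather than anything given in the tutorial:

# Hypothetical pegasus.conf; adjust the paths to match your own layout
pegasus.catalog.site.file = /home/montage/etc/site.xml
pegasus.catalog.replica = File
pegasus.catalog.replica.file = /home/montage/etc/replica.catalog
pegasus.catalog.transformation = Text
pegasus.catalog.transformation.file = /home/montage/etc/tc.txt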

Step 7. Run the workflow

pegasus-run  /home/montage/etc/montage/pegasus/montage/run0001

Step 8. Monitor the workflow

# Workflow-level status and progress
pegasus-status -l /home/montage/etc/montage/pegasus/montage/run0001
# Condor pool slots and job queue
condor_status
condor_q
# Follow the job state log to see what is currently running
tail montage/pegasus/montage/run0001/jobstate.log

The last command is very nice in that you can see what is currently being run. Please note that you need to replace the path with the actual path given to you by Pegasus.

That’s all. After spending weeks searching over the Internet, everything now seems to be simple.
