Category: Technology

CUDA 8 on Amazon Linux 2017.03.1 HVM

By , August 16, 2017, 8:06 am

I was able to install CUDA 8 on the EC2 instance with the following steps. It should be noted that the EC2 instance was created with a root EBS volume of 100 GB to avoid running into storage space issues.
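
For reference, the root volume size can be set at launch time through a block-device mapping. Below is a rough AWS CLI sketch of the idea; the AMI ID, key pair, and instance type are placeholders that you would replace with your own values:

aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type p2.xlarge \
    --key-name my-key-pair \
    --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":100,"VolumeType":"gp2"}}]'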

#
# STEP 1: Install Nvidia Driver
# 384.66 is a version that has support for K80
#
cd ~
sudo yum install -y gcc kernel-devel-`uname -r`
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/384.66/NVIDIA-Linux-x86_64-384.66.run
sudo /bin/bash ./NVIDIA-Linux-x86_64-384.66.run
nvidia-smi

#
# STEP 2: Install CUDA Repo
#
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-rhel6-8-0-local-ga2-8.0.61-1.x86_64-rpm
sudo rpm -i cuda-repo-rhel6-8-0-local-ga2-8.0.61-1.x86_64-rpm

#
# STEP 3: Install CUDA Toolkit
#
sudo yum install cuda-toolkit-8-0
export PATH=$PATH:/usr/local/cuda-8.0/bin
nvcc --version

#
# STEP 4: Compile a sample program (deviceQuery) to use CUDA
#
cd /usr/local/cuda-8.0
sudo chown -R ec2-user:ec2-user samples
cd samples/1_Utilities/deviceQuery
make
./deviceQuery

At this point everything should be all set. I have also compiled and tested some other sample code from the samples folder and they all seemed to work.

A quick example on cuBLAS can be obtained from http://docs.nvidia.com/cuda/cublas/ . Simply copy Example 1 or Example 2 from this web page and save it as test.c, then compile and run the code with the following commands. I have tested both of them and verified them to be working.

#
# STEP 5: Compile and test cuBLAS code
#
nvcc test.c -lcublas -o test
./test

CUDA 8 on EMR with g2.2xlarge instance type

By , February 10, 2017, 12:15 pm

Below is a quick recap of the steps to get CUDA 8 working on a single-node EMR cluster with the g2.2xlarge instance type. The challenges here are (1) finding a particular version of the Nvidia driver that works with CUDA 8, and (2) installing the Nvidia driver and the CUDA Toolkit when there is only very limited disk space on /dev/xvda1.

#
# STEP 1: Install Nvidia Driver
# 367.57 is a version that has been verified to be working with CUDA 8
#
sudo yum install -y gcc kernel-devel-`uname -r`
cd /mnt
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/367.57/NVIDIA-Linux-x86_64-367.57.run
sudo /bin/bash ./NVIDIA-Linux-x86_64-367.57.run
nvidia-smi

#
# STEP 2: Install CUDA Repo
# Since we have limited disk space on /dev/xvda1, we use a symbolic link as a workaround
#
cd /mnt
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-rhel6-8-0-local-ga2-8.0.61-1.x86_64-rpm
mkdir -p /mnt/cuda-repo-8-0-local-ga2
sudo ln -s /mnt/cuda-repo-8-0-local-ga2 /var/cuda-repo-8-0-local-ga2
sudo rpm -i cuda-repo-rhel6-8-0-local-ga2-8.0.61-1.x86_64-rpm

#
# STEP 3: Install CUDA Toolkit
# Since we have limited disk space on /dev/xvda1, we use a symbolic link as a workaround
#
cd /mnt
mkdir -p /mnt/cuda-8.0
sudo ln -s /mnt/cuda-8.0 /usr/local/cuda-8.0
sudo yum install cuda-toolkit-8-0
export PATH=$PATH:/usr/local/cuda-8.0/bin
nvcc --version

#
# STEP 4: Compile a sample program (deviceQuery) to use CUDA
#
cd /usr/local/cuda-8.0
sudo chown -R hadoop:hadoop samples
cd samples/1_Utilities/deviceQuery
make
./deviceQuery

At this point everything should be all set. I have also compiled and tested some other sample code from the samples folder and they all seemed to work.

A quick example on cuBLAS can be obtained from http://docs.nvidia.com/cuda/cublas/ . Simply copy Example 1 or Example 2 from this web page and save it as test.c, then compile and run the code with the following commands. I have tested both of them and verified them to be working.

#
# STEP 5: Compile and test cuBLAS code
#
nvcc test.c -lcublas -o test
./test

Emotion Coaching for Angry Customers

By , September 14, 2016, 10:46 am


This is a set of slides that I prepared on the topic of “Emotion Coaching for Angry Customers” for customer facing roles. I am making them publicly available so that more people can benefit from this work. You are more than welcome to use them to provide training to employees in your own organisation, provided that you preserve the original author information. If you need the original PowerPoint files, please drop me a note and I would be more than glad to provide them.

[Slide]

When you work in a customer support role, it is inevitable that you will encounter angry customers from time to time. Does the scenario shown on the screen sound familiar to you? We ask the customer a simple question, and the customer shouts at us over the phone. No matter what we say, they just won’t listen. It is so hard to talk to a customer who is angry, and we (almost always) try to avoid this type of customer as much as possible.

However, angry customers are like those lovely 1-star reviews. As we all know, no matter how hard you work or how good you are, 1-star reviews will come, and they come more than once.

If you worry about talking to angry customers, or you have been intimidated by angry customers, this training is for you.

In this training, we will talk about the theories behind anger, as well as techniques to deal with anger. First of all, we will need to understand how the brain works.

[Slide]

So, how does the brain work?

Most of us have been exposed to the left-brain and right brain theory to a certain degree. It is commonly believed that our left brain is responsible for rigorous reasoning such as science and mathematics (which is what we do), and our right brain is responsible for creativity such as art and entertainment (which is what we don’t do, at least during business hours). When the people we talk to stop reasoning, we tend to say that “your left brain has nothing right, and your right brain has nothing left”.

But this does not explain why people get angry.

Another theory divides our brain into four major parts: the Cerebral Cortex (or the Cortex), the Cerebellum, the Limbic System (or the Limbic), and the Brain Stem. The cortex is the largest part of the human brain, which is associated with higher brain functions such as thought and action. The cerebellum is also called the little brain, which is associated with regulation and coordination of movement, posture, and balance. The limbic system is often referred to as the “emotion brain”, which is responsible for human emotions. The brain stem is responsible for basic vital life functions such as breathing, heartbeat, and blood pressure.

As we can see, different parts of our brain are responsible for different functions. The cortex for reasoning, the cerebellum for movements, the limbic for emotions, and the brain stem for life. It should be noted that the limbic system develops in an early stage during brain development, while the cortex develops much later. Therefore we also call the limbic system the old brain and the cortex the new brain.

You can think of the brain as a computer with a single CPU core. The cortex system and the limbic system are two separate processes demanding CPU resources. In certain circumstances, a particular process gets executed with higher priority, reducing the amount of computing resources that is available to the other process. When this happens, the other process stops responding to external inputs. Under certain conditions, the old brain takes over and the new brain is shut down. At this point a person is taken over by his/her emotions, and loses his/her ability to reason. If you try to reason with him/her during this period, the conversation will be very difficult because you are talking to the wrong part of the brain.

So, do not spend time and energy talking to the wrong part of the brain.

Then the question becomes, why would a person lose his/her ability to reason?

[Slide]

To answer this question, we need to understand how our body responds to danger. Suppose you are hiking in the mountains, and suddenly a huge snake appears in front of you. Different people will respond to the snake differently, but all our responses can be categorized as flight (running away), fight (ha! ho!), freeze (petrified, can’t move at all), and faint (ah ou). These coping mechanisms were developed over the course of evolution, and have become the fundamental survival functions of all animals.

When confronted with danger, we act out of instinct instead of reasoning. The limbic system takes over to cope with the danger. The cortex shuts down to keep you alive. If you want to study what kind of snake that is, how big it is, or whether it is a native or a foreign species, you do that only after you are out of danger, not while you are in it.

Now assume that our customer is running a mission-critical application on our platform. Suddenly their application stops working. With every passing minute, our customer is losing users, losing customers, and facing criticism, while the competitors are catching up. Our customer is in real danger, and the coping mechanism is in action.

Now, the limbic system takes over, while the cortex shuts down. If you try to reason at this point, you are talking to the wrong part of the brain.

If we have to express this in technical terms, Mr. Customer’s brain is now experiencing a kernel panic, which is caused by a stack-overflow in the limbic system.

[Slide]

In such circumstances, it is very important to understand that the customer is not targeting you as a support engineer. No matter what the customer says, you need to keep calm, and don’t take it personally.

Let’s repeat it three times: don’t take it personally, don’t take it personally, don’t take it personally. If there is anything I want you to take away from this training, it is “don’t take it personally”.

When the customer has lost the ability to reason, we need to be the customer’s cortex!

But how? And how long will it take for the customer to regain the ability to reason?

[Slide]

To answer this question, we need to understand the difference between primary emotions and secondary emotions.

Primary emotions are those that we feel first, as a first response to a trigger. For example, we feel fear when we are threatened, we feel sadness when we hear of a death. These are the instinctive responses that we have without going through the thinking process.

Secondary emotions, on the other hand, appear after primary emotions. They usually come from a complex chain of thinking. More importantly, a secondary emotion arises when the primary emotion is overwhelming and makes us uncomfortable or vulnerable. For example, when we are threatened by somebody, we feel fear. However, the feeling of fear makes me uncomfortable, makes me feel that I am a coward. Since I don’t want to be seen as a coward, I feel anger. Another example: I ask my manager for a raise but my manager refuses. I feel frustrated, but I am not able to change anything. The feeling of frustration makes me uncomfortable, but I don’t want to be uncomfortable. Then I might become angry, or numb, or shut down.

Primary emotions are the result of instinct and do not consume much computing power. Secondary emotions are the result of very complex reasoning processes, which demand a lot of computing power. Therefore, when we experience secondary emotions, the emotion part of the brain takes control (gets higher priority), while the reasoning part of the brain shuts down (gets lower priority). Reasoning becomes difficult because we are talking to the wrong part of the brain.

When we experience primary emotions, we seek connections, and we pull others towards us. When we experience secondary emotions, we attack and criticize others, and we push others away.

[Slide]

Now we understand that anger is a secondary emotion. The underlying primary emotion for anger is usually fear or sadness, which makes one feel uncomfortable or vulnerable.

Again, let’s assume that our customer is running a mission-critical application on our platform. Suddenly their application stops working. With every passing minute, our customer is losing users, losing customers, and facing criticism, while the competitors are catching up. Our customer is in real danger, and the coping mechanism begins to take action.

Now, our customer feels frustrated that his mission-critical application is down. He feels fear about the consequences: his boss might shout at him, he might receive a lot of complaints from his team members, and in the worst case he might lose his job. The frustration and fear he is experiencing make him feel uncomfortable and vulnerable. When such feelings become overwhelming, he feels a real danger approaching, and his brain automatically switches to “fight or flight” mode. The emotion part of the brain takes control, and the reasoning part of the brain shuts down. As a result, he becomes angry and begins to blame and criticize.

As we just said, anger is a secondary emotion. It pushes others away, and makes the communication difficult. In this case, it is time for us to provide some emotion coaching to our customer.

[Slide]

The purpose of emotion coaching is to reactivate the customer’s cortex (the reasoning part of the brain) so that he/she can start reasoning again. We do this by leading the customer from his secondary emotions back to his primary emotions. As we discussed just now, when a person experiences primary emotions, he seeks connection and is open to help.

We may think that this is a very complicated process. In fact, there are only a few simple steps that we need to follow.

And, it does not take long for an adult to calm down if the appropriate steps are taken.

[Slide]

Listen to the customer patiently, and wait for the customer to stop talking. During this process, you can use “yeah… ah… right…” as simple acknowledgements. However, do not make any comments. You might think that the customer wants the problem to be solved as soon as possible. This is the wrong perception. For the customer, at this point his/her primary need is TO BE HEARD.

After the customer slows down / finishes talking, the first step is to name the customer’s feelings, using the names of primary emotions. For example, we can say “I can see that you are very frustrated / sad / disappointed when XYZ happens”. When we name the primary emotions, we guide the customer back to his primary emotions. Do not point out that the customer is angry, or tell the customer to calm down. This would make the customer feel ashamed, which is uncomfortable for the customer. When the customer does not want to feel ashamed, it is very likely that he will convert this uncomfortable feeling into a secondary emotion, which is going to be anger.

The second step is to validate the customer’s feelings. When a customer experiences a threat, or a loss, he has every right to feel sad, or disappointed, or frustrated. There is nothing wrong with such feelings, and we need to allow our customer to fully experience and express them. By allowing our customer to experience and express such feelings, the customer feels that he is being listened to. Such practice builds trust and brings us closer to the customer. It opens the door for future communications.

Do not worry about the time you spend on naming and validating the customer’s feelings. It does not take long for an adult to calm down if the appropriate steps are taken. Therefore, continue to stay with the customer’s feelings when needed. You may want to repeat the previous steps when necessary. It is very unlikely that a customer would refuse your empathy.

At some point, the customer will soothe himself and calm down. The reasoning part of his brain comes back and takes control. At this point, it is time to teach our customer some general relativity, quantum mechanics and wavelet theory to resolve whatever issue he has.

[Slide]

When communicating with the customer, use the “I” statement as much as possible. When you use an “I” statement, you take responsibility, and avoid criticizing the customer.

In the case that a customer made a mistake, avoid using “you” or “your” in your statement. For example, “user ABC did something” is better wording compared to “your user ABC did something”.

When we did something that caused the issue, you can use “your resource” to take responsibility and acknowledge the customer’s loss. For example, “the underlying hardware running your virtual machine instance failed”.

[Slide]

Getting Started with AWS SDK for Java (4)

By , July 29, 2016, 7:46 am

The following is an example of using the AWS SimpleDB service along with AWS KMS. Since SimpleDB does not natively integrate with KMS, we will have to encrypt the data before storing it in SimpleDB, and decrypt the data after retrieving it from SimpleDB.


import java.nio.*;
import java.util.*;
import java.nio.charset.*;

import com.amazonaws.regions.*;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.simpledb.*;
import com.amazonaws.services.simpledb.model.*;
import com.amazonaws.services.kms.*;
import com.amazonaws.services.kms.model.*;


public class SDB
{

	public AmazonSimpleDBClient client;
	public AWSKMSClient kms;

	public String keyId = "arn:aws:kms:ap-southeast-2:[aws-account-id]:key/[aws-kms-key-very-long-id-ere]";
	public static Charset charset = Charset.forName("ASCII");
	public static CharsetEncoder encoder = charset.newEncoder();
	public static CharsetDecoder decoder = charset.newDecoder();

	public SDB()
	{
		client = new AmazonSimpleDBClient();
		client.configureRegion(Regions.AP_SOUTHEAST_2);

		kms = new AWSKMSClient();
		kms.configureRegion(Regions.AP_SOUTHEAST_2);

	}


	public void createDomain(String domain)
	{
		try
		{
			CreateDomainRequest request = new CreateDomainRequest(domain);
			client.createDomain(request);
		} catch (Exception e)
		{
			System.out.println(e.getMessage());
			e.printStackTrace();
		}
	}

	public void deleteAttribute(String domain, String item)
	{
		try
		{
			DeleteAttributesRequest request = new DeleteAttributesRequest(domain, item);
			client.deleteAttributes(request);
		} catch (Exception e)
		{
			System.out.println(e.getMessage());
			e.printStackTrace();
		}
	}

	public void putAttribute(String domain, String item, String name, String value)
	{
		try
		{
			ReplaceableAttribute attribute = new ReplaceableAttribute(name, value, true);
			List<ReplaceableAttribute> list = new ArrayList<ReplaceableAttribute>();
			list.add(attribute);

			PutAttributesRequest request = new PutAttributesRequest(domain, item, list);
			client.putAttributes(request);

		} catch (Exception e)
		{
			System.out.println(e.getMessage());
			e.printStackTrace();
		}
	}

	public String getAttribute(String domain, String item, String name)
	{
		String value = "Empty Result";
		try
		{
			GetAttributesRequest request = new GetAttributesRequest(domain, item);
			GetAttributesResult result = client.getAttributes(request);
			List<Attribute> list = result.getAttributes();
			for (Attribute attribute : list)
			{
				if (attribute.getName().equals(name))
				{
					return attribute.getValue();
				}
			}

		} catch (Exception e)
		{
			System.out.println(e.getMessage());
			e.printStackTrace();
		}
		return value;
	}

	public String encrypt(String message)
	{
		String result = "Encryption Error.";
		try
		{
			ByteBuffer plainText = encoder.encode(CharBuffer.wrap(message));
			EncryptRequest req = new EncryptRequest().withKeyId(keyId).withPlaintext(plainText);
			ByteBuffer cipherText = kms.encrypt(req).getCiphertextBlob();
			byte[] bytes = new byte[cipherText.remaining()];
			cipherText.get(bytes);
			result =  Base64.getEncoder().encodeToString(bytes);

			System.out.println("\nEncryption:");
			System.out.println("Original Text: " + message);
			System.out.println("Encrypted Text: " + result);
		} catch (Exception e)
		{
			System.out.println(e.getMessage());
			e.printStackTrace();
		}
		return result;
	}

	public String decrypt(String message)
	{
		String result = "Decryption Error.";
		try
		{
			byte[] encryptedBytes = Base64.getDecoder().decode(message);
			ByteBuffer ciphertextBlob = ByteBuffer.wrap(encryptedBytes);
			DecryptRequest req = new DecryptRequest().withCiphertextBlob(ciphertextBlob);
			ByteBuffer plainText = kms.decrypt(req).getPlaintext();
			result = decoder.decode(plainText).toString();

			System.out.println("\nDecryption:");
			System.out.println("Encrypted Text: " + message);
			System.out.println("Decrypted Text: " + result);
		} catch (Exception e)
		{
			System.out.println(e.getMessage());
			e.printStackTrace();
		}
		return result;
	}

	public static void main(String[] args) 
	{
		String domainName = "demo-domain";    
		String itemName   = "demo-item";
		String attributeName    = "test-attribute";
		String attributeValue = "This is the information to be stored in SimpleDB.";

		SDB test = new SDB();
		String value = test.encrypt(attributeValue);
		test.putAttribute(domainName, itemName, attributeName, value);

		try
		{
			Thread.sleep(3000);	// Sleep for some time to make sure we can get the result
		} catch (Exception e) {}

		value = test.getAttribute(domainName, itemName, attributeName);
		test.decrypt(value);
	}


}

Getting Started with AWS SDK for Java (3)

By , February 13, 2016, 10:40 am

This is the 3rd part of my tutorial on “Getting Started with AWS SDK for Java”. If you have not already done so, I suggest that you first take a look at the first chapter of this series, “Getting Started with AWS SDK for Java (1)”, to properly set up your development environment. In this part, we will cover the basic concepts related to the DataPipelineClient. Through this example, you will be able to create and activate a simple pipeline with a ShellCommandActivity running on an Ec2Resource.

Before you get started with this demo, you should familiarize yourself with what Data Pipeline is. In particular, the following AWS documentation on “Data Pipeline Concepts” is very helpful.

http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts.html

First of all we create an instance of the DataPipelineClient in the constructor, then set the region to ap-southeast-2. For debugging purposes, we enable logging using log4j.

public class DemoDataPipeline
{
	static DataPipelineClient client;
	final static Logger logger = Logger.getLogger(DemoDataPipeline.class);

	/**
	 *
	 * Constructor
	 *
	 */

	public DemoDataPipeline()
	{
		// Create the DataPipelineClient
		client = new DataPipelineClient();
		// Set the region to ap-southeast-2
		client.configureRegion(Regions.AP_SOUTHEAST_2);
	}

We use the createPipeline() method in DataPipelineClient to create a new pipeline. This method takes a CreatePipelineRequest as its parameter, which requires a name and a unique id for the pipeline to be created. Here we use the java.util.UUID utility to generate a unique id for the pipeline. This creates an empty pipeline for us.

	public void createPipeline() throws Exception
	{
		System.out.println("CREATE PIPELINE.");
		
		CreatePipelineRequest request = new CreatePipelineRequest();
		request.setName("Java SDK Demo");
		String uuid = UUID.randomUUID().toString();
		request.setUniqueId(uuid);
		client.createPipeline(request);
	}

We can use the listPipelines() method in DataPipelineClient to get a list of the pipelines. This returns a ListPipelinesResult, which includes a list of PipelineIdName objects. We traverse through this list to obtain the id and name of all the pipelines.

	public void listPipeline() throws Exception
	{
		System.out.println("LIST PIPELINE.");
		
		ListPipelinesResult result = client.listPipelines();
		List<PipelineIdName> list = result.getPipelineIdList();
		for (PipelineIdName pipeline : list)
		{
			System.out.println(pipeline.getId() + "\t- " + pipeline.getName());
		}
	}

Now we have the id of the newly created pipeline. In the AWS SDK for Java, pipeline components specifying the data sources, activities, schedule, and preconditions of the workflow are represented as PipelineObject objects. The following code defines a Default object, a Schedule object, an Ec2Resource object, and a ShellCommandActivity object. A PipelineObject is a collection of key-value fields. For example, the following JSON string defines an Ec2Resource in a VPC:

{
  "id" : "MyEC2Resource",
  "type" : "Ec2Resource",
  "actionOnTaskFailure" : "terminate",
  "actionOnResourceFailure" : "retryAll",
  "maximumRetries" : "1",
  "instanceType" : "m1.medium",
  "securityGroupIds" : [
    "sg-12345678",
    "sg-12345678"
  ],
  "subnetId" : "subnet-12345678",
  "associatePublicIpAddress" : "true",
  "keyPair" : "my-key-pair"
}

When the value of a key refers to another pipeline object, we use new Field().withKey("field_name").withRefValue("object_id") to represent the key-value pair. Otherwise, we use new Field().withKey("field_name").withStringValue("field_value"). Please refer to the ShellCommandActivity part of the following code for details.

	public void definePipeline(String id) throws Exception
	{
		System.out.println("Define PIPELINE.");

		// Definition of the default object
		Field defaultScheduleType = new Field().withKey("scheduleType").withStringValue("CRON");
		Field defaultSchedule = new Field().withKey("schedule").withRefValue("RunOnceSchedule");
		Field defaultFailureAndRerunMode = new Field().withKey("failureAndRerunMode").withStringValue("CASCADE");
		Field defaultRole = new Field().withKey("role").withStringValue("DataPipelineDefaultRole");
		Field defaultResourceRole = new Field().withKey("resourceRole").withStringValue("DataPipelineDefaultResourceRole");
		Field defaultLogUri = new Field().withKey("pipelineLogUri").withStringValue("s3://331982-syd/java-dp-log");
		List defaultFieldList = Lists.newArrayList(defaultScheduleType, defaultSchedule, defaultFailureAndRerunMode, defaultRole, defaultResourceRole, defaultLogUri);
		PipelineObject defaultObject = new PipelineObject().withName("Default").withId("Default").withFields(defaultFieldList);

		// Definition of the pipeline schedule
		Field scheduleType = new Field().withKey("type").withStringValue("Schedule");
		Field scheduleStartAt = new Field().withKey("startAt").withStringValue("FIRST_ACTIVATION_DATE_TIME");
		Field schedulePeriod = new Field().withKey("period").withStringValue("1 day");
		Field scheduleOccurrences = new Field().withKey("occurrences").withStringValue("1");
		List scheduleFieldList = Lists.newArrayList(scheduleType, scheduleStartAt, schedulePeriod, scheduleOccurrences);
		PipelineObject schedule = new PipelineObject().withName("RunOnceSchedule").withId("RunOnceSchedule").withFields(scheduleFieldList);

		// Definition of the Ec2Resource
		Field ec2Type = new Field().withKey("type").withStringValue("Ec2Resource");
		Field ec2TerminateAfter = new Field().withKey("terminateAfter").withStringValue("15 minutes");
		List ec2FieldList = Lists.newArrayList(ec2Type, ec2TerminateAfter);
		PipelineObject ec2 = new PipelineObject().withName("Ec2Instance").withId("Ec2Instance").withFields(ec2FieldList);

		// Definition of the ShellCommandActivity
		// The ShellCommandActivity is a command "df -h"
		Field activityType = new Field().withKey("type").withStringValue("ShellCommandActivity");
		Field activityRunsOn = new Field().withKey("runsOn").withRefValue("Ec2Instance");
		Field activityCommand = new Field().withKey("command").withStringValue("df -h");
		Field activityStdout = new Field().withKey("stdout").withStringValue("s3://331982-syd/dp-java-demo-stdout");
		Field activityStderr = new Field().withKey("stderr").withStringValue("s3://331982-syd/dp-java-demo-stderr");
		Field activitySchedule = new Field().withKey("schedule").withRefValue("RunOnceSchedule");
		List activityFieldList = Lists.newArrayList(activityType, activityRunsOn, activityCommand, activityStdout, activityStderr, activitySchedule);
		PipelineObject activity = new PipelineObject().withName("DfCommand").withId("DfCommand").withFields(activityFieldList);

		// setPipelineObjects
		List objects = Lists.newArrayList(defaultObject, schedule, ec2, activity);

		// putPipelineDefinition
		PutPipelineDefinitionRequest request = new PutPipelineDefinitionRequest();
		request.setPipelineId(id);
		request.setPipelineObjects(objects);
		PutPipelineDefinitionResult putPipelineResult = client.putPipelineDefinition(request);

		if (putPipelineResult.isErrored()) 
		{
			logger.error("Error found in pipeline definition: ");
			putPipelineResult.getValidationErrors().stream().forEach(e -> logger.error(e));
		}

		if (putPipelineResult.getValidationWarnings().size() > 0) 
		{
			logger.warn("Warnings found in definition: ");
			putPipelineResult.getValidationWarnings().stream().forEach(e -> logger.warn(e));
		}
	}

Now you can activate the pipeline for execution:

	public void activatePipeline(String id) throws Exception
	{
		System.out.println("ACTIVATE PIPELINE.");	

		ActivatePipelineRequest request = new ActivatePipelineRequest();
		request.setPipelineId(id);
		client.activatePipeline(request);
	}

Then, you can delete the pipeline:

	public void deletePipeline(String id) throws Exception
	{
		System.out.println("DELETE PIPELINE.");	

		DeletePipelineRequest request = new DeletePipelineRequest();
		request.setPipelineId(id);
		client.deletePipeline(request);
	}

After checking out the demo code from github, you should modify the code to use your own S3 bucket for logging, as well as the stdout and stderr for the ShellCommandActivity. After making these changes, you can run the demo code using the following commands:

$ mvn clean; mvn compile; mvn package
$ java -cp target/demo-1.0-SNAPSHOT.jar:third-party/guava-18.0.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoDataPipeline create
$ java -cp target/demo-1.0-SNAPSHOT.jar:third-party/guava-18.0.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoDataPipeline list
$ java -cp target/demo-1.0-SNAPSHOT.jar:third-party/guava-18.0.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoDataPipeline define df-0098814S3FS9ACXICID  (make sure you replace this with your own pipeline id)
$ java -cp target/demo-1.0-SNAPSHOT.jar:third-party/guava-18.0.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoDataPipeline activate df-0098814S3FS9ACXICID  (make sure you replace this with your own pipeline id)
$ java -cp target/demo-1.0-SNAPSHOT.jar:third-party/guava-18.0.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoDataPipeline delete df-0098814S3FS9ACXICID  (make sure you replace this with your own pipeline id)

Some Data on Aliyun’s New TeraSort Results

By , October 29, 2015, 6:14 am

Recently Aliyun announced a breakthrough in the TeraSort benchmark. They finished sorting 100 TB of data in 377 seconds. This is significantly faster than the previous world record of 23 minutes set by Spark in 2014. Out of curiosity, I compiled some data on the clusters used by Yahoo (2013), Spark (2014) and Aliyun (2015) to see what improvements have been made.

Vendor | Yahoo | Spark | Aliyun
Year | 2013 | 2014 | 2015
Data Source | sortbenchmark.org | Spark | sortbenchmark.org

Single Node Configuration
System | Dell R720xd | AWS EC2 i2.8xlarge | Unknown
CPU | Intel Xeon E5-2630 | Intel Xeon E5-2670 v2 | Intel Xeon E5-2630 / E5-2650 v2
Total CPU Cores | 12 (2 physical CPUs) | 32 (vCPU) | 12 or 16 (2 physical CPUs)
Memory | 64 GB | 244 GB | 96 GB or 128 GB
Storage | 12 x 3 TB SATA | 8 x 800 GB SSD | 12 x 2 TB SATA
Single Disk Sequential Read Throughput (128 KB blocks) | 120 MB/s | 400 MB/s | 120 MB/s
Single Disk Sequential Write Throughput (128 KB blocks) | 120 MB/s | 400 MB/s | 120 MB/s
RAID0 Sequential Read Throughput (128 KB blocks) | 1,440 MB/s | 3,200 MB/s | 1,440 MB/s
RAID0 Sequential Write Throughput (128 KB blocks) | 1,440 MB/s | 3,200 MB/s | 1,440 MB/s
Networking | 10 Gbps | 10 Gbps | 10 Gbps

Cluster Configuration
Number of Nodes | 2100 | 206 | 3377
Number of CPU Cores | 25,200 | 6,592 | 41,496
Total Memory | 134,400 GB | 50,264 GB | 331,968 GB
Total Sequential Read Throughput (128 KB blocks) | 3,024,000 MB/s | 659,200 MB/s | 4,862,880 MB/s
Total Sequential Write Throughput (128 KB blocks) | 3,024,000 MB/s | 659,200 MB/s | 4,862,880 MB/s

100 TB Sorting Results
Time | 72 minutes | 23 minutes | 377 seconds

The Aliyun cluster has 331,968 GB of memory in total, which is significantly greater than the size of the data to be sorted. This allows the data being sorted to reside entirely in memory, avoiding the performance impact of disk I/O. In fact, in their report Aliyun described an “I/O dual buffering” technique, which allows data processing and disk I/O to be done in parallel. The report pointed out that “we ensure data are not buffered in OS page cache by running a data purge job that randomly reads from local file system before each benchmark run”. However, data can be loaded into memory quickly at the beginning of the benchmark because the cluster has sufficient I/O capacity to achieve this in around 20 seconds. The “Overlapped Execution” section in the report implies that the abundance of memory might be playing a much greater role than the “I/O dual buffering” technique. This is very different from the Spark cluster with only 50,264 GB of memory, where extensive disk I/O must occur as part of the sorting benchmark.
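
As a rough sanity check using the numbers in the table above, loading the full data set from disk at the Aliyun cluster’s aggregate read throughput would take roughly:

100 TB ≈ 100,000,000 MB
100,000,000 MB ÷ 4,862,880 MB/s ≈ 20.6 seconds

which is consistent with the 20 seconds or so mentioned above.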

Based on the above-listed data, it is quite convincing that Aliyun’s solution is better than Yahoo’s solution, considering the obvious performance advantages. However, it is very hard to say that Aliyun’s solution is better than Spark’s solution, considering Aliyun’s obvious resource advantage (memory in particular).

Another important aspect is that Spark’s solution was deployed on top of Amazon EC2. This means that such very-large-scale computation can be done at an extremely low cost: researchers only need to pay for the actual amount of computing resources being used, for the amount of time they are using them. In Aliyun’s case, the cluster was a fixed asset owned by Aliyun. Considering that Aliyun also positions itself as a public cloud service provider, is it possible for them to run this benchmark on their public cloud offerings?

SSL Connection to RDS Instances in phpMyAdmin

By , September 1, 2015, 6:56 am

Setting up an SSL connection between phpMyAdmin and the RDS MySQL server is quite straightforward. Below is a demo setup on Amazon Linux with phpMyAdmin 4.4.14. The web server is Apache with PHP 5.

First of all we download and unzip phpMyAdmin. At the same time we download the root certificate for RDS to the phpMyAdmin folder:

$ cd /var/www/html
$ wget https://files.phpmyadmin.net/phpMyAdmin/4.4.14/phpMyAdmin-4.4.14-all-languages.zip
$ unzip phpMyAdmin-4.4.14-all-languages.zip
$ cd phpMyAdmin-4.4.14-all-languages
$ cp config.sample.inc.php config.inc.php 
$ wget https://s3.amazonaws.com/rds-downloads/rds-ca-2015-root.pem

Now edit config.inc.php, using the following configurations for the “First server”, which is your RDS instance.

/*
 * First server
 */
$i++;
/* Authentication type */
$cfg['Servers'][$i]['auth_type'] = 'cookie';
/* Server parameters */
$cfg['Servers'][$i]['host'] = 'instance-name.xxxxxxxxxxxx.ap-southeast-2.rds.amazonaws.com';
$cfg['Servers'][$i]['connect_type'] = 'tcp';
$cfg['Servers'][$i]['compress'] = false;
$cfg['Servers'][$i]['AllowNoPassword'] = false;
$cfg['Servers'][$i]['ssl'] = true;
$cfg['Servers'][$i]['ssl_ca'] = '/var/www/html/phpMyAdmin-4.4.14-all-languages/rds-ca-2015-root.pem';

You will probably need to restart httpd to make things work.

$ sudo service httpd restart

At this point, you can use phpMyAdmin to log in to your RDS instance. After you log in, use the following SQL query to verify the SSL connection:

show status like 'Ssl_cipher';

If you see the following result, the SSL connection is successful:

Variable_name 	Value 	
Ssl_cipher 	AES256-SHA
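
If you want to double-check the SSL connection outside of phpMyAdmin, the same query can be run with the mysql command-line client (assuming it is installed), pointing it at the same CA file; replace the endpoint and username with your own:

$ mysql -h instance-name.xxxxxxxxxxxx.ap-southeast-2.rds.amazonaws.com -u username -p \
    --ssl-ca=/var/www/html/phpMyAdmin-4.4.14-all-languages/rds-ca-2015-root.pem \
    -e "show status like 'Ssl_cipher';"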

Running cxxnet on Amazon EC2 (Ubuntu 14.04)

By , August 9, 2015, 8:47 am

1. Launch an EC2 instance with the g2.8xlarge instance type, using an Ubuntu 14.04 HVM AMI. When I launched the EC2 instance, I used a root EBS volume of 300 GB (General Purpose SSD) to have decent disk I/O capacity. With General Purpose SSD, you get 3 IOPS for each GB of storage, so 300 GB of storage gives me 900 baseline IOPS, with the ability to burst up to 3,000 IOPS for an extended period of time.

2. SSH into the EC2 instance and install the CUDA driver, as below:

There is a detailed tutorial on this topic available on Github:

https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN)

3. Install OpenBLAS, as below

$ sudo apt-get install make gfortran
$ wget http://github.com/xianyi/OpenBLAS/archive/v0.2.14.tar.gz
$ tar zxvf v0.2.14.tar.gz
$ cd OpenBLAS-0.2.14
$ make FC=gfortran
$ sudo make PREFIX=/usr/local/ install
$ cd /usr/local/lib
$ sudo ln -s libopenblas.so libblas.so

4. Install OpenCV

There is a detailed documentation available from the Ubuntu community:

https://help.ubuntu.com/community/OpenCV

You will also need to install the header files for OpenCV

$ sudo apt-get install libopencv-dev

5. Install cxxnet, as below

$ cd ~
$ git clone --recursive https://github.com/dmlc/cxxnet.git
$ cd cxxnet
$ ./build.sh

In most cases, the build will fail. You need to customize your Makefile a little bit to reflect the actual situation of your build environment. Below is an example from my environment:

CFLAGS += -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC $(MSHADOW_CFLAGS) $(DMLC_CFLAGS)
LDFLAGS = -pthread $(MSHADOW_LDFLAGS) $(DMLC_LDFLAGS) -L/usr/local/cuda/lib64 -L/usr/local/lib

Then do the make again:

$ make
g++ -DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_OPENCV_DECODER=1 -fopenmp   -o bin/cxxnet src/local_main.cpp layer_cpu.o updater_cpu.o nnet_cpu.o main.o nnet_ps_server.o data.o dmlc-core/libdmlc.a layer_gpu.o updater_gpu.o nnet_gpu.o -pthread -lm -lcudart -lcublas -lcurand -lblas -lrt -L/usr/local/cuda/lib64 -L/usr/local/lib `pkg-config --libs opencv` -ljpeg
g++ -DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_OPENCV_DECODER=1 -fopenmp   -o bin/im2rec tools/im2rec.cc dmlc-core/libdmlc.a -pthread -lm -lcudart -lcublas -lcurand -lblas -lrt -L/usr/local/cuda/lib64 -L/usr/local/lib `pkg-config --libs opencv` -ljpeg
g++ -DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_OPENCV_DECODER=1 -fopenmp   -o bin/bin2rec tools/bin2rec.cc dmlc-core/libdmlc.a -pthread -lm -lcudart -lcublas -lcurand -lblas -lrt -L/usr/local/cuda/lib64 -L/usr/local/lib `pkg-config --libs opencv` -ljpeg
g++ -DMSHADOW_FORCE_STREAM -Wall -g -O3 -I./mshadow/ -I./dmlc-core/include -I/usr/local/cuda/include -I/usr/include -fPIC -msse3 -funroll-loops -Wno-unused-parameter -Wno-unknown-pragmas -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_MKL=0 -DMSHADOW_RABIT_PS=0 -DMSHADOW_DIST_PS=0 -fPIC -DDMLC_USE_HDFS=0 -DDMLC_USE_S3=0 -DDMLC_USE_AZURE=0 -DCXXNET_USE_OPENCV=1 -DCXXNET_USE_OPENCV_DECODER=1 -fopenmp  -shared -o wrapper/libcxxnetwrapper.so wrapper/cxxnet_wrapper.cpp layer_cpu.o updater_cpu.o nnet_cpu.o main.o nnet_ps_server.o data.o dmlc-core/libdmlc.a layer_gpu.o updater_gpu.o nnet_gpu.o -pthread -lm -lcudart -lcublas -lcurand -lblas -lrt -L/usr/local/cuda/lib64 -L/usr/local/lib `pkg-config --libs opencv` -ljpeg

Now we can run an example:

$ cd example/MNIST
$ ./run.sh MNIST_CONV.conf 
libdc1394 error: Failed to initialize libdc1394
Use CUDA Device 0: GRID K520
finish initialization with 1 devices
Initializing layer: cv1
Initializing layer: 1
Initializing layer: 2
Initializing layer: 3
Initializing layer: fc1
Initializing layer: se1
Initializing layer: fc2
Initializing layer: 7
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
SGDUpdater: eta=0.100000, mom=0.900000
node[in].shape: 100,1,28,28
node[1].shape: 100,32,14,14
node[2].shape: 100,32,7,7
node[3].shape: 100,1,1,1568
node[4].shape: 100,1,1,100
node[5].shape: 100,1,1,100
node[6].shape: 100,1,1,10
MNISTIterator: load 60000 images, shuffle=1, shape=100,1,28,28
MNISTIterator: load 10000 images, shuffle=0, shape=100,1,28,28
initializing end, start working
round        0:[     600] 2 sec elapsed[1]      train-error:0.211783	test-error:0.0435
round        1:[     600] 3 sec elapsed[2]      train-error:0.0522667	test-error:0.0263
round        2:[     600] 5 sec elapsed[3]      train-error:0.0370833	test-error:0.0214
round        3:[     600] 7 sec elapsed[4]      train-error:0.0316167	test-error:0.023
round        4:[     600] 9 sec elapsed[5]      train-error:0.02905	test-error:0.0152
round        5:[     600] 11 sec elapsed[6]     train-error:0.0265167	test-error:0.0166
round        6:[     600] 13 sec elapsed[7]     train-error:0.0248333	test-error:0.0164
round        7:[     600] 15 sec elapsed[8]     train-error:0.0226667	test-error:0.0144
round        8:[     600] 17 sec elapsed[9]     train-error:0.0234167	test-error:0.0139
round        9:[     600] 19 sec elapsed[10]    train-error:0.0221	test-error:0.0152
round       10:[     600] 21 sec elapsed[11]    train-error:0.0218667	test-error:0.0121
round       11:[     600] 23 sec elapsed[12]    train-error:0.02025	test-error:0.0128
round       12:[     600] 24 sec elapsed[13]    train-error:0.01925	test-error:0.0142
round       13:[     600] 26 sec elapsed[14]    train-error:0.0194333	test-error:0.0129
round       14:[     600] 28 sec elapsed[15]    train-error:0.0190167	test-error:0.0114

updating end, 28 sec in all

At this point you can proceed to work with the examples provided by the cxxnet authors:

https://github.com/dmlc/cxxnet/tree/master/example

Distributed File System on Amazon Linux — XtreemFS

By , August 5, 2015, 2:02 pm

[Introduction]

This article provides a quick start guide on how to set up and configure XtreemFS on Amazon Linux. Two EC2 instances are launched to accomplish this goal. On both EC2 instances, there is an instance-store volume serving as the shared storage.

Edit /etc/hosts on both EC2 instances with the following entries (assuming that the private IP addresses are 172.31.0.11 and 172.31.0.12).

172.31.0.11	node01
172.31.0.12	node02

Then run the following commands to install the xtreemfs-server and xtreemfs-client packages:

$ cd /etc/yum.repos.d/
$ sudo wget "http://download.opensuse.org/repositories/home:/xtreemfs/CentOS_6/home:xtreemfs.repo"
$ sudo yum install xtreemfs-server
$ sudo yum install xtreemfs-client

[Configuration]

On both EC2 instances, create a file system on the instance-store volume and mount it at /xtreemfs.

$ sudo mkdir -p /xtreemfs
$ sudo mkfs.ext4 /dev/xvdb
$ sudo mount /dev/xvdb /xtreemfs
$ sudo mkdir -p /xtreemfs/mrc/database
$ sudo mkdir -p /xtreemfs/mrc/db-log
$ sudo mkdir -p /xtreemfs/objs
$ sudo chown -R xtreemfs /xtreemfs

On both EC2 instances, modify /etc/xos/xtreemfs/mrcconfig.properties using the following values:

dir_service.host = node01
babudb.baseDir = /xtreemfs/mrc/database
babudb.logDir = /xtreemfs/mrc/db-log

On both EC2 instances, modify /etc/xos/xtreemfs/osdconfig.properties using the following values:

dir_service.host = node01 
object_dir = /xtreemfs/objs

On node01, start all DIR, MRC and OSD services:

$ sudo service xtreemfs-dir start
$ sudo service xtreemfs-mrc start
$ sudo service xtreemfs-osd start

On node02, start MRC and OSD services:

$ sudo service xtreemfs-mrc start
$ sudo service xtreemfs-osd start

On one of the nodes, create a new volume:

$ mkfs.xtreemfs localhost/myvolume

On both nodes, mount the volume:

$ sudo mkdir /data
$ sudo chown -R ec2-user:ec2-user /data
$ mount.xtreemfs node01/myvolume /data

Now the shared file system has been set up. You can create a text file under /data on one node and observe that it appears on the other EC2 instance as well.
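
For example, a quick check might look like this (the file name is arbitrary):

$ echo "hello from node01" > /data/hello.txt     # run this on node01
$ cat /data/hello.txt                            # run this on node02; the same content should appear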

If you create an AMI for large-scale deployment, please note that in /etc/xos/xtreemfs/mrcconfig.properties and /etc/xos/xtreemfs/osdconfig.properties there is a UUID entry at the end of each file. On each node, the UUID should be different, otherwise you will end up with a mess. There is a generate_uuid script in the same folder. It is suggested that you do the following to make sure that your AMI works:

(1) Before creating the AMI, remove the UUID lines from the above-mentioned configuration files (see the example command after the user-data script below).

(2) When you launch the instance, use the user-data section to run a bash script to generate the UUID, as below:

#!/bin/bash
cd /etc/xos/xtreemfs
./generate_uuid mrcconfig.properties
./generate_uuid osdconfig.properties
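
For step (1), a one-liner along the following lines could be used before creating the AMI. This assumes the UUID entries in both files are lines starting with "uuid", so double-check your configuration files before running it:

$ sudo sed -i '/^uuid/d' /etc/xos/xtreemfs/mrcconfig.properties /etc/xos/xtreemfs/osdconfig.properties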

Please bear in mind that this is only a quick start guide, and you should not use this configuration directly in a production system without further tuning.

Getting Started with AWS SDK for Java (2)

By , June 11, 2015, 10:24 am

This is the 2nd part of my tutorial on “Getting Started with AWS SDK for Java”. If you have not already done so, I suggest that you first take a look at the first chapter of this series, “Getting Started with AWS SDK for Java (1)”, to properly set up your development environment. In this part, we will cover the Amazon RDS client, as well as some common issues when using RDS as the back-end database for your Java applications.

[Amazon RDS Client]

In this section, we use the AmazonRDSClient to accomplish some basic tasks such as launching an RDS instance, listing all RDS instances in a particular region, and terminating a particular RDS instance. The related source code for this demo is DemoRDS.java (you can click on the link to view the source code in a separate browser tab). You should also take a look at the Java docs for the AmazonRDSClient to familiarize yourself with the various properties and methods.

First of all we create an instance of the AmazonRDSClient in the constructor, then set the region to ap-southeast-2. For debugging purposes, we enable logging using log4j.

	
public class DemoRDS 
{
	public AmazonRDSClient client;
	final static Logger logger = Logger.getLogger(DemoRDS.class);

	/**
	 *
	 * Constructor
	 *
	 */

	public DemoRDS()
	{
		// Create the AmazonRDSClient
		client = new AmazonRDSClient();
		// Set the region to ap-southeast-2
		client.configureRegion(Regions.AP_SOUTHEAST_2);
	}

To launch an RDS instance, you will need to create a CreateDBInstanceRequest object, then pass it to the createDBInstance() method of the AmazonRDSClient, which returns a DBInstance object. From the DBInstance object, you will be able to obtain information about the newly created RDS instance. Due to the asynchronous nature of AWS API calls, some information might not be available in the DBInstance object returned by the createDBInstance() method. For example, the DNS endpoint for the newly created RDS instance will not be available until several minutes later, therefore instance.getEndpoint() will return a null result. If you try to convert this null result into a String, you will get an exception.

	public String launchInstance()
	{
		System.out.println("\n\nLAUNCH INSTANCE\n\n");

		try
		{
			// The CreateDBInstanceRequest object
			CreateDBInstanceRequest request = new CreateDBInstanceRequest();
			request.setDBInstanceIdentifier("Sydney");	// RDS instance name
			request.setDBInstanceClass("db.t2.micro");
			request.setEngine("MySQL");		
			request.setMultiAZ(false);
			request.setMasterUsername("username");
			request.setMasterUserPassword("password");
			request.setDBName("mydb");		// database name 
			request.setStorageType("gp2");		// standard, gp2, io1
			request.setAllocatedStorage(10);	// in GB

			// VPC security groups 
			ArrayList<String> list = new ArrayList<String>();
			list.add("sg-efcc248a");			// security group, call add() again to add more than one
			request.setVpcSecurityGroupIds(list);

			// Create the RDS instance
			DBInstance instance = client.createDBInstance(request);

			// Information about the new RDS instance
			String identifier = instance.getDBInstanceIdentifier();
			String status = instance.getDBInstanceStatus();
			Endpoint endpoint = instance.getEndpoint();
			String endpoint_url = "Endpoint URL not available yet.";
			if (endpoint != null)
			{
				endpoint_url = endpoint.toString();
			}

			// Do some printing work
			System.out.println(identifier + "\t" + status);
			System.out.println(endpoint_url);

			// Return the DB instance identifier
			return identifier;
		} catch (Exception e)
		{
			// Simple exception handling by printing out error message and stack trace
			System.out.println(e.getMessage());
			e.printStackTrace();
			return "ERROR";
		}
	}

To list all RDS instances, we simply call the describeDBInstances() method of the AmazonRDSClient. This method returns a list of DBInstance objects, and you need to traverse through the list to obtain information about each individual DBInstance object.

	public void listInstances()
	{
		System.out.println("\n\nLIST INSTANCE\n\n");
        	try 
		{
			// Describe DB instances
			DescribeDBInstancesResult result = client.describeDBInstances();
			
			// Getting a list of the RDS instances
			List<DBInstance> instances = result.getDBInstances();
			for (DBInstance instance : instances)
			{
				// Information about each RDS instance
				String identifier = instance.getDBInstanceIdentifier();
				String engine = instance.getEngine();
				String status = instance.getDBInstanceStatus();
				Endpoint endpoint = instance.getEndpoint();
				String endpoint_url = "Endpoint URL not available yet.";
				if (endpoint != null)
				{
					endpoint_url = endpoint.toString();
				}

				// Do some printing work
				System.out.println(identifier + "\t" + engine + "\t" + status);
				System.out.println("\t" + endpoint_url);
			}
	        } catch (Exception e) 
		{
			// Simple exception handling by printing out error message and stack trace
			System.out.println(e.getMessage());
			e.printStackTrace();
		}
	}

To terminate an RDS instance, we need to create a DeleteDBInstanceRequest, then pass the DeleteDBInstanceRequest to the deleteDBInstance() method. In the DeleteDBInstanceRequest, you should at least specify the DB instance identifier and whether you want to skip the final snapshot for the RDS instance to be deleted. If you want to create a final snapshot, you will need to set the name of the final snapshot in the DeleteDBInstanceRequest object.

	public void terminateInstance(String identifier)
	{
		System.out.println("\n\nTERMINATE INSTANCE\n\n");
		try
		{
			// The DeleteDBInstanceRequest 
			DeleteDBInstanceRequest request = new DeleteDBInstanceRequest();
			request.setDBInstanceIdentifier(identifier);
			request.setSkipFinalSnapshot(true);
			
			// Delete the RDS instance
			DBInstance instance = client.deleteDBInstance(request);

			// Information about the RDS instance being deleted
			String status = instance.getDBInstanceStatus();
			Endpoint endpoint = instance.getEndpoint();
			String endpoint_url = "Endpoint URL not available yet.";
			if (endpoint != null)
			{
				endpoint_url = endpoint.toString();
			}

			// Do some printing work
			System.out.println(identifier + "\t" + status);
			System.out.println(endpoint_url);
		} catch (Exception e)
		{
			// Simple exception handling by printing out error message and stack trace
			System.out.println(e.getMessage());
			e.printStackTrace();
		}
	}

Before running the demo code, please modify the source code with the appropriate arguments (such as the security groups when creating the RDS instance) for the API calls. It is recommended that you intentionally introduce some errors in the arguments to observe the logging information from the AWS SDK. The demo code comes with switches for each demo module. You can use the launch, list, and terminate switches to pick which demo module you would like to run. For example:

$ mvn compile
$ mvn package
$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoRDS launch
$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoRDS list
$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoRDS list
$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoRDS terminate
$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoRDS list
$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoRDS list

[JDBC Basics]

With Java, people interact with databases using JDBC (Java Database Connectivity). This is done in a four-step approach:

– loading the JDBC driver using a class loader
– establishing a connection using DriverManager
– working with the database
– closing the connection

The JDBC drivers for MySQL, PostgreSQL, Oracle and SQL Server can be found at the following locations. You will need to put the corresponding JAR file into your CLASSPATH to make things work. In the third-party folder of our demo code, we provide a copy of MySQL Connector/J 5.1.35.

MySQL Connector/J
PostgreSQL JDBC Driver
Oracle JDBC Driver
Microsoft JDBC Driver for SQL Server

With JDBC, we connect to a database using a connection URL, which includes properties such as the hostname or IP address of the database server, the port number to use for the connection, the name of the database to work with, as well as the username and password. For different database engines, the format of the connection URL is slightly different. The following pseudo-code provides example connection URLs for MySQL, PostgreSQL, Oracle and SQL Server. If you need definitive guidance on constructing the connection URL for a specific database engine, please refer to the following resources:

JDBC Connection URL for MySQL
JDBC Connection URL for PostgreSQL
JDBC Connection URL for Oracle
JDBC Connection URL for SQL Server

	// MySQL
	Class.forName("com.mysql.jdbc.Driver");
	String jdbc_url = "jdbc:mysql://hostname/database?user=username&password=password";
	Connection conn = DriverManager.getConnection(jdbc_url);
	
	// PostgreSQL
	Class.forName("org.postgresql.Driver");
	String jdbc_url = "jdbc:postgresql://hostname/database?user=username&password=password&ssl=true";
	Connection conn = DriverManager.getConnection(jdbc_url);
	
	// Oracle
	Class.forName ("oracle.jdbc.OracleDriver");
	String jdbc_url = "jdbc:oracle:thin:@hostname:1521:orcl";
	Connection conn = DriverManager.getConnection(jdbc_url, "username", "password");	
	
	// SQL Server
	Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
	String jdbc_url = "jdbc:sqlserver://hostname:1433;databaseName=database";
	Connection conn = DriverManager.getConnection(jdbc_url, "username", "password");

The following demo code provides an example of using MySQL Connector/J to connect to an RDS instance, then carrying out some operations such as CREATE TABLE, INSERT, and SELECT in an infinite loop. The properties of the database (including hostname, database, username, password) are provided in a property file, db.properties, in the top-level folder of the demo code. When we run the demo code, we load these properties from an InputStream. This way we do not need to provide database credentials in the source code. (The benefit of doing this is that when your database credentials change, you do not need to recompile your Java code. All you need to do is update the properties in db.properties.)

In this demo code we catch Exception at two levels: a first-level Exception might occur when loading the property file (file does not exist, incorrect format, or required entry missing) or loading the MySQL JDBC driver (the JAR file is not in the CLASSPATH), while a second-level Exception might occur within the infinite loop (cannot open a connection to the database, cannot CREATE TABLE or execute INSERT or SELECT queries). When a first-level Exception occurs, there are errors at the resource level, so we can’t move forward at all. When a second-level Exception occurs, there might be things that we can fix from within the RDS instance, so we simply print out the error messages and keep trying in the infinite loop.

	public void runJdbcTests()
	{
		System.out.println("\n\nJDBC TESTS\n\n");
		try 
		{
			// Getting database properties from db.properties
			Properties prop = new Properties();
			InputStream input = new FileInputStream("db.properties");
			prop.load(input);
			String db_hostname = prop.getProperty("db_hostname");
			String db_username = prop.getProperty("db_username");
			String db_password = prop.getProperty("db_password");
			String db_database = prop.getProperty("db_database");

			// Load the MySQL JDBC driver
			Class.forName("com.mysql.jdbc.Driver");
			String jdbc_url = "jdbc:mysql://" + db_hostname + "/" + db_database + "?user=" + db_username + "&password=" + db_password;

			// Run an infinite loop 
			Connection conn = null;
			while (true)
			{
				try
				{
					// Create a connection using the JDBC driver
					conn = DriverManager.getConnection(jdbc_url);

					// Create the test table if not exists
					Statement statement = conn.createStatement();
					String sql = "CREATE TABLE IF NOT EXISTS jdbc_test (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, content VARCHAR(80))";
					statement.executeUpdate(sql);

					// Do some INSERT
					PreparedStatement preparedStatement = conn.prepareStatement("INSERT INTO jdbc_test (content) VALUES (?)");
					String content = "" + UUID.randomUUID();
					preparedStatement.setString(1, content);
					preparedStatement.executeUpdate();
					System.out.println("INSERT: " + content);

					// Do some SELECT
					sql = "SELECT COUNT(*) as count FROM jdbc_test";
					ResultSet resultSet = statement.executeQuery(sql);
					if (resultSet.next())
					{
						int count = resultSet.getInt("count");
						System.out.println("Total Records: " + count);
					}

					// Close the connection
					conn.close();

					// Sleep for some time
					Thread.sleep(20000);
				} catch (Exception e1)
				{
					System.out.println(e1.getMessage());
					e1.printStackTrace();
				}
			}
		} catch (Exception e0)
		{
			System.out.println(e0.getMessage());
			e0.printStackTrace();
		}
	}

After creating an RDS instance and updating the properties in db.properties, you can run the JDBC tests using the following command. You can stop the execution of this demo with CTRL+C.

$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoRDS jdbc

JDBC TESTS

INSERT: cc6294da-fb84-4c6f-aa49-a33804058d03
Total Records: 1
INSERT: 1d7c8940-79cc-45ca-948a-27b809bb9e69
Total Records: 2
INSERT: 32d1acc5-c9ed-4bce-a6cd-44e7ac38ff42
Total Records: 3
INSERT: 88923f13-5ecd-41c5-a437-2d51099c2ff5
Total Records: 4

[Cloud Specific Considerations]

When building applications on top of AWS, it is important to assume that everything fails all the time. With this in mind, you should always connect to your RDS instance using the DNS endpoint instead of the IP address obtained from a DNS server, because the IP address of your RDS instance will change when a Multi-AZ fail over or a Single-AZ recovery occurs. In the case of Multi-AZ fail over, the DNS endpoint will be resolved to the IP address of the new master. In the case of Single-AZ recovery, a new instance will be launched and the DNS endpoint will be resolved to the IP address of the new instance.

For example, if we do a “reboot with fail over” of the RDS instance while running our JDBC tests against it, we will see the following output. An Exception occurs during the Multi-AZ fail over because the JDBC driver fails to connect to the old master (connection timeout). When the Multi-AZ fail over is completed, subsequent connections are made to the new master successfully. (This is why we put each set of tests inside a try…catch block.)

$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml net.qyjohn.aws.DemoRDS jdbc

JDBC TESTS

INSERT: 14a03563-325e-4dc3-8456-bdbc1fee3034
Total Records: 48
INSERT: ec28a659-fc64-4995-b434-760e8b3274ae
Total Records: 49
Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

INSERT: d59f3e77-1a17-4bee-9056-018dde60fb27
Total Records: 50
INSERT: 4ef1b7c7-1de1-430f-a503-61227c4b1249
Total Records: 51

In the above-mentioned demo, the Java SE application successfully handles the Multi-AZ event. However, this might not be the case for a Java EE application, where a security manager is in place. The reason is that Java has a networking property networkaddress.cache.ttl controlling the caching policy for successful name lookups from the name service (see Java Properties for details). A value of -1 indicates “cache forever”, and caching forever is the default behavior when a security manager is installed (with Java EE applications, this is a common practice enforced by the application server). When a Multi-AZ fail over is completed, the operating system already sees the new DNS record for the RDS endpoint, but the Java application still keeps the old DNS record. The result is that a Java EE application running in Tomcat, JBoss, or GlassFish keeps trying to reach the old master (which is no longer in service) and keeps failing, until the application is restarted.
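
If you want to check which DNS caching policy your JVM is actually configured with, you can query the security property directly. This is just a quick sketch; Security.getProperty() returns null when the property has never been set explicitly (in java.security or via setProperty()), in which case the JVM falls back to its built-in default, which is cache-forever when a security manager is installed:

	// Print the configured DNS caching policy. A null value means the property
	// has not been set explicitly and the JVM default applies.
	String ttl = java.security.Security.getProperty("networkaddress.cache.ttl");
	System.out.println("networkaddress.cache.ttl = " + ttl);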

We can simulate this behavior with the same JDBC tests. Before doing this, we need to add the following policy entry to /usr/lib/jvm/java-8-oracle/jre/lib/security/java.policy (replace /home/ubuntu/aws-sdk-java-demo with the actual path of your demo code folder). This policy grants AllPermission to our demo application, which is represented by the JAR in the target folder.

grant codeBase "file:/home/ubuntu/aws-sdk-java-demo/target/*"
{
	permission java.security.AllPermission;
};

Then we run our demo application again, this time with the default security manager enabled, and do another “reboot with fail over” of the RDS instance while running our JDBC tests. Now we should see that the application fails to open a connection to the RDS instance forever. If you do a “dig” against the DNS endpoint of the RDS instance before and after the fail over (see the example below), you will see that the operating system does see the change in DNS records.
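
The dig command itself is straightforward; the endpoint below is just a placeholder for your own RDS endpoint. Run it before and after the fail over and compare the records returned:

$ dig +short mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com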

$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml -Djava.security.manager=default net.qyjohn.aws.DemoRDS jdbc

JDBC TESTS

INSERT: b9b8cb31-5ef7-41ed-a1d6-96f430bb6cb2
Total Records: 52
INSERT: 9afa8447-0876-4af5-88ce-e2aa782f91f8
Total Records: 53
Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

There are many different solutions to this issue, including (1) modifying the relevant system property on the java command line; (2) modifying the /usr/lib/jvm/java-8-oracle/jre/lib/security/java.security configuration file; (3) modifying the startup parameters of your application server; and (4) setting a new value for networkaddress.cache.ttl directly in your Java code.

The first solution is to add -Dsun.net.inetaddr.ttl=0 (never cache) to your command line. As shown in the following example, with this setting our JDBC test is able to pick up the new DNS record after a single Exception.

$ java -cp target/demo-1.0-SNAPSHOT.jar -Dlog4j.configurationFile=log4j2.xml -Djava.security.manager=default -Dsun.net.inetaddr.ttl=0 net.qyjohn.aws.DemoRDS jdbc

JDBC TESTS

INSERT: a8c1bcca-0335-4217-9f97-a3964f12c574
Total Records: 54
INSERT: 5fa1cb23-96b9-449f-98b8-b0184781d657
Total Records: 55
Communications link failure

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

INSERT: 4e3c8e51-9857-4024-8dc2-5a86f442a260
Total Records: 56
INSERT: 9ad6ec95-159a-4291-852d-903a08efc065
Total Records: 57

The second solution is to set a value for the networkaddress.cache.ttl property (0 for never cache) permanently in /usr/lib/jvm/java-8-oracle/jre/lib/security/java.security. This can be done by adding the following line to the configuration file:

networkaddress.cache.ttl=0

The third solution is to modify the startup parameters of your application server. In the case of Tomcat7, you can modify JAVA_OPTS in /etc/default/tomcat7 with the desired setting, as below:

# You may pass JVM startup parameters to Java here. If unset, the default
# options will be: -Djava.awt.headless=true -Xmx128m -XX:+UseConcMarkSweepGC
#
# Use "-XX:+UseConcMarkSweepGC" to enable the CMS garbage collector (improved
# response time). If you use that option and you run Tomcat on a machine with
# exactly one CPU chip that contains one or two cores, you should also add
# the "-XX:+CMSIncrementalMode" option.
JAVA_OPTS="-Djava.awt.headless=true -Dsun.net.inetaddr.ttl=0 -Xmx128m -XX:+UseConcMarkSweepGC"

The fourth solution is to set a new value for networkaddress.cache.ttl directly in your Java code, as described in the AWS documentation Setting the JVM TTL for DNS Name Lookups. If you have the ability to modify your code, this is the recommended way, because you have full control over the behavior of your application, regardless of the configuration of the underlying runtime environment. (In the example below, 60 indicates that the new TTL is 60 seconds. This way you still have some caching, but the caching is not that aggressive.)

	java.security.Security.setProperty("networkaddress.cache.ttl", "60");
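
The property is consulted when the JVM starts performing name lookups, so this call should run before any connection is opened. A minimal sketch of how it could be wired into a demo class like ours (the static initializer here is illustrative; it is not part of the original demo code):

	public class DemoRDS
	{
		static
		{
			// Cache successful DNS lookups for at most 60 seconds, so that a
			// Multi-AZ fail over is picked up within roughly a minute.
			java.security.Security.setProperty("networkaddress.cache.ttl", "60");
		}

		// ... runJdbcTests() and the rest of the demo code as shown earlier
	}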

The above-mentioned “Communications link failure” is one of the most commonly seen errors when working with RDS MySQL instances. In most cases, this issue can be resolved by asking yourself the following questions:

– Does your security group allow your EC2 instance (or on-premises server) to communicate with your RDS instance (do a telnet to port 3306 on the RDS instance for a quick test)?

– Is a connection pool being used? Do you validate the connection when checking it out from the connection pool? Existing connections in a connection pool might become invalid for various reasons (for example, timeouts); see the validation sketch after this list.

– Has the MySQL service daemon been restarted? With RDS, the MySQL service daemon automatically restarts after it crashes for various reasons (for example, out-of-memory errors).

– Is there a fail over event (Multi-AZ) or recovery event (Single-AZ)?

– Does the operating system have the correct DNS record (do a dig to verify)? Does your Java application have the correct DNS record (check networkaddress.cache.ttl)?

– Is there anything in the MySQL error log?
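
Regarding connection validation, the exact settings depend on which connection pool you use. Below is a minimal sketch using the Tomcat JDBC connection pool, in the same pseudo-code style as the connection URL examples above; the URL and credentials are placeholders. With testOnBorrow and a validationQuery, the pool checks a connection before handing it out, so stale connections are discarded instead of surfacing as a “Communications link failure” in your application:

	// Uses org.apache.tomcat.jdbc.pool.PoolProperties and DataSource
	PoolProperties p = new PoolProperties();
	p.setUrl("jdbc:mysql://hostname/database");
	p.setDriverClassName("com.mysql.jdbc.Driver");
	p.setUsername("username");
	p.setPassword("password");
	// Validate each connection when it is borrowed from the pool,
	// at most once every 30 seconds per connection
	p.setTestOnBorrow(true);
	p.setValidationQuery("SELECT 1");
	p.setValidationInterval(30000);

	DataSource ds = new DataSource();
	ds.setPoolProperties(p);
	// Connections returned here have been validated against the database
	Connection conn = ds.getConnection();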
