Add Spot Instances with Amazon EMR

Most of us know that Amazon EC2 Spot Instances are usually a good choice for time-flexible and interruption-tolerant tasks. These instances are traded at a fluctuating Spot market price, and you can set your bid price using the AWS APIs or the AWS Console. Once spare Spot EC2 capacity is available at or below your bid price, AWS will allot the instances to your account. Spot Instances are usually far cheaper than On-Demand EC2 instances. Example: the On-Demand m1.xlarge price is 0.48 USD per hour, while on the Spot market you can sometimes find it at 0.052 USD per hour, roughly 9 times cheaper than On-Demand; even if you only manage to win Spot capacity at around 0.24 USD most of the time by bidding competitively, you are still saving 50% over the On-Demand price straight away. Big data use cases usually need lots of EC2 nodes for processing, so adopting such techniques can make a big difference to your infrastructure cost and operations in the long term. I am sharing my experience on this subject as tips and techniques you can adopt to save costs while using EMR clusters on Amazon for big data problems.
Note: While dealing with Spot you can be sure that you will never pay more than your maximum bid price per hour.

Tip 1: Make the right choice (Spot vs On-Demand) for the cluster components
Data-critical workloads: For workloads which cannot afford to lose data, you can run the Master + Core nodes on On-Demand EC2 and your Task nodes on Spot EC2. This is the most common pattern when combining Spot and On-Demand in an Amazon EMR cluster. Since the Task nodes operate at Spot prices, depending upon your bidding strategy you can save roughly 50% compared with running them on On-Demand EC2. You can save further (if you are lucky) by reserving your Core and Master nodes, but you will then be tied to an AZ. In my view this is not a good or common technique, because some AZs can be very noisy, with high Spot prices. A minimal sketch of this pattern follows.
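The sketch below assumes boto3 and configured AWS credentials; the release label, IAM roles, instance types, counts and bid price are illustrative assumptions, not values from this article, and will need adjusting for your own setup.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="spot-task-node-cluster",
    ReleaseLabel="emr-5.36.0",              # assumed EMR release
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
             "InstanceType": "m1.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
             "InstanceType": "m1.xlarge", "InstanceCount": 2},
            {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
             "BidPrice": "0.24",            # maximum bid in USD per hour
             "InstanceType": "m1.xlarge", "InstanceCount": 4},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",      # assumed default EMR roles
    ServiceRole="EMR_DefaultRole",
)
print("Cluster id:", response["JobFlowId"])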
Cost-driven workloads: When solving big data problems, you will sometimes face scenarios where cost matters far more than time. Example: you are processing archives of old logs as low-priority jobs, where the cost of processing is what matters and there is plenty of time left. In such cases you can run all of Master + Core + Task on Spot EC2 to get further savings beyond the data-critical approach. Since all of the nodes operate at Spot prices, depending upon your bidding strategy you can save around 60% or more compared with running the nodes on On-Demand EC2. AWS publishes a table indicating the Amazon EMR + Spot combinations that are widely used.

Tip 2: There is a free lunch, sometimes
Spot Instances can be interrupted by AWS when the Spot market price rises to or above your bid price; interruption means that AWS can pull back the Spot EC2 instances assigned to your account. If your Spot Task nodes are interrupted, AWS will not charge you for the partial hour of usage: if you started the instance at 10:05 am and your instances are interrupted by Spot price fluctuations at 10:45 am, you will not be charged for that partial hour. If your processing is totally time-insensitive, you can keep your bid price close to the current Spot price, making your instances easy for AWS to interrupt, and exploit this partial-hour behaviour. Theoretically you can get most of the processing done by your Task nodes for free* using this strategy.

Tip 3: Choose the AZ wisely when it comes to Spot
Different AZs inside an Amazon EC2 region have different Spot prices for the same instance type. Observe this pattern for a while, build some intelligence around the collected price data, and rebuild your cluster in the AZ with the lowest price. Since Master + Core + Task need to run in the same AZ for better latency, it is advisable to architect your EMR clusters so that they can be switched (i.e. recreated) to a different AZ according to Spot prices. If you can build this flexibility into your architecture, you can save costs by exploiting the inter-AZ price fluctuations. Compare the Spot price variations of two AZs inside the same region over the same time period and make your choice wisely from time to time; a sketch of such a comparison follows.
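A minimal sketch, again assuming boto3, that compares the most recent Spot price per Availability Zone so you can pick the cheapest AZ to recreate the cluster in; the instance type and region are illustrative assumptions.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

history = ec2.describe_spot_price_history(
    InstanceTypes=["m1.xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    MaxResults=100,
)

# Keep only the most recent price seen for each AZ
latest = {}
for record in sorted(history["SpotPriceHistory"], key=lambda r: r["Timestamp"]):
    latest[record["AvailabilityZone"]] = float(record["SpotPrice"])

for az, price in sorted(latest.items()):
    print(f"{az}: {price:.4f} USD/hr")
print("Cheapest AZ right now:", min(latest, key=latest.get))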

Tip 4: Keep your Job logic small and store intermediate outputs in S3
Break your complex processing logic down into small jobs, and design the jobs and tasks in your EMR cluster so that they run for a very short period of time (for example, a few minutes). Store all the intermediate job outputs in Amazon S3. This approach is helpful in the EMR world and gives you the following benefits (a sketch follows this list):

When your Core + Task nodes are interrupted frequently, you can still continue from the intermediate checkpoints, with the data read back from S3.
You now have the flexibility to recreate the EMR cluster in any AZ depending upon the Spot price fluctuations.
You can decide the number of nodes needed for your EMR cluster (even every hour) depending upon the data volume, density and velocity.
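Here is a minimal sketch, assuming boto3, of submitting one small stage of a larger pipeline as an EMR step that reads the previous stage's output from S3 and writes its own output back to S3; the cluster id, bucket, and mapper/reducer scripts are hypothetical names used only for illustration.

import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",            # hypothetical cluster id
    Steps=[{
        "Name": "stage-2-aggregate",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-input",  "s3://my-data-bucket/intermediate/stage-1/",
                "-output", "s3://my-data-bucket/intermediate/stage-2/",
                "-mapper",  "aggregate_mapper.py",   # hypothetical scripts
                "-reducer", "aggregate_reducer.py",
            ],
        },
    }],
)

If the cluster is interrupted, a fresh cluster, possibly in a cheaper AZ, can resume from the stage-1 output in S3 rather than starting over.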

All three of the above points, when implemented, contribute to elasticity in your architecture and thereby help you save costs in the Amazon cloud. The recommendation is not suitable for all jobs; it has to be carefully mapped to the right use cases by the architects.

The AdWantageS

Every customer has a reason to move into the cloud. Be it cost, scalability or on-demand provisioning, there are plenty of reasons why one moves into the Cloud. The whitepaper "The Total Cost of (Non) Ownership of Web Applications in the Cloud" by Jinesh Varia, Technology Evangelist at Amazon Web Services, provides a good comparison between hosting an infrastructure in-house and in the Cloud (AWS). There are plenty of pricing models available with AWS which can provide cost benefits ranging from 30% to 80% when compared to hosting the servers in-house.

On-Demand Instances – this is where everyone starts. You simply add your credit card to your AWS account and start spinning up Instances. You provision them on demand and pay for how long you run them, and you of course have the option of stopping and starting them whenever needed. You are charged for every hour of running an Instance; for example, for a Large Instance (2 CPU, 7.5 GB memory, Linux) you will pay $0.32/hr (US-East).
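A minimal sketch, assuming boto3 and configured credentials, of launching a single On-Demand Instance; the AMI id is hypothetical.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

result = ec2.run_instances(
    ImageId="ami-12345678",    # hypothetical Linux AMI
    InstanceType="m1.large",   # the $0.32/hr example above
    MinCount=1,
    MaxCount=1,
)
print("Launched:", result["Instances"][0]["InstanceId"])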

Reserved Instances – let’s say you migrate/host your web application to AWS and run multiple web servers and DB servers there. After a couple of months, you may notice that some of your servers run 24 hours a day. You may spin up additional web servers during peak load, but you will always run at least 2 of them, plus a DB server. For such cases, where you know that you will always run the Instances, AWS provides an option for reducing your cost – Reserved Instances. This is purely a pricing model: you purchase the required Instance type (say Large/X-Large) in a region for an upfront amount, and you are then charged substantially less for the on-demand hourly usage of that Instance. This gives a potential saving of about 30% over a period of one year when compared to On-Demand Instances. The following illustrates the cost savings of purchasing an m1.large Reserved Instance against using it On-Demand through a year.

Cost comparison between On-Demand and Reserved Instance for m1.large Linux Instance in US-East
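As a back-of-the-envelope sketch of the saving described above: the On-Demand rate comes from the text, while the upfront fee and reduced hourly rate below are illustrative placeholders, not AWS's actual price card, so plug in the current prices for your region.

HOURS_PER_YEAR = 24 * 365

on_demand_hourly = 0.32     # m1.large Linux, US-East (from the text)
reserved_upfront = 700.00   # assumed one-year upfront fee
reserved_hourly = 0.15      # assumed reduced hourly rate

on_demand_cost = on_demand_hourly * HOURS_PER_YEAR
reserved_cost = reserved_upfront + reserved_hourly * HOURS_PER_YEAR
savings = (on_demand_cost - reserved_cost) / on_demand_cost * 100

print(f"On-Demand for a year: ${on_demand_cost:,.2f}")
print(f"Reserved for a year:  ${reserved_cost:,.2f}")
print(f"Savings:              {savings:.0f}%")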

Be careful:

Reserved Instances are purchased against an Instance type. If you purchase an m1.large Reserved Instance, then when your bill is generated at the end of the month, only m1.large usage will be billed at the reduced hourly charge. Hence, if in a given month you figure out that m1.large is not sufficient and move up to m1.xlarge, you will not be billed at the reduced hourly charge. In such a case, you may end up paying more to AWS on a yearly basis. So examine your usage pattern, fix your Instance type and then purchase a Reserved Instance.
Reserved Instances are for a region – if you buy one in the US-East region and later decide to move your Instances to US-West, the cost reduction will not be applicable for Instances running in the US-West region.
Of course, you have the benefits of:

Reduced cost – a minimum of 30-40% cost savings which increases if you purchase a 3-year term
Guaranteed Capacity – AWS will always provide you the number of Reserved Instances you have purchased (you will not get an error saying “Insufficient Capacity”)
Spot Instances – in this model, you bid against the Spot market price of an Instance; if your bid is above the current Spot price, you get an Instance. The Spot market price is available through an API which can be queried regularly, so you can write code that checks the Spot market price and keeps placing bids specifying the maximum price you are willing to pay (a sketch follows the list below). If your bid exceeds the Spot price, an Instance is provisioned for you. The Spot price is usually substantially lower than On-Demand pricing. For example, at the time of writing this article, the Spot price for an m1.large Linux Instance in US-East was $0.026/hr against $0.32/hr On-Demand, about a 90% cost reduction on an hourly basis. But the catch is,

The Spot market price will keep changing as other users place their bids and as AWS's excess capacity varies
If your maximum bid price falls below the Spot market price, then AWS may terminate your Instance. Abruptly
Hence you may lose your data, or your code may terminate unfinished
Jobs that you anticipate will complete in one hour may take a few more hours to complete
Hence Spot Instances are not suitable for all kinds of workloads. Certain workloads like log processing and encoding can exploit Spot Instances, but this requires careful algorithms and a deeper understanding. AWS documents a number of use cases for Spot Instances.
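Here is a minimal sketch, assuming boto3, of the workflow described above: query the current Spot price and then place a request with the maximum price you are willing to pay. The AMI id and the 50% bidding margin are assumptions for illustration.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

history = ec2.describe_spot_price_history(
    InstanceTypes=["m1.large"],
    ProductDescriptions=["Linux/UNIX"],
    MaxResults=1,
)
current = float(history["SpotPriceHistory"][0]["SpotPrice"])
max_bid = round(current * 1.5, 4)   # bid 50% above the current price (assumption)

ec2.request_spot_instances(
    SpotPrice=str(max_bid),
    InstanceCount=1,
    LaunchSpecification={
        "ImageId": "ami-12345678",   # hypothetical Linux AMI
        "InstanceType": "m1.large",
    },
)
print(f"Requested 1 Spot Instance at a maximum of ${max_bid}/hr "
      f"(current Spot price ${current}/hr)")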

Now, with that basic understanding, let’s examine the whitepaper a little more carefully, and not just from a cost point of view. Web applications can be classified into three types based on the nature of their traffic – Steady Traffic, Periodic Burst and Always Unpredictable. Here is a summary of the benefits of hosting each of them in AWS.

AWS benefits for different types of web applications

Steady Traffic

The website has steady traffic. You are running a couple of servers on-premises and are considering moving them to AWS, or you are hosting a new web application on AWS. Here’s the cost comparison from the whitepaper.

Source: AWS Whitepaper quoted above

You will most likely start by spinning up On-Demand Instances. You will run a couple of them as web servers and a couple of them for your database (for HA)
Over the long run (3 years), if you only use On-Demand Instances, you may end up paying more than hosting on-premises. Do NOT just run On-Demand Instances if your usage is steady
If you have been on AWS for a couple of months and are OK with the performance of your setup, you should definitely consider purchasing Reserved Instances. You will end up with a minimum of 40% savings against on-premises infrastructure and about 30% against running On-Demand Instances
You will still benefit from spinning up infrastructure on demand. Unlike on-premises, where you need to plan and purchase ahead, here you have the option of provisioning on demand, just in time
And in case you grow and your traffic increases, you have the option to add more capacity to your fleet and remove it later. You can change server sizes on demand and pay only for what you use. This goes a long way in terms of business continuity, user experience and more sales
You can always mix and match Reserved and On-Demand Instances and reduce your cost whenever required. Reserved Instances can be purchased anytime
Periodic Burst

In this scenario, you have some constant traffic to your website, but periodically there are spikes in the traffic. For example, every quarter there may be a sales promotion, or during Thanksgiving or Christmas you will have more traffic to the website. During the other months, traffic to the website will be low. Here’s the cost comparison for such a scenario.

Source: AWS Whitepaper quoted above

You will spin up On-Demand Instances to start with. You will run a couple of them as web servers and a couple of them for the database
During the burst period, you will need additional capacity to meet the spike in traffic. You spin up additional Instances for your web tier and application tier to meet the demand
Here is where you will enjoy the benefits of on-demand provisioning, something that is not possible in on-premises hosting. On-premises, you would purchase the excess capacity well ahead and keep running it even when traffic is low. With on-demand provisioning, you only provision it during the burst period; once the promotion is over, you can terminate that extra capacity
For the capacity that you will always run as the baseline, you can purchase Reserved Instances and reduce the cost by up to 75%
Even if you do not purchase Reserved Instances, you can run On-Demand Instances and save around 40% against on-premises infrastructure, because for the periodic burst requirement you provision capacity only during the burst period and turn it off later. This is not possible in an on-premises setup, where you would anyway purchase this capacity ahead of time
Always Unpredictable

In this case, you have an application where you cannot predict the traffic at all. For example, a social application that is in an experimental stage and that you expect to go viral. If it goes viral and gains popularity, you will need to expand the infrastructure quickly; if it doesn’t, you do not want to risk heavy cap-ex. Here’s the cost comparison for such a scenario.

Source: AWS Whitepaper quoted above

You will spin up On-Demand Instances and scale them according to the traffic
You will use automation tools such as Auto Scaling to scale the infrastructure on demand and align your infrastructure with the traffic (a sketch follows this list)
Over a 3-year period, there will be some initial steady growth of the application. As the application goes viral you will need to add capacity, and beyond its lifetime of, say, 18 months to 2 years, the traffic may start to fall
Through monitoring tools such as CloudWatch you can constantly tweak the infrastructure and arrive at a baseline. You will figure out that during the initial growth and “viral” period you need certain baseline servers; you can purchase Reserved Instances for them and mix them with On-Demand Instances when you scale. You will enjoy a cost saving of around 70% against an on-premises setup
It is not advisable to plan for full capacity and run at full capacity, or to purchase Reserved Instances for the full capacity. If the application doesn’t do as well as anticipated, you may end up paying AWS more than you actually need to
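For the Auto Scaling point above, here is a minimal sketch, assuming boto3 and an existing launch configuration, that keeps a group sized between a small baseline and a larger cap and scales out when average CPU runs high; the group name, launch configuration and thresholds are hypothetical.

import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Baseline of 2 instances, allowed to grow to 10 when traffic spikes
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-tier-asg",              # hypothetical name
    LaunchConfigurationName="web-tier-launch-config",  # assumed to exist already
    MinSize=2,
    MaxSize=10,
    AvailabilityZones=["us-east-1a", "us-east-1b"],
)

# Track average CPU: scale out when the group runs hot, scale in when idle
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-tier-asg",
    PolicyName="keep-cpu-around-60-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)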
As you can see, whether you have a steady-state application or are trying out a new idea, AWS proves advantageous from different perspectives for different requirements. Cost, on-demand provisioning, scalability, flexibility and automation tools are things that even a startup can think of and get on board with quickly. One question that you need to ask yourself is “Why am I moving into AWS?”. Ask this question during the early stages and spend considerable time on the design and architecture of the infrastructure setup. Otherwise, you may end up asking yourself “Why did I come into AWS?”.


UIGestureRecognizer for iOS

Introduction:

UIGestureRecognizer is an abstract base class for concrete gesture-recognizer classes.

If you need to detect gestures in your app, such as taps, pinches, pans, or rotations, it’s extremely easy with the built-in UIGestureRecognizer classes.

In the old days before UIGestureRecognizer, if you wanted to detect a gesture such as a swipe, you had to handle every touch within a UIView yourself – in methods such as touchesBegan, touchesMoved, and touchesEnded.

The concrete subclasses of UIGestureRecognizer are the following:

UITapGestureRecognizer
UIPinchGestureRecognizer
UIRotationGestureRecognizer
UISwipeGestureRecognizer
UIPanGestureRecognizer
UILongPressGestureRecognizer

A gesture recognizer has one or more target-action pairs associated with it. If there are multiple target-action pairs, they are discrete, and not cumulative. Recognition of a gesture results in the dispatch of an action message to a target for each of those pairs. The action methods invoked must conform to one of the following signatures:

-(void)handleGesture;

-(void)handleGesture:(UIGestureRecognizer *)gestureRecognizer;

Solution:

Step 1: Create a new UIViewController subclass with a view to attach the gesture recognizer to

Step 2: In viewDidLoad, create an instance of one of the concrete UIGestureRecognizer subclasses and attach it to the view

e.g. UITapGestureRecognizer:

- (void)viewDidLoad {
    [super viewDidLoad];

    // Create a tap recognizer that sends handleTap: to this view controller
    UITapGestureRecognizer *recognizer = [[UITapGestureRecognizer alloc] initWithTarget:self action:@selector(handleTap:)];

    // Optional: requires the controller to conform to UIGestureRecognizerDelegate
    recognizer.delegate = self;

    [self.view addGestureRecognizer:recognizer];
}

Step 3: Implement the gesture handler

- (void)handleTap:(UITapGestureRecognizer *)recognizer {
    // Respond to the recognized tap here
    NSLog(@"your implementation here");
}

Conclusion:

The UIGestureRecognizer subclasses provide a default implementation for detecting common gestures such as taps, pinches, rotations, swipes, pans, and long presses. Using them not only reduces the amount of code you write, it is also the easier approach.