CloudWatch + Lambda Case 4: Control launch of Specific “C” type EC2 instances post office hours to save costs

We have a customer who has predictable load volatility between 9 am to 6 pm and uses specific large EC2 instances during office hours for analysis, they use “c4.8xlarge” for that purpose. Their IT wanted to control launch of such large instance class post office hours and during nights to control costs, currently there is no way to restrict or control this action using Amazon IAM. In short we cannot create complex IAM policy with conditions that user A belonging to group A cannot launch instance type C every day between X and Y.

Some stop gap followed is to have a job running which removes the policy from an IAM user when certain time conditions are met. So basically what we would do is, to have a job that calls an API that removes the policy which restricts an IAM user or group from launching instances. This will make the IAM policy management complex and tough to assess/govern drifts between versions.

After the introduction of the CloudWatch events our Cloud operations started controlling it with lambda functions. Whenever an Instance type is launched it will trigger a lambda function, the function will filter whether it is a specific “C” type and check for the current time, if the time falls after office hours, it will terminate the EC2 instance launched immediately.

As a first step, we will be creating a rule in Amazon CloudWatch Events dashboard. We have chosen AWS API Call as an Event to be processed by an AWSCloudTrail Lambda function as a target.

CloudWatch Events Lambda EC2

The next step would be configuring rule details with Rule definition

CloudWatch Events Lambda EC2

Finally, we will review the Rules Summary

CloudWatch Events Lambda EC2

Amazon Lambda Function Code Snippet (Python)
import boto3

def lambda_handler(event, context):
#print (“Received event: ” + json.dumps(event, indent=2))
#print (“************************************************”)

ec2_client = boto3.client(“ec2”)

print “Event Region :”, event[‘region’]

event_time = event[‘detail’][‘eventTime’]
print “Event Time :”, event_time

time = event_time.split(‘T’)
t = time[1]
t = t.split(‘:’)
hour = t[0]

instance_type = event[‘detail’][‘requestParameters’][‘instanceType’]
print “Instance Type:”, instance_type

instance_id = event[‘detail’][‘responseElements’][‘instancesSet’][‘items’][0][‘instanceId’]
print “Instance Id:”,instance_id

if( instance_type.startswith( ‘t’ ) and hour > 18 or hour < 8 ):
print ec2_client.terminate_instances( InstanceIds = [ instance_id ] )

GitHub Gist URL:  https://github.com/cloud-automaton/automaton/blob/master/aws/events/TerminateAWSEC2.py

This post was co authored with Priya and Ramprasad of 8KMiles.

This article was originally published in: http://harish11g.blogspot.in/

CloudWatch + Lambda Case 3 -Controlling cross region EBS/RDS Snapshot copies for regulated industries

If you are part of regulated industry like Pharmaceutical/ Life sciences/BFSI running mission critical applications on AWS, at times as part of the compliance requirements you will have to restrict/control data movement to a particular geographic region in the cloud. This becomes complex to restrict sometimes. Let us explore in detail:

We all know there are varieties of ways to move data from one AWS region to another, but one commonly used method is Snapshot copy across AWS regions. Usually you can restrict snapshot copy permission in IAM Policy, but what if you need the permission enabled for moving data between AWS accounts inside a region, but still want to control EBS/RDS snapshot copy action across regions. It can be only mitigated by automatically deleting the snapshot on destination AWS region in case snapshot copy activity is done.

Our Cloud operations team used to altogether remove this permission in IAM or monitor this activity using polling scripts for customers with multiple accounts who need this permission and still need control. Now after the introduction of CloudWatch Events we have configured a rule that points to an AWS Lambda which gets triggered in near real time when snapshot is copied to destination AWS region. The lambda function will initiate a deletion process immediately. Though it is reactive it is incomparably faster than manual intervention.

In this use case, Amazon CloudWatch Event will identify the EBS Snapshot copies across the regions and delete them.

As a first step, we will be creating a rule in Amazon CloudWatch Events dashboard. We have chosen AWS API Call as an Event to be processed by an AWSCloudTrail Lambda function as a target.

CloudWatch Events Lambda

The next step would be configuring rule details with Rule definition

CloudWatch Events Lambda

Finally, we will review the Rules Summary

CloudWatch Events Lambda

Amazon Lambda Function Code Snippet (Python)

CloudWatch Events Lambda

GitHub Gist URL: https://github.com/cloud-automaton/automaton/blob/master/aws/events/AWSSnapShotCopy.py

https://github.com/cloud-automaton/automaton/blob/master/aws/events/AWSSnapShotCopy.py

This post was co-authored with Muthukumar and Ramprasad of 8KMiles

This article was originally published in: http://harish11g.blogspot.in/

CloudWatch + Lambda Case 2- Keeping watch on AWS ROOT user activity is normal or anomaly ?

As a Best Practice you should never use your AWS root account credentials to access AWS. Instead, create individual (IAM) users for anyone who needs access to your AWS account. This allows you to give each IAM user a unique set of security credentials and grant different permissions to each user. Example: Create an IAM user for yourself as well, give that user administrative privilege, and use that IAM user for all your work and never share your credentials to anyone else.

Usually Root has full access and it is not ideal to restrict the same in AWS IAM. Imagine you suddenly doubt some anomaly/suspicious activities done as Root user (using EC2 API’s etc) in your logs other than normal IAM user provisioning; this could be because Root user is compromised or forced, but ultimately it is a deviation from the best practice.

In the past we used to poll the CloudTrail logs using programs and differentiate between “root” and “Root”, and our cloud operations used to react to these anomaly behaviors. Now we can inform the cloud operations and customer stake holders near real time using CloudWatch events.

In this use case, Amazon CloudWatch Event will identify activities if any performed by an AWS ROOT user and notifications will be sent to SNS thru AWS Lambda.

As a first step, we will be creating a rule in Amazon CloudWatch Events dashboard. We have chosen AWS API Call as an Event to be processed by an AWSCloudTrail Lambda function as a target. The lambda function will detect if the event is triggered by root user and notifies through SNS.

CloudWatch Events Lambda Root Activity Tracking

The next step would be configuring rule details with Rule definition

CloudWatch Events Lambda Root Activity Tracking

Finally, we will review the Rules Summary

CloudWatch Events Lambda Root Activity Tracking

Amazon Lambda Function Code Snippet (Python)

CloudWatch Events Lambda Root Activity Tracking

GitHub Gist URL:

https://github.com/cloud-automaton/automaton/blob/master/aws/events/TrackAWSRootActivity.py

This post was co-authored with Saravanan and Ramprasad of 8KMiles

This article was originally published in: http://harish11g.blogspot.in/

CloudWatch + Lambda Case 1- Avoid malicious CloudTrail action in your AWS Account

As many of you know AWS CloudTrail provides visibility into API activity in your AWS account, Cloud Trail Logging lets you see which actions users have taken and which resources have been used, along with details such as the time and date of actions and the actions that have failed because of inadequate permissions. It enables you to answer important questions such as which user made an API call or which resources were acted upon in an API call. If a user disables CloudTrail logs accidentally or with malicious intent then audit logging events will not captured and hence you fail to have proper governance in place. The situation will get complex, If the user disables- enables back CloudTrail for a brief period of time where some important activities can go unlogged and unaudited. In short once CloudTrail logging is enabled it should not be disabled and this action needs to be defended in depth.

Our Cloud operations team had earlier written a program that periodically scans the Cloud Trail logs entries, if any log activity was missing after an X period of time it alerts the operations.  Overall reaction time on our cloud operations was >15-20 mins to mitigate this CloudTrail disable action.

Now after the introduction of CloudWatch Events we have configured a rule that points to an AWS Lambda function as target. This function gets triggered in near real time when CloudWatch is disabled and automatically enables it back without any manual interaction from Cloud operations. The advanced version of the program triggers workflow which logs entries into ticket system as well. This event model has helped us reduce the mitigation to less than a minute.
We have illustrated below the detailed steps on how to configure this event. Also we given the link for GIT with basic AWS Lambda Python code that can be used by your cloud operations.

In this use case, Amazon CloudWatch Event will identify whether an AWS account has got CloudTrail enabled or not, if not enabled, Amazon CloudWatch Events will take corrective actions by enabling the same.

As a first step, we will be creating a rule in Amazon CloudWatch Events dashboard. We have chosen AWS API Call as an Event to be processed by an AWSCloudTrail Lambda function as a target.

CloudWatch Events Lambda CloudTrail

The next step would be configuring rule details with Rule definition

CloudWatch Events Lambda CloudTrail

Finally, we will review the Rules Summary

CloudWatch Events Lambda CloudTrail

Amazon Lambda Function Code Snippet (Python)
import json
import boto3
print(‘Loading function’)
“”” Function to define Lambda Handler “””
def lambda_handler(event, context):
    try:
        client = boto3.client(‘cloudtrail’)
        if event[‘detail’][‘eventName’] == ‘StopLogging’:
            response = client.start_logging(Name=event[‘detail’][‘requestParameters’][‘name’])
    except Exception, e:
        sys.exit();

 

GitHub Gist URL:

This post was co-authored with Mohan and Ramprasad of 8KMiles

This article was originally published in: http://harish11g.blogspot.in/

Load Testing tool comparison – JMeter on it’s own vs JMeter & BlazeMeter together

Load testing is an important aspect of web applications life cycle on Amazon Cloud. Some of our customers ask us to generate 50000+ RPS to load test the scalability of their application deployed on Amazon cloud. Whenever we used to help such customers and migrate their applications on Amazon cloud for achieving scalability, load testing phase itself becomes a pain. Setting up the Load Testing infrastructure, writing automation around it, Managing, Maintaining and monitoring the load test infrastructure is an headache. Our Load testers and Infrastructure teams were spending considerable time and efforts on the above , instead of focusing only on load testing. We usually work with variety of tools from Grinder, JMeter, HP Load Runner to custom engineered load testing tools during the load testing phase. Some time back , our team started playing around with a SAAS based load testing tool called BlazeMeter. In this article i am going to share our experience in the form comparison between BlazeMeter and JMeter and why BlazeMeter has a bright future.
Blazemeter is a Saas based high scalable load testing tool that handle up to 300,000+ concurrent users. Their load test infrastructure is spread across major AWS regions. Since most of us have been using JMeter for years , the 100 % compatibility it provides to existing JMeter scripts is a good feature. Blazemeter also provides a Chrome Extension which can record browser actions & convert it to .jmx file.

10 Things I like about BlazeMeter

Point 1) Load Test becomes effective only when the load comes from different IP Addresses similar to real world scenario and not from a single source IP. When multiple virtual user load is generated from the same IP, the router as well as the server tries to cache information and optimize the throughput many times. Hence by using multiple IP addresses for the host, the EC2 server will get an illusion of receiving requests from multiple source IP’s. Also it is better that load is generated from multiple IP’s for Amazon ELB to evenly distribute load. Refer URL. BlazeMeter has capability to generate load from IP’s which is very important on load testing the cloud applications.
Point 2) Customizing the Network Emulation: Usually online applications will be accessed from multiple devices like PC’s, Laptop and mobiles. These devices have multiple network types such as 3G, broad band etc.Also at times times our online application will be accessed from locations which has poor network bandwidth , Both these parameters play an important role in capacity planning and load testing. We can chose the Bandwidth and network type emulation while doing the load test using BlazeMeter. Example we can configure the network type such as Unlimited Internet, 3G, Cable, Wifi etc and. Bandwidth download limit per device can also be set.
Point 3) Controlling the Throughput: Target throughput is a parameter of Apache JMeter that can be used to achieve a required throughput value of the application. A server’s performance need not always satisfy the target throughput value mentioned in JMeter. It could provide more throughput or lesser.The target throughput parameter can be controlled in run time in Blazemeter. Live server monitoring can help us identify if our servers are performing well for say 5000 Hits/sec & change the throughput value in run-time to a higher or lower value based on the server’s performance.
Point 4) Controlling the Agents: Apache JMeter works based on Master-Agent based architecture where the Master controls multiple agents generating the load. The number of agents parameter has to be usually decided before the starting of the test while using JMeter based load testing on Amazon Cloud. Option to dynamically change the throughput value is a very good feature to have while load testing a cloud application requiring thousands of Requests per second. BlazeMeter enables us to add or remove agent instances when a test is running. Any instance can be marked as Master or Slave(Agent) while the test is running.
Point 5) Controlling No. of Simulated Users on Slaves (Agents) : A load test strategy is mainly determined by following parameters like number of concurrent users, ramp up time, no. of test engines and test iterations and the test duration. Apache JMeter allows us to manually configure these values before the test is started. New EC2 instances have to provisioned for the Agents, the IP addresses (Usually Elastic IP) of the slaves/agents has to be manually added to the master. The entire setup has to be maintained, managed and monitored during the test cycles. This is ok for an load testing environment with few load test agents and low RPS, imagine an environment where you have generate thousands of RPS and having 50+ agents running. This process of managing the EC2 load test infrastructure will become tedious process overall for the load testing teams. In BlazeMeter, once the number of concurrent users is given, the number of test engines, number of threads and engine capacity is chosen automatically. This can be made semi-automatic, where the number of engines & number of threads as well can be selected by user and only engine capacity is chosen by BlazeMeter. Since it is a managed Load Test infrastructure, the Load Testers can concentrate the testing and not managing 100’s of EC2 load agents.
Point 6) Integrated Monitoring:
BlazeMeter offers live monitoring of essential parameters of test servers when the test is running which enables us to decide on the number & instance type for the test. In the conventional Apache JMeter load test setup in Amazon EC2 we have to observe the Key parameters using AWS Cloudwatch.
Blaze Meter provides AWS Cloud watch integration.An account with IAM access has to be created and
AWS Access Key & Secret Key values have to be configured so that the metrics are available in the Blazemeter’s dashboard. This features helps us to understand how the assets in the cloud are reacting to our load tests and help us accordingly tune the infrastructure.
While performing load testing, it is important not only to monitor your Web Servers & Databases but also the agents from where the load is generated . The New Relic plugin gives us the front end KPIs and back end KPIs.
BlazeMeters’s frontend KPIs provide insight on how many users are actually trying to access your website, mobile site or mobile apps.
BlazeMeters’s backend KPIs show how many users are getting through to your applications.
Point 7) Blazemeter allows us to have a different csv file per load test engine. Though this possible in Apache JMeter, it had to be done manually by copying the files onto the JMeter Agent EC2 instances and have the same filename since the agents refer to the Master’s properties. Blazemeter allows us to parameterize the values of even filenames and have different csv files in each engine without giving us to the trouble of copying files into specific EC2 instances & holds the files in a common repository so that it can be referred from there to each agent.
Point 8) Run the load test using older version of JMeter scripts: Old scripts can be reusable with this feature of BlazeMeter which lets us run the test using any version of Apache JMeter right from version 2.3.2 to 2.10. Some complex scripts prepared some months/years ago can be still be made usable and need not be redone. Saves efforts and costs.
Point 9) Schedule the Test & Stay Relaxed: BlazeMeter as well as JMeter lets you schedule your test duration & test time so that we can run longevity test at any time of the day. Even weekly scheduling is possible in BlazeMeter it is an added advantage, though it is not widely used.
Point 10) Interesting Plug-ins provided by Blazemeter :
Integration with Google Analytics: At the time of scripting, it is enough if we select the Google Analytics Option & provide account details of Google Analytics. BlazeMeter obtains the last 12 months of data and creates a test with 5 most visited pages and sets up the number of concurrent users based on that record.
Integration with WordPress: BlazeMeter provides integration with WordPress where WordPress users can test their App by using the BlazeMeter plug-in without any scripting.
Integration with Drupal & Jenkins: Plugins are available to load test Drupal & Jenkins servers as well.

Post Co Authored with Harine 8KMiles.