CloudWatch + Lambda Case 4: Controlling launch of specific “C” type EC2 instances after office hours to save costs

We have a customer with predictable load volatility between 9 AM and 6 PM who uses specific large EC2 instances (“c4.8xlarge”) during office hours for analysis. Their IT team wanted to restrict the launch of this large instance class after office hours and at night to control costs, but currently there is no way to restrict or control this action using AWS IAM alone. In short, we cannot create a complex IAM policy with conditions such as “user A belonging to group A cannot launch instance type C every day between X and Y”.

One stop-gap approach is to run a scheduled job that removes the restrictive policy from an IAM user or group when certain time conditions are met. This makes IAM policy management complex and hard to govern, since drift between policy versions becomes difficult to assess.

After the introduction of CloudWatch Events, our cloud operations team started controlling this with Lambda functions. Whenever an instance is launched, it triggers a Lambda function; the function checks whether the instance is of the specific “C” type and inspects the current time. If the launch falls outside office hours, the function terminates the EC2 instance immediately.

As a first step, we create a rule in the Amazon CloudWatch Events dashboard. We have chosen AWS API Call (via CloudTrail) as the event source and a Lambda function as the target.

[Screenshot: creating the CloudWatch Events rule]

The next step is to configure the rule details with a rule definition.

[Screenshot: configuring the rule details]

Finally, we review the rule summary.

[Screenshot: rule summary]
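For teams that prefer to script this setup, an equivalent rule can be created with boto3. The following is a minimal sketch, assuming a hypothetical rule name and a placeholder ARN for the Lambda function shown below:

import json
import boto3

events = boto3.client('events')

# match EC2 RunInstances API calls recorded by CloudTrail
events.put_rule(
    Name='terminate-c-type-after-hours',
    EventPattern=json.dumps({
        'detail-type': ['AWS API Call via CloudTrail'],
        'detail': {'eventSource': ['ec2.amazonaws.com'],
                   'eventName': ['RunInstances']},
    }),
    State='ENABLED',
)

# point the rule at the Lambda function (placeholder ARN)
events.put_targets(
    Rule='terminate-c-type-after-hours',
    Targets=[{'Id': '1',
              'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:TerminateAWSEC2'}],
)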

Amazon Lambda Function Code Snippet (Python)
import boto3

def lambda_handler(event, context):
    # print("Received event: " + json.dumps(event, indent=2))

    ec2_client = boto3.client("ec2")

    print("Event Region:", event['region'])

    event_time = event['detail']['eventTime']
    print("Event Time:", event_time)

    # eventTime is ISO 8601 UTC, e.g. "2016-01-14T18:25:43Z";
    # extract the hour (adjust for your local timezone if needed)
    hour = int(event_time.split('T')[1].split(':')[0])

    instance_type = event['detail']['requestParameters']['instanceType']
    print("Instance Type:", instance_type)

    instance_id = event['detail']['responseElements']['instancesSet']['items'][0]['instanceId']
    print("Instance Id:", instance_id)

    # terminate "c" family instances launched outside office hours
    if instance_type.startswith('c') and (hour >= 18 or hour < 8):
        print(ec2_client.terminate_instances(InstanceIds=[instance_id]))

GitHub Gist URL:  https://github.com/cloud-automaton/automaton/blob/master/aws/events/TerminateAWSEC2.py

This post was co-authored with Priya and Ramprasad of 8KMiles.

This article was originally published in: http://harish11g.blogspot.in/

CloudWatch + Lambda Case 3: Controlling cross-region EBS/RDS snapshot copies for regulated industries

If you are part of a regulated industry like Pharmaceuticals/Life Sciences/BFSI running mission-critical applications on AWS, compliance requirements will at times force you to restrict or control data movement to a particular geographic region in the cloud. This can be complex to enforce. Let us explore in detail:

There are various ways to move data from one AWS region to another, but one commonly used method is snapshot copy across AWS regions. You can usually restrict the snapshot copy permission in an IAM policy, but what if you need the permission enabled for moving data between AWS accounts inside a region, yet still want to control EBS/RDS snapshot copy actions across regions? This can only be mitigated by automatically deleting the snapshot in the destination AWS region whenever a cross-region snapshot copy occurs.

Our cloud operations team used to either remove this permission in IAM altogether or monitor the activity with polling scripts for customers with multiple accounts who needed the permission but still required control. Now, after the introduction of CloudWatch Events, we have configured a rule that points to an AWS Lambda function, which is triggered in near real time when a snapshot is copied to a destination AWS region. The Lambda function initiates the deletion immediately. Though reactive, it is incomparably faster than manual intervention.

In this use case, Amazon CloudWatch Events will identify EBS snapshot copies across regions so that the Lambda function can delete them.

As a first step, we create a rule in the Amazon CloudWatch Events dashboard. We have chosen AWS API Call (via CloudTrail) as the event source and a Lambda function as the target.

[Screenshot: creating the CloudWatch Events rule]

The next step is to configure the rule details with a rule definition.

[Screenshot: configuring the rule details]

Finally, we review the rule summary.

[Screenshot: rule summary]

Amazon Lambda Function Code Snippet (Python)

[Screenshot of the Lambda function code; see the GitHub Gist below]
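Since the snippet above survives only as a screenshot, here is a minimal sketch of the deletion logic, assuming the rule matches the CopySnapshot API call and that the new snapshot ID appears in the event's responseElements (field names may vary across CloudTrail payload versions):

import boto3

def lambda_handler(event, context):
    detail = event['detail']

    # only act on CopySnapshot API calls
    if detail.get('eventName') != 'CopySnapshot':
        return

    # CopySnapshot is invoked in the destination region with the source region as a parameter
    source_region = detail['requestParameters']['sourceRegion']
    dest_region = detail['awsRegion']
    if source_region == dest_region:
        return  # same-region copy is allowed

    # assumption: the new snapshot id is returned in responseElements
    snapshot_id = detail['responseElements']['snapshotId']

    ec2 = boto3.client('ec2', region_name=dest_region)
    ec2.delete_snapshot(SnapshotId=snapshot_id)
    print("Deleted cross-region snapshot copy:", snapshot_id)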

GitHub Gist URL: https://github.com/cloud-automaton/automaton/blob/master/aws/events/AWSSnapShotCopy.py


This post was co-authored with Muthukumar and Ramprasad of 8KMiles

This article was originally published in: http://harish11g.blogspot.in/

CloudWatch + Lambda Case 2: Keeping watch on AWS root user activity – normal or anomaly?

As a best practice, you should never use your AWS root account credentials to access AWS. Instead, create individual IAM users for anyone who needs access to your AWS account. This allows you to give each IAM user a unique set of security credentials and grant different permissions to each user. For example, create an IAM user for yourself as well, give that user administrative privileges, use that IAM user for all your work, and never share its credentials with anyone else.

The root user has full access, and it is not possible to restrict it in AWS IAM. Imagine you suddenly suspect anomalous or suspicious activity performed as the root user (using EC2 APIs, etc.) in your logs, beyond normal IAM user provisioning; this could mean the root account is compromised or coerced, and in any case it is a deviation from best practice.

In the past we used to poll the CloudTrail logs with programs, differentiate between “root” and “Root”, and have our cloud operations team react to these anomalous behaviors. Now we can inform cloud operations and customer stakeholders in near real time using CloudWatch Events.

In this use case, Amazon CloudWatch Events will identify any activities performed by the AWS root user, and notifications will be sent to SNS through AWS Lambda.

As a first step, we create a rule in the Amazon CloudWatch Events dashboard. We have chosen AWS API Call (via CloudTrail) as the event source and a Lambda function as the target. The Lambda function detects whether the event was triggered by the root user and notifies through SNS.

[Screenshot: creating the CloudWatch Events rule for root activity tracking]

The next step is to configure the rule details with a rule definition.

[Screenshot: configuring the rule details]

Finally, we review the rule summary.

[Screenshot: rule summary]

Amazon Lambda Function Code Snippet (Python)

[Screenshot of the Lambda function code; see the GitHub Gist below]
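Since this snippet also survives only as a screenshot, here is a minimal sketch of the detection logic, assuming the rule forwards API call events and using a placeholder SNS topic ARN:

import json
import boto3

SNS_TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:root-activity-alerts'  # placeholder

def lambda_handler(event, context):
    user_identity = event['detail']['userIdentity']

    # CloudTrail marks root account activity with userIdentity.type == "Root"
    if user_identity.get('type') != 'Root':
        return

    message = {
        'eventName': event['detail']['eventName'],
        'eventTime': event['detail']['eventTime'],
        'sourceIPAddress': event['detail'].get('sourceIPAddress'),
    }
    boto3.client('sns').publish(
        TopicArn=SNS_TOPIC_ARN,
        Subject='Alert: AWS root user activity detected',
        Message=json.dumps(message, indent=2),
    )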

GitHub Gist URL: https://github.com/cloud-automaton/automaton/blob/master/aws/events/TrackAWSRootActivity.py

This post was co-authored with Saravanan and Ramprasad of 8KMiles

This article was originally published in: http://harish11g.blogspot.in/

CloudWatch + Lambda Case 1: Avoiding malicious CloudTrail actions in your AWS account

As many of you know, AWS CloudTrail provides visibility into API activity in your AWS account. CloudTrail logging lets you see which actions users have taken and which resources have been used, along with details such as the date and time of actions and the actions that failed because of inadequate permissions. It enables you to answer important questions such as which user made an API call or which resources were acted upon in an API call. If a user disables CloudTrail logging, accidentally or with malicious intent, audit events will not be captured and you lose proper governance. The situation gets more complex if the user disables and then re-enables CloudTrail, leaving a brief window during which important activities go unlogged and unaudited. In short, once CloudTrail logging is enabled it should not be disabled, and this action needs to be defended in depth.

Our cloud operations team had earlier written a program that periodically scans the CloudTrail log entries; if log activity was missing after a period of time, it alerted operations. Overall, our cloud operations took more than 15-20 minutes to mitigate a CloudTrail disable action.

Now, after the introduction of CloudWatch Events, we have configured a rule that points to an AWS Lambda function as the target. The function is triggered in near real time when CloudTrail logging is disabled and automatically re-enables it without any manual interaction from cloud operations. An advanced version of the program triggers a workflow which also logs entries into a ticketing system. This event model has helped us reduce the mitigation time to less than a minute.
We have illustrated below the detailed steps on how to configure this event, along with a link to a GitHub Gist containing basic AWS Lambda Python code that your cloud operations team can use.

In this use case, Amazon CloudWatch Events will identify whether the AWS account has CloudTrail logging enabled; if not, it will take corrective action by re-enabling it.

As a first step, we create a rule in the Amazon CloudWatch Events dashboard. We have chosen AWS API Call (via CloudTrail) as the event source and a Lambda function as the target.

[Screenshot: creating the CloudWatch Events rule]

The next step is to configure the rule details with a rule definition.

[Screenshot: configuring the rule details]

Finally, we review the rule summary.

[Screenshot: rule summary]

Amazon Lambda Function Code Snippet (Python)
import json
import boto3

print('Loading function')

def lambda_handler(event, context):
    """Re-enable CloudTrail logging if it was stopped."""
    try:
        client = boto3.client('cloudtrail')
        if event['detail']['eventName'] == 'StopLogging':
            trail_name = event['detail']['requestParameters']['name']
            client.start_logging(Name=trail_name)
            print("Re-enabled logging for trail:", trail_name)
    except Exception as e:
        print("Error re-enabling CloudTrail logging:", e)
        raise


GitHub Gist URL:

This post was co-authored with Mohan and Ramprasad of 8KMiles

This article was originally published in: http://harish11g.blogspot.in/

27 Best Practice Tips on Amazon Web Services Security Groups

AWS Security Groups are one of the most used and abused configurations inside an AWS environment, especially if you have been on the cloud for a long time. Since security groups are simple to configure, users often ignore their importance and do not follow best practices. In reality, operating security groups day to day is much more intensive and complex than configuring them once, and hardly anybody talks about it! So in this article I am going to share our experience dealing with AWS security groups since 2008, as a set of best-practice pointers covering both configuration and day-to-day operations.
In the world of security, proactive and reactive speed determines the winner, so many of these best practices should be automated. In case your organization's Dev/Ops/DevOps teams need help with security group best-practice automation, feel free to contact me.

AWS has released so many security-related features in the last few years that we should no longer look at security groups in isolation; it just does not make sense anymore. A security group should always be seen in the overall security context, and with that I start the pointers.

Practice 1: Enable AWS VPC Flow Logs at the VPC, subnet, or ENI level. VPC Flow Logs can be configured to capture both accept and reject entries flowing through the ENIs and security groups of EC2, ELB, and some other services. These flow log entries can be scanned to detect attack patterns, alert on abnormal activities and information flow inside the VPC, and provide valuable insights to SOC/MS team operations.
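As an illustration, flow logs can be enabled programmatically. A minimal boto3 sketch, with placeholder identifiers for the VPC, log group, and IAM role:

import boto3

ec2 = boto3.client('ec2')

# placeholder identifiers: replace with your VPC id, log group and IAM role
response = ec2.create_flow_logs(
    ResourceIds=['vpc-11112222'],
    ResourceType='VPC',
    TrafficType='ALL',  # capture both ACCEPT and REJECT entries
    LogGroupName='vpc-flow-logs',
    DeliverLogsPermissionArn='arn:aws:iam::123456789012:role/flow-logs-role',
)
print(response['FlowLogIds'])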

Practice 2: Use AWS Identity and Access Management (IAM) to control who in your organization has permission to create and manage security groups and network ACLs (NACL). Isolate the responsibilities and roles for better defense. For example, you can give only your network administrators or security admin the permission to manage the security groups and restrict other roles.

Practice 3: Enable AWS CloudTrail logs for your account. CloudTrail will log all security group events, which is needed for managing and operating security groups. Event streams can be created from CloudTrail logs and processed using AWS Lambda. For example, whenever a security group is deleted, the event is captured with details in the CloudTrail logs; a Lambda function can process this change and alert the MS/SOC on a dashboard or by email as per your workflow. This is a very powerful way of reacting to events within minutes. Alternatively, you can process the CloudTrail logs stored in S3 as a batch at some frequency X and achieve the same, but the operations team's reaction time will vary depending on the generation and polling frequency of the CloudTrail logs. This activity is a must for your operations team.
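A minimal sketch of such a Lambda function, assuming the rule forwards security group API events and using a placeholder SNS topic ARN:

import json
import boto3

SNS_TOPIC_ARN = 'arn:aws:sns:us-east-1:123456789012:sg-change-alerts'  # placeholder

def lambda_handler(event, context):
    detail = event['detail']

    # alert on security group deletions captured by CloudTrail
    if detail['eventName'] == 'DeleteSecurityGroup':
        boto3.client('sns').publish(
            TopicArn=SNS_TOPIC_ARN,
            Subject='Security group deleted',
            Message=json.dumps(detail, indent=2),
        )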
Practice 4: Enable AWS Config for your AWS account. AWS Config records all changes to your security groups and can even send notifications.

Practice 5: Have a proper naming convention for your Amazon Web Services security groups. The naming convention should follow an enterprise standard. For example, it can follow the notation: “AWS Region + Environment Code + OS Type + Tier + Application Code”
Security Group Name – EU-P-LWA001
AWS Region (2 chars) = EU, VA, CA, etc.
Environment Code (1 char) = P-Production, Q-QA, T-Testing, D-Development, etc.
OS Type (1 char) = L-Linux, W-Windows, etc.
Tier (1 char) = W-Web, A-App, C-Cache, D-DB, etc.
Application Code (4 chars) = A001
We have been using Amazon Web Services since 2008 and have found over the years that managing security groups across multiple environments is itself a huge task. A proper naming convention from the beginning is a simple practice, but it will make your AWS journey manageable.

Practice 6: For defense in depth, make sure your security group names are not self-explanatory, and make sure your naming standards stay internal. For example, a security group named UbuntuWebCRMProd tells hackers it is a production CRM web tier running on Ubuntu. Have an automated program periodically scan security group names with regex patterns to detect information-revealing names and alert the SOC/managed services team.

Practice 7: Periodically detect, alert on, or delete AWS security groups that do not strictly follow the organization's naming standards. Have an automated program doing this as part of your SOC/managed services operations; once this stricter control is implemented, things will fall in line automatically. A sketch of such a check follows.
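A minimal detection sketch, assuming names follow the convention from Practice 5 (the regex below is an illustrative guess, not an official standard):

import re
import boto3

# assumption: names look like EU-P-LWA001 (region, env, OS+tier+app code)
NAME_PATTERN = re.compile(r'^[A-Z]{2}-[PQTD]-[LW][WACD][A-Z]\d{3}$')

ec2 = boto3.client('ec2')
for sg in ec2.describe_security_groups()['SecurityGroups']:
    if not NAME_PATTERN.match(sg['GroupName']):
        # alert the SOC/MS team through your usual channel
        print("Non-conforming security group:", sg['GroupId'], sg['GroupName'])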

Practice 8: Have automation in place to detect all EC2, ELB, and other AWS assets associated with security groups. This automation helps you periodically detect security groups lying idle with no associations, alert the MS team, and clean them up, as sketched below. Unwanted security groups accumulated over time create unwanted confusion.
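A minimal sketch of such a check, using the fact that every in-use security group is attached to at least one ENI (pagination omitted for brevity):

import boto3

ec2 = boto3.client('ec2')

all_groups = {sg['GroupId'] for sg in ec2.describe_security_groups()['SecurityGroups']}

# every security group in use is attached to at least one network interface
used_groups = set()
for eni in ec2.describe_network_interfaces()['NetworkInterfaces']:
    for group in eni['Groups']:
        used_groups.add(group['GroupId'])

for group_id in sorted(all_groups - used_groups):
    print("Idle security group with no associations:", group_id)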

Practice 9: In your AWS account, when you create a VPC, AWS automatically creates a default security group for it. If you don't specify a different security group when you launch an instance, the instance is automatically associated with that default security group. The default security group allows inbound traffic only from other instances associated with it (it specifies itself as a source security group in its inbound rules) and allows all outbound traffic. This is what lets instances in the default security group communicate with each other, and it is not a good security practice. If you don't want all your instances to use the default security group, create your own security groups and specify them when you launch your instances. This applies to EC2, RDS, ElastiCache, and some other AWS services. So detect use of “default” security groups periodically and alert the SOC/MS.

Practice 10: Alerts by email and on the cloud management dashboard should be triggered whenever critical security groups or rules are added/modified/deleted in production. This is important for the reactive actions of your managed services/security operations team and for audit purposes.

Practice 11: When you associate multiple security groups with an Amazon EC2 instance, the rules from each security group are effectively aggregated into one set of rules, which AWS uses to determine whether to allow access. If there is more than one rule for a specific port, AWS applies the most permissive rule. For example, if you have a rule that allows access to TCP port 22 (SSH) from IP address 203.0.113.10 and another rule that allows access to TCP port 22 from everyone, then everyone will have access to TCP port 22, because the permissive rule takes precedence.
Practice 11.1: Have automated programs detect EC2 instances associated with multiple security groups/rules and alert the SOC/MS periodically. Manually condense them to at most 1-3 rules as part of your operations.

Practice 11.2: Have automated programs detect conflicting security group rules (restrictive and permissive rules together) and alert the SOC/MS periodically.

Practice 12: Do not create unrestricted security group rules like 0.0.0.0/0, which are open to everyone. Since web servers must receive HTTP and HTTPS traffic from the Internet, only their security group can be permissive, for example:
0.0.0.0/0, TCP, 80 – allow inbound HTTP access from anywhere
0.0.0.0/0, TCP, 443 – allow inbound HTTPS access from anywhere
Any other unrestricted rule created in your account should be alerted to the SOC/MS teams immediately.
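A minimal detection sketch, assuming only ports 80 and 443 may be world-open (pagination omitted for brevity):

import boto3

ALLOWED_OPEN_PORTS = {80, 443}  # web tiers may be open to the world

ec2 = boto3.client('ec2')
for sg in ec2.describe_security_groups()['SecurityGroups']:
    for rule in sg['IpPermissions']:
        for ip_range in rule.get('IpRanges', []):
            if ip_range.get('CidrIp') == '0.0.0.0/0':
                port = rule.get('FromPort')  # absent for all-protocol rules
                if port not in ALLOWED_OPEN_PORTS:
                    print("World-open rule:", sg['GroupId'], sg['GroupName'], "port", port)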

Practice 13: Have a security policy of not launching servers with default ports like 3306, 1630, 1433, 11211, 6379, etc. If the policy is adopted, security groups also have to be created for the new, non-default listening ports instead of the default ports. This provides a small extra layer of defense, since one cannot infer from the security group port which service the EC2 instance is running. Automated detection and alerts should be created for the SOC/MS if security groups are created with default ports.

Practice 14: Applications that must meet stricter compliance requirements like HIPAA or PCI need end-to-end transport encryption implemented on the server back end in AWS. The communication from the ELB to the Web -> App -> DB -> other tiers needs to be encrypted using SSL or HTTPS, meaning only secure ports like 443, 465, and 22 are permitted in the corresponding EC2 security groups. Automated detection and alerts should be created for the SOC/MS if security groups for regulated applications are created with ports other than these secure ports.

Practice 15: Detection, alerts, and actions can be driven by parsing the CloudTrail logs against the usual patterns observed in your production environment.
Example:
15.1: A port opened and closed within <30 (or X) minutes in production is a candidate for suspicious activity if that is not a normal pattern for your production environment.
15.2: A permissive security group created and removed within <30 (or X) minutes is a candidate for suspicious activity if that is not a normal pattern for your production environment.
Detect anomalies in how long a security group change stayed in effect before being reverted in production.

Practice 16: In case ports have to be opened in security groups, or a permissive security group needs to be applied, automate the entire process as part of your operations so that the security group is open for X agreed minutes and closed automatically, in line with your change management. Reducing manual intervention avoids operational errors and adds security.
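A minimal sketch of the idea, with placeholder values; a production version would use a scheduled job or workflow engine rather than sleeping inside a script:

import time
import boto3

ec2 = boto3.client('ec2')

# placeholder values: security group, port and source CIDR approved by change management
SG_ID, PORT, CIDR, OPEN_MINUTES = 'sg-11112222', 22, '203.0.113.10/32', 30

rule = {'IpProtocol': 'tcp', 'FromPort': PORT, 'ToPort': PORT,
        'IpRanges': [{'CidrIp': CIDR}]}

ec2.authorize_security_group_ingress(GroupId=SG_ID, IpPermissions=[rule])
print("Port opened; will auto-close in", OPEN_MINUTES, "minutes")

time.sleep(OPEN_MINUTES * 60)  # stand-in for a scheduled close aligned with change management

ec2.revoke_security_group_ingress(GroupId=SG_ID, IpPermissions=[rule])
print("Port closed")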

Practice 17: Make sure SSH/RDP access is open in security groups only for jump boxes/bastion hosts in your VPC/subnets. Have stricter controls/policies to avoid opening SSH/RDP to other instances in the production environment. Periodically check for, alert on, and close this loophole as part of your operations.

Practice 18: It is a bad practice to leave SSH open to the entire Internet for emergency or remote support. By allowing the entire Internet access to your SSH port, there is nothing stopping an attacker from exploiting your EC2 instance. The best practice is to allow very specific IP addresses in your security groups; this restriction improves protection. These could be the addresses of your office or on-premise DC through which you connect to your jump box.

Practice 19: Too many or too few: how many security groups are preferred for a typical multi-tiered web app is a frequently asked question.
Option 1: One security group cutting across multiple tiers is easy to configure, but it is not recommended for secure production applications.
Option 2: One security group for every instance is too much protection and tough to manage operationally in the longer term.
Option 3: An individual security group for each tier of the application; for example, separate security groups for the ELB, Web, App, DB, and Cache tiers of your application stack.
Periodically check whether Option 1-style groups are being created in your production environment and alert the SOC/MS.

Practice 20: Avoid allowing UDP or ICMP for private instances in security groups unless specifically needed.

Practice 21: Open only specific ports; opening a range of ports in a security group is not a good practice. You can add many inbound ingress rules to a security group, and it is always advisable to open specific ports like 80 or 443 rather than a range of ports like 200-300.


Practice 22: Private subnet instances can be accessed only from within the VPC CIDR IP range. Opening such instances to public IP ranges is possible, but it does not make any sense; e.g., opening HTTP to 0.0.0.0/0 in the security group of a private subnet instance achieves nothing. Detect and cleanse such rules.


Practice 23: AWS CloudTrail logs capture security-related events. AWS Lambda events or automated programs should trigger alerts to operations when abnormal activities are detected. For example:
23.1: Alert when X security groups are added/deleted at Y hours or days by an IAM user/account.
23.2: Alert when X security group rules are added/deleted at Y hours or days by an IAM user/account.

Practice 24: If you are an enterprise, make sure all security group activities in your production environment are part of your change management process; security group actions can be manual or automated within your change management.
If you are an agile startup or SMB without a complicated change management process, automate most of the security group tasks and events as illustrated in the best practices above. This will bring immense efficiency to your operations.

Practice 25: Use outbound/egress security group rules wherever applicable within your VPC. For example, restrict FTP connections from your VPC to any server on the Internet; this way you avoid data dumps and important files being transferred out of your VPC. Defend harder and make it tougher!

Practice 26: For some tiers of your application, use an ELB in front of your instances as a security proxy with restrictive security groups (restrictive ports and IP ranges). This doubles your defense, but increases latency.

Practice 27: Some of the tools we use to automate and meet the above best practices are ServiceNow, Amazon CFT, AWS APIs, Rundeck, Puppet, Chef, and automated programs written in Python, .NET, and Java.

Note: In case your organization's Dev/Ops/DevOps teams need help automating the security group best practices listed above, feel free to contact me at harish11g.aws@gmail.com.


About the Author

Harish Ganesan is the Chief Technology Officer (CTO) of 8K Miles and is responsible for the overall technology direction of the 8K Miles products and services. He has around two decades of experience in architecting and developing Cloud Computing, E-commerce and Mobile application systems. He has also built large internet banking solutions that catered to the needs of millions of users, where security and authentication were critical factors. He is also a prolific blogger and frequent speaker at popular cloud conferences.


EzIAM – IAM Made Easy on Cloud

8KMiles’ EzIAM combines the power of a reliable user provisioning and user management solution with the benefits of an AWS-hosted, cloud-based deployment model. Whether your organization is interested in provisioning to cloud-based applications and on-premise applications or providing user management for end users, 8KMiles EzIAM provides robust capabilities for user management, user provisioning, and access requests.