Azure Resource Lock: Safeguard Your Critical Resources

Prevention is better than Cure – There were quite a few instances when I thought I should have applied this logic and this has even more significance if you are playing around public cloud more so while dealing with mission critical resources there. There are numerous occasions when you want to protect your resources from some unwarranted human actions or to put it bluntly we are seeking a solution to prevent other users in organization from accidentally deleting or modifying critical resources.

Azure has given us couple of ways to apply that level of control, firstly with role-based access control (RBAC), With the Reader and various Contributor roles RBAC is a great way to help protect resources in Azure. You can effectively limit the actions that a user can take against a resource. However, even with one of the Contributor roles, it is still possible to delete specific resources. This makes it very easy to accidentally delete an item.

Azure Lock provides you the options using which you can effetely control any such adventure. Unlike RBACK, you use management locks to apply a restriction across all users and roles. To learn about setting permissions for users and roles, see Azure Role-based Access Control. Using Resource lock you can lock a particular subscription, a particular resource group or even a specific resource. With this in place authorize users can still be able to read or modify the resources but they CAN NOT breach that lock and delete the same.

To make this happen you have to apply the Resource Lock Level to aforementioned scopes. You can set the lock level toCanNotDelete or ReadOnly(As of now these two are the only options supported). CanNotDelete means authorized users can still read and modify a resource, but they can’t delete it. ReadOnly means authorized users can only read from a resource, but they can’t modify or delete it.

When you apply a lock at a parent scope, all child resources inherit the same lock.

One point worth mentioning here is that you will also need to be in either an Owner or User Access Administrator role for the desired scope, because to play with Resource Lock it’s prerequisite to have access to Microsoft.Authorization/* orMicrosoft.Authorization/locks/* actions (only these two have appropriate permissions).

Create Resource Lock Using ARM Template

With Azure Resource Manager template we can lock the resources at the time of its creation. An ARM template is a JSON-formatted template file which provide a declarative way to define the deployment of Azure resources. Here is the example of how to create a lock on particular Storage Account-

linkedin_sponsor_sentiment_v1

linkedin_sponsor_sentiment_v1

If you see the example clearly the name of storage account coming via parameter while the most important section to be noticed is how the lock (utLock) has been created by concatenating the resource name with /Microsoft.Authorization/ and the name of the lock.

Create Resource Lock using PowerShell

Placing a resource lock on an entire group could be helpful in situations where you want to ensure no resources in that group are deleted. With below example I have tried to create a resource lock on a particular resource Group” UT-RG”

linkedin_sponsor_sentiment_v1

To remove the resource Lock make use of Remove-AzureResourceLock cmdlet, make sure you are providing proper ResourceId.

linkedin_sponsor_sentiment_v1

Off late Azure has brought this support to ARM Portal as well, to achieve the similar things via portal click the Settings blade for the resource, resource group, or subscription that you wish to lock, select Locks. Once prompted Give the lock a name and lock level and you are immune to those talked about unwanted situations. It gives you options to lock an entire subscription to ReadOnly if malicious activity was detected.

 

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here.

 

Azure Virtual Machine – Architecture

Microsoft Azure is built on Microsoft’s definition of commodity infrastructure. The most intriguing part of Azure is its cloud operating system that is at its heart. During the initial days of azure when it started it stated using fork of windows as its underlying platform Back then they named it as red dog operating system & red dog hypervisor. If you go into the history of Azure the project which became azure was originally named as project red dog. David Cutler was the brain behind designing and developing the various Red Dog core components and it was he who gave this name.in his own words- the premises of Red Dog (RD) is being able to share a single compute node across several properties. This enables better utilization of compute resources and the flexibility to move capacity as properties are added, deleted, and need more or less compute power. This is turn drives down capital and operational expenses.

It was actually a custom version of windows and the driving reason for this customization was because hyper v during those didn’t had the features which was needed for Azure (particularly support for booting from VHD). if you try to understand the main components of its architecture we can count four pillars-

  • Fabric Controller
  • Storage
  • Integrated Development Tools and Emulated Execution Environment
  • OS and Hypervisor

Those were initial (early 2006) days of azure as it matured running a fork of an OS is not ideal (in terms of cost and complexity), so Azure team talked to the Windows team, and efforts were made to use Windows itself. As time passed windows eventually caught up and now Azure runs on Windows.

Azure Fabric Controller
Among there one component which contributed immensely in its success is fabric controller. The fabric controller owns all the resources in the entire cloud and runs on a subset of nodes in a durable cluster. It manages the placement, provisioning, updating, patching, capacity, load balancing, and scale out of nodes in the cloud all without any operational intervention.

Fabric Controller which still is backbone of azure compute is the kernel of the Microsoft Azure cloud operating system. Azure Fabric Controller regulates the creation, provisioning, de-provisioning and supervising of all the virtual machines and their back-end physical server. In other words It provisions, stores, delivers, monitors and commands the virtual machines (VMs) and physical servers that make up Azure. One added benefit is that It also detects and responds to both software and hardware failure automatically.

Patch Management
When we try to understand the underlying mechanism/workflow which Microsoft follows for patch management the common misconception is that it keeps updating all the nodes just like we do in our environment. But things in cloud is little different, AS Azure hosts are image-based (hosts boot from VHD) and it follows the image based deployment. So instead of just having patches delivered, azure roll out new VHD of the host operating system. Means they are not actually going and patching everyone but instead azure update at one place and because its orchestrated update it can use this image to update the whole environment.

This offers a major advantage in host maintenance as the volume itself can be replaced, enabling quick rollback. Host updates role out every few weeks (4-6 weeks), with an approach where updates are well-tested before they are rolled out broadly to the data centers. It’s the responsibility of Microsoft to ensure that each roll out is tested before updating the data center servers. To do so they start this implementation with few fabric controller stamps which could be called as pilot cluster and then once through they will gradually push the updated to production (Data Center) hosts. The underlying technology behind this is called Update Domain (UDs). When you create VM’s and put them in an availability set they get bucketed into update domain (by default you get 5 but there are provisions to increase them to 20). So, all the VMs part of availability set will get distributed equally among these UDs. With this the patching will take place in batches and Microsoft will ensure that at a time only single update domain should go for patching. You can call this as staged rollout. To understand this in more detail let’s see how Fabric controller manages the partitioning-

Partitioning
Under Azure’s Fabric Controller it has two types of partitions: Update Domains(UDs) and Fault Domains(FDs). These two are responsible for not only high availability for also for resiliency of infrastructure with this in place in empowers the Azure with ability to recover from failures and continue to function. It’s not about avoiding failures, but responding to failures in a way that avoids downtime or data loss.

Update Domain: An Update Domain is used to upgrade a service’s role instances in groups. Azure deploys service instances into multiple update domains. For an in-place update, the FC brings down all the instances in one update domain, updates them, and then restarts them before moving to the next update domain. This approach prevents the entire service from being unavailable during the update process.

Fault Domain: Fault Domain defines potential points of hardware or network failure. For any role with more than one instance, the FC ensures that the instances are distributed across multiple fault domains, in order to prevent isolated hardware failures from disrupting service. All exposure to server and cluster failure in Azure is governed by fault domains.

Azure Compute Stamp
As in Azure, things gets divided into stamps where each stamp will have one fabric controller and this fabric controller is the one responsible for managing the VMs inside that stamp. In Azure, there are only two type of stamps, it could either be compute stamp or storage stamp. This Fabric controller is also not single; it has its distributed branches. Based on the available information, azure will have 5 replicas of the fabric controller where it uses synchronous mechanism to replicate the state. In this setup, there will be one primary and to which control pane will talk to. Now it’s the responsibility of this primary to act on the instruction (example- provision a VM) and also let other replicas know about it. And when at least 3 of them acknowledge the fact that this operation is going to happen then the operation take place (this is called quorum based approach).

VM Availability
Talking about Azure Virtual Machines there are three major components (Compute, Storage, Networking) which constitute Azure VM.While discussing Azure Virtual Machine (VM) resiliency with customers, they typically assume it is comparable to their on-prem VM architecture and as such, features from on-prem is expected in Azure. Well it is not the case, thus I wanted to put this together to provide more clarity on the VM construct in Azure to better understand how VM availability in Azure is typically more resilient then most on-prem configuration.
“Talking about Azure Virtual Machines there are three major components (Compute, Storage, Networking) which constitute Azure VM. So, when we talk about Virtual machine in Azure we must take two dependencies into consideration. Windows Azure Compute (to run the VM’s), and Windows Azure Storage (to persist the state of those VM’s). What this means is that you don’t have a single SLA, instead you actually have two SLA’s. And as such, they need to be aggregated since a failure in either, could render your service temporarily unavailable.”
Under this article lets have our discussions on Compute(VM) and Storage components.

Azure Storage:You can check my other article where I have talked about this in great details, on how an Azure Storage Stamp is a cluster of servers hosted in Azure Datacenter. These Stamps follows layer architecture with built-in redundancy to provide High Availability. Under this multiple (most of the times 3) replicas of each file, referred as Extent, are maintained on multiple different servers partitioned between Update Domains and Fault Domains. Each write operations are performed Synchronously (till we are talking about intra Stamp replication) and control is returned only after the 3 copies completed the write, thus making the write operation strongly consistent.

Virtual Machine:

azure-blog2

Microsoft Azure has provided a means to detect health of virtual machines running on the platform and to perform auto-recovery of those virtual machines should they ever fail. This process of auto-recovery is referred to as “Service Healing”, as it is a means of “healing” your service instances. In this case, Virtual Machines and the Hypervisor physical hosts are monitored and managed by the Fabric Controller. The Fabric Controller has the ability to detect failures.

It can perform the detection in two mode-Reactive and Proactive. If the FC detects failures in reactive mode (Heartbeats missing) or proactive mode (known situations leading to a failure) from a VM or a hypervisor host, it will initiate a recovery by either redeploying the VM on a healthy host (same host or another host) and mark the failed resource as unhealthy and remove it from the rotation for further diagnosis. This process is also known as Self-Healing or Auto Recovery.
With Above diagram we can see different layers of the system where faults can occur and the health checks that Azure performs to detect them

*auto-recovery mechanism is enabled and available on virtual machines across all the different VM sizes and offerings, across all Azure regions and datacenters.

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here

For more interesting information follow us in LinkedIn by clicking here