Solutions in Azure: Azure CDN for Maximum Bandwidth & Reduced Latency – Part I

Within the Microsoft cloud ecosystem, Azure CDN is widely recognized as Content Delivery as a Service (CDaaS). With its growing network of point-of-presence (POP) locations, it can be used to offload content to a globally distributed network of servers. Its core function is to cache static content at strategically placed locations, so that content is delivered with lower latency and higher data-transfer rates, ensuring faster throughput for your end users.
Azure CDN gives developers a global solution for delivering high-bandwidth content by caching it at physical nodes across the world. Requests for this content then travel a shorter distance, reducing the number of network hops in between. With a CDN in place, static files (images, JavaScript, CSS, videos, etc.) and other website assets are served from the servers closest to your visitors. For content-heavy websites such as e-commerce sites, this latency saving can be a significant performance factor.

In essence, Azure CDN puts your content in many places at once, providing superior coverage to your users. For example, when someone in London accesses your US-hosted website, the request is served through an Azure UK POP. This is much quicker than having the visitor’s requests, and your responses, travel the full width of the Atlantic and back.

Two providers, Verizon and Akamai, supply the edge locations for Azure CDN, and each builds its CDN infrastructure in its own way. Verizon has been quite open about disclosing its locations; by contrast, the POP locations for Azure CDN from Akamai are not individually disclosed. For the latest list of locations, keep checking the Azure CDN POP Locations page.

How Azure CDN Works

Today, over half of all internet traffic is already served by CDNs. Those numbers are trending upward rapidly with every passing year, and Azure has been a significant contributor to that growth.

[Figure: Azure CDN request flow]
As with most Azure services, Azure CDN is not magic; it works in a simple and straightforward manner. Let’s walk through a typical request:
1) A user (XYZ) requests a file (also called an asset) using a URL with a special domain name, such as <endpoint name>.azureedge.net. DNS routes the request to the best-performing point-of-presence (POP) location; usually this is the POP geographically closest to the user.
2) If the edge servers in the POP do not have the file in their cache, the edge server requests the file from the origin. The origin can be an Azure Web App, Azure Cloud Service, Azure Storage account, or any publicly accessible web server.
3) The origin returns the file to the edge server, including optional HTTP headers describing the file’s Time-to-Live (TTL).
4) The edge server caches the file and returns it to the original requestor (XYZ). The file remains cached on the edge server until the TTL expires. If the origin didn’t specify a TTL, the default TTL is 7 days.
5) Additional users (e.g. ABC) may then request the same file using that same URL and may also be directed to the same POP.
6) If the TTL for the file hasn’t expired, the edge server returns the file from the cache. This results in a faster, more responsive user experience.
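A quick way to observe this flow from the client side is to request an asset and inspect the caching headers in the response. The sketch below is only illustrative: the endpoint URL is hypothetical, and the exact cache-status header (X-Cache, Age, etc.) depends on the CDN provider and profile.

```python
# Fetch an asset from a CDN endpoint and print caching-related headers.
# The URL is a placeholder; header support varies by provider.
import requests

ASSET_URL = "https://myendpoint.azureedge.net/assets/site.css"

resp = requests.get(ASSET_URL, timeout=10)
print("Status       :", resp.status_code)
print("Cache-Control:", resp.headers.get("Cache-Control"))  # TTL requested by the origin
print("Age          :", resp.headers.get("Age"))            # seconds the object has sat in a cache
print("X-Cache      :", resp.headers.get("X-Cache"))        # HIT/MISS where the provider exposes it
```

Running this twice in quick succession typically shows the second response being served from the edge cache (for example, a non-zero Age), which is exactly the behaviour described in steps 4–6 above.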

Reasons for using a CDN

1) To understand why Azure CDN is so widely used, we first have to recognize the issue it is designed to solve: latency. That is the annoying delay between the moment you request a web page and the moment its content actually appears on screen, and it is especially noticeable in applications where many round trips are required to load content. Quite a few factors contribute to this delay, many of them specific to a given web page, but in all cases the duration is affected by the physical distance between you and the website’s hosting server. Azure CDN’s mission is to virtually shorten that physical distance, with the goal of improving site rendering speed and performance.
2) Another obvious reason for using Azure CDN is throughput. If you look at a typical web page, about 20% of it is HTML that is dynamically rendered for the user’s request. The other 80% consists of static files such as images, CSS, and JavaScript. Your server has to read those static files from disk and write them to the response stream, and both actions take away resources available on your virtual machine. By moving static content to Azure CDN, your virtual machine has more capacity available for generating dynamic content.

When a request for an object is first made to the CDN, the object is retrieved directly from the Blob service or from the cloud service. When a request is made using the CDN syntax, the request is redirected to the CDN endpoint closest to the location from which the request was made to provide access to the object. If the object is not found at that endpoint, then it is retrieved from the service and cached at the endpoint, where a time-to-live (TTL) setting is maintained for the cached object.

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here

Cortana Intelligence for Patient Length of Stay Prediction

Predictive Length of Stay
Length of Stay (LOS) is defined as the total number of days a patient stays in hospital, from the initial admission date to the discharge date. LOS varies from patient to patient, depending on the patient’s condition and the facilities provided by the hospital.

Importance of Predictive Length of Stay
Predictive Length of Stay (PLOS) is a model that can significantly improve the quality of treatment while reducing the workload pressure on doctors. It supports accurate planning with existing facilities, helps clinicians understand patient conditions, and focuses attention on discharging patients promptly while avoiding re-admissions.

Machine Learning Techniques for Predictive Length of Stay
Here we talk about two popular machine learning techniques that can be used for LOS prediction.

Random Forest
Random Forest is a tree-based machine learning algorithm that builds several decision trees and combines their outputs to improve model accuracy. Combining the outputs of many decision trees is known as ensembling, and it helps a set of weak learners become a strong learner.

For example, when we are unsure about a decision, we ask a few people for suggestions and combine their input to reach a final decision. Similarly, Random Forest builds a strong learner out of weak learners (the individual decision trees).

Random Forest can be used for both regression and classification problems. In regression problems the dependent variable is continuous, whereas in classification problems it is categorical.

An advantage of this model is that it runs efficiently on large datasets, including datasets with thousands of features.

Gradient Boosting
Gradient Boosting is another machine learning algorithm for building prediction models for regression and classification problems. Like other boosting methods, it builds the model iteratively, and its main objective is to minimize the model’s loss by adding weak learners using a gradient descent procedure.

Gradient descent is used to find the weights that minimize the model’s error, or loss. In gradient boosting, the weak learners are typically decision trees.

An advantage of Gradient Boosted Trees (GBT) is that the trees are built one at a time, with each new tree helping to correct the errors made by the previously trained trees, so the model becomes more effective with every tree added.

Microsoft Cortana Intelligence Solution for Predictive Length of Stay
As part of its Cortana Intelligence Solutions, Microsoft offers a built-in, end-to-end LOS solution comprising data storage, data pipeline/processing, machine learning algorithms, and visualization.

Microsoft’s support for R integrated with SQL Server is a big advantage for data science problems like this one.

Hospital patient data is stored in SQL Server, and the PLOS machine learning models are executed through an R IDE. The models take their input from SQL Server, and the predicted results can be stored back in the SQL Server database. The statistics and the predicted LOS for each patient can then be visualized with Power BI.

[Figure: PLOS solution architecture with SQL Server, R, and Power BI]
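Purely as an illustration of that hand-off, the same data flow can be sketched in Python (the published solution runs R in-database; the connection string, table and column names below are placeholders, not part of the Microsoft template):

```python
# Illustrative only: pull patient records from SQL Server into pandas for scoring.
# Connection details and object names are placeholders.
import pandas as pd
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=hospital-sql;DATABASE=Hospital;UID=mluser;PWD=<password>"
)

patients = pd.read_sql("SELECT * FROM LengthOfStay", conn)
print(patients.shape)
# ...train/score here, then write a predicted_los column back to SQL Server,
# e.g. with cursor.executemany() INSERT statements.
```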

PLOS Model Working Procedure

To predict the length of stay of newly admitted patients, we use two machine learning algorithms: regression Random Forest and Gradient Boosted Trees. Both models follow the procedure below.
1. Data Pre-processing and cleaning
2. Feature Engineering
3. Data Set Splitting, Training, Testing, Evaluation
4. Deploy and Visualize results

Data Pre-processing
Hospital patient data is loaded into SQL Server tables. Any missing values in the tables are then imputed: depending on the column, they are replaced with -1, the column mean, or the column mode.
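A typical imputation step looks like the sketch below (file and column handling are illustrative; the actual solution performs this inside SQL Server/R):

```python
# Fill numeric gaps with the column mean (or a -1 sentinel) and
# categorical gaps with the column mode, as described above.
import pandas as pd

df = pd.read_csv("patients.csv")  # illustrative input extract

for col in df.columns:
    if df[col].isnull().any():
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].mean())      # or .fillna(-1)
        else:
            df[col] = df[col].fillna(df[col].mode()[0])   # most frequent value
```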

Feature Engineering
In feature engineering, the feature values of the dataset are standardized before they are used to train the predictive models.

Splitting, Training, Testing and Evaluating
The dataset is split into a training set and a test set with a specified ratio (e.g. training set: 60%, test set: 40%). The two sets are stored in separate SQL Server tables, and the two models, regression Random Forest and Gradient Boosted Trees, are trained on the training set.

Finally, we predict length of stay on the test set and evaluate the performance metrics of the regression Random Forest and Gradient Boosting models.
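The following sketch outlines the same split/train/evaluate step with scikit-learn (the Microsoft template does this in R inside SQL Server; file, feature, and target names here are placeholders):

```python
# Split 60/40, train Random Forest and Gradient Boosted Trees, and compare MAE.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("patients_clean.csv")        # pre-processed, engineered data
X = df.drop(columns=["lengthofstay"])
y = df["lengthofstay"]                        # LOS in days

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.6, random_state=42)

models = {
    "Regression Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
    "Gradient Boosted Trees": GradientBoostingRegressor(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae:.2f} days")
```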

Deploy and Visualize Results
Deploy Power BI on the client machine and load the predicted results into a Power BI dashboard. The predicted length of stay for each patient can then be visualized there.

Advantage of Predictive Length of Stay
This solution enables predictive length of stay for hospitals, and the predictions are especially useful to two roles.

For hospitals that need a length-of-stay prediction solution, this is a good choice because of its robust integration of SQL Server and R.

Chief Medical Information Officer (CMIO)
This solution helps the CMIO determine whether resources are being allocated appropriately across a hospital network, and see which disease conditions are most prevalent among patients who will stay in care facilities long term.

[Figure: CMIO weekly report dashboard]

Care Line Manager
A Care Line Manager is directly responsible for all the patients in the hospital; their main job is to monitor each patient’s health status and required resources, and to plan patient discharges and resource allocation. Length-of-stay prediction helps the Care Line Manager manage patient care better.

[Figure: Care Line Manager dashboard]

Conclusion
The Microsoft Cortana Intelligence solution is impressive in providing the components needed to predict patients’ duration of stay in hospital. It offers flexible integration with hospital healthcare applications and data. The pre-processing and modeling framework can be modified to suit specific needs, and the R programming capability will appeal to data scientists. The basic Power BI dashboards are user friendly and can be customized for the specific needs of hospitals. The solution as a whole helps to plan resources effectively, such as the allocation of doctors, beds, and medicines, and to avoid unnecessarily extended hospital stays.

Author Credits: Kattula T, Senior Associate, Data Science, Analytics SBU at 8K Miles Software Services Chennai.

Image Source: Microsoft PLOS

Azure Resource Lock: Safeguard Your Critical Resources

Prevention is better than cure. There have been quite a few occasions when I wished I had applied this logic, and it matters even more when you are working in a public cloud, especially with mission-critical resources. There are numerous situations where you want to protect resources from unwarranted human actions; put bluntly, you want a way to prevent other users in your organization from accidentally deleting or modifying critical resources.

Azure gives us a couple of ways to apply that level of control. The first is role-based access control (RBAC): with the Reader and various Contributor roles, RBAC is a great way to help protect resources in Azure by limiting the actions a user can take against a resource. However, even with one of the Contributor roles it is still possible to delete specific resources, which makes it easy to delete an item accidentally.

Azure resource locks give you options to effectively prevent any such mishap. Unlike RBAC, management locks apply a restriction across all users and roles. (To learn about setting permissions for users and roles, see Azure Role-Based Access Control.) Using a resource lock you can lock a particular subscription, a particular resource group, or even a specific resource. With a lock in place, authorized users can still read or modify the resource, but they cannot bypass the lock and delete it.

To make this happen, you apply a lock level to one of the aforementioned scopes. The lock level can be set to CanNotDelete or ReadOnly (currently the only two options supported). CanNotDelete means authorized users can still read and modify a resource, but they can’t delete it. ReadOnly means authorized users can only read a resource; they can’t modify or delete it.

When you apply a lock at a parent scope, all child resources inherit the same lock.

One point worth mentioning: to work with resource locks you must have access to the Microsoft.Authorization/* or Microsoft.Authorization/locks/* actions, so in practice you need to be in either the Owner or the User Access Administrator role for the desired scope (only those two built-in roles carry the appropriate permissions).

Create Resource Lock Using ARM Template

With an Azure Resource Manager (ARM) template we can lock resources at the time of their creation. An ARM template is a JSON-formatted file that provides a declarative way to define the deployment of Azure resources. Here is an example of how to create a lock on a particular storage account:

[Screenshot: ARM template that deploys the storage account and defines the utLock resource lock]

Looking closely at the example, the storage account name comes in via a parameter, and the most important part to notice is how the lock (utLock) is created: by concatenating the resource name with /Microsoft.Authorization/ and the name of the lock.

Create Resource Lock using PowerShell

Placing a resource lock on an entire group is helpful when you want to ensure that no resources in that group are deleted. In the example below, I create a resource lock on a particular resource group, "UT-RG":

[Screenshot: PowerShell command creating a resource lock on resource group UT-RG]

To remove the resource lock, use the Remove-AzureResourceLock cmdlet, making sure you provide the proper ResourceId:

[Screenshot: Remove-AzureResourceLock command removing the lock]

Of late, Azure has brought this support to the portal as well. To achieve the same thing there, open the Settings blade for the resource, resource group, or subscription that you wish to lock and select Locks. When prompted, give the lock a name and a lock level, and you are protected from the unwanted situations discussed above. You can even lock an entire subscription to ReadOnly if malicious activity is detected.
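The cmdlets shown above come from an older PowerShell module. As a rough scripted alternative, here is a sketch using the Python management SDK; it assumes the azure-identity and azure-mgmt-resource packages, and the class and method names should be verified against the SDK version you have installed.

```python
# Sketch: create and later remove a CanNotDelete lock on resource group "UT-RG".
# Subscription ID, group name, and lock name are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource.locks import ManagementLockClient

client = ManagementLockClient(DefaultAzureCredential(), "<subscription-id>")

# Authorized users can still read and modify resources in UT-RG, but not delete them.
client.management_locks.create_or_update_at_resource_group_level(
    "UT-RG", "utLock", {"level": "CanNotDelete", "notes": "Protect production resources"}
)

# Remove the lock once it is no longer needed (requires Owner or User Access Administrator).
client.management_locks.delete_at_resource_group_level("UT-RG", "utLock")
```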

 

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here.

 

Securing Cassandra

Data security is a major concern and is given top priority in every organization. Securing sensitive data and keeping it out of the hands of those who should not have access is challenging even in traditional database environments, let alone in a cloud-hosted database. Data should be secured both in flight and at rest. In this blog we will talk about securing a Cassandra database in a cloud environment, specifically on AWS. The blog is divided into two parts:

  1. Secure Cassandra on AWS
  2. Cassandra data access security

Secure Cassandra on AWS

Cassandra is at its best when hosted across multiple datacenters. Hosting it in the cloud across multiple datacenters reduces cost considerably and gives you the peace of mind of knowing that you can survive regional outages. However, securing the cloud infrastructure is the most fundamental activity that needs to be carried out when hosting in the cloud.

Securing Ports

Securing ports and blocking access from unknown hosts is the first task when hosting in the cloud. Cassandra needs the following ports to be opened on your firewall for a multi-node cluster; otherwise each node will act as a standalone cluster.

Public ports

Port Number – Description
22 – SSH port

 

Create a Security Group with default rule as SSH traffic allowed on port 22 (both inbound and outbound).

  1. Click ‘ADD RULE’ (both inbound and outbound)
  2. Choose ‘SSH’ from the ‘Type’ dropdown
  3. Enter only allowed IPs from the ‘Source’ (inbound) / ‘Destination’ (outbound).

Private – Cassandra inter node ports

Ports used by the Cassandra cluster for inter-node communication must be restricted so that only nodes of the cluster can talk to each other, blocking traffic to and from external resources.

Port Number – Description
7000 – Inter-node communication without SSL encryption
7001 – Inter-node communication with SSL encryption
7199 – Cassandra JMX monitoring port
5599 – Private port for DSEFS inter-node communication

 

To configure inter-node communication ports in a Security Group:

  1. Click ‘ADD RULE’.
  2. Choose ‘Custom TCP Rule’ from the ‘Type’ dropdown.
  3. Enter the port number in the ‘Port Range’ column.
  4. Choose ‘Custom’ from the ‘Source’ (inbound) / ‘Destination’ (outbound) dropdown and enter the same Security Group ID as the value. This allows communication only within the cluster over the configured port, when this Security Group would be attached to all the nodes in the Cassandra cluster.
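The console steps above can also be scripted. Below is a minimal sketch with boto3 that adds the inter-node rules, assuming the security group already exists (the group ID and region are placeholders):

```python
# Allow Cassandra inter-node ports only from members of the same security group.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
SG_ID = "sg-0123456789abcdef0"   # placeholder: the cluster's security group

for port in (7000, 7001, 7199, 5599):
    ec2.authorize_security_group_ingress(
        GroupId=SG_ID,
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": SG_ID}],   # source = the same security group
        }],
    )
```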

Public – Cassandra client ports

The following ports need to be secured and opened only for the clients that will connect to our cluster.

Port Number – Description
9042 – CQL client port without SSL encryption
9142 – CQL client port to keep open when both encrypted and unencrypted connections are required
9160 – DSE client port (Thrift)

 

To configure public ports in a Security Group:

  1. Click ‘ADD RULE’.
  2. Choose ‘Custom TCP Rule’ from the ‘Type’ dropdown.
  3. Enter the port number in the ‘Port Range’ column.
  4. Choose ‘Anywhere’ from the ‘Source’ (inbound) / ‘Destination’ (outbound).

To restrict the public ports to a known IP or IP range, in step 4 choose ‘Custom’ from the ‘Source’ (inbound) / ‘Destination’ (outbound) dropdown instead, and provide the IP value or the CIDR block corresponding to the IP range.

Now that we have configured the firewall, our VMs are protected from unknown access. It is recommended to create Cassandra clusters in a private subnet within your VPC that does not have Internet access.

Create a NAT instance in a public subnet, or configure a NAT gateway, to route traffic from the Cassandra cluster in the private subnet for software updates.

Cassandra Data Access Security

Securing data access involves the following areas:

  1. Node to node communication
  2. Client to node communication
  3. Encryption at rest
  4. Authentication and authorization

Node to Node and Client to Node Communication Encryption

Cassandra is a masterless database: the masterless design has no single point of failure for any database process or function, and every node is equal. Reads and writes can be served by any node for any query, so there is a lot of data transfer between the nodes in the cluster. When the database is hosted on a public cloud network, this communication needs to be secured. Likewise, data transferred between the database and clients over the public network is always at risk. To secure data in flight in these scenarios, encrypting the traffic over SSL/TLS is the widely preferred approach.

Most developers are not exposed to encryption in their day-to-day work, and setting up an encryption layer is always a tedious process. Cassandra helps by providing this as a built-in feature: all we need to do is enable the server_encryption_options and client_encryption_options sections in the cassandra.yaml file and provide the required certificates and keys. Cassandra then takes care of encrypting data for node-to-node and client-to-server communication.

Additionally, Cassandra supports client certificate authentication. Imagine connecting without it: if the cluster only expects an SSL key, anyone could write a program that attaches to the cluster and executes arbitrary commands, listens to writes on arbitrary token ranges, or even creates an admin account in the system_auth table.

To avoid this, Cassandra uses client certificate authentication: it takes the extra step of verifying the client against a local truststore, and if it does not recognize the client’s certificate it will not accept the connection. This additional verification is enabled by setting require_client_auth: true in the cassandra.yaml configuration file.
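From the application side, a client then has to present its own certificate and trust the cluster’s CA. The sketch below uses the DataStax Python driver and assumes driver version 3.17 or later (which accepts an ssl_context); file paths, addresses, and credentials are placeholders.

```python
# Connect to a TLS-enabled cluster with client certificate authentication.
import ssl
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.load_verify_locations("ca-cert.pem")                  # trust the cluster's CA
ctx.load_cert_chain("client-cert.pem", "client-key.pem")  # presented when require_client_auth: true
ctx.check_hostname = False                                # set True if node certs carry hostnames

cluster = Cluster(
    ["10.0.1.10"], port=9042, ssl_context=ctx,
    auth_provider=PlainTextAuthProvider("cassandra", "cassandra"),
)
session = cluster.connect()
print(session.execute("SELECT release_version FROM system.local").one())
```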

In the rest of this blog we will go through the step-by-step process of enabling and configuring the cluster for SSL connections. If you already have a certificate, you can skip the section on generating certificates using OpenSSL.

Generating Certificates using OpenSSL

Most UNIX systems have the OpenSSL tool installed. If it is not available, install OpenSSL before proceeding further.

Steps:

  1. Create a configuration file gen_ca_cert.conf with the below configurations.

[Screenshot: contents of the gen_ca_cert.conf configuration file]

2. Run the following OpenSSL command to create the CA:

[Screenshot: OpenSSL command that creates the CA certificate and key]

3. You can verify the contents of the certificate you just created with the following command:

[Screenshot: OpenSSL command that prints the certificate contents]

You can generate a certificate for each node if required, but doing so is not recommended, because maintaining a separate key for every node is very hard: whenever a new node is added to the cluster, its certificate has to be added to all the other nodes, which is a tedious process. So we recommend using the same certificate for all the nodes. The following steps show how to do that.

Building Keystore

I will explain building the keystore for a 3-node cluster; the same steps apply to a cluster with any number of nodes.

[Screenshot: keytool commands generating a keystore and key pair for each node]

To verify that the keystore was generated with the correct key pair information and is accessible, execute the command below:

[Screenshot: keytool command listing the keystore contents]

With our key stores created and populated, we now need to export a certificate from each node’s key store as a “Signing Request” for our CA:

[Screenshot: keytool command exporting a certificate signing request from each node’s keystore]

With the certificate signing requests ready to go, it’s now time to sign each with our CA’s public key via OpenSSL:

[Screenshot: OpenSSL command signing each request with the CA key]

Add the CA to each node’s keystore via the -import subcommand of keytool:

[Screenshot: keytool -import commands updating each node’s keystore]

Building Trust Store

Since Cassandra uses Client Certificate Authentication, we need to add a trust store to each node. This is how each node will verify incoming connections from the rest of the cluster.

We create the truststore by importing the CA root certificate’s public key:

[Screenshot: keytool command importing the CA public key into the truststore]

Since all our instance-specific keys have now been signed by the CA, we can share this trust store instance across the cluster.

Configuring the Cluster

After creating all the required files, you can keep the keystore and truststore files in /usr/local/lib/cassandra/conf/ or any directory of your choice, but make sure that the Cassandra daemon has access to that directory. With the configuration below in the cassandra.yaml file, inbound and outbound requests will be encrypted.

Enable Node to Node Encryption

[Screenshot: server_encryption_options settings in cassandra.yaml]

Enable Client to Node Encryption

[Screenshot: client_encryption_options settings in cassandra.yaml]

Repeat the above process on every node in the cluster, and your cluster’s data is secured in flight and protected from unknown clients.

Author Credits: This article was written by Bharathiraja S, Senior Data Engineer at 8KMiles Software Services.

Cassandra Backup and Restore Methods

Cassandra is a distributed database management system. In Cassandra, data is replicated among multiple nodes across multiple data centers, so Cassandra can survive without any interruption in service when one or more nodes are down. It keeps its data in SSTable files. SSTables are stored in the keyspace directory within the data directory path specified by the ‘data_file_directories’ parameter in the cassandra.yaml file. By default, the SSTable directory path is /var/lib/cassandra/data/<keyspace_name>. However, Cassandra backups are still necessary to recover from the following scenarios:

  1. Any errors made in data by client applications
  2. Accidental deletions
  3. Catastrophic failure that will require you to rebuild your entire cluster
  4. Data can become corrupt
  5. Useful to roll back the cluster to a known good state
  6. Disk failure

Cassandra Backup Methods

Cassandra provides two types of backup. One is snapshot based backup and the other is incremental backup.

Snapshot Based Backup

Cassandra provides the nodetool utility, a command-line interface for managing a cluster. It includes a useful command for creating snapshots of the data: nodetool snapshot flushes memtables to disk and creates a snapshot by hard-linking the SSTable files (SSTables are immutable). The command takes a snapshot on a per-node basis; to take an entire cluster snapshot, run it with a parallel SSH utility such as pssh, or take the snapshot on each node one by one.

It is possible to take a snapshot of all keyspaces in a cluster, or certain selected keyspaces, or a single table in a keyspace. Note that you must have enough free disk space on the node for taking the snapshot of your data files.

The schema does not get backed up by this method; it must be backed up manually and separately.

Example:

a. All keyspaces snapshot

If you want to take a snapshot of all keyspaces on the node, run the command below.

$ nodetool snapshot

The following message appears:

Requested creating snapshot(s) for [all keyspaces] with snapshot name [1496225100] Snapshot directory: 1496225100

The snapshot directory is /var/lib/cassandra/data/<keyspace_name>/<table_name>-<UUID>/snapshots/1496225100

b. Single keyspace snapshot

Assume you have created the keyspace university. To take a snapshot of that keyspace with a custom snapshot name, run the command below.

$ nodetool snapshot -t 2017.05.31 university

The following output appears:

Requested creating snapshot(s) for [university] with snapshot name [2017.05.31]

Snapshot directory: 2017.05.31

c. Single table snapshot

If you want to take a snapshot of only the student table in the university keyspace, run the command below.

$ nodetool snapshot --table student university

The following message appears:

Requested creating snapshot(s) for [university] with snapshot name [1496228400]

Snapshot directory: 1496228400

After completing the snapshot, you can move the snapshot files to another location such as AWS S3, Google Cloud Storage, or Microsoft Azure. You must also back up the schema, because Cassandra can only restore data from a snapshot when the table schema exists.
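A minimal sketch of that off-box copy is shown below, assuming boto3 and local access to the data directory (keyspace, paths, bucket name, and snapshot tag are placeholders):

```python
# Take a snapshot with nodetool and copy the resulting files to S3.
import os
import subprocess
import boto3

KEYSPACE = "university"
DATA_DIR = "/var/lib/cassandra/data"
BUCKET = "my-cassandra-backups"       # placeholder bucket
TAG = "2017.05.31"

subprocess.run(["nodetool", "snapshot", "-t", TAG, KEYSPACE], check=True)

s3 = boto3.client("s3")
for root, _dirs, files in os.walk(os.path.join(DATA_DIR, KEYSPACE)):
    # Snapshot files live under <table>-<UUID>/snapshots/<TAG>/
    if root.endswith(os.path.join("snapshots", TAG)):
        for name in files:
            local_path = os.path.join(root, name)
            key = os.path.relpath(local_path, DATA_DIR)
            s3.upload_file(local_path, BUCKET, key)
```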

Advantages:

  1. Snapshot-based backup is simple and much easier to manage.
  2. The Cassandra nodetool utility provides the nodetool clearsnapshot command, which removes the snapshot files.

Disadvantages:

  1. For large datasets, it may be hard to take a daily backup of the entire keyspace.
  2. It is expensive to transfer large snapshot data to a safe location like AWS S3

Incremental Backup

Cassandra also provides incremental backups. By default incremental backup is disabled. This can be enabled by changing the value of “incremental_backups” to “true” in the cassandra.yaml file.

Once enabled, Cassandra creates a hard link to each memtable flushed to SSTable to a backup’s directory under the keyspace data directory. In Cassandra, incremental backups contain only new SSTable files; they are dependent on the last snapshot created.

In the case of incremental backup, less disk space is required because it only contains links to new SSTable files generated since the last full snapshot.

Advantages:

  1. The incremental backup reduces disk space requirements.
  2. Reduces the transfer cost.

Disadvantages:

  1. Cassandra does not automatically clear incremental backup files; there is no built-in tool for this, so if you want to remove the hard-linked files you must write your own script.
  2. It creates lots of small files in the backup; managing and recovering these files is not a trivial task.
  3. It is not possible to select a subset of column families for incremental backup.

Cassandra Restore Methods

Backups are only meaningful if they can be restored, for example when a keyspace is deleted, a new cluster is launched from the backup data, or a node is replaced. Data can be restored from snapshots; if you are using incremental backups, you also need all incremental backup files created after the snapshot. There are two main ways to restore data from a backup: nodetool refresh and sstableloader.

Restore using nodetool refresh:

The nodetool refresh command loads newly placed SSTables into the system without a restart. This method is used when a new node replaces a node that is not recoverable. Restoring data from a snapshot is possible only if the table schema exists. Assuming you have created a new node, follow the steps below:

  1. Create the schema if it is not created already.
  2. Truncate the table, if necessary.
  3. Locate the snapshot folder (/var/lib/cassandra/data/<keyspace_name>/<table_name>-<UUID>/snapshots/<snapshot_name>) and copy the snapshot SSTable files into the /var/lib/cassandra/data/<keyspace_name>/<table_name>-<UUID> directory.
  4. Run nodetool refresh.

Restore using sstableloader:

The sstableloader tool loads a set of SSTable files into a Cassandra cluster. It can be used for:

  1. Loading external data
  2. Loading existing SSTables
  3. Restore snapshots

The sstableloader does not simply copy the SSTables to every node; it transfers the relevant part of the data to each node and maintains the replication factor. Here, sstableloader is used to restore snapshots. Follow the steps below to restore using sstableloader:

  1. Create the schema if it does not exist.
  2. Truncate the table if necessary.
  3. Bring your backup data to a node from AWS S3, Google Cloud, or MS Azure. For example, download your backup data to /home/data.
  4. Run the command below:
    sstableloader -d ip /home/data

 

Author Credits: This article was written by Sebabrata Ghosh, Data Engineer at 8KMiles Software Services; you can reach him here.

 

8KMiles strikes the right balance between Cloud Security and Performance

The healthcare industry is one of the sectors that faces major challenges in embracing cloud transformation. Regulation-specific security and huge amounts of sensitive data are the major reasons, and technology and information heads in healthcare organizations constantly need to balance security and privacy without compromising IT infrastructure budgets and performance. In this context, a CMMI maturity level 5 healthcare prospect approached 8KMiles with a specific set of requirements. The prospect company was using the CPSI application and had an enormous number of rectifications that needed to be either made or migrated. The prospect chose 8KMiles as its preferred development partner because 8KMiles is a state-of-the-art solution provider with an agile team of experts who practice Scrum and are ready to take up ad-hoc requirements with a 24/7 development support system.

8KMiles collaborated extensively with the prospect company to:

1) Establish formal Business Relationship with the prospect.

2) Understand the Business needs and requirements:

a. User Interface : The interface involves multiple billing screens complicating navigation and unduly delays task completion

b. E-mail Messaging limitation : It is not possible to send messages to more than one person.

c. Compliance with HL7 requirements – Integration with other HL7-compliant systems is minimal; the system is unable to interface with the radiologist even with an HL7 interface over a fairly basic MS SQL Server database.

d. Workflow Management – The workflow management does not capture many areas of healthcare thus missing out on benefits

e.  Interoperability – CPSI does not allow FHIR (Fast Healthcare Interoperability Resources Specification) compatible APIs for open access to patient data

f.  Security – Multi-layered approach to security is not provided for limiting employee education

g.  Medical Records Synch – Updated information of patients treated at different facilities is not available on the fly

h.  Lack of Standardized terminology, system architecture and indexing – System is inflexible and incapacitated to capture the diverse requirements of the different healthcare disciplines

i.   Integration Issues – Integration of the hospital EMR with the Physician office EMR in a seamless fashion is not happening

j.   There were too many Switches in Role Hierarchy which were not recorded properly

3) 8KMiles studied the requirement systematically and came up with solutions based on Agile methodology for the above pain-points.

a. User Interface – The solution provides an SSO that lets the user supply a one-time credential and open any number of applications/resources with a single click.

b. E-mail Messaging limitation :

i. The 8K Miles Access Governance & Identity Management Solution allows multiple mails to be sent based on approvals, rejections, attestations, and re-certifications.

ii. Multi-level approval and messaging are possible.

c. Compliance with HL7 requirements:

i. The 8K Miles Access Governance & Identity Management Solution lets the customer integrate any database, such as MS SQL Server, Oracle, or IBM DB2.

ii. We enable customers/employees to access various portals through Single Sign-On (SSO); for example, a radiologist can log in to multiple portals with a single credential.

d. Workflow Management – Workflow management and policy compliance facilitate capturing areas of restriction in healthcare, such as providing the right access to the right resource for the right user at the right time.

e. Interoperability – 8KMiles Access Governance & Identity Management Solution & SSO helps in providing fast access to FHIR related Applications .

f. Security – 8K Miles Access Governance & Identity Management Solution Provides Multilevel Approval & Parallel Approval

g. Medical Records Synch – The 8K Miles Access Governance & Identity Management Solution is integrated and synchronized with different databases, so updated information for patients treated at different facilities is available on the fly at any point in time.

h. Lack of Standardized terminology, system architecture and indexing – Highly customizable, flexible to handle any requirement based on Health Care Needs related to Identity & Access Governance.

i. Integration Issues – Integration of the hospital EMR with the Physician office EMR in a seamless fashion is provided using SSO.

j. Switch – 8K Miles Access Governance & Identity Management Solution helps in providing distribution of switches & roles to multiple users on a daily basis.

If you are experiencing similar problems in your healthcare business, please write to sales@8kmiles.com.

Cost Optimization Tips for Azure Cloud-Part III

In continuation of my previous blog, I am going to jot down more ways to optimize cost while moving into the Azure public cloud.

1. UPGRADE INSTANCES TO THE LATEST GENERATION-

With Microsoft introducing the next generation of Azure deployment via Azure Resource Manager (ARM), we can gain a significant performance improvement just by upgrading VMs to the latest versions (from Azure V1 to Azure V2). In all cases the price is either the same or nearly the same.
For example, upgrading a DV1-series VM to the DV2 series gives you 35–40% faster processing at the same price point.

2. TERMINATE ZOMBIE ASSETS –

It is not enough to shut down VMs from within the instance to avoid being billed because Azure continues to reserve the compute resources for the VM including a reserved public IP. Unless you need VMs to be up and running all the time, shut down and deallocate them to save on cost. This can be achieved from Azure Management portal or Windows Powershell.

3. DELETING A VM-

If you delete a VM, the VHDs are not deleted. That means you can safely delete the VM without losing data. However, you will still be charged for storage. To delete the VHD, delete the file from Blob storage.

  •  When an end-user’s PC makes a DNS query, it doesn’t contact the Traffic Manager Name servers directly. Instead, these queries are sent via “recursive” DNS servers run by enterprises and ISPs. These servers cache the DNS responses, so that other users’ queries can be processed more quickly. Since these cached responses don’t reach the Traffic Manager Name servers, they don’t incur a charge.

The caching duration is determined by the “TTL” parameter in the original DNS response. This parameter is configurable in Traffic Manager; the default is 300 seconds, and the minimum is 30 seconds.

By using a larger TTL, you can increase the amount of caching done by recursive DNS servers and thereby reduce your DNS query charges. However, increased caching will also impact how quickly changes in endpoint status are picked up by end users, i.e. your end-user failover times in the event of an endpoint failure will become longer. For this reason, we don’t recommend using very large TTL values.

Likewise, a shorter TTL gives more rapid failover times, but since caching is reduced, the query counts against the Traffic Manager name servers will be higher.

By allowing you to configure the TTL value, Traffic Manager enables you to make the best choice of TTL based on your application’s business needs.

  • If you provide write access to a blob, a user may choose to upload a 200GB blob. If you’ve given them read access as well, they may choose to download it 10 times, incurring 2TB in egress costs for you. Again, provide limited permissions to help mitigate the potential of malicious users. Use a short-lived Shared Access Signature (SAS) to reduce this threat (but be mindful of clock skew on the end time).
  • Azure App Service charges are applied to apps in stopped state. Please delete apps that are not in use or update tier to Free to avoid charges.
  • In Azure Search, The stop button is meant to stop traffic to your service instance. As a result, your service is still running and will continue to be charged the hourly rate.
  • Use Blob storage to store Images, Videos and Text files instead of storing in SQL Database. The cost of the Blob storage is much less than SQL database. A 100GB SQL Database costs $175 per month, but the Blob storage costs only $7 per month. To reduce the cost and increase the performance, put the large items in the blob storage and store the Blob Record key in SQL database.
  • Cycle out old records and tables in your database. This saves money, and knowing what you can or cannot delete is important if you hit your database Max Size and you need to quickly delete records to make space for new data.
  • If you intend to use substantial amount of Azure resources for your application, you can choose to use volume purchase plan. These plans allow you to save 20 to 30 % of your Data Centre cost for your larger applications.
  • Use a strategy for removing old backups such that you maintain history but reduce storage needs. If you maintain backups for last hour, day, week, month and year, you have good backup coverage while not incurring more than 25% of your database costs for backup. If you have 1GB database, your cost would be $9.99 per month for the database and only $0.10 per month for the backup space.
  • An advantage of Azure DocumentDB stored procedures is that they enable applications to perform complex batches and sequences of operations directly inside the database engine, closer to the data, so the network traffic latency cost of batching and sequencing operations can be completely avoided. Another advantage of stored procedures is that they are implicitly pre-compiled to byte code upon registration, avoiding script compilation costs at each invocation.
  • The default of a cloud service size is ‘small’. You can change it to extra small in your cloud service – properties – settings. This will reduce your costs from $90 to $30 a month at the time of writing. The difference between ‘extra small’ and ‘small’ is that the virtual machine memory is 780 MB instead of 1780 MB.
  • Windows Azure Diagnostics may inflate your bill with storage transactions if you do not control it properly.

We need to define which kinds of logs (IIS logs, crash dumps, FREB logs, arbitrary log files, performance counters, event logs, etc.) are collected and sent to Windows Azure Storage, either on a schedule or on demand.

However, if you do not carefully define what you really need for diagnostics, you might end up paying an unexpected bill.

Assuming the following figures:

  • You have an application that requires the high processing power of 100 instances
  • You apply 5 performance counter logs (Processor % Processor Time, Memory Available Bytes, Physical Disk % Disk Time, Network Interface Connection: Bytes Total/sec, Processor Interrupts/sec)
  • You perform a scheduled transfer every 5 seconds
  • The instances run 24 hours per day, 30 days per month

How much does this cost in storage transactions per month?

5 counters X 12 transfers per minute X 60 min X 24 hours X 30 days X 100 instances = 259,200,000 transactions

$0.01 per 10,000 transactions X 259,200,000 transactions = $259.20 per month

To bring this down, ask whether you really need to monitor all 5 performance counters every 5 seconds. What if you reduce them to 3 counters and monitor them every 20 seconds?

3 counters X 3 transfers per minute X 60 min X 24 hours X 30 days X 100 instances = 38,880,000 transactions

$0.01 per 10,000 transactions X 38,880,000 transactions = $38.88 per month

You can see how much you save with these numbers. Windows Azure Diagnostics is genuinely useful, but using it improperly may cause you to pay unnecessary money.

  • Suppose an application organizes blobs into a different container for each user and also allows users to check the size of each container. For that, a function is created that loops through all the files inside the container and returns the size. This functionality is exposed on a UI screen, and an admin typically calls it a few times a day.

Assuming the following figures for illustration:

  • I have 1,000 users.
  • Each container holds 10,000 files on average.
  • The admin calls this function 5 times a day on average.

How much does this cost in storage transactions per month?

Remember: a single Get Blob request is considered 1 transaction!

1,000 users X 10,000 files X 5 queries X 30 days = 1,500,000,000 transactions

$ 0.01 per 10,000 transactions X 1,500,000,000 transactions = $ 1,500 per month

Well, that’s not cheap at all, so let’s bring it down.

Do not expose this functionality as a real-time query to the admin. Consider running the function automatically once a day and saving the size somewhere, then let the admin view the daily result (day by day). Limiting the admin to a single view per day, the monthly cost looks like this:

1,000 users X 10,000 files X 1 query X 30 days = 300,000,000 transactions

$ 0.01 per 10,000 transactions X 300,000,000 transactions = $ 300 per month
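Both worked examples above reduce to the same formula, so it is easy to test other configurations before deploying them. Here is a small sketch using the $0.01 per 10,000 transactions rate quoted above:

```python
# Reproduce the two worked examples above.
PRICE_PER_10K = 0.01

def monthly_transaction_cost(transactions_per_month: int) -> float:
    return transactions_per_month / 10_000 * PRICE_PER_10K

# Diagnostics: 5 counters, 12 transfers/min, 24x30 hours, 100 instances.
diag = 5 * 12 * 60 * 24 * 30 * 100      # 259,200,000 transactions
# Blob-size report: 1,000 users x 10,000 blobs x 1 daily query x 30 days.
blobs = 1_000 * 10_000 * 1 * 30         # 300,000,000 transactions

print(f"Diagnostics : ${monthly_transaction_cost(diag):,.2f} per month")   # ~$259.20
print(f"Blob report : ${monthly_transaction_cost(blobs):,.2f} per month")  # $300.00
```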

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here

Cost Optimization Tips for Azure Cloud-Part II

Cloud computing comes with myriad benefits through its various as-a-service models, and hence most businesses consider it wise to move their IT infrastructure to the cloud. However, many IT admins worry that hidden costs will increase their department’s total cost of ownership.

We believe that it is more about estimating your requirements correctly and managing resources in the right way.

Microsoft Azure Pricing

Microsoft Azure allows you to quickly deploy infrastructures and services to meet all of your business needs. You can run Windows and Linux based applications in 22 Azure data-center regions, delivered with enterprise grade SLAs. Azure services come with:

  • No upfront costs
  • No termination fees
  • Pay only for what you use
  •  Per minute billing

You can calculate your expected monthly bill using Pricing Calculator and track your actual account usage and bill at any time using the billing portal.

1. Azure allows you to set a monthly spending limit on your account. So, if you forget to turn off your VMs, your Azure account will get disabled before you run over your predefined monthly spending limit. You can also set email billing alerts if your spend goes above a preconfigured amount.

2. It is not enough to shut down VMs from within the instance to avoid being billed because Azure continues to reserve the compute resources for the VM including a reserved public IP. Unless you need VMs to be up and running all the time, shut down and deallocate them to save on cost. This can be achieved from Azure Management portal or Windows Powershell.

3. Delete unused VPN gateways and application gateways, as they will be charged whether they run inside a virtual network or connect to other virtual networks in Azure. Your account is charged based on the time the gateway is provisioned and available.

4. To avoid reserved IP address charges, at least one VM must be kept running so that the reserved IP counts among the five reserved public IPs in use that are included at no charge. If you shut down all the VMs in a service, Microsoft is likely to reassign that IP to some other customer’s cloud service, which can hamper your business.

5. Minimize the number of compute hours by using autoscaling, so that the number of nodes on Azure scales up or down based on demand.

6. When an end-user’s PC makes a DNS query, recursive DNS servers run by enterprises and ISPs cache the DNS responses. These cached responses don’t incur charge as they don’t reach the Traffic Manager Name servers. The caching duration is determined by the “TTL” parameter in the original DNS response. With larger TTL value, you can reduce DNS query charges but it would result in longer end-user failover times. On the other hand, shorter TTL value will reduce caching resulting in more query counts against Traffic Manager Name server. Hence, configure TTL in Traffic Manager based on your business needs.

7. Blob storage offers a cost-effective solution for storing graphics data. 2 GB of Table or Queue storage costs $0.14/month, while 2 GB of block blob storage costs just $0.05/month.

[Screenshot: Azure Storage pricing]

A SQL Database of similar capacity will cost $4.98/month. Hence, use blob storage to store images, videos and text files instead of storing in SQL Database.

[Screenshot: Azure SQL Database pricing]

To reduce the cost and increase the performance, put the large items in the blob storage and store the blob record key in SQL database.

The above tips will definitely help you cut costs on Azure and leverage the power of cloud computing to the fullest!

 

Cost Optimization Tips for Azure Cloud-Part I

There are quite a few driving forces behind the rapid adoption of cloud platforms of late, but doing it within the organization’s cost budget is the real challenge. The key benefit of public cloud providers like Azure is the pay-as-you-go pricing model, which frees customers from capital investment; however, cloud expenses can start to add up and soon get out of control if effective cost management is not practiced. Taking control of your cloud costs, and deciding on a better cost management strategy, requires attention and care.

In these articles I will try to outline a few of Azure’s cost-saving and optimization considerations. It is going to be a three-part series; this first part could be subtitled “7 considerations for a highly effective Azure architecture” because it covers the topic from an architect’s point of view.

1. Design for Elasticity

Elasticity is one of the fundamental properties of Azure and drives many of its economic benefits. By designing your architecture for elasticity you avoid over-provisioning resources and restrict yourself to using only what is needed. Azure offers an umbrella of services that help customers eliminate under-utilized resources (always make use of services like VM scale sets and autoscaling).

2. Leverage Azure Application Services (Notification, Queue, Service Bus etc.)
Application services in Azure not only help with performance optimization; they can also greatly affect the cost of the overall infrastructure. Decide judiciously which services your workload needs and provision them optimally. Make use of existing services rather than reinventing the wheel.
When you install software to meet a requirement, you gain customizable features, but the trade-off is immense: you need an instance for it, which ties the software’s availability to a particular VM. If you instead choose the corresponding Azure services, you get built-in availability, scalability, and high performance with pay-as-you-go pricing.

3. Always Use Resource Group
Keep related resources in close proximity; that way you save money on communication among services, and the application gets a performance boost because latency is no longer a factor. In later articles I will talk specifically about other benefits this feature can offer.

4. Off Load From Your Architecture
Try to offload as much as possible by distributing work to the services best suited to it; this not only reduces the maintenance headache but helps optimize cost too. Move session-related data out of the server, and optimize the infrastructure for performance and cost by caching and edge-caching static content.

Combine multiple JS and CSS files into one and compress/minify them; once bundled and compressed, move them to Azure Blob storage. When your static content is popular, front it with the Azure Content Delivery Network. Blob storage plus Azure CDN reduces both cost and latency (depending on the cache-hit ratio). For anything related to media streaming, use Azure CDN, as it frees you from running Adobe FMS.

5. Caching And Compression For CDN Content
After analyzing multiple customer subscriptions, we see a pattern of modest to huge CDN spend. A common cause is that customers forget to enable caching for CDN resources at origin servers such as Azure Blob storage. You should enable compression for content like CSS, JavaScript, text files, JSON, and HTML to save on bandwidth. Teams also deploy production changes frequently and often forget to re-enable caching and compression for static resources and dynamic content (text/HTML/JSON). We recommend a post-deploy job as part of your release automation to ensure client-side caching, server-side compression, and so on are enabled for your application and resources.

6. Continuous Optimization In Your Architecture
If you have been using Azure for the past few years, there is a high possibility that you are using outdated services. Although you should not tinker too much with an architecture once it is designed, it is good to review it and see whether anything can be replaced with a newer-generation service that better fits the workload and offers the same results at lower cost. Always match resources to the workload.
This not only gives you instant benefits but also delivers recurring savings in next month’s bill.

7. Optimize The Provisioning Based On Consumption Trend

You need to be aware of what you are using; there is no point wasting money on expensive instances or services you don’t need. Automatically turn off what you don’t need; services like Azure Automation can help you achieve that. Use services such as autoscaling, VM scale sets, and Azure Automation to keep services uninterrupted even when traffic grows beyond expectations. A special mention goes to Azure DevTest, a service designed for development and testing scenarios: it lets end users model their infrastructure so they are charged only for office hours (usually 8x5), and these settings are customizable, which makes it even more flexible. When dealing with Azure Storage, choose the appropriate storage types and redundancy options; services like File storage, page blobs, and block blobs each have a specific purpose, so be clear about them when designing your architecture.

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here

Enhanced Security In Cloud Computing – A Traditional Approach In Modern Technology

Cloud computing is now part of our day-to-day activities, and we can’t deny the fact that the applications on our smartphones are integrated with the cloud. The data that is uploaded, stored, or downloaded via the cloud needs to be secured both at rest and in motion.

Watermarking is a technique for establishing authenticity and helps to secure data, which enhances cloud computing security. Have you ever thought about how we use watermarking in our everyday activities, and how it is present in our wallets and purses? Yes, I am talking about currency notes, which carry a watermark.

Can this be digitized? Yes, it has already been digitized, as we often see on TV channels, which are digitally watermarked with their logos, be it the BBC or our local channels.

Consider introducing this traditional watermarking technique into cloud computing, where it can help prevent breaches and alleviate the security threats that have arisen with the growth of technology.

We all know that the cloud business model supports on-demand, pay-for-use, economies-of-scale IT services over the Internet, and that virtualized data centers combine to form the Internet cloud. Because data from multiple tenants resides on the same cloud, the cloud needs to be designed to be secure and private, since security breaches lead to data being compromised. Cloud platforms are dynamically built through virtualization with provisioned hardware, software, networks, and data sets; the idea is to migrate desktop computing to a service-oriented platform using virtual server clusters at data centers.

We need to identify best-practice processes for cost-effective security enhancements in cloud computing, and watermarking has been analyzed as fitting this category. Increasing public cloud usage through security-enhanced clouds, for example by using digital watermarking techniques, improves revenue for both the cloud service providers and their clients.

Digital watermarking is a method that can be applied to protect documents, images, video, software, and relational databases. These techniques protect shared data objects and massively distributed software modules.

Combined with data coloring, this can prevent data objects from being damaged, stolen, altered, or deleted. Protecting the data center starts with securing cloud resources and upholding user privacy and data integrity.

[Image: cloud security illustration]
(Image Source: Google)

The new approach could be more cost-effective than using traditional encryption and firewalls to secure the clouds. It can be implemented to protect data-center access at a coarse-grained level and secure data access at a fine-grained file level. It can be interlinked with security as a service (SECaaS) and data protection as a service (DPaaS) and be widely used for personal, business, finance, and digital government data. It safeguards user authentication and tightens data access control in public clouds.

Public Watermarked clouds are an effective solution for security threats
It ensures confidentiality, integrity, and availability in a multi-tenant environment. Computing clouds with enhanced privacy controls demands ubiquity, efficiency, security, and trustworthiness.

Effective trust management, guaranteed security, user privacy, data integrity, mobility support, and copyright protection are crucial to the universal acceptance of cloud as a ubiquitous service. Effective less cost usage of public clouds leads to satisfied customers.

This blog would have thus enabled to identify the different security threats in cloud computing and identify best practice process for cost effective security enhancements in cloud computing which will in turn benefit the organization.

Author Credits: This article was written by Ramya Deepika, Cloud Architect at 8KMiles Software Services and originally published here

Benchmarking Sentiment Analysis Systems

Gone are the days when consumers depended on word-of-mouth from their near and dear ones for a product purchase. Gen-Y consumers now go mainly by online reviews, not only to get a virtual look and feel of the product but also to understand its specifications and drawbacks. These reviews come from various sources such as forum discussions, blogs, microblogs, Twitter and social networks, and they are humongous in volume, which has led to the inception and rapid growth of sentiment analysis.

Sentiment analysis helps to understand the opinion of people towards a product or an issue. It has grown to be one of the most active research areas in Natural Language Processing (NLP) and is also widely studied in data mining, web mining and text mining. In this blog, we will discuss techniques to evaluate and benchmark the sentiment analysis feature in NLP products.

8KMiles’ recent engagement with a leading cloud provider involved applying sentiment analysis to different review datasets with some of the top products available in the market and assessing their effectiveness in correctly identifying reviews as positive, negative or neutral. Our own team has tracked opinions across an enormous number of movie reviews from IMDb and product reviews from Amazon & Yelp, and predicted sentiment polarity with highly accurate results. Tweets are different from reviews because of their purpose: while reviews represent the summarized thoughts of their authors, tweets are more casual and limited to 140 characters of text. Because of this, accuracy results for tweets vary significantly from those for other datasets. A systematic approach to benchmarking the accuracy of sentiment polarity helps reveal the strengths and weaknesses of various products under different scenarios. Here, we share some of the top-performing products and key information on how accuracy is evaluated for various NLP APIs and how a comparison report is prepared.

There is a wide range of products available in the market; a few important products and their language support are shown below.

Sentiment analysis language support:
  • Google NL API: English, Spanish, Japanese
  • Microsoft Linguistic Analysis API: English, Spanish, French, Portuguese
  • IBM AlchemyAPI: English, French, Italian, German, Portuguese, Russian and Spanish
  • Stanford CoreNLP: English
  • Rosette Text Analytics: English, Spanish, Japanese
  • Lexalytics: English, Spanish, French, Japanese, Portuguese, Korean, etc.

Not all products return sentiment polarity directly. Some return a polarity label such as positive, negative or neutral, whereas others return scores, and these score ranges in turn have to be converted into a polarity if we want to compare products. The following sections explain the results returned by some of the APIs.

Google’s NL API sentiment analyzer returns numerical score and magnitude values that represent the overall attitude of the text. After analyzing the results for various ranges, a range from -0.1 to 0.1 was found to be appropriate for neutral sentiment. Any score greater than 0.1 was considered positive and any score less than -0.1 was considered negative.

Microsoft Linguistic Analysis API returns a numeric score between 0 and 1. Scores closer to 1 indicate positive sentiment, while scores closer to 0 indicate negative sentiment. A range of scores between 0.45 and 0.60 might be considered neutral sentiment; scores less than 0.45 may be treated as negative and scores greater than 0.60 as positive.

IBM Alchemy API returns a score as well as a sentiment polarity (positive, negative or neutral), so the sentiment label can be used directly to calculate the accuracy.

Similarly, the Stanford CoreNLP API returns five labels: very positive, positive, neutral, negative and very negative. For comparison with other products, very positive and positive may be combined into a single group called positive, and very negative and negative into a single group called negative.
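The conversion logic can be written down directly. The sketch below uses the thresholds discussed above (±0.1 for Google, 0.45/0.60 for Microsoft, collapsing Stanford's five labels to three); the function names and the unified label strings are our own choices, not part of any vendor API.

```python
# Sketch: normalize provider-specific sentiment outputs to a common
# positive / negative / neutral label, using the thresholds discussed above.
def google_polarity(score: float) -> str:
    # Google NL returns a score in [-1, 1]; -0.1..0.1 treated as neutral.
    if score > 0.1:
        return "positive"
    if score < -0.1:
        return "negative"
    return "neutral"

def microsoft_polarity(score: float) -> str:
    # Microsoft returns a score in [0, 1]; 0.45..0.60 treated as neutral.
    if score > 0.60:
        return "positive"
    if score < 0.45:
        return "negative"
    return "neutral"

def stanford_polarity(label: str) -> str:
    # Stanford CoreNLP's five labels collapsed to three for comparison.
    return {"very positive": "positive", "positive": "positive",
            "very negative": "negative", "negative": "negative"}.get(label, "neutral")

print(google_polarity(0.35), microsoft_polarity(0.52), stanford_polarity("very negative"))
# -> positive neutral negative
```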

After the above conversion, we need a clean way to show the actual and predicted results for the sentiment polarities. This is explained with examples in the following section.

Confusion Matrix

A confusion matrix contains information about actual and predicted classifications done by a classification system. Let’s consider the below confusion matrix to get a better understanding.
(Image: confusion matrices comparing Product A and Product B predictions)

The above example is based on a dataset of 1,500 reviews with an actual split of 780 positives, 492 negatives and 228 neutrals. Product A predicted 871 positives, 377 negatives and 252 neutrals, whereas Product B predicted 753 positives, 404 negatives and 343 neutrals.

From the above table, we can easily see that Product A rightly identifies 225 negative reviews as negative. But it wrongly classifies 157 negative reviews as positive and 110 negative reviews as neutral.

Note that all the correct predictions are located along the diagonal of the table (614, 225 and 55).  This helps to quickly identify the errors, as they are shown by values outside the diagonal.

Precision, Recall & F-Measure
Precision measures the exactness of a classifier: higher precision means fewer false positives, while lower precision means more false positives. Recall measures the completeness, or sensitivity, of a classifier: higher recall means fewer false negatives, while lower recall means more false negatives.

  • Precision = True Positive / (True Positive + False Positive)
  • Recall = True Positive / (True Positive + False Negative)

The F1 score is a measure of a test’s accuracy. It considers both the precision and the recall of the test to compute the score: the F1 score is the harmonic mean of precision and recall. This single number tells you how your system is performing.

  • F1-Measure= [2 * (Precision * Recall) / (Precision + Recall)]

Here is the Precision, Recall and F1-Score for Product A and Product B.
(Image: precision, recall and F1-score for Product A and Product B)
Product A achieves 70% precision in finding positive sentiment. This is calculated as 614 divided by 871 (refer to the confusion matrix table). That means that of the 871 reviews Product A identified as positive, 70% are correct (precision) and 30% are incorrect.

Product A achieves 79% recall in finding positive sentiment. This is calculated as 614 divided by 780 (refer to the confusion matrix table). That means that of the 780 reviews Product A should have identified as positive, it identified 79% correctly (recall) and missed 21% ((79 + 87)/780).
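The same calculation can be scripted. The sketch below reproduces the numbers quoted above for Product A's positive class (614 true positives out of 871 predicted and 780 actual positives); the function is a plain restatement of the formulas in the bullet list.

```python
# Sketch: precision, recall and F1 for the "positive" class of Product A,
# using the counts quoted above (614 correct out of 871 predicted positive
# and 780 actual positive reviews).
def precision_recall_f1(true_pos: int, predicted_pos: int, actual_pos: int):
    precision = true_pos / predicted_pos          # TP / (TP + FP)
    recall = true_pos / actual_pos                # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(true_pos=614, predicted_pos=871, actual_pos=780)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.70 recall=0.79 f1=0.74
```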

It is desirable to have both high precision and high recall to get a high final accuracy. The F1 score considers both precision and recall and gives a single number to compare across products. Based on the F1 score comparison, the following conclusions were reached for the given dataset.

  • Product B is slightly better than Product A in finding positive sentiment.
  • Product B is better than Product A in finding negative sentiment.
  • Product B is slightly better than Product A in finding neutral sentiment.

Final Accuracy can be calculated using Num. of Correct Prediction / Total Num. of Records.

(Image: final accuracy comparison for Product A and Product B)
To conclude, for the given dataset Product B performs better than Product A. It is important to consider multiple datasets and take the average accuracy to determine a product’s final standing.

Author Credits: This article was written by Kalyan Nandi, Lead, Data Science at Big Data Analytics SBU, 8KMiles Software Services.

Azure Virtual Machine – Architecture

Microsoft Azure is built on Microsoft’s definition of commodity infrastructure. The most intriguing part of Azure is the cloud operating system at its heart. In its initial days, Azure started out using a fork of Windows as its underlying platform; back then it was named the Red Dog operating system, with the Red Dog hypervisor. If you go into the history of Azure, the project that became Azure was originally named Project Red Dog. David Cutler was the brain behind designing and developing the core Red Dog components, and it was he who gave it this name. In his own words, the premise of Red Dog (RD) is being able to share a single compute node across several properties. This enables better utilization of compute resources and the flexibility to move capacity as properties are added, deleted, and need more or less compute power. This in turn drives down capital and operational expenses.

It was actually a custom version of Windows, and the driving reason for this customization was that Hyper-V at the time didn’t have the features Azure needed (particularly support for booting from VHD). If you try to understand the main components of its architecture, you can count four pillars:

  • Fabric Controller
  • Storage
  • Integrated Development Tools and Emulated Execution Environment
  • OS and Hypervisor

Those were the initial (early 2006) days of Azure. As it matured, running a fork of an OS was not ideal (in terms of cost and complexity), so the Azure team talked to the Windows team and efforts were made to use Windows itself. As time passed, Windows eventually caught up, and Azure now runs on Windows.

Azure Fabric Controller
Among these, one component that contributed immensely to Azure’s success is the Fabric Controller. The Fabric Controller owns all the resources in the entire cloud and runs on a subset of nodes in a durable cluster. It manages the placement, provisioning, updating, patching, capacity, load balancing and scale-out of nodes in the cloud, all without any operational intervention.

The Fabric Controller, which is still the backbone of Azure compute, is the kernel of the Microsoft Azure cloud operating system. It regulates the creation, provisioning, de-provisioning and supervision of all the virtual machines and their back-end physical servers. In other words, it provisions, stores, delivers, monitors and commands the virtual machines (VMs) and physical servers that make up Azure. As an added benefit, it also detects and responds to both software and hardware failures automatically.

Patch Management
When trying to understand the mechanism Microsoft follows for patch management, the common misconception is that it keeps patching all the nodes just as we do in our own environments. Things in the cloud are a little different: Azure hosts are image-based (hosts boot from VHD) and follow image-based deployment. So instead of just having patches delivered, Azure rolls out a new VHD of the host operating system. That means Microsoft is not going around patching every host individually; instead, Azure updates the image in one place and, because the update is orchestrated, it can use this image to update the whole environment.

This offers a major advantage in host maintenance, as the volume itself can be replaced, enabling quick rollback. Host updates roll out every few weeks (4-6 weeks), and updates are well tested before they are rolled out broadly to the data centers. It is Microsoft’s responsibility to ensure that each rollout is tested before the data center servers are updated. To do so, they start with a few Fabric Controller stamps, which can be called a pilot cluster, and once that is through they gradually push the updates to the production (data center) hosts. The underlying mechanism is the Update Domain (UD). When you create VMs and put them in an availability set, they get bucketed into update domains (by default you get 5, with provisions to increase this to 20). All the VMs in the availability set are distributed equally among these UDs. Patching then takes place in batches, and Microsoft ensures that only a single update domain is patched at a time; you can call this a staged rollout. To understand this in more detail, let’s see how the Fabric Controller manages partitioning.

Partitioning
Azure’s Fabric Controller has two types of partitions: Update Domains (UDs) and Fault Domains (FDs). These two are responsible not only for high availability but also for the resiliency of the infrastructure; with them in place, Azure has the ability to recover from failures and continue to function. It is not about avoiding failures, but about responding to failures in a way that avoids downtime or data loss.

Update Domain: An Update Domain is used to upgrade a service’s role instances in groups. Azure deploys service instances into multiple update domains. For an in-place update, the FC brings down all the instances in one update domain, updates them, and then restarts them before moving to the next update domain. This approach prevents the entire service from being unavailable during the update process.

Fault Domain: Fault Domain defines potential points of hardware or network failure. For any role with more than one instance, the FC ensures that the instances are distributed across multiple fault domains, in order to prevent isolated hardware failures from disrupting service. All exposure to server and cluster failure in Azure is governed by fault domains.
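To see how update and fault domains surface to a user, here is a minimal sketch that creates an availability set with explicit UD/FD counts. It assumes the azure-identity and azure-mgmt-compute Python packages; the subscription ID, resource group, names and counts are illustrative placeholders, not values from the article.

```python
# Minimal sketch: create an availability set so that its VMs are spread across
# update domains and fault domains. Assumes azure-identity and
# azure-mgmt-compute; names and counts below are illustrative.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

availability_set = client.availability_sets.create_or_update(
    resource_group_name="web-rg",              # hypothetical resource group
    availability_set_name="web-avset",
    parameters={
        "location": "eastus",
        "platform_update_domain_count": 20,    # default is 5; can be raised to 20
        "platform_fault_domain_count": 3,
        "sku": {"name": "Aligned"},            # required when using managed disks
    },
)
print(availability_set.platform_update_domain_count)
```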

Azure Compute Stamp
In Azure, resources are divided into stamps; each stamp has one Fabric Controller, and that Fabric Controller is responsible for managing the VMs inside the stamp. There are only two types of stamps: compute stamps and storage stamps. The Fabric Controller is also not a single instance; it is distributed. Based on the available information, Azure keeps 5 replicas of the Fabric Controller and uses a synchronous mechanism to replicate state. In this setup there is one primary, which the control plane talks to. It is the responsibility of this primary to act on an instruction (for example, provisioning a VM) and also to let the other replicas know about it. Only when at least 3 of them acknowledge that the operation is going to happen does the operation take place (this is called a quorum-based approach).
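Purely as an illustration of the quorum idea described above (commit only once a majority, e.g. 3 of 5 replicas, acknowledge), here is a toy sketch. It is not the actual Fabric Controller implementation; replica names and the simulated failure rate are invented.

```python
# Toy illustration of quorum-based replication: the primary proposes an
# operation and commits only after a majority (3 of 5) of replicas acknowledge.
# This is NOT the actual Fabric Controller code; all names are invented.
import random

REPLICAS = ["fc-1", "fc-2", "fc-3", "fc-4", "fc-5"]
QUORUM = len(REPLICAS) // 2 + 1   # 3 of 5

def replicate(operation: str) -> bool:
    acks = 0
    for replica in REPLICAS:
        # Stand-in for a synchronous replication call that may fail.
        if random.random() > 0.1:              # assume ~90% of calls succeed
            print(f"{replica} acknowledged '{operation}'")
            acks += 1
        if acks >= QUORUM:
            return True                        # safe to commit the operation
    return False                               # not enough replicas: do not commit

if replicate("provision VM"):
    print("operation committed")
else:
    print("operation aborted")
```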

VM Availability
While discussing Azure Virtual Machine (VM) resiliency with customers, they typically assume it is comparable to their on-prem VM architecture and expect the same features in Azure. That is not the case, so I wanted to put this together to provide more clarity on the VM construct in Azure and show why VM availability in Azure is typically more resilient than most on-prem configurations.
“Talking about Azure Virtual Machines, there are three major components (Compute, Storage, Networking) which constitute an Azure VM. So, when we talk about a virtual machine in Azure we must take two dependencies into consideration: Windows Azure Compute (to run the VMs) and Windows Azure Storage (to persist the state of those VMs). What this means is that you don’t have a single SLA; you actually have two SLAs. And as such, they need to be aggregated, since a failure in either could render your service temporarily unavailable.”
In this article, let’s focus on the Compute (VM) and Storage components.

Azure Storage: You can check my other article where I have covered this in great detail, including how an Azure Storage stamp is a cluster of servers hosted in an Azure datacenter. These stamps follow a layered architecture with built-in redundancy to provide high availability. Multiple replicas (most of the time 3) of each file, referred to as an extent, are maintained on different servers partitioned across Update Domains and Fault Domains. Each write operation is performed synchronously (as long as we are talking about intra-stamp replication) and control is returned only after all 3 copies complete the write, making the write operation strongly consistent.

Virtual Machine:

(Image: layers of the system where faults can occur and the health checks Azure performs to detect them)

Microsoft Azure has provided a means to detect health of virtual machines running on the platform and to perform auto-recovery of those virtual machines should they ever fail. This process of auto-recovery is referred to as “Service Healing”, as it is a means of “healing” your service instances. In this case, Virtual Machines and the Hypervisor physical hosts are monitored and managed by the Fabric Controller. The Fabric Controller has the ability to detect failures.

It can perform the detection in two modes: reactive and proactive. If the FC detects a failure in reactive mode (missing heartbeats) or proactive mode (known situations leading to a failure) from a VM or a hypervisor host, it initiates a recovery by redeploying the VM on a healthy host (the same host or another host), marking the failed resource as unhealthy and removing it from rotation for further diagnosis. This process is also known as self-healing or auto-recovery.
In the above diagram we can see the different layers of the system where faults can occur and the health checks that Azure performs to detect them.

*The auto-recovery mechanism is enabled and available on virtual machines of all sizes and offerings, across all Azure regions and datacenters.

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here

For more interesting information follow us in LinkedIn by clicking here

Tale of how a Fortune 50 Giant got the right Identity Access Partner

As organizations constantly seek to expand their market reach and attract new business opportunities, identity management (specifically SSO, user provisioning and management) has evolved as an enabling technology to mitigate risks and improve operational efficiency. As the shift to cloud-based services increases, identity management capabilities can be delivered as hosted services to help drive more operational efficiency and improve business agility. A Fortune 50 giant designed and developed a cloud-based identity management solution and proposed an opportunity to existing and prospective SaaS vendors to integrate with its product, test it and take it live as a full-fledged single sign-on solution. 8KMiles, being a cloud-based identity services company, agreed to fulfill the client’s requirement.
This company opted for 8KMiles as its preferred choice because 8KMiles is a state-of-the-art solution provider that practices Scrum methodology. 8KMiles never hesitated to take up ad-hoc requirements, thanks to its industry-specific team of experts ready to offer 24/7 development support. 8KMiles pitched in to help the client by identifying their pain points and as-is scenarios. 8KMiles then worked extensively with the client and its respective SaaS vendors to:
1. Establish formal Business Relationship with SaaS Vendors
2. Pre-qualify SaaS Vendor
3. Configure SaaS Application for Partner company on Identity Cloud Service SAML SSO integration, Test and Certify
4. Prepare IDP Metadata
5. Establish a stringent QA process
6. Complete Documentation
a. Conformance and interoperability test report
b. SAML SSO Technical documentation
c. A video explaining the steps involved in the integration
d. Provide metadata configuration and mapping attributes details
7. Build Monitoring Tool
8. Adopt Quality Assurance with 2 level Testings (Manual & Automation)
9. Configure, integrate, troubleshoot, monitor and produce reports using 8KMiles MISPTM tool.

Thus, 8KMiles enabled this Fortune 50 Biggie to attain the following business benefits:
• Refinement of user self-service functionalities
• Activation of users & groups and linking of SaaS applications to the user accounts in the cloud
• Enablement of SSO to these SaaS Apps & enable user access via SAML2.0
• Usage of OAuth 2.0 to authorize changes to configuration.
• Adoption & Testing of different methods of SSO for the same SaaS App
• Documentation of the process in a simplistic manner
• Automation to test & report on all aspects of the integration without human involvement
For more information or details about our Cloud Identity access solution, please write to sales@8kmiles.com

Author Credit:  Ramprasshanth Vishwanathan, Senior Business Analyst- IAM SBU

LifeSciences Technology Trends to expect in 2017

There is constant change in Life Sciences industry dynamics, especially in terms of handling ever-growing data, using modern cloud technology, implementing agile business models and aligning with compliance standards. Here are some of the Life Sciences tech trends predicted for 2017.

1) Cloud to manage Ever-growing Data

The growing volume of data is one of the major concerns among Life Science players. There is a constant need to manage and optimize this vast data into actionable information in real time, and this is where cloud technology gives the agility required to achieve it. Life Sciences will continue to shift to the cloud to address inefficiencies and to streamline and scale operations.

2) Analytics to gain importance

Data is the key driver for any pharma or Life Sciences organization and determines the way drugs are developed and brought to market. The data is generally distributed and fragmented across clinical trial systems, databases, research data, physician notes, hospital records, etc., and analytics will aid to a great extent in analyzing, exploring and curating this data to realize real business benefits from the data ecosystem. 2017 will see a rise in trends like risk analytics, product failure analytics, drug discovery analytics, supply disruption predictive analytics and visualizations.

3) Lifesciences and HCPs will now go Digital for interactions

There was a time when online engagements were just a dream due to limitations in technology and regulations. Embracing a digital channel opens up a faster mode of communication among Life Science players, HCPs and consumers. These engagements are not only easy and compliant but are integrated with applications to meet industry requirements. This will also help Life Sciences players reach more HCPs and meet customers’ growing expectations for online interactions.

4) Regulatory Information Management will be the prime focus

When dealing with overseas markets, it is often critical to keep track of all regulatory information at various levels. Many a time, information on product registrations, submission content plans, health authority correspondence, source documents and published dossiers is disconnected and not recorded in one centralized place. So programs that help align and streamline all regulatory activities will gain momentum this year.

To conclude, Daniel Piekarz, Head of Healthcare and Life Sciences Practice, DataArt, stated that, “New start-ups will explode into the healthcare industry with disruptive augmented reality products without the previous limitations of virtual reality. As this technology advances the everyday healthcare experience, it will exist on the line between the real world and virtual in what is being called mixed reality.” Thus, 2017 will see a paradigm shift in the way technology revolutionizes Life Sciences players’ go-to-market, with early adopters gaining a competitive edge and reaping business benefits compared to laggards!

Identity Federation – 10 Best Practices from a User’s perspective

Federation is a concept that deals with the connection of two parties/providers: an Identity Provider (IDP) and a Service Provider (SP). One vets the credentials of the user and the other provides a service to the user, depending upon the successful vetting of those credentials by the first provider. While setting up these federations, certain best practices can be followed by the two parties that make the federation experience holistic for the user. This blog post explores and highlights these practices.

Let us start with the SP side, as this is where the user lands after a federation. The following are some of the best practices to be followed on the SP side.

  1. If the user has reached the SP for the first ever time, it will be good to make sure (with the consent of the user and with due thought to the user’s privacy), if some identifying information/data (like the immutable id, email id, etc.) of the user can be stored in the SP. This allows the user’s subsequent visits to be tied to it. This may be needed in order to ensure that the user gets a better service experience at the SP each time.  If the intention of the federation is not to expose/tailor user/usage specific sites, then this need not be followed.
  2. The SP should be able to link the user to multiple applications protected by the SP, with the identifying information from the federated transaction, preferably immediately after federation time, in order to establish continuity of services that the particular user was offered last time they logged in to the SP applications and/or tailor the application’s preferences to the federated user’s profile.
  3. Wherever possible, it is better to use local or remote provisioning of the user at the SP. Critical aspects like security, privacy and the organization’s policy for handling external users and their attributes dictate which type of provisioning is best. This provisioning process again helps speed up the user experience at the SP application and assists in giving a better service to the same returning user.
  4. Send the right assertion parameters to the downstream application.

This is critical, as vital information such as role information, auxiliary user attributes and preferences that the application requires needs to be passed on appropriately. The application might be making important decisions based on these parameters in order to address the user’s needs correctly (a small illustrative sketch follows the failure cases below).

  5. Redirect to appropriate URLs at the Service Provider in both the “User Success” and “User Failure” cases. Failure could be because of the following reasons:

a) User not having the right role, privilege or permission to access the site or part of the site, as the assertion did not have them

b) User got authenticated correctly at the IDP, but IDP failed to send the right assertion to the SP

c) Failure of user disambiguation process at the SP

d) User unable to be linked to the right accounts at the SP

In each case, if the failure URL gives an appropriate error message, the user knows exactly why he or she could not access the resource. Ticketing software would probably help the user generate a ticket for the issue and get a solution for the failed transaction from the SP.
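Tying the SP-side points together (particularly point 4 on assertion parameters), here is a minimal, hypothetical sketch of how an SP might map incoming assertion attributes onto an application-level profile. The attribute and role names are invented for illustration and will differ per IDP/SP agreement.

```python
# Minimal sketch: map attributes received in a federation assertion onto an
# application-level profile at the SP. Attribute and role names are
# hypothetical and depend on the IDP/SP agreement.
ROLE_MAP = {"hr-admin": "admin", "hr-user": "member"}   # assertion role -> app role

def build_app_profile(assertion_attributes: dict) -> dict:
    role = ROLE_MAP.get(assertion_attributes.get("role", ""), "guest")
    return {
        "user_id": assertion_attributes.get("immutable_id"),
        "email": assertion_attributes.get("email"),
        "app_role": role,
        "preferences": assertion_attributes.get("preferences", {}),
    }

profile = build_app_profile({
    "immutable_id": "a1b2c3", "email": "user@example.com",
    "role": "hr-admin", "preferences": {"locale": "en-GB"},
})
print(profile["app_role"])   # -> admin
```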

Let us now focus on the IDP side, as this is where the user usually authenticates in order to reach an SP application in a federation.   The following are some of the best practices to be followed on the IDP side:

  1. The most important thing is for the IDP to display an error that is meaningful to the user if and when his/her authentication fails at the IDP. This makes it easier for the user to know whether a credential issue, a network issue, a domain issue or some other issue caused the authentication process to fail.
  2. The IDP should mention to the user (either in its website or application) what the supported types of credentials allowed for authentication are. These could vary from userid/password to X509 certificates, smartcards, OTPs or other hardware/software tokens. The user interface should appropriately lead the user to the right type of authentication, using the right type of credentials, depending on the type of service he/she wishes to get from the IDP.
  3. The IDP should be able to issue assertions to the SP that contain details like the level of assurance at which the user credential was accepted and other user attributes like role, user preferences, etc., if applicable. This is in addition to the primary subject information that the IDP is contracted to send to the SP, agreed during the initial metadata exchange. These extra attributes help the SP and its applications tailor their user preferences.
  4. When the IDP supports a particular profile and service, it should support all the standard options/features linked with those profiles/services; otherwise, users should be informed about what is supported. This ensures that users are not misled into assuming that all related options/features are supported. For example:

a) If the IDP is supporting IDP-initiated Browser Post Profile, then it would be better if it supports IDP initiated Single Logout, Common Name ID Formats linked with the Browser Post Profile, Signing and Encryption of Assertions, Protocol Response in POST Formats etc.

b) if the IDP is supporting SP-initiated Browser Post Profile, then it would be better if it supports IDP or SP initiated Single Logout, Common NameID Formats, Signing and Encryption of Assertions, Protocol Response in POST Formats, Relay State, Accept AuthenRequest in GET and POST Formats, support “allow create” of new IDs if an ID is not already present for a federation transaction etc.

c) if the IDP is supporting multiple Protocols and features such as delegated authentication, redirection to other IDPs, etc., it should clearly mention the protocols and the corresponding profiles, features supported in each of the IDP supported website/application.

d) If exclusively a particular feature is not followed or supported by the IDP, it should be clearly mentioned by the IDP to its users.

All the above should be provided in layman’s terms, so that the user can understand which features are supported and which are not.

  5. The IDP should clearly state the conditions associated with privacy clauses/rules/protection with respect to user credentials/identities and their secure transport. This keeps the user informed about how their credentials will be used and highlights the protection measures followed to make the federated transaction secure.

 

Author Bio:
Raj Srinivas, the author of this blog, is an IAM and Security Architect with 8K Miles. His passion is analyzing the problems enterprises have in the IAM & cloud security domain, across verticals that include banking, insurance, healthcare, government, finance and mortgage, and providing in-depth solutions that have far-reaching effects for the enterprise.

SaaS Data Security More Critical Now Than Ever Before in Healthcare

If, as a healthcare payer or provider, you are using Software-as-a-Service (SaaS) solutions to provide better service to your patients and customers, data security might be as critical to you as your business. The healthcare industry has shifted to cloud-based solutions to maintain electronic Protected Health Information (ePHI), and considering the sensitivity of that information, data security has become more important now than ever before.

In order to keep pace with growing demand, the healthcare industry has faced the heat to provide faster, better and more accessible care by adopting new technologies while complying with industry mandates like the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health (HITECH) Act.

Why Healthcare needs Data Security in SaaS applications?

The astonishing number of data breaches and attacks on healthcare data has forced the organizations involved to look for higher and stronger methods of data security at various levels, be it the physical level or the application level.

According to a recent study by Symantec Corporation, approximately 39 percent of breaches in 2015 occurred in the health services sector. The same report found that ransomware and tax fraud rose as increasingly sophisticated attack tactics were used by organized criminals with extensive resources. These criminals run professional operations and adopt best business practices to exploit the loopholes in the security of ePHI. They first recognize the vulnerabilities and then exploit the weaknesses of unsecured systems. The stolen health records are then sold on the black market for ten times the value of stolen credit card data.

In a statement given by Kevin Haley, director, Symantec Security Response, he said, “Advanced criminal attack groups now echo the skill sets of nation-state attackers. They have extensive resources and a highly-skilled technical staff that operate with such efficiency that they maintain normal business hours and even take the weekends and holidays off.”

Loopholes in Healthcare Data Security

Public cloud services are cost-efficient because the infrastructure often involves shared multitenant environments, whereby consumers share components and resources with other consumers often unknown to them. However, this model has many associated risks. It gives one consumer a chance to access the data of another and there is even a possibility that data could be co-mingled.

Cloud services allow data to be stored in many locations as part of Business Continuity Plan (BCP). It can be beneficial in case of an emergency such as a power outage, fire, system failure or natural disaster. If data is made redundant or backed up in several locations, it can provide reassurance that critical business operations will not be interrupted.

However, consumers that do not know where their data resides lose control of ePHI at another level. Knowing where their data is located is essential for knowing which laws, rules and regulations must be complied with. Certain geographical locations might expose ePHI to international laws that change who has access to data in contradiction to HIPAA and HITECH laws.

Many employees use their smartphones that do not have the capability to send and receive encrypted email. So, while answering emails at home from their phone, employees may be putting sensitive data at risk.

Bring Your Own Device (BYOD) policies also put data at risk if devices are lost or stolen. Logging on to insecure internet connections can also put business and patient information at risk. Storing sensitive data on unsecured local devices like laptops, tablets or hard drives can also expose unencrypted information at the source.

Conclusion

It is obvious from such startling statistics that a large number of data breaches and cyber-attacks can occur only if the applications and data storage are not secure. All employees involved should be given unique usernames and passwords and must be trained on how to keep login credentials secure, in addition to training sessions on the Privacy and Security Rules.

Transferring data to the cloud comes with various issues that complicate HIPAA compliance for covered entities, Business Associates (BAs), and cloud providers such as control, access, availability, shared multitenant environments, incident readiness and response, and data protection. Although storage of ePHI in the cloud has many benefits, consumers and cloud providers must be aware of how each of these issues affects HIPAA and HITECH compliance.

The need of the hour is that all the involved parties must come together and take the responsibility of data security from their end till next level.

It is better to invest in securing SaaS applications and medical data instead of paying huge fines which could be in millions of dollars!

Related Posts :-

Steps to HIPAA Compliance for Cloud-Based Systems

Why Healthcare Organizations Need to Turn to Cloud

Steps to HIPAA Compliance for Cloud-Based Systems

The rapid growth of cloud computing has also led to a rapid growth in concerns pertaining to security and privacy in cloud-based infrastructure. Such fears create a huge requirement for healthcare organizations to understand and implement cloud computing while remaining compliant with the Health Insurance Portability and Accountability Act (HIPAA).

The benefits offered by cloud-based technology are too good to let go. The agility and flexibility that can be gained by utilizing public, private and hybrid clouds are quite compelling. What we need is a cloud-based environment that can provide secure, HIPAA-compliant solutions.

But, how do you achieve HIPAA compliance with cloud?


Image Source: Mednautix

Follow the steps below to better understand how to ensure HIPAA compliance and reduce your risk of a breach.

1.      Create a Privacy Policy

Create a comprehensive privacy policy and make sure your employees are aware of it.

2.      Conduct trainings

Having a privacy policy in place isn’t enough; you also need to make sure it is implemented. For that, employees must be given all required training during the onboarding process. You should also require this training for all third-party vendors. Develop online refresher courses on HIPAA security protocols and make it mandatory for all employees and vendors to take them at regular intervals.

3.      Quality Assurance Procedure

Make sure all the quality assurance standards are met and are HIPAA compliant. Conduct surprise drills to find out loopholes, if any.

4.      Regular audits

Perform regular risk assessment programs to check the probability of HIPAA protocol breach and evaluate potential damage in terms of legal, financial and reputational effects on your business. Document the results of your internal audits and changes that need to be made to your policies and procedures. Based on your internal audit results, review audit procedure and update with necessary changes.

5.      Breach Notification SOP

Create a standard operating procedure (SOP) document mentioning details about what steps should be taken in order to avoid a protocol breach. Mention steps to be followed in case a patient data breach occurs.

Most often you will have a cloud service provider who takes care of a wide range of requirements, from finding resources and developing and hosting apps to maintaining the cloud-based infrastructure. While the primary responsibility for HIPAA compliance falls on the healthcare company, compliance requirements can extend to the cloud service provider as a “business associate”.

Are your cloud service providers HIPAA business associates?

Figuring out whether your cloud service provider can be considered a HIPAA business associate can be tough. The decision may vary depending on the type of cloud usage. Considering that the cloud provider is an active participant, it must also adhere to security requirements such as encryption, integrity controls, transmission protections, monitoring, management, employee screening and physical security.

Investing in HIPAA compliance procedures can save you from many hassles. Follow these steps and minimize your risk of being found noncompliant.

Ransomware on the Rise: What You Can Do To Protect Your Organisation From The Attack

Ransomware is malicious software used by cyber criminals to hold your computer files or data hostage and demand a payment from you to release the data. It is a popular method used by malware authors to extract money from organisations or individuals. Different ransomware varieties are used to get onto a person’s computer, but the most common technique is to install software or use social engineering tactics, like displaying fake messages from a law enforcement department, to attack a victim’s computer. The criminals do not restore computer access until the ransom is paid.

Ransomware is very scary, as files, once damaged, are almost beyond repair. But you can overcome an attack if you have prepared your system. Here are a few measures that will help you protect your organisation.

Data Backup

To defeat ransomware, it is important to regularly back up your data. Once you are attacked, you lose access to your documents; but if you can clean your machine and restore your system and the lost documents from backup, then you need not worry. So back up your files to an external hard drive or backup service; then, after an attack, you can turn off your computer and start over with a new setup.

Use Reputable Security Precaution

Using both antivirus software and a firewall will help protect you. It is critical to keep the software up to date and maintain a strong firewall, otherwise hackers might easily get in through security holes. Also, purchase antivirus software from a reputable company, because there is a lot of fake security software around.

Ransomware Awareness Training

It is important to be aware of cyber security issues and to be properly trained to identify phishing attempts. Creating awareness among staff helps them take action and deal with ransomware. As the methods used by hackers constantly change, it is necessary to keep your users up to date. It is also tough for untrained users to question the origin of a well-crafted phishing email, so providing security training to staff is the best way to prevent malware infection through social engineering.

Disconnect from Internet

If you are suspicious about a file or receive a ransom note, immediately stop communicating with the server. By disconnecting from the internet you might lessen the damage, as it takes some time to encrypt all your files. This isn’t foolproof, but disconnecting is better than nothing, and you can always reinstall software if you have backed up your data.

Check File Extensions

Always view the full file extension; it helps you easily spot suspicious files. If possible, filter the files in your mail by extension; for example, you can deny emails sent with ‘.EXE’ attachments. If you need to exchange .EXE files within your organisation, it is better to use password-protected ZIP files.
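As a rough sketch of the kind of extension check described above, the snippet below flags attachments with an executable extension or a double extension that hides one. The blocked list and the example filenames are only illustrative; tune them to your organisation's policy.

```python
# Small sketch of the extension filtering described above: flag attachments
# whose real extension is executable or hidden behind a double extension.
# The blocked list is only an example; tune it to your organisation's policy.
BLOCKED = {".exe", ".js", ".vbs", ".scr", ".bat"}

def is_suspicious(filename: str) -> bool:
    parts = filename.lower().split(".")
    ext = "." + parts[-1] if len(parts) > 1 else ""
    # e.g. "invoice.pdf.exe" pretends to be a PDF but is really an executable.
    double_extension = len(parts) > 2 and "." + parts[-2] in {".pdf", ".doc", ".jpg"}
    return ext in BLOCKED or double_extension

for name in ["invoice.pdf.exe", "report.docx", "holiday.jpg"]:
    print(name, "-> blocked" if is_suspicious(name) else "-> allowed")
```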

Exercise Caution, Warn Authorities, Never Pay

Avoid links inside emails and suspicious websites. It is better to use another computer to research details if your PC falls under attack. Also, inform your local FBI office or cybercrime authority about the attack. Finally, never pay: it would be a mistake, because the criminals may continue to demand more and may not release your information anyway. Taking precautions to protect your data and staying alert are the best ways to prevent a ransomware attack.

In reality, dealing with ransomware requires an effective backup plan so you could protect your organisation from the attack.

Why Healthcare Organizations Need to Turn to Cloud

It is important for every healthcare organization to develop an effective IT roadmap in order to provide best services to customers and patients. Most healthcare payers and providers are moving to cloud based IT infrastructure in order to utilize the benefits that were once considered unimaginable.

But, before moving ahead, let’s check out some industry statistics and research studies.

Healthcare Organizations and Cloud Computing Statistics


Source: Dell GTAI

According to Dell’s Global Technology Adoption 2015, adoption of cloud technology increased from 25% in 2014 to 41% in 2015 alone.

Spending on cloud computing, or in simpler terms hosted medical services, in global healthcare was $4.2bn in 2014, and it is expected to grow by 20% every year until 2020, reaching $12.6bn.

North America is the biggest consumer of cloud computing services and by 2020 its spending on cloud based solutions will reach $5.7bn.

What kind of data can be moved to Cloud?

Critical healthcare applications can be hosted on a cloud platform to increase their accessibility and availability. Apart from these, the hardware, software and data listed below can also be moved to the cloud.

  • Email
  • Electronic Protected Health Information (ePHI)
  • Picture archiving and communication systems
  • Pharmacy information systems
  • Radiology information systems
  • Laboratory information systems.
  • Disaster recovery systems
  • Databases & Back up data

Why Healthcare Organizations should move to Cloud?

1.      Low Cost

Healthcare organizations can reduce IT costs to a significant extent by moving to the cloud. Cloud-based software requires fewer resources for development and testing. This implies fewer resources for maintenance and more robust solutions at a lower cost. It is believed that over a period of 10 years, cloud-based applications cost 50% less than traditional in-house hosted applications.

2.      More Accessibility

It is important that healthcare data is available to doctors as quickly as possible so that they can diagnose and analyze the patient’s condition sooner and take the right steps to improve it. Cloud computing improves web performance for users in remote locations as well, without having to build out additional data centers.

3.      Higher Flexibility

Cloud based platform allows organizations to scale up or down based on their needs. With conventional on-premise hosted solutions, it can be tough to align their physical infrastructure quickly to varying demands. Migrating to cloud can help to deploy scalable IT infrastructure that can adjust itself as per the requirements, making sure that the resources are always available when required.

4.      Improved Efficiency

Moving to the cloud also helps avoid money being spent on under-utilized infrastructure. With early access to a wide range of data, businesses can gather valuable insights about the performance of systems and plan their future strategy accordingly. Pharmaceutical companies, hospitals and doctors can focus on their core objective of giving the best possible treatment and service to patients, while the cloud service providers take care of their IT needs.

5.      More Reliability

Cloud-based software remains available 24x7 from anywhere to any authorized person with an internet connection. Apart from that, it is easier to recover from losses due to natural disasters because of the cloud’s distributed architecture.

Conclusion

The cloud’s resiliency and high availability make it a cost-effective alternative to on-site hosted solutions. However, security has been a major barrier to cloud adoption in many verticals. It’s especially critical in healthcare industry which is regulated by HIPAA and HITECH Acts and plays a major role in such organizations’ decisions to move their data into a public cloud app.

7 Tips to Save Costs in Azure Cloud

Cloud computing comes with myriad benefits with its various as-a-service models and hence most businesses consider it wise to move their IT infrastructure to cloud. However, many IT admins worry that hidden costs will lower their department’s total cost of ownership.

We believe that it is more about estimating your requirements correctly and managing resources in the right way.

Microsoft Azure Pricing

Microsoft Azure allows you to quickly deploy infrastructures and services to meet all of your business needs. You can run Windows and Linux based applications in 22 Azure data center regions, delivered with enterprise grade SLAs. Azure services come with:

  • No upfront costs
  • No termination fees
  • Pay only for what you use
  • Per minute billing

You can calculate your expected monthly bill using Pricing Calculator and track your actual account usage and bill at any time using the billing portal.

How to save cost on Azure Cloud?

  1. Azure allows you to set a monthly spending limit on your account. So, if you forget to turn off your VMs, your Azure account will get disabled before you run over your predefined monthly spending limit. You can also set email billing alerts if your spend goes above a preconfigured amount.
  2. It is not enough to shut down VMs from within the instance to avoid being billed because Azure continues to reserve the compute resources for the VM including a reserved public IP. Unless you need VMs to be up and running all the time, shut down and deallocate them to save on cost. This can be achieved from Azure Management portal or Windows Powershell.
  3. Delete the unused VPN gateway and application gateway as they will be charged whether they run inside virtual network or connect to other virtual networks in Azure. Your account will be charged based on the time gateway is provisioned and available.
  4. To avoid reserved IP address charges, keep at least one VM running so that the reserved IP stays in use and counts among the five in-use reserved public IPs included at no charge. If you shut down all the VMs in a service, Microsoft is likely to reassign that IP to some other customer’s cloud service, which can hamper your business.
  5. Minimize the number of compute hours by using auto-scaling. Auto-scaling can minimize cost by reducing total compute hours, as the number of nodes on Azure scales up or down based on demand.
  6. When an end-user’s PC makes a DNS query, recursive DNS servers run by enterprises and ISPs cache the DNS responses. These cached responses don’t incur charge as they don’t reach the Traffic Manager Name servers. The caching duration is determined by the “TTL” parameter in the original DNS response. With larger TTL value, you can reduce DNS query charges but it would result in longer end-user failover times. On the other hand, shorter TTL value will reduce caching resulting in more query counts against Traffic Manager Name server. Hence, configure TTL in Traffic Manager based on your business needs.
  7. Blob storage offers a cost effective solution to store graphics data. Blob storage of type Table and Queue of 2 GB costs $0.14/month and type block blob costs just $0.05/month.

SQL Database

A SQL Database of similar capacity will cost $4.98/month. Hence, use blob storage to store images, videos and text files instead of storing them in a SQL Database.


To reduce the cost and increase the performance, put the large items in the blob storage and store the blob record key in SQL database.
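A minimal sketch of this blob-plus-key pattern is shown below. It assumes the azure-storage-blob Python package; the connection string, container and product names are placeholders, and the SQL write itself is left out (only the key that would be stored is returned).

```python
# Minimal sketch of the pattern above: put the large binary object in Blob
# storage and keep only its key/URL in the SQL record. Assumes the
# azure-storage-blob package; connection string and names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = service.get_container_client("product-images")   # hypothetical container

def store_image(product_id: str, image_bytes: bytes) -> str:
    blob_name = f"{product_id}.jpg"
    container.upload_blob(name=blob_name, data=image_bytes, overwrite=True)
    # Only this small key (or the blob URL) goes into the SQL Database row.
    return blob_name

blob_key = store_image("sku-1001", b"...jpeg bytes...")
print("store in SQL:", blob_key)
```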

The above tips will definitely help you cut costs on Azure and leverage the power of cloud computing to the fullest!

8K Miles Tweet Chat 3: IAG Issues and Solutions

8K Miles organised a tweet chat on IAG issues and solutions on May 10th. If you have questions related to IAG for your organisation or wish to understand IAG better, this blog is the right place: it is a recap of what happened during the tweet chat, compiling the questions asked and the answers given by the participants. The official twitter handle of 8K Miles, @8KMiles, shared frequently asked questions on IAG issues and solutions, which were discussed and answered by the participants.

(Tweet chat screenshots: questions Q1-Q10 and the participants’ answers, shared via @8KMiles)

It was an informative chat on IAG issues and solutions. For more such tweet chats on cloud industry follow our Twitter handle @8KMiles.

7 Common AWS Cost Issues and How You Can Fix Them

Cloud solutions offer significant business benefits for startups as well as established enterprises. Amazon Web Services (AWS) delivers solid cloud infrastructure with pay-per-use services and other computing resources that grow with the needs of a business. However, even with all these benefits, cost-related issues still exist: although the AWS model saves build and maintenance costs, there are cost management issues users encounter while using the cloud. So, keeping cost management in mind, here are 7 common AWS cost issues and how you can fix them.

Resource Purchase

Remember to check your resource utilization before purchasing. Reserved, on-demand and spot instances should be purchased appropriately depending on usage and risk, as spot instances carry termination risk and reserved instances can sit inactive due to improper mapping. For long-term usage, use reserved EC2 instances.

Instance Size

Remember to analyze your needs and choose the appropriate size rather than sizing for the highest demand. Sizes vary (large, medium, small), so do not simply accept the defaults. You can use auto-scaling services to manage high load for certain periods of time. Also, save cache or non-critical data from the application into non-persistent storage instead of increasing the size of the Elastic Block Store (EBS).

EC2 Utilization

Elastic Compute Cloud (EC2) instances are charged for their usage time even if an instance is using less than its allotted capacity or sitting idle. Identify idle and underutilized instances by analyzing CPU utilization and network activity; if these metrics stay low, the EC2 instance should be flagged. You can then contact the instance owner and verify whether the instance is needed and whether it is the correct size. Shut down instances when they are not needed; this helps reduce cost. Also find ways to reuse unused reserved instances.

Using ElastiCache to store cached information from the application and database reduces instance CPU utilization and bandwidth; this minimizes bandwidth usage and thus reduces cost. You can also run the ECS service on underutilized EC2 instances to increase the workload and efficiency of those instances instead of launching new resources.
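One way to flag potentially idle instances is to pull their average CPU from CloudWatch, as in the sketch below. It assumes boto3 with configured credentials; the two-week window, daily period and 5% threshold are example values, not recommendations from the article.

```python
# Sketch: flag potentially idle EC2 instances by checking their average CPU
# over the last two weeks in CloudWatch. Assumes boto3 and configured
# credentials; the 5% threshold is an example, not a rule.
from datetime import datetime, timedelta
import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        instance_id = instance["InstanceId"]
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/EC2",
            MetricName="CPUUtilization",
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            StartTime=datetime.utcnow() - timedelta(days=14),
            EndTime=datetime.utcnow(),
            Period=86400,                 # one datapoint per day
            Statistics=["Average"],
        )
        datapoints = stats["Datapoints"]
        if datapoints:
            avg_cpu = sum(d["Average"] for d in datapoints) / len(datapoints)
            if avg_cpu < 5.0:
                print(f"{instance_id}: avg CPU {avg_cpu:.1f}% - review or downsize")
```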

S3 Lifecycle

Keep an eye on your object storage and regularly track the following: what storage you have, where it is and how you are storing it. By using the Simple Storage Service (S3) lifecycle you can control and reduce storage costs. Expiring objects and transitioning their storage class to RRS or Glacier reduces your S3 and storage costs. Data that is no longer needed, or does not need to be highly available, can be deleted or moved to Glacier storage using an S3 lifecycle policy.

Use Glacier to archive data for longer time periods, plan the data retrieval process from Glacier carefully, and do not retrieve data frequently.
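For illustration, here is a hedged sketch of an S3 lifecycle rule that transitions objects to Glacier after 90 days and expires them after a year. It assumes boto3; the bucket name, prefix and day counts are placeholders to adapt to your own retention policy.

```python
# Sketch: an S3 lifecycle rule that moves objects to Glacier after 90 days and
# deletes them after a year, as described above. Assumes boto3; the bucket
# name, prefix and day counts are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```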

Data Transfer Charges

It is important to constantly track data transfer charges, as they can cause unnecessary expenses. Maintaining a precise resource inventory of what data is transferred and where (i.e. to which region) prevents money being wasted on data transfer.

AWS Support Services

For many users, EC2 hourly charges end up greater than pay-as-you-go usage charges. Using AWS services like ELB on a pay-as-you-go basis can therefore help reduce cost. Analyze your costs and check whether these services are effective for your usage.

Remove Resources

Detach Elastic IPs from instances that are in a stopped state and release the unattached IPs. Also, delete older and unwanted AMIs and snapshots, including snapshots of deleted AMIs. These resources should be tracked regularly so they don’t get missed among the many other resources; individually these items cost little, but together they create a large expense. In an AWS environment, resources are accounted for even if they are inactive, so it is important to turn off the unused ones. Take a snapshot of an unused RDS instance (if needed) and then terminate the instance, and keep track of and remove all unwanted RDS manual snapshots and other unused resources.
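As one example of this cleanup, the sketch below lists Elastic IPs that are not associated with anything and releases them. It assumes boto3 and VPC-style addresses (which carry an AllocationId); review the output before actually releasing addresses in your own account.

```python
# Sketch: find and release Elastic IPs that are not attached to any instance,
# one of the cleanup items above. Assumes boto3 and VPC-style addresses;
# review the list before actually releasing anything in your account.
import boto3

ec2 = boto3.client("ec2")

for address in ec2.describe_addresses()["Addresses"]:
    if "AssociationId" not in address:            # not attached to anything
        print("Releasing unattached EIP:", address["PublicIp"])
        ec2.release_address(AllocationId=address["AllocationId"])
```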

Though AWS is a dynamic and effective cloud service, it is important to regularly check your usage manually or via automated reports to avoid mistakes. These 7 cost issues can be avoided by regularly monitoring your AWS services, which goes a long way towards reducing cost.

Cloud Boundaries Redefined in AWS Chennai Meetup on 30th April @8KMiles

“You don’t have to say everything to be a light. Sometimes a fire built on a hill will bring interested people to your campfire.” ― Shannon L. Alder

This is one of the days where the above quote is proven to be right. As a market leader in delivering quality Cloud solutions, 8K Miles has this habit of stretching every new service offered by different cloud service providers to explore and solve the contemporary business problems. In yet another effort in that direction, we had a bunch of technical evangelists and architects gathering at 8K Miles today for the #AWSChennaiMeetup event, to discuss two broad areas on AWS architecture designs.

1) The Pros and Cons of Architecting Microservices on AWS

2) Cloud Boundaries redefined: Running ~600 million jobs every month on AWS


Session 1: Pros and Cons of Architecting Microservices on AWS

This topic was presented by Sudhir Jonathan from Real Image. Sudhir works as a consultant to Real Image, on the teams that build Moviebuff.com and Justickets.in. His background includes ThoughtWorks, his own startup and a few personal projects. He is an avid coder whose specialities include Ruby on Rails, Go, React, AWS and Heroku, among others.


His knowledge-sharing session walked through the pros and cons of architecting microservices on AWS, also covering automated deployment, inter-process communication using SQS, ECS, and cost reduction using Spot Instances, ELB and Auto Scaling groups.

Session 2: Cloud Boundaries redefined: Running ~600 million jobs every month on AWS

In the world of cloud, “Speed is Everything”. To identify security, compliance, risk and vulnerability drifts on our customers’ environments instantly, the 8K Miles cloud operations team runs ~600 million jobs every month. Mohan and Saravanan, technical architects at 8K Miles, shared their experience of running a distributed, fault-tolerant scheduler stack and how it has evolved.


During the event we also organized a simple tweet quiz on our handle @8KMiles for all the participants, and Dwarak discussed each question in detail with them.


For more detailed updates on this event, please check the hashtag #AWSChennaiMeetup and our handle @8KMiles on Twitter.
**The Chennai Amazon Web Services Meetup is organized by AWS technology evangelists from Chennai for AWS cloud enthusiasts. The goal is to conduct meetups regularly and to share and learn the latest technology implementations on AWS, along with their challenges, learnings and limitations.

8K Miles is a leading Silicon Valley based cloud services firm, specializing in high-performance cloud computing, analytics and identity management solutions, and is emerging as one of the top solution providers for the IT and ITIS requirements on cloud for the Pharma, Health Care and allied Life Sciences domains.

Demand for Cloud EHR is Increasing Rapidly

Considering the changing landscape of healthcare data management requirements, cloud-based Electronic Health Record (EHR) systems have seen a rapid growth in demand for various reasons. When Epic Systems announced it would acquire a Mayo Clinic data center for $46 million, it reinforced the belief that demand for cloud EHR is increasing steadily in the healthcare domain.

In 2016, the demand for cloud-based technology solutions that help medical practitioners deliver better care while reducing administrative burdens is expected to gain momentum.

What is cloud-based EHR?

EHR is a collection of electronic health data of individual patients or populations. It includes medical history, demographics, genetic history, medication and allergies, laboratory test results, age/weight/blood group, x-ray images, vital signs, etc. in digital format and is capable of being shared across different stakeholders.

Cloud-based EHR allows software and clinical data to be stored, shared and updated in the cloud, whereas traditional EHR systems usually make information available only to users in the same physical location as the software and servers.

Putting it in simpler words, cloud EHR lets users access and work with data hosted at a shared online location, rather than on a personal disk drive or local server. All software and information are stored on an online network (also known as “in the cloud”), and any authorized user with an internet connection can access it.

Why has demand for cloud EHR increased?

Given the existing demand for cloud EHR solutions, the EHR market is expected to reach about $30 billion by 2020 and to keep growing beyond that.

Source: Grand View Research

This demand is primarily driven by increased need for anytime-anywhere accessible software solutions that reduce errors and increase ease of use.


Legacy on-premises solutions are unable to meet the changing requirements of today’s healthcare sector. They are built on outdated client-server systems that are costly, inflexible and cannot meet the need to analyze data in real time. Such issues pose a significant challenge to healthcare providers who work with complex and disconnected datasets.

As compared to traditional on-site hosted solutions, cloud computing offers benefits such as:

  • Cost Reduction

Cloud-based software requires fewer development and testing resources, which means lower costs for application support and maintenance.

  • Improved Efficiency

Cloud solutions can automate many business processes, such as system upgrades. Being able to see the bigger picture in real time allows you to focus on your core strengths.

  • Accessibility

Users can access applications from anywhere and on any device, breaking down geographic barriers and improving the speed with which decisions can be taken.

  • Flexibility

Cloud-based networks can easily scale to accommodate and respond to rapid increases in the number of users and spikes in demand.

  • Reliability

Cloud computing allows applications to run independently of the underlying hardware, in a virtual environment hosted in secure data centers.

Today’s technological capabilities have made it possible to make health records more attractive to end users. Cloud-based EHR solutions with visually appealing interfaces and innovative methods of interpreting, analyzing and presenting health records have been successful in improving the doctor-patient relationship.

Related post from 8KMiles

Top Health IT Issues You Should Be Aware Of

How Cloud Computing Can Address Healthcare Industry Challenges

How pharmaceuticals are securely embracing the cloud

5 Reasons Why Pharmaceutical Company Needs to Migrate to the Cloud

6 Reasons why Genomics should move to Cloud


In the exciting, dynamic world of Genomics**, path-breaking discoveries are made every day. On a mission to empower the Pharmaceutical and Health Care industries with a deeper understanding of the genome*, the activity of genes and their likelihood of mutation, research in Genomics generates massive amounts of very important data.

Research in Genomics churns out solutions: a vast amount of useful information with which the identification, treatment and prevention of numerous diseases and disorders could be realized with improved efficiency. Now, think about advanced gene therapy and molecular medicine!

This enormous range of data and information needs a system that is not just capable of handling the colossal data load, but can also preserve it with high security and managed accessibility options.

  1. Large-scale genome sequencing, comparative genomics and pharmacogenomics require storage and processing of enormous volumes of data to derive valuable insights that facilitate gene mapping, diagnosis and drug discovery.
  2. The exhaustive genome database on a perpetual expansion mode simply exceeds the capacity of existing on-premise data storage facilities.
  3. In addition, the research-intensive domain requires managed services for user governance, access management and data encryption, which require synchronized efficiencies and compatibility with multiple systems that comply with international best practices and standard protocols.
  4. Cloud Architecture, empowered by scalability and elastic efficiencies, provides virtual storage spaces for the expansive genome database, with assisted migration, accessibility and security implementation in place.
  5. Large scale data-processing, storage, simulation and computation could be achieved on virtual laboratories on the cloud.
  6. Last but not least, cloud solutions for Genomics can be configured to fit specific research and standardized protocol requirements, rendering huge advantages in terms of flexibility, compliance with protocols and regulatory standards, cost savings and time efficiencies.

The leading Silicon Valley based cloud services firm, 8K Miles, specializes in high-performance cloud computing solutions for Bioinformatics, Proteomics, Medical Informatics and Clinical Trials for CROs, emerging as one of the top solution providers for the IT and ITIS requirements on cloud for the Pharma, Health Care and allied Life Sciences domains.

*A Genome is the collection of the entire set of genes present in an organism.
**Genomics is the branch of science that deals with the structure, function, evolution and mapping of genomes.
