Solutions in Azure : Azure CDN for Maximum Bandwidth & Reduced Latency – Part I

Within the Microsoft cloud ecosystem, Azure CDN is widely recognized as Content Delivery as a Service (CDaaS). With its growing network of point-of-presence (POP) locations, it can be used to offload content to a globally distributed network of servers. Its prime function is to cache static content at strategically placed locations, distributing content with low latency and high data-transfer rates to ensure faster throughput to your end users.
Azure CDN offers developers a global solution for delivering high-bandwidth content by caching it at physical nodes across the world. Requests for this content then travel a shorter distance, with fewer network hops in between. With a CDN in place, static files (images, JavaScript, CSS, videos, and so on) and other website assets are served from the servers closest to your visitors. For content-heavy websites such as e-commerce sites, these latency savings can be a significant performance factor.

In essence, Azure CDN puts your content in many places at once, providing superior coverage to your users. For example, when someone in London accesses your US-hosted website, the request is served through an Azure UK POP. This is much quicker than having the visitor’s requests, and your responses, travel the full width of the Atlantic and back.

Two providers, Verizon and Akamai, supply the edge locations for Azure CDN. Each builds its CDN infrastructure differently: Verizon openly discloses its POP locations, whereas the POP locations for Azure CDN from Akamai are not individually disclosed. For the up-to-date list, check the Azure CDN POP Locations page.

How Does Azure CDN Work?

Today, over half of all internet traffic is already served by CDNs, and that share grows with every passing year; Azure has been a significant contributor to this growth.

As with most Azure services, Azure CDN is not magic; it works in a fairly simple and straightforward manner. Let’s walk through a typical case:
1) A user (XYZ) requests a file (also called an asset) using a URL with a special domain name, such as <endpoint name>.azureedge.net. DNS routes the request to the best-performing point-of-presence (POP) location; usually this is the POP that is geographically closest to the user.
2) If the edge servers in the POP do not have the file in their cache, the edge server requests the file from the origin. The origin can be an Azure Web App, Azure Cloud Service, Azure Storage account, or any publicly accessible web server.
3) The origin returns the file to the edge server, including optional HTTP headers describing the file’s Time-to-Live (TTL).
4) The edge server caches the file and returns it to the original requestor (XYZ). The file remains cached on the edge server until the TTL expires. If the origin didn’t specify a TTL, the default TTL is 7 days.
5) Additional users (e.g., ABC) may then request the same file using the same URL, and may also be directed to the same POP.
6) If the TTL for the file hasn’t expired, the edge server returns the file from the cache. This results in a faster, more responsive user experience.
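The caching flow above can be sketched as a toy edge cache in Python. This is an illustrative model only, not how Azure CDN is implemented; the `fetch_from_origin` callable is a hypothetical stand-in for the origin server:

```python
import time

DEFAULT_TTL = 7 * 24 * 3600  # default TTL of 7 days when the origin sets none


class EdgeCache:
    """A toy model of a CDN POP: serve files from cache until their TTL expires."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin(url) -> (content, ttl_seconds or None)
        self._fetch_from_origin = fetch_from_origin
        self._cache = {}  # url -> (content, expires_at)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        entry = self._cache.get(url)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"                    # step 6: served from the edge cache
        content, ttl = self._fetch_from_origin(url)   # steps 2-3: go to the origin
        ttl = DEFAULT_TTL if ttl is None else ttl
        self._cache[url] = (content, now + ttl)       # step 4: cache with the TTL
        return content, "MISS"
```

A second request for the same URL within the TTL window is a cache hit and never reaches the origin; after the TTL expires, the next request is a miss and the file is re-fetched.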

Reasons for using a CDN

1) To understand why Azure CDN is so widely used, we first have to recognize the issue it is designed to solve: latency. This is the annoying delay between the moment you request a web page and the moment its content actually appears onscreen, which is especially noticeable in applications where many “internet trips” are required to load content. Quite a few factors contribute to this delay, many specific to a given web page. In all cases, however, the delay is affected by the physical distance between you and the website’s hosting server. Azure CDN’s mission is to virtually shorten that physical distance, with the goal of improving site rendering speed and performance.
2) Another obvious reason for using Azure CDN is throughput. On a typical webpage, roughly 20% of the payload is HTML rendered dynamically for the user’s request; the other 80% is static files such as images, CSS, and JavaScript. Your server has to read those static files from disk and write them to the response stream, both of which consume resources on your virtual machine. By moving static content to Azure CDN, your virtual machine has more capacity available for generating dynamic content.
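In practice, offloading static assets largely amounts to rewriting their URLs to point at the CDN endpoint while leaving dynamic paths on the origin. A minimal sketch; the endpoint name `mysite.azureedge.net` and the suffix list are hypothetical examples:

```python
# File extensions we choose to treat as static, cacheable assets (illustrative).
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".svg", ".mp4")


def cdn_url(path, endpoint="https://mysite.azureedge.net"):
    """Serve static assets from the CDN endpoint; leave dynamic paths on the origin."""
    if path.lower().endswith(STATIC_SUFFIXES):
        return endpoint + path
    return path  # dynamic content is still rendered by the origin server


print(cdn_url("/img/logo.png"))  # -> https://mysite.azureedge.net/img/logo.png
print(cdn_url("/checkout"))      # -> /checkout
```

A template helper like this is typically where the 80% of static traffic gets redirected to the CDN, while the 20% of dynamic HTML keeps hitting the origin.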

When an object is first requested through the CDN, it is retrieved directly from the Blob service or the cloud service. Requests made using the CDN syntax are redirected to the CDN endpoint closest to the requester; if the object is not found at that endpoint, it is retrieved from the origin service and cached at the endpoint, where a time-to-live (TTL) setting is maintained for the cached object.

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here

Cortana Intelligence for Patient Length of Stay Prediction

Predictive Length of Stay
Length of Stay (LOS) is defined as the total number of days a patient stays in hospital, from the initial admission date to the discharge date. LOS varies from patient to patient, as it depends on the patient’s condition and the facilities provided to him/her in hospital.

Importance of Predictive Length of Stay
Predictive Length of Stay (PLOS) is a model that can significantly improve the quality of treatment while reducing the workload on doctors. It supports accurate planning with existing facilities, helps staff understand patient conditions, and focuses on discharging patients promptly while avoiding re-admissions to the hospital.

Machine Learning Techniques for Predictive Length of Stay
Here we talk about two popular machine learning techniques that can be used for LOS prediction.

Random Forest
Random Forest is a tree-based predictive algorithm that builds several decision trees and combines their outputs to improve model accuracy. Combining the outputs of multiple decision trees is known as ensembling, and it helps weak learners become a strong learner.

For example, when we are uncertain about a decision, we approach a few people for suggestions and combine their advice to reach a final decision. Similarly, the Random Forest mechanism builds a strong learner from weak learners (the individual decision trees).

Random Forest can be used to solve both regression and classification problems. In regression problems the dependent variable is continuous, whereas in classification problems it is categorical.

An advantage of this model is that it runs efficiently on large data sets, including those with thousands of features.
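A minimal Random Forest regression sketch using scikit-learn on synthetic data (the actual solution trains its models in R against SQL Server; the feature matrix here is a hypothetical stand-in for patient attributes):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic stand-ins for patient features and a length-of-stay target.
X = rng.normal(size=(500, 5))
los = 3 + 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)

# An ensemble of decision trees, each fit on a bootstrap sample of the data;
# the forest's prediction averages the individual trees' predictions.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, los)
pred = model.predict(X[:5])
```

Because the target is continuous (days of stay), this is the regression form of the algorithm; swapping in `RandomForestClassifier` with a categorical target gives the classification form.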

Gradient Boosting
Gradient Boosting is another machine learning algorithm for building prediction models for regression and classification problems. It builds the model iteratively, like other boosting methods, with the main objective of minimizing the model’s loss by adding weak learners via a gradient-descent procedure.

Gradient descent is used to find the weights that minimize the model’s error or loss. In gradient boosting, weak learners (typically shallow decision trees) are used to make the predictions.

An advantage of gradient-boosted trees (GBT) is that the trees are built one at a time, with each new tree helping to correct the errors made by the previously trained trees. With each tree added, the model becomes more accurate.
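The one-tree-at-a-time behavior can be seen directly in scikit-learn, which records the training loss after each boosting stage. Again this is a synthetic sketch, not the solution's actual R code:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))  # hypothetical patient features
los = 3 + 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=500)

# Shallow trees are added sequentially; each new tree fits the residual
# errors of the ensemble so far (a gradient-descent step on squared loss).
model = GradientBoostingRegressor(n_estimators=200, max_depth=3,
                                  learning_rate=0.1, random_state=0)
model.fit(X, los)

# model.train_score_ holds the loss after each stage; it should shrink
# as trees are added, illustrating "each tree corrects earlier errors".
print(model.train_score_[0], "->", model.train_score_[-1])
```

The `learning_rate` scales each tree's contribution, trading off against `n_estimators`: smaller steps usually need more trees but generalize better.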

Microsoft Cortana Intelligence Solution for Predictive Length of Stay
As part of the Cortana Intelligence Solution, Microsoft provides a built-in, end-to-end LOS platform comprising data storage, a data pipeline/processing layer, ML algorithms, and visualization.

Microsoft’s support for R integrated into SQL Server (SQL Server R Services) is a big advantage for data science problems.

Hospital patients’ data is stored in SQL Server, and the PLOS machine learning models are executed through an R IDE. The models take their input from SQL Server, and the predicted results can be stored back in the SQL Server database. The statistics and the predicted LOS for each patient can be visualized through PowerBI.


PLOS Model Working Procedure

To predict the length of stay of a newly admitted patient, we use two machine learning algorithms: regression Random Forest and Gradient Boosting Trees. Both models follow the procedure below.
1. Data Pre-processing and cleaning
2. Feature Engineering
3. Data Set Splitting, Training, Testing, Evaluation
4. Deploy and Visualize results

Data Pre-processing
Hospital patient data is loaded into SQL Server tables. Any missing values in the tables are replaced with -1, the mean, or the mode, depending on the column.
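The three imputation rules can be sketched with pandas (the column names here are hypothetical; the real solution applies the equivalent logic in SQL/R):

```python
import pandas as pd

df = pd.DataFrame({
    "pulse":      [72.0, None, 88.0, 80.0],  # numeric -> impute with the mean
    "ward":       ["A", "B", None, "B"],     # categorical -> impute with the mode
    "lab_result": [1.2, None, 3.4, None],    # or flag missingness with -1
})

df["pulse"] = df["pulse"].fillna(df["pulse"].mean())
df["ward"] = df["ward"].fillna(df["ward"].mode()[0])
df["lab_result"] = df["lab_result"].fillna(-1)
```

Using -1 as a sentinel keeps the fact that a value was missing visible to tree-based models, whereas mean/mode imputation smooths the gap over.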

Feature Engineering
In feature engineering, the feature values of the data set are standardized before being used to train the predictive models.
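Standardization here means z-scoring: rescaling each feature to zero mean and unit variance so that features on different scales (e.g., pulse vs. lab values) contribute comparably. A minimal NumPy sketch:

```python
import numpy as np

def standardize(X):
    """Rescale each column (feature) to zero mean and unit variance (z-score)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma

# Toy feature matrix: two hypothetical features on very different scales.
X = np.array([[150.0, 60.0],
              [160.0, 70.0],
              [170.0, 80.0]])
Xs = standardize(X)
```

In a real pipeline the means and standard deviations must be computed on the training set only and then reused to transform the test set, to avoid leaking test information into training.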

Splitting, Training, Testing and Evaluating
The data set is split into a training set and a testing set at a specified ratio (e.g., 60% training, 40% testing). The two sets are stored in separate SQL Server tables, and the two models, regression Random Forest and Gradient Boosting Trees, are built on the training set.

Finally, we predict the length of stay on the test set and evaluate the performance metrics of the regression Random Forest and Gradient Boosting models.
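The split/train/evaluate loop can be sketched with scikit-learn on synthetic data (the real pipeline stores each split in SQL Server and trains via R; RMSE is one reasonable regression metric, used here as an example):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 5))  # hypothetical patient features
los = 3 + 2 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.5, size=800)

# 60/40 split, matching the ratio mentioned in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, los, test_size=0.4, random_state=0)

results = {}
for name, model in [("Random Forest", RandomForestRegressor(random_state=0)),
                    ("Gradient Boosting", GradientBoostingRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)                          # train on the training split
    pred = model.predict(X_te)                     # predict LOS on held-out data
    results[name] = mean_squared_error(y_te, pred) ** 0.5  # test RMSE
    print(f"{name}: test RMSE = {results[name]:.2f}")
```

Evaluating on the held-out 40% gives an honest estimate of how each model would perform on newly admitted patients, which is what the comparison between the two models should be based on.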

Deploy and Visualize Results
Deploy PowerBI on the client machine and load the predicted results into a PowerBI dashboard, where each patient’s predicted length of stay can be visualized.

Advantage of Predictive Length of Stay
This solution enables predictive length of stay for hospitals, and the predicted information is especially useful to two roles.

For hospitals that need a length-of-stay prediction solution, this is a good choice because of its robust integration of SQL Server and R code.

Chief Medical Information Officer (CMIO)
This solution helps the CMIO determine whether resources are being allocated appropriately across a hospital network, and identify which conditions are most prevalent among patients who will be staying in care facilities long term.


Care Line Manager
A Care Line Manager directly oversees all patients in the hospital, monitoring each patient’s health status and resource needs, and planning discharges and the allocation of resources. Length-of-stay prediction helps care line managers manage their patients’ care better.


The Microsoft Cortana-based solution is impressive in providing the necessary components to predict patients’ duration of stay in hospital. It offers flexible features for integration with hospital healthcare applications and data. The framework for pre-processing and modeling can be modified to suit specific needs, and the R programming capability will appeal to data scientists. The basic PowerBI dashboards are user friendly and can be customized for the specific needs of hospitals. The whole solution helps plan resources effectively (allocation of doctors, beds, required medicines, etc.) and avoid unnecessarily extended patient stays in hospital beds.

Author Credits: Kattula T, Senior Associate, Data Science, Analytics SBU at 8K Miles Software Services Chennai.
