Cortana Intelligence for Patient Length of Stay Prediction

Predictive Length of Stay
Length of Stay (LOS) is defined as the total number of days a patient stays in hospital, from the initial admission date to the discharge date. LOS varies from patient to patient because it depends on the patient's disease conditions and the facilities provided in the hospital.
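The definition above can be sketched as a simple date difference. This is only an illustration: the function name and the sample dates are hypothetical, and counting a same-day discharge as 0 days is an assumption that a hospital may define differently.

```python
from datetime import date

def length_of_stay(admit: date, discharge: date) -> int:
    """Number of days between the admission date and the discharge date."""
    if discharge < admit:
        raise ValueError("discharge date precedes admission date")
    return (discharge - admit).days

# Hypothetical patient admitted on 1 March and discharged on 8 March.
los = length_of_stay(date(2017, 3, 1), date(2017, 3, 8))
print(los)  # 7
```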

Importance of Predictive Length of Stay
Predictive Length of Stay (PLOS) is a model that can improve the quality of treatment and reduce the workload pressure on doctors. It supports accurate planning with existing facilities, helps staff understand patient disease conditions, and lets them focus on discharging patients promptly while avoiding re-admissions.

Machine Learning Techniques for Predictive Length of Stay
Here we talk about two popular machine learning techniques that can be used for LOS prediction.

Random Forest
Random Forest is a tree-based machine learning algorithm that builds several decision trees and combines their outputs to improve model accuracy. Combining the outputs of several models is known as ensembling, and it helps weak learners become a strong learner.

For example, when we are uncertain about a decision, we ask a few people for suggestions and combine those suggestions to reach a final decision. Similarly, Random Forest builds a strong learner from weak learners (individual decision trees).

Random Forest can solve both regression and classification problems. In regression problems the dependent variable is continuous, whereas in classification problems it is categorical.
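The ensemble idea can be sketched in a few lines for the regression case. This is a deliberately simplified stand-in, not a full Random Forest: it averages one-split trees (stumps) fitted on bootstrap samples of a single feature and skips the random feature selection that a real Random Forest adds. The ages and LOS values are made-up illustration data.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree (a weak learner) on one feature."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds, largest value excluded
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: fall back to the mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Average many stumps, each trained on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in trees)

# Hypothetical data: patient age and observed length of stay in days.
ages = [25, 32, 41, 47, 55, 63, 70, 78]
los_days = [2, 3, 3, 4, 6, 7, 9, 10]

predict = fit_bagged_stumps(ages, los_days)
short_stay, long_stay = predict(30), predict(75)
print(short_stay, long_stay)
```

For a classification problem, the averaging step would be replaced by a majority vote over the trees.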

An advantage of this model is that it runs efficiently on large data sets, including data sets with thousands of features.

Gradient Boosting
Gradient Boosting is another machine learning algorithm for building prediction models for regression and classification problems. It builds the model iteratively, as other boosting methods do, and its main objective is to minimize the loss of the model by adding weak learners using a gradient descent procedure.

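Gradient descent is used to minimize the error, or loss, of the model: each new weak learner is fitted to the negative gradient of the loss with respect to the current predictions, which for squared error is simply the residuals. In gradient boosting, the weak learners used to make predictions are typically decision trees.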

An advantage of Gradient Boosting Trees (GBT) is that the trees are built one at a time, with each new tree helping to correct the errors made by the previously trained trees. With each tree added, the model fits the training data more closely.
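The iterative procedure can be sketched as follows, assuming squared-error loss so that each new tree is fitted to the current residuals. The stump learner, the learning rate of 0.1, the number of trees, and the data are all illustrative assumptions rather than settings taken from the Microsoft solution.

```python
from statistics import mean

def fit_stump(xs, targets):
    """Fit a one-split regression tree; needs at least two distinct x values."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_gradient_boosting(xs, ys, n_trees=50, learning_rate=0.1):
    base = mean(ys)               # start from a constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a small step in the direction that reduces the loss.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: patient age and observed length of stay in days.
ages = [25, 32, 41, 47, 55, 63, 70, 78]
los_days = [2, 3, 3, 4, 6, 7, 9, 10]

baseline = mean(los_days)
baseline_mse = mse(lambda x: baseline, ages, los_days)
boosted = fit_gradient_boosting(ages, los_days)
boosted_mse = mse(boosted, ages, los_days)
print(baseline_mse, boosted_mse)
```

The training error of the boosted model is lower than that of the constant baseline because every added stump removes part of the remaining residual. Error on unseen patients has to be checked separately on a test set.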

Microsoft Cortana Intelligence Solution for Predictive Length of Stay
As part of the Cortana Intelligence Solution, Microsoft provides a built-in solution for a complete LOS platform comprising data storage, data pipeline/processing, ML algorithms, and visualization.

Microsoft’s support for integrating SQL Server with R programming is a big advantage for data science problems.

Hospital patient data is stored in SQL Server, and the PLOS machine learning models are executed through an R IDE. The models take their input from SQL Server, and the predicted results can be stored back in a SQL Server database. Statistics and predicted LOS results for patients can then be visualized with PowerBI.
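The read-predict-write flow described above can be sketched as follows. To keep the example runnable anywhere, it uses Python's built-in sqlite3 as a stand-in for SQL Server and a plain mean as a placeholder for the real model. The table and column names are hypothetical and are not taken from the Microsoft solution.

```python
import sqlite3
from statistics import mean

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, los_days INTEGER)")
conn.executemany(
    "INSERT INTO patients VALUES (?, ?)",
    [(1, 3), (2, 5), (3, None), (4, 4)],  # patient 3 is still admitted
)

# Step 1: read model input from the database.
known = [row[0] for row in conn.execute(
    "SELECT los_days FROM patients WHERE los_days IS NOT NULL")]

# Step 2: placeholder "model" that predicts the mean LOS of discharged patients.
predicted = mean(known)

# Step 3: write the predictions back to a separate table.
conn.execute("CREATE TABLE predictions (patient_id INTEGER PRIMARY KEY, predicted_los REAL)")
pending = conn.execute("SELECT patient_id FROM patients WHERE los_days IS NULL").fetchall()
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?)",
    [(pid, predicted) for (pid,) in pending],
)
conn.commit()

stored = conn.execute("SELECT patient_id, predicted_los FROM predictions").fetchall()
print(stored)
```

A dashboard tool would then read from the predictions table rather than from the model directly.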


PLOS Model Working Procedure

To predict the length of stay of a newly admitted patient, we use two machine learning algorithms: regression Random Forest and Gradient Boosting Trees. Both models follow the procedure below.
1. Data Pre-processing and cleaning
2. Feature Engineering
3. Data Set Splitting, Training, Testing, Evaluation
4. Deploy and Visualize results

Data Pre-processing
Hospital patient data is loaded into SQL Server tables. Any missing values in the tables are replaced with either -1, the mean, or the mode.
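A minimal sketch of the three replacement strategies is shown below. It assumes missing values are represented as None; the function name and the sample columns are hypothetical.

```python
from statistics import mean, mode

def impute(values, strategy="mean"):
    """Replace None entries with -1, the mean, or the mode of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "constant":
        fill = -1
    elif strategy == "mean":
        fill = mean(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]

# Hypothetical columns with one missing entry each.
glucose = impute([5.0, None, 7.0, 6.0], "mean")
gender = impute(["F", "M", None, "F"], "mode")
visits = impute([3, None, 1], "constant")

print(glucose)  # [5.0, 6.0, 7.0, 6.0]
print(gender)   # ['F', 'M', 'F', 'F']
print(visits)   # [3, -1, 1]
```

The mean suits numeric columns and the mode suits categorical ones, while -1 acts as an explicit "missing" marker that the model can learn from.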

Feature Engineering
In feature engineering, the feature values of the data set are standardized, and the standardized values are used for training the predictive models.
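The text does not say which form of standardization is used. A common choice, assumed here, is the z-score: subtract the feature's mean and divide by its standard deviation. The sample values are illustrative only.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale a feature to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant feature: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical feature column (mean 5, standard deviation 2).
scaled = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

In practice the mean and standard deviation would be computed on the training data only and then reused to scale the test data.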

Splitting, Training, Testing and Evaluating
The data set is split into a training data set and a testing data set according to a specified percentage (e.g. training data set: 60%, testing data set: 40%). The two data sets are stored in separate SQL Server database tables, and two models, regression Random Forest and Gradient Boosting Trees, are built on the training data set.
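The 60/40 split can be sketched as a shuffle followed by a cut. The function name, the fixed seed, and the patient IDs are assumptions made for illustration.

```python
import random

def train_test_split(rows, train_fraction=0.6, seed=42):
    """Shuffle the rows and cut them into a training part and a testing part."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical patient IDs 1..10.
patient_ids = list(range(1, 11))
train_ids, test_ids = train_test_split(patient_ids)
print(len(train_ids), len(test_ids))  # 6 4
```

Shuffling before the cut avoids putting, for example, only the oldest admissions in the training set.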

Finally, we predict the length of stay on the test data set and then evaluate the performance metrics of the regression Random Forest model and the Gradient Boosting model.
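The text does not name the performance metrics. Two that are commonly used for regression, assumed here for illustration, are mean absolute error (MAE) and root mean squared error (RMSE). The actual and predicted values below are made up.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error: average size of the prediction error in days."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical test-set LOS values and model predictions, in days.
actual = [3, 5, 2, 8]
predicted = [4, 5, 1, 6]

mae_days = mae(actual, predicted)
rmse_days = rmse(actual, predicted)
print(mae_days)   # 1.0
print(rmse_days)
```

Computing the same metrics for both models on the same test set gives a direct way to compare them.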

Deploy and Visualize Results
Deploy PowerBI on the client machine and then load the predicted results into the PowerBI dashboard. We can then visualize each patient's predicted length of stay using the PowerBI dashboard.

Advantage of Predictive Length of Stay
This solution enables length of stay prediction for hospitals, and the predicted information is especially useful to two roles: the Chief Medical Information Officer and the Care Line Manager.

For hospitals that require a solution to predict length of stay, this is a good choice because of its robust integration of SQL and R code.

Chief Medical Information Officer (CMIO)
This solution is useful for the CMIO to determine whether resources are being allocated appropriately in a hospital network and to see which disease conditions are most prevalent in patients who will be staying in care facilities long term.


Care Line Manager
A Care Line Manager is directly responsible for the care of all patients in the hospital, and the main job is to monitor each patient's health condition and required resources. The Care Line Manager plans patient discharge and the allocation of resources. Length of stay prediction helps the Care Line Manager manage patient care better.


Microsoft's Cortana-based solution is impressive in that it provides the components necessary to predict the duration of a patient's hospital stay. It offers flexible features for integration with hospital healthcare applications and data. The framework for pre-processing and modeling can be modified to suit specific needs, and the R programming capability should appeal to data scientists. The basic PowerBI dashboards are user friendly and can be customized for the specific needs of hospitals. The whole solution helps hospitals plan resources effectively, such as the allocation of doctors, beds, and required medicine, and avoid unnecessarily extended patient stays.

Author Credits: Kattula T, Senior Associate, Data Science, Analytics SBU at 8K Miles Software Services Chennai.

Image Source: Microsoft PLOS