Benchmarking Sentiment Analysis Systems

Gone are the days when consumers depended on word-of-mouth from friends and family before buying a product. Today's consumers turn to online reviews, not only to get a virtual look and feel of a product but also to understand its pros and cons. These reviews come from many sources, such as forum discussions, blogs, microblogs, Twitter and social networks, and their sheer volume has driven the inception and rapid growth of sentiment analysis.

Sentiment analysis helps us understand people's opinions about a product or an issue. It has grown into one of the most active research areas in Natural Language Processing (NLP) and is also widely studied in data mining, web mining and text mining. In this blog, we discuss techniques to evaluate and benchmark the sentiment analysis features of NLP products.

8KMiles’ recent engagement with a leading cloud provider involved applying sentiment analysis to different review datasets using some of the top products available in the market, and assessing how effectively each product classified reviews as positive, negative or neutral. Our team tracked opinions across a large number of movie reviews from IMDb and product reviews from Amazon and Yelp, and predicted sentiment polarity with high accuracy. Tweets differ from reviews in purpose: while reviews represent the summarized thoughts of their authors, tweets are more casual and limited to 140 characters of text. Because of this, accuracy results for tweets vary significantly from those for other datasets. A systematic approach to benchmarking the accuracy of sentiment polarity helps reveal the strengths and weaknesses of various products under different scenarios. Here, we share some of the top-performing products, along with key information on how accuracy is evaluated for various NLP APIs and how a comparison report is prepared.

There is a wide range of products available in the market; a few important ones, with their language support, are shown below.

Sentiment analysis language support by product:

  • Google NL API: English, Spanish, Japanese
  • Microsoft Linguistic Analysis API: English, Spanish, French, Portuguese
  • IBM AlchemyAPI: English, French, Italian, German, Portuguese, Russian, Spanish
  • Stanford CoreNLP: English
  • Rosette Text Analytics: English, Spanish, Japanese
  • Lexalytics: English, Spanish, French, Japanese, Portuguese, Korean, etc.


Not all products return sentiment polarity directly. Some return a polarity label such as Positive, Negative or Neutral, while others return numeric scores, and those score ranges must in turn be converted into polarity labels before products can be compared. The following sections explain the results returned by some of the APIs.

Google’s NL API Sentiment Analyzer returns numeric score and magnitude values that represent the overall attitude of the text. After analyzing the results over various ranges, the range from -0.1 to 0.1 was found to be appropriate for neutral sentiment. Any score greater than 0.1 was considered positive, and any score less than -0.1 was considered negative.

Microsoft Linguistic Analysis API returns a numeric score between 0 and 1. Scores closer to 1 indicate positive sentiment, while scores closer to 0 indicate negative sentiment. Scores between 0.45 and 0.60 might be treated as neutral, scores below 0.45 as negative and scores above 0.60 as positive.
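The threshold-based conversions above can be sketched as follows. The cut-off values are the ones quoted in this article; they are tuning choices made during evaluation, not fixed behavior of either API.

```python
def google_polarity(score: float) -> str:
    """Map a Google NL sentiment score to a polarity label,
    using the -0.1..0.1 neutral band described above."""
    if score > 0.1:
        return "positive"
    if score < -0.1:
        return "negative"
    return "neutral"


def microsoft_polarity(score: float) -> str:
    """Map a Microsoft Linguistic Analysis score (0..1) to a polarity
    label, using the 0.45..0.60 neutral band described above."""
    if score > 0.60:
        return "positive"
    if score < 0.45:
        return "negative"
    return "neutral"
```

With both APIs normalized to the same three labels, their predictions can be compared against the same ground truth.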

IBM Alchemy API returns a score as well as sentiment polarity (positive, negative or neutral). So, the sentiment label can be used directly to calculate the accuracy.

Similarly, Stanford CoreNLP API returns five labels: very positive, positive, neutral, negative and very negative. For comparison with other products, very positive and positive may be combined into a single positive group, and very negative and negative into a single negative group.
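The label collapsing described above amounts to a simple lookup table:

```python
# Collapse Stanford CoreNLP's five labels into three
# so results can be compared with the other products.
STANFORD_TO_POLARITY = {
    "very positive": "positive",
    "positive": "positive",
    "neutral": "neutral",
    "negative": "negative",
    "very negative": "negative",
}


def normalize_stanford(label: str) -> str:
    """Map a five-way Stanford label to a three-way polarity label."""
    return STANFORD_TO_POLARITY[label.lower()]
```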

After the above conversions, we need a clean way to show the actual and predicted results for the sentiment polarities. This is explained with examples in the following section.

Confusion Matrix

A confusion matrix contains information about actual and predicted classifications done by a classification system. Let’s consider the below confusion matrix to get a better understanding.

The above example is based on a dataset of 1,500 reviews: 780 positive, 492 negative and 228 neutral in the actual labels. Product A predicted 871 positives, 377 negatives and 252 neutrals, whereas Product B predicted 753 positives, 404 negatives and 343 neutrals.

From the table, we can easily see that Product A correctly identifies 225 negative reviews as negative. However, it misclassifies 157 negative reviews as positive and 110 negative reviews as neutral.

Note that all the correct predictions lie along the diagonal of the table (614, 225 and 55). This makes errors easy to spot: they appear as values off the diagonal.
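Given lists of actual and predicted labels, a confusion matrix like the one above can be built with a few lines of Python. The toy data below is illustrative only, not the article's 1,500-review dataset.

```python
from collections import Counter


def confusion_matrix(actual, predicted,
                     labels=("positive", "negative", "neutral")):
    """Count (actual, predicted) label pairs into a nested dict;
    correct predictions land on the diagonal."""
    pairs = Counter(zip(actual, predicted))
    return {a: {p: pairs[(a, p)] for p in labels} for a in labels}


# Toy example with five reviews:
actual    = ["positive", "positive", "negative", "neutral", "negative"]
predicted = ["positive", "negative", "negative", "neutral", "positive"]
cm = confusion_matrix(actual, predicted)
# cm["negative"]["positive"] counts negative reviews misread as positive.
```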

Precision, Recall & F-Measure
Precision measures the exactness of a classifier. Higher precision means fewer false positives, while lower precision means more false positives. Recall measures the completeness, or sensitivity, of a classifier. Higher recall means fewer false negatives, while lower recall means more false negatives.

  • Precision = True Positive / (True Positive + False Positive)
  • Recall = True Positive / (True Positive + False Negative)

The F1 score is a measure of a test’s accuracy. It takes both the precision and the recall of the test into account: the F1 score is the harmonic mean of precision and recall, giving a single number that summarizes how the system is performing.

  • F1-Measure= [2 * (Precision * Recall) / (Precision + Recall)]

Here is the Precision, Recall and F1-Score for Product A and Product B.
Product A achieves 70% precision in finding positive sentiment, calculated as 614 divided by 871 (refer to the confusion matrix table). That is, of the 871 reviews Product A identified as positive, 70% are correct (precision) and 30% are incorrect.

Product A achieves 79% recall in finding positive sentiment, calculated as 614 divided by 780 (refer to the confusion matrix table). That is, of the 780 reviews Product A should have identified as positive, it identified 79% correctly (recall) and missed 21% ((79 + 87)/780), misclassifying them as negative or neutral.
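Plugging the article's numbers for Product A's positive class into the formulas above:

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Product A, positive class: 614 correct, out of 871 predicted
# positives and 780 actual positives.
p = precision(614, 871 - 614)   # ~0.70
r = recall(614, 780 - 614)      # ~0.79
score = f1(p, r)                # ~0.74
```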

Both high precision and high recall are desired for high final accuracy. The F1 score considers both and gives a single number to compare across products. Based on the F1 score comparison, the following conclusions hold for the given dataset.

  • Product B is slightly better than Product A in finding positive sentiment.
  • Product B is better than Product A in finding negative sentiment.
  • Product B is slightly better than Product A in finding neutral sentiment.

Final accuracy can be calculated as the number of correct predictions divided by the total number of records.
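For Product A, using the diagonal counts from the confusion matrix above:

```python
# Product A's correct predictions lie on the diagonal: 614 + 225 + 55.
correct = 614 + 225 + 55
total = 1500
accuracy = correct / total  # ~0.596, i.e. about 60% overall accuracy
```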

To conclude, for the given dataset Product B performs better than Product A. It is important to consider multiple datasets and take the average accuracy to determine a product's final standing.

Author Credits: This article was written by Kalyan Nandi, Lead, Data Science at Big Data Analytics SBU, 8KMiles Software Services.

Azure Virtual Machine – Architecture

Microsoft Azure is built on Microsoft’s definition of commodity infrastructure. The most intriguing part of Azure is the cloud operating system at its heart. In Azure's early days, it ran on a fork of Windows as its underlying platform, named the Red Dog operating system and Red Dog hypervisor; indeed, the project that became Azure was originally called Project Red Dog. David Cutler was the brain behind designing and developing the core Red Dog components, and he described it in his own words: the premise of Red Dog (RD) is being able to share a single compute node across several properties. This enables better utilization of compute resources and the flexibility to move capacity as properties are added, deleted, and need more or less compute power. This in turn drives down capital and operational expenses.

It was actually a custom version of Windows, and the driving reason for this customization was that Hyper-V at the time did not have the features Azure needed (particularly support for booting from VHD). The main components of the architecture rest on four pillars:

  • Fabric Controller
  • Storage
  • Integrated Development Tools and Emulated Execution Environment
  • OS and Hypervisor

Those were the initial days of Azure (early 2006). As the platform matured, running a fork of an OS proved not to be ideal in terms of cost and complexity, so the Azure team talked to the Windows team and efforts were made to use Windows itself. As time passed, Windows eventually caught up, and Azure now runs on Windows.

Azure Fabric Controller
Among these, one component that contributed immensely to Azure's success is the Fabric Controller. The Fabric Controller owns all the resources in the entire cloud and runs on a subset of nodes in a durable cluster. It manages the placement, provisioning, updating, patching, capacity, load balancing, and scale-out of nodes in the cloud, all without any operational intervention.

The Fabric Controller, which is still the backbone of Azure compute, is the kernel of the Microsoft Azure cloud operating system. It regulates the creation, provisioning, de-provisioning and supervision of all the virtual machines and their back-end physical servers. In other words, it provisions, stores, delivers, monitors and commands the virtual machines (VMs) and physical servers that make up Azure. As an added benefit, it also detects and responds to both software and hardware failures automatically.

Patch Management
When trying to understand the mechanism Microsoft follows for patch management, a common misconception is that it patches all the nodes in place, just as we would in our own environments. Things in the cloud are a little different: Azure hosts are image-based (hosts boot from VHD) and follow image-based deployment. So instead of delivering patches to each host, Azure rolls out a new VHD of the host operating system. Rather than patching every host individually, Azure updates the image in one place and, because the rollout is orchestrated, uses that image to update the whole environment.

This offers a major advantage in host maintenance, as the volume itself can be replaced, enabling quick rollback. Host updates roll out every few weeks (4-6 weeks), with an approach where updates are well tested before being rolled out broadly to the data centers. It is Microsoft's responsibility to ensure that each rollout is tested before the data center servers are updated. To do so, Microsoft starts the rollout with a few Fabric Controller stamps that act as a pilot cluster, and only then gradually pushes the update to the production (data center) hosts. The underlying mechanism is called an Update Domain (UD). When you create VMs and put them in an availability set, they get bucketed into update domains (by default you get 5, but this can be increased to 20). All the VMs in the availability set are distributed evenly among these UDs. Patching then takes place in batches, and Microsoft ensures that only a single update domain undergoes patching at a time. You can call this a staged rollout. To understand this in more detail, let's see how the Fabric Controller manages partitioning.

Azure’s Fabric Controller has two types of partitions: Update Domains (UDs) and Fault Domains (FDs). Together they are responsible not only for high availability but also for the resiliency of the infrastructure, giving Azure the ability to recover from failures and continue to function. The goal is not to avoid failures, but to respond to failures in a way that avoids downtime or data loss.

Update Domain: An Update Domain is used to upgrade a service’s role instances in groups. Azure deploys service instances into multiple update domains. For an in-place update, the FC brings down all the instances in one update domain, updates them, and then restarts them before moving to the next update domain. This approach prevents the entire service from being unavailable during the update process.

Fault Domain: Fault Domain defines potential points of hardware or network failure. For any role with more than one instance, the FC ensures that the instances are distributed across multiple fault domains, in order to prevent isolated hardware failures from disrupting service. All exposure to server and cluster failure in Azure is governed by fault domains.
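The even spreading of instances across the two kinds of domains can be sketched as below. This is a simplification for illustration; the real Fabric Controller placement logic is not public, and the domain counts here are just the defaults discussed above.

```python
def assign_domains(instances, num_uds=5, num_fds=3):
    """Assign each instance an (update domain, fault domain) pair
    round-robin, so instances spread evenly across both kinds of
    partition: no single UD update or FD failure takes out all of them."""
    return {
        inst: (i % num_uds, i % num_fds)
        for i, inst in enumerate(instances)
    }


# Six VMs in an availability set with 5 UDs and 3 FDs:
placement = assign_domains([f"vm{i}" for i in range(6)])
```

Patching one UD at a time (or losing one FD) then leaves the remaining instances untouched.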

Azure Compute Stamp
In Azure, resources are divided into stamps, and each stamp has one Fabric Controller, which is responsible for managing the VMs inside that stamp. There are only two types of stamps: compute stamps and storage stamps. The Fabric Controller itself is not a single instance; it is distributed. Based on the available information, Azure runs 5 replicas of the Fabric Controller and uses a synchronous mechanism to replicate state. In this setup there is one primary, which the control plane talks to. It is the primary's responsibility to act on an instruction (for example, provisioning a VM) and also to let the other replicas know about it. Only when at least 3 of them acknowledge that the operation is going to happen does the operation take place (this is called a quorum-based approach).
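The quorum rule described above boils down to a majority check, which can be sketched as:

```python
def quorum_commit(acks: int, replicas: int = 5) -> bool:
    """An operation commits only once a majority of replicas have
    acknowledged it -- 3 of 5 in the Fabric Controller setup
    described above."""
    return acks >= replicas // 2 + 1
```

With 5 replicas, the cluster can tolerate 2 replica failures and still reach quorum, which is why an odd replica count is the usual choice for this kind of design.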

VM Availability
While discussing Azure Virtual Machine (VM) resiliency, customers typically assume it is comparable to their on-prem VM architecture and expect the same features in Azure. That is not the case, so I wanted to put this together to provide more clarity on the VM construct in Azure and show why VM availability in Azure is typically more resilient than most on-prem configurations.
“Talking about Azure Virtual Machines there are three major components (Compute, Storage, Networking) which constitute Azure VM. So, when we talk about Virtual machine in Azure we must take two dependencies into consideration. Windows Azure Compute (to run the VM’s), and Windows Azure Storage (to persist the state of those VM’s). What this means is that you don’t have a single SLA, instead you actually have two SLA’s. And as such, they need to be aggregated since a failure in either, could render your service temporarily unavailable.”
In this article, let's focus the discussion on the Compute (VM) and Storage components.

Azure Storage: You can check my other article, where I have discussed in great detail how an Azure Storage Stamp is a cluster of servers hosted in an Azure datacenter. These stamps follow a layered architecture with built-in redundancy to provide high availability. Multiple replicas (most of the time 3) of each file, referred to as an Extent, are maintained on multiple servers partitioned across Update Domains and Fault Domains. Each write operation is performed synchronously (for intra-stamp replication), and control is returned only after all 3 copies complete the write, making the write operation strongly consistent.

Virtual Machine:


Microsoft Azure has provided a means to detect health of virtual machines running on the platform and to perform auto-recovery of those virtual machines should they ever fail. This process of auto-recovery is referred to as “Service Healing”, as it is a means of “healing” your service instances. In this case, Virtual Machines and the Hypervisor physical hosts are monitored and managed by the Fabric Controller. The Fabric Controller has the ability to detect failures.

It can perform this detection in two modes: reactive and proactive. If the FC detects a failure in reactive mode (missing heartbeats) or proactive mode (known situations leading to a failure) from a VM or a hypervisor host, it initiates recovery by redeploying the VM on a healthy host (the same host or another one), marking the failed resource as unhealthy and removing it from rotation for further diagnosis. This process is also known as Self-Healing or Auto Recovery.
The above diagram shows the different layers of the system where faults can occur and the health checks Azure performs to detect them.

*The auto-recovery mechanism is enabled and available on virtual machines of all sizes and offerings, across all Azure regions and datacenters.

Author Credits: This article was written by Utkarsh Pandey, Azure Solution Architect at 8KMiles Software Services and originally published here


Tale of how a Fortune 50 Giant got the right Identity Access Partner

As organizations constantly seek to expand their market reach and attract new business opportunities, identity management (specifically SSO, user provisioning and management) has evolved as an enabling technology to mitigate risk and improve operational efficiency. As the shift to cloud-based services accelerates, identity management capabilities can be delivered as hosted services to drive operational efficiency and improve business agility. A Fortune 50 giant designed and developed a cloud-based identity management solution and offered existing and prospective SaaS vendors the opportunity to integrate with its product, to test it and take it live as a full-fledged Single Sign-On solution. 8KMiles, being a cloud-based identity services company, accepted the engagement to fulfill the client's requirements.
The company opted for 8KMiles because 8KMiles is a state-of-the-art solution provider that practices the Scrum methodology. 8KMiles never hesitated to take up ad-hoc requirements, thanks to its industry-specific team of experts offering 24/7 development support. 8KMiles pitched in to help the client by identifying their pain points and as-is scenarios, then worked extensively with the company and its respective SaaS vendors to:
1. Establish formal Business Relationship with SaaS Vendors
2. Pre-qualify SaaS Vendor
3. Configure SaaS Application for Partner company on Identity Cloud Service SAML SSO integration, Test and Certify
4. Prepare IDP Metadata
5. Establish a stringent QA process
6. Complete Documentation
a. Conformance and interoperability test report
b. SAML SSO Technical documentation
c. A video explaining the steps involved in the integration
d. Provide metadata configuration and mapping attributes details
7. Build Monitoring Tool
8. Adopt Quality Assurance with 2 level Testings (Manual & Automation)
9. Configure, integrate, troubleshoot, monitor and produce reports using 8KMiles MISPTM tool.

Thus, 8KMiles enabled this Fortune 50 Biggie to attain the following business benefits:
• Refinement of user self-service functionalities
• Activation of users & groups, and linking SaaS applications to the user accounts in the cloud
• Enablement of SSO to these SaaS Apps & enable user access via SAML2.0
• Usage of OAuth 2.0 to authorize changes to configuration.
• Adoption & Testing of different methods of SSO for the same SaaS App
• Documentation of the process in a simplistic manner
• Automation to test & report on all aspects of the integration without human involvement
For more information or details about our Cloud Identity access solution, please write to

Author Credit:  Ramprasshanth Vishwanathan, Senior Business Analyst- IAM SBU

LifeSciences Technology Trends to expect in 2017

The Life Sciences industry's dynamics are constantly changing, especially in terms of handling ever-growing data, using modern cloud technology, implementing agile business models and aligning with compliance standards. Here are some of the Life Sciences tech trends predicted for 2017.

1) Cloud to manage Ever-growing Data

The growing volume of data is one of the major concerns among Life Sciences players. There is a constant need to manage and optimize this vast data into actionable information in real time, and this is where cloud technology provides the agility required. Life Sciences companies will continue to shift to the cloud to address inefficiencies and to streamline and scale their operations.

2) Analytics to gain importance

Data is the key driver for any pharma or Life Sciences organization and determines the way drugs are developed and brought to market. The data is generally distributed and fragmented across clinical trial systems, databases, research data, physician notes, hospital records, etc., and analytics will aid greatly in analyzing, exploring and curating this data to realize real business benefits from the data ecosystem. 2017 will see a rise in trends like risk analytics, product failure analytics, drug discovery analytics, supply disruption predictive analytics and visualizations.

3) Lifesciences and HCPs will now go Digital for interactions

There was a time when online engagements were just a dream due to limitations in technology and regulations. Embracing a digital channel opens up a faster mode of communication among Life Sciences players, HCPs and consumers. These engagements are not only easy and compliant but also integrated with applications that meet industry requirements. This will help Life Sciences players reach more HCPs and meet customers' growing expectations for online interactions.

4) Regulatory Information Management will be the prime focus

When dealing with overseas markets, it is often critical to keep track of all regulatory information at various levels. Often, information on product registrations, submission content plans, health authority correspondence, source documents and published dossiers is disconnected and not recorded in one centralized place. Programs that help align and streamline all regulatory activities will therefore gain momentum this year.

To conclude, Daniel Piekarz, Head of Healthcare and Life Sciences Practice, DataArt stated that, “New start-ups will explode into the healthcare industry with disruptive augmented reality products without the previous limitations of virtual reality. As this technology advances the everyday healthcare experience, it will exist on the line between the real world and virtual in what is being called mixed reality.” Thus 2017 will see a paradigm shift in the way technology will revolutionize Life Sciences players’ go-to market leading to early adopters of the above gaining the competitive edge and reaping business benefits as compared to laggards!