
10 Exciting Examples of Machine Learning Applications in Healthcare

The Industry That Offers the Most Thrilling Jobs to Data Scientists


Photo by MART PRODUCTION from Pexels

Global healthcare expenditure amounts to 10.3% of global gross domestic product, or nearly $9 trillion, and is expected to grow by 3.9% annually over the coming years.

These are impressive numbers, and, unsurprisingly, digitalization and new technologies are fundamentally transforming the healthcare business.

More and more people use wearables and other technology to monitor their health. The number of wearables was estimated at one billion in 2022, representing a $93 billion market.

Telemedicine is expanding rapidly, and nearly 30% of consumers now choose a virtual visit first. Medical equipment has become more advanced and integrates AI to support doctors and physicians. Finally, to improve providers' services and enhance the customer experience, data ecosystems with end-to-end data integration are being built.

The care model is changing, and our health and wellness are becoming data-driven.


The healthcare industry can be divided into the following five sectors:

  • Healthcare providers and facilities like hospitals, surgery centers, nursing homes, or doctors and physicians
  • Medical equipment and devices like diagnostic equipment, orthopedic devices, or medical instruments
  • Distributors and wholesalers, like pharmacies or distributors of drugs and equipment to healthcare providers
  • Health insurance and managed care
  • Pharma, life sciences, and biotechnology

COVID-19 accelerated innovation and digitalization in all these sectors, bringing consumers and patients more convenience and benefits.

The rapid development of COVID-19 vaccines was only possible thanks to data-driven development approaches. In radiology, the quality of diagnoses has improved through image recognition algorithms that detect even minimal abnormalities such as cancer metastases.

From social media posts to wearable data and medical records, various data are used to predict health conditions and diseases.

But the disruption is not limited to the consumer side. Medical practitioners and healthcare professionals are changing their behavior, too.

Medical nomads are a growing group, and dedicated platforms for them already exist. Data availability and the information extracted from it are the key drivers of such developments, ranging from matching experience and schedules to diagnosis tools.

As a result, the generation of health-related data is skyrocketing, and healthcare is undergoing considerable disruption.


An analysis of job platforms shows that around one third of open data scientist positions are in the healthcare industry. Unsurprisingly, data scientist profiles on LinkedIn reveal that nearly as many data scientists work in this industry as in the tech sector.

Most aspiring and senior data scientists focus solely on the tech industry and miss out on this exciting one. In addition, there is an entry barrier because highly specialized knowledge is needed. Besides business acumen, solid experience with (bio)statistics, causality, and model accuracy beyond the usual basics is required. The results of the algorithms affect human health and, in extreme cases, can literally decide between life and death.

However, the high demand for data scientists offers a unique opportunity to enter this industry without prior experience in the field, and I highly recommend taking it.

I worked for many years as a Data Science consultant in the healthcare industry and provided services for all five aforementioned sectors.

In the following, I present 10 case studies of machine learning applications across all five healthcare sectors.


Healthcare providers and facilities

1. Hospitalization decision support for cancer patients

Cancer patients suffer from many secondary disorders caused by the illness and from adverse effects of therapies. It is often unclear whether specific symptoms like fever or nausea are caused by the cancer, its treatment, or another illness such as flu or a cold. It is even more challenging to decide whether a patient needs to go to the hospital.

Patients usually wait too long, which leads to one of two problems. Either they go to the hospital too late, putting their health at high risk and causing longer convalescence and higher costs, or they have to go to the emergency department outside regular hours, where they are treated by healthcare professionals unfamiliar with their medical history, which can lead to wrong or unnecessary treatments.

To address this, a mobile app uses wearable data and information entered by the patient to provide indications and further details on a dashboard, so that the patient can decide earlier and more reliably whether a hospital visit is needed.

Machine learning support in acute disease management is an evolving trend, and not only for cancer patients.

The machine learning methods applied are XGBoost (a gradient boosting library available for Python, Julia, Java, and other languages), random forests, and support vector machines. When enough data is available, recurrent neural networks give excellent results.
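To make the idea behind gradient boosting more concrete, here is a minimal sketch in plain Python. It assumes two hypothetical wearable-derived features and a handful of invented records. It fits one-split trees (stumps) to the gradient of the log-loss with simple first-order updates, whereas XGBoost uses a regularized, second-order variant, so treat this as an illustration of the principle rather than a substitute for the library.

```python
import math

# Hypothetical toy records (invented for illustration, not clinical data):
# (max body temperature in °C, resting heart rate in bpm) from a wearable/app,
# label 1 = a hospital visit turned out to be needed, 0 = it was not.
X = [(36.8, 70), (37.1, 75), (37.4, 80), (38.6, 105), (39.0, 110), (38.9, 98)]
y = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, r):
    """Least-squares fit of a one-split tree (stump) to the residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda x: lv if x[j] <= t else rv


def gradient_boost(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for log-loss: each stump fits the negative gradient y - p."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))  # start from the base-rate log-odds
    stumps = []

    def score(x):
        return f0 + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [yi - sigmoid(score(x)) for x, yi in zip(X, y)]
        stumps.append(fit_stump(X, residuals))
    return lambda x: sigmoid(score(x))


predict = gradient_boost(X, y)
print(round(predict((39.2, 112)), 2), round(predict((36.9, 72)), 2))
```

The output is a probability, which an app could translate into a recommendation; where to place that cut-off is itself a clinical decision.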

2. Personalized health treatments

Personalized treatment is not new in medicine; the ancient Greeks already aimed to develop personalized cures and drugs. What is new is the availability of health data from larger populations, more granular data per person such as genotypes and phenotypes, and continuous data such as heart rate or continuous glucose monitoring readings. These make the shift to data-driven decision-making possible.

An example is the personalized treatment of diabetes, where machine learning algorithms improve therapy. Globally, more than 400 million people have diabetes, which can cause blindness, heart attacks, strokes, and kidney failure, among other complications.

The effectiveness of diabetes medication depends on many variables related to personal lifestyle, health, and physical activity, and these factors change over time. Due to rapid medical advances, therapy options are changing too.

Determining the initial medication is therefore challenging, and its ongoing adjustment and refinement even more so. Machine learning and deep learning algorithms increasingly support doctors in diagnosing and prescribing the most effective treatment.

Methods like Support Vector Machines (SVM), Random Forest, and k-nearest neighbors are used for clinical decision support and patient self-management tools. Logistic regression and multilayer perceptrons help predict an individual patient's treatment outcome.
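A k-nearest neighbor recommender can be sketched in a few lines: a new patient is compared with past patients, and the therapy that worked for the most similar ones is suggested. The features (HbA1c, BMI, age), the therapy labels, and all values below are hypothetical. Features are standardized first so that age, with its larger numeric range, does not dominate the distance.

```python
import math
from collections import Counter

# Hypothetical past patients: (HbA1c in %, BMI, age in years) -> therapy that worked best.
# All values and therapy labels are invented for illustration.
patients = [
    ((7.2, 24.0, 45), "therapy_A"),
    ((7.5, 26.5, 50), "therapy_A"),
    ((6.9, 23.1, 38), "therapy_A"),
    ((9.1, 33.0, 61), "therapy_B"),
    ((8.8, 31.2, 58), "therapy_B"),
    ((9.4, 35.5, 66), "therapy_B"),
]


def knn_recommend(train, query, k=3):
    """Majority vote among the k nearest past patients after z-score scaling."""
    cols = list(zip(*(x for x, _ in train)))
    means = [sum(c) / len(c) for c in cols]
    # population standard deviation; fall back to 1.0 for constant features
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(x):
        return [(v - m) / s for v, m, s in zip(x, means, stds)]

    q = scale(query)
    neighbors = sorted((math.dist(scale(x), q), label) for x, label in train)[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(knn_recommend(patients, (7.0, 24.5, 42)))  # therapy_A
```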


Medical equipment and devices

3. Reduction of false positives / false negatives in diagnostic tests

In many applications, such as sensor alerts, the goal is to reduce false positives. A false positive is a test result that incorrectly indicates that a particular condition, e.g., a disease, is present when it is not. In medicine, however, false negatives are equally important. A false negative incorrectly indicates that no condition is present when there actually is one.
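The two error types can be made precise with a confusion matrix. The following sketch, assuming binary labels where 1 means "condition present" and both classes occur in the data, counts them and derives the usual rates; the labels are invented.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 means 'condition present'."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fp = pairs.count((0, 1))  # healthy, but flagged as ill
    tn = pairs.count((0, 0))
    fn = pairs.count((1, 0))  # ill, but missed
    return tp, fp, tn, fn


def error_rates(y_true, y_pred):
    """Assumes both classes occur in y_true (otherwise a denominator is zero)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
        "sensitivity": tp / (tp + fn),  # share of actual cases detected
        "specificity": tn / (tn + fp),  # share of healthy people cleared
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 5, 1)
```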

Let’s look at examples to understand the issue better.

Breast cancer is the cancer type that causes the most deaths among women globally. Current breast screening methods are known to miss 10% to 30% of breast cancers, which leads radiologists to label findings as positive more readily. This results in up to 30% false positives in the subsequent detailed diagnosis. In this case, the goal is to reduce false positives.

On the other hand, early detection of lung cancer can save many lives, yet false-negative rates in the initial diagnosis can reach double-digit percentages. For lung cancer, the need is therefore to reduce false negatives so that no cancer is missed.

The problem to solve is therefore highly contextual and needs to be fully understood.
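Which error to reduce is often controlled by the decision threshold applied to a model's score. The following sketch, using invented scores and labels and hypothetical function names, shows that lowering the threshold trades false negatives for false positives. It then picks the highest threshold that still reaches a required sensitivity, as one might for lung cancer screening.

```python
def error_counts(scores, labels, threshold):
    """(false positives, false negatives) when scores >= threshold are called positive."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def threshold_for_sensitivity(scores, labels, min_sensitivity):
    """Highest threshold whose sensitivity reaches min_sensitivity.

    Sensitivity can only grow as the threshold falls, so the first threshold that
    qualifies (scanning downwards) also has the fewest false positives among them.
    """
    positives = sum(labels)
    for t in sorted(set(scores), reverse=True):
        _, fn = error_counts(scores, labels, t)
        if (positives - fn) / positives >= min_sensitivity:
            return t
    return min(scores)


# Invented model scores and true labels (1 = cancer present).
scores = [0.95, 0.80, 0.62, 0.40, 0.70, 0.35, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for t in (0.75, 0.5):
    print(t, error_counts(scores, labels, t))  # 0.75 (0, 2), then 0.5 (1, 1)
t_lung = threshold_for_sensitivity(scores, labels, 1.0)
print(t_lung, error_counts(scores, labels, t_lung))  # 0.4 (1, 0)
```

For the breast cancer case, one would instead constrain the false-positive side, e.g., by requiring a minimum specificity.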

To reduce false positives or false negatives, machine learning methods are applied to the diagnosis results: logistic regression or random forests, two methods that work well in these medical applications, classify each result as a false positive/false negative or not. When a large amount of data is available, e.g., COVID-19 data, convolutional neural networks (CNNs) give excellent results and improve accuracy substantially.

4. Improvement of medical device performance and higher quality maintenance

Electronic medical devices are becoming mainstream with the rise of technology and more sophisticated machine learning and AI applications that tremendously improve the accuracy of results.

Examples of medical devices are diagnostic equipment like computed tomography scanners, ventilators, pacemakers, heart-lung machines, diabetes monitoring tools, and infant incubators. These devices provide vital measurements of the patient's condition, and medical professionals must be able to rely on their correct operation and measurements. Otherwise, patients could be seriously injured or die.

Understandably, such medical devices require approval by the supervisory authority and a high level of safety and quality assurance. As a result, their maintenance costs are high.

To get an impression of the safety level needed, consider a device with 99.9% reliability, which seems relatively high. Our heart beats about 100,000 times a day. With 99.9% reliability, the device would on average miss every 1,000th heartbeat, or 100 heartbeats a day, which is a lot.

A medical device therefore needs much higher reliability.
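Written out, the arithmetic of this example, and the reliability needed to bring the expected number of missed beats below one per day, looks as follows. It assumes, as a simplification, that each of the N ≈ 100,000 daily heartbeats is handled independently with reliability r:

```latex
\begin{aligned}
\mathbb{E}[\text{missed beats per day}] &= N\,(1 - r) = 100{,}000 \times (1 - 0.999) = 100,\\
N\,(1 - r) < 1 \;&\Longleftrightarrow\; r > 1 - \frac{1}{N} = 1 - \frac{1}{100{,}000} = 0.99999.
\end{aligned}
```

Under this simplification, the device would need better than 99.999% reliability just to miss fewer than one beat per day on average.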

With today's data availability, machine learning algorithms are being developed to predict device performance and maintenance needs. Importantly, such support systems must comply with standards like IEC 60601-1 (Medical electrical equipment, Part 1: General requirements for basic safety and essential performance), which adds complexity to model development.

The ML models are based on 30 to 50 variables, such as device age, manufacturer, technical measurements like voltage, performance inspection results, and past safety inspection decisions.

Decision trees have the advantage of being explainable, and they also often give the most accurate results. Other methods used are Random Forest, Support Vector Machines, and Naïve Bayes. Because explainability is required, neural networks and other deep learning algorithms are not used.
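Part of what makes decision trees explainable is that every split is a readable rule chosen by a simple criterion. As a sketch, the following finds the best single split by Gini impurity (the criterion used, for example, by CART) on hypothetical device records. The feature names and values are invented; a real model would use many more variables and grow a full tree.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1-p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels, feature_names):
    """Return (weighted Gini, feature name, threshold) of the best 'feature <= threshold' split."""
    n = len(rows)
    best = (float("inf"), None, None)
    for j, name in enumerate(feature_names):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if impurity < best[0]:
                best = (impurity, name, t)
    return best


# Hypothetical device records: (device age in years, voltage deviation in %),
# label 1 = failed the next safety inspection. Invented values.
rows = [(2, 0.5), (3, 1.0), (4, 0.8), (8, 3.5), (9, 4.0), (10, 2.9)]
labels = [0, 0, 0, 1, 1, 1]
impurity, feature, threshold = best_split(rows, labels, ["device_age_years", "voltage_deviation_pct"])
print(f"IF {feature} <= {threshold} THEN no maintenance flag ELSE schedule maintenance")
```

The printed rule is exactly the kind of statement an auditor or technician can check, which is much harder with a neural network.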


Distributors and wholesalers

5. Storage and distribution quality control of vaccine cold chains

Many drugs, especially vaccines, need to be refrigerated or frozen. Recent examples are the COVID-19 vaccines from Pfizer/BioNTech, which must be stored between -80 and -60 degrees Celsius, and from Moderna, which must also be kept frozen. This so-called cold chain must be maintained throughout the entire logistics process, from the production plant through global delivery to the point where pharmacists or doctors administer the vaccine to patients.

Managing such a cold chain is very complex. It starts with the packaging and the amount of dry ice used to ensure the right temperature. The temperature needs to stay within a certain range while the goods are in a truck or spend several hours on a plane. Finally, the freight has to reach the end users, typically pharmacists and doctors, where it is stored until use.

Today, many sensors and advanced tracking technologies are installed to measure temperatures and GPS positions. Additional data, such as road and air traffic data, road conditions, airport information, vehicle specifications, weather conditions, and packaging data, are integrated as well.

Machine learning is used to tackle several challenges, starting with road and air traffic predictions and logistics route optimization. Based on the expected conditions, the temperature development of the freight can be predicted and the packaging optimized in advance. Continuous measurement during delivery enables real-time temperature monitoring and predictions of how the temperature will evolve. Especially when unexpected delays occur, this makes it possible to predict when an intervention is needed during delivery.
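A very simple version of the real-time prediction is to fit a linear trend to the most recent sensor readings and extrapolate when the freight will warm past its upper limit. The sketch below assumes hourly readings in chronological order, a linear warming trend, and the -60 °C upper bound given above for the Pfizer/BioNTech vaccine; the readings are invented, and real systems would combine many more inputs.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def hours_until_limit(hours, temps_c, upper_limit_c=-60.0):
    """Hours after the last reading until the linear trend crosses upper_limit_c.

    Assumes `hours` is sorted; returns None if the trend is not warming (slope <= 0).
    """
    slope, intercept = fit_line(hours, temps_c)
    if slope <= 0:
        return None
    crossing = (upper_limit_c - intercept) / slope
    return max(0.0, crossing - hours[-1])


# Invented hourly readings from a shipment that is slowly warming.
hours = [0, 1, 2, 3, 4]
temps_c = [-78.0, -76.5, -75.0, -73.5, -72.0]
print(hours_until_limit(hours, temps_c))  # 8.0
```

If the predicted time to the limit is shorter than the remaining delivery time, an intervention (e.g., re-icing) would be scheduled.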

This application field is still in its infancy, and no clear set of methods has yet been identified as superior in a given situation. So, a wide range of supervised, unsupervised, semi-supervised, and ensemble learning methods are being tested and applied.

6. Demand forecast for drugs

Another field currently moving toward more sophisticated data-driven approaches is drug demand forecasting. For many years, forecasts have been based on simple statistics, historical sales figures, and time series models like Moving Average or ARIMA.
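For reference, the simple moving-average baseline fits in a few lines: the forecast for each of the coming periods is the mean of the last few observations. The sales figures are invented, and the window length is an arbitrary choice.

```python
def moving_average_forecast(history, window=3, horizon=3):
    """Flat forecast: each of the next `horizon` periods gets the mean of the last `window` values."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    level = sum(history[-window:]) / window
    return [level] * horizon


# Invented monthly unit sales of one drug.
sales = [120, 130, 125, 140, 150, 145]
print(moving_average_forecast(sales))  # [145.0, 145.0, 145.0]
```

Such a baseline only extrapolates past sales; it has no way to anticipate the regulatory, patent, or publicity effects described below.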

Compared to consumer goods, pharmaceuticals have several peculiarities that make forecasting more complex. The pharmaceutical industry is heavily regulated, so regulations and regulatory changes can strongly influence demand. Patents and patent expirations can completely change market competition, e.g., when generic drugs enter the market.

Special contracts or contract conditions with governments also have a significant impact on demand; for COVID-19 vaccines, for example, every government negotiated its own conditions and prices with the producers. The same holds for new research results on a drug and for social media publicity, whether negative or positive.

Drugs have tight expiration dates, so they cannot be produced long in advance. Yet the production process, with all its regulatory and quality requirements, mandatory certification processes, and regulatory approvals, must be planned and carried out well in advance. As a result, there is a large gap between how quickly production can react and how quickly demand changes.

An example is influenza vaccine production, which starts in spring for delivery in autumn. If demand for influenza vaccines is much higher than expected, as in 2020, the additional demand cannot be met for about six months.

Integrating social media data opens up a huge field for machine learning, drawing on the full range of natural language processing methods. Once the information, e.g., the sentiment towards a particular drug, has been extracted, it needs to be forecast over a horizon of, e.g., the next 90 days. The level of sentiment indicates the level of acceptance of and demand for a drug. The methods range from regression models to boosted tree-based methods, Support Vector Machines with different kernels, and neural networks.


Health insurance and managed care

7. Digital health coach for physical activities

Physical inactivity significantly increases the risk of diabetes, cardiovascular diseases, cancer, hypertension, obesity, and mental disorders. On the other hand, regular physical activity of more than one hour a week can prevent these diseases and halve mortality rates.

This saves considerable costs for health insurers and, above all, for society as a whole. Hence, there are many efforts to motivate people to exercise regularly.

At the same time, wearables are on the rise. Their benefit is that they continuously measure health-related factors and enable data-driven coaching.

An important aspect is that recommendations are individualized. For example, a person who has been physically inactive for several years needs different recommendations and motivation than an athlete who is already very active but might need some coaching on sleep or nutrition.

Machine learning is an essential tool for providing individualized coaching and incentive systems, continuously and in real time, so that recommendations are based on each person's daily activity.

First, people must be classified into responsiveness groups, e.g., people who would like to become active but need an extrinsic nudge, or people who are already somewhat active but need motivation to do more. Standard classification algorithms are applied for this.

Next, further recommendations must be given based on individual progress. Physical activity is forecast with logistic regression, AdaBoost, decision trees, Random Forest, Support Vector Machines, and neural networks. Especially for recommending behavioral changes, recurrent neural network methods like long short-term memory (LSTM) networks are used.
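Logistic regression, the first method in that list, can be sketched with plain batch gradient descent. The features (active hours last week, share of days with at least 30 active minutes), the target (whether the person met their activity goal the following week), and all values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=3000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # p - y
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - learning_rate * g / len(X) for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical users: (active hours last week, share of days with 30+ active minutes),
# label 1 = met their activity goal the following week. Invented values.
X = [(0.5, 0.20), (1.0, 0.30), (0.8, 0.25), (3.0, 0.80), (2.5, 0.75), (3.5, 0.90)]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
print(round(predict_proba(w, b, (3.2, 0.84)), 2), round(predict_proba(w, b, (0.65, 0.225)), 2))
```

A low predicted probability could then trigger a stronger nudge for that person.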

8. Quality improvement in pediatric care

The quality of healthcare services and the ability to treat complex diseases are improving continuously. Nevertheless, many challenges remain, especially in determining the dosage and duration of therapies based on individual characteristics, or for patient groups such as children, for whom few clinical studies are available.

Dosing drugs and therapies for small children remains challenging. Too high a dose can cause permanent injury, while too low a dose can delay recovery and cause secondary diseases; both can be fatal. In addition, children's immune systems are not fully developed, and their reactions to therapies can differ from those of adults. Furthermore, children often cannot verbally express how ill they feel, so adverse developments may be detected too late.

Over recent years, machine learning has therefore been incorporated into pediatric care with great success to predict the right individualized treatments for children. The field is still in its infancy and has great development potential.

One of the main obstacles to fully leveraging this potential is missing or insufficient data for methods that detect and predict complex patterns. So, simpler methods are currently used.

First, clustering algorithms like k-means are applied to identify cohorts. Then, the characteristics of the different cohorts, such as length of treatment or mortality rate, are analyzed. Because this is a critical health field, the cohort characteristics are tested against each other, e.g., with the chi-squared test for categorical variables and the Wilcoxon-Mann-Whitney test for continuous variables.
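The cohort workflow can be sketched as k-means (Lloyd's algorithm) on hypothetical patient features, followed by the Mann-Whitney U statistic comparing a continuous characteristic, here an invented length of stay, between the two cohorts. Only the test statistic is computed; for p-values and for the chi-squared test, a statistics library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.chi2_contingency`) would be the usual choice.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns the cluster index of each point."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        # assignment step: nearest centroid by Euclidean distance
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: centroid = mean of its members (kept unchanged if a cluster is empty)
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            new_centroids.append(
                tuple(sum(v) / len(members) for v in zip(*members)) if members else centroids[c]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign


def mann_whitney_u(a, b):
    """U statistic of sample a versus b: pairs with a > b count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical pediatric patients: (age in years, severity score), plus length of stay in days.
points = [(2, 1), (3, 1), (2, 2), (10, 1), (11, 2), (10, 3)]
length_of_stay = [3, 4, 3, 9, 8, 10]

assign = k_means(points, k=2)
cohort_0 = [d for d, a in zip(length_of_stay, assign) if a == assign[0]]
cohort_1 = [d for d, a in zip(length_of_stay, assign) if a != assign[0]]
u = mann_whitney_u(cohort_0, cohort_1)
print(cohort_0, cohort_1, u)  # [3, 4, 3] [9, 8, 10] 0.0
```

U ranges from 0 to len(a) * len(b); a value at either extreme, as here, means the two cohorts do not overlap at all in length of stay.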

Afterwards, various classification algorithms are applied, with Random Forest combined with cross-validation being the most common approach.

There are already a few applications in pediatric care that use deep learning on genomic data, but this work is just beginning.


Pharma, life sciences, and biotechnology

9. Early prediction of diabetes

The WHO estimates that about 422 million people globally have diabetes, and 1.6 million deaths annually can be directly linked to it.

There is no permanent cure for diabetes, and it cannot be reversed. Furthermore, diabetes leads to severe secondary diseases such as heart disease and stroke, kidney failure, nerve damage, and fatty liver degeneration, which in turn significantly increases the risk of liver cancer.

Predicting diabetes and taking preventive measures early would therefore prevent many diseases and deaths and improve quality of life.

Fortunately, several models and apps that do precisely this are already available. Independent variables include, e.g., blood glucose concentration, blood pressure, skin thickness, insulin, body mass index, age, and job type. Models are evaluated with k-fold cross-validation. Whereas methods like linear regression, decision trees, Random Forest, Naïve Bayes, and Support Vector Machines achieve an accuracy between 75% and 80%, neural networks with two or three hidden layers can reach up to 90%.
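k-fold cross-validation splits the data into k parts, trains on k-1 of them, tests on the held-out part, and rotates through all k parts. Here is a minimal sketch in plain Python; the classifier is a deliberately simple, hypothetical stand-in for the methods listed above (a glucose threshold halfway between the class means), and the values are invented.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and yield (train_idx, test_idx) for each of k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_val_accuracy(fit, X, y, k=5):
    """Mean test accuracy over k folds; fit(X_train, y_train) must return a predict function."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_glucose_threshold(X_train, y_train):
    """Hypothetical toy classifier: threshold halfway between the two class means."""
    pos = [x for x, t in zip(X_train, y_train) if t == 1]
    neg = [x for x, t in zip(X_train, y_train) if t == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > threshold)


# Invented fasting glucose values (mg/dL) and diabetes labels.
glucose = [85, 90, 95, 100, 105, 150, 160, 170, 180, 190]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_val_accuracy(fit_glucose_threshold, glucose, labels, k=5))  # 1.0
```

Because every record is used for testing exactly once, the averaged accuracy is less dependent on one lucky or unlucky split than a single train/test split.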

Diabetes applications already give robust results. Building on this success, predictions of more complex disorders like Alzheimer's disease or multiple sclerosis are in development.

10. Prediction of the success of clinical drug development

Developing a new drug costs $1.3 billion on average, and the failure rate during clinical development is 90%.

Clinical research usually comprises three phases.

Phase 1: the first stage in which drugs are tested in humans. The purpose is to assess safety, the best dose, and side effects.

Phase 2: testing drugs for efficacy (effect on the outcome) and side effects at the determined therapeutic dose.

Phase 3: testing drugs for safety, effectiveness (effect on the disease in actual practice), and efficacy at the determined therapeutic dose.

Each phase is expensive, and at the beginning of every phase the probability of success is estimated. Based on this estimate, a decision is made to either continue or abandon development of the drug.

To improve this estimate and make better decisions, machine learning methods are applied.

The data used to estimate the success rate includes variables related to clinical trials, including data from the previous phase, molecular properties, the regulatory environment, approved drugs with detailed features, patient protection, healthcare laws, market demand, company and competitor information, and information about similar drugs, i.e., drugs for similar diseases or of a similar design.

Tree-based methods are the most widely applied and the most successful.

The data forms complex patterns, and the variables have complicated relationships. So, more important than the method itself are feature selection, hyperparameter tuning, and various model robustness tests, e.g., analyzing and mitigating look-ahead bias with time series techniques. All these procedures differ for each of the three phases.
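One common time series technique against look-ahead bias is walk-forward (expanding-window) evaluation: the model is only ever trained on records that are no more recent than the ones it is tested on. The sketch below assumes each record carries a hypothetical completion year; scikit-learn's `TimeSeriesSplit` offers a similar splitter for data that is already sorted by time.

```python
def walk_forward_splits(times, n_splits=3, min_train=2):
    """Expanding-window splits over records sorted by time.

    Yields (train_idx, test_idx); every test record is at least as recent as every
    training record, so the model never learns from the "future".
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    fold = (len(order) - min_train) // n_splits
    if fold <= 0:
        raise ValueError("not enough records for the requested number of splits")
    for s in range(n_splits):
        end_train = min_train + s * fold
        # the last test fold absorbs any leftover records
        end_test = len(order) if s == n_splits - 1 else end_train + fold
        yield order[:end_train], order[end_train:end_test]


# Hypothetical completion years of past clinical trials (one per record).
years = [2012, 2010, 2015, 2011, 2014, 2013, 2016, 2017]
for train, test in walk_forward_splits(years):
    print(sorted(years[i] for i in train), "->", sorted(years[i] for i in test))
```

A random shuffle, by contrast, could let a model trained on 2016 results "predict" a 2012 trial, inflating the measured accuracy.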

These procedures are complex and time-consuming and take up most of the project time, but they are required to ensure reliable predictions. When all this is done well, success can be predicted with an accuracy of up to 90%.


Connecting the dots

Machine learning in healthcare is among the most challenging fields of application. The exploration and development of applications are still at an early stage, which makes the field enormously exciting for data scientists.

These 10 applications impressively show the potential of machine learning in healthcare. It is a real win-win-win situation:

  • It is a win for patients, who get more effective and efficient individualized treatments,
  • It is a win for the healthcare industry, which can provide better services to more people with limited resources, and
  • It is a win for data scientists, who can work on complex and exciting problems and apply all their knowledge and advanced methods.

I can highly recommend entering the healthcare industry as a data scientist.



