Blood Cancer Type Classification Solution
- #Healthcare
- #Machine learning
About the Client
A company that uses predictive analytics and machine learning to deliver cost-effective solutions within the healthcare industry, helping private and governmental institutions achieve better results in diagnostics.
Business Challenge
The three most common types of blood cancer are leukemia, lymphoma, and myeloma. To determine the type, the patient undergoes a biopsy. The biopsy results are then sent to a pathology lab for testing, together with the doctor's note on which cancer type to test for. Unfortunately, almost 30% of doctors cannot identify the right type even at the biopsy stage, so the biopsy has to be repeated. As a result, identifying and confirming the blood cancer type causes physical and emotional distress to patients waiting for confirmed results.
Our client wanted to help doctors predict blood cancer types more accurately based on patients' historical data. The project's goal was to develop an AI solution that runs on Windows 10 and assists medical staff in classifying blood cancer types using patients' symptoms, historical data, and demographic and medical parameters.
Solution Overview
Quantum has developed a solution based on an AI model trained on historical blood cancer diagnostic data from many cases. It classifies the cancer type using the following criteria:
- Patient demographics
- Detailed CBC (complete blood count) test results
- Detailed chemistry test results
- Patient history data
To get results, the user runs a script that takes as input a tabular file in a predefined format containing patients' historical data.
The solution processes the file and returns a cancer type classification for each patient in the input data.
As a result, doctors get a recommendation system that highlights the specific parameters deserving special attention for a correct diagnosis.
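The predefined input format is not published, so the following Python sketch only illustrates the idea of checking an input file before classification. The column names in `REQUIRED_COLUMNS` are hypothetical placeholders, a few per criteria group listed above, and `read_patients` is an illustrative helper rather than part of the delivered solution.

```python
import csv
import io

# Hypothetical column names: the real predefined format is not published.
REQUIRED_COLUMNS = {
    "patient_id",
    "age", "sex",                        # demographics
    "wbc", "hemoglobin", "platelets",    # CBC results
    "ldh", "calcium",                    # chemistry test results
    "prior_diagnoses",                   # patient history
}


def read_patients(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text, failing early if any required column is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Accessing fieldnames reads the header row (None for empty input).
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"input is missing columns: {sorted(missing)}")
    return list(reader)
```

Rejecting a malformed file up front gives the user a clear error message instead of a misleading prediction.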
Project Description
The project was developed in three consecutive stages, organized according to CRISP-DM recommendations.
Stage 1. Data cleaning and preprocessing
This stage consisted of various data manipulations to obtain a clean dataset ready for training models in the next stage.
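The exact data manipulations are not listed, so the sketch below shows three typical cleaning steps under stated assumptions: records without a target label are dropped, exact duplicates are removed, and missing numeric values are filled with the column median. The `clean` function and the `cancer_type` label name are hypothetical.

```python
import statistics
from typing import Optional, Union

Value = Optional[Union[float, str]]
Record = dict[str, Value]


def clean(records: list[Record], label: str = "cancer_type") -> list[Record]:
    """Hypothetical cleaning pass: drop unlabelled rows and duplicates, impute medians."""
    # 1. Rows without a target label cannot be used for supervised training.
    labelled = [r for r in records if r.get(label)]

    # 2. Remove exact duplicates while keeping the first occurrence.
    seen: set[tuple] = set()
    unique: list[Record] = []
    for r in labelled:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Fill missing numeric values with the median of the observed values.
    numeric_cols = {k for r in unique for k, v in r.items() if isinstance(v, (int, float))}
    medians = {
        col: statistics.median(r[col] for r in unique if isinstance(r.get(col), (int, float)))
        for col in numeric_cols
    }
    return [{**r, **{c: medians[c] for c in numeric_cols if r.get(c) is None}} for r in unique]
```

The median is used here rather than the mean because lab values such as white blood cell counts can be heavily skewed by a few extreme cases.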
Stage 2. Modeling and evaluation
The next step was to try out different models to identify the best one for blood cancer classification. The main deliverable of this stage was a trained AI model ready to be used for cancer type classification.
Stage 3. Integration
The final stage aimed to develop the set of scripts that end users need to run the solution on their local computers. A Docker image was built to ease deployment on Windows 10. The key deliverable of this stage is a command file that takes an input file with patients' data, runs the classification model, and adds cancer type classification columns to the file.
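A minimal sketch of what such a command might do, assuming a CSV input and a model exposed as a plain Python callable that returns a probability per cancer type. The function name, the `predicted_type` and `predicted_probability` column names, and the predictor interface are assumptions for illustration; in practice the trained model would have to be wrapped to fit this interface.

```python
import csv
import io
from typing import Callable, Mapping

# Hypothetical interface: takes one patient's row, returns {cancer_type: probability}.
Predictor = Callable[[Mapping[str, str]], dict[str, float]]


def add_classification_columns(csv_text: str, predict: Predictor) -> str:
    """Return the CSV with the most likely type and its probability appended to each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["predicted_type", "predicted_probability"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        probs = predict(row)
        best = max(probs, key=probs.get)  # class with the highest probability
        row["predicted_type"] = best
        row["predicted_probability"] = f"{probs[best]:.3f}"
        writer.writerow(row)
    return out.getvalue()
```

Keeping the original columns and appending the prediction lets medical staff see each recommendation next to the patient data it was based on.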
Technological Details
During the project, we tried different classification models, such as Logistic Regression, Random Forest, XGBoost, and others. After a thorough examination, we chose the XGBoost model with tuned hyperparameters, which, after feature engineering, gave the best F1 score. We chose the F1 score as the evaluation metric, and it turned out to be a good choice for imbalanced datasets.
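To illustrate why plain accuracy can be misleading on imbalanced data, the sketch below computes a macro-averaged F1 score from scratch. Which averaging was used in the project is not stated; macro averaging, which weights every cancer type equally, is an assumption here. scikit-learn offers the same computation as `f1_score(y_true, y_pred, average="macro")`.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    pred_counts = Counter(y_pred)
    true_counts = Counter(y_true)
    scores = []
    for c in labels:
        precision = tp[c] / pred_counts[c] if pred_counts[c] else 0.0
        recall = tp[c] / true_counts[c] if true_counts[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

For example, a model that predicts leukemia for every patient in a set of eight leukemia, one lymphoma, and one myeloma case reaches 80% accuracy, but its macro F1 is only about 0.30, because it never recognizes the two minority classes.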
The system was wrapped in a Docker image so that it can be quickly set up on any machine and delivered to the end user. In addition, DVC (Data Version Control) was used to make the research reproducible and to track experiments, data, and code easily.