The project was developed in three consecutive stages, organized according to the CRISP-DM methodology.
Stage 1. Data cleaning and preprocessing
This stage applied a series of data manipulations to produce a clean dataset ready for model training in the next stage.
Stage 2. Modeling and evaluation
This stage compared several models to identify the best one for classifying blood cancer. The main deliverable of this stage was a trained AI model ready to be used for cancer type classification.
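The comparison of candidate models can be illustrated with a minimal sketch. It is not the project's actual code: the two toy models (a majority-class baseline and a nearest-centroid classifier), the function names, and the use of a single validation split with accuracy as the metric are all assumptions made for illustration only.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], str]
Trainer = Callable[[List[Features], List[str]], Model]


def train_majority(X: List[Features], y: List[str]) -> Model:
    """Hypothetical baseline: always predict the most frequent training label."""
    most_common = Counter(y).most_common(1)[0][0]
    return lambda features: most_common


def train_nearest_centroid(X: List[Features], y: List[str]) -> Model:
    """Hypothetical candidate: predict the label whose mean vector is closest."""
    counts = Counter(y)
    sums: Dict[str, List[float]] = {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    centroids = {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }

    def predict(features: Features) -> str:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(features, centroids[label])
            ),
        )

    return predict


def accuracy(model: Model, X: List[Features], y: List[str]) -> float:
    """Share of validation samples that the model labels correctly."""
    return sum(model(f) == label for f, label in zip(X, y)) / len(y)


def select_best(
    trainers: Dict[str, Trainer],
    X_train: List[Features],
    y_train: List[str],
    X_val: List[Features],
    y_val: List[str],
) -> Tuple[str, Dict[str, float]]:
    """Train every candidate and return the best name plus all scores."""
    scores = {
        name: accuracy(train(X_train, y_train), X_val, y_val)
        for name, train in trainers.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores
```

Calling `select_best` with a dictionary of trainers returns the name of the highest-scoring candidate together with every candidate's validation accuracy, so the choice can be reviewed rather than taken on trust. A real evaluation would likely use cross-validation and metrics suited to imbalanced classes.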
Stage 3. Integration
The final stage produced the scripts that end users need to run the solution on their local computers. A Docker image was built to simplify deployment on Windows 10. The key deliverable of this stage was a command file that takes an input file with patients’ data, runs the classification model, and adds cancer type classification columns to the file.
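The core of such a command file might look like the sketch below. It is illustrative only and rests on several assumptions not stated above: the input is a CSV file, the result is written to a separate output file rather than in place, and the column names `predicted_cancer_type` and `prediction_confidence` as well as the `classify_patient` placeholder are hypothetical stand-ins for the trained model and its outputs.

```python
import csv
from typing import Callable, Dict

Row = Dict[str, str]


def classify_patient(row: Row) -> Row:
    """Placeholder for the trained model (hypothetical).

    A real implementation would load the model once and return its
    prediction for the patient described by `row`.
    """
    return {"predicted_cancer_type": "unknown", "prediction_confidence": ""}


def annotate_file(
    input_path: str,
    output_path: str,
    classify: Callable[[Row], Row] = classify_patient,
) -> int:
    """Copy the input CSV to output_path with classification columns appended.

    Returns the number of patient rows processed.
    """
    with open(input_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        input_fields = list(reader.fieldnames or [])

    predictions = [classify(row) for row in rows]
    # The new columns are taken from the first prediction, if there is one.
    new_fields = list(predictions[0].keys()) if predictions else []

    with open(output_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=input_fields + new_fields)
        writer.writeheader()
        for row, prediction in zip(rows, predictions):
            writer.writerow({**row, **prediction})

    return len(rows)
```

For example, `annotate_file("patients.csv", "patients_classified.csv")` would keep every original column and append the two prediction columns to each row. Both file names are placeholders.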