How to get actual value from health data

    October 19, 2020

    The amount of data that humanity generates is growing exponentially. IDC estimates that in 2021 alone, about 75 trillion gigabytes (75 zettabytes) of data will be created. With the help of data science, financial, surveillance, and social media companies, among others, analyze this information and will keep deriving business benefits from it for years to come. Still, not all industries are equally good at adapting to and leveraging the power of data to drive their business.

    The healthcare industry generates a massive amount of data that can be used by biotech companies to deliver advanced health tools to hospitals and research labs. However, this industry is quite conservative with its unstructured records management, huge volumes of research, and unique medical cases. Sometimes, it’s hardly possible to use the same treatment methods in seemingly the same cases.

    Still, scientists, researchers, and businesses have been hard at work extracting useful information from health data, and they are showing positive results. Let’s zoom in on the most effective practices in use today.

    Medical examination

    According to a study by the National Academies of Sciences, Engineering, and Medicine, 12 million adults in the United States are misdiagnosed each year. This situation is fraught with dangerous health consequences. As reported by the BBC, 40,000 to 80,000 people in the US die annually from complications caused by diagnostic errors.

    When it comes to diagnosis, data science can be a real game-changer. The market offers a whole range of instruments to quickly analyze X-rays, CT scans, mammograms, and other types of imagery. Machine learning algorithms can learn to interpret images, identify patterns, and detect cancers, bone damage, internal organ abnormalities, and more.

    Data scientists have gone even further, making it possible to generate one kind of image from another. In healthcare, this can be useful when a patient needs multiple procedures, such as the computed tomography (CT) and MRI scans that are both required when planning radiation therapy. To calculate the radiation dose, one has to know the permeability of all tissues through which the X-rays will pass.

    To accurately assess the contours of the radiation zones, it’s best to use the information provided by an MRI scan, which is harmless to humans. However, the MRI image does not provide any data about the X-ray permeability of tissues – this info can be obtained using CT scans only. Computed tomography, though, is based on harmful X-rays. On a CT scan, the contours of various soft tissues are less visible, so patients have to do both CT and MRI and then combine the two pictures.

    To reduce the level of radiation exposure, especially if the patient is a child, and cut the overall cost of treatment planning, scientists developed a special method to generate synthetic CT images from MRI data. An AI-powered program learns to generate CT scans based on existing MRI scans. As a result, the patient undergoes one procedure instead of two, which cuts down the time and costs of the examination, and, most importantly, the radiation dose.

    Another case comes from Stanford University, where data scientists have developed a model to detect heart rhythm problems based on ECG results. Tests have shown that the algorithm detects abnormalities faster than a cardiologist does.

    Predictive analytics

    Predictive analytics is a major trend in healthcare. By analyzing the health data of millions of people, one could detect correlations and patterns and figure out why some illnesses are more common in one location than in others. Then, based on that information, one could identify risk groups and take preventive measures before a predicted outbreak.

    This is what we did when we started developing a solution that predicts whether a patient’s epilepsy is drug-resistant. Across the world, about 65 million people suffer from this disease. Sometimes, years pass before doctors find the right drug to help their patients. To develop a system for predicting drug resistance, we used historical data from 450,000 epilepsy patients. Using machine learning, we created an algorithm that predicts drug resistance with an accuracy of 82%. Predictive models like this one can help doctors find the correct drug faster and more accurately.
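
    An accuracy figure like the one quoted above is the share of patients for whom the prediction matched the observed outcome. The sketch below shows how accuracy, along with sensitivity and specificity, could be computed for a binary drug-resistance classifier. The labels are made-up illustrative values, not data from the project, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary labels.

    Label 1 means "drug-resistant", label 0 means "not drug-resistant".
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Illustrative labels only (assumption): 10 hypothetical patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

    Reporting sensitivity and specificity next to accuracy matters when one class is much rarer than the other, because accuracy alone can look high even if most resistant cases are missed.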

    Here’s another case where predictive analytics was successfully implemented. In 2017, the Philadelphia-based Penn Medicine healthcare system began collecting data from patients’ electronic medical records. Over the past three years, it has been using a machine-learning algorithm to make prognostic estimates. The resulting score, based on 30 factors, helps medical staff make a prognosis for the next six months. Ultimately, the system identifies the patients with the highest risk of a bad outcome upon admission to the hospital. This helps doctors recognize these patients early and actively engage with them.
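
    To make the idea of a multi-factor risk score concrete, here is a minimal sketch that combines a few factors into a score with a logistic function and ranks patients by it. The factor names, weights, and bias are invented for illustration; this is not the Penn Medicine model, whose factors and method are not described here.

```python
import math

# Hypothetical factor weights (assumption); a real system would learn
# these from historical data and would use many more factors.
WEIGHTS = {"age_over_75": 0.8, "prior_admissions": 0.5, "abnormal_labs": 1.1}
BIAS = -2.0


def risk_score(factors):
    """Return a score in (0, 1) from a logistic model over the factors."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in factors.items())
    return 1.0 / (1.0 + math.exp(-z))


def highest_risk(patients, top_n):
    """Return the IDs of the top_n patients, highest score first."""
    ranked = sorted(patients, key=lambda pid: risk_score(patients[pid]), reverse=True)
    return ranked[:top_n]


# Made-up patients for illustration only.
patients = {
    "A": {"age_over_75": 1, "prior_admissions": 3, "abnormal_labs": 1},
    "B": {"age_over_75": 0, "prior_admissions": 0, "abnormal_labs": 0},
    "C": {"age_over_75": 1, "prior_admissions": 0, "abnormal_labs": 1},
}
print(highest_risk(patients, 2))
```

    Ranking by score, rather than applying a fixed yes/no cutoff, lets staff decide how many patients they have the capacity to follow up with.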

    So, armed with enough high-quality historical data, you can build predictions for a wide range of cases in medicine, from drug prescription to the outcome of a specific treatment.


    Pharmaceutical companies spend up to $2.6 billion to develop a new drug, and it takes 12 years to release it to the market. But now, with all kinds of healthcare data processing applications coming into play, the process has become easier. Pharmacy data analytics allows scientists to process hundreds of thousands of clinical trial results in a matter of weeks, simulate how the human body responds to a particular drug, and accelerate the development of a medicine or vaccine by up to one year. Data science and machine learning act as enablers here, having revolutionized R&D in the pharmaceutical industry.

    In 2020, the British startup Exscientia and the Japanese company Sumitomo Dainippon Pharma announced that their machine learning algorithms had invented a drug molecule intended to treat obsessive-compulsive disorder. Researchers said that the algorithm developed the drug in just 12 months, compared to the 5 years it usually takes to reach human trials.

    Aggregation of research works

    Information extraction is a crucial natural language processing (NLP) task: it discovers important knowledge hidden in unstructured clinical data. Every single day, thousands of new medical articles are published on the Internet, describing the nature of illnesses and methods of their treatment. Each scientific work contributes to the evolution of healthcare; each new discovery brings humanity closer to overcoming another disease.

    However, there is another side to this coin. The main obstacle to the effective use of scientific articles is that there are too many of them, and a single keyword search is not enough. As a result, researchers need costly and time-consuming text review. In 2020, Google teamed up with Microsoft, the National Library of Medicine, and the Allen Institute for AI to release the Covid-19 Open Research Dataset (CORD-19). It is meant to enable the global AI community to use text and data mining approaches, as well as NLP techniques, to find solutions in response to the pandemic.

    The dataset consists of 29,000 documents related to the new virus and the broader family of coronaviruses, 13,000 of which have been processed so that computers could read basic data, information about the authors, and their affiliations.
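
    One common step beyond a single keyword search is to rank documents by how relevant they are to a multi-term query. The sketch below uses a simple TF-IDF score in plain Python. The three documents are invented examples rather than CORD-19 records, and the helper names `tokenize` and `rank` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def rank(query, docs):
    """Return document indices sorted by descending TF-IDF score."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Smoothed inverse document frequency.
                idf = math.log((1 + n) / (1 + df[term])) + 1.0
                score += (tf[term] / len(toks)) * idf
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Invented example documents (assumption), not real dataset entries.
docs = [
    "Coronavirus spike protein binds the ACE2 receptor.",
    "Hospital staffing levels during the influenza season.",
    "ACE2 receptor expression in lung tissue of coronavirus patients.",
]
print(rank("coronavirus ACE2 receptor", docs))
```

    Real text-mining pipelines typically go further, for example with stemming, synonym handling, or learned embeddings, but the ranking idea is the same.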


    Just like any emerging trend, data science is facing certain challenges. In healthcare, the ethical aspect comes to the fore. Philips’ Future Health Index 2019 study found that data privacy is a significant barrier to digital health adoption. People want to know how secure the information they upload to a computer for analysis or send to their doctor really is.

    It will be easier for people to embrace change in healthcare when everyone understands that innovation is not there to replace medical staff. Digital technology only helps professionals make the most accurate and informed decisions. The neural network can identify the illness based on its symptoms and suggest prescription options, but patients can rest assured that the doctor still has the last word – only healthcare professionals are authorized to make the final diagnosis and determine the necessary treatment.

    Healthcare data science struggles with not just ethical but also technical concerns. Too often, there is a lack of complete, consistent, representative, pre-labeled data that could be used to train a machine to analyze and classify materials and make predictions. Health information is still collected and processed manually. It is a laborious, monotonous, and time-consuming process that often lacks resources.

    Even if there is enough data, problems can come up at the stage of implementing the ready-to-use algorithms. Many illnesses evolve over time, and common disorders might display a whole variety of signs. It is impossible to predict how the system is going to behave if it confronts an unusual situation. Most algorithms can only pass a final verdict – yes or no, norm or pathology. Not a single algorithm can yet report: “I have never seen this, and I do not know what it is.” So, computers should be taught not only to give an answer but also to assess how reliable the results are.
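
    One simple way to let a model say “I do not know” is to have it abstain whenever its confidence in the best answer falls below a threshold. The sketch below illustrates this with a softmax over raw class scores. The function name `predict_or_abstain` and the 0.8 threshold are assumptions chosen for illustration, and such confidence values can still be overconfident on inputs unlike the training data, so this is a starting point rather than a full solution.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_abstain(logits, labels, threshold=0.8):
    """Return (label, confidence), or ("unknown", confidence) when unsure."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence < threshold:
        return "unknown", confidence
    return labels[best], confidence


LABELS = ["norm", "pathology"]

# Hypothetical raw scores from some classifier (assumption).
print(predict_or_abstain([4.0, 0.0], LABELS))  # clear-cut case
print(predict_or_abstain([0.2, 0.0], LABELS))  # ambiguous case
```

    Cases returned as "unknown" can then be routed to a clinician for review instead of receiving an automatic verdict.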

    What is the future of health data?

    The deployment of new technologies can be a lengthy process complicated by ethical, legal, and financial issues. However, the very fact that data science is in such high demand in healthcare proves that it does help us deal with problems more efficiently. Government agencies have started to embrace this, digitizing healthcare in public-funded programs, while big companies keep hiring data science specialists.

    According to CB Insights, every month, investors are funding more and more companies and startups working on AI-based healthcare solutions. In the first quarter of 2020, venture capital investment in healthcare AI startups globally exceeded $980 million. Researchers expect that these innovations will soon become part of doctors’ daily routine and help improve the quality of life around the world.

    • #AI
    • #Data analytics
    • #Data science
    • #Healthcare
