Healthcare Analytics Made Simple

By: Vikas (Vik) Kumar, Shameer Khader

Overview of this book

In recent years, machine learning technologies and analytics have been widely adopted across the healthcare sector. Healthcare Analytics Made Simple bridges the gap between practicing doctors and data scientists: it equips data scientists to work with healthcare data and to draw better insights from that data in order to improve healthcare outcomes. This book is a complete overview of machine learning for healthcare analytics, briefly describing the current healthcare landscape, machine learning algorithms, and the Python and SQL programming languages. Step-by-step instructions teach you how to obtain real healthcare data and perform descriptive, predictive, and prescriptive analytics using popular Python packages such as pandas and scikit-learn, and the latest research results in disease detection and healthcare image analysis are reviewed. By the end of this book, you will understand how to use Python for healthcare data analysis; how to import, collect, clean, and refine data from electronic health record (EHR) surveys; and how to build predictive models with this data through real-world algorithms and code examples.

History of healthcare analytics

The origin of healthcare analytics can be traced back to the 1950s, just a few years after the world's first computer (ENIAC) was invented in 1946. At the time, medical records were still on paper, regression analysis was done by hand, and the government offered no incentives for pursuing value-based care. Nevertheless, there was a burgeoning interest in developing automated applications to diagnose and treat human disease, and this is reflected in the scientific literature of the time. For example, in 1959, the journal Science published an article entitled "Reasoning Foundations of Medical Diagnosis" by Robert S. Ledley and Lee B. Lusted, which explains mathematically how physicians make a medical diagnosis (Ledley and Lusted, 1959). The paper covers many concepts that are central to modern biostatistics, although at times it uses terminology and symbols that we may not recognize today.

In the 1970s, as computers gained prominence and became accessible in academic research centers, there was a growing interest in developing medical diagnostic decision support (MDDS) systems, an umbrella term for broad, all-in-one computer programs that pinpoint medical diagnoses when given patient information as input. The INTERNIST-1 system is the most well known of these systems; it was developed by a group of researchers at the University of Pittsburgh in the 1970s (Miller et al., 1982). Described by its inventors as "an experimental program for computer-assisted diagnosis in general internal medicine," the INTERNIST-1 system took over 15 person-years of work to develop and involved extensive consultation with physicians. Its knowledge base spanned 500 individual diseases and 3,500 clinical manifestations across all medical subspecialties. The user starts by entering positive and negative findings for a patient, after which they can view a list of differential diagnoses and see how it changes as new findings are added. The program intelligently asks for specific test results until a clear diagnosis is reached. While it showed initial promise and captured the imagination of the medical world, it ultimately failed to enter the mainstream after its recommendations were outperformed by those of a panel of leading physicians. Other reasons for its demise (and for the demise of MDDS systems in general) may include the lack of an inviting visual interface (Microsoft Windows had not been invented yet) and the fact that modern machine learning techniques were yet to be discovered.

In the 1980s, there was a rekindled interest in artificial intelligence techniques, an interest that had largely been extinguished in the late 1960s after the limitations of perceptrons were explicated by Marvin Minsky and Seymour Papert in their book, Perceptrons (Minsky and Papert, 1969). The paper "Learning representations by back-propagating errors" by David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams was published in Nature in 1986 and marked the birth of the back-propagation-trained, nonlinear neural network, which today rivals humans in its performance on a variety of artificial intelligence tasks, such as speech and digit recognition (Rumelhart et al., 1986).

It took only a few years before such techniques were applied to the medical field. In 1990, William Baxt published a study entitled "Use of an Artificial Neural Network for Data Analysis in Clinical Decision-Making: The Diagnosis of Acute Coronary Occlusion" in the journal Neural Computation (Baxt, 1990). In the study, an artificial neural network outperformed a group of physicians in diagnosing heart attacks using findings from electrocardiograms (EKGs). This pioneering study helped to open the floodgates for a tsunami of biomedical machine learning research that persists even today. Indeed, searching for "machine learning" in the biomedical search engine PubMed returns only 9 results for 1990 and over 4,000 results for 2017, with the number of results increasing steadily in the intervening years.

Several factors are responsible for this acceleration in biomedical machine learning research. The first is the growing number and availability of machine learning algorithms, of which the neural network is just one example. From the 1990s onward, medical researchers began using a variety of alternative algorithms, including decision trees, support vector machines, and (somewhat later) random forests, in addition to traditional statistical models such as logistic and linear regression.
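To make this concrete, the following minimal sketch (not taken from the book's own examples) shows how scikit-learn, the Python package used throughout this book, exposes several of these algorithms behind a single fit/score interface; the breast-cancer dataset bundled with scikit-learn stands in for real clinical data, and the model settings are illustrative defaults rather than tuned choices:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# A small, publicly available clinical dataset that ships with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One traditional statistical model alongside the newer algorithms named above
models = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "support vector machine": SVC(),
}

# Every estimator shares the same fit/score API, which made it practical
# for researchers to try many algorithms on the same clinical data
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", round(model.score(X_test, y_test), 3))

Because every estimator follows the same interface, swapping one algorithm for another is essentially a one-line change, which helps explain how the menu of candidate models widened so quickly once such libraries appeared.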

The second factor is the increased availability of electronic clinical data. Prior to the 2000s, almost all medical data was kept on paper charts, and conducting computerized machine learning studies meant hours of manually entering that data into computers. The growth and eventual spread of electronic medical records made it much simpler to use this data to build machine learning models; additionally, more data meant more accurate models.

This brings us to the present day, in which healthcare analytics is experiencing an exciting time. Today's modern neural networks (commonly referred to as deep learning networks) now outperform humans in tasks that are more complex than EKG interpretation, such as recognizing cancer in x-ray images and predicting sequences of future medical events in patients. Deep learning often achieves this using millions of patient records, coupled with parallel computing technology that makes it possible to train large models in shorter time spans, as well as newly developed techniques for tuning, regularizing, and optimizing machine learning models. Another exciting development in present-day healthcare analytics is the introduction of governmental incentives to eliminate excessive spending and misdiagnosis in healthcare. Such incentives have led to an interest in healthcare analytics not just from academic researchers, but also from industrial players and companies looking to save money for healthcare organizations (and to make some money for themselves as well).

While healthcare analytics and machine learning algorithms aren't redefining medical care just yet, the future for healthcare analytics looks bright. Personally, I like to imagine a day when hospitals, equipped with cameras, privately and securely record every aspect of patient care, including conversations between patients and physicians and patients' facial expressions as they hear the results of their own medical tests. These words and images could then be passed to machine learning algorithms to predict how patients will react to future results, and what those results will be in the first place. But we are getting ahead of ourselves; before we arrive at that day, there is much work to be done!