Healthcare Analytics Made Simple

By: Vikas (Vik) Kumar, Shameer Khader

Overview of this book

In recent years, machine learning and analytics have been widely adopted across the healthcare sector. Healthcare Analytics Made Simple bridges the gap between practising doctors and data scientists. It equips data scientists to work with healthcare data and to draw better insights from it in order to improve healthcare outcomes. This book is a complete overview of machine learning for healthcare analytics, briefly describing the current healthcare landscape, machine learning algorithms, and the Python and SQL programming languages. Step-by-step instructions teach you how to obtain real healthcare data and perform descriptive, predictive, and prescriptive analytics using popular Python packages such as pandas and scikit-learn. The latest research results in disease detection and healthcare image analysis are also reviewed. By the end of this book, you will understand how to use Python for healthcare data analysis; how to import, collect, clean, and refine data from electronic health record (EHR) surveys; and how to build predictive models from this data using real-world algorithms and code examples.
Table of Contents (11 chapters)

Importing the dataset

Before we load the dataset, there are some important facts about the data that must be acknowledged:

  • The data is in a fixed-width format, meaning that there is no delimiter. Column widths will have to be specified manually.
  • There is no header row that has column names.
  • If you were to open the data file using a text editor, you would see rows of data simply containing numbers.
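
To make these three points concrete, here is a minimal sketch of reading headerless, delimiter-free rows with pandas. The three records, the column widths, and the column names are all made up for illustration; they are not taken from the actual dataset.

```python
import io

import pandas as pd

# Three made-up records: no delimiter and no header row, only digits.
raw = "0125301\n0347112\n1160905\n"

# Hypothetical layout: the widths and names below are assumptions.
widths = [2, 3, 2]
names = ["FIELD_A", "FIELD_B", "FIELD_C"]

# header=None tells pandas the first row is data, not column names.
# dtype=str keeps leading zeros, which matters for coded values.
df = pd.read_fwf(
    io.StringIO(raw),
    widths=widths,
    names=names,
    header=None,
    dtype=str,
)

print(df)
```

Without the widths, pandas would have no way to tell where one field ends and the next begins, which is why the schema has to come from somewhere outside the data file.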

Because column widths are required to import .fwf files, we must first load the widths into our session. We have therefore made a helper .csv file, ED_metadata.csv, that contains the width, name, and variable type of each column. Our data has only 579 columns, so making this file took just a couple of hours. If you have a bigger dataset, you may have to rely on automated width detection methods and/or more team members to do the grunt work of creating a schema for your data...
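
As a rough sketch of the metadata-driven approach, the snippet below reads a small schema table and passes its widths and names to `pd.read_fwf`. Everything here is an assumption for illustration: the header names `width`, `column_name`, and `variable_type`, the three example columns, and the in-memory stand-ins used in place of ED_metadata.csv and the real data file.

```python
import io

import pandas as pd

# Stand-in for ED_metadata.csv. The header names and rows are assumed;
# the real file describes 579 columns.
metadata_csv = io.StringIO(
    "width,column_name,variable_type\n"
    "2,VISIT_MONTH,CATEGORICAL\n"
    "3,WAIT_MIN,CONTINUOUS\n"
    "2,AGE_YEARS,CONTINUOUS\n"
)
df_meta = pd.read_csv(metadata_csv)

# Pull the schema out as plain Python lists.
widths = df_meta["width"].tolist()
names = df_meta["column_name"].tolist()

# Stand-in for the fixed-width data file (made-up records).
raw_lines = ["0125301", "0347112"]

# Sanity check: the widths should account for every character in a row.
assert all(len(line) == sum(widths) for line in raw_lines)

df_ed = pd.read_fwf(
    io.StringIO("\n".join(raw_lines) + "\n"),
    widths=widths,
    names=names,
    header=None,
)

print(df_ed.head())
```

With real files, the two `io.StringIO` objects would be replaced by file paths. Keeping the schema in a separate table also means the variable types can later be reused, for example to decide which columns to treat as categorical.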