Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Power BI Machine Learning and OpenAI
  • Table Of Contents Toc
Power BI Machine Learning and OpenAI

Power BI Machine Learning and OpenAI

By : Greg Beaumont
4.9 (8)
close
close
Power BI Machine Learning and OpenAI

Power BI Machine Learning and OpenAI

4.9 (8)
By: Greg Beaumont

Overview of this book

Microsoft Power BI is the ultimate solution for businesses looking to make data-driven decisions and unlock the full potential of their data. Unleashing Your Data with Power BI Machine Learning and OpenAI is designed for data scientists and BI professionals seeking to improve their existing solutions and workloads using AI. The book explains the intricacies of the subject by using a workshop-style data story for data ingestion, data modeling, analytics, and predictive analytics with Power BI machine learning. Along the way, you’ll learn about AI features, AI visuals, R/Python integration, and OpenAI integration. The workshop-style content allows you to practice all your learnings in real-life challenges and gain hands-on experience. Additionally, you’ll gain an understanding of AI/ML, step by step, with replicable examples and references. From enhancing data visualizations to building SaaS Power BI ML models, and integrating Azure OpenAI, this book will help you unlock new capabilities in Power BI. By the end of this book, you’ll be well-equipped to build ML models in Power BI, plan projects for both BI and ML, understand R/Python visuals with Power BI, and introduce OpenAI to enhance your analytics solutions.
Table of Contents (21 chapters)
close
close
1
Part 1: Data Exploration and Preparation
6
Part 2: Artificial Intelligence and Machine Learning Visuals and Publishing to the Power BI Service
10
Part 3: Machine Learning in Power BI
15
Part 4: Integrating OpenAI with Power BI

Designing a preliminary data model

Earlier in this chapter, we made two simple assumptions about data modeling:

  • Most of the time, a star schema design will provide the most efficient storage and query performance for business intelligence data models
  • Basic ML models, such as the ones you can build in this book, are usually created with a flattened table

Now that you have a grasp of the underlying data and requirements, it is time to think about the data model for your FAA Wildlife Strike data solution. Logically, you can describe your tables of data as follows:

  • STRIKE_REPORTS (from wildlife.accdb): Each row represents a report that was filed. The table of data contains both descriptive values (date, location, and type) along with values that can be summed up and averaged (height and costs).
  • Engine Codes (from read-me.xls): This contains information about the aircraft engines that can be tied to STRIKE_REPORTS.
  • Aircraft Type (from read-me.xls): This contains information about the aircraft that can be tied to STRIKE_REPORTS.
  • Engine Position (from read-me.xls): This contains information about the aircraft engine positions that can be tied to STRIKE_REPORTS.

At this point, you are faced with some data model choices. No matter what decision you make, some people might question your architecture, since there is no perfect design. Depending on how end users will use the data, the data model design may change. This book will demonstrate some of the differences in data model designs for ML models versus traditional BI designs. At a high level, there are three basic approaches you can take in Power BI:

  • Flatten: You can flatten all the data onto a single table by joining Engine Codes, Aircraft Type, and Engine Position onto STRIKE_REPORTS.
  • Star schema: You can build out a true star schema with STRIKE_REPORTS as a fact table and Engine Codes, Aircraft Type, and Engine Position as dimension tables. Some additional data from STRIKE_REPORTS would also be broken out into separate dimension tables. For example, AIRPORT_ID, AIRPORT, STATE, and FAAREGION could be separate dimension tables.
  • Hybrid design: You can build out a hybrid design using both a flattened and star schema design pattern for the sake of practicality and ease of use.

Let’s look at each of these in turn.

Flattening the data

Flattening the FAA Wildlife Strike reports’ data would require joining the Engine Codes, Aircraft Type, and Engine Position tables onto the STRIKE_REPORTS table so that everything is on one big flat table of data. The result would be something that looks like this:

Figure 1.12 – Tables on the left are combined to form a single table on the right

Figure 1.12 – Tables on the left are combined to form a single table on the right

The following table contains some, but not all, of the pros and cons of a flattened table of data:

Pros

Cons

  • Simplicity
  • No joins needed for code
  • Commonly used by data scientists
  • Can compress well with columnar databases
  • No relational data models for business users
  • Repetitive data can lead to an inefficient storage footprint
  • Limitations for queries with advanced logic
  • Less flexibility for future change and evolution of solution
  • Complex logical queries can be less efficient

Figure 1.13 – Pros and cons of a flattened table for BI

Next, let’s look at the star schema.

Star schema

A true star schema built to best practices would include relationships between Engine Codes, Aircraft Type, and Engine Position with the STRIKE_REPORTS table. It would also break off parts of the STRIKE_REPORTS table into smaller dimension tables. The following figure is a representation of the approach for a true star schema. There may be more dimension tables that would need to be broken off of the STRIKE_REPORTS table in addition to Location and Species, but this is an example of how it might look:

Figure 1.14 – Tables on the left are combined into a star schema, and some data is split off into new dimension tables

Figure 1.14 – Tables on the left are combined into a star schema, and some data is split off into new dimension tables

The following table contains some, but not all, of the pros and cons of a true star schema design:

Pros

Cons

  • Tables often line up with business logic
  • Balance of minimal data duplication and efficient queries
  • Usually expandable if the scope of the solution grows and new data is introduced
  • Traditionally considered the gold standard for BI data models
  • With modern tools, the benefits of reducing data duplication are less impactful versus older tools
  • Complicated ETL
  • Machine learning models are usually trained with flat tables of data
  • Don’t always scale well with very large data volumes having tens of billions of rows

Figure 1.15 – Pros and cons of a star schema for BI

Hybrid design

For the FAA Wildlife Strike data, combining aspects of a flattened design and a star schema is also an option. At the time of this book’s writing, the entire STRIKE_REPORTS table is fewer than 300,000 rows and has fewer than 100 columns. Only two columns contain verbose free text, so data volume is not an issue when using Power BI. For this particular use case, the differences in data storage requirements between flattened and star schema data models are minimal. With data volumes of this small size, you can design the data model to meet the needs of the solution without some of the performance concerns that would be introduced for data sources with tens of millions of rows or hundreds of columns containing free text fields. Columns of data left on a transaction table that will be used as categories are technically called degenerate dimensions. A hybrid design could look something like the following example:

Figure 1.16 – Tables from the left are enhanced and combined into a star schema with some descriptive data still in the fact table

Figure 1.16 – Tables from the left are enhanced and combined into a star schema with some descriptive data still in the fact table

The following table contains some, but not all, of the pros and cons of a hybrid design:

Pros

Cons

  • Rapid prototyping
  • Less logic in the data transformation layer
  • Flexible design
  • Possibly less performant than a star schema for traditional BI
  • Additional logical complexity for users versus a big flat table
  • Data will still need to be flattened out for machine learning
  • Not perfect for either BI or ML, but a compromise between the two

Figure 1.17 – Pros and cons of a hybrid design and considerations for additional data

Before finalizing a preliminary logical design for your FAA Wildlife Strike solution, take a step back to think about the data and the requirements. You can review the expected deliverables from earlier in the chapter, including an analytic report and predictions of damage, size, and height.

In addition to the FAA Wildlife Strike data you’ve been using, what other data might be useful for the solution? Also, what is the effort to get the data? Here are a few examples that you could research:

Additional Data Sources

Level of Effort

Date-based table of aggregations such as Month, Quarter, Season, and Holidays

Easy

Time-based table of aggregations such as hour, AM/PM, and so on

Easy

Data for flights that didn’t have a wildlife strike could provide a baseline for the percentage of flights with strikes

Difficult

Weather data that could be mapped to the date and time of wildlife strikes

Difficult

Additional data about wildlife species such as weight ranges, habitat ranges, and so on

Difficult

Figure 1.18 – Additional potential data sources for the solution

Additional flight, weather, and wildlife data could provide greater analytic and predictive value for the solution. However, adding those sources would require quite a bit of effort that exceeds the scope of your project and the length of this book. If the initial project goes well, you can circle back to these options for future enhancements.

There may be value in adding a Time table to the solution, so open up Power Query and take another look at the TIME column. Notice that 95% of the entries are empty:

Figure 1.19 – 95% of the values for TIME are empty

Figure 1.19 – 95% of the values for TIME are empty

Due to a lack of complete data, you decide to leave a Time table out of the initial build.

How about a Date table so that you can roll up data by week, month, quarter, year, holidays, weekends, and more? Looking at the INCIDENT_DATE column in Power Query, it is populated for every entry in the preview:

Figure 1.20 – INCIDENT_DATE is fully populated with date values

Figure 1.20 – INCIDENT_DATE is fully populated with date values

INCIDENT_DATE can be used as a key for connecting to a Date table containing many different date-based aggregations. You decide to pull in a Date table for the architecture. The resulting preliminary data model will now look as follows:

Figure 1.21 – A Date table is added to the preliminary data model

Figure 1.21 – A Date table is added to the preliminary data model

The Date table was not present in the source data, but in your reporting model, it will allow you to slice and dice data by day, week, month, quarter, year, weekend, and more. When you explore data in future chapters, it will add new ways to dive into and explore date-based trends. The Date table will be added in Chapter 2.

In the final section of the chapter, we’ll look at what else we need to take into account for ML.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Power BI Machine Learning and OpenAI
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist download Download options font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon