Apache Superset Quick Start Guide

By: Shashank Shekhar

Overview of this book

Apache Superset is a modern, open source, enterprise-ready business intelligence (BI) web application. With the help of this book, you will see how Superset integrates with popular databases like Postgres, Google BigQuery, Snowflake, and MySQL. You will learn to create real-time data visualizations and dashboards in modern web browsers for your organization using Superset. First, we look at the fundamentals of Superset, and then get it up and running. You'll go through the requisite installation, configuration, and deployment. Then, we will discuss different columnar data types, analytics, and the visualizations available. You'll also see the security tools available to the administrator to keep your data safe. You will learn how to visualize relationships as graphs instead of coordinates on plain orthogonal axes. This will help you when you upload your own entity relationship dataset and analyze it in new, different ways. You will also see how to analyze geographical regions by working with location data. Finally, we cover a set of tutorials on dashboard designs frequently used by analysts, business intelligence professionals, and developers.
Table of Contents (10 chapters)

Distribution – histogram

After uploading the file as a table, open it for visualization and select the Histogram option. Make sure that start_date is selected as the Time Column. The time window defined by Since and Until must be large enough to include all the books, because we are not doing any analysis that is specific to a time window.

Each row in the dataset is a book, and page count is an important numerical feature. To begin with, let's look at a distribution plot of page counts. It will give us a sense of the variance in the feature's values:

Data form for a histogram chart

The number of bins in a histogram limits the granularity of questions we can answer about the variance of the feature:

Distribution plot of page counts
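
To see how the bin count limits what a histogram can tell us, here is a minimal sketch in plain Python, independent of Superset. The page counts below are made-up values for illustration only (they are not the book's dataset), and `equal_width_histogram` is a hypothetical helper, not a Superset function:

```python
def equal_width_histogram(values, bins):
    """Count values into `bins` equal-width intervals spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value is placed in the last bin, not a new one.
            idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Hypothetical page counts, for illustration only.
page_counts = [120, 180, 210, 250, 260, 300, 310, 320, 350, 400,
               420, 450, 480, 520, 600, 640, 700, 820, 900, 1100]

coarse_counts, coarse_edges = equal_width_histogram(page_counts, 5)
fine_counts, fine_edges = equal_width_histogram(page_counts, 20)

# Each coarse bin spans 196 pages; each fine bin spans 49 pages.
coarse_width = coarse_edges[1] - coarse_edges[0]
fine_width = fine_edges[1] - fine_edges[0]

print("5 bins:", coarse_counts, "bin width:", coarse_width)
print("20 bins:", fine_counts, "bin width:", fine_width)
```

With five bins, we can only say how many books fall into each wide interval. With twenty bins, the same data answers narrower questions, at the cost of sparser, noisier bars.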

Because we have set five bins, we can see that about 41-42 out of 93 books (approx. 44%-45%) have page counts of...