Apache Superset Quick Start Guide

By: Shashank Shekhar

Overview of this book

Apache Superset is a modern, open source, enterprise-ready business intelligence (BI) web application. With the help of this book, you will see how Superset integrates with popular databases like Postgres, Google BigQuery, Snowflake, and MySQL. You will learn to create real time data visualizations and dashboards on modern web browsers for your organization using Superset. First, we look at the fundamentals of Superset, and then get it up and running. You'll go through the requisite installation, configuration, and deployment. Then, we will discuss different columnar data types, analytics, and the visualizations available. You'll also see the security tools available to the administrator to keep your data safe. You will learn how to visualize relationships as graphs instead of coordinates on plain orthogonal axes. This will help you when you upload your own entity relationship dataset and analyze the dataset in new, different ways. You will also see how to analyze geographical regions by working with location data. Finally, we cover a set of tutorials on dashboard designs frequently used by analysts, business intelligence professionals, and developers.

Adding a database

The navigation bar lists all the features. The Sources section is where you will create and maintain database integrations and configure table schemas to use as sources of data.

Any SQL database that has a SQLAlchemy connector, such as PostgreSQL, MySQL, SQLite, or Snowflake, can work with Superset.

Depending on the databases that we connect to Superset, the corresponding SQLAlchemy connectors have to be installed:

Database      PyPI package
MySQL         mysqlclient
PostgreSQL    psycopg2
Presto        pyhive
Hive          pyhive
Oracle        cx_oracle
SQLite        Included in Superset
Snowflake     snowflake-sqlalchemy
Redshift      sqlalchemy-redshift
MS SQL        pymssql
Impala        impyla
Spark SQL     pyhive
Greenplum     psycopg2
Athena        PyAthenaJDBC>1.0.9
Vertica       sqlalchemy-vertica-python
ClickHouse    sqlalchemy-clickhouse
Kylin         kylinpy
BigQuery      pybigquery
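Before adding a database, it can help to confirm that its connector is present in the Python environment that runs Superset. The following is a minimal sketch that uses only the standard library; the CONNECTORS dictionary and the connector_status helper are illustrative names, not part of Superset, and the dictionary copies only a few rows from the table above.

```python
from importlib.metadata import PackageNotFoundError, version

# A few database-to-package pairs copied from the table above (illustrative subset).
CONNECTORS = {
    "MySQL": "mysqlclient",
    "PostgreSQL": "psycopg2",
    "Snowflake": "snowflake-sqlalchemy",
    "BigQuery": "pybigquery",
}


def connector_status(database):
    """Report whether the PyPI package listed for a database is installed."""
    package = CONNECTORS[database]
    try:
        return f"{package} {version(package)} is installed"
    except PackageNotFoundError:
        # A connector installed under another distribution name
        # (for example psycopg2-binary) is also reported as missing here.
        return f"{package} is missing; run: pip install {package}"


for name in CONNECTORS:
    print(f"{name}: {connector_status(name)}")
```

Run the script with the same interpreter (or virtual environment) that Superset uses; otherwise the result says nothing about what Superset can import.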

It is recommended that you use a database that supports the creation of views. Table joins are not supported in Superset, so when columns from more than one table have to be fetched for a visualization, create a view of that join in the database and visualize the view in Superset.
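To make the view approach concrete, here is a small sketch that uses SQLite through Python's built-in sqlite3 module. The customers and orders tables and the customer_orders view are hypothetical examples, not datasets from this book; the point is only that the join is defined once in the database and then read like a single table.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);

    -- The join is stored in the database as a view.
    CREATE VIEW customer_orders AS
    SELECT c.name AS customer_name, o.id AS order_id, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id;
""")

# A client can now query the view as if it were one table.
rows = conn.execute(
    "SELECT customer_name, SUM(amount) FROM customer_orders "
    "GROUP BY customer_name ORDER BY customer_name"
).fetchall()
print(rows)
```

Once such a view exists in a connected database, it can be registered in Superset in the same way as an ordinary table.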

The SQL queries that fetch data for visualizations are executed by the database, and Superset only retrieves the results afterwards. A database whose query execution engine scales with your data will therefore keep your dashboards responsive as the data grows.

In this book, we will work with public datasets available in Google BigQuery. We have already installed a connector for BigQuery in our installation routine, using the pip install pybigquery command, and we have set up authentication for BigQuery using a key file. Verify this by confirming that the GOOGLE_APPLICATION_CREDENTIALS environment variable points to a valid key file:

echo $GOOGLE_APPLICATION_CREDENTIALS
# It should return
> /home/<your user name>/.google_cdp_key.json
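The echo command only shows the value of the variable. A slightly stricter check, sketched below with the standard library, also confirms that the file exists and parses as JSON. The check_key_file function is an illustrative helper, not part of Superset or the BigQuery connector, and it does not validate the key with Google.

```python
import json
import os


def check_key_file(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return the key file path if the variable points to a readable JSON file."""
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    with open(path) as handle:
        json.load(handle)  # raises ValueError if the file is not valid JSON
    return path


try:
    print("Key file found:", check_key_file())
except (RuntimeError, ValueError, OSError) as error:
    print("Credential check failed:", error)
```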

Now, let's add BigQuery as a database in three steps:

  1. Select the Databases option from the Sources drop-down list and click + to create your first database
  2. Set Database to superset-bigquery and SQLAlchemy URI to bigquery://
  3. Save the database

You can verify the database connection by clicking on the Test Connection button; a dialog box with the message Seems OK! indicates that the connection works.

Figure: The Seems OK! dialog box, generated when the test connection to the database is successful
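BigQuery needs only the short bigquery:// URI because authentication comes from the key file. Most other databases need a longer SQLAlchemy URI of the general form dialect://username:password@host:port/database. The sketch below assembles such a string with the standard library; sqlalchemy_uri is an illustrative helper rather than a Superset or SQLAlchemy function, and the PostgreSQL host, user, password, and database names are made-up values.

```python
from urllib.parse import quote_plus


def sqlalchemy_uri(dialect, username=None, password=None,
                   host=None, port=None, database=None):
    """Assemble a SQLAlchemy-style URI from its parts."""
    uri = f"{dialect}://"
    if username:
        uri += quote_plus(username)
        if password:
            # Special characters such as @ or / must be URL-encoded.
            uri += ":" + quote_plus(password)
        uri += "@"
    if host:
        uri += host
        if port:
            uri += f":{port}"
    if database:
        uri += "/" + database
    return uri


# The URI used for BigQuery in this section.
print(sqlalchemy_uri("bigquery"))

# A hypothetical PostgreSQL connection with made-up credentials.
print(sqlalchemy_uri("postgresql", "superset", "p@ss/word",
                     "localhost", 5432, "examples"))
```

The resulting string is what goes into the SQLAlchemy URI field of the database form.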