Data Analytics Made Easy

By: Andrea De Mauro

Overview of this book

Data Analytics Made Easy is an accessible beginner’s guide for anyone working with data. The book interweaves four key elements:

  • Data visualizations and storytelling: Tired of people not listening to you and ignoring your results? Don’t worry; Chapters 7 and 8 show you how to enhance your presentations and engage with your managers and co-workers. Learn to create focused content with a well-structured story behind it to captivate your audience.
  • Automating your data workflows: Improve your productivity by automating your data analysis. This book introduces you to the open-source platform, KNIME Analytics Platform. You’ll see how to use this no-code and free-to-use software to create a KNIME workflow of your data processes just by clicking and dragging components.
  • Machine learning: Data Analytics Made Easy describes popular machine learning approaches in a simplified and visual way before implementing these machine learning models using KNIME. You’ll not only be able to understand data scientists’ machine learning models; you’ll be able to challenge them and build your own.
  • Creating interactive dashboards: Follow the book’s simple methodology to create professional-looking dashboards using Microsoft Power BI, giving users the capability to slice and dice data and drill down into the results.

To get the most out of this book

Let me share a few tips that will help you along the path you are about to begin:

  • The book comes with multiple step-by-step, hands-on tutorials that are integral to the development path I have designed for you. Some of the most subtle (and fascinating) aspects of data analytics, like the need to interact with business partners and go back and forth during the setup process, can only be understood through real examples: tutorials do a great job of explaining them. I strongly suggest you set aside some quality time to complete each tutorial. A tutorial can take up to two hours from start to finish: make sure you have access to a computer on which you can install all the required software.
  • At the end of the book, you will find a short series of useful resources organized by chapter. They offer an opportunity to complement your learning experience with selected additional readings. So don't forget to skim through them after you complete a chapter and see if any of them intrigues you.
  • Depending on your background, some parts of the book might feel less "natural" to you. This is normal. Don't get discouraged if any portion of the book is less clear: the chapters are closely interconnected, and you might find the answers to your doubts in later pages, so just keep going until the end.
  • Use the book's GitHub and KNIME Hub pages to download all the data you need to complete the tutorials. There, you will also find the final result of each tutorial (like the complete KNIME workflow or the resulting Power BI dashboard). If you feel lost, you can refer to them to find your way forward.
  • Software improves and updates continuously. This book relies on the latest versions of KNIME and Power BI available at the time of its launch (precisely: KNIME 4.4 and Power BI 2.93). Although the bulk of the content will stay valid for a while, some of the steps in a tutorial might change slightly, and the windows you see might look a bit different from the ones in the figures. This is unavoidable and will not jeopardize your learning. Keep an eye on the book's web page for any errata or addenda I post in case of significant divergences due to a new version of the software. You can also get in touch with me using the contact details you find in the And Now? section.
  • This book is targeted toward the "business application" of data analytics. For this reason, whenever I had to choose between a rigorous mathematical treatment and a pragmatic, intuitive explanation of an analytical method, I went for the latter. The rationale is that you can always dive deeper later into the math behind the algorithms and learn how to make statistical learning more accurate. The focus therefore stays on empowering you to use analytics in your work rather than on giving you the formal description behind each mathematical concept.

Download the data files

All the data and the supporting files related to the tutorials presented in the book are hosted on GitHub at https://github.com/PacktPublishing/Data-Analytics-Made-Easy. The data and the completed KNIME workflows are also available on KNIME Hub at http://tiny.cc/knimehub.
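
If you prefer to fetch the GitHub repository from a script rather than through the browser, the following is a minimal sketch of one way to do it. It assumes that Git is installed and available on your PATH; the function names and the default target folder are hypothetical choices made for this illustration and are not part of the book's material.

```python
import subprocess
from pathlib import Path

# Repository address as given in this section.
REPO_URL = "https://github.com/PacktPublishing/Data-Analytics-Made-Easy"


def build_clone_command(url: str, target: Path) -> list:
    """Return the git command line that clones `url` into `target`."""
    return ["git", "clone", url, str(target)]


def download_tutorial_files(target: Path = Path("Data-Analytics-Made-Easy")) -> Path:
    """Clone the repository unless the target folder already exists.

    Assumption: the `git` executable is installed. The default folder
    name is a hypothetical choice for this sketch.
    """
    if not target.exists():
        # check=True raises an error if the clone fails (e.g. no network).
        subprocess.run(build_clone_command(REPO_URL, target), check=True)
    return target
```

Calling download_tutorial_files() once should leave a local copy of the tutorial files in the chosen folder; calling it again does nothing if the folder is already there.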

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801074155_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates user input, code words in text, highlighted keywords in code, paths, and file names. For example: "In the configuration window, type the expression $Quantity$*$Price$ to calculate revenues." In this book, you will find only a handful of blocks of example code in Chapter 9, Extending Your Toolbox. They will look like this:

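import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# model and test_set are assumed to be defined beforehand
predictions = model.predict(test_set)
print('R2 score is', r2_score(test_set.Rent, predictions))
print('Root Mean Squared Error is',
      np.sqrt(mean_squared_error(test_set.Rent, predictions)))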
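
If you are curious about what the two metrics in this sample measure, here is a minimal, self-contained sketch that computes them by hand. The rent values are made up for illustration and do not come from the book's dataset, and the helper functions are hypothetical stand-ins written for this sketch, not the library functions used above.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared prediction error."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Made-up values, purely illustrative (not from the book's data).
actual_rent = [1000, 1500, 2000, 2500]
predicted_rent = [1100, 1400, 2100, 2400]

print('R2 score is', r_squared(actual_rent, predicted_rent))
print('Root Mean Squared Error is',
      root_mean_squared_error(actual_rent, predicted_rent))
```

With these numbers every prediction is off by exactly 100, so the root mean squared error is 100, while the R2 score is close to 1 because the errors are small compared with the spread of the actual rents.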

Bold: Indicates a new term, an important word, a KNIME node, or words that you see on the screen, like in menus or dialog boxes. For example: "Click OK to close the window and execute the CSV Reader node."

Italic: Emphasizes specific words in the context of a sentence and indicates columns in a dataset, as in: "Neighborhood is the single most useful column when predicting the rent, followed by the Surface of the property."

Warnings, important notes, or interesting facts appear like this.

Tips and tricks appear like this.