KNIME Essentials

By: Gábor Bakos

Overview of this book

KNIME is an open source data analytics, reporting, and integration platform that lets you analyze small or large amounts of data without having to resort to programming languages such as R. "KNIME Essentials" teaches you what you need to know to start processing your first data sets with KNIME. It covers installation, data processing, and data visualization, including the KNIME reporting features. Data processing is a fundamental part of KNIME, and "KNIME Essentials" makes sure you are comfortable with it before showing you how to visualize the data and generate reports. The book guides you from installing KNIME through to generating reports based on data, with data processing and visualization as the main parts in between. It introduces the KNIME variants of data analysis concepts, and after describing installation and configuration it turns to data processing, which offers many options to convert or extend the data. Visualization makes it easier to get an overview of parts of the data, while reporting offers a way to summarize them clearly.
Table of Contents (11 chapters)

Constraints


You can seldom trust the data you have: there may have been network problems during import, the program that generated the data may have been wrongly parameterized or given invalid input, or the device used to collect the data may have been operated outside its operating conditions. For these reasons, it is good practice to identify constraints and check them after import and after more complex transformations. You should also check user input and, if it might cause hard-to-discover problems in later phases, report the problem as soon as you can.
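The idea of checking constraints right after import can be sketched outside KNIME in a few lines of plain Python. This is only an illustration, not KNIME's API: the function name `find_violations`, the column names, and the temperature limits are all hypothetical. Each constraint is a named predicate on a row, and a missing or malformed value is treated as a violation so that import problems surface early.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Constraint = Tuple[str, Callable[[Row], bool]]


def find_violations(rows: List[Row], constraints: List[Constraint]) -> List[Tuple[int, str]]:
    """Return (row_index, constraint_name) for every failed check."""
    violations = []
    for index, row in enumerate(rows):
        for name, predicate in constraints:
            try:
                ok = predicate(row)
            except (KeyError, TypeError, ValueError):
                # A missing or malformed value counts as a violation.
                ok = False
            if not ok:
                violations.append((index, name))
    return violations


# Hypothetical sensor readings; the column names and limits are made up.
rows = [
    {"sensor_id": "a1", "temperature_c": 21.5},
    {"sensor_id": "a2", "temperature_c": 412.0},  # outside the operating range
    {"sensor_id": "", "temperature_c": 19.0},     # empty identifier
    {"sensor_id": "a4"},                          # value lost during import
]

constraints = [
    ("sensor_id is not empty", lambda r: bool(r["sensor_id"])),
    ("temperature within -40..85 C", lambda r: -40.0 <= r["temperature_c"] <= 85.0),
]

violations = find_violations(rows, constraints)
for index, name in violations:
    print(f"row {index}: failed '{name}'")
```

Reporting the row index together with the constraint name keeps the report actionable: the offending rows can be inspected, corrected, or filtered before later steps depend on them.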

The Flow Control/Switches nodes can be used to enable parts of the workflow selectively. This is useful when checking the constraints is not always required, when the check is too time consuming to be on by default, or when you want to try correcting the wrong data. The loop-related nodes (Flow Control/Loop Support) are also useful: they help when multiple columns should be tested, and they can handle complex conditions.
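As a rough analogy in plain Python (not KNIME nodes), a switch behaves like a flag that skips the check entirely, and a loop over columns applies the same test to each of them. The function `out_of_range_rows`, its `enabled` parameter, and the sample table are assumptions made for this sketch.

```python
from typing import Dict, List, Optional

Table = Dict[str, List[Optional[float]]]


def out_of_range_rows(
    table: Table,
    columns: List[str],
    lower: float,
    upper: float,
    enabled: bool = True,
) -> Dict[str, List[int]]:
    """Map each tested column to the row indices that violate lower <= value <= upper."""
    if not enabled:
        # The "switch" is off: skip the possibly expensive check.
        return {}
    report: Dict[str, List[int]] = {}
    for column in columns:  # the "loop" over the columns to test
        bad = [
            i
            for i, value in enumerate(table[column])
            if value is None or not (lower <= value <= upper)
        ]
        if bad:
            report[column] = bad
    return report


# Hypothetical table with three numeric columns expected to lie in [0, 1].
table: Table = {
    "x": [0.1, 0.5, 1.2],
    "y": [0.0, None, 0.9],
    "z": [0.3, 0.4, 0.5],
}

report = out_of_range_rows(table, ["x", "y", "z"], 0.0, 1.0)
skipped = out_of_range_rows(table, ["x", "y", "z"], 0.0, 1.0, enabled=False)
print(report)
print(skipped)
```

Keeping the flag as an explicit parameter mirrors the point made above: the check can stay in the workflow but only runs when it is worth its cost.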

In the preceding screenshot, a flow variable comes from outside of the meta node,...