-
Book Overview & Buying
-
Table Of Contents
Codeless Deep Learning with KNIME
By :
To install KNIME Analytics Platform, follow these steps:
Figure 1.2 – Steps for downloading the KNIME Analytics Platform package
Once you've downloaded the package, locate it, start it, and follow the instructions that appear onscreen to install it in any directory that you have write permissions for.
Once it's been installed, locate your instance of KNIME Analytics Platform – from the appropriate folder, desktop link, application, or link in the start menu – and start it.
When the splash screen appears, a window will ask for the location of your workspace (Figure 1.3). This workspace is a folder on your machine that will host all your work. The default workspace folder is called knime-workspace:
Figure 1.3 – The KNIME Analytics Platform Launcher window asking for the workspace folder
After clicking Launch, the workbench for KNIME Analytics Platform will open.
The workbench of KNIME Analytics Platform is organized as depicted in Figure 1.4:
Figure 1.4 – The KNIME Analytics Platform workbench
The KNIME workbench consists of different panels that can be resized, removed by clicking the X on their tab, or reinserted via the View menu. Let's take a look at these panels:
The LOCAL workspace displays all workflows, saved in the workspace folder that were selected when KNIME Analytics Platform was started. The very first time the platform is opened, the LOCAL workspace only contains workflows and data in the Example Workflows folder. These are example applications to be used as starting points for your projects.
The EXAMPLES server is a read-only KNIME hosted server that contains many more example workflows, organized into categories. Just double-click it to be automatically logged in with read-only mode. Once you've done this, you can browse, open, explore, and download all available example workflows. Once you have located a workflow, double-click it to explore it or drag and drop it into LOCAL to create a local editable copy.
My-KNIME-Hub provides access to the KNIME community shared repository (KNIME Hub), either in public or private mode. You can use My-KNIME-Hub/Public to share your work with the KNIME community or My-KNIME-Hub/Private as a space for your current work.
Finally, at the very top, you can find the Top Menu, which includes menus for file management and preference settings, workflow editing options, additional views, node commands, and help documentation.
Besides the core software, KNIME Analytics Platform benefits from external extensions provided by the KNIME community. The install KNIME extensions and update KNIME commands, available in the File menu, allow you to expand your current instance with external extensions or update it to a newer version.
Under the top menu, a toolbar is available. When a workflow is open, the toolbar offers commands for workflow editing, node execution, and customization.
A workflow can be built by dragging and dropping nodes from the Node Repository panel onto the Workflow Editor window or by just double-clicking them. Nodes are the basic processing units of any workflow. Each node has several input and/or output ports. Data flows over a connection from an output port to the input port(s) of other nodes. Two nodes are connected – and the data flow is established – by clicking the mouse at the output port of the first node and releasing the mouse at the input port of the next node. A pipeline of such nodes makes a workflow.
In Figure 1.5, under each node, you will see a status light: red, yellow, or green:
Figure 1.5 – Node structure and status lights
When a new node is created, the status light is usually red, which means that the node's settings still need to be configured for the node to be able to execute its task.
To configure a node, right-click it and select Configure or just double-click it. Then, adjust the necessary settings in the node's dialog. When the dialog is closed by pressing the OK button, the node is configured, and the status light changes to yellow; this means that the node is ready to be executed. Right-clicking on the node again shows an enabled Execute option; pressing it will execute the node.
The ports on the left are input ports, where the data from the outport of the predecessor node is fed into the node. Ports on the right are outgoing ports. The result of the node's operation on the data is provided by the output port of the successor nodes. When you hover over the port, a tooltip will provide information about the output dimension of the node.
Important note
Only ports of the same type can be connected!
Data ports (black triangles) are the most common type of node ports and transfer flat data tables from node to node. Database ports (brown squares) transfer SQL queries from node to node. Many more node ports exist and transfer different objects from one node to the next.
After successful execution, the status light of the node turns green, indicating that the processed data is now available on the outports. The result(s) can be inspected by exploring the outport view(s): the last entries in the context menu open them.
With that, we have completed our quick tour of the workbench in KNIME Analytics Platform.
Now, let's take a look at where we can find starting examples and help.
At this point, we have already looked at the KNIME Hub already. The KNIME Hub (https://hub.knime.com/) is a very useful public repository for applications, extensions, examples, and tutorials provided by the KNIME community. Here, you can share your workflows and download workflows that have been created by other KNIME users. Just type in some keywords and you will get a list of related workflows, components, extensions, and more. For example, just type in read file and you will get a list of example workflows illustrating how to read CSV files, .table files, Excel files, and so on. (Figure 1.6):
Figure 1.6 – Resulting list of workflows from searching for "read file" on the KNIME Hub
All workflows described in this book are also available on the KNIME Hub for you: https://hub.knime.com/kathrin/spaces/Codeless%20Deep%20Learning%20with%20KNIME/latest/.
Once you've isolated the workflow you are interested in, click on it to open its page, and then download it or open it in KNIME Analytics Platform to customize it to your own needs.
On the other hand, to share your work on the KNIME Hub, just copy your workflows from your local workspace into the My-KNIME-Hub/Public folder in the KNIME Explorer panel within the KNIME workbench. It will be automatically available to all members of the KNIME community.
The KNIME community is also very active, with tips and tricks available on the KNIME Forum (https://forum.knime.com/). Here, you can ask questions or search for answers.
Finally, contributions from the community are available as posts on the KNIME Blog (https://www.knime.com/blog), as books via KNIME Press (https://www.knime.com/knimepressE TV () channel on YouTube.
The two books KNIME Beginner's Luck and KNIME Advanced Luck provide tutorials for those users who are starting out in data science with KNIME Analytics Platform.
Now, let's build our first workflow, shall we?
In this section, we'll build our first, simple, small workflow. We'll start with something basic: reading data from an ASCII file, performing some filtering, and displaying the results in a bar chart.
In KNIME Explorer, do the following:
a) Right-click LOCAL (or anywhere you want your folder to be).
b) Select New Workflow Group... (as shown in Figure 1.7), and, in the window that opens, name it Chapter 1.
Important note
Folders in KNIME Explorer are called Workflow Groups.
Similarly, you can create a new workflow, as follows:
a) Right-click the Chapter 1 folder (or anywhere you want your workflow to be).
b) Select New KNIME Workflow (as shown in Figure 1.7) and, in the window that opens, name it My_first_workflow.
After clicking Finish, the Workflow Editor will open the canvas for the empty workflow.
Tip
By default, the canvas for a new workflow opens with the grid on; to turn it off, click the Open the settings dialog for the workflow editor button (the button before the last one) in the toolbar. This button opens a window where you can customize the workflow's appearance (for example, allowing curved connections) and perform editing (turn the grid on/off).
Figure 1.7 shows the New Workflow Group... option in the KNIME Explorer's context menu. It allows you to create a new, empty folder:
Figure 1.7 – Context menu for creating a new folder and a new workflow in KNIME Explorer
The first thing we need to do in our workflow is read an ASCII file with the data. Let's read the adult.csv file that comes with the installation of KNIME Analytics Platform. This can be found under Example Workflows/The Data/Basics. adult.csv is a US Census public file that describes 30K people by age, gender, origin, and professional and private life.
Let's create the node so that we can read the adult.csv ASCII file:
a) In the Node Repository, search for the File Reader node (it is actually located in the IO/Read category).
b) Drag and drop the File Reader node onto the Workflow Editor panel.
c) Alternatively, just double-click the File Reader node in the Node Repository; this will automatically create it in the Workflow Editor panel.
In Figure 1.8, see the File Reader node located in the Node Repository:
Figure 1.8 – The File Reader node under IO/Read in the Node Repository
Now, let's configure the node so that it reads the adult.csv file. Double-click the newly created File Reader node in the Workflow Editor and manually configure it with the file path to the adult.csv file. Alternatively, just drag and drop the adult.csv file from the KNIME Explorer panel (or from anywhere on your machine) onto the Workflow Editor window. You can see this action in Figure 1.9:
Figure 1.9 – Dragging and dropping the adult.csv file onto the Workflow Editor panel.
This automatically generates a File Reader node that contains most of the correct configuration settings for reading the file.
Tip
The Advanced button in the File Reader configuration window leads you to additional advanced settings: reading files with special characters, such as quotes; allowing lines with different lengths; using different encodings; and so on.
To execute this node, just right-click it and from the context menu, select Execute; alternatively, click on the Execute buttons (single and double white arrows on a green background) that are available in the toolbar.
To inspect the output data table that's produced by this node's execution, right-click on the node and select the last option available in the context menu. This opens the data table that appears as a result of reading the adult.csv file. You will notice columns such as Age, Workclass, and so on.
Important note
Data in KNIME Analytics Platform is organized into tables. Each cell is uniquely identified via the column header and the row ID. Therefore, column headers and row IDs need to have unique values.
fnlwgt is one column for which we were never sure of what it meant. So, let's remove it from further analysis by using the Column Filter node.
To do this, search for Column Filter in the search box above the Node Repository, then drag and drop it onto the Workflow Editor and connect the output of the File Reader node to the input of the Column Filter node. Alternatively, we can select the File Reader node in the Workflow Editor panel and then double-click the Column Filter node in the Node Repository. This automatically creates a node and its connections in the Workflow Editor.
The Column Filter node and its configuration window are shown in Figure 1.10:
Figure 1.10 – Configuring the Column Filter node to remove the column named fnlwgt from the input data table
Again, double-click or right-click the node and then select Configure to configure it. This configuration window contains three options that can be selected via three radio buttons: Manual Selection, Wildcard/Regex Selection, and Type Selection. Let's take a look at these in more detail:
Since this is our first workflow, we'll go for the easiest approach; that is, Manual Selection. Go to the Manual Selection tab and transfer the fnlwgt column to the Exclude set via the buttons in-between the two frames (these can be seen in Figure 1.10).
After executing the Column Filter node, if we inspect the output data table (right-click and select the last option in the context menu), we'll see a table that doesn't contain the fnlwgt column.
Now, let's extract all the records of people who work more than 20 hours/week. hours-per-week is the column that contains the data of interest.
For this, we need to create a Row Filter node and implement the required condition:
IF hours-per-week > 20 THEN Keep data row.
Again, let's locate the Row Filter node in the Node Repository panel, drag and drop it (or double-click it) into the Workflow Editor, connect the output port of the Column Filter node to its input port, and open its configuration window.
In the configuration window of the Row Filter node (Figure 1.11), we'll find three default filtering criteria: use pattern matching, use range checking, and only missing values match. Let's take a look at what they do:
The default behavior is to include the matching data rows in the output data table. However, this can be changed by enabling Exclude rows by attribute value via the radio buttons on the left-hand side of the configuration window.
Alternative filtering criteria can be done by row number or by row ID. This can also be enabled via the radio buttons on the left-hand side of the configuration window:
Figure 1.11 – Configuring the Row Filter node to keep only rows with hours-per-week > 20 in the input data table
After execution, upon opening the output data table (Figure 1.12), no data rows with hours-per-week < 20 should be present:
Figure 1.12 – Right-clicking a successfully executed node and selecting the last option shows the data table that was produced by the node
Now, let's look at some very basic visualization. Let's visualize the number of men versus women in this dataset, which contains people who work more than 20 hours/week:
Figure 1.13 – The Bar Chart node and its configuration window
To do this, locate the Bar Chart node in the Node Repository, create an instance in the workflow, connect it to receive input from the Row Filter node, and open its configuration window (Figure 1.13). Here, there are four tabs we can use for configuration purposes. Options covers all data settings, General Plot Options covers all plot settings, Control Options covers all control options, and Interactivity covers all subscription events when it comes to interacting with other plots, views, and charts when they've been assembled to create a component. Again, since this is just a beginner's workflow, we'll adopt all the default settings and just set the following:
sex, ensuring it appears on the x axis. Then, select Occurrence Count in order to count the number of rows by sex.This node does not produce data, but rather a view of the bar chart. So, to inspect the results produced by this node after its execution, right-click it and select the central option; that is, Interactive View: Group Bar Chart (Figure 1.14):
Figure 1.14 – Right-clicking a successfully executed visualization node and selecting the Interactive View: Grouped Bar Chart option to see the chart/plot that has been produced
Notice the three buttons in the top-right corner of the view on the right of Figure 1.14. These three buttons enable zooming, toggling to full screen, and node settings, respectively. From the view itself, you can explore how the chart would look if different settings were to be selected, such as a different category column or a different title.
Important note
Most data visualization nodes produce a view and not a data table. To see the respective view, right-click the successfully executed node and select the Interactive View: … option.
The second lower input port of the Bar Chart node is optional (a white triangle) and is used to read a color map so that you can color the bars in the bar chart.
Important note
Note that a number of different data visualization nodes are available in the Node Repository: JavaScript, Local(Swing), Plotly, and so on. JavaScript and Plotly nodes offer the highest level of interactivity and the most polished graphics. We used the Bar Chart node from the JavaScript category in the Node Repository panel here.
Now, we'll add a few comments to document the workflow. You can add comments at the node level or at the general workflow level.
Each node in the workflow is created with a default label of Node xx under it. Upon double-clicking it, the node label editor appears. This allows you to customize the text, the font, the color, the background, and other similar properties of the node (Figure 1.15). We need to write a little comment under each node to make it clear what tasks they are implementing:
Figure 1.15 – Editor for customizing the labels under each node
You can also write annotations at the workflow level. Just right-click anywhere in the Workflow Editor and select New Workflow Annotation. A yellow frame will appear in editing mode. Here, you can add text and customize it, as well as its frame. To close the annotation editor, just click anywhere else in the Workflow Editor. To reopen the annotation editor, double-click in the top-left corner of the annotation (Figure 1.16):
Figure 1.16 – Creating and editing workflow annotations
Congratulations! You have just built your first workflow with KNIME Analytics Platform. It should look something like the one in Figure 1.17:
Figure 1.17 – My_first_Workflow
That was a quick introduction to how to use KNIME Analytics Platform.
Now, let's make sure we have KNIME Deep Learning – Keras Integration installed and functioning.
Change the font size
Change margin width
Change background colour