Exploring Data with RapidMiner

By: Andrew Chisholm

Overview of this book

Data is everywhere, and the amount is increasing so fast that the gap between what people can understand and what is available is widening relentlessly. There is huge value in data, but much of this value lies untapped. Eighty percent of data mining is about understanding data, exploring it, cleaning it, and structuring it so that it can be mined. RapidMiner is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications. Exploring Data with RapidMiner is packed with practical examples to help practitioners get to grips with their own data. The chapters are arranged within an overall framework and can also be consulted on an ad-hoc basis. The book provides simple to intermediate examples showing modeling, visualization, and more using RapidMiner, and it presents the important steps in a logical order. It starts with importing data and then leads you through cleaning, handling missing values, visualizing, and extracting additional information, as well as understanding the time constraints that real data places on getting a result. The book uses real examples to help you understand how to set up processes quickly. It will give you a solid understanding of the possibilities that RapidMiner offers for exploring data, and you will be inspired to use it for your own work.

Credits

About the Author

About the Reviewer

www.PacktPub.com

Preface

Setting the Scene

A process framework

Data volume and velocity

Data variety, formats, and meanings

Accompanying material

Summary

Loading Data

Reading files

Databases

Using macros

Summary

Visualizing Data

Getting started

Statistical summaries

Relationships between attributes

Time series data

Relations between examples

Summary

Parsing and Converting Attributes

Generating attributes

Renaming attributes

Summary

Outliers

Manual inspection

Automated detection of example outliers

Summary

Missing Values

Missing or empty?

Types of missing data

Categorizing missing data

Effect of missing data

Options for handling missing data

Summary

Transforming Data

Creating new attributes

Summary

Reducing Data Size

Removing examples using sampling

Removing attributes

Summary

Resource Constraints

Measuring and estimating performance

Adding memory

Parallel processing

Restructuring processes

Summary

Debugging

Breakpoints in RapidMiner Studio

Logging data in RapidMiner Studio

RapidMiner Studio console printing

Groovy scripts

Regex tools

Using XPath effectively

Summary

Taking Stock

Exploring new techniques

Where to go next

Index

Logging data in RapidMiner Studio

RapidMiner Studio provides the Log operator, which we have already seen in use in the previous chapters. Of all the operators, this is one that I use a great deal, both for debugging and for creating data.

Dealing with logging first, the Log operator can be inserted anywhere in a process and is configured to output the parameters or values associated with another operator somewhere in the process. For example, the screenshot that follows shows some example parameters for the Log operator:

The left-most column becomes the column name in the log, the second column is the name of the operator within the process, the third column is the type of information (either value or parameter), and the final column is the name of the information to log, which is filled in automatically with valid options by the RapidMiner Studio GUI. The value option is used to log the result of the execution of an operator, whereas the parameter option is used to log the parameters...
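To make the four-column layout concrete, here is a minimal Python sketch that models it. It is not RapidMiner's API or file format; the class names `LogEntry` and `FakeOperator`, the operator name `Validation`, and the logged names `performance` and `number_of_validations` are all hypothetical, chosen only to illustrate how each configured row could resolve to one column of the log.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class LogEntry:
    """One configured row of a Log-style operator (illustrative model only)."""
    column_name: str    # left-most column: name of the column in the log
    operator_name: str  # second column: operator within the process
    kind: str           # third column: "value" or "parameter"
    name: str           # final column: which value or parameter to log

    def __post_init__(self) -> None:
        if self.kind not in ("value", "parameter"):
            raise ValueError(f"kind must be 'value' or 'parameter', got {self.kind!r}")


@dataclass
class FakeOperator:
    """Hypothetical stand-in for an operator in a process."""
    parameters: Dict[str, Any]  # settings the operator was configured with
    values: Dict[str, Any]      # results produced when the operator executed


def log_row(entries: List[LogEntry], operators: Dict[str, FakeOperator]) -> Dict[str, Any]:
    """Build one log row by looking up each entry on its named operator."""
    row: Dict[str, Any] = {}
    for entry in entries:
        operator = operators[entry.operator_name]
        # "value" reads an execution result; "parameter" reads a setting.
        source = operator.values if entry.kind == "value" else operator.parameters
        row[entry.column_name] = source[entry.name]
    return row


# Assumed example data: one operator with one setting and one result.
operators = {
    "Validation": FakeOperator(
        parameters={"number_of_validations": 10},
        values={"performance": 0.87},
    )
}
entries = [
    LogEntry("Performance", "Validation", "value", "performance"),
    LogEntry("Folds", "Validation", "parameter", "number_of_validations"),
]
row = log_row(entries, operators)
print(row)
```

In this sketch, the `value` entry picks up something the operator produced when it ran, while the `parameter` entry picks up something it was configured with, mirroring the distinction described above.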
