
Microsoft Azure Machine Learning

By: Sumit Mund, Christina Storm


Splitting data


You will often need to split a dataset, most commonly into a training set and a test set for analysis. ML Studio comes with a Split module for this purpose. It divides a dataset into two based on a fraction you specify: if you choose 0.8, the first output contains 80 percent of the input dataset and the second output contains the remaining 20 percent. You also have the option to split the data randomly. If you need a random split to produce the same result every time you run it, specify a random seed value other than 0. You can find the Split module under Data Transformation | Sample and Split in the module palette.
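The behavior described above can be approximated outside ML Studio with a short Python sketch. This is not the Split module's implementation; the function name `split_rows`, its parameters, and the rounding of the cut point are assumptions made for illustration. Following the description above, a nonzero seed gives a repeatable shuffle, while a seed of 0 is treated here as "do not fix the seed".

```python
import random


def split_rows(rows, fraction=0.8, randomized=True, seed=0):
    """Split rows into two lists; the first holds roughly `fraction` of them.

    Hypothetical helper that mimics the idea of the Split module,
    not its actual implementation.
    """
    rows = list(rows)  # copy so the caller's data is not reordered
    if randomized:
        # Assumption: a nonzero seed makes the shuffle repeatable,
        # and 0 means "seed from the system", so results vary per run.
        rng = random.Random(seed if seed != 0 else None)
        rng.shuffle(rows)
    cut = round(len(rows) * fraction)  # assumed rounding rule
    return rows[:cut], rows[cut:]


data = list(range(10))
train, test = split_rows(data, fraction=0.8, seed=42)
print(len(train), len(test))
```

Calling `split_rows` again with the same nonzero seed returns the same two lists, which is the property you rely on when you want a repeatable experiment.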

Notice that the last parameter, Stratified split, is False by default. Set it to True only when you want a stratified split, which first groups the rows and then randomly selects rows from each stratum (group). In this case, you need to specify the Stratification key column based on...
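As a rough illustration of the idea of a stratified split, the sketch below groups rows by a key and then applies the same fraction inside each group, so both outputs keep approximately the original group proportions. The function `stratified_split`, the `key` argument, and the sample `label` field are hypothetical names chosen for this example and do not come from ML Studio.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, fraction=0.8, seed=0):
    """Group rows by key(row), then split each group by `fraction`.

    Hypothetical helper for illustration only.
    """
    # Assumption: a nonzero seed gives a repeatable result, 0 does not.
    rng = random.Random(seed if seed != 0 else None)
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)

    first, second = [], []
    for members in groups.values():
        rng.shuffle(members)  # random selection within each stratum
        cut = round(len(members) * fraction)  # assumed rounding rule
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second


# Sample data: 10 rows labelled "a" and 5 rows labelled "b".
rows = [{"id": i, "label": "a" if i < 10 else "b"} for i in range(15)]
train, test = stratified_split(rows, key=lambda r: r["label"], fraction=0.8, seed=7)
print(len(train), len(test))
```

With these 15 sample rows, each label contributes 80 percent of its rows to the first output, so the 2:1 ratio between the two labels is preserved on both sides of the split.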