Book Image

Learning Data Mining with R

By : Bater Makhabel
Book Image

Learning Data Mining with R

By: Bater Makhabel

Overview of this book

<p>Being able to deal with the array of problems that you may encounter during complex statistical projects can be difficult. If you have only a basic knowledge of R, this book will provide you with the skills and knowledge to successfully create and customize the most popular data mining algorithms to overcome these difficulties.</p> <p>You will learn how to manipulate data with R using code snippets and be introduced to mining frequent patterns, association, and correlations while working with R programs. Discover how to write code for various predication models, stream data, and time-series data. You will also be introduced to solutions written in R based on RHadoop projects. You will finish this book feeling confident in your ability to know which data mining algorithm to apply in any situation.</p>
Table of Contents (19 chapters)
Learning Data Mining with R
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Algorithms and Data Structures
Index

Search engines and the k-means algorithm


The general process of partition-based clustering is iterative. The first step defines or chooses a predefined number of representatives of the cluster and updates the representative after each iteration if the measure for the clustering quality has improved. The following diagram shows the typical process, that is, the partition of the given dataset into disjoint clusters:

The characteristics of partition-based clustering methods are as follows:

  • The resulting clusters are exclusive in most of the circumstances

  • The shape of the clusters are spherical, because of most of the measures adopted are distance-based measures

  • The representative of each cluster is usually the mean or medoid of the corresponding group (cluster) of points

  • A partition represents a cluster

  • These clusters are applicable for small-to-medium datasets

  • The algorithm will converge under certain convergence object functions, and the result clusters are often local optimum

The k-means clustering...