Using OpenRefine

Overview of this book

Data today is like gold, but how can you manage your most valuable assets? Managing large datasets used to be a task for specialists, but the game has changed: data analysis is an open playing field. Messy data is now in your hands! With OpenRefine the task is a little easier, as it provides you with the necessary tools for cleaning and presenting even the most complex data. Once your data is clean, you can start finding value in it. Using OpenRefine takes you on a practical and actionable tour through this popular data transformation tool. Packed with cookbook-style recipes that will help you get to grips with data, this book is an accessible tutorial for anyone who wants to maximize the value of their data. This book will teach you all the necessary skills to handle any large dataset and to turn it into high-quality data for the Web. After you learn how to analyze data and spot issues, we'll see how to solve them to obtain a clean dataset. Messy and inconsistent data is recovered through advanced techniques such as automated clustering. We'll then show how to extract links from keyword and full-text fields using reconciliation and named-entity extraction. Using OpenRefine is more than a manual: it's a guide stuffed with tips and tricks to get the best out of your data.
Table of Contents (13 chapters)
Using OpenRefine
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Preface
Index

About the Reviewers

Over the last six years, Martin Magdinier has been heavily engaged with startup and open data communities in France, Vietnam, and Canada. Through his recent projects (TTCPass and Objectif Neige) and consulting positions, he has become closely familiar with data massaging techniques. Approaching data from a business perspective, he focuses on data management and transformation tools that empower the business user. In 2011, he started blogging tips and tutorials on OpenRefine to help other business users make the most of the tool. In 2012, when Google released the software to the community, he helped structure the new organization. Today, he continues to actively support the OpenRefine user base and advocates its use in various communities.

Dr. Mateja Verlic is Head of Research at Zemanta and is an enthusiastic developer of the LOD-friendly distribution of OpenRefine. After finishing her PhD in Computer Science, she worked for two years as an Assistant Professor at the University of Maribor, focusing mostly on machine learning, intelligent systems, text mining, and sentiment analysis. In 2011, when she joined Zemanta as an urban ninja and researcher, she began exploring the semantic web, and she has been passionate about web technologies, lean startup, community projects, and open source software ever since.