R High Performance Programming

Using columnar databases for improved performance


Most relational databases use a row-based storage architecture: the data is stored in the database row by row. When the database executes a query, it first retrieves the relevant rows and then processes them. This architecture is well suited to business transaction processing, where complete records (that is, all of their columns) are written, read, updated, or deleted a few rows at a time. Most statistical or analytical tasks, however, need to read many rows but often only a few columns. Row-based databases can therefore be inefficient for analysis, because they read entire records regardless of how many columns the analysis actually needs. The following figure depicts how a row-based database might compute the sum of one column.

Computing the sum of one column in a row-based database
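To make the contrast concrete, here is a minimal Python sketch that simulates both layouts in memory. It is not how any real database engine works internally. The table, its column names, and the helper functions (`to_columns`, `row_store_sum`, `column_store_sum`) are hypothetical, and the "values read" counter is only an assumed stand-in for I/O cost: the row store fetches whole records, while the column store fetches just the requested column.

```python
from typing import Dict, List, Tuple

# Hypothetical table layout used only for illustration.
COLUMNS = ("id", "name", "region", "sales")
Row = Tuple[int, str, str, float]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Rearrange a row-oriented table into a column-oriented one."""
    return {col: [row[i] for row in rows] for i, col in enumerate(COLUMNS)}


def row_store_sum(rows: List[Row], column: str) -> Tuple[float, int]:
    """Sum a numeric column from row storage.

    Every record is fetched in full, so all of its fields count as read,
    even though only one field contributes to the sum.
    """
    idx = COLUMNS.index(column)
    total = 0.0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole record is read
        total += row[idx]
    return total, values_read


def column_store_sum(table: Dict[str, list], column: str) -> Tuple[float, int]:
    """Sum a numeric column from column storage.

    Only the requested column is fetched; the other columns are never touched.
    """
    values = table[column]
    return float(sum(values)), len(values)


if __name__ == "__main__":
    # Made-up sample data.
    rows = [
        (1, "a", "north", 10.0),
        (2, "b", "south", 20.5),
        (3, "c", "north", 4.5),
    ]
    print(row_store_sum(rows, "sales"))                 # (35.0, 12)
    print(column_store_sum(to_columns(rows), "sales"))  # (35.0, 3)
```

Both layouts produce the same sum, but in this model the row store reads four values per record while the column store reads one. The gap grows with the number of columns in the table that the query does not need.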

The increase in demand for data analysis platforms in recent years has led to the development...