
Turning Spreadsheets into Corporate Data

By: Bill Inmon

Overview of this book

Spreadsheets are a popular way to store and communicate business data. Although they are easy to create and update, they are not reliable enough to serve as the basis for important corporate decisions. With this book, you can gain insight into how to maintain spreadsheets, how to format them, and how to convert them into a database of reliable and useful information. Turning Spreadsheets into Corporate Data starts with a quick history of spreadsheet usage. You’ll learn the basics of formatting spreadsheets, including how to handle special characters and column headings, and how to convert a spreadsheet first into an intermediate database and then into corporate data. You will also learn how to use the mnemonic dictionary that is created along with the intermediate database. The later chapters discuss the immutability of data and the importance of organizational and political considerations during data transformation. By the end of this book, you’ll have the skills and knowledge needed to convert your spreadsheets into reliable corporate data.
Table of Contents (16 chapters)
13: Case Study

A Final Option

If your .pdf file is not available in any other format, then, as a last resort, you can try to infer the column headings by treating a double blank as a delimiter. This is demonstrated in Figure 4.1.

Once the OCR option has been applied, the column headings appear in the extracted text.
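The double-blank rule can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: it assumes the header line has already been extracted from the .pdf as plain text (for example, after OCR), the heading names are invented for illustration, and the choice to treat any run of two or more blanks as a single delimiter is ours rather than the book's.

```python
import re

# Hypothetical header line as it might look after text extraction or OCR.
HEADER_LINE = "Order Id  Customer Name  Order Date  Amount"


def infer_headings(line: str) -> list[str]:
    """Infer column headings by treating two or more blanks as a delimiter.

    A single blank is assumed to separate words *within* a heading,
    so "Customer Name" stays in one piece.
    """
    return re.split(r" {2,}", line.strip())


print(infer_headings(HEADER_LINE))
# ['Order Id', 'Customer Name', 'Order Date', 'Amount']
```

The single-blank assumption is what keeps multi-word headings intact, and it is also the assumption that breaks first on real documents.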

So what are the problems with using a double blank as a delimiter? There are several. The first problem is reliability: in some cases, it will not be obvious which values belong to which column headings.
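The first problem shows up as soon as data rows are split the same way. In this hypothetical sketch (headings and values are invented), the "Order Date" cell of a row is empty, so in the extracted text it is only a longer run of blanks. The row yields fewer values than there are headings, and pairing them by position attaches the amount to the wrong heading without any error being raised.

```python
import re

HEADINGS = ["Order Id", "Customer Name", "Order Date", "Amount"]
# Hypothetical data row whose "Order Date" cell is empty; in the
# extracted text the empty cell is just a longer run of blanks.
ROW_LINE = "1001  Acme Corp                 250.00"


def split_on_double_blank(line: str) -> list[str]:
    # Treat any run of two or more blanks as a single delimiter.
    return re.split(r" {2,}", line.strip())


values = split_on_double_blank(ROW_LINE)
print(values)  # ['1001', 'Acme Corp', '250.00']
print(len(values), "values for", len(HEADINGS), "headings")  # 3 values for 4 headings

# Positional pairing silently puts the amount under "Order Date",
# and zip() drops the last heading because it runs out of values.
print(dict(zip(HEADINGS, values)))
# {'Order Id': '1001', 'Customer Name': 'Acme Corp', 'Order Date': '250.00'}
```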


The second problem is that a double blank is not always present: when a column name is long, there may be no double blank between it and the next name. The third problem is that, in some cases, column names are separated by more than two blanks. Clearly, trying to delineate column names in a .pdf file raises some genuinely messy problems.
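The second and third problems can be reproduced with two more invented header lines. This sketch compares a naive split on exactly two blanks with a tolerant split on runs of two or more blanks; both function names and both sample lines are hypothetical. The tolerant rule absorbs the extra blanks, but neither rule can recover a boundary that is marked by only a single blank.

```python
import re

# Second problem: a long heading leaves only one blank before "Amount".
CRAMPED = "Order Id  Customer Account Description Amount"
# Third problem: three blanks separate the first two headings.
WIDE = "Order Id   Customer Name  Amount"


def split_exactly_double(line: str) -> list[str]:
    # Naive rule: the delimiter is exactly the two-character string "  ".
    return line.split("  ")


def split_double_or_more(line: str) -> list[str]:
    # Tolerant rule: any run of two or more blanks is one delimiter.
    return re.split(r" {2,}", line.strip())


print(split_exactly_double(CRAMPED))
# ['Order Id', 'Customer Account Description Amount']  (two headings merged)
print(split_exactly_double(WIDE))
# ['Order Id', ' Customer Name', 'Amount']  (stray leading blank)
print(split_double_or_more(WIDE))
# ['Order Id', 'Customer Name', 'Amount']
print(split_double_or_more(CRAMPED))
# ['Order Id', 'Customer Account Description Amount']  (still merged)
```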