Book Image

OCA Oracle Database 11g: SQL Fundamentals I: A Real World Certification Guide ( 1ZO-051 )

By : Steve Ries
Book Image

OCA Oracle Database 11g: SQL Fundamentals I: A Real World Certification Guide ( 1ZO-051 )

By: Steve Ries

Overview of this book

<p>The Oracle Database 11g: SQL Fundamentals I exam is the first stepping stone in getting the Oracle Certified Associate Certification for Oracle Database 11g. The SQL programming language is used in every major relational database today and understanding the real world application of it is the key to becoming a successful DBA.<br /><br />This book gives you: the essential real world skills to master relational data manipulation with Oracle SQL and prepares you to become an Oracle Certified Associate. Beginners are introduced to concepts in a logical manner while practitioners can use it as a reference to jump to relevant concepts directly.<br /><br />We begin with the essentials of why databases are important in today's information technology world and how they work.<br /><br />We continue by explaining the concepts of querying and modifying data in Oracle using a range of techniques, including data projection, selection, creation, joins, sub-queries and functions. Finally, we learn to create and manipulate database objects and to use them in the same way as today's expert SQL programmers.<br /><br />This book prepares you to master the fundamentals of the SQL programming language using an example-driven method that is easy to understand<br /><br />This definitive certification guide provides a disciplined approach to be adopted for successfully clearing the 1Z0-051 SQL Fundamentals I exam, which is the first stepping stone towards attaining the OCA on Oracle Database 11g certification.<br /><br />Each chapter contains ample practice questions at the end. A full-blown mock test is included for practice so you can test your knowledge and get a feel for the actual exam.</p>
Table of Contents (20 chapters)
OCA Oracle Database 11g: SQL Fundamentals I: A Real-World Certification Guide
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Relational Database Management Systems


Imagine, for a moment, that you have the telephone books for the 20 largest cities in the U.S. I give you the following request: Please find all the phone numbers for individuals named Rick Clark in the Greater Chicago area. In order to satisfy the request, you simply do the following:

  • Open the Chicago phone book

  • Scan to the "C" section of names

  • Find all individuals that match "Clark, Rick"

  • Report back their phone numbers

Now imagine that I take each phone book, tear out all of the pages, and throw them into the air. I then proceed to shuffle the thousands of pages on the ground into a completely disorganized mess. Now I repeat the same request: Please find all the phone numbers for individuals named Rick Clark in the Greater Chicago area. How do you think you would do that? It would be nearly impossible. The data is all there, but it's completely disorganized. Finding the "Rick Clarks" of Chicago would involve individually examining each page to see if it satisfied the request—a very frustrating undertaking, to say the least.

This example underscores the importance of a database, or more accurately, a Relational Database Management System(RDBMS) Today's RDBMSs are what enable the storage, modification, and retrieval of massive amounts of data.

Flat file databases

When the devices that we know as computers first came into existence, they were primarily used for one thing—computation. Computers became useful entities because they were able to do numeric computation on an unprecedented scale. For example, one of the first computers, ENIAC, was designed (although not used) for the US Army to calculate artillery trajectories, a task made simpler through the use of complex sequences of mathematical calculations. As such, originally, computers were primarily a tool for mathematical and scientific research. Eventually, the use of computers began to penetrate the business market, where the company's data itself became just as important as computational speed. As the importance of this data grew, so the need for data storage and management grew as well, and the concept of a database was born.

The earliest databases were simple to envision. Most were simply large files that were similar in concept to a spreadsheet or comma-separated values (CSV) file. Data was stored as fields. A portion of these databases might look something like the following:

Susan, Bates, 123 State St, Somewhere, VA
Fred, Hartman, 234 Banner Rd, Anywhere, CA
Bill, Frankin, 345 Downtown Rd, Somewhere, MO
Emily, Thompson, 456 Uptown Rd, Somewhere, NY

In this example, the first field is determined by reading left to right until a delimiter, in this case a comma, is reached. This first field refers to the first name of the individual. Similarly, the next field is determined by reading from the first delimiter to the next. That second field refers to the last name of the individual. It continues in this manner until we have five fields—first name, last name, street address, city, and state. Each individual line or record in the file refers to the information for a distinct individual. Because this data is stored in a file, it is often referred to as a flat file database. To retrieve a certain piece of information, programs could be written that would scan through the records for the requested information. In this way, large amounts of data could be stored and retrieved in an orderly, programmatic way.

Limitations of the flat file paradigm

The flat file database system served well for many years. However, as time passed and the demands of businesses to retain more data increased, the flat file paradigm began to show some flaws.

In our previous example, our flat file is quite limited. It contains only five fields, representing five distinct pieces of information. If this flat file database contained the data for a real company, five distinct pieces of information would not even begin to suffice. A complete set of customer data might include addresses, phone numbers, information about what was ordered, when the order was placed, when the order was delivered, and so on. In short, as the need to retain more data increases, the number of fields grows. As the number of fields grows, our flat file database gets wider and wider. We should also consider the amount of data being stored. Our first example had four distinct records; not a very realistic amount for storing customer data. The number of records could actually number in thousands or even millions. Eventually, it is completely plausible that we could have a single flat file that is hundreds of fields wide and millions of records long. We could easily find that the speed with which our original data retrieval programs can retrieve the required data is decreasing at a rapid rate and is insufficient for our needs.

As our data demands increase, we're presented with another problem. If we are storing order information, for example, strictly under the flat file paradigm, we are forced to store a new record each time an order is placed. Consider this example, in which our customer purchases six different items. We store a six-digit invoice number, customer name, and address for the customer's purchase, as follows:

487345, Susan, Bates, 123 State St, Somewhere, VA
584793, Susan, Bates, 123 State St, Somewhere, VA
998347, Susan, Bates, 123 State St, Somewhere, VA
126543, Susan, Bates, 123 State St, Somewhere, VA
487392, Susan, Bates, 123 State St, Somewhere, VA

Using this example, notice how much duplicate data we have stored. The fields are invoice number, first name, last name, street address, city, and state, respectively. The only different piece of information in each record is the invoice number, and yet we have repeatedly stored the last five fields—information that is stored in previous records. We refer to these anomalies as repeating values. Repeating values present two problems from a processing standpoint. First, the duplicate data must be re-read each time by our retrieval programs, creating a performance problem for our retrieval operations. Second, those duplicate characters constitute bytes that must be stored on disk, uselessly increasing our storage requirements. It is clear that the flat file paradigm needs to be revised in order to meet the growing demands of our database.

Normalization

The world of databases changed in the early 1970s due in large part to the work of Dr. Edgar "Ted" Codd. In his paper, A Relational Model of Data for Large Shared Data Banks, Dr. Codd presented a new paradigm—the relational paradigm. The relational paradigm seeks to resolve the issues of repeating values and unconstrained size by implementing a process called normalization. During normalization, we organize our data in such a way that the data and its inter-relationships can be clearly identified. When we design a database, we begin by asking two questions—what data do I have? And, how do the pieces of data relate to each other? In the first step, the data is identified and organized into entities. An entity is any person, place, or thing. An entity also has attributes, or characteristics, that pertain to it. Some example entities are listed in the following diagram:

These entities represent distinct pieces of information: the Employee entity represents information about employees, the Email entity represents information about e-mail addresses, and so on. These entities, and any others we choose to add, make up our data model. We can also look a little closer at the attributes of a particular entity, as shown in the following diagram:

In our example, data (such as First name, Last name, Address, and Branch name) are the attributes of the Employee entity—they describe information about the employee. This is by no means exhaustive. There would most likely be many more attributes for an employee entity. In fact, this is part of the problem that we discussed earlier with the flat file database—data tends to accumulate, making our file wider and wider, if you will. Additionally, if we were to actually collect this data in a flat file, it might look something like the following screen:

At first glance, this structure may appear to be adequate, but, if we examine further, we can identify problems with it. To begin with, we note that there are multiple values stored in the Address and Email Address fields, which can make structuring queries difficult. This is where the process of normalization can assist us. Normalization involves breaking data into different normal forms. These forms are the steps we take to transform non-relational data into relational data. The first normal form (1NF) involves determining a primary key—a value in each occurrence of the data that uniquely identifies it. In the previous example of data, what attribute could be used to uniquely identify each occurrence of data? Perhaps we could use First Name.

However, it seems fairly clear that there could be more than one employee with the name James or Mary, so that will not suffice. If we were to use First Name and Last Name together as our primary key values, we would get closer to uniqueness, but it would still be insufficient for common names such as John Smith. For now, let us say that First Name, Mid Initial, and Last Name, together (as indicated earlier), uniquely identify each occurrence of data and thus comprise our primary key for the employee entity.

The next issue is the problem of repeating groups. Examine the Email Address attribute. It may be required that one or more e-mail addresses be stored for each employee. This presents problems when attempting to query for a particular employee's e-mail address. As each employee can have more than one e-mail address, the Email attribute would have to be scanned rather than simply pattern-matched in order to retrieve a particular piece of data. One way to rectify this would be to break each individual occurrence of the e-mail address into two separate records that demonstrates the removal of repeating groups, as shown in the next example. Thus, James R. Johnson, who has two Email Addresses, now has two rows in the database—one for the first e-mail address and one for the second:

We have eliminated the repeating groups, but we have now introduced other problems. First, we have violated our primary key, as first, middle, and last name no longer uniquely identify each row. Second, we have begun to duplicate our data. First name, middle initial, last name, and address are all repeated simply for the sake of removing repeating groups. Lastly, we now realize that it is possible for our employees to have more than one address, which further complicates the problem. Clearly, the first normal form alone is insufficient. It is necessary to transform the data again—this time into the second normal form (2NF).

The relational approach

The second normal form involves breaking our employee entity into a number of separate entities, each of which can have a unique primary key and no repeating groups. This is displayed again in the following diagram:

Here, we have separated our employee information into separate entities. We've also added entities that represent the branch and division of which each employee is a part. Now our employee entity contains information such as first name, middle initial, and last name, while our Email entity contains the e-mail address information. The other entities operate similarly—each contains information unique to itself.

This may have solved our repeating data problem, but now we simply have five files that have no relation to each other. How do we connect a particular employee to a particular e-mail address? We do this by establishing relationships between the entities; another requirement of the second normal form. A relationship between two entities is formed when they share some common piece of information. How this relationship functions is determined by the business rules that govern our data. Let's say in our model that one, and only one, e-mail address is kept for each employee. We would then say that there is a one-to-one relationship between our employee entity and our Email entity. Generally, such a relationship is denoted visually with a single bar between the two. We could diagram it as follows:

A more realistic relationship, however, would be one where each employee could have more than one e-mail address. This type of relationship is termed a one-to-many relationship. These relationships form the majority of the relationships used in the relational model and are shown in the following diagram. Note the crow's foot connecting to the Email entity, indicating many:

As you might expect, there is another type of relationship, one which, under relational rules, we try to avoid. A many-to-many relationship occurs when multiple occurrences of the data in one entity relate to multiple occurrences in the other. For instance, if we allowed employees to have multiple e-mail addresses, but also allowed multiple employees to share a single e-mail address. In the relational paradigm, we seek to avoid these types of relationships, usually by relating an entity between the two that transforms a many-to-many relationship into two distinct one-to-many relationships. The last step in normalization is generally the transformation into the third normal form (3NF). In the 3NF, we remove any columns that are not dependent on the primary key. These are known as transitive dependencies. To resolve these dependencies, we move the non-dependent columns into another table. There are higher normal forms, such as fourth normal form (4NF) and fifth normal form (5NF), but these forms are less commonly used. Generally, once we have taken our data structure up to the 3NF, our data is considered relational.

When we design the number and types of relationships between our entities, we construct a data model. This data model is the guide for the DBA on how to construct our database objects. The visual representation of a data model is commonly referred to as an entity relationship diagram (ERD). Using the five example entities we listed previously, we can construct a simple entity relationship diagram, as demonstrated in the following example:

From this example model, we can determine that the employee entity is more or less the center of this model. An employee can have one or more e-mail addresses. An employee can also have one or more street addresses. An employee can belong to one or more branches, and a branch can belong to one or more divisions. Even though it is highly simplified, this diagram shows the basic concepts of visually modeling data.

Through the use of relational principles and entity relationship diagrams, a database administrator or data architect can take a list of customer data and organize it into distinct sets of information that relate to one another. As a result, data duplication is greatly reduced and retrieval performance is increased. It is because of this efficient use of storage and processing power that the RDBMS is the predominant method used in storing data today.

Tip

SQL in the real world

Strictly speaking, the Oracle RDBMS is actually an Object Relational Database Management System (ORDBMS ) and has been since Oracle version 8. An ORDBMS refers to the ability of Oracle databases to be manipulated using object-oriented concepts.

Bringing it into the Oracle world

To discuss the relational paradigm, we have used relational terminology, which is designed to be generic and not associated with any particular database product. The subject of this book, however, is using SQL with Oracle databases. It is time to relate the terminology used in the relational paradigm to terms that are likely more familiar:

The preceding diagram shows a comparison table of the different terms used to describe basic database components. Up to this point, we have used the relational term, entity, to describe our person, place, or thing. From this point, we will refer to it by its more commonly known name—the table.

Tables and their structure

If you've ever used a spreadsheet before, then you are familiar with the concept of a table. A table is the primary logical data structure in an Oracle database. We use the term logical because a table has no physical structure in itself—you cannot simply login to a database server, open up a file manager, and find the table within the directories on the server. A table exists as a layer of abstraction from the physical data that allows a user to interface with it in a more natural way. Examine the following diagram; like a spreadsheet, a table consists of columns and rows:

A column identifies any single characteristic of a particular table, such as first name. A column differs from a row in more than its vertical orientation. Each value within a column contains a particular type of data, or data type. For instance, in the preceding example, the column FIRST_NAME denotes that all data within that column will be of the same type and that data type will be consistent with the label FIRST_NAME. Such a column would contain only character string data. For instance, in the FIRST_NAME column, we have data such as Mary and Matthew, but not the number 42532.84. In the date of birth column, or DOB, only date data would be stored. As we will see in the next chapter, in Oracle, string data or text data is not the same thing as date data.

Along the horizontal, we have rows of data. A row of data is any single instance of a particular piece of information. For example, in the first row of the table in our example, we have the following pieces of information:

Tip

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

  First name = "James"
  Middle initial = "R"
  Last name = "Johnson"
  Gender = "M"
  DOB = "01-01-60"

This information comprises the sum total of all the information we have in this table for a single individual, namely, James R. Johnson. The following row, for Mary S. Williams, contains the same types of information, but different values. This construct allows us to store and display data that is orderly in terms of data types, but still flexible enough to store the data for many different individuals. Together, the columns and rows of data form a relational table: the heart of the Oracle database. However, in order to retrieve and manipulate this table data, we need a programming language; for relational databases, that language is SQL.