This book focuses on representing data in MySQL. The containers of tables in MySQL, and other products are the databases. It is quite possible to have just one table in a database and thus avoid fully applying the relational model concept in which tables are related to each other through common values; however we will use the model in its normal way: having many tables and creating relations between them.
Note
This section describes an example of data crammed into one huge table, also called a too wide table because it is formed with too many columns. This too wide table is fundamentally non-relational.
Sometimes the data structure needs to be reviewed or evaluated, as it might be based on poor decisions in terms of data naming conventions, key choosing, and the number of tables. Probably the most common problem is that the whole data is put into one big, wide table.
The reason for this common structure (or lack of structure) is that many developers think in terms of the results or even of the printed results. Maybe they know how to build a spreadsheet and try to apply spreadsheet principles to databases. Let's assume that the main goal of building a database is to produce this sales report, which shows how many cars were sold in each month, by each salesperson, describing the brand name, the car model number, and the name.
Salesperson |
Period |
Brand Name |
Car model number |
Car model name and year |
Quantity sold |
---|---|---|---|---|---|
Murray, Dan |
2006-01 |
Fontax |
1A8 |
Mitsou 2007 |
3 |
Murray, Dan |
2006-01 |
Fontax |
2X12 |
Wanderer 2006 |
7 |
Murray, Dan |
2006-02 |
Fontax |
1A8 |
Mitsou 2007 |
4 |
Smith, Peter |
2006-01 |
Fontax |
1A8 |
Mitsou 2007 |
1 |
Smith, Peter |
2006-01 |
Licorne |
LKC |
Gazelle 2007 |
1 |
Smith, Peter |
2006-02 |
Licorne |
LKC |
Gazelle 2007 |
6 |
Without thinking much about the implications of this structure, we could build just one table, sales:
salesperson |
brand |
model_number |
model_name_year |
qty_2006_01 |
qty_2006_02 |
---|---|---|---|---|---|
Murray, Dan |
Fontax |
1A8 |
Mitsou 2007 |
3 |
4 |
Murray, Dan |
Fontax |
2X12 |
Wanderer 2006 |
7 | |
Smith, Peter |
Fontax |
1A8 |
Mitsou 2007 |
1 | |
Smith, Peter |
Licorne |
LKC |
Gazelle 2007 |
1 |
6 |
At first sight, we have tabularized all the information that is needed for the report.
Note
The book's examples can be reproduced using the mysql
command-line utility, or phpMyAdmin, a more intuitive web interface. You can refer to Mastering phpMyAdmin 2.8 for Effective MySQL Management book from Packt Publishing (ISBN 1-904811-60-6). In phpMyAdmin, the exact commands may be typed in using the SQL Query Window, or we can benefit from the menus and graphical dialogs. Both ways will be shown throughout the book.
Here is the statement we would use to create the sales
table with the mysql
command-line utility:
CREATE TABLE sales ( salesperson char(40) NOT NULL, brand char(40) NOT NULL, model_number char(40) NOT NULL, model_name_year char(40) NOT NULL, qty_2006_01 int(11) NOT NULL, qty_2006_02 int(11) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=latin1;
In the previous statement, while char(40)
means a column with 40 characters, int(11)
means an integer with a display width of 11 in MySQL.
Using the phpMyAdmin web interface instead, we would obtain:
Here we have entered sample data into our sales
table:
INSERT INTO sales VALUES ('Murray, Dan', 'Fontax', '1A8', 'Mitsou 2007', 3, 4); INSERT INTO sales VALUES ('Murray, Dan', 'Fontax', '2X12', 'Wanderer 2006', 7, 0); INSERT INTO sales VALUES ('Smith, Peter', 'Licorne', 'LKC', 'Gazelle 2007', 1, 6); INSERT INTO sales VALUES ('Smith, Peter', 'Fontax', '1A8', 'Mitsou 2007', 1, 0);
However this structure has many maintenance problems. For instance, where do we store the figures for March 2006? To discover some of the other problems, let's examine sample SQL statements we could use on this table to query about specific questions, followed by the results of those statements:
/* displays the maximum number of cars of a single model sold by each vendor in January 2006 */ SELECT salesperson, max(qty_2006_01) FROM sales GROUP BY salesperson
/* finds the average number of cars sold by our sales force taken as a whole, in February 2006 */ SELECT avg(qty_2006_02) FROM sales WHERE qty_2006_02 > 0
/* finds for which model more than three cars were sold in January */ SELECT model_name_year, SUM(qty_2006_01) FROM sales GROUP BY model_name_year HAVING SUM(qty_2006_01) > 3
We notice that, although we got the answers we were looking for, with the above SQL queries, we would have to modify column names in the queries to obtain results for other months. Also, it becomes tricky if we want to know the month for which the sales have surpassed the yearly average, because we have to potentially deal with twelve column names. Another problem would arise when attempting to report for different years, or to compare a year with another one.
Moreover, a situation that could demonstrate the poor state of this structure is the need for a new report. A structure that is based too closely on a single report instead of being based on the intrinsic relations between data elements does not scale well and fails to accommodate future needs.
Chapter 4 will unfold those problems.