In 1976 Peter Chen invented entity-relationship (ER) modeling as a database design technique. The original diagrams used a box for an entity, a diamond for a relationship, and lines to connect them. The simplicity of the diagrams used in this method have made it the most popular database design technique in use today. The original method was very minimal, so other people have added other details and symbols to the basic diagram.
There are several problems with ER modeling:
1. ER does not spend much time on attributes. The names of the columns in a table are usually just shown inside the entity box, without datatypes. Some products will indicate which column(s) are the primary keys of the table. Even fewer will use another notation on the column names to show the foreign keys. I feel that people should spend more time actually designing data elements, as you can see from the number of chapters in this book devoted to data.
2. Although there can be more than one normalized schema from a single set of constraints, entities, and relationships, ER tools generate only one diagram. Once you have begun a diagram, you are committed to one schema design.
3. The diagram generated by ER tools tends to be a planar graph. That means that there are no crossed lines required to connect the boxes and lines. The fact that a graph has crossed lines does not make it nonplanar; it might be rearranged to avoid the crossed lines without changes to the connections.
A planar graph can also be subject to another graph theory result called the "fourcolor map theorem," which says that you only need four colors to color a planar map so that no two regions with a common border have the same color.
4. ER diagrams cannot express certain constraints or relationships. For example, in the versions that use only straight lines between entities for relationships, you cannot easily express an n-ary relationship (n > 2). Furthermore, you cannot show constraint among the attributes within a table. For example, you cannot show the rule that "An employee must be at least 18 years of age" with a constraint of the form CHECK ((hiredate - birthdate) >= INTERVAL 18 YEARS). As an example of the possibility of different schemas for the same problem, consider a database of horse racing information. Horses are clearly physical objects, and we need information about them if we are going to calculate a betting system. This modeling decision could lead to a table that looks like this:
CREATE TABLE Horses (
horsename CHAR(30) NOT NULL,
track CHAR(30) NOT NULL,
race INTEGER NOT NULL CHECK (race > 0),
racedate DATE NOT NULL,
position INTEGER NOT NULL CHECK (position > 0),
finish CHAR(10) NOT NULL
CHECK (finish IN ('win', 'place', 'show', 'ran', 'scratch')),
PRIMARY KEY (horsename, track, race, racedate));
The track column is the name of the track where the race was held, racedate is when it was held, race is the number of each race, position is the starting position of the horse, and finish is how well the animal did in the race. Finish is an attribute of the entity "horses" in this model. If you do not bet on horse races ("play the ponies"), "win" means first place; "place" is first or second place; "show" is first, second, or third place; "ran" is having been in the race, but not in first, second, or third place; and "scratch" means the horse was removed from the race in which it was scheduled to run. In this model, the finish attribute should have the highest value obtained by the horse in each row of the table.
Now look at the same reality from the viewpoint of the bookie who has to pay out and collect wagers. The most important thing in his model is the outcome of races, and detailed information on individual horses is of little interest. He might model the same reality with a table like this:
CREATE TABLE races (
track CHAR(30) NOT NULL,
racedate DATE NOT NULL,
race INTEGER NOT NULL CHECK (race > 0),
win CHAR(30) NOT NULL REFERENCES Horses(horsename),
place CHAR(30) NOT NULL REFERENCES Horses(horsename),
show CHAR(30) NOT NULL REFERENCES Horses(horsename),
PRIMARY KEY (track, date, race));
The columns have the same meaning as they did in the Horses table, but now there are three columns with the names of the horse that won, placed, or showed for that race ("finished in the money"). Horses are values of attributes of the entity "races" in this model.
No comments:
Post a Comment
Thanks for your comment.