|
|||||
|
Introduction to Databases
Table of Contents:
We assume that most people have some notion of "database". We see databases in everyday life - collections of CDs we can order from a company, a phonebook of phone number and name entries, parts stocked by a supplier to be supplied to a project, records to be processed by a program, a general repository that a program acts upon (like a cgi-bin program acting on a web client's behalf to read and write data to disk). With a bit more precision, when we use the term database, we mean a logically coherent collection of related data with inherent meaning, built for a certain application, and representing a "mini-world" [Slide1]. A database management system (DBMS) is software that allows databases to be defined, constructed, and manipulated. What is a database?A database is a collection of data which is organized in a manner that can be easily retrieved. They consist of fields, records, and files, much like a telephone book. A field is a single piece of information (your last name is one field); a record is one complete set of fields (your entry in the telephone book is a record); and a file is a collection of records (the entire telephone book). When you look at the phone book for someone's telephone number, you are the "search engine" for the telephone book database. The only difference is that the search engine included with database programs is probably a little faster than you are at retrieving information [Slide2]. What is a Relational Database?A relational database is a single database spread across multiple tables. Think of a database as a file cabinet and each drawer of the file cabinet is a table. [Slide3] When Should I Store Data?Databases are used to store large quantities of information. The information can be viewed, sorted, manipulated, retrieved, and printed in various ways. The database gives you the flexibility to obtain this data in multiple formats. If the information you need to store is vast, interrelated, and you need to retrieve it fast and accurately. Terms and Definitions:Field: The smallest piece of data in a table.Cell: Where one row and one column intersect on a table.Record: A collection of all fields relating to one person, place, or thing (i.e., Cindy Ireland's mailing address) occupying a single row of a table.Table: A collection of records (i.e., all WSU employee's mailing address information) with one record occupying a single row with one or many fields per row. A database can consist of multiple tables.Query: Asks specific questions about the data in the database.Forms: A method of entering, displaying, and searching data in other than the spreadsheet-like environment of tables and queries (a graphical user interface).Reports: Summarizes and formats data in the "look" that you want for either table or query data.Object: Components that make up a database (i.e., tables, forms, queries, reports, macros, and modules).There are specific steps you should follow when designing your database:
Complete these steps using a piece of paper and a pencil or a whiteboard - anything erasable. You will probably revise these many times before you have a final design. Experiment with sample data and models of your forms and reports. Make sure the database design stores the data in the manner you need, retrieves the data correctly, and gives you the output (printed document) that you require. It is much more difficult to make changes to the tables, forms, and reports after real data is entered. Double-check your design to make sure it contains all the data you will require before entering the real data. Common Design ProblemsThere are several common pitfalls to keep in mind as you design your database. These problems can cause your data to be harder to use, maintain, and retrieve. The following are signs that you should reevaluate your database design: [Slide8]
Primary KeyRelationship between tables is a vital part of what makes work so well. This one point makes a difference between having a bunch of data and having a relational database that helps you to be more efficient and effective at your job.[Slide9] The difficulties of this approach are fairly obvious. Suppose one advisor works with 20 students. As information on each student is entered, the information for the advisor must also be entered. Entering the exact same information is tedious work to say the least. Remember: You should only enter any one piece of data once. So, instead of entering the same advisor information for each of the 20 students, let's separate the data into two different tables with a relationship established. [Slide10]
AdvisorTable
StudentTable So, how do we identify which students belong to which advisor? We would use a Primary Key. A Primary Key is a unique record identifier in each table. In our Advisor/Student database example above, there is no field that can be used as a Primary Key. Last name for either table is not a good choice - we could have duplicates for different individuals. First name in either table is not a possibility for the same reason - duplicates are possible. Address, phone, and email are not good choices because there can be duplicates but more importantly, they can change. Any field that can change is not a good choice for the Primary Key. The Primary Key should stay the same from the very beginning. A standard, traditional database practice is to assign an ID number to each record. For example, Access 97 provides us with AutoNumber. This is an integer that is incremented with each new record; the first record has an ID of 1, the next has an ID of 2, etc. Our ID field in each table becomes the Primary Key, the unique identifier for each record [Slide11]. This number stays the same regardless of how it is sorted or queried or printed. It is a static field. [Slide12] RelationshipsNow we need to look at how these two tables are related, how exactly advisors are related to students. Each student has one advisor but each advisor can have many students. This is called One-To-Many Relationship. [Slide13] The basic rule for establishing a One-To-Many Relationship is: make the table with the many point to the table with the one. In our example, Student is the table with many (remember, one advisor has many students) so we want to add a field to the Student table that points to the Advisor table. The new field we create needs to point to the Primary Key (ID) of the Advisor table. The Advisor ID field in the Student table, since it refers to the Primary Key of another table, is known as a Foreign Key. The Foreign Key must be the same field type (Number) as the Primary Key. Creating a Table from Scratch In order to gain much more control over the development of a database table, you can create it from scratch using Design View in Access 97. In Design View, you enter your own field names, descriptions, and choose your own data type to associate with each field. You also set your own Primary Key. Before creating a table entirely from scratch, you should write down your basic table structure on paper, focusing particularly on field names and data types. Access fields can have one of the following data types as is shown in Table1.[Slide14] While it will not be necessary for you to know all of these data types it is a good idea to have a general knowledge of the possibilities in defining the data fields. There are three typical implementation models of databases: hierarchical, network, and relational. Each is based on the notion of data stored as a set of records (imagine a set of file cards, for example). Hierarchical (e.g., IMS) and network (e.g., IDMS) models are based on traversing data links to process a database; they are typically used for large mainframe systems and are not considered further here. [Slide15] We focus on relational database management systems (RDBMSs). They have become popular, perhaps largely due to their simple data model:[Slide16]
For
example, a hospital might have an Employee table with a row for each
employee. What attributes might be interesting? This, of course, depends on
the application and use the data will be put to, and is determined at
database design time. In our example, we might have the last name, the first
name, the birth year and the sex. ER Data ModelingJust a few words about design. How do you go about designing a database? It is useful to build a high level conceptual data model where we depict the entities that we are dealing with, their various attributes, and their relationships. An entity is some object with a real or conceptual existence in the world -- "tofu", "Advanced Java Class", "Folger Museum", "Elaine", "company", for example. An attribute is a property of an entity -- "address", "size", "mother", "age", for example [Slide19]. As mentioned above, a relational column is an attribute. A relationship defines roles in which entities work together -- "Bill – has - HIV.” RDBMSs represent relationships as tables. A side note for those already familiar with normalizing databases - ER design has been shown to give relations in third normal form. Also, ER diagrams can be mapped not just to RDBMS, but also to the network and hierarchical models. [Slide20] It is relatively straightforward to represent a database design in graphical ER Diagrams, where rectangles represent entity types, diamonds relationship types, and ovals attributes. Underlined attribute names represent keys.
SQL is both a Data Definition Language (DDL) and a Data Manipulation Language (DML). As a DDL, it allows a database administrator or database designer to define tables, create views, etc. As a DML, it allows an end user to retrieve information from tables. It came from an IBM Research project entitled "SEQUEL" where the intent was to create a structured English-like query language to interface to the early System R database system. Along with QUEL, SQL was the first high level declarative database language. In this section, we will just give a few examples of SQL syntax to help suggest some familiarity with the style. For further reference, any number of books can be consulted. Also, SQL is widely used, and a quick search on the web came up with an excellent syntax reference. Creating and Updating a DatabaseIn this example that follows, we create a table and insert two records. Note that attributes are positional and are specified in the same order in Create Table, unless a specific ordered attribute list is specified in the Insert Into statement (non-specified values are null). Create Table Patient
(Name char(20) not null,
Age char(3) not null,
Address varchar(20),
Clinical_status char(50)
);
Insert Into Patient
Values ("Patrick_O’Carroll", "32", "NA", "neutropenia");
Insert Into Patient (AP, PCV, Leucocytes)
Values ("110", "20", "45,000");
Update Doses_of_Dopamine
Set Doses_of_Dopamine = Initial * 1.2
Where Evaluation > .02;
As you can see, SQL statements look a bit like English. The Delete statement (with a Where clause to specify conditions) removes selected tuples from a table. What Is A Query?A query is the question you ask, such as "Who are the students that Bryan Brown advises?", not the results (i.e., Jerry Turner). You may have several queries created as they may ask different specific questions. Once you have created a query, you can run it as often as needed because the results will change each time the table data is changed. [Slide21] Querying from a DatabaseThe Select (no relation to the relational algebra operation) statement is probably the most widely used SQL statement, and it is used to retrieve data from a database. It has many options, and we will again just give a few examples.[Slide22],[Slide23] The most basic Select statement on, say, a table called Doctor, is Select *
From Doctor;
This just returns all tuples in the Doctor table. We can be more selective and ask for, say, just the attributes Specialist, Years_of_Practice , and Age: Select Specialist, Years_of_Practice, Age
From Doctor;
This essentially applies the select and project relational operators to the table. We can also apply conditions to be more selective. Maybe we want to look at our inventory of surgeons’ doctors with at least 50 years-old and see which ones (identified by their years_of_practice) have 50 years-old, as well as their hospital location location: Select Years_of_Practice, Age, Hospital
From Doctor
Where Specialist = "Surgeon" and Age >= 50;
We can even retrieve from multiple tables. For each surgeon, let's look at its age, hospital, Last_name, and date_of_contract. We assume we have a table Department which has Specialty as key and Date_of_Birth and Name as some attributes. To illustrate a point, let's assume that both tables have Name as an attribute; the value in the inventory on-hand Bike table is a vendor-supplied name, while the value of Name in the Manufacturer table is the name of the manufacturer. Select Age, Hospital, Last_Name, Date_of_Contract
From Doctor, Department
Where Specialty = "Surgeon" And
Doctor.Age = Department.Specialty;
Let's look at the SQL for the join example we illustrated above. It is fairly straightforward: Select *
From Doctor Table2, Specialty Table
Where Hospital = "Asia";
If we want to look at the distinct salaries we are paying to people in department A32, we can use the Distinct keyword: Select Distinct Salary
From Doctor
Where Hospital = "Asia";
It's hard to believe all the theory that we very lightly touched on above lies beneath such straightforward declarative syntax.[Slide24] |
||||
This project is a joint effort of the
University of
Washington School of Public Health and Community Medicine |
|||||
Revised: | Contact Us |