INTERNATIONAL INFORMATICS COURSE

Introduction to Databases

Table of Contents:

SQL: data definition and data manipulation language

We assume that most people have some notion of "database". We see databases in everyday life - collections of CDs we can order from a company, a phonebook of phone number and name entries, parts stocked by a supplier to be supplied to a project, records to be processed by a program, a general repository that a program acts upon (like a cgi-bin program acting on a web client's behalf to read and write data to disk).

With a bit more precision, when we use the term database, we mean a logically coherent collection of related data with inherent meaning, built for a certain application, and representing a "mini-world" [Slide1]. A database management system (DBMS) is software that allows databases to be defined, constructed, and manipulated.

What is a database?

A database is a collection of data which is organized in a manner that can be easily retrieved. They consist of fields, records, and files, much like a telephone book. A field is a single piece of information (your last name is one field); a record is one complete set of fields (your entry in the telephone book is a record); and a file is a collection of records (the entire telephone book). When you look at the phone book for someone's telephone number, you are the "search engine" for the telephone book database. The only difference is that the search engine included with database programs is probably a little faster than you are at retrieving information [Slide2].

What is a Relational Database?

A relational database is a single database spread across multiple tables. Think of a database as a file cabinet and each drawer of the file cabinet is a table. [Slide3]

When Should I Store Data?

Databases are used to store large quantities of information. The information can be viewed, sorted, manipulated, retrieved, and printed in various ways. The database gives you the flexibility to obtain this data in multiple formats. If the information you need to store is vast, interrelated, and you need to retrieve it fast and accurately.

Terms and Definitions:

Field: The smallest piece of data in a table.

Cell: Where one row and one column intersect on a table.

Record: A collection of all fields relating to one person, place, or thing (i.e., Cindy Ireland's mailing address) occupying a single row of a table.

Table: A collection of records (i.e., all WSU employee's mailing address information) with one record occupying a single row with one or many fields per row. A database can consist of multiple tables.

Query: Asks specific questions about the data in the database.

Forms: A method of entering, displaying, and searching data in other than the spreadsheet-like environment of tables and queries (a graphical user interface).

Reports: Summarizes and formats data in the "look" that you want for either table or query data.

Object: Components that make up a database (i.e., tables, forms, queries, reports, macros, and modules).

Design your database

There are specific steps you should follow when designing your database:

Determine the purpose of your database. This will help you decide what data you want your database to store.
Determine the tables you will need. Once you have a clear purpose for your database, you can divide your information into separate subjects, such as "Employee" or "Orders". Each subject will be a table in your database. Remember your naming conventions from the last unit.
Determine the fields that you will need in your tables. Decide what specific pieces of information you want to store in each table. Each category of information in table is called a field and is displayed as a column in the table. For example, one field in an Employee table should be Last Name; another could be First Name [Slide4].
Determine the relationships. Look at each table and decide how the data in one table is related to the data in other tables. Add fields to tables or create new tables to clarify the relationships, as necessary. [Slide5]
Refine your design. Analyze your design for errors. Create the tables and add a few records of sample data [Slide6]. See if you can get the results you want from your tables. Make adjustments to the design as needed. [Slide7]

Complete these steps using a piece of paper and a pencil or a whiteboard - anything erasable. You will probably revise these many times before you have a final design. Experiment with sample data and models of your forms and reports. Make sure the database design stores the data in the manner you need, retrieves the data correctly, and gives you the output (printed document) that you require. It is much more difficult to make changes to the tables, forms, and reports after real data is entered. Double-check your design to make sure it contains all the data you will require before entering the real data.

Common Design Problems

There are several common pitfalls to keep in mind as you design your database. These problems can cause your data to be harder to use, maintain, and retrieve. The following are signs that you should reevaluate your database design: [Slide8]

You have one table with a large number of fields that don't all relate to the same subject. For example, one table might contain fields pertaining to your customers as well as fields that contain sales information. Each table should contain data about only one subject.
You have fields that are intentionally left blank in many records because they aren't applicable to those records. This usually means that the fields belong in another table.
You have a large number of tables, many of which contain the same fields. For example, you have separate tables for January sales and February sales, or for local customers and remote customer, in which you store the same type of information. Try consolidating all the information pertaining to a single subject in one table. You may also need to add an extra field, for example, to identify the sales date.

Defining relationships

Primary Key

Relationship between tables is a vital part of what makes work so well. This one point makes a difference between having a bunch of data and having a relational database that helps you to be more efficient and effective at your job.[Slide9]

The difficulties of this approach are fairly obvious. Suppose one advisor works with 20 students. As information on each student is entered, the information for the advisor must also be entered. Entering the exact same information is tedious work to say the least. Remember: You should only enter any one piece of data once. So, instead of entering the same advisor information for each of the 20 students, let's separate the data into two different tables with a relationship established. [Slide10]

AdvisorTable
Last_Name
First_Name
Building/Rm
Phone
Email

StudentTable
Last_Name
First_Name
Major
Phone
Email

So, how do we identify which students belong to which advisor? We would use a Primary Key. A Primary Key is a unique record identifier in each table. In our Advisor/Student database example above, there is no field that can be used as a Primary Key. Last name for either table is not a good choice - we could have duplicates for different individuals. First name in either table is not a possibility for the same reason - duplicates are possible. Address, phone, and email are not good choices because there can be duplicates but more importantly, they can change. Any field that can change is not a good choice for the Primary Key. The Primary Key should stay the same from the very beginning.

A standard, traditional database practice is to assign an ID number to each record. For example, Access 97 provides us with AutoNumber. This is an integer that is incremented with each new record; the first record has an ID of 1, the next has an ID of 2, etc. Our ID field in each table becomes the Primary Key, the unique identifier for each record [Slide11]. This number stays the same regardless of how it is sorted or queried or printed. It is a static field. [Slide12]

Relationships

Now we need to look at how these two tables are related, how exactly advisors are related to students. Each student has one advisor but each advisor can have many students. This is called One-To-Many Relationship. [Slide13]

The basic rule for establishing a One-To-Many Relationship is: make the table with the many point to the table with the one. In our example, Student is the table with many (remember, one advisor has many students) so we want to add a field to the Student table that points to the Advisor table. The new field we create needs to point to the Primary Key (ID) of the Advisor table. The Advisor ID field in the Student table, since it refers to the Primary Key of another table, is known as a Foreign Key. The Foreign Key must be the same field type (Number) as the Primary Key.

Creating a Table from Scratch

In order to gain much more control over the development of a database table, you can create it from scratch using Design View in Access 97. In Design View, you enter your own field names, descriptions, and choose your own data type to associate with each field. You also set your own Primary Key.

Before creating a table entirely from scratch, you should write down your basic table structure on paper, focusing particularly on field names and data types. Access fields can have one of the following data types as is shown in Table1.[Slide14]

While it will not be necessary for you to know all of these data types it is a good idea to have a general knowledge of the possibilities in defining the data fields.

Relational database model

There are three typical implementation models of databases: hierarchical, network, and relational. Each is based on the notion of data stored as a set of records (imagine a set of file cards, for example). Hierarchical (e.g., IMS) and network (e.g., IDMS) models are based on traversing data links to process a database; they are typically used for large mainframe systems and are not considered further here. [Slide15]

We focus on relational database management systems (RDBMSs). They have become popular, perhaps largely due to their simple data model:[Slide16]

Data is presented as a collection of relations
Each relation is depicted as a table
Columns are attributes
Rows ("tuples") represent entities
Every table has a set of attributes that taken together as a "key" (technically, a "superkey") uniquely identifies each entity [Slide17]

For example, a hospital might have an Employee table with a row for each employee. What attributes might be interesting? This, of course, depends on the application and use the data will be put to, and is determined at database design time. In our example, we might have the last name, the first name, the birth year and the sex.

The notion of view can be useful. Imagine that a hospital maintains a database of its employees -- there might be a lot of attributes like age, salary, emergency contacts, appraisal, etc. There may be needs to look at the database for different applications serving different users. The hospital may need to make available demographic data, for example, to a governmental agency. Only some of the attributes need be supplied - and others ought not to so as to protect privacy. Different views can be provided into the same data; in a RDBMS, a view can be seen as yet another table.[Slide18]

ER Data Modeling

Just a few words about design. How do you go about designing a database? It is useful to build a high level conceptual data model where we depict the entities that we are dealing with, their various attributes, and their relationships. An entity is some object with a real or conceptual existence in the world -- "tofu", "Advanced Java Class", "Folger Museum", "Elaine", "company", for example. An attribute is a property of an entity -- "address", "size", "mother", "age", for example [Slide19]. As mentioned above, a relational column is an attribute. A relationship defines roles in which entities work together -- "Bill – has - HIV.” RDBMSs represent relationships as tables. A side note for those already familiar with normalizing databases - ER design has been shown to give relations in third normal form. Also, ER diagrams can be mapped not just to RDBMS, but also to the network and hierarchical models. [Slide20]

It is relatively straightforward to represent a database design in graphical ER Diagrams, where rectangles represent entity types, diamonds relationship types, and ovals attributes. Underlined attribute names represent keys.

SQL: Data definition and data manipulation language

SQL is both a Data Definition Language (DDL) and a Data Manipulation Language (DML). As a DDL, it allows a database administrator or database designer to define tables, create views, etc. As a DML, it allows an end user to retrieve information from tables. It came from an IBM Research project entitled "SEQUEL" where the intent was to create a structured English-like query language to interface to the early System R database system. Along with QUEL, SQL was the first high level declarative database language.

In this section, we will just give a few examples of SQL syntax to help suggest some familiarity with the style. For further reference, any number of books can be consulted. Also, SQL is widely used, and a quick search on the web came up with an excellent syntax reference.

Creating and Updating a Database

In this example that follows, we create a table and insert two records. Note that attributes are positional and are specified in the same order in Create Table, unless a specific ordered attribute list is specified in the Insert Into statement (non-specified values are null).

Create Table Patient

  (Name   char(20)  not null,

   Age   char(3)  not null,

   Address  varchar(20),

   Clinical_status   char(50)

);

Insert Into Patient

  Values ("Patrick_O’Carroll", "32", "NA", "neutropenia");

Insert Into Patient (AP, PCV, Leucocytes)

  Values ("110", "20", "45,000");

Update Doses_of_Dopamine

  Set Doses_of_Dopamine = Initial * 1.2

  Where Evaluation > .02;

As you can see, SQL statements look a bit like English. The Delete statement (with a Where clause to specify conditions) removes selected tuples from a table.

What Is A Query?

A query is the question you ask, such as "Who are the students that Bryan Brown advises?", not the results (i.e., Jerry Turner). You may have several queries created as they may ask different specific questions. Once you have created a query, you can run it as often as needed because the results will change each time the table data is changed. [Slide21]

Querying from a Database

The Select (no relation to the relational algebra operation) statement is probably the most widely used SQL statement, and it is used to retrieve data from a database. It has many options, and we will again just give a few examples.[Slide22],[Slide23]

The most basic Select statement on, say, a table called Doctor, is

Select *

  From Doctor;

This just returns all tuples in the Doctor table. We can be more selective and ask for, say, just the attributes Specialist, Years_of_Practice , and Age:

Select Specialist, Years_of_Practice, Age

  From Doctor;

This essentially applies the select and project relational operators to the table.

We can also apply conditions to be more selective. Maybe we want to look at our inventory of surgeons’ doctors with at least 50 years-old and see which ones (identified by their years_of_practice) have 50 years-old, as well as their hospital location location:

Select Years_of_Practice, Age, Hospital

  From Doctor

  Where Specialist = "Surgeon" and Age >= 50;

We can even retrieve from multiple tables. For each surgeon, let's look at its age, hospital, Last_name, and date_of_contract. We assume we have a table Department which has Specialty as key and Date_of_Birth and Name as some attributes. To illustrate a point, let's assume that both tables have Name as an attribute; the value in the inventory on-hand Bike table is a vendor-supplied name, while the value of Name in the Manufacturer table is the name of the manufacturer.

Select Age, Hospital, Last_Name, Date_of_Contract

  From Doctor, Department

  Where Specialty = "Surgeon" And

        Doctor.Age = Department.Specialty;

Let's look at the SQL for the join example we illustrated above. It is fairly straightforward:

Select *

  From Doctor Table2, Specialty Table

  Where Hospital = "Asia";

If we want to look at the distinct salaries we are paying to people in department A32, we can use the Distinct keyword:

Select Distinct Salary

  From Doctor

  Where Hospital = "Asia";

It's hard to believe all the theory that we very lightly touched on above lies beneath such straightforward declarative syntax.[Slide24]

Go to Lecture Topics

Go to Introduction

Go to Case study

Go to Readings

This project is a joint effort of the University of Washington School of Public Health and Community Medicine
and the United States Centers for Disease Control. It is an approved APEC project.

Revised:

="12" src="../Patrick/Modulos%20Patrick/texto81.gif" alt="*">