Client/Server Programming -- Week 5
Introduction
This week we'll introduce databases.
Note that the links from this page are to handouts that will be distributed the night of class. Only print out those handouts if you could not attend class.
Main topics this week:
Assignment 3-1 Due
Databases
Database Tables
JDBC Architecture
Synchronization in Databases
Databases You Can Use
Other Database Topics
Assignment 6-1 Assigned
Next Week
Assignment 3-1 is due today. You must turn in a hard copy and also use the submit script at ~shaffsta/bin/submit345. After today, code may be turned in for 10% up until the next class.
Databases provide a solution to the problem of keeping data around after a program ends. A database system also addresses a number of other needs:
A database system is an extremely complex program to write, which is why most programmers use third-party database systems rather than writing their own.
There are two primary kinds of database systems:
In a relational database, data is stored as tables that are related to one another. Entity-relationship diagrams are used to show the structure of a relational database. There's a lot that goes into designing a relational database to prevent inconsistent data.
In an object oriented database, the structure of your objects in your program becomes the structure of the data in the database. The database simply provides a way to save and restore objects.
There are a lot of different ways to access a database system. In Java, a primary way is using JDBC (Java DataBase Connectivity). JDBC is based on the design of Microsoft's ODBC (Open DataBase Connectivity). Both systems are designed to allow access to different database systems in the same way, so a programmer does not need to worry about the differences between the database systems.
We'll do a quick review of relational database tables. A table in a database represents an entity, a thing. A table can contain many records (rows), each containing the data for one instance of the entity the table represents. Each row can contain multiple fields (columns) that contain the individual data members about that entity.
For example, a table that contained information about students might have this data in it:
StudentID | StudentName |
10337 | Jay Shaffstall |
98333 | Lisa Olinger |
35777 | Jonathan Smythe |
In this simple table, we have two columns: a student id and a student name. We have three rows, each row representing information about one student.
For the purposes of this course, we don't worry too much about the techniques of good database design, since any databases used for assignments will be provided by your instructor and will generally consist of only one or two tables. A real-world database for corporate work can consists of dozens of tables and must be carefully designed...the database class covers more on the techniques for good table design.
JDBC Architecture
JDBC is actually a fairly complicated architecture, whose sole purpose is to enable you to work with every database in the same way. This is complicated, because every database has a different way of doing the same things. All relational databases support the same basic tasks, but each has different methods you call to accomplish those tasks. If you were to write your code for one specific database, and then had to change databases, you would have to very carefully go over your code to make sure you didn't miss any database specific code.
JDBC tries to present you with a single way of doing things, and then translates that into database specific commands. It does this by using JDBC drivers for each database. Every JDBC driver essentially implements the same interface, allowing you to write your Java code the same regardless of the database used.
These drivers fall into one of four types:
For all these types of drivers, you write the same JDBC code to do work on the database.
In your assignments so far, you used the synchronized keyword to prevent race conditions when necessary. In database programming, you'll allow the database to do the synchronization on any data stored in the database. A database system will prevent race conditions by not allowing two updates or an update and a read to a table to happen simultaneously. It does this either by means of table locking or row locking.
Table locking means that when one client is writing to a table, no other client can read or write that table. Multiple clients may read the table at the same time. Another client could be updating a different table at the same time.
Row locking means that when one client is writing to a row, no other client can read or write that row, but other clients may be writing to other rows.
Similar to synchronization, locking at the table level is easier but less efficient than locking at the row level.
MySQL databases use table locking.
The Hypersonic database appears to avoid the need for locking by serializing database requests (essentially the same as synchronizing at the method level). No matter how many threads request database actions, the requests will be processed one at a time. By now, you should recognize that, while this is the safest approach, it is also the least efficient.
While many database systems are intended for corporate use, and are quite expensive, there are also freely available database systems you can use at home.
Perhaps the most well-known free database system is MySQL. MySQL does not provide all the features of the commercial database systems, but it does provide the features used most often. More information about MySQL, including downloads, is available at http://www.mysql.com/.
Postgre SQL seems to be MySQL's main competitor. According to their home page, they're "the worlds most advanced Open Source database software". While Postgre SQL may be more feature rich than MySQL, it does not seem to have quite the popularity.
The database recommended for this course is called Hypersonic. Hypersonic is written entirely in Java, and so has some advantages for use in your assignments. You do not need to run a separate database server, you can instead simply use the Hypersonic classes in your program (this is referred to as "in-process" or "standalone" mode in the Hypersonic documentation). While this would not be useful for an actual database application, it is quite convenient for your assignments if you're developing everything at home.
We'll look at most of these topics in more detail in the next few weeks, but it's worth talking about them a bit today.
Connections
A program uses a database by making a connection to it. This is the same idea as a client making a connection to the server, but in most cases the server is making a connection to the database. Generally a single connection is made for a server, and then all client requests go through that single connection. Most database servers are mulithreaded, allowing multiple requests to be processed at the same time.
Results Sets
When you ask the database for information, that information is returned as part of a "results set". A results set is like an iterator, in that you do not have direct access to the data return, instead you must call methods on the results set to get the next piece of data.
In most cases, the results set does not contain all the data returned, but only a limited buffer of data. As you ask the results set for more data, it will go back to the database as needed to keep its buffer full.
Transactions
A transaction in a database is a sequence of actions that must all either succeed or all fail. A transaction represents a logical task that is performed by several database actions. For example, the logical task "Remove customer from database" might require removing the customer's orders from the order table, removing the customer row from the customer table, and adding a row to the tracking table saying why the customer was removed.
If one step in a transaction fails, we want them all to fail. For example, let's say that we are able to remove the customer's orders from the order table, but then removing the customer row fails. We cannot leave the database like that, since we have not completed the logical task of removing the customer from the database.
A transaction in a database gives us a way of reversing any changes we might have already made to put things back the way they were before we made the changes. This is called a "rollback" of the transaction. So if any one step in the transaction fails, we can rollback any of the steps we already performed to leave the database the way it was.
If all the steps in the transaction succeed, then we can tell the database to go ahead and make the changes permanent. This is called a "commit" of the transaction. Other threads can see the changes only once the transaction has been committed.
By default, JDBC will commit each database request as soon as it is executed. We'll talk in a couple of weeks about how to modify this to use transactions yourself.
Distributed Transactions
A distributed transaction is a logical task that must perform actions on more than one database. The same requirements exist as for normal transactions: either all the actions must succeed or they must all fail. The problem becomes more complicated when multiple databases are involved, however. We'll look at this in more detail in a couple of weeks.
Assignment 6-1 is assigned today. This assignment involves the use of JDBC to read and write data from a database. There will also be additional features required that are not JDBC related; you should focus on those additional features first, and we will finish covering practical JDBC programming next week.
Next week we will cover more JDBC programming.