Computer Science II

Computer Science II -- Week 9

Introduction

Note that the links from this page are to handouts that will be distributed the night of class. Only print out those handouts if you could not attend class.

Main topics this week:

Trees
Binary Tree
Binary Search Tree
2-3 Trees
B-Trees
Associative Arrays

Trees

Trees are a data structure. Like all other data structures, that means a tree is a collection of other things, called nodes.

Characteristics of a tree:

One node is a root node.

Every node except root node has one parent node

A node can have any number of children.

What is a node that has no children called?

Terminology:

height of a tree -- start counting at 1

depth of a node

degree of a node

degree of a tree

full tree

complete tree

Binary Tree

A binary tree is a tree, with an additional restriction:

Each node can have at most 2 children.

Like any data structure, we need a way of iterating over all the nodes in a tree. However, unlike linked lists or vectors, there are multiple ways of iterating over all the nodes in a tree.

Iterating over all the nodes in a tree is commonly known as traversing the tree. The most common traversal methods in a tree are:

Preorder traversal

Inorder traversal

Postorder traversal

Breadth-first traversal

Regardless of how we traverse the tree, in order to find a single node we still must do a search of all the nodes, at a cost of O(N).

Binary Search Tree

A binary search tree is a binary tree with additional restrictions:

The value of all left hand children of a node is always less than the value of the node itself.

The value of all right hand children of a node is always greater than to the value of the node itself.

The claim is that these additional restrictions speed up searching.

Consider searching a linked list.

Now consider searching the same data in a binary search tree.

What's the big Oh rating of searching a binary search tree?

But what about searching the same data in a binary search tree that's laid out just a bit differently? What's the big Oh rating of searching this binary search tree?

In order to keep searches efficient in a binary search tree, you must keep the tree 'balanced'. In general, keeping a tree balanced means keeping roughly the same number of nodes in each half of the tree. Chapter 13 in your book talks in detail about balancing algorithms for binary search trees (AVL trees and Red-Black trees are two examples). For example, the AVL tree keeps itself balanced by requiring that the height of any two subtrees either be equal or differ by 1. To do this the AVL tree may have to rearrange nodes when a new node is inserted.

All of the algorithms we'll discuss for binary search trees do not assume the tree is balanced.

Binary search tree operations:

insert (data, root)
remove (data, root)

Insert algorithm:

Remove algorithm: the remove algorithm has three basic cases which need to be handled separately:

1) Node to be removed has no children

2) Node to be removed has one child

3) Node to be removed has two children

Here's a handout showing the completed remove algorithm.

2-3 Trees

A 2-3 tree is similar to a binary search tree, but some of the nodes can have up to 3 children instead of just 2 children. Also, some of the nodes can have 2 data items instead of just 1 data item.

Example of 2-3 tree.

Note how the value of the middle child must be between the values of the 2 data items in the parent.

Rules for 2-3 tree nodes:

If a node has 2 children, then it must contain 1 data item
If a node has 3 children, then it must contain 2 data items
If a node is a leaf node, then it may contain either 1 or 2 data items

A 2-3 tree helps reduce the number of nodes you have to traverse when trying to find any particular node.

B-Trees

So far we've assumed our entire tree structure is in memory. If we have too much data in our tree to store everything in memory, then some of our nodes will have to be put into a file.

The problem with putting nodes into a file is that accessing the files to get the value of a node is extremely slow compared to accessing a node in memory. Even a binary search tree is going to be too slow for file-based nodes.

Show example of how many disk accesses would be required to find an item in a typical binary search tree.

A B-Tree is a disk based tree that allows faster searching than a binary search tree. The B-Tree is an extension of the concept of a 2-3 tree.

The general idea of a B-Tree is to keep the tree as shallow as possible to limit disk accesses. This means that each node may have many children.

Show example of how many disk accesses would be required to find an item in a typical B-Tree.

Keeping track of all the children of a node becomes quite complex, and is efficient only because disk accesses are so much slower than memory accesses.

Associative Arrays

With a normal array or vector you use an integer index to identify a particular item in the array.

int array [50];

array [5] = 10;

This works great if the key, the value you want to look up and item by, is a number.

What if I want to associate a key that is a string, with some data that is a Foo object?

I want to be able to have a data structure where I can give it the key (which is a string) and get back the data (which is a Foo).

Associative arrays allow this by making it possible to use any data type to index into the array. So I could do something like:

Foo aFoo (5);

array ["Jay"] = aFoo;

The C++ standard template library provides an associative array class. The class is called map. To use it, you must include the right header file:

#include <map>

The map class is a template. A map needs to know two data types: the data type of the key, and the data type of the data. So you declare a map like this:

map<key type, data type> nameOfMap;

So my map of strings and Foos would be:

map<string, Foo> aMap;

Now I can use array syntax to put Foo objects into the map.

Foo aFoo (5);

aMap ["Jay"] = aFoo;

Since the index in a map does not start at 0 and increment by 1, we need some way of finding a specific item. The map class has a find method that works like this:

map<string, Foo>::iterator iter = aMap.find ("Jay");

if (iter != aMap.end ())
cout << (*iter) << endl;
else
cout << "Item not found" << endl;

Maps use a balanced binary search tree to store their data, so searching a map is an O(log n).

Your book has examples of using a map on pages 705-707.