Algorithm Analysis -- Week 3
Introduction
This week we will look at various sorts of trees and the algorithms for working with them.
Note that the links from this page are to handouts that will be distributed the night of class. Only print out those handouts if you could not attend class.
Main topics this week:
Trees
Binary Tree
Tree Exercise
Binary Search Tree
Binary Search Tree Exercise
AVL Tree
B-Tree
Heapsort
Next Week
Trees are a data structure. Like all other data structures, a tree is a collection of other things, called nodes.
Characteristics of a tree:
One node is the root node.
Every node except the root has exactly one parent node.
A node can have any number of children.
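As a minimal sketch of what a node looks like in code (the class name TreeNode and the choice of Java are mine, not from the handouts), a general tree node just holds a piece of data plus a list of child nodes:
import java.util.ArrayList;
import java.util.List;

// One node of a general tree: a piece of data plus any number of children.
class TreeNode
{
    int data;
    List<TreeNode> children = new ArrayList<>();

    TreeNode(int data)
    {
        this.data = data;
    }

    // A node with no children is a leaf (terminal) node.
    boolean isLeaf()
    {
        return children.isEmpty();
    }
}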
Terminology:
leaf node (also known as terminal node)
sibling nodes
height of a tree -- start counting at 1
depth (or level) of a node
degree of a node
degree of a tree
subtree
complete tree -- all leaves at level n or n - 1, depth stays the same or decreases left to right
full tree -- all leaves at level n
skewed tree
A binary tree is a tree, with an additional restriction:
Each node can have at most 2 children.
A full binary tree is a binary tree of depth k, having 2^k - 1 nodes, where k >= 1.
A complete binary tree is a binary tree where, if you number the nodes from 1 to n, starting at the root and moving from left to right at each level, the numbers correspond to the same numbers in a full binary tree.
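The rest of the code sketches in these notes assume a simple binary tree node like the following (the class name Node and the int data field are assumptions made for illustration):
// One node of a binary tree: a value plus at most two children.
class Node
{
    int data;
    Node left;    // root of the left subtree, or null
    Node right;   // root of the right subtree, or null

    Node(int data)
    {
        this.data = data;
    }
}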
Like any data structure, we need a way of iterating over all the nodes in a tree. However, unlike linked lists or vectors, there are multiple ways of iterating over all the nodes in a tree.
Iterating over all the nodes in a tree is commonly known as traversing the tree. The most common traversal methods in a tree are:
Preorder traversal
Inorder traversal
Postorder traversal
Breadth-first traversal (also known as level order traversal) -- algorithm uses a queue
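As a rough sketch using the assumed Node class above, the three depth-first traversals differ only in when the node itself is visited, while the breadth-first traversal works level by level with a queue:
import java.util.LinkedList;
import java.util.Queue;

class Traversals
{
    // Preorder: visit the node, then its left subtree, then its right subtree.
    static void preorder (Node n)
    {
        if (n == null) return;
        System.out.println(n.data);
        preorder(n.left);
        preorder(n.right);
    }

    // Inorder: left subtree, then the node itself, then the right subtree.
    static void inorder (Node n)
    {
        if (n == null) return;
        inorder(n.left);
        System.out.println(n.data);
        inorder(n.right);
    }

    // Postorder: both subtrees first, the node itself last.
    static void postorder (Node n)
    {
        if (n == null) return;
        postorder(n.left);
        postorder(n.right);
        System.out.println(n.data);
    }

    // Breadth-first (level order): visit nodes a level at a time using a queue.
    static void levelOrder (Node root)
    {
        Queue<Node> q = new LinkedList<>();
        if (root != null) q.add(root);
        while (!q.isEmpty())
        {
            Node n = q.remove();
            System.out.println(n.data);
            if (n.left != null) q.add(n.left);
            if (n.right != null) q.add(n.right);
        }
    }
}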
Regardless of how we traverse the tree, finding a single node in a plain binary tree still means searching through the nodes, which in the worst case examines all of them, at a cost of O(n).
I will pass out a tree exercise, and you will need to answer questions based on the trees shown. This exercise is worth 10 points.
Since finding any single element in a binary tree is O(n), we need a way to make searching faster (remember that O(n) is the same cost as a linear search through an array, so a plain tree would be no better than an array). As usual, we make searching more efficient by ordering the data, which brings us to the binary search tree.
A binary search tree is a binary tree with additional restrictions:
The values of all nodes in a node's left subtree are less than the value of the node itself.
The values of all nodes in a node's right subtree are greater than the value of the node itself.
The claim is that these additional restrictions speed up searching.
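A minimal search sketch (again using the assumed Node class) shows why: each comparison lets us throw away an entire subtree instead of stepping past a single element.
// Search a binary search tree: each comparison discards a whole subtree.
static Node search (Node where, int target)
{
    if (where == null)
        return null;                            // not in the tree
    if (target == where.data)
        return where;                           // found it
    if (target < where.data)
        return search(where.left, target);      // can only be in the left subtree
    else
        return search(where.right, target);     // can only be in the right subtree
}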
Consider searching a linked list.
Now consider searching the same data in a binary search tree.
What's the big Oh rating of searching a binary search tree?
But what about searching the same data in a binary search tree that's laid out just a bit differently? For example, if the values were inserted in sorted order, every node has only a right child and the tree is really just a linked list. What's the big Oh rating of searching this binary search tree?
In order to keep searches efficient in a binary search tree, you must keep the tree 'balanced'. In general, keeping a tree balanced means keeping roughly the same number of nodes in each half of the tree. Chapter 13 in your book talks in detail about balancing algorithms for binary search trees (AVL trees and Red-Black trees are two examples). For example, the AVL tree keeps itself balanced by requiring that the heights of every node's two subtrees either be equal or differ by 1. To do this the AVL tree may have to rearrange nodes when a new node is inserted.
None of the algorithms we'll discuss for binary search trees assume the tree is balanced.
Binary search tree operations:
insert (data, root)
remove (data, root)
Insert algorithm (insertions in a non-balanced binary search tree will always create a new leaf node):
// Insert data into an unbalanced binary search tree.
// The new value always ends up in a brand new leaf node.
// Returns the root of the subtree, so callers write the result back,
// e.g. root = insert(root, data);
static Node insert (Node where, int data)
{
    if (where == null)
    {
        where = new Node(data);   // empty spot found: create the new leaf here
    }
    else if (data < where.data)
        where.left = insert(where.left, data);    // belongs in the left subtree
    else if (data > where.data)
        where.right = insert(where.right, data);  // belongs in the right subtree
    return where;   // equal values are ignored (no duplicates)
}
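To build a tree with this version, the caller just writes the result back into the root each time, for example (hypothetical values):
Node root = null;
for (int value : new int[] { 50, 30, 70, 20, 40 })
    root = insert(root, value);    // root may change on the first insert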
Remove algorithm: the remove algorithm has three basic cases which need to be handled separately:
1) Node to be removed has no children
2) Node to be removed has one child
3) Node to be removed has two children
Here's a handout showing the completed remove algorithm.
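The handout has the full algorithm; below is only a rough sketch of one common approach (not the handout's exact code), where the two-children case is handled by copying in the smallest value from the right subtree and then removing that value instead:
// Remove data from an unbalanced binary search tree (a sketch, not the handout's code).
// Like insert, it returns the new root of the subtree.
static Node remove (Node where, int data)
{
    if (where == null)
        return null;                                  // value was not in the tree
    if (data < where.data)
        where.left = remove(where.left, data);
    else if (data > where.data)
        where.right = remove(where.right, data);
    else if (where.left == null)
        return where.right;                           // cases 1 and 2: no child, or only a right child
    else if (where.right == null)
        return where.left;                            // case 2: only a left child
    else
    {
        // Case 3: two children. Copy up the smallest value in the right
        // subtree, then remove that value from the right subtree instead.
        Node smallest = where.right;
        while (smallest.left != null)
            smallest = smallest.left;
        where.data = smallest.data;
        where.right = remove(where.right, smallest.data);
    }
    return where;
}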
I will pass out an exercise in which you will have to show the results of inserts and removes on various binary search trees. This exercise is worth 10 points.
We showed that a binary search tree could, in the worst case, end up with an O(n) search even though it's sorted. A solution to that problem is to keep the tree 'balanced', with roughly the same number of nodes in each subtree. One way of balancing a tree is to use an AVL tree (other balanced binary search trees include red-black trees).
Searching an AVL tree is exactly the same as searching a binary search tree. The difference comes when you insert and remove items. The AVL tree must move nodes around to ensure that the tree remains balanced. So insertion and removal of nodes becomes less efficient, but searching avoids the O(n) worst case.
Why call them AVL trees? They were created by two Russian mathematicians whose last names were Adel'son-Velskii and Landis.
We've said that an AVL tree is balanced. What does this mean? If a tree is balanced, then the height of every node's left subtree is either equal to the height of the node's right subtree, or they differ by only 1. If a node's left and right subtrees meet this condition, we say that the node has the AVL property. A tree is balanced only if every node has the AVL property.
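As a small sketch of what the AVL property means in code (the method names here are mine, and recomputing heights this way is slow; a real AVL tree stores a height or balance factor in each node instead):
// Height of a subtree: an empty subtree has height 0, a single node has height 1.
static int height (Node n)
{
    if (n == null)
        return 0;
    return 1 + Math.max(height(n.left), height(n.right));
}

// A node has the AVL property when its two subtree heights differ by at most 1.
static boolean hasAvlProperty (Node n)
{
    if (n == null)
        return true;                  // an empty subtree is trivially balanced
    return Math.abs(height(n.left) - height(n.right)) <= 1;
}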
(Show some examples)
As we insert and remove items from the tree, we will probably violate the AVL property of one or more nodes.
(Show some examples)
To avoid this, we have to move nodes around so that every node still has the AVL property. The exact algorithms used to maintain balance in the tree aren't important, but you should understand that additional work is being done on inserts and removes.
A balanced binary search tree ensures that we can search in O(lg n) time. A B-Tree is a special kind of tree that provides faster searching than a binary search tree. In a B-Tree, a node can contain multiple values. If a node contains x values, then it should have x + 1 children.
(Show example)
This means data can be found while visiting fewer nodes than in a binary search tree. The order of complexity is still roughly O(lg n), but because each node holds several values, the tree is much shallower than a balanced binary search tree holding the same data, so fewer nodes need to be examined to find an element.
Programming a B-Tree is more complex than programming a binary search tree, however, so B-Trees are most often used when the amount of data is extremely large (as in a database). Also, since a B-Tree reduces the number of nodes that must be visited, it works well for disk-based searches (where each node must be loaded from a disk file, because there are too many nodes to fit into memory).
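A sketch of what a B-Tree node might look like, with a search routine (the class name, field names, and fixed capacity are my assumptions; real B-Trees also enforce minimum-fill and node-splitting rules that are omitted here):
// One node of a B-Tree: up to MAX values kept in sorted order,
// plus one more child pointer than there are values.
class BTreeNode
{
    static final int MAX = 4;                       // assumed capacity for this sketch
    int count;                                      // number of values actually stored
    int[] values = new int[MAX];                    // values[0..count-1], in sorted order
    BTreeNode[] children = new BTreeNode[MAX + 1];  // count + 1 children (all null in a leaf)

    // Search: scan this node's values; if the target isn't here, descend
    // into the child that sits between the two values bracketing it.
    static BTreeNode search (BTreeNode n, int target)
    {
        if (n == null)
            return null;                            // ran off the bottom: not in the tree
        int i = 0;
        while (i < n.count && target > n.values[i])
            i++;
        if (i < n.count && target == n.values[i])
            return n;                               // found in this node
        return search(n.children[i], target);       // keep looking in the bracketing child
    }
}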
Heapsort is a type of sort that uses trees. Heapsort, in general, uses less memory than mergesort or quicksort. Heapsort uses a data structure called a heap.
A heap is a complete binary tree, with the following property:
The value of a parent node is greater than or equal to the values stored in its children
Since the root is the parent of everything, this means that the root has the largest value in the heap.
A heapsort has two stages. The first stage involves putting the items to be sorted into a heap. The second stage involves repeatedly removing the root item from the heap until the heap is empty. Since the root item is always the largest value in the heap, this will result in extracting items in largest to smallest order.
When performing a heap sort, we always start with a complete binary tree. We then convert that complete binary tree to a heap. If the complete binary tree has a height of h, then we start looking at those subtrees whose roots are at depth h - 1.
(Show example)
We convert those subtrees to heaps by comparing the children with the root. If the root is smaller than one or both of the children, then we swap the root with the largest child. This leaves us with a subtree which is a heap. We then move one level up and perform the same process over again.
Eventually we process the root, and we have a heap. Now we need to remove items from the heap, starting with the root.
We first swap the root with the rightmost and bottommost leaf node (because that's the only node we can delete and still have a complete binary tree), then delete the rightmost leaf node. Now we need to make sure the heap order is restored, so we compare the root with its children. If the root is smaller than either or both of its children, then we swap the root with the largest child. We then compare the value that was in the root with its new children, and continue the process until it is larger than both children.
(Show example)
This is called an "in-place" sort because it uses no additional memory, unlike mergesort and quicksort, which both use some additional memory.
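Because a heap is a complete binary tree, implementations normally store it in a plain array, with node i's children at positions 2i + 1 and 2i + 2; that array layout (an assumption beyond the tree pictures above) is what makes the sort in-place. A rough sketch of both stages:
class HeapSort
{
    // Sift the value at index root down until it is >= both of its children.
    static void siftDown (int[] a, int root, int size)
    {
        while (2 * root + 1 < size)
        {
            int child = 2 * root + 1;                   // left child
            if (child + 1 < size && a[child + 1] > a[child])
                child = child + 1;                      // right child is larger
            if (a[root] >= a[child])
                return;                                 // heap order restored
            int tmp = a[root]; a[root] = a[child]; a[child] = tmp;
            root = child;                               // keep sifting down
        }
    }

    static void heapSort (int[] a)
    {
        // Stage 1: build the heap, starting with the subtrees one level above the leaves.
        for (int i = a.length / 2 - 1; i >= 0; i--)
            siftDown(a, i, a.length);

        // Stage 2: repeatedly swap the root (largest value) with the last leaf,
        // shrink the heap by one, and restore the heap order.
        for (int end = a.length - 1; end > 0; end--)
        {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            siftDown(a, 0, end);
        }
    }
}
Each pass moves the current largest value to the end of the shrinking heap, so when the heap is empty the array is sorted in ascending order.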
Next week we will cover graphs and graphing algorithms.