Binary Trees


Introduction
Vocabulary
Searching
Insertion
Deletion
Tree Traversal
Applications


INTRODUCTION

One of the major drawbacks to using linked lists is the amount of time it takes to search a long list. In order to find the node containing the value 10 in the linked list below, the entire list must be traversed. This causes severe problems when the list contains many nodes. Organizing the linked list into a binary search tree solves many of these problems. A binary search tree provides a structure that retains the flexibility of a linked list, while allowing quicker access to any node in the list.


The binary search tree gets its tree structure by allowing each node to point to two other nodes: one that precedes it in the list, and one that follows it. These nodes may be any nodes in the list, as long as they satisfy the basic rule: the node to the left contains a value smaller than the node pointing to it, and the node to the right contains a larger value.

The figure below shows a binary search tree that could have been created from the nodes in the preceding figure. For any given node, the nodes to the left contain smaller values and the nodes to the right contain larger values. The first node in the tree is pointed to by an external pointer, called the root of the tree.



To search for a value, such as 10, in the binary search tree, the root is examined. The value in the root is smaller than 10, so we know by the basic rule that the node being searched for is located somewhere to the right. The value in the node immediately to the right is compared to 10. It is smaller, so the search moves to the right. The process continues until it arrives at the node containing 10. By following the path from the root to the node containing 10, the search required only four comparisons. Searching for the same value in the linked list would have required ten comparisons.

Duplicate values in trees can be handled in a variety of ways, depending on the application. In some applications duplicates are not essential, so they can be ignored. In other situations duplicates are noted by special flag fields or counter fields within each node. Other applications require that duplicate nodes be included in the tree, either to the right or left of the original node. In this discussion, assume that all values are unique.
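
One of the options above, a counter field in each node, might look like the following Python sketch. The names CountingNode and insert_counted are illustrative assumptions and are not part of the classes used in this discussion.

```python
class CountingNode:
    """Tree node with a counter field that records duplicate insertions."""

    def __init__(self, info):
        self.info = info
        self.count = 1      # how many times this value has been inserted
        self.left = None
        self.right = None


def insert_counted(root, value):
    """Insert value; a duplicate bumps the counter instead of adding a node.

    Returns the (possibly new) root of the tree.
    """
    if root is None:
        return CountingNode(value)
    node = root
    while True:
        if value == node.info:
            node.count += 1
            return root
        if value < node.info:
            if node.left is None:
                node.left = CountingNode(value)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = CountingNode(value)
                return root
            node = node.right


root = None
for v in [5, 3, 8, 3, 3]:
    root = insert_counted(root, v)
print(root.left.info, root.left.count)
```

With this approach the tree keeps one node per distinct value, so searching is unchanged.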



VOCABULARY

The following figure illustrates the relationship among the nodes in a binary tree. 

Each binary tree has a unique first element called the root. The node to the left is called the left child, the node to the right is called the right child. Any node pointing to other nodes is called the parent of those nodes. Any node may have 0, 1, or 2 children. A node with no children is called a leaf. Two nodes are siblings if they have the same parent. A node is an ancestor of another node if it is the parent of the node, or the parent of some other ancestor of that node. The root is an ancestor of every other node in the tree. A node is a descendant of another node if it is the child of the node or the child of some other descendant of that node. All nodes are descendants of the root. Descendants to the left of a node comprise its left subtree, whose root is the left child of the node. Descendants to the right of a node comprise the right subtree, whose root is the right child of the node.

The level of a node refers to its distance from the root. The root is level 0, the next level is level 1, etc. The maximum number of nodes at any level N is 2^N (2 raised to the power N).
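
Reading the level bound as 2 raised to the power N, the capacity of each level and of a whole tree can be tabulated with a few lines of Python. The function names are illustrative.

```python
def max_nodes_at_level(level):
    """Maximum number of nodes on one level; the root is level 0."""
    return 2 ** level


def max_nodes_in_tree(deepest_level):
    """Maximum number of nodes in a tree whose lowest node is at deepest_level."""
    return sum(max_nodes_at_level(n) for n in range(deepest_level + 1))


# Print level, capacity of that level, capacity of the tree down to that level.
for level in range(4):
    print(level, max_nodes_at_level(level), max_nodes_in_tree(level))
```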

The tree will be accessed through the external pointer ROOT. Nodes are accessed through their pointers. The nodes in the following examples will contain three fields:

 


   '---------------------------------------------------------------------------------------------    
   ' doubleLinkNode class
   '---------------------------------------------------------------------------------------------    

   ' instance variables
   Private leftNode As doubleLinkNode
   Private info As Variant
   Private rightNode As doubleLinkNode

   ' class methods
   '---------------------------------------------------------------------------------------------    
   ' Constructor
   '---------------------------------------------------------------------------------------------    
   Private Sub Class_Initialize( )
      Set leftNode = Nothing
      Set rightNode = Nothing
   End Sub

   '---------------------------------------------------------------------------------------------    
   ' Set node value
   '---------------------------------------------------------------------------------------------    
   Public Sub setInfo(ByVal newValue As Variant)
      info = newValue
   End Sub

   '---------------------------------------------------------------------------------------------    
   ' Return node value
   '---------------------------------------------------------------------------------------------    
   Public Function getInfo( ) As Variant
      getInfo = info
   End Function

   '---------------------------------------------------------------------------------------------    
   ' Reset rightNode reference
   '---------------------------------------------------------------------------------------------    
   Public Sub setRightNode(ByRef followingNode As doubleLinkNode)
      Set rightNode = followingNode
   End Sub

   '---------------------------------------------------------------------------------------------    
   ' Reset leftNode reference
   '---------------------------------------------------------------------------------------------    
   Public Sub setLeftNode(ByRef previousNode As doubleLinkNode)
      Set leftNode = previousNode
   End Sub

   '---------------------------------------------------------------------------------------------    
   ' Return rightNode reference
   '---------------------------------------------------------------------------------------------    
   Public Function getRightNode( ) As doubleLinkNode
      Set getRightNode = rightNode
   End Function

   '---------------------------------------------------------------------------------------------    
   ' Return leftNode reference     
   '---------------------------------------------------------------------------------------------    
   Public Function getLeftNode( ) As doubleLinkNode
      Set getLeftNode = leftNode
   End Function


SEARCHING

Executing a binary search on a tree involves moving a pointer to the left or right until the desired value is found. The following search routine returns a pointer to the node containing a given value known to be in the tree. It uses an external pointer P to search the tree.

P is first set to the root. Then the INFO field is compared to the value being searched for. If the INFO field is equal to the value, the desired node has been found and the routine is exited, returning the current value of P. If the INFO field is greater than the value, P is set to P.LEFTNODE, otherwise P is set to P.RIGHTNODE. The comparisons continue until the correct node is found. The algorithm (in pseudocode) is:

P = ROOT
WHILE P.INFO <> VAL
     IF P.INFO > VAL THEN
          P = P.LEFTNODE
     ELSE
          P = P.RIGHTNODE
     END IF
WEND

The maximum number of comparisons in a binary search on a tree equals the level of the lowest node in the tree plus 1. For the tree in the figure above, the maximum number of comparisons needed to find any node in the tree is four. 

For the same information ordered in a linear linked list, the maximum number of comparisons equals the number of nodes in the list, and on average half of the nodes must be searched. In the worst case -- searching for the last node in a linear linked list -- a linked list containing 1000 nodes requires 1000 comparisons. If the nodes were arranged in a binary tree, and the tree was evenly balanced (more on balancing later), a maximum of 10 comparisons would be required, because levels 0 through 9 can hold up to 1023 nodes.
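
The two worst cases can be compared numerically. The Python sketch below assumes one comparison per node visited and a tree that is filled level by level; worst_case_list and worst_case_balanced_tree are illustrative names.

```python
def worst_case_list(n):
    """Worst-case comparisons to find a value in a linear list of n nodes."""
    return n


def worst_case_balanced_tree(n):
    """Worst-case comparisons in an evenly balanced tree of n nodes.

    Counts how many levels are needed to hold n nodes when level k
    holds at most 2**k of them; one comparison is made per level.
    """
    levels = 0
    capacity = 0
    while capacity < n:
        capacity += 2 ** levels
        levels += 1
    return levels


# Print list size, list worst case, balanced-tree worst case.
for n in (7, 100, 1_000_000):
    print(n, worst_case_list(n), worst_case_balanced_tree(n))
```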


   '---------------------------------------------------------------------------------------------    
   ' If the item is found in the tree, returns a reference to its node;
   ' otherwise returns Nothing
   '---------------------------------------------------------------------------------------------    

   Public Function searchTree(ByVal keyValue As Variant) As doubleLinkNode
      Dim p As doubleLinkNode
      Dim valueInTree As Boolean

      Set p = root
      valueInTree = False

      While Not p Is Nothing And Not valueInTree
         If p.getInfo( ) = keyValue Then
            valueInTree = True
         ElseIf p.getInfo( ) > keyValue Then
            Set p = p.getLeftNode( )
         Else
            Set p = p.getRightNode( )
         End If
      Wend
      If Not p Is Nothing Then
         Set searchTree = p
      Else
         Set searchTree = Nothing
      End If
   End Function
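
For comparison, the same search can be sketched in Python. This version also reports how many nodes were examined, which ties the routine to the comparison counts discussed above. The Node class, the search_tree name, and the sample values are assumptions made for illustration.

```python
class Node:
    """Minimal tree node with the same three fields as the VB class."""

    def __init__(self, info, left=None, right=None):
        self.info = info
        self.left = left
        self.right = right


def search_tree(root, key):
    """Return (node, visited): the node holding key, or None if absent,
    together with the number of nodes examined along the way."""
    p = root
    visited = 0
    while p is not None:
        visited += 1
        if p.info == key:
            return p, visited
        p = p.left if key < p.info else p.right
    return None, visited


# Hand-built sample tree (values are illustrative):
#         6
#       /   \
#      3     8
#     / \     \
#    1   4     10
root = Node(6, Node(3, Node(1), Node(4)), Node(8, None, Node(10)))

found, steps = search_tree(root, 10)
missing, _ = search_tree(root, 7)
print(found.info, steps, missing)
```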

INSERTION


To create and maintain a binary tree, it is necessary to have a routine that will insert new nodes into the tree. A new node will always be inserted into its appropriate position in the tree as a leaf. The linked figure shows a series of insertions into a binary tree.


Given the root of a binary tree and a value to be added to the tree, there are several tasks to be performed:

  1. Create a node for the new value.
  2. Search for the insertion place.
  3. Fix pointers to insert the new node.

Steps 2 and 3 can be combined. The complete process can be outlined as follows:

  I. Create a new node.
     A. Allocate space for the node.
     B. Set the INFO field equal to VAL.
     C. Set the left and right pointers to Nothing.
  II. Insert the new node.
     A. If root is Nothing, set root to NEWNODE and stop; otherwise examine nodes beginning with the root.
     B. With the current node (TREENODE):
        1. If VAL is less than TREENODE.INFO, then move left.
           a. If TREENODE.LEFT is Nothing, then insert NEWNODE and stop; otherwise move left and repeat step B.
        2. If VAL is greater than TREENODE.INFO, then move right.
           a. If TREENODE.RIGHT is Nothing, then insert NEWNODE and stop; otherwise move right and repeat step B.

In the following code segment, assume the following declarations:

    Private root As doubleLinkNode


   '---------------------------------------------------------------------------------------------    
   ' I. Create a new node.
   '     A. Set the INFO field equal to VAL.
   '     B. Set the left and right pointers to Nothing.
   ' II. Insert new node.
   '     A. If root is nothing, set root to newnode and stop, otherwise
   '          examine nodes beginning with the root.
   '     B. With the current node
   '         1. If newValue is less than TREENODE.INFO, then move left.
   '             a. If TREENODE.LEFT = Nothing then insert NEWNODE and
   '                 stop, otherwise move left and repeat step B.
   '         2. If newValue is greater than TREENODE.INFO, then move right.
   '             a. If TREENODE.RIGHT = Nothing then insert NEWNODE and stop,
   '                 otherwise move right and repeat step B.
   '---------------------------------------------------------------------------------------------    
   Public Sub insertNode(ByVal newValue As Variant)

      Dim newNode As doubleLinkNode
      Dim treeNode As doubleLinkNode
      Dim inserted As Boolean

      ' Create and initialize a new node.
      Set newNode = New doubleLinkNode
      Call newNode.setInfo(newValue)
      Call newNode.setLeftNode(Nothing)
      Call newNode.setRightNode(Nothing)

      If root Is Nothing Then
         Set root = newNode
      Else ' search root descendants
         inserted = False
         Set treeNode = root
         While Not inserted                                       ' move down tree
            If newValue < treeNode.getInfo( ) Then 'move left
               If treeNode.getLeftNode( ) Is Nothing Then
                  Call treeNode.setLeftNode(newNode)
                  inserted = True
               Else
                  Set treeNode = treeNode.getLeftNode( )
               End If ' move left
            Else ' move right
               If treeNode.getRightNode( ) Is Nothing Then
                  Call treeNode.setRightNode(newNode)
                  inserted = True
               Else
                  Set treeNode = treeNode.getRightNode( )
               End If ' move right
            End If
         Wend 'move down tree
      End If ' search root descendants
   End Sub

The order in which the nodes are inserted determines the shape of the tree. The following figures illustrate how the same data, inserted in different orders, will produce differently-shaped trees. 

     - or -

Since the height of the tree determines the maximum number of comparisons in a binary search, the number of levels in a tree is important. Minimizing the number of levels in the tree will maximize search efficiency.
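
The effect of insertion order on the number of levels can be checked with a short Python sketch. The insert function follows the same iterative outline as insertNode; the Node class, the helper names, and the sample values are illustrative assumptions.

```python
class Node:
    def __init__(self, info):
        self.info = info
        self.left = None
        self.right = None


def insert(root, value):
    """Insert value as a leaf and return the root of the tree."""
    new_node = Node(value)
    if root is None:
        return new_node
    node = root
    while True:
        if value < node.info:
            if node.left is None:
                node.left = new_node
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new_node
                return root
            node = node.right


def levels(node):
    """Number of levels in the tree rooted at node (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(levels(node.left), levels(node.right))


def build(values):
    """Build a tree by inserting the values in the order given."""
    root = None
    for v in values:
        root = insert(root, v)
    return root


sorted_order = build([1, 2, 3, 4, 5, 6, 7])    # every node goes to the right
mixed_order = build([4, 2, 6, 1, 3, 5, 7])     # middle values inserted first
print(levels(sorted_order), levels(mixed_order))
```

Inserting already-sorted data produces a tree that is no better than a linked list, while the second ordering fills each level before starting the next.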


DELETION

Deletion can be performed on isolated nodes or on entire subtrees. This discussion will focus on deletion of nodes. This operation varies depending on the position of the node in the tree. It is simpler to delete a leaf than it is to delete the root of the tree. 

The deletion algorithm consists of three cases, depending on the number of children linked to the node to be deleted.

  1. Deleting a leaf -- deleting a leaf is simply a matter of setting the appropriate link of its parent to Nothing.



  2. Deleting a node with one child -- This is not as simple as deleting a leaf, because we do not want to delete all of the target node's descendants. The pointer from the parent must skip over the target node and point to that node's child. The target node is then given the old heave-ho.




  3. Deleting a node with two children -- This case is extremely complicated because the parent of the deleted node cannot point to both children. One solution is to replace the INFO field of the target node with the INFO field of the node whose value is closest to that of the target node. This INFO field can come from either the left or right subtree. In our examples, the INFO field will come from the node with the closest value in the left subtree. This value will be the value in the tree which immediately precedes the target node.

    To find the immediate predecessor, we move once to the left from the target node and then as far as possible to the right. If the left child has no right child, then the left child itself supplies the replacement value. The INFO field in the node to be deleted is then replaced with the contents of the INFO field from the replacement node. Then the replacement node (which has 0 or 1 child) is deleted by changing one of its parent's pointers.
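
The three cases can be sketched compactly in Python. This is one possible arrangement, not the routine developed in this discussion: after the predecessor's value is copied in case 3, the same unlinking code handles the remaining node. The Node class, the function names, and the sample values are assumptions for illustration.

```python
class Node:
    def __init__(self, info, left=None, right=None):
        self.info = info
        self.left = left
        self.right = right


def delete(root, value):
    """Remove value from the tree and return the (possibly new) root.

    Assumes value is present in the tree.
    """
    # Locate the target node (ptr) and its parent (back).
    back, ptr = None, root
    while ptr.info != value:
        back = ptr
        ptr = ptr.left if value < ptr.info else ptr.right

    # Case 3: two children. Copy the predecessor's value into the target,
    # then continue below to unlink the predecessor node instead.
    if ptr.left is not None and ptr.right is not None:
        back, pred = ptr, ptr.left
        while pred.right is not None:
            back, pred = pred, pred.right
        ptr.info = pred.info
        ptr = pred

    # Cases 1 and 2: ptr now has at most one child (None for a leaf).
    child = ptr.left if ptr.left is not None else ptr.right
    if back is None:
        return child            # the root itself was removed
    if back.left is ptr:
        back.left = child
    else:
        back.right = child
    return root


def in_order(node):
    """Values in ascending order, used here only to check the result."""
    if node is None:
        return []
    return in_order(node.left) + [node.info] + in_order(node.right)


# Hand-built sample tree (values are illustrative).
root = Node(8,
            Node(3, Node(1), Node(6, Node(4), Node(7))),
            Node(10, None, Node(14)))
root = delete(root, 1)    # case 1: leaf
root = delete(root, 10)   # case 2: one child
root = delete(root, 8)    # case 3: two children (the root)
print(root.info, in_order(root))
```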





See this figure for additional examples.

The DELETE routine below locates the target node with a loop similar to the one in SEARCHTREE, keeping a second pointer to the node's parent.

 


   '---------------------------------------------------------------------------------------------    
   ' The node with the value findValue will be found and deleted
   ' from the binary tree. Assumes the node is in the tree.
   '---------------------------------------------------------------------------------------------    
   Public Sub delete(ByRef findValue As Variant)
      Dim back As doubleLinkNode, ptr As doubleLinkNode

      ' Search tree for node containing findValue
      Set ptr = root
      Set back = Nothing
      While ptr.getInfo( ) <> findValue
         Set back = ptr
         If ptr.getInfo( ) > findValue Then
            Set ptr = ptr.getLeftNode( )
         Else
            Set ptr = ptr.getRightNode( )
         End If
      Wend
      
      Call deleteNode(ptr, back)

   End Sub ' delete

 

Notice that DELETE calls DELETENODE to perform the actual deletion. It passes as a parameter the pointer to the node within the tree, ptr, as well as a pointer to the parent node, back. Since the node to be deleted has been located, the algorithm must determine which of the three cases it satisfies. 

==========

The first case, in which the node to be deleted is a leaf, can be detected by the statement If ptr.getLeftNode( ) Is Nothing And ptr.getRightNode( ) Is Nothing. If back is Nothing, there are no other nodes in the tree and the root is set to Nothing. Note that ptr points to the node to be deleted, and back points to its parent.

        If back Is Nothing Then ' it is the only node in the tree
            Set root = Nothing

Otherwise, whichever pointer field of the back node currently points to the same node as ptr must be set to Nothing.

         Else ' delete the leaf
            If back.getRightNode Is ptr Then
               Call back.setRightNode(Nothing)
            Else ' back.getLeftNode( ) Is ptr
               Call back.setLeftNode(Nothing)
            End If


==========

In the second case the node to be deleted has 1 child. 

If the condition Not ptr.getRightNode( ) Is Nothing is true, then the node to be deleted has a right child.

If back is Nothing, the node to be deleted is the root, and the root pointer must be reset.

            If back Is Nothing Then
               Set root = ptr.getRightNode( ) ' delete root

In the case of a non-root node, the algorithm must then determine whether the node to be deleted is the right child or the left child of its parent.

If it is a right child (If back.getRightNode( ) Is ptr), then the right pointer of the back node must be set to the ptr node's right pointer (because it has a right child), thereby bypassing the ptr node.

            ElseIf back.getRightNode( ) Is ptr Then ' delete nonroot node
               Call back.setRightNode(ptr.getRightNode( ))

Otherwise it is a left node, and the left pointer of the back node must be set to the ptr node's right pointer (because it has a right child), thereby bypassing the ptr node.
            Else
               Call back.setLeftNode(ptr.getRightNode( ))   

------

The Else branch of this case (the node has one child) means that the node to be deleted has a left child.

Again, if back is Nothing, the node to be deleted is the root, and the root pointer must be reset.

            If back Is Nothing Then
               Set root = ptr.getLeftNode( ) ' delete root

In the case of a non-root node, the algorithm must then determine whether the node to be deleted is the right child or the left child of its parent.

If it is a right child (If back.getRightNode( ) Is ptr), then the right pointer of the back node must be set to the ptr node's left pointer (because it has a left child), thereby bypassing the ptr node.

            ElseIf back.getRightNode( ) Is ptr Then ' delete nonroot node
               Call back.setRightNode(ptr.getLeftNode( ))

Otherwise it is a left node, and the left pointer of the back node must be set to the ptr node's left pointer (because it has a left child), thereby bypassing the ptr node.
            Else
               Call back.setLeftNode(ptr.getLeftNode( ))   

 

==========

In the third case, the node has two children. It can be detected by the statement If Not ptr.getLeftNode( ) Is Nothing And Not ptr.getRightNode( ) Is Nothing.

Deleting a node with two children involves searching the tree for the key value that is closest to the key value of the node to be deleted (immediately before or immediately after).  The ptr node will not actually be deleted in this case.  Instead, its contents will be replaced by the contents of the node with the closest key value, and then the node whose value was moved will be deleted.  The algorithm guarantees that the node that is ultimately deleted (the one containing the replacement value) will have at most one child, so its deletion is not overly complex.

This algorithm will use the value immediately preceding the value to be deleted. Therefore the left subtree will be searched in order to find the replacement value. This value will be located in one of two places. If the node to the left of ptr has no right child, then this node contains the replacement value.

Otherwise, the replacement value is found in the rightmost descendant of the node to the left of ptr.

 

Locate the node containing the replacement value.

         ' find the node containing the closest value that is less than the
         ' value being deleted
         Set back = ptr
         Set temp = ptr.getLeftNode( )
         While Not temp.getRightNode( ) Is Nothing
            Set back = temp
            Set temp = temp.getRightNode( )
         Wend

When the node containing the replacement value is found, its value is copied into the info field of ptr:

         ' copy replacement value into ptr info field
         ptr.setInfo (temp.getInfo( ))

... and then the node from which the replacement value was extracted is deleted.

         ' delete the node from the tree
         If back Is ptr Then
            Call back.setLeftNode(temp.getLeftNode( ))
         Else
            Call back.setRightNode(temp.getLeftNode( ))
         End If

   '---------------------------------------------------------------------------------------------    
   ' Removes a node from a binary tree
   '---------------------------------------------------------------------------------------------    
   Private Sub deleteNode(ByRef ptr As doubleLinkNode, _
                                            ByRef back As    doubleLinkNode)
      Dim temp As doubleLinkNode

      ' case of no children
      If ptr.getLeftNode( ) Is Nothing And ptr.getRightNode( ) Is Nothing Then
         If back Is Nothing Then ' it is the only node in the tree
            Set root = Nothing
         Else ' delete the leaf
            If back.getRightNode Is ptr Then
               Call back.setRightNode(Nothing)
            Else
               Call back.setLeftNode(Nothing)
            End If
         End If
      ' case of deleting node with two children
      ElseIf Not ptr.getLeftNode( ) Is Nothing And _
                 Not ptr.getRightNode( ) Is Nothing Then
         ' find the node containing the closest value that is less than the
         ' value being deleted
         Set back = ptr
         Set temp = ptr.getLeftNode( )
         While Not temp.getRightNode( ) Is Nothing
            Set back = temp
            Set temp = temp.getRightNode( )
         Wend

         ' copy replacement value into ptr info field
         ptr.setInfo (temp.getInfo( ))

         ' delete the node from the tree
         If back Is ptr Then
            Call back.setLeftNode(temp.getLeftNode( ))
         Else
            Call back.setRightNode(temp.getLeftNode( ))
         End If

      Else ' node has only one child
         ' reset one of the pointer fields of back according to whether
         ' the node being deleted has a right or left child
         If Not ptr.getRightNode( ) Is Nothing Then ' there is a right child
            If back Is Nothing Then
               Set root = ptr.getRightNode( ) ' delete root
            ElseIf back.getRightNode( ) Is ptr Then ' delete nonroot node
               Call back.setRightNode(ptr.getRightNode( ))
            Else
               Call back.setLeftNode(ptr.getRightNode( ))   
            End If
         Else ' there is a left child
            If back Is Nothing Then
               Set root = ptr.getLeftNode( ) ' delete root
            ElseIf back.getRightNode( ) Is ptr Then ' delete nonroot node
               Call back.setRightNode(ptr.getLeftNode( ))
            Else
               Call back.setLeftNode(ptr.getLeftNode( ))
            End If
         End If
      End If
   End Sub

 



PRINTING THE TREE (TREE TRAVERSAL)

Printing each of the nodes in a tree involves traversing the tree, or operating on every node in the tree. Traversals were simple with linked lists, but when attempting to print out all of the data stored in a binary tree, an algorithm cannot proceed linearly from one end to another. Rather, from any particular node, it may have to move left for some data and then right for more data. We must keep track of what has been printed at a node and on its left and right, and the code can become quite involved.

The algorithm to print the information in order involves three steps:

  1. Print the INFO fields to the left of the node.
  2. Print the INFO field at the given node.
  3. Print the INFO fields to the right of the node.

One way of keeping track of which nodes have been printed is to use a stack. A recursive solution can also be used, in which case VB will keep track of all of this information automatically.


   '---------------------------------------------------------------------------------------------    
   ' Prints the binary tree in order from smallest
   ' to largest. This is a recursive procedure.
   '---------------------------------------------------------------------------------------------    
   Public Sub inOrderPrint(ByRef p As doubleLinkNode)
      ' Base case: if P is Nothing then do nothing.
      If Not p Is Nothing Then ' general case
         ' Traverse the left subtree to print the smaller values.
         Call inOrderPrint(p.getLeftNode)

         ' Print the value of current node.
         Debug.Print (p.getInfo())

         ' Traverse the right subtree to print the larger values.
         Call inOrderPrint(p.getRightNode)
      End If ' general case

   End Sub ' inOrder


This will be invoked initially by the statement Call inOrderPrint(root).
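
The stack-based alternative mentioned earlier can be sketched in Python as follows. The explicit stack holds the nodes whose left subtrees are still being processed, which is the bookkeeping that recursion otherwise does automatically. The Node class, the function name, and the sample values are illustrative assumptions.

```python
class Node:
    def __init__(self, info, left=None, right=None):
        self.info = info
        self.left = left
        self.right = right


def in_order_iterative(root):
    """Return the values in ascending order using an explicit stack."""
    result = []
    stack = []          # nodes passed on the way down to the left
    node = root
    while stack or node is not None:
        # Go as far left as possible, remembering each node passed.
        while node is not None:
            stack.append(node)
            node = node.left
        # Everything to the left is done: visit the node, then go right.
        node = stack.pop()
        result.append(node.info)
        node = node.right
    return result


# Hand-built sample tree (values are illustrative).
root = Node(6, Node(3, Node(1), Node(4)), Node(8, None, Node(10)))
print(in_order_iterative(root))
```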


PREORDER AND POSTORDER TRAVERSALS

Sometimes a tree must be printed in different orders. A preorder traversal of a binary tree:

  1. visits the root

  2. visits the left subtree preorder

  3. visits the right subtree preorder

The results of a preorder traversal are shown above

The preorder print procedure can be written recursively by changing the order of the statements in the previous routine.


   '---------------------------------------------------------------------------------------------    
   ' Prints the binary tree in preorder.
   ' This is a recursive sub.
   '---------------------------------------------------------------------------------------------    
   Private Sub preOrderPrint(ByRef p As doubleLinkNode)
      ' Base case: if P is Nothing then do nothing.
      If Not p Is Nothing Then ' general case
         ' Print the value of current node.
         Debug.Print (p.getInfo( ))

         ' Traverse the left subtree.
         Call preOrderPrint(p.getLeftNode( ))

         ' Traverse the right subtree.
         Call preOrderPrint(p.getRightNode( ))

      End If ' general case
   End Sub ' preorder


A postorder traversal of a binary tree:

  1. traverses the left subtree postorder
  2. traverses the right subtree postorder
  3. visits the root

The results of a postorder traversal are shown above.

A procedure to print out the elements in a binary tree in postorder follows. It also rearranges the three statements in the general case to change the order of printing.


   '---------------------------------------------------------------------------------------------    
   ' Prints the binary tree in postorder.
   ' This is a recursive sub.
   '---------------------------------------------------------------------------------------------    
   Private Sub postOrderPrint(ByRef p As doubleLinkNode)
      ' Base case: if P is Nothing then do nothing.
      If Not p Is Nothing Then ' general case
         ' Traverse the left subtree.
         Call postOrderPrint(p.getLeftNode( ))

         ' Traverse the right subtree.
         Call postOrderPrint(p.getRightNode( ))

         ' Print the value of current node.
         Debug.Print (p.getInfo( ))

      End If ' general case
   End Sub ' postorder
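
To see the three orders side by side, the following Python sketch collects the values into lists instead of printing them. The Node class, the function names, and the sample tree are assumptions for illustration.

```python
class Node:
    def __init__(self, info, left=None, right=None):
        self.info = info
        self.left = left
        self.right = right


def in_order(node):
    """Left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return in_order(node.left) + [node.info] + in_order(node.right)


def pre_order(node):
    """The node first, then the left subtree, then the right subtree."""
    if node is None:
        return []
    return [node.info] + pre_order(node.left) + pre_order(node.right)


def post_order(node):
    """Left subtree, then the right subtree, then the node last."""
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.info]


# Hand-built sample tree (values are illustrative):
#         6
#       /   \
#      3     8
#     / \     \
#    1   4     10
root = Node(6, Node(3, Node(1), Node(4)), Node(8, None, Node(10)))
print(in_order(root))
print(pre_order(root))
print(post_order(root))
```

Only the position of the node's own value changes among the three functions; the recursive calls are identical.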

 

Note: You may need a method to return a pointer to the tree root:


   '---------------------------------------------------------------------------------------------    
   ' Returns a pointer to the root of the tree.  The return type is 
   ' doubleLinkNode.
   '---------------------------------------------------------------------------------------------    
   Public Function returnRoot( ) As doubleLinkNode
        Set returnRoot = root
   End Function

 

Note II: The declaration of Root belongs in clsTree.



APPLICATIONS OF BINARY TREES

Considerable use is made of tree data structures in representing the structure of computer programs, written in languages such as VB, and in the actual writing of compilers. Trees offer a convenient structure for recording syntactical information about a program and then using this information in the translation of the program to machine language.
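
As a small illustration of this kind of use, an arithmetic expression can be stored as a binary tree with operators in the interior nodes and operands in the leaves, and evaluated with a postorder walk. The Python sketch below is a simplified assumption of how such a tree might be represented; ExprNode, OPERATORS, and evaluate are illustrative names, and a real compiler's syntax tree carries far more information.

```python
import operator


class ExprNode:
    """Interior nodes hold an operator symbol; leaves hold a number."""

    def __init__(self, info, left=None, right=None):
        self.info = info
        self.left = left
        self.right = right


OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(node):
    """Evaluate the expression with a postorder walk:
    left operand, right operand, then the operator at the node."""
    if node.left is None and node.right is None:
        return node.info        # a leaf holds an operand
    left_value = evaluate(node.left)
    right_value = evaluate(node.right)
    return OPERATORS[node.info](left_value, right_value)


# Tree for the expression (2 + 3) * 4
tree = ExprNode("*", ExprNode("+", ExprNode(2), ExprNode(3)), ExprNode(4))
print(evaluate(tree))
```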