Tree Balancing


A balanced tree is one that has, as nearly as possible, an equal number of descendants in each node's left and right subtrees.  Since items are inserted "root first," a tree will be balanced if the item that falls in the "middle" of the list is inserted first. (If the items are listed from smallest to largest, the "middle" item is the one in the middle of the list, so there will be an (approximately) equal number of smaller and larger items on either side of it.)  

If the middle item is inserted first, it becomes the root of the tree, and it will have about as many items in its left subtree as in its right subtree.  The "root" of the left subtree should be the "middle" item of all the items that are less than the main root, so that item is inserted next.  As the remaining items that are less than the main root are inserted, about half will be placed in the left subtree of that new node, and about half will be placed in its right subtree.  The same reasoning applies to the items that are greater than the main root.  Recall that the discussion of recursion stated that "When data structures are used, the recursive case is often in terms of a smaller structure rather than a smaller value. The base case occurs when there are no values left to process in the structure."  Because each step applies the same rule to a smaller list, this process can most easily be coded as a recursive procedure.

The process will be accomplished by dumping the contents of the unbalanced tree into an array, accessing the items in the order described above, and recreating the tree by inserting the items individually.
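
To make the "middle first" order concrete, the following is a minimal sketch that only records the order in which the items of a sorted array would be inserted; it does not build a tree.  The names middleFirstOrder and demoMiddleFirstOrder are hypothetical and are not part of the tree class, and for brevity the sketch uses a single base case (an empty range) rather than the two base cases used by rebuildTree.

```vb
' Hypothetical helper: appends the items of sortedItems(lowIndex..highIndex)
' to orderText in "middle first" order.
Public Sub middleFirstOrder(ByRef sortedItems() As Variant, _
            ByVal lowIndex As Integer, ByVal highIndex As Integer, _
            ByRef orderText As String)
   Dim midIndex As Integer

   If lowIndex > highIndex Then Exit Sub   ' base case: nothing left

   midIndex = (lowIndex + highIndex) \ 2   ' integer division
   orderText = orderText & sortedItems(midIndex) & " "

   ' Items smaller than the middle item, then items larger than it.
   Call middleFirstOrder(sortedItems, lowIndex, midIndex - 1, orderText)
   Call middleFirstOrder(sortedItems, midIndex + 1, highIndex, orderText)
End Sub

Public Sub demoMiddleFirstOrder()
   Dim sortedItems() As Variant
   Dim orderText As String

   sortedItems = Array(10, 20, 30, 40, 50, 60, 70)
   Call middleFirstOrder(sortedItems, LBound(sortedItems), _
                         UBound(sortedItems), orderText)
   Debug.Print orderText   ' 40 20 10 30 60 50 70
End Sub
```

Inserting the items in that order places 40 at the root, 20 and 60 beneath it, and 10, 30, 50 and 70 as leaves.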

The algorithm for balancing a tree consists of two parts: one iterative and one recursive.  The iterative part, balanceTree, creates the array and invokes the recursive part, rebuildTree, that rebuilds the tree.  The balanceTree method first creates a dynamic array and redimensions it to the number of nodes in the tree, as determined by function countTreeNodes.  It then stores the node values of the tree into the array using an inorder traversal, inOrderCopy.  Therefore they are stored in ascending order, from smallest to largest.  It then resets the root of the tree to Nothing, in effect discarding the original tree, and then calls the recursive routine, rebuildTree, passing it the bounds of the array.

The rebuildTree method checks the array bounds it is passed. If the low and high bounds are the same (base case 1) it simply inserts the corresponding array element into the tree.  If the bounds only differ by one location (base case 2) then it inserts both elements into the tree.  Otherwise, it computes the "middle" element of the subarray, inserts it into the tree, and then makes two recursive calls to itself: one to process the elements less than the middle element, and one to process the elements greater than the middle.
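
The three cases described above also determine how deep the rebuilt tree will be.  The sketch below walks the same index ranges without touching a tree and reports the depth of the deepest node, counting the root as depth 0.  It is an illustration only: rebuiltDepth and demoRebuiltDepth are hypothetical names, integer division is assumed for the midpoint, and insertNode is assumed to perform an ordinary binary search tree insertion.

```vb
' Hypothetical helper, not part of the tree class: returns the depth of
' the deepest node produced when the three cases of rebuildTree are
' applied to the index range lowIndex..highIndex.
Public Function rebuiltDepth(ByVal lowIndex As Integer, _
            ByVal highIndex As Integer, ByVal depth As Integer) As Integer
   Dim midIndex As Integer
   Dim leftMax As Integer, rightMax As Integer

   If lowIndex = highIndex Then              ' base case 1: one node
      rebuiltDepth = depth
   ElseIf (lowIndex + 1) = highIndex Then    ' base case 2: two nodes
      ' The larger item becomes the right child of the smaller one.
      rebuiltDepth = depth + 1
   Else                                      ' general case
      midIndex = (lowIndex + highIndex) \ 2
      leftMax = rebuiltDepth(lowIndex, midIndex - 1, depth + 1)
      rightMax = rebuiltDepth(midIndex + 1, highIndex, depth + 1)
      If leftMax > rightMax Then
         rebuiltDepth = leftMax
      Else
         rebuiltDepth = rightMax
      End If
   End If
End Function

Public Sub demoRebuiltDepth()
   Debug.Print rebuiltDepth(0, 6, 0)   ' 7 items: prints 2
   Debug.Print rebuiltDepth(0, 7, 0)   ' 8 items: prints 3
End Sub
```

Seven items fit exactly into depths 0 through 2, while an eighth item forces one node down to depth 3.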


  
   '------------------------------------------------------------------------------
   ' This method balances a binary tree by copying the 
   ' nodes inOrder into an array and then recreating the 
   ' tree in a methodical manner so it is balanced.
   '------------------------------------------------------------------------------
   Public Sub balanceTree( )
      Dim nodeArray( ) As Variant
      Dim nodeCount As Integer

      nodeCount = countTreeNodes(root)
      If nodeCount = 0 Then Exit Sub ' an empty tree is already balanced
      ReDim nodeArray(0 To nodeCount - 1) ' subtract 1 because base is 0, not 1

      Call inOrderCopy (root, nodeArray)

      Set root = Nothing

      Call rebuildTree (nodeArray, LBound(nodeArray), UBound(nodeArray))

   End Sub

   '------------------------------------------------------------------------------
   ' Recursive method to recreate a tree so it is balanced. 
   ' If the low and high bounds are the same (base case 1) 
   ' it simply inserts the corresponding array element into 
   ' the tree.  If the bounds only differ by one location (base 
   ' case 2) then it inserts both elements into the tree.  
   ' Otherwise, it computes the "middle" element of the 
   ' subarray, inserts it into the tree, and then makes two 
   ' recursive calls to itself: one to process the elements 
   ' less than the middle element, and one to process the 
   ' elements greater than the middle.
   '------------------------------------------------------------------------------
   Private Sub rebuildTree(ByRef nodeArray( ) As Variant, _
               ByVal lowIndex As Integer, ByVal highIndex As Integer)
      Dim midIndex As Integer

      If lowIndex = highIndex Then ' base case 1
         Call insertNode(nodeArray(lowIndex))
      ElseIf (lowIndex + 1) = highIndex Then ' base case 2
         Call insertNode(nodeArray(lowIndex))
         Call insertNode(nodeArray(highIndex))
      Else ' general case
         ' Use integer division (\) so the midpoint is always a whole number.
         midIndex = (lowIndex + highIndex) \ 2
         Call insertNode(nodeArray(midIndex))
         Call rebuildTree(nodeArray, lowIndex, midIndex - 1)
         Call rebuildTree(nodeArray, midIndex + 1, highIndex)
      End If
   End Sub

   '------------------------------------------------------------------------------
   ' Counts the nodes in a binary tree.
   ' This is a recursive function.
   '------------------------------------------------------------------------------
   Public Function countTreeNodes(ByRef p As doubleLinkNode) _
                           As Integer
      Dim total As Integer

      ' Base case: if p is Nothing, the count stays 0.
      ' The count is accumulated in a local variable so that the
      ' function name is only ever assigned, never read.
      If Not p Is Nothing Then ' general case
         ' Count the current node.
         total = 1

         ' Add the count for the left subtree.
         total = total + countTreeNodes(p.getLeftNode( ))

         ' Add the count for the right subtree.
         total = total + countTreeNodes(p.getRightNode( ))
      End If ' general case

      countTreeNodes = total
   End Function ' countTreeNodes

   '------------------------------------------------------------------------------
   ' This recursive method copies the contents of a binary 
   ' tree into an array in ascending order. Calls insertIntoArray.
   '------------------------------------------------------------------------------
   Public Sub inOrderCopy (ByRef ptr As doubleLinkNode, _
            ByRef nodeArray( ) As Variant)
      If Not ptr Is Nothing Then
         Call inOrderCopy (ptr.getLeftNode( ), nodeArray)
         Call insertIntoArray(nodeArray, ptr.getInfo( ))
         Call inOrderCopy (ptr.getRightNode( ), nodeArray)
      End If
   End Sub

   '------------------------------------------------------------------------------
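   ' This method inserts an item into the next available array 
   ' element.  balanceTree passes in a freshly ReDim'd array whose
   ' elements are all Empty, so an Empty first element marks the
   ' start of a new copy and restarts the Static counter.  Without
   ' this reset, a second call to balanceTree would run past the
   ' end of the array.  (This assumes node values are never Empty.)
   '------------------------------------------------------------------------------
   Private Sub insertIntoArray(ByRef nodeArray( ) As Variant, _
               ByVal insertValue As Variant)
      Static nextAvailableIndex As Integer

      If IsEmpty(nodeArray(LBound(nodeArray))) Then
         nextAvailableIndex = LBound(nodeArray)
      End If

      nodeArray(nextAvailableIndex) = insertValue
      nextAvailableIndex = nextAvailableIndex + 1
   End Sub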


 

If a tree is balanced, the number of levels, or depth, of a tree with N nodes will be around log2(N).  VB6's Log function returns the natural logarithm and there is no log2 function, but a logarithm can be converted to any base by division: log2(N) = Log(N) / Log(2).  The code segment below contains two routines: one to determine if the tree is balanced, and another to determine the depth of the tree.  The following statement can be used to balance the tree if it is determined that it is not balanced:  If Not tree.optimalLevels(tree.returnRoot, 0) Then Call tree.balanceTree

  


   '------------------------------------------------------------------------------
   ' When called as tree.optimalLevels (root, 0) this will 
   ' return True if the tree has the optimal number of levels
   ' (depth).  Otherwise it will return False.  The second
   ' argument specifies a tolerance of how close the current
   ' depth must be to the optimal depth.  Zero tolerance indicates
   ' that the tree must be as shallow as possible, a tolerance of 1
   ' indicates that it can be within one level of the optimal
   ' number of levels, etc.
   '------------------------------------------------------------------------------
   Public Function optimalLevels ( ByVal treeNode As doubleLinkNode, _
                                   ByVal fudgeFactor As Integer ) As Boolean
      Dim count As Integer
      Dim optimalNumber As Integer

      count = countTreeNodes(treeNode)
      If count = 0 Then
         ' An empty tree is trivially balanced, and Log(0) is an error.
         optimalLevels = True
         Exit Function
      End If

      ' findDepth counts the root as depth 0, so the smallest possible
      ' depth for count nodes is log2(count) rounded down.  The small
      ' offset guards against floating-point error at exact powers of 2.
      optimalNumber = Int(Log(count) / Log(2) + 0.000001)
      optimalLevels = findDepth(treeNode, 0) <= (optimalNumber + fudgeFactor)

   End Function

   '------------------------------------------------------------------------------
   ' When called as tree.findDepth(root,0), this will compute the
   ' max of the depths of all the leaves in the tree to which root
   ' points. When called recursively, the depth parameter gives
   ' the depth of the node, and the routine returns the max of the
   ' depths of the leaves in the subtree to which node points.
   ' In each recursive call to this routine, depth increases by one.
   '------------------------------------------------------------------------------

   Public Function findDepth(ByVal treeNode As doubleLinkNode, ByVal depth As Integer) As Integer
      Dim leftMax As Integer, rightMax As Integer
      If treeNode Is Nothing Then
         ' The tree is empty. Return 0.
         findDepth = 0
      ElseIf (treeNode.getLeftNode Is Nothing) And (treeNode.getRightNode Is Nothing) Then
         ' The node is a leaf, so the maximum depth in this
         ' subtree is the depth of this node (the only leaf
         ' that it contains).
         findDepth = depth
      Else
         ' Get the maximum depths for the two subtrees of this
         ' node. Return the larger of the two values, which
         ' represents the maximum in the tree overall.
         leftMax = findDepth(treeNode.getLeftNode, depth + 1)
         rightMax = findDepth(treeNode.getRightNode, depth + 1)
         If leftMax > rightMax Then
            findDepth = leftMax
         Else
            findDepth = rightMax
         End If
      End If
   End Function
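
The base-2 logarithm used in the balance test can be tried on its own.  The sketch below wraps the conversion in a helper and prints a few values; log2 and demoLog2 are hypothetical names that do not appear in the tree class.  VB6's Log function returns the natural logarithm, but the base cancels in the ratio, so any base would give the same result.

```vb
' Hypothetical helper: base-2 logarithm built from VB6's Log function.
Public Function log2(ByVal x As Double) As Double
   log2 = Log(x) / Log(2)
End Function

Public Sub demoLog2()
   Debug.Print log2(7)           ' about 2.807
   Debug.Print Int(log2(7))      ' 2
   Debug.Print Int(log2(1000))   ' 9
End Sub
```

With the root counted as depth 0, a balanced tree of 7 nodes therefore reaches depth 2 (three levels), and a balanced tree of 1000 nodes reaches depth 9 (ten levels).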