Text Analyzer
Objectives: Use sequential files, text extraction, parallel arrays, sorting, and searching.
Description: This program requires that you maintain parallel arrays--one that stores every word that appears in a text excerpt, and another that stores the frequency with which each word appears.
Deliverable 1 consists of steps 1-4. (Dump the word array to a list box for grading purposes.)
Deliverable 2 consists of steps 5-6.
1. First, read a text excerpt from a file.
2. Next, extract individual words from the text.
3. Compare the extracted word to each word in the word array.
    If it is already in the array, increment the corresponding element in the frequency array.
    If it is not found in the array, then add it.
4. Sort the parallel arrays in alphabetical order, and then print the arrays.
5. Sort the parallel arrays in ascending order by frequency, and then print the arrays.
6. Finally, read a list of words from another file, search the concordance for each word, and if it was included in the original file indicate how many times it appeared. If it was not included, it should be listed as having appeared 0 times.
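The bookkeeping behind steps 3 and 6 can be sketched as follows. This is only an illustration in Python, not the course language, and the names record_word and frequency_of are hypothetical. It uses a plain linear search, and it assumes a newly added word starts with a frequency of 1.

```python
def record_word(word, words, freqs):
    """Step 3: compare the word to every entry in the word array."""
    for i in range(len(words)):
        if words[i] == word:
            freqs[i] += 1      # already present: bump the matching count
            return
    words.append(word)         # not found: add it to the word array...
    freqs.append(1)            # ...and give it a count of 1 (assumption)


def frequency_of(word, words, freqs):
    """Step 6: report how often a word appeared, or 0 if it never did."""
    for i in range(len(words)):
        if words[i] == word:
            return freqs[i]
    return 0


# The two lists are "parallel": freqs[i] always describes words[i].
words, freqs = [], []
for w in "the cat saw the other cat".split():
    record_word(w, words, freqs)
```

The key point is that every change to one array is mirrored at the same index in the other.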
A demo of the Text Analyzer has been posted. Use a text box to display the text being scanned. (Set the MultiLine property to True and the ScrollBars property to Vertical.) Use synchronized list boxes to display the results of the analysis. Be sure to see the Clues page for coding hints. Perform all sorting operations through the use of code rather than through form controls.
Your instructor may present a structure chart (module chart) in class, but an outline version follows:
Go_Click
    prepareText (Steps 1 & 2)
        read text
        echo text in text box
        remove hyphens from text using Replace
        strip end-of-line characters from text using Replace
        split text into an array using Split
        process individual words to remove punctuation using InStr and Left (or Mid)
        process individual words to remove leading and trailing spaces using Trim, LTrim, or RTrim
        convert all words to lower-case using LCase
    createWordList using Insertion Sort (Steps 3 & 4)
        find insertion point
        update word list
        update frequency list
        increment numWords
    printAlphaList (Step 4)
    searchforWords (Step 6)
        read Search list
        binary search (recursive)
        print in list box
    sortListByFrequency (Step 5)
        sort word list (selection sort)
        sort frequency list
        synchronize indexes
    printFreqList (Step 5)
    synchronizeListScrolling
        synchronize alpha list
        synchronize freq list
        synchronize search list
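The text-cleaning tasks listed under prepareText can be sketched as a short pipeline. The example is written in Python purely for illustration, with Python string methods standing in for Replace, Split, Trim, and LCase. The function name prepare_text is hypothetical. Two assumptions are not settled by the outline: hyphens and line breaks are replaced with spaces, and punctuation is removed only from the start and end of each word.

```python
import string


def prepare_text(text):
    """Return a list of cleaned, lower-case words from raw text."""
    # Assumption: a hyphen becomes a space, so "well-known" yields two words.
    text = text.replace("-", " ")
    # Assumption: end-of-line characters also become spaces.
    text = text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")

    cleaned = []
    for raw in text.split(" "):                 # split text into an array
        word = raw.strip()                      # drop leading/trailing spaces
        word = word.strip(string.punctuation)   # drop punctuation at the ends
        word = word.lower()                     # convert to lower case
        if word:                                # skip empty pieces
            cleaned.append(word)
    return cleaned
```

For example, prepare_text("The well-known cat,\nthe DOG.") yields the words the, well, known, cat, the, dog.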
The lines above which are the farthest indented do not necessarily represent entire modules, but may instead represent the low-level tasks that make up a module. Click on the links above for references in the notes that you may find helpful.
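As an illustration of how the low-level tasks under createWordList fit together, here is a minimal sketch in Python (not the course language). The name insert_word is hypothetical, and the returned length merely stands in for numWords. Each word is placed at its alphabetical position as it arrives, so the list is always sorted.

```python
def insert_word(word, words, freqs):
    """Insert word into the alphabetical word list, keeping freqs parallel."""
    # Find insertion point: first slot whose word is not smaller than ours.
    pos = 0
    while pos < len(words) and words[pos] < word:
        pos += 1

    if pos < len(words) and words[pos] == word:
        freqs[pos] += 1            # word already listed: update frequency only
    else:
        words.insert(pos, word)    # update word list (later entries shift right)
        freqs.insert(pos, 1)       # update frequency list at the same index
    return len(words)              # plays the role of numWords


words, freqs = [], []
for w in ["pear", "apple", "pear", "fig"]:
    num_words = insert_word(w, words, freqs)
```

With fixed-size arrays, the shift that list.insert performs here would be an explicit loop moving entries one slot to the right in both arrays.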
Why do steps 5 and 6 as listed in the requirements above appear reversed in the module outline? It has something to do with the requirements of the faster search algorithms.
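One way to explore this question is to look at what a recursive binary search relies on. The sketch below is in Python for illustration only, and the names binary_search and lookup_count are hypothetical. It repeatedly halves the range being searched, which is valid only while the word list is in alphabetical order.

```python
def binary_search(words, target, low, high):
    """Return the index of target in words[low..high], or -1 if absent.

    Precondition: words is sorted alphabetically.
    """
    if low > high:
        return -1                      # empty range: word is not present
    mid = (low + high) // 2
    if words[mid] == target:
        return mid
    if target < words[mid]:
        return binary_search(words, target, low, mid - 1)   # search left half
    return binary_search(words, target, mid + 1, high)      # search right half


def lookup_count(words, freqs, target):
    """Return how many times target appeared, or 0 if it was never seen."""
    i = binary_search(words, target, 0, len(words) - 1)
    return freqs[i] if i >= 0 else 0
```

Once the arrays are re-ordered by frequency, the alphabetical precondition no longer holds and this search may miss words that are present.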
Students in Parker's CIS-220 class must follow the modularization as detailed above unless alternative approaches are approved on an individual basis. Use the methods noted above, such as insertion sort, and make your interface appear as much like the demo interface as possible.
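For the selection sort named in the outline under sortListByFrequency, the essential detail is that every swap is applied to both arrays. A minimal sketch, again in Python for illustration and with the hypothetical name sort_by_frequency:

```python
def sort_by_frequency(words, freqs):
    """Selection sort, ascending by frequency, applied to both arrays."""
    n = len(freqs)
    for i in range(n - 1):
        # Find the smallest remaining frequency in positions i..n-1.
        smallest = i
        for j in range(i + 1, n):
            if freqs[j] < freqs[smallest]:
                smallest = j
        if smallest != i:
            # Swap in BOTH arrays so each word keeps its own count.
            freqs[i], freqs[smallest] = freqs[smallest], freqs[i]
            words[i], words[smallest] = words[smallest], words[i]


words = ["apple", "fig", "pear"]
freqs = [3, 1, 2]
sort_by_frequency(words, freqs)
```

After the call, the words are ordered fig, pear, apple and the frequencies 1, 2, 3.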
Important Note:
As you can see from the module list above, this is a complex system. However, most modules are fairly simple or build on code provided in class notes. When developing your code, the difficulty will be greatly reduced if you use the approach known as incremental development. In incremental development you design and code one module, such as prepareText, test it, and when it works properly set it aside. Design and code the second module, such as createWordList, test it, and when it is working properly integrate it with the first module. When they work properly together, set the combined module aside and begin work on module 3, printAlphaList. Follow this procedure until all modules have been coded, tested, and integrated and the program is complete. This enables you to view the problem as a series of subsystems, all of which are easily managed. Proper planning and management will greatly reduce the apparent complexity of this problem.
Sample Interface
