Ah. Similar to the problem I've got with storing results from the AMOS XRef app. Each time I come across a Variable, Proc, Label or Function token, I need to:
- Scan the existing results for a match to the name.
- Insert a record for it if it doesn't exist.
- Insert a line number and type ("definition" or "usage") record for it.
Then, after the source scan is completed, output them in sorted order.
The 'linear' scans on the name records would give me the same problem you're getting - lengthy run times.
One solution is to use a Binary Tree structure to store the records in sorted order. They're very quick to search but have one big drawback. If the tree gets too 'unbalanced', the search time approaches that of a 'linear' scan. Worst case scenario is when the records are already in sorted order before they're inserted into the tree. The scan is then 'linear'.
A 'well-balanced' Binary Tree only has a maximum search time equal to its 'depth'. Which is not much gain for small collections of data but makes an enormous difference for large collections. For example: 1,000,000 records can be searched with a maximum of just 20 comparisons per search.
Putting it back-to-front:
2 ^ maximum_search_depth = number_of_recordsThere are ways to produce self-balancing Binary Trees when the records are inserted, but coding one is complex. I've shelved it for the time being to get stuck into the bug-fixing and docs projects.
I've attached a zip of some of the more useful stuff I downloaded when I was researching it. They will also explain some of the terminology I used above.