[futurebasic] Re: Benchmark Challenge #3 - WFC

From: lcs@... (Laurent SIEBENMANN)
Date: Tue, 19 Sep 2000 09:44:05 +0200 (MET DST)

Hi benchmark watchers!

Statz sez:

 > If we are doing a benchmark, that's fine. I'll go 
 > with you on that. If we are trying to produce a GUI 
 > app, that's a different issue with literally 
 > thousands of other items thrown on the table. It's 
 > like trying to hoe your vegetable garden with a 
 > forklift and a bowl of chili con carne.

Bold words and true! I buy all that.

HOWEVER, we still need a self-contained application,
one which is essentially all muscle and no fat: a great
benchmark, a Statzmark let us say.

The application is called "WordCensus".

When one drops a TEXT file called "myfile" on the smart
"WordCensus" icon (FB variant), a TEXT file
"myfile.count" is produced and an alert pops up saying:

 << See word census in "myfile.count" alongside
 "myfile". Census done in 243 milliseconds. Use option
 key to bypass this alert. Click the mouse to exit. >>

That is self-explanatory, right? If the option key is
down while dropping, then no alert.

Some necessary details. Give "myfile.count" the same
creator as "myfile", unless that creator is in a set of
bad creators such as "????" or "    " (the list to be
completed by Bill). In that case, assign the creator of
a freeware or shareware editor with no file-size limit,
such as Tex-Edit Plus; again, let Bill say which.
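
The creator rule can be sketched as below, in Python rather than
FB for brevity. This is only an illustration: the function name
output_creator and the fallback code "EDIT" are hypothetical
placeholders, since the real editor and the full list of bad
creators are still to be chosen.

```python
# Creators considered unusable; only these two are named so far,
# and the list is still to be completed.
BAD_CREATORS = {"????", "    "}


def output_creator(source_creator, fallback_creator, bad_creators=BAD_CREATORS):
    """Return the creator code to assign to "myfile.count".

    source_creator   -- four-character creator of the dropped file
    fallback_creator -- creator of the chosen editor (hypothetical here)
    """
    if source_creator in bad_creators:
        return fallback_creator
    return source_creator


# "EDIT" is a placeholder, not the creator code of any real editor.
print(output_creator("ttxt", "EDIT"))
print(output_creator("????", "EDIT"))
```

Passing the fallback as a parameter keeps the rule independent of
whichever editor is finally picked.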

No 32K limits. No TE window.  No scrolling. 

But a really useful little utility that minds its own
business.

One can find the King James Bible and the works of
Shakespeare in the public domain library of Project
Gutenberg.

        Cheers,

            Laurent S


PS. Go for speed, or Metrowerks will wipe us out.

PPS. Delighted I am to see Munger demos behaving; 
I've been bitten by that critter before.  I'll bet
though that Munger is not part of the fastest algorithm.

PPPS. Bill, you write:

 > It is partially a sort benchmark, in that the output must be presented in
 > sorted order, but I'd expect most of the measurable time to be spent
 > searching the text.

Dangerous. For 8000 words, are you going to search the
text 8000 times with Munger? I'd suggest following up
tedd's hints on binary trees. In a single pass, build an
ordered binary tree whose nodes are the distinct words,
each with its population count. But any great sort
algorithm seems a promising starting point.
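
To make the single-pass idea concrete, here is a minimal sketch
in Python rather than FB. It makes some assumptions of mine: a
"word" is a run of letters and apostrophes, case is folded, and
the tree is a plain unbalanced one in which each node holds one
distinct word and its count. The names Node, insert and census
are illustrative only.

```python
import re


class Node:
    """One distinct word together with its population count."""
    __slots__ = ("word", "count", "left", "right")

    def __init__(self, word):
        self.word = word
        self.count = 1
        self.left = None
        self.right = None


def insert(root, word):
    """Add word to the tree, or bump its count if present. Returns the root."""
    if root is None:
        return Node(word)
    node = root
    while True:
        if word == node.word:
            node.count += 1
            return root
        branch = "left" if word < node.word else "right"
        child = getattr(node, branch)
        if child is None:
            setattr(node, branch, Node(word))
            return root
        node = child


def census(text):
    """Scan the text once; return (word, count) pairs in sorted order."""
    root = None
    # Assumed word definition: letters and apostrophes, lowercased.
    for match in re.finditer(r"[A-Za-z']+", text):
        root = insert(root, match.group().lower())
    # An iterative in-order walk yields the words already sorted,
    # so no separate sort step is needed.
    result, stack, node = [], [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        result.append((node.word, node.count))
        node = node.right
    return result


for word, count in census("the cat and the hat and the bat"):
    print(word, count)
```

The text is scanned only once, and each word costs one descent
of the tree. An unbalanced tree can degrade badly if the words
arrive nearly sorted, so a balanced tree or a hash table followed
by a sort may turn out faster; that is for the stopwatch to say.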