[futurebasic] Re: Assembly - Speed or not?


From: Rick Brown <rbrown@...>
Date: Mon, 08 Dec 1997 22:07:32 -0600
wilcox wrote:

> Now, I understand inserting assembly in
> an INTERPRETED language (Applescript, Usertalk, Hypertalk, C64 Basic,
> AppleII Basic, etc) would DEFINITELY show an increase in speed, but why
> does it make ANY difference at all in COMPILED languages? Is it because
> humans can write tighter assembly code than current computers can, resulting
> in fewer instructions, and a shorter execution time (I don't know here, just
> guessing :) )?

That's an excellent question, and you've basically hit on the correct
answer.  Compilers try to generate the fastest sequence of machine
instructions they can, but they've got their own set of rules and
algorithms for doing that, and since they can't understand the
underlying thing that your BASIC code is trying to _accomplish_, they
won't necessarily create the most efficient possible code.  _Most_ of
the time they do quite a good job, but they can't always do the _best_
job.  For example, I once wanted a simple routine that would compare
the contents of two ranges of memory to see if they were equal.
When I write it in Basic, it looks something like this:

'-------------------------------------------
LOCAL FN RecordsMatch(ptr1&, ptr2&, numBytes&)
  'Returns _zTrue iff the numBytes& bytes starting at ptr1&
  'exactly match the numBytes& bytes which start at ptr2&.
  DIM done, mismatch
  done = _false
  mismatch = _false
  endRange1& = ptr1& + numBytes&
  DO
    SELECT CASE
      CASE ptr1& = endRange1&
        done = _zTrue
      CASE PEEK(ptr1&) = PEEK(ptr2&)
        INC(ptr1&): INC(ptr2&)
      CASE ELSE
        mismatch = _zTrue
        done = _zTrue
    END SELECT
  UNTIL done
END FN = NOT mismatch
'-------------------------------------------

I compiled this and found that FB generated 194 bytes of machine
language instructions from it.  On the other hand, when I write the same
algorithm directly in assembly, it looks like this:

    `     move.l  ^ptr1&,a0
    `     move.l  ^ptr2&,a1
    `     move.l  a1,a2
    `     add.l   ^numBytes&,a2
    `L1   move.b  (a0)+,d0
    `     move.b  (a1)+,d1
    `     cmp.b   d0,d1
    `     bne.s   F
    `     cmpa.l  a1,a2
    `     bne.s   L1
    `     move.l  #zTrue,D0
    `     bra.s   L2
    `F    move.l  #false,D0
    `L2   move.w  D0,^theyMatch

which is a mere 65 bytes of machine language code.  Those numbers don't
in themselves prove that my "custom" assembler routine is faster, but
I'm willing to bet it is.  This by no means indicates that FB is an
inefficient compiler--it just means that compilers are still not quite
as smart as people.
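
For readers who don't read 68K assembly, here is a rough sketch in C of
the two loop shapes discussed above.  It is only an illustration and
says nothing about what FB (or any C compiler) actually generates.  The
function names records_match_flags and records_match_tight are made up
for this example.  The first function follows the Basic version, with
its flag variables and three-way test on every pass; the second follows
the assembly version, which does one compare and one end-of-range check
per byte.

```c
#include <stddef.h>

/* Shape of the Basic version: two flags and a three-way decision
   on every pass through the loop.  Returns 1 on match, 0 otherwise. */
int records_match_flags(const unsigned char *p1,
                        const unsigned char *p2, size_t n)
{
    const unsigned char *end1 = p1 + n;
    int done = 0;
    int mismatch = 0;

    do {
        if (p1 == end1) {           /* reached the end: ranges match */
            done = 1;
        } else if (*p1 == *p2) {    /* bytes agree: advance both     */
            p1++;
            p2++;
        } else {                    /* bytes differ: stop            */
            mismatch = 1;
            done = 1;
        }
    } while (!done);

    return !mismatch;
}

/* Shape of the assembly version: load, compare, bail out on a
   difference, then test for the end of the second range.
   Like the assembly listing, this compares at least one byte
   before checking the end, so it assumes n >= 1. */
int records_match_tight(const unsigned char *p1,
                        const unsigned char *p2, size_t n)
{
    const unsigned char *end2 = p2 + n;

    do {
        if (*p1++ != *p2++)
            return 0;               /* corresponds to label F        */
    } while (p2 != end2);           /* corresponds to bne.s L1       */

    return 1;
}
```

Called on two equal 4-byte buffers, both functions return 1, and both
return 0 if any byte differs.  They only disagree for a length of zero,
where the second one (like the assembly it mirrors) should not be
called at all.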

- Rick