[futurebasic] [FB^3] Floating point speed


From: Robert Purves <robert.purves@...>
Date: Thu, 30 Sep 1999 22:10:33 +1200
As an antidote to list traffic about teething problems in FB^3, consider
the simple benchmark below. It is modified from "Examples:Neat Apps:FB II
vs FB^3 Example" on the FB^3 CD. The results give cause for celebration
(champagne and cigars for Staz'n'Andy), as well as food for thought.

Robert Purves


'--------Simple floating point benchmark--------
register off
DIM &&,x#,y#,z#,t#
register on
dim i&,t&
x# = 12345678.90123456789
y# = 123: z# = .01: t# = 100
t& = FN TICKCOUNT
' 10 million passes; each pass does one multiply, one add, one divide, one subtract
FOR i&=1 TO 10000000: x#=x#+y#*z#-y#/t#: NEXT
PRINT FN TICKCOUNT-t&;" ticks, x#=";x#
'-----------------------------------------------

Results from iMac (233 MHz G3):

                 Time to nearest 10 ticks   MFLOPS
                 -------------------------  ------
   FB2                  11870                 0.2
   FB^3 68K (1)          3920                 0.6
   FB^3 PPC (2)          3870                 0.6
   PPC ASM  (2)           710                 3.4
   FB^3 PPC (3)           230                10.4
   FB^3 PPC (4)           230                10.4
   PPC ASM  (3)           100                24
   PPC ASM  (4)           100                24

 1. Unaffected by alignment of variables
 2. Variables aligned on 2-byte boundary but not 4
 3. Variables aligned on 4-byte boundary but not 8
 4. Variables aligned on 8-byte boundary
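
For anyone wanting to check the MFLOPS column, here is a minimal sketch in Python (not FB) of the arithmetic it appears to rest on. Two assumptions that are mine, not stated above: a tick is taken as exactly 1/60 second, and each pass of the loop is counted as 4 floating point operations (multiply, add, divide, subtract). The names TICKS_PER_SECOND, FLOPS_PER_PASS and mflops are made up for the illustration.

```python
# Sketch: convert a tick count from the benchmark into MFLOPS.
# Assumptions (not stated in the post): 60 ticks per second,
# and 4 floating point operations per pass of the loop.
TICKS_PER_SECOND = 60
PASSES = 10_000_000
FLOPS_PER_PASS = 4


def mflops(ticks):
    """Millions of floating point operations per second for a given tick count."""
    seconds = ticks / TICKS_PER_SECOND
    return PASSES * FLOPS_PER_PASS / seconds / 1e6


# Tick counts copied from the results table.
results = [
    ("FB2", 11870),
    ("FB^3 68K (1)", 3920),
    ("FB^3 PPC (2)", 3870),
    ("PPC ASM  (2)", 710),
    ("FB^3 PPC (3)", 230),
    ("PPC ASM  (3)", 100),
]

for label, ticks in results:
    print(f"{label:14s}{ticks:6d} ticks  {mflops(ticks):5.1f} MFLOPS")
```

Under those assumptions the figures in the table come out as printed, to the precision given there.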

Explanation of Note (2)
Misalignment can be forced by the following DIM statement:
DIM &&,silly%,x#,y#,z#,t#
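
Why the extra silly% does the job can be sketched with a little offset arithmetic, again in Python rather than FB. This assumes (my assumption, not confirmed above) that the && starts the block on an 8-byte boundary and that the variables are then laid out back to back with no padding: 2 bytes for silly% and 8 bytes for each double. The helper names offsets and alignment are hypothetical.

```python
# Sketch: where each variable lands if DIM packs them one after another.
# Assumptions: block starts at an 8-byte boundary (offset 0), no padding,
# silly% occupies 2 bytes, each # variable (double) occupies 8 bytes.

def offsets(sizes, base=0):
    """Byte offset of each variable when packed back to back from base."""
    result = []
    position = base
    for size in sizes:
        result.append(position)
        position += size
    return result


def alignment(offset):
    """Largest of 8, 4, 2, 1 that divides the offset exactly."""
    for boundary in (8, 4, 2, 1):
        if offset % boundary == 0:
            return boundary


names = ["silly%", "x#", "y#", "z#", "t#"]
sizes = [2, 8, 8, 8, 8]

for name, offset in zip(names, offsets(sizes)):
    print(f"{name:7s} offset {offset:2d}  aligned on {alignment(offset)}-byte boundary")
```

With that layout every double sits at an offset that is a multiple of 2 but not of 4, which is the situation described in Note 2.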

Explanation of Notes (3) and (4)
DIM &&,x#,y#,z#,t# sometimes fails to produce 8-byte alignment, because of an
anomaly that has been reported to Staz. On the G3 this costs nothing (rows (3)
and (4) in the table give identical times), but on some other processors it
could reduce performance.

Assembly stuff, for bold explorers:

'-------------PPC ASM equivalent------
//FOR i&=1 TO 10000000: x#=x#+y#*z#-y#/t#: NEXT
countFP&=10000000
` lwz r3,^countFP&    ; r3=10000000
` lfd f1,^x#          ; f1=x#
` lfd f2,^y#          ; f2=y# (for the multiply)
` lfd f3,^z#          ; f3=z#
` lfd f4,^y#          ; f4=y# (second copy, for the divide)
` lfd f5,^t#          ; f5=t#
`loopFP
` fdiv f0,f4,f5       ; f0=f4/f5
` addic. r3,r3,$FFFF  ; r3=r3-1 (subic. r3,r3,1)
` fmadd f1,f2,f3,f1   ; f1=f2*f3+f1
` fsub f1,f1,f0       ; f1=f1-f0
` stfd f1,^x#         ; save f1->x#
` bc 4,2,loopFP       ; bne loopFP
'--------------------------------------
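
For those who do not read PPC assembly, the loop above can be traced step by step in Python. This is only a sketch of the control flow and the order of operations: the function name asm_loop is made up, the variables are named after the registers, and Python rounds the multiply and the add separately whereas fmadd rounds once, so the last digits may differ from real hardware.

```python
# Sketch: Python trace of the assembly loop, one statement per instruction.
# Note: f2 * f3 + f1 here is rounded twice; the real fmadd rounds once.

def asm_loop(x, y, z, t, count):
    """Mimic the register-level loop; returns the final value stored to x#."""
    r3 = count                            # lwz r3,^countFP&
    f1, f2, f3, f4, f5 = x, y, z, y, t    # the five lfd loads
    while True:                           # loopFP
        f0 = f4 / f5                      # fdiv  f0,f4,f5
        r3 = r3 - 1                       # addic. r3,r3,$FFFF (sets the zero flag)
        f1 = f2 * f3 + f1                 # fmadd f1,f2,f3,f1
        f1 = f1 - f0                      # fsub  f1,f1,f0
        x = f1                            # stfd  f1,^x#
        if r3 == 0:                       # bc 4,2,loopFP: branch back unless r3 is zero
            break
    return x


# Small exact case: each pass adds 2*3 - 2/4 = 5.5
print(asm_loop(0.0, 2.0, 3.0, 4.0, 2))
```

The test at the bottom of the loop means the body always runs at least once, like a do-while, which is harmless here since the count is 10000000.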