PlayBASIC V1.65 (Work In Progress) Gallery

Started by kevin, April 03, 2016, 12:01:41 AM

kevin

#15
PlayBASIC V1.65 VM clocks in at 15 MIPS (Million Instructions Per Second)


     While testing the SWAP commands, I did a bit of performance calculation on the current runtime. Not in milliseconds this time, but in the number of byte code instructions the runtime can chew through per second, which comes in at around 15.6 million instructions per second on this 11 year old system. For the test code, that's about 2.2 times quicker than V1.64P.


     Now this got me thinking: did you know the 68000 processor found in classic Amiga systems, like the A500, has about 1 MIPS of throughput in hardware? So the current PlayBASIC runtime is about 14 to 15 times quicker than machine code running on that actual hardware.

       ::)  ::)  ::)   :P
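For anyone wondering how an "instructions per second" figure like this can be measured, here is a rough sketch in Python. It is not PlayBASIC's VM: the three-opcode instruction set, the `run` loop and `measure_mips` are all made up for illustration. The idea is simply to count every instruction the dispatch loop executes and divide by the elapsed time.

```python
import time

# Hypothetical three-opcode instruction set, for illustration only.
OP_ADD, OP_JUMP_IF_LESS, OP_HALT = 0, 1, 2


def run(program, registers):
    """Execute the program and return how many instructions were dispatched."""
    pc = 0
    executed = 0
    while True:
        op, a, b = program[pc]
        executed += 1
        if op == OP_ADD:
            registers[a] += b              # register a += constant b
            pc += 1
        elif op == OP_JUMP_IF_LESS:
            # Loop back to instruction 0 while register a is below limit b.
            pc = 0 if registers[a] < b else pc + 1
        else:                              # OP_HALT
            return executed


def measure_mips(loops):
    """Run a counting loop and return (instructions executed, MIPS)."""
    program = [
        (OP_ADD, 0, 1),
        (OP_JUMP_IF_LESS, 0, loops),
        (OP_HALT, 0, 0),
    ]
    registers = [0]
    start = time.perf_counter()
    executed = run(program, registers)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return executed, executed / elapsed / 1_000_000


executed, mips = measure_mips(100_000)
print(executed, "instructions,", round(mips, 2), "MIPS")
```

The MIPS value printed depends entirely on the machine and on Python itself, so it says nothing about PlayBASIC's numbers; only the counting method carries over.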

     



kevin

   PlayBASIC V1.65   - The Stack

          So what's been going on? Well... a fair bit, but again it's mostly all the boring stuff that's hidden deep down inside the VM. For example, the current focus is on how the stack instruction sets are laid out and, by extension, how functions themselves work.

          One of the older ideas for functions was to find a way to help the parameter passing along at call time, so that passing lots of parameters into a function doesn't take as much work. It's always a pretty ugly operation anyway, though, since we're hitting memory regardless.

          Every time we drive the cost of a core operation down, we win back runtime performance; moreover, it should make translation a bit easier too. It's just a pain having to revisit old stuff..

kevin

#17
   PlayBASIC V1.65   - User Function Call Benchmarks

     So here's today's little bench. This time we're calling user defined Functions and PSUBs with a collection of simple parameter passing combos. Results are pretty good: compared to PB1.64P4 it's about 5.5 times faster. There's some bias though, as we're not strictly comparing apples with apples here (meaning they work in different ways now), but it's the end result that matters.

    Test Code:

 
PlayBASIC Code:
MaxTests=10000

Dim Tests#(100)


STARTTIME=Timer()

; ---------------------------------------------------------------------
do
; ---------------------------------------------------------------------

cls

frames++

a=45
b#=123.456
c$="Hello World"

a$=c$
b$=c$

Test=0

CallsPerTest=Maxtests*5

; ----------------------------------------------------------------------------
; ----------------------------------------------------------------------------
; ----------------------------------------------------------------------------
; ----------------------------------------------------------------------------
; ----------------------------------------------------------------------------
print make$("-",64)
print "--[ FunctionCalls ]----------------------------------"
print make$("-",64)

t=timer()
for lp=0 to maxtests
fInput0Params()
fInput0Params()
fInput0Params()
fInput0Params()
fInput0Params()
next
Tests#(Test)=Tests#(Test)+(timer()-t )
print " fInput0Params():"+str$(Tests#(Test)/frames)
Test++


t=timer()
for lp=0 to maxtests
fInput1Params1(A)
fInput1Params1(A)
fInput1Params1(A)
fInput1Params1(A)
fInput1Params1(A)
next
Tests#(Test)=Tests#(Test)+(timer()-t )
print " fInput1Params1(A ):"+str$(Tests#(Test)/frames)
Test++



t=timer()
for lp=0 to maxtests
fInput1Params2(B#)
fInput1Params2(B#)
fInput1Params2(B#)
fInput1Params2(B#)
fInput1Params2(B#)
next
Tests#(Test)=Tests#(Test)+(timer()-t )
print " fInput1Params2(B#):"+str$(Tests#(Test)/frames)
Test++



t=timer()
for lp=0 to maxtests
fInput1Params3(C$)
fInput1Params3(C$)
fInput1Params3(C$)
fInput1Params3(C$)
fInput1Params3(C$)
next
Tests#(Test)=Tests#(Test)+(timer()-t )
print " fInput1Params3(C$):"+str$(Tests#(Test)/frames)
Test++



t=timer()
for lp=0 to maxtests
fInput2Params1(A,B)
fInput2Params1(A,B)
fInput2Params1(A,B)
fInput2Params1(A,B)
fInput2Params1(A,B)
next
Tests#(Test)=Tests#(Test)+(timer()-t )
print " fInput2Params1(A ,B ):"+str$(Tests#(Test)/frames)
Test++


t=timer()
for lp=0 to maxtests
fInput2Params2(A#,B#)
fInput2Params2(A#,B#)
fInput2Params2(A#,B#)
fInput2Params2(A#,B#)
fInput2Params2(A#,B#)
next
Tests#(Test)=Tests#(Test)+(timer()-t )
print " fInput2Params2(A#,B#):"+str$(Tests#(Test)/frames)
Test++


t=timer()
for lp=0 to maxtests
fInput2Params3(A$,B$)
fInput2Params3(A$,B$)
fInput2Params3(A$,B$)
fInput2Params3(A$,B$)
fInput2Params3(A$,B$)
next
Tests#(Test)=Tests#(Test)+(timer()-t )
print " fInput2Params3(A$,B$):"+str$(Tests#(Test)/frames)
Test++




t=timer()
for lp=0 to maxtests
fInput3Params1(A,B,C)
fInput3Params1(A,B,C)
fInput3Params1(A,B,C)
fInput3Params1(A,B,C)
fInput3Params1(A,B,C)
next
Tests#(Test)=Tests#(Test)+(timer()-t )
print " fInput3Params1(A ,B ,C ):"+str$(Tests#(Test)/frames)
Test++



t=timer()
for lp=0 to maxtests
fInput3Params2(A#,B#,C#)
; ... (remainder of the listing is not included)
 
 

kevin


  PlayBASIC V1.65   - User Function Return Methods

           Still working away at the function calling opcodes. It seems the caller method works pretty well, so we've updated the return-from-function opcodes. These seem to perform about the same as V1.64P4; there are gains in some tests and not in others. Which is all part of the wonderful world of optimization. I'm not too concerned about it though, as on balance it'll be quicker since the parameter passing is generally much quicker. It'd be nice to keep shaving cycles out of the VM loop, but that's not always practical.

      One thing I've been thinking about bringing forward is some caching logic for temp strings returned from functions. This would potentially save at least one copy of the string being made. Moreover, it should avoid the allocation and deallocation of that temp string. Unfortunately, allocating memory from the OS is NOT a fixed time operation. If anything it's a road to fragmentation, which makes allocations slower and slower and slower. So it's worth avoiding if possible, just painful :)
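The post doesn't describe how the temp string cache would actually work, so the following is only a generic Python sketch of the idea: keep released temp buffers around and hand them back out instead of allocating a fresh one each time. `TempStringCache`, `acquire` and `release` are hypothetical names, not anything from the PlayBASIC runtime.

```python
class TempStringCache:
    """Keeps released temp buffers so the next request can reuse one."""

    def __init__(self):
        self._free = []
        self.allocations = 0   # how many brand new buffers were created

    def acquire(self, text):
        data = text.encode("utf-8")
        if self._free:
            buf = self._free.pop()
            buf[:] = data      # overwrite the old contents in place
        else:
            buf = bytearray(data)
            self.allocations += 1
        return buf

    def release(self, buf):
        # Hand the buffer back instead of letting it be freed.
        self._free.append(buf)


cache = TempStringCache()
for i in range(1000):
    temp = cache.acquire("value " + str(i))   # e.g. a function's return string
    cache.release(temp)                       # caller is done with it

print(cache.allocations)   # only the very first request created a buffer
```

In a real runtime the win comes from skipping the OS allocator; in Python the bytearray may still resize internally, so treat this purely as a picture of the bookkeeping.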




kevin


  PlayBASIC V1.65   - User Function / Psub calling & returning

       The FUNCTION calling/returning mechanisms seem to be working, so I've been working on the PSUBs today. It just means picking through the old byte code generation and parsing code and replacing stuff. The updated method is cleaner at both compile time and runtime, so we can trim out legacy code here and there, which is nice.

       Today's been slow going really; not a lot of code, it's mostly tweaking/testing. Have picked up a few errors in the new byte code generation. One really odd one would optimize a return value out by writing it into some other variable, giving some rather strange results. It's one of those things that was hidden well enough away to only occur with a perfect combination of code, as a lot of test code still worked. Bound to be a few more of those hidden away.

        There are still a few tidbits left to do with functions / psubs, like setting up a new version of CallFunction, and there's no support for ExitFunction or recursion yet either. Also, the bound DLL calling method doesn't use the same approach as calling internal functions, so that needs to be tweaked to use the latest method, which will get rid of some work that needs to be performed at runtime.


kevin

#20
PlayBASIC V1.65 - Runtime closing in on native C performance

     Revisited the external function call operations again. Wanted to try an alternative opcode layout where the runtime takes as few steps as possible when calling an externally bound function. Even though the steps are small, small things still cost you performance, so every time we can shave cycles off the runtime's overhead, it means more VM instructions being executed per second for you.

    When starting each update, that age old fantasy of closing the gap between native machine code execution and runtime execution continues to be firmly at the forefront of one's mind. We've been shaving the difference down month by month, year after year, and are now inside a factor of 10 (on average). The general thinking for runtimes is that if you can get down to a 10 to 1 ratio, then that's pretty optimal. This is largely because of the amount of memory accesses runtimes require.

     Having said all that... I'm rather excited to announce today that we've further broken down the execution wall and have achieved a ratio of 6 to 1 in raw function calling performance. That's the test that calls external functions linked with the runtime. For some perspective, that's over double the V1.64P4 performance for the same operation.

    So it's been a good day.. :)



More Speed

      Tweaked the caller again today in order to remove one memory access and an addition, and we get even closer with a 4 to 1 ratio for the benchmark. It is a rather narrow test though; calling more complex functions can't be done so easily, but a lot of general stuff we use all the time will get a nice boost...


   






kevin


PlayBASIC V1.65 –  Call Function  (Calling Functions By Name)


      Have moved on to another legacy part of the runtime that needs to be updated to not only the new instruction set but also the new application format. This time the opcode doesn't really need much tweaking; rather, any changes are in how the data structure is set up. What I want to do is localize everything. This means that reading code/data from the currently running application is going to cache better. In previous builds this was all in separate heaps, which may or may not be being read/flushed from the CPU cache. When chunks are flushed it's effectively invisible overhead; if we build it all into one place, we can get rid of some of that. Ultimately it's up to your CPU how and when it fetches data from memory.

     So far I've got the new structure all set up and am in the process of writing the application builder code, which is just some code that stores all the app's requirements in one chunk. Once that's done and running, we can set up the CallFunction command blocks, which will go a ways to getting a bunch of legacy code working again. After that, the same type of thing will need to be done with the Types.

kevin

#22
    PlayBASIC V1.65 –  FunctionIndex()  FunctionExist()  benching

    So here we have the base functions of the dynamic function calling command set up and running again. All the code was written yesterday, but I ran into a matching issue, so have spent all morning debugging that, what joy! Which means we've only got the FunctionIndex / FunctionExist functions working at this point. For those who don't read the manual, they're for querying functions in your program at compile time and runtime. This is generally done with a view to calling said function by name or by index.

   Stuff like FunctionExist can be used at compile time to determine if some code is included within the current program and then act accordingly, such as optionally including/excluding sections of code. Which is a method I often use when writing bigger applications.

   Anyway, the replacement benchmark shows the new data structures and opcodes perform about 2 times faster than the legacy V1.64P4 build. I'm just happy it works at this point. Now, I have to build a dynamic version of CallFunction.


PlayBASIC Code:
Maxtests=10000


; ------------------------------------------------------
do
; ------------------------------------------------------

cls

frames++


print "----[Function INDEX]--------------------------------------"


Name$="Test"

tt=timer()
for lp=0 to Maxtests
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
next
tt1#+=timer()-tt
print tt1#/frames

print result

Name$="LastFunctionInTableWithALongName"

tt=timer()
for lp=0 to Maxtests
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
next
tt2#+=timer()-tt
print tt2#/frames
print result


Name$="aaaaaasssssdsdsd"
tt=timer()
for lp=0 to Maxtests
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
result=FUnctionIndex(Name$)
next
tt3#+=timer()-tt
print tt3#/frames
print result



print ""
print ""


print "----[Function Exist]--------------------------------------"


Name$="Test"

tt=timer()
for lp=0 to Maxtests
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
next
tt10#+=timer()-tt
print tt10#/frames

print result

Name$="LastFunctionInTableWithALongName"

tt=timer()
for lp=0 to Maxtests
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
next
tt11#+=timer()-tt
print tt11#/frames
print result


Name$="aaaaaasssssdsdsd"
tt=timer()
for lp=0 to Maxtests
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
result=FUnctionExist(Name$)
next
tt12#+=timer()-tt
print tt12#/frames
print result


print ""
print ""
print "---[Stats]---------------------------------------------"
f=FPS()
TestCount=6
print F
print str$( (MaxTests*5*TestCount*f)/1000.0/1000)+" Million Searches Per Second"
print "-------------------------------------------------------"

sync
loop


Function Test()
EndFUnction


Function LastFunctionInTableWithALongName()
EndFUnction
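The benchmark above looks up the first function in the table, the last one (with a long name), and a name that doesn't exist. As a rough picture of what such a query does, here is a small Python sketch of a name-to-index table. Everything in it is an assumption for illustration: the `FunctionTable` class, the case-insensitive matching, the hashed lookup, and the `-1` result for a missing name are not taken from PlayBASIC's documentation or source.

```python
class FunctionTable:
    """Maps function names to their position in the program's function list."""

    def __init__(self, names):
        self._names = list(names)
        # Assumption: names are matched without regard to case.
        self._index = {name.lower(): i for i, name in enumerate(self._names)}

    def function_index(self, name):
        # Assumption: -1 signals "no such function".
        return self._index.get(name.lower(), -1)

    def function_exist(self, name):
        return name.lower() in self._index


table = FunctionTable(["Test", "LastFunctionInTableWithALongName"])
print(table.function_index("Test"))
print(table.function_index("LastFunctionInTableWithALongName"))
print(table.function_exist("aaaaaasssssdsdsd"))
```

With a hashed table like this, the cost of a lookup doesn't depend on where the function sits in the list, which is the kind of difference the three test names in the benchmark would expose.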









kevin

#23
   PlayBASIC V1.65 - Beta 37  –  Dynamic CallFunction


     Have replaced the CallFunction opcodes on the VM side. The replacement allows us to get rid of some runtime calcs and uses the newer, simpler function caller structure, so it's quicker than the legacy version. The param matching now works via the internal param's datatype class, whereas it used to try and match the token at runtime. Which just means we can do things like pass a type handle into a function and it'll compute the pointer. This wasn't possible before, as the incoming data was just tagged as being an integer; it didn't know it was a handle.


    So this kind of thing is now possible:
PlayBASIC Code:
Type tVector
    x#,y#,Z#
EndType

Dim Table(10) as tVector

; Only slot 5 is allocated, the rest stay empty
Table(5) = New tVector
Table(5).X = 1000
Table(5).Y = 2000
Table(5).z = 3000


; Call ShowVector by name, passing each typed array element
for lp =0 to 10
    CallFunction "ShowVector", Table(lp)
next

Sync
WaitKEY


Function ShowVector(me as tVector)
    ; int(me) is zero when nothing has been allocated
    if int(me)
        print Me.X
        print Me.Y
        print Me.Z
    else
        print "Not allocated"
    endif
EndFunction
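To make the "datatype class" idea a little more concrete, here is a loose Python sketch of calling a function by name where each parameter carries a class, and a parameter tagged as a handle is resolved to the object before the call. This is an illustration under my own assumptions, not the VM's implementation: the `objects` handle table, the `"handle"` tag, and `call_function` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Vector:
    x: float
    y: float
    z: float


# Handle table: slot 0 is reserved so that handle 0 means "not allocated".
objects = [None]


def new_vector(x, y, z):
    objects.append(Vector(x, y, z))
    return len(objects) - 1          # the handle itself is just an integer


def show_vector(me):
    if me is None:
        return "Not allocated"
    return f"{me.x} {me.y} {me.z}"


# name -> (callable, class of each parameter)
functions = {"showvector": (show_vector, ["handle"])}


def call_function(name, *args):
    func, param_classes = functions[name.lower()]
    if len(args) != len(param_classes):
        raise TypeError("wrong number of parameters for " + name)
    resolved = []
    for arg, param_class in zip(args, param_classes):
        if param_class == "handle":
            # The parameter's class says this integer is a handle,
            # so turn it into the object it refers to.
            resolved.append(objects[arg] if arg else None)
        else:
            resolved.append(arg)
    return func(*resolved)


table = [0] * 11
table[5] = new_vector(1000.0, 2000.0, 3000.0)
for handle in table:
    print(call_function("ShowVector", handle))
```

Without the class information, the caller would only see a plain integer and couldn't know it needed resolving, which matches the limitation described for the older version.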






     All that's left is to set up a version of the caller that will call wrapped/bound DLL functions, which I'm about 1/2 way through currently. So we should be able to sign off on all that later on today/tonight...

 
      Edit:    Got this set up and tested while watching the footy. So we can tick that off the to-do list for now.

 

kevin

PlayBASIC V1.65 - Beta 39  - Types & Runtime Structures

     In keeping with the new app structure, we need to move the type structures into the new container. Have tweaked the structure somewhat; mainly it now uses a local heap for text fields. Which just means those strings aren't floating around in the main runtime.

     So far have got everything embedded in the new runtime APP structure; just have to work through the legacy opcodes and hook everything together. From memory there are only 3 or 4 opcodes anyway. There might be some runtime speed benefit, but I can't imagine much, since ultimately you're allocating memory, which is not a uniform speed operation.

     Anyway, after those replacements the only legacy opcodes remaining are array access and some legacy stack opcodes.  Some of which I'm not too sure we need anymore....


kevin


  PlayBASIC V1.65 - Beta 41   - Array and List Functions

      Have worked my way through the array / list functions, replacing some bits and updating others. Haven't brute force tested them though, but stuff like searching should be quicker. There are a few functions left over which might end up on the cutting room floor, since they require the old internal array data structures. GetArray() / SetArray() used the linear array table, which no longer exists. Could possibly emulate it, but just returning the handle of the structure and storing those in an array does the same thing.



 



kevin

PlayBASIC V1.65 - Beta 43   - Type Fields

        It's been a cold and miserable week at home, snowing most days, which doesn't make sitting in the cold office much fun. Nonetheless, I've been getting bits of the final part of the update done, replacing the type structure access opcodes. These are a collection of about 10 (from memory) opcodes that type accesses resolve down to in the VM. I'd been sitting back looking at that section of code for a few days before deciding to update it. The original code isn't much different; it's just that types have a lot of secondary support opcodes which the compiler uses to optimize the output stream. So by changing them you break a lot of other stuff in the process. Which is not something I'm all that interested in..

        But... as you should know by now, there's an exception coming. Which is: if changing it makes it faster, then of course I'm going to try it... and it has, as the current (incomplete) build makes reading from a field in a typed array (1D) about 45% faster than it was. That's when compared to a non-cached type access, and without the write optimization that pre V1.65 versions have.

      The compile time optimizer doesn't currently understand the new byte code instructions completely, and as such can't make all those little shortcuts when pulling data from a typed array/list. It's those little shortcuts that can really win back a lot of runtime execution time.

      I'm hoping that when everything is tweaked and running we should be able to get beyond a 50% to 60% improvement (so it'd be doing the same work in ½ or less of the time it takes in V1.64P4). I've a few tweaks for caching in mind also, and I think a simple cached field access should cost somewhere around 15% of the full field access.

kevin

  PlayBASIC V1.65 - Beta 44   - Read Type Fields Benchmarks: 3 to 5 times faster

    Updated the compile time optimizer last night so it can understand the type access instructions. The results are as I'd hoped, showing between a 3 and 5 times performance gain over V1.64P4 (without caching enabled). In raw read terms that's about 24 million reads per second on my test system, compared to about 6 million in the older version.

    Below is the test code and some pics of the results running on the 2 different versions on the same system.

PlayBASIC Code:
Type tCool
a,b,c
EndType

TYPE tVector3
x#,y#,Z#
a,b,c
b1 as byte
b2 as byte
b3 as byte
b4 as byte

w1 as word
w2 as word
w3 as word
w4 as word

iArray(100)
sArray#(100)
fArray$(100)
EndType


Size = 100
Dim v(Size) as tVector3


For lp =0 to Size
v(lp)= new tVector3
v(lp).x = 1
v(lp).y = 1
v(lp).z = 1
next

Maxtests=500


Dim Tests#(1000)



Do
cls

frames++


Test=0

// ----------------------------------------------------------------------
// ---[ FLOAT ]----------------------------------------------------------
// ----------------------------------------------------------------------

t=timer()
For Testlp =0 to Maxtests

For lp =0 to Size
Result# =v(lp).x
Result# =v(lp).y
Result# =v(lp).z
Result# =v(lp).x
Result# =v(lp).y
Result# =v(lp).z
next

next
Test++
Tests#(Test)=Tests#(Test)+(Timer()-t)

print " Reading Float2Flt:"+str$(Tests#(Test)/frames)


t=timer()
For Testlp =0 to Maxtests
For lp =0 to Size
Result =v(lp).x
Result =v(lp).y
Result =v(lp).z
Result =v(lp).x
Result =v(lp).y
Result =v(lp).z
next
next
Test++
Tests#(Test)=Tests#(Test)+(Timer()-t)
print " Reading Float2Int:"+str$(Tests#(Test)/frames)



// ----------------------------------------------------------------------
// ---[ INTEGER ]--------------------------------------------------------
// ----------------------------------------------------------------------


t=timer()
For Testlp =0 to Maxtests
For lp =0 to Size
Result# =v(lp).a
Result# =v(lp).b
Result# =v(lp).c
Result# =v(lp).a
Result# =v(lp).b
Result# =v(lp).c
next
next
Test++
Tests#(Test)=Tests#(Test)+(Timer()-t)
print " Reading Int2Flt:"+str$(Tests#(Test)/frames)


t=timer()
For Testlp =0 to Maxtests
For lp =0 to Size
Result =v(lp).a
Result =v(lp).b
Result =v(lp).c
Result =v(lp).a
Result =v(lp).b
Result =v(lp).c
next
next

Test++
Tests#(Test)=Tests#(Test)+(Timer()-t)
print " Reading Int2Int:"+str$(Tests#(Test)/frames)






// ----------------------------------------------------------------------
// ---[ BYTE ]-----------------------------------------------------------
// ----------------------------------------------------------------------

t=timer()
For Testlp =0 to Maxtests
For lp =0 to Size
Result# =v(lp).b1
Result# =v(lp).b2
Result# =v(lp).b3
Result# =v(lp).b1
Result# =v(lp).b2
Result# =v(lp).b3
next
next
Test++
// ... (remainder of the listing is not included)
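To relate the per-test timings this benchmark prints to the "million reads per second" figures quoted above, here is a small worked calculation in Python. It assumes that PlayBASIC's `For ... To` loops include the upper bound (as in most BASICs), so each loop runs one more time than the limit suggests, and that `Timer()` reports milliseconds. The function names are made up for the sketch.

```python
SIZE = 100                # Dim v(Size): "For lp = 0 to Size" runs SIZE + 1 times
MAX_TESTS = 500           # "For Testlp = 0 to Maxtests" runs MAX_TESTS + 1 times
READS_PER_ITERATION = 6   # six field reads inside the inner loop


def reads_per_test():
    """Total field reads performed by one timed test block."""
    return (MAX_TESTS + 1) * (SIZE + 1) * READS_PER_ITERATION


def milliseconds_per_test(reads_per_second):
    """Time one test block would take at the given read rate."""
    return reads_per_test() / reads_per_second * 1000.0


print(reads_per_test())                              # 303606
print(round(milliseconds_per_test(24_000_000), 2))   # 12.65
print(round(milliseconds_per_test(6_000_000), 1))    # 50.6
```

So under those assumptions, the quoted 24 million reads per second would correspond to roughly 12.65 ms per test block, against roughly 50.6 ms at 6 million reads per second.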





kevin

#28
   PlayBASIC V1.65 – On Its Way to Beta Testing..

        Yes... the day has finally arrived when I've crossed the final big opcode block off the runtime replacement to-do list. Without the list in front of me, I'd say it's about 99% complete as of this morning. The only things missing now are little bits of stuff that need hooking up/replacing between how it used to work and how it works today. So there are still places where the compiler hasn't been updated to output the new instruction set, and as such would output some old instruction, resulting in an operation that won't work at runtime. They're generally easy fixes, but require some detective work to track them down.

        The performance of the runtime is pretty good; it's generally much quicker than V1.64P4 and there's still some fat on the bone that can be trimmed off. The coolest thing about that, though, is that I'm yet to really jump into the compile time optimizations to be added. These are a series of replacements that the code generation will make during output for you.

         The compile time optimization list contains way too many opts to list from memory (plus it's a private part of our compiler technologies), but here's one I see in people's programs when reading characters from a string. This expression ThisChr=Asc(Mid$(String$,Pos,1)) should be written as ThisChr=Mid(String$,Pos) in PlayBASIC. The latter is quicker as we're not returning a string, which, no matter how fast you make them, will always be slower than reading a character directly from the string. This kind of thing can be done at compile time, so the output stream of opcodes can be made as clean as possible.
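As a rough illustration of what a compile time replacement like this could look like, here is a Python sketch that rewrites an expression tree of the form Asc(Mid$(s, pos, 1)) into Mid(s, pos). The tuple-based tree, the `call` helper and the `optimize` function are hypothetical and only show the pattern matching idea; they say nothing about how the PlayBASIC compiler represents code internally.

```python
def call(name, *args):
    """Build a call node: ("call", function name, list of argument nodes)."""
    return ("call", name, list(args))


def optimize(node):
    """Rewrite Asc(Mid$(s, pos, 1)) into Mid(s, pos); leave the rest alone."""
    if not (isinstance(node, tuple) and node[0] == "call"):
        return node                      # variables and literals are unchanged
    _, name, args = node
    args = [optimize(arg) for arg in args]   # optimize nested calls first
    if name.lower() == "asc" and len(args) == 1:
        inner = args[0]
        if (isinstance(inner, tuple) and inner[0] == "call"
                and inner[1].lower() == "mid$"
                and len(inner[2]) == 3
                and inner[2][2] == ("int", 1)):
            source, pos, _ = inner[2]
            # Read the character code directly, no temp string needed.
            return ("call", "Mid", [source, pos])
    return ("call", name, args)


expr = call("Asc", call("Mid$", ("var", "String$"), ("var", "Pos"), ("int", 1)))
print(optimize(expr))
```

The rewrite only fires when the length argument is the literal 1; anything else, such as a length of 2, is left as written because the two forms would no longer mean the same thing.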

       Anyway, we should be able to start dropping a stream of V1.65 betas very soon! The more people that are active in testing them, the quicker this process becomes.


kevin

#29
 PlayBASIC V1.65 – Write Array Strings Bench Marking

      Updated the write string array opcodes last night and today, and we can report yet another gain in performance. The change helps the VM avoid some bogus string copying when moving computed strings from expressions. It seems to work well so far, gaining over 10fps in the demo on a per frame basis, and the whole test executes almost 10 seconds faster in V1.65 than in V1.64....    very happy with that !

       Benchmark code:  running on a 10 year old Athlon 3000 (single core)..  :)


PlayBASIC Code:
Max= 10000
Dim v1$(Max)
Dim v2$(Max)

Dim TempStrings$(Max)
For lp =0 to Max
TempStrings$(lp) = Make$(str$(lp),10+(lp and 64))
next


StartTime=Timer()
For Tests =0 to 400
cls

Frames++
a$=make$(Str$(Frames)+",",10)

s$=make$("-",len(PlayBASIC$))
s$+="---------------------------------------------------------"
print s$
print "---[ String Write Bench Marking "+PlayBASIC$+" ]-----------------------"
print s$
print ""

t=timer()
For lp =0 to max
Temp$=A$+ " Some Test String"
v1$(lp) = Temp$
next
tt1#=tt1#+(Timer()-T)
print " Test1 :"+Str$(tt1#/frames)

print "Test String :"+v1$(0)
print ""


t=timer()
For lp =0 to max
v2$(lp) = a$+" Some Test String"
next
tt2#=tt2#+(Timer()-T)
print " Test2 :"+Str$(tt2#/frames)
print "Test String :"+v2$(0)
print ""

// -----------------------------------------------------
// Direct Assignment From Array
// -----------------------------------------------------
t=timer()
For lp =0 to max
v1$(lp) = TempStrings$(lp)
next
tt3#=tt3#+(Timer()-T)
print " Test3 :"+Str$(tt3#/frames)

print "Test String :"+v1$(0)
print ""


// -----------------------------------------------------
// Stream Line String Function Returns
// -----------------------------------------------------
t=timer()
For lp =0 to max
Temp$=Left$(A$,10)
v1$(lp) = Temp$
next
tt4#=tt4#+(Timer()-T)
print " Test4 :"+Str$(tt4#/frames)
print "Test String :"+v1$(0)
print ""

t=timer()
For lp =0 to max
v1$(lp) = Left$(A$,10)
next
tt5#=tt5#+(Timer()-T)
print " Test5 :"+Str$(tt5#/frames)
print "Test String :"+v1$(0)
print ""

#break

print S$
print S$
print "String Count :"+Str$(Max)
print " FPS :"+STR$(Fps())
print S$
print S$

Sync
next Tests


print "DONE"
TotalTime=Timer()-StartTime
print "Time :"+str$(float(TotalTime)/Tests)+" Seconds"

Sync
waitkey
waitnokey