Making a language interpreter, it works. Sort of.

Started by LemonWizard, April 01, 2014, 01:18:15 AM


LemonWizard

Okay, here is the code.
Hopefully it makes sense. You can run it and it should work.
It doesn't work the way I want, though, and I'm stuck.

PlayBASIC Code:
; PROJECT : Project1
; AUTHOR : LemonWizard
; CREATED : 3/31/2014
; ---------------------------------------------------------------------
#include "Input"
SETFPS 60

/// Petit Computer Runtime Version 0.00 /// BY LemonWizard

//Parsing rules
///Create a dictionary on the fly
//Create a list of tokens for a single line
//Search each token, then push a keyword onto the stack in the order it was found
//Search the stack for keywords, and resolve one keyword at a time
//Use multiple passes on this line to resolve the stack.
//Take a resolved item off the stack
//Continue along the stack until the stack is empty
//Generate errors for unresolved stack members
//Reset the stack, and move onto the next line.

//This is rather simple now that I understand better ^^


dim operands$(10)
operands$(1)="-"
operands$(2)="+"
operands$(3)="/"
operands$(4)="*"
operands$(5)="="
operands$(6)="=="
operands$(7)=","
operands$(8)="("
operands$(9)=")"
operands$(10)=chr$(34)

dim special$(2)
special$(1)="@"


//we are differentiating our symbols for operands here

dim keywords$(10)
keywords$(1)="IF"
keywords$(2)="THEN"
keywords$(3)="LET"
keywords$(4)="PRINT"
keywords$(5)="GOTO"
keywords$(6)="A"

dim stack$(6)


main:



for t=1 to getarrayelements(stack$(), 0)
if stack$(t)<>"" then print "Contents of stack element: " +str$(t)+" " +stack$(t)
next t



b$=staticinput("Enter a line!")
stack$()=generate_stack(b$)








sync
cls rgb(0,0,0)
goto main


function generate_stack(line$)

deletearray stack$()
dim stack$(0)

for tx=0 to len(line$)

// Now we search the string for its contents in the order they are found.
//First we want to know about variables

//Search for operands //We inject the length of the operand into the check for convenience
for t=1 to 10
if mid$(line$, tx, len(operands$(t) ))=operands$(t) //it equals the operand we are searching for.
redim stack$(stacknum)

//Here we are just putting the value onto the stack anyway in case the following conditions are not met
stack$(stacknum)=operands$(t) // since they get over written below if they are met

if operands$(t)="="
vname$="" //we will assume this is NOT math and look for a variable name
//if not found we will simply add the operand to the stack anyway.
//Rules. The variable name must not be an operand.
//If the variable name is an operand we will replace it in the stack
//With the operand. Hmm Yes. good enough.
//if the variable name is less than 2 lengths it may be operand
// if that is so we will just program it so!!

//Check the stack for = sign
//Assume assignment

///Put the variablename on the stack if possible




dim tempray$(100)
tokens=splittoarray(line$, " ", tempray$(), 0)

//Look for our operand values
for tt=0 to tokens
for bbb=0 to len(tempray$(tt) ) //I can't believe we have to search every damn token
// This SHOULDNT be an issue! //also this may cause issues with
//multiple equal statement there's gotta be a better less messy way!! wtf.. grrr


if mid$(tempray$(tt), bbb, 1)="="
vname$=left$( tempray$(tt), bbb-1)
value=val( mid$( tempray$(tt), bbb+1, len(tempray$(tt) )))

endif


next bbb


endif
next tt

//We will assume type safety here and put the next things onto the stack
stack$(stacknum)=vname$
stacknum=stacknum+1
redim stack$(stacknum)
stack$(stacknum)=operands$(t) //let us not forget about the operand itself
stacknum=stacknum+1
redim stack$(stacknum)
stack$(stacknum)=str$(value) //put the number onto the stack to complete our operation!

stacknum=stacknum+1
redim stack$(stacknum) //reset the stack size so it does not overflow


//attempting to put the variable on the stack let's see how it looks!



ModEdIT: Changed CODE to PBCODE tags for syntax highlighting


LemonWizard

#1
Okay, so I have fixed a few issues with the stack so far.
Now the problem is that the stack is getting duplicates.
Before anyone tries to help: I already know why. I'm looping over the full length of line$, and every time that loop hits an "=" I ask PlayBASIC to tokenize the whole of line$ again, loop through the tokens, and add items to the stack for every token that contains an "=".
I can't think of a way to fix it, though, because I need all tokens, including the = operand, to appear on the stack in the same order that they were found.
I thought of setting up a rule like: if this variable name is encountered more than X times, ignore it, and don't put it or its = operand on the stack.
The problem is that this would break lines like "A=10 : A=25 : A=A*2 : A=13", so right when I thought of the solution I found a way it was broken. >.<
The only other solution is to not use the token search and instead do a raw string search at the point where we find an = sign. I thought I would hit the same issue since I'm going through the whole length of the string, except maybe not. Perhaps I can ignore the token at its position (index) the second or third time instead? Hmm,
in fact that may even work, because at its index I can remove a duplicate entry.
I think I may have it now. o.o Oh well, I'm posting this anyway for ideas' sake.
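
One way around the duplicates, as a minimal sketch in Python rather than PlayBASIC (purely for illustration). It assumes ":" separates statements, as in the example line, and that every statement is a plain assignment; `push_assignments` is a hypothetical name, not something from the posted code. Each statement is split exactly once at its first "=", so nothing is pushed twice and the order of appearance is kept.

```python
def push_assignments(line):
    """Return the stack for a line of ':'-separated assignments, in source order."""
    stack = []
    for statement in line.split(":"):
        statement = statement.strip()
        # partition() splits at the first "=" only, so each statement is
        # visited once and contributes at most one name/=/value triple.
        name, equals, value = statement.partition("=")
        if equals and name.strip():
            stack.extend([name.strip(), "=", value.strip()])
    return stack


print(push_assignments("A=10 : A=25 : A=A*2 : A=13"))
```

This only covers plain assignments: a comparison such as "A==10" would be split at its first "=" and come out wrong, so it would need its own rule.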
So, here is the code.


PlayBASIC Code:
; PROJECT : Project1
; AUTHOR : Tosharu
; CREATED : 3/31/2014
; ---------------------------------------------------------------------
#include "Input"
SETFPS 60

/// Petit Computer Runtime Version 0.00 /// BY GimmeMoreCoinz

//Parsing rules
///Create a dictionary on the fly
//Create a list of tokens for a single line
//Search each token, then push a keyword onto the stack in the order it was found
//Search the stack for keywords, and resolve one keyword at a time
//Use multiple passes on this line to resolve the stack.
//Take a resolved item off the stack
//Continue along the stack until the stack is empty
//Generate errors for unresolved stack members
//Reset the stack, and move onto the next line.

//This is rather simple now that I understand better ^^


dim operands$(10)
operands$(1)="-"
operands$(2)="+"
operands$(3)="/"
operands$(4)="*"
operands$(5)="="
operands$(6)="=="
operands$(7)=","
operands$(8)="("
operands$(9)=")"
operands$(10)=chr$(34)

dim special$(2)
special$(1)="@"


//we are differentiating our symbols for operands here

dim keywords$(10)
keywords$(1)="IF"
keywords$(2)="THEN"
keywords$(3)="LET"
keywords$(4)="PRINT"
keywords$(5)="GOTO"
keywords$(6)="A"

dim stack$(6)


main:

print B$

for t=0 to getarrayelements(stack$(), 0)
if stack$(t)<>"" then print "Contents of stack element: " +str$(t)+" " +stack$(t)
next t



b$=staticinput("Enter a line!")
stack$()=generate_stack(b$)








sync
cls rgb(0,0,0)
goto main


function generate_stack(line$)

deletearray stack$()
dim stack$(0)
dim foundnames$(10)
found=0 //the amount of times we find any variable
cur=0 //the current found variable

for tx=0 to len(line$)

// Now we search the string for its contents in the order they are found.
//First we want to know about variables

//Search for operands //We inject the length of the operand into the check for convenience
for t=1 to 10
if mid$(line$, tx, len(operands$(t) ))=operands$(t) //it equals the operand we are searching for.
redim stack$(stacknum)

//Here we are just putting the value onto the stack anyway in case the following conditions are not met
stack$(stacknum)=operands$(t) // since they get over written below if they are met

if operands$(t)="="
vname$="" //we will assume this is NOT math and look for a variable name
//if not found we will simply add the operand to the stack anyway.
//Rules. The variable name must not be an operand.
//If the variable name is an operand we will replace it in the stack
//With the operand. Hmm Yes. good enough.
//if the variable name is less than 2 lengths it may be operand
// if that is so we will just program it so!!

//Check the stack for = sign
//Assume assignment

///Put the variablename on the stack if possible




dim tempray$(100)
//checking$=mid$(line$, tx-5, tx+5) //maybe use for testing
tokens=splittoarray(line$, " ", tempray$(), 0)

//Look for our operand values
for tt=0 to tokens
for bbb=0 to len(tempray$(tt) ) //I can't believe we have to search every damn token
// This SHOULDNT be an issue! //also this may cause issues with
//multiple equal statement there's gotta be a better less messy way!! wtf.. grrr


if mid$(tempray$(tt), bbb, 1)="="
vname$=left$( tempray$(tt), bbb-1)
value=val( mid$( tempray$(tt), bbb+1, len(tempray$(tt) )))


//We will assume type safety here and put the next things onto the stack
stack$(stacknum)=vname$
stacknum=stacknum+1
redim stack$(stacknum)
stack$(stacknum)=operands$(t) //let us not forget about the operand itself
stacknum=stacknum+1
redim stack$(stacknum)
stack$(stacknum)=str$(value) //put the number onto the stack to complete our operation!

stacknum=stacknum+1
redim stack$(stacknum) //reset the stack size so it does not overflow






endif



ModEdIT: Changed CODE to PBCODE tags for syntax highlighting


LemonWizard

By the way, I partly fixed that issue.
>.< The check on the index works. Sort of.
But checking for the = operand in the whole line$ string is not working.
Here is why:
http://imgur.com/GGdggvY
The line at the top shows the last command entered.
The list at the bottom shows the results.
If you read the line at the top from left to right and the list from top to bottom, you will see many inconsistencies.
>.<

kevin

#3
What you're writing is a lexical scanner, which is the initial tokenization process. It takes the input string and breaks it down into a list of tokens that represent this line/block of code under the given rules. The rules define what types of characters can appear in keywords, literals, operators, etc.

The basic logic is that you scan from left to right. If the current character is a space/tab, then skip it. If it's something else, fall into the ID'ing section. Some tokens, like operators, are generally a single character, so for those we trap them and spit the token out onto the stack, as you call it. Others, like keywords and numbers, are of unknown length. Keywords would be assumed to start with a character within the alphabetical range. To find the length of the keyword, we scan from the next character onwards, checking whether each following character still fits our rule. Most languages use an alphanumeric rule (A to Z, 0 to 9) with additional underscore support. Numbers work the same way: if the current character is a digit, we scan ahead to find the end of the number. Once we have our number/keyword, we drop it onto the stack and continue from the character after its last character.
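
The left-to-right scan described here could look roughly like the sketch below. It is written in Python purely for illustration, not PlayBASIC. The operator list is an assumption pieced together from the operands mentioned in this thread (the quote character is left out), and `scan` and `OPERATORS` are hypothetical names.

```python
# Operators recognised by this sketch; "==" is listed before "=" so the
# two-character form is matched first.
OPERATORS = ("==", "=", "+", "-", "*", "/", ",", "(", ")", "<", ">")


def scan(line):
    """Break one line of source text into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(line):
        ch = line[pos]
        if ch in " \t":
            # Whitespace separates tokens but is never a token itself.
            pos += 1
        elif ch.isalpha() or ch == "_":
            # Word: scan ahead while characters are alphanumeric or "_".
            end = pos + 1
            while end < len(line) and (line[end].isalnum() or line[end] == "_"):
                end += 1
            tokens.append(("word", line[pos:end]))
            pos = end
        elif ch.isdigit():
            # Number: scan ahead to the end of the run of digits.
            end = pos + 1
            while end < len(line) and line[end].isdigit():
                end += 1
            tokens.append(("number", line[pos:end]))
            pos = end
        else:
            for op in OPERATORS:
                if line.startswith(op, pos):
                    tokens.append(("operator", op))
                    pos += len(op)
                    break
            else:
                raise ValueError(f"unexpected character {ch!r} at position {pos}")
    return tokens


print(scan("LET A = A + 5"))
# [('word', 'LET'), ('word', 'A'), ('operator', '='), ('word', 'A'), ('operator', '+'), ('number', '5')]
```

Because the position only ever moves forward, every character is looked at once and the tokens come out in the order they appear on the line.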

 A couple of links,

 - Simple Formula Evaluation / Compilers

 - Formula Evaluation (Lexical Scanner)



LemonWizard

Quote from: kevin on April 01, 2014, 08:23:24 AM

Hey, thank you for this information. It's going to be useful for creating a scripting language for future projects and for moving to my own hierarchy system that is transparent to PlayBASIC's ^^; You may ask why do this: for a basic interpreter that extends functions and so on, and also for learning purposes.

LemonWizard

Hey Kevin. Guess what. You helped me a lot. I was just stuck on one part, and I realized I didn't need to parse the whole line, because with the spaces gone most tokens will be whole pieces. I even found some shortcuts for assuming things, such as for strings in quotes or for tokens.
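
A rough sketch of what classifying whole pieces could look like, in Python purely for illustration. The actual code was not posted, so this is a guess at the idea and not the author's routine. The keyword list is taken from the earlier PlayBASIC listing (without the "A" entry), and `classify` and `classify_line` are hypothetical names.

```python
KEYWORDS = {"IF", "THEN", "LET", "PRINT", "GOTO"}


def classify(piece):
    """Guess the kind of one whitespace-delimited piece of a line."""
    if piece.upper() in KEYWORDS:
        return "keyword"
    # Shortcut: anything wrapped in double quotes is assumed to be a string.
    if len(piece) >= 2 and piece.startswith('"') and piece.endswith('"'):
        return "string"
    if piece.isdigit():
        return "number"
    return "other"


def classify_line(line):
    """Split a line on whitespace and classify each piece in order."""
    return [(piece, classify(piece)) for piece in line.split()]


print(classify_line('PRINT "HI"'))
```

Splitting on spaces is the weak point of this shortcut: a string literal that contains a space gets cut in two, and a piece like A=A+5 stays in one lump marked "other".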

The idea you gave me was brilliant. I used a currentmember counter in the stack, and I checked previous members of the stack to see if a requirement for the item was there. If it was, I placed the item one member behind to fill the member gap. For example: IF -> action, where the action is required.

Have a look at what I have working now! I started from scratch; it was much easier. If anyone else wants to look at this they can. I don't want to give the code out YET because it still has some bugs. Once I get it in pristine condition I may release the code ^^;


http://i.imgur.com/IkoZTYh.png AND THERE IT IS! The image. Thanks again, Kevin! I think you really are brilliant.

LemonWizard

OKAY ANOTHER SUCCESS

http://imgur.com/8zFPeAU

MY CODE IS STABLE. THIS IS AWESOME. I now have a comparison operator working as well!
I decided to use the asm shorthand CMP in my stack to save on stack memory (for later porting to limited systems?).

I am so happy; this is one of the coolest things I have done. Kevin, I couldn't have done it without your idea.
With this iteration of the program, since I restarted, EVERYTHING, the whole order of operations, is actually preserved, because I ditched the routine of stepping through the code character by character. This works so much better, along with the tokenizing and the whole thing with the injection into previous stack members. So genius! As long as I step forward once for each time I stepped back, I am safe!

LemonWizard

I have an early debug thing as well!!! http://imgur.com/mNocHer

LemonWizard

Well, just fiddling away here. I think basic debug messages are going to become a part of this little project:

http://imgur.com/IsPo7sk
It demonstrates the use of the '>' and '<' operands, but '<' is not included in the pic.


Now the kicker. This is actually an interpreter built to interpret code for Petit Computer, so that PC-based debugging can happen. With Petit Computer for 3DS arriving soon, it will be a useful tool if the syntax for PTC 3DS is anything like the same o.o
But I plan to take it a step further. I also want it to be able to run old Petit Computer games. So far I only have a somewhat working stack generator. I think some memswap commands and such would be nice >.< but I'm getting too finicky... hmmm

kevin


It's good that you're making progress, but this is not Twitter, so we don't really need a new post every hour on the hour. If you have new information, edit a previous post.

LemonWizard

Just showing off the progress so far :D
Unfortunately, because I'm checking individual stack tokens >.<
I've failed to separate A=A+5 properly. Well, actually, it works better than I intended, finding two results.

If you check this image you'll see what I mean.
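
For a piece like A=A+5 that arrives as one whitespace-delimited token, one possible way to separate it is to split at operator boundaries. This is a minimal sketch in Python, purely for illustration: the operator set is an assumption based on the operands mentioned in this thread, and `split_piece` is a hypothetical name.

```python
import re

# "==" comes first so it is not read as two separate "=" tokens.
# After that: single-character operators, then names, then runs of digits.
PIECE = re.compile(r"==|[=+\-*/<>(),]|[A-Za-z_][A-Za-z0-9_]*|[0-9]+")


def split_piece(piece):
    """Split one space-free piece into its sub-tokens, in order of appearance."""
    return PIECE.findall(piece)


print(split_piece("A=A+5"))
# ['A', '=', 'A', '+', '5']
```

Note that `findall` silently skips any character the pattern does not cover, so a real version would want to report those as errors.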