Monday, October 2, 2017

Y-Basic Architecture (part 1)


Code

Memory will be divided into sections made of pages, each page being 256 bytes long. Every section will start at an address whose lowest byte equals 0.
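A minimal sketch of this page arithmetic, assuming 16-bit addresses whose high byte is the page number (the helper names are hypothetical):

```python
PAGE_SIZE = 256  # each page is 256 bytes long


def page_start(page: int) -> int:
    """Address of the first byte of a page: its low byte is always 0."""
    return page << 8


def page_of(address: int) -> int:
    """Page number of an address, i.e. its most significant byte."""
    return (address >> 8) & 0xFF


def is_section_start(address: int) -> bool:
    """A section may only start where the low byte of the address is 0."""
    return (address & 0xFF) == 0


def pages_needed(size: int) -> int:
    """Number of whole pages required to hold a section of `size` bytes."""
    return (size + PAGE_SIZE - 1) // PAGE_SIZE
```

For example, page_start(0x08) gives 0x0800, and a 17-byte section fits in a single page.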


1. Code Line Number Section

This section will contain a list of entries made of 2 elements: a line number (2 bytes: 0 to 65535) and an address offset where the actual Basic code is located. For N Basic lines, there will be N+1 entries; the extra last entry marks where the code of the last line ends. The number of entries will also be stored somewhere to determine how many entries are present.

The entries in that table will always be sorted by line number. This allows a fast look-up to find whether a specific line number exists and where its code is located.
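Since the entries are sorted, a binary search can find a line in O(log N) comparisons. In the sketch below (function and variable names are hypothetical), the table is a Python list of (line number, offset) pairs holding the values of the example that follows, and the extra N+1th entry is treated as an end marker, so the length of a line's code is the difference between two consecutive offsets:

```python
# (line number, code offset) pairs; the last entry only marks where the code ends.
line_table = [(10, 0x0000), (20, 0x000E), (0, 0x0011)]
line_count = 2  # stored separately: number of real Basic lines


def find_line(table, count, number):
    """Return (offset, length) of the code for line `number`, or None if absent."""
    lo, hi = 0, count  # the end-marker entry is never searched
    while lo < hi:
        mid = (lo + hi) // 2
        if table[mid][0] < number:
            lo = mid + 1
        else:
            hi = mid
    if lo == count or table[lo][0] != number:
        return None
    offset = table[lo][1]
    # The next entry (or the end marker) tells where this line's code stops.
    return offset, table[lo + 1][1] - offset
```

Here find_line(line_table, line_count, 20) returns (0x0E, 3): the PUSH 10 and GOTO operations of line 20 occupy 3 bytes.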


2. Code Section

The code section will contain the actual operations required to execute each Basic code line. Function parameters and return values will be passed via a stack. This allows correct code execution while preserving the order in which the code line was written.

To preserve the original source text, an additional element will be added when needed. This allows, for example, storing space characters, comments or other source code decoration (like unnecessary parentheses).
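To illustrate the stack design, here is a minimal Python sketch of an interpreter loop. It is not the Y-Basic implementation: operations are modelled as tuples instead of encoded bytes, the opcode names PUSH, PRINT and GOTO are taken from the example, and DECOR is a hypothetical opcode standing for the extra source-decoration element, which is skipped at run time:

```python
def run(lines, max_steps=10):
    """Run Basic lines (dict: line number -> list of ops) using a value stack.

    Stops after about `max_steps` operations, since "20 GOTO 10" never ends.
    """
    order = sorted(lines)  # execution follows line-number order
    stack, output = [], []
    index = steps = 0
    while index < len(order) and steps < max_steps:
        next_index = index + 1
        for op, *args in lines[order[index]]:
            steps += 1
            if op == "PUSH":
                stack.append(args[0])  # parameter for a following operation
            elif op == "PRINT":
                output.append(str(stack.pop()))
            elif op == "GOTO":
                # Target line number comes from the stack; ValueError if undefined.
                next_index = order.index(stack.pop())
                break
            elif op == "DECOR":
                pass  # spaces/comments kept for listing, ignored at run time
        index = next_index
    return output


program = {
    10: [("PUSH", "Hello World"), ("PRINT",)],
    20: [("PUSH", 10), ("GOTO",)],
}
```

Calling run(program) prints "Hello World" three times before the step limit stops the endless GOTO loop.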

Example:

        10 PRINT "Hello World"
        20 GOTO 10

Code Section (Page: 08):

0800: PUSH STRING "Hello World"
080D: PRINT
080E: PUSH 10
0810: GOTO

Code Line Number Section (Page: 09):

0900: $000A (10), 0000
0904: $0014 (20), 000E
0908: $0000, 0011


Additional information must be stored somewhere to record the location of these two pages (here 08 and 09) and the number of code lines (here 2).
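One possible shape for that additional information, as a hedged Python sketch (the class and field names are hypothetical, and where this record lives in memory is not specified here). It also shows how, with 4-byte entries, any line entry can be reached from the page number alone:

```python
from dataclasses import dataclass


@dataclass
class ProgramHeader:
    code_page: int   # page of the Code Section (08 in the example)
    line_page: int   # page of the Code Line Number Section (09 in the example)
    line_count: int  # number of Basic lines (2 in the example)

    def line_entry_address(self, index: int) -> int:
        """Address of the index-th (line number, offset) entry; entries are 4 bytes."""
        return (self.line_page << 8) + index * 4
```

With ProgramHeader(0x08, 0x09, 2), entry 1 is at 0x0904 and index line_count (0x0908) is the end-marker entry.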



To determine the physical address of a given line, the page number has to be added to the most significant byte of its offset. Of course, when saving to disk, the program would not contain the unused zero bytes padding each section up to the next page boundary, in order to reduce the disk space needed.
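A sketch of both operations, assuming the used size of each section is known (for the line section, (N+1) * 4 bytes; for the code section, the offset stored in the last entry). Function names are hypothetical:

```python
def physical_address(page: int, offset: int) -> int:
    """Add the page number to the most significant byte of a 16-bit offset."""
    return ((page << 8) + offset) & 0xFFFF


def save_image(memory: bytes, sections) -> bytes:
    """Concatenate only the used bytes of each (page, used_size) section.

    The zero padding up to the end of each section's last page is not written.
    """
    image = bytearray()
    for page, used in sections:
        start = page << 8
        image += memory[start:start + used]
    return bytes(image)
```

For line 20 of the example, physical_address(0x08, 0x000E) gives 0x080E. Because offsets are relative to the start of their section, such a saved image could be reloaded into different pages by only recording the new page numbers.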




Variables

Variables are trickier to deal with because their contents need to persist, so that a new program can be loaded and continue the execution based on prior events. Within the code, a variable will be identified by an index (starting from 0).


3. Variable Section

This section will contain, for each variable declared in the program, an offset pointer to the variable name and a reference count, so each variable entry takes 4 bytes (2 bytes for each piece of information).


The reference count will be especially useful when removing a line of code, to determine whether the variables used in that line should be removed or not.
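The sketch below models the Variable Section with reference counting in Python. It is an assumption-based illustration: names are kept in a plain list standing in for the Variable Name Section, with name offsets computed as in the "B\0", "A\0" layout of the example later in this post. Feeding it the variable references of that example (B, A, A, A, B) reproduces its counts:

```python
class VariableSection:
    """Variable Section entries: [name offset, reference count], indexed from 0."""

    def __init__(self):
        self.entries = []  # each entry is 4 bytes in the real layout
        self.names = []    # hypothetical stand-in for the Variable Name Section

    def reference(self, name: str) -> int:
        """Return the variable index for `name`, creating the entry on first use."""
        if name in self.names:
            index = self.names.index(name)
        else:
            index = len(self.entries)
            # Offset assumes names stored as "NAME\0", as in the example.
            offset = sum(len(n) + 1 for n in self.names)
            self.names.append(name)
            self.entries.append([offset, 0])
        self.entries[index][1] += 1
        return index

    def release(self, index: int) -> bool:
        """Drop one reference (e.g. when a line is deleted).

        Returns True when the count reaches 0, i.e. the variable could be removed.
        """
        self.entries[index][1] -= 1
        return self.entries[index][1] == 0
```

Deleting line 20 of that example would release A three times and B once: A drops to 0 and becomes removable, while B is still used by line 10.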


4. Variable Name Section

This section will store variable names. Variable names will be stored at 6 bits per character, in order to reduce the space needed to store them and the time needed to compare them. When loading a new program, a binding operation will be required in order to maintain the contents and types of variables previously used.
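A minimal sketch of 6-bit name packing. The alphabet below (0 as terminator, then A-Z, 0-9, $ and %, well within the 64 codes available) is a hypothetical choice, not one defined in this post. Four characters fit in 3 bytes, and two packed names can be compared byte by byte:

```python
# Hypothetical 6-bit alphabet: code 0 ends the name.
ALPHABET = "\0ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789$%"


def pack_name(name: str) -> bytes:
    """Pack a name at 6 bits per character plus a 0 terminator, MSB first.

    Raises ValueError for characters outside the alphabet.
    """
    bits = nbits = 0
    for ch in name.upper() + "\0":
        bits = (bits << 6) | ALPHABET.index(ch)
        nbits += 6
    padding = (-nbits) % 8  # pad with zero bits up to a whole byte
    bits <<= padding
    return bits.to_bytes((nbits + padding) // 8, "big")


def unpack_name(data: bytes) -> str:
    """Decode 6-bit codes until the 0 terminator."""
    bits = int.from_bytes(data, "big")
    chars = []
    for shift in range(len(data) * 8 - 6, -1, -6):
        code = (bits >> shift) & 0x3F
        if code == 0:
            break
        chars.append(ALPHABET[code])
    return "".join(chars)
```

For instance, pack_name("ABCD") takes 4 bytes including the terminator, where 8-bit characters would need 5.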


5. Variable Reference Section (only when running)

This section is created when running the program and initialized to 0 for each variable entry. During execution, the reference to a variable will be created and memory will be allocated to store its contents. I talked a bit before about variable types and different ideas... but I plan to talk in a future post about what will be stored at the address used in this section.
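As a hedged sketch of this lazy allocation: the table starts as all zeros, and a variable gets an address the first time it is used. The bump allocator, the heap start address and the storage size per variable are hypothetical placeholders, since what is stored at that address is left for a future post:

```python
class VariableReferences:
    """Run-time table of addresses, one per variable, all starting at 0."""

    def __init__(self, variable_count: int, heap_start: int):
        self.addresses = [0] * variable_count  # 0 means "not allocated yet"
        self.next_free = heap_start            # hypothetical bump allocator

    def address_of(self, index: int, size: int) -> int:
        """Return the storage address of variable `index`, allocating on first use."""
        if self.addresses[index] == 0:
            self.addresses[index] = self.next_free
            self.next_free += size
        return self.addresses[index]
```

Because 0 marks "not allocated", this sketch assumes no variable storage is ever placed at address 0.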


Example:
10 B=0:A=1
20 A = A + B

Variable Section:
0000: 0000, 0002 ; offset for variable name B: 0000, reference count: 2
0004: 0002, 0003 ; offset for variable name A: 0002, reference count: 3

Variable Name Section:
0000: "B\0"
0002: "A\0"

Variable Reference Section:
When the program starts executing, 4 bytes will be set to 00 and later used to store the 2 physical addresses related to variables B and A.

You might wonder why go to the trouble of this page segmentation. One of the reasons is speed: when looking for variable #1, a few simple operations give the memory address where its information is stored. Microsoft Basic was designed with very limited RAM in mind (probably 4 KB of RAM was the standard at the time they started to work on it).
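A sketch of those simple operations, assuming hypothetical pages 0A, 0B and 0C for the Variable, Variable Name and Variable Reference sections, and little-endian 16-bit values (the byte order is an assumption). Finding variable #1 only needs a multiplication by a power of two and an addition per section:

```python
VAR_PAGE = 0x0A   # hypothetical page of the Variable Section
NAME_PAGE = 0x0B  # hypothetical page of the Variable Name Section
REF_PAGE = 0x0C   # hypothetical page of the Variable Reference Section


def read_word(memory, address: int) -> int:
    """Read a 16-bit little-endian value (byte order is an assumption)."""
    return memory[address] | (memory[address + 1] << 8)


def locate_variable(memory, index: int):
    """Return (name address, reference address) for variable `index`."""
    entry = (VAR_PAGE << 8) + index * 4  # 4-byte entries: name offset, ref count
    name_address = (NAME_PAGE << 8) + read_word(memory, entry)
    ref_address = (REF_PAGE << 8) + index * 2  # one 2-byte address per variable
    return name_address, ref_address
```

With the example's tables loaded at those pages, variable #1 resolves to its name "A" at 0x0B02 and its reference slot at 0x0C02.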

With this way of storing a program, it is easy to store more than one program! After reviewing some numbers, it became clear that a maximum of 4 was the right number to go for. A program can be loaded, removed, swapped or replaced in memory. This feature could be useful in complex software where modules are used according to the task. If you think about it, the reference count for variables becomes even more important in that case, as more than one program could be using the same variable!
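A minimal sketch of up to 4 program slots sharing variable reference counts. Class and method names are hypothetical, and variables are keyed by name to reflect the binding step done when a program is loaded:

```python
MAX_PROGRAMS = 4


class ProgramSlots:
    def __init__(self):
        self.slots = [None] * MAX_PROGRAMS
        self.ref_counts = {}  # variable name -> references across all programs

    def load(self, name: str, variable_refs: dict) -> int:
        """Place a program in the first free slot; ValueError if all 4 are used."""
        slot = self.slots.index(None)
        self.slots[slot] = (name, variable_refs)
        for var, count in variable_refs.items():
            self.ref_counts[var] = self.ref_counts.get(var, 0) + count
        return slot

    def remove(self, slot: int) -> list:
        """Free a slot and return the variables no loaded program uses any more."""
        _, variable_refs = self.slots[slot]
        self.slots[slot] = None
        unused = []
        for var, count in variable_refs.items():
            self.ref_counts[var] -= count
            if self.ref_counts[var] == 0:
                del self.ref_counts[var]
                unused.append(var)
        return unused
```

Swapping or replacing a program could then be expressed as a remove followed by a load; a shared variable such as A survives as long as one loaded program still references it.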


Yes, things get complicated but also very interesting.