Diary: How to add a REPL to STAK
We start out having a compiler, linker and bytecode interpreter – all very standard, by-the-book design. How to add REPL without having to reinvent the tooling? Moreover, it should work on all platforms where the interpreter can run (desktop, MS-DOS, consoles)
From the back: the interpreter will need a to incorporate a debug server, to allow manipulation of execution as well as of the code itself. For desktop, this can be a line-oriented text protocol on top of TCP. The interpreter has no clue about the structure of the program, so it can only provide low-level primitives. The debugger side will have to figure out all the rest.
Let’s think of the simplest case where the interpreter starts out as a
blank slate. It needs to be told to start out in suspended state (this
can also be achieved by a dummy program containing a single 'brk
opcode). The user enters a statement into the REPL; to allow compound
statements, the REPL can continue to accept input until parentheses are
balanced. The REPL calls the compiler, wrapping the entered statement in
a dummy function declaration. Good, except this will not work for
variable declarations: they would not be seen by following statements!
It will also not work for function declarations, since they can not be
nested. There is a simple solution – have the REPL detect these forms
and compile them in a global context.
The compiler produces a compilation unit. The unit is very easily relocatable:
- globals are referenced by name
- functions are referenced by name
- branches are always relative
- locals are, well, local
- constants are numbered locally as well (via
'constants-offset
)
The linker is then invoked to produce a linked program. The program contains a table of functions and marks one of them as the main function. The program must be loaded into the interpreter. The interpreter is instructed to begin executing at the start of the main function. How to tell the interpreter where to stop? Some ideas:
- a) place a breakpoint after the last instruction of
main
(but what if another function begins there?) - b) construct an artifical
real-main
which will just callmain
and then break - c) instrument the compiler to insert a
'brk
at the end ofmain
- d) append a
(break)
statement to the user input
Option d) seems like the most pragmatic one.
When the breakpoint is encountered, the interpreter will suspend execution and inform the REPL. The REPL must pop the top-of-stack and print it. But this will only work for one iteration. Then what?
User enters another statement. First of all, the compiler needs to be
told about previously defined global variables (it raises an error upon
encountering an undeclared variable, so we cannot wait until link time).
The linker needs to be told about previously defined functions and
globals. For the moment, let’s not permit updating previously defined
functions (though a special case will be necessary to permit redefining
main
). The linker needs to run in an “incremental mode”, taking into
consideration:
- number of previously defined functions, globals, constants (reminder: constants are not de-duplicated across functions, so only the total count is relevant)
- length of existing bytecode
The linker will thus produce a “program fragment”, whose bytecode can be
loaded starting at the end of the previous program, and its
function/global/constant tables appended to the existing tables.
(Optimization opportunity: if main
was the last function in the old
program, its bytecode can be overwritten, and its function table entry
replaced). As we have discovered, to permit iteration, the linker will
also need to provide:
- the (updated) function table
- the (updated) global table
The REPL now instructs the interpreter to load the program fragment with the respective table/code offsets. It finds the new main function and jumps to it.
How to reconcile global x local x REPL
- Locals can be initialized with an arbitrary expression
- Globals can only be initialized with a literal integer value
- This makes sure that all globals are initialized before first use in a linked program
- Otherwise the rule could be relaxed and globals initialized by code like locals (but it would be less efficient for the vast majority of cases)
- In REPL, we want to define values with an expression
At the same time:
- The execution model requires REPL variables to be globals, since all
local contexts cease to exist after execution
- Maybe they shouldn’t?
- What is our goal in the bigger picture, i.e. w.r.t. debugging a
larger program
- Indeed we will want to stop in the middle of a function and potentially execute entered code
If the compiler had a “REPL mode” what would it do:
- allow statements in top-level (implicit
main
) - globals initialized with expression
Additional interpreter state
– already have itbool Thread::suspended