Pulsar
Bali uses an interpreter called Pulsar. It is a fairly fast interpreter built on an unorthodox mix of a valuespace (called the "stack" internally, as in a stack of values) and special-purpose registers, which handle tasks such as passing arguments to functions and retrieving their return values.
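To make the split between the valuespace and the registers more concrete, here is a minimal sketch in JavaScript. It is not Pulsar's actual code: the class `TinyVM`, its method names, and the register names `callArgs` and `returnValue` are all hypothetical and only illustrate the general idea of indexed value slots working alongside dedicated argument and return-value registers.

```javascript
// Hypothetical sketch: none of these names come from Bali's source.
class TinyVM {
  constructor() {
    this.stack = []; // the "valuespace": slots addressed by index
    this.registers = {
      callArgs: [], // arguments queued up for the next call
      returnValue: undefined, // value handed back by the last call
    };
  }

  // Store a value at a fixed slot in the valuespace.
  load(index, value) {
    this.stack[index] = value;
  }

  // Copy a slot's value into the argument register.
  passArgument(index) {
    this.registers.callArgs.push(this.stack[index]);
  }

  // Run a function with the queued arguments and keep its result
  // in the return-value register.
  call(fn) {
    this.registers.returnValue = fn(...this.registers.callArgs);
  }

  // Clear the argument register so the next call starts fresh.
  resetArguments() {
    this.registers.callArgs = [];
  }

  // Move the return-value register into a valuespace slot.
  readReturnValue(index) {
    this.stack[index] = this.registers.returnValue;
  }
}

const vm = new TinyVM();
vm.load(0, 2);
vm.load(1, 40);
vm.passArgument(0);
vm.passArgument(1);
vm.call((a, b) => a + b);
vm.resetArguments();
vm.readReturnValue(2); // slot 2 now holds the sum of slots 0 and 1
```

The point of the design is that values never travel between caller and callee through the valuespace itself; they are staged in a register, consumed by the call, and then copied back into a slot.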
Design
Bali, like most JavaScript engines, starts off by lowering the abstract syntax tree into a bytecode format it calls MIR.
Prior to 0.7.2, Bali would do an incredibly wasteful pass of generating the bytecode structures, emitting them as a string, then parsing that string into the VM's bytecode structures. Since that version, Bali instead converts the bytecode structures directly into the VM's structures, saving a lot of unnecessary memory allocations.
The bytecode format originally started off its life as part of the Mirage project. However, ever since the VM was moved into the source tree from Mirage, the two formats have diverged radically and are no longer compatible with one another, despite looking fairly similar. Bali does, though, carry a lot of legacy baggage from this, as Mirage was supposed to be as agnostic as possible, while Bali's VM is strictly focused on JavaScript. This will become more obvious soon.
After 0.7.5, Pulsar uses a dispatch table instead of the massive switch statement used earlier.
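For readers unfamiliar with the technique, a dispatch table replaces one large switch over opcodes with an array of handler functions indexed by opcode. The sketch below shows the general pattern only; the opcodes `OP_PUSH`, `OP_ADD`, and `OP_HALT` and the `run` function are made up for illustration and do not correspond to Pulsar's real instruction set or implementation.

```javascript
// Hypothetical opcodes and handlers, for illustration only.
const OP_PUSH = 0;
const OP_ADD = 1;
const OP_HALT = 2;

// The dispatch table: one handler per opcode, indexed by opcode number.
const handlers = [];
handlers[OP_PUSH] = (state, operand) => {
  state.values.push(operand);
};
handlers[OP_ADD] = (state) => {
  const b = state.values.pop();
  const a = state.values.pop();
  state.values.push(a + b);
};
handlers[OP_HALT] = (state) => {
  state.halted = true;
};

// The interpreter loop looks up the handler instead of switching on the opcode.
function run(program) {
  const state = { values: [], halted: false, pc: 0 };
  while (!state.halted && state.pc < program.length) {
    const [opcode, operand] = program[state.pc++];
    const handler = handlers[opcode];
    if (handler === undefined) {
      throw new Error(`unknown opcode ${opcode}`);
    }
    handler(state, operand);
  }
  return state.values;
}

const result = run([[OP_PUSH, 2], [OP_PUSH, 3], [OP_ADD], [OP_HALT]]);
```

Adding an instruction then means registering one more handler rather than growing a single function, which also keeps the interpreter loop itself small.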
From codegen to execution
Let's take a very simple JavaScript program:
console.log("Hello, world!")
If we use Balde with the --dump-bytecode flag, we can check out what the lowering mechanism generates for this code:
# Bytecode generated by Bali
# Bali is a JavaScript engine under the Ferus project.
# For more information, visit https://github.com/ferus-web/bali
# Developed by the Ferus Authors for the Ferus Project
# Clause/CodeModule "String"
# Operations: 1
CLAUSE String
1 CALL BALI_STRING
END String
# Clause/CodeModule "atob"
# Operations: 1
CLAUSE atob
1 CALL BALI_ATOB
END atob
# Clause/CodeModule "btoa"
# Operations: 1
CLAUSE btoa
1 CALL BALI_BTOA
END btoa
# Clause/CodeModule "encodeURI"
# Operations: 1
CLAUSE encodeURI
1 CALL BALI_ENCODEURI
END encodeURI
# Clause/CodeModule "BigInt"
# Operations: 1
CLAUSE BigInt
1 CALL BALI_BIGINT
END BigInt
# Clause/CodeModule "parseInt"
# Operations: 1
CLAUSE parseInt
1 CALL BALI_PARSEINT
END parseInt
# Clause/CodeModule "outer"
# Operations: 36
CLAUSE outer
1 LDUD 0
2 LDF 1 nan
3 LDF 5 inf
4 LDB 2 true
5 LDB 3 false
6 LDN 4
7 CFLD 7 0 "@bali_object_type"
8 CFLD 8 0 "@bali_object_type"
9 CFLD 9 0 "@bali_object_type"
10 CFLD 10 0 "@bali_object_type"
11 CFLD 11 0 "@bali_object_type"
12 CFLD 12 0 "@bali_object_type"
13 CFLD 13 0 "@bali_object_type"
14 CFLD 14 0 "@bali_object_type"
15 CFLD 15 0 "@bali_object_type"
16 CFLD 16 0 "@bali_object_type"
17 LDS 17 "Hello, world!"
18 PARG 17
19 CALL BALI_CONSTRUCTOR_STRING
20 RARG
21 RREG 17 0
22 ZRETV
23 RARG
24 LDN 18
25 LDUI 20 8
26 PARG 20
27 LDUI 21 18
28 PARG 21
29 LDS 21 "log"
30 PARG 21
31 CALL BALI_RESOLVEFIELD
32 RARG
33 PARG 17
34 INVK 18
35 RARG
36 ZRETV
END outer
That's quite a mouthful, isn't it? Let us ignore all clauses apart from outer, for they exist just for some of the runtime's features to function properly.
By the way, this dump was produced in debug mode, where the emitter generates additional comments. Otherwise, the printed bytecode would be much more condensed.
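If you want to poke at a dump like the one above programmatically, the textual layout is regular enough to split by hand. The following is a rough sketch, not Bali's own parser: the function `parseLine` and the shape of the objects it returns are assumptions made for this example, and it does not handle escaped quotes inside string operands.

```javascript
// Hypothetical helper: splits one line of the dump into its parts.
// Returns null for blank lines and for "#" comment lines.
function parseLine(line) {
  const trimmed = line.trim();
  if (trimmed === "" || trimmed.startsWith("#")) {
    return null;
  }

  // Tokens are either double-quoted strings or runs of non-space characters.
  const tokens = trimmed.match(/"[^"]*"|\S+/g);

  // CLAUSE <name> opens a clause and END <name> closes it.
  if (tokens[0] === "CLAUSE" || tokens[0] === "END") {
    return { kind: tokens[0].toLowerCase(), name: tokens[1] };
  }

  // Everything else is "<instruction index> <opcode> <operands...>".
  const [index, opcode, ...operands] = tokens;
  return {
    kind: "op",
    index: Number(index),
    opcode,
    // Quoted operands lose their quotes; other operands stay as raw text.
    operands: operands.map((t) => (t.startsWith('"') ? t.slice(1, -1) : t)),
  };
}

const op = parseLine('17 LDS 17 "Hello, world!"');
const clause = parseLine("CLAUSE outer");
const comment = parseLine("# Operations: 36");
```

Here `op` describes instruction 17 with opcode `LDS` and the operands `17` and `Hello, world!`, `clause` marks the start of the `outer` clause, and `comment` is `null`.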
Bootstrapping
Instruction 1 loads undefined at position 0. This is where all failed identifier indexing attempts during codegen point to.
Instruction 2 loads NaN at position 1, and so on.
This is the part of the bytecode we call "bootstrapping". It essentially prepares the VM to handle what is to come.
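The effect of the first six instructions, and the role of position 0 as a fallback, can be sketched as follows. This is an illustration under assumptions rather than Bali's code: `bootstrap`, `resolveIdentifier`, and the `known` map are hypothetical names, and reading `LDN` as "load null" is a guess based on the mnemonic. The slot numbers are taken from the dump above.

```javascript
const UNDEFINED_SLOT = 0;

// Mirror instructions 1 to 6 of the dump: well-known constants in fixed slots.
function bootstrap() {
  const slots = [];
  slots[UNDEFINED_SLOT] = undefined; // LDUD 0
  slots[1] = NaN; // LDF 1 nan
  slots[2] = true; // LDB 2 true
  slots[3] = false; // LDB 3 false
  slots[4] = null; // LDN 4 (assumed to mean "load null")
  slots[5] = Infinity; // LDF 5 inf
  return slots;
}

// Hypothetical codegen helper: identifiers that cannot be found
// resolve to the slot holding undefined.
function resolveIdentifier(knownSlots, name) {
  return knownSlots.has(name) ? knownSlots.get(name) : UNDEFINED_SLOT;
}

const slots = bootstrap();
const known = new Map([["greeting", 17]]);
const hit = resolveIdentifier(known, "greeting"); // 17
const miss = resolveIdentifier(known, "missing"); // 0, the undefined slot
```

Because the fallback slot always holds undefined, generated code that refers to an unresolved identifier still reads a well-defined value instead of an arbitrary slot.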
It also creates a field called @bali_object_type in many of the Objects created discreetly in the native initialization phase. This is an internal tag used by the engine.