Bali Manual
Author: Trayambak Rai (xtrayambak at disroot dot org)
Version Series Targetted: 0.7.x
This manual is largely inspired by Monoucha's manual.
WARNING: Bali is only tested with the default memory management strategy (ORC). It uses ORC extensively and as such, it will probably not compile with other strategies.
WARNING: If you embed Bali in your program, you must compile it with the C++ backend as Bali relies on some C++ libraries.
This manual is largely under construction. Report problems if you can.
Table of Contents
- Introduction
- Creating new types
- Supported ECMAScript APIs
- Using Balde
- Controlling Codegen
- Using the Native Interface
Introduction
Bali is a JavaScript engine written from scratch in Nim for the Ferus web engine. It is designed to be convenient to interface with whilst being fast and compliant. It provides you high-level abstractions as far as humanly possible.
Currently, it has a fairly-fast bytecode VM as well as a baseline JIT compiler for x86-64 SysV systems.
Bali is evolving very quickly, and as such, the API is subject to breaking. Such changes will be generally marked by a bump in the minor version.
Terms to learn
Although not evident at first, learning these terms will make things much easier for you.
Atom
An atom is a variant type discriminated by its kind
field that can be:
- An integer (signed)
- An integer (unsigned)
- A float (64-bit)
- A string
- A sequence (of more atoms, but you can mark it as homogenous to restrict it to one type)
- An object (basically, an overglorified hashmap)
There's also an identifier type, but that's only internally used by Mirage.
A JSValue
is a pointer to an atom, and it is generally what is used instead of a MAtom
, to facilitate easier value manipulation.
Drooling Baby's First Scripts
We'll now learn how to use Bali's easy
module, which as the name suggests, gives you the simplest-possible way to execute JavaScript in Nim. It is incredibly dumbed down, and you're recommended to not use it for serious projects where more control over the entire execution flow is required.
All you need to is import a single component: bali/easy
import pkg/bali/easy
proc main() =
var runtime = createRuntimeForSource(
"""
console.log("Goo goo ga ga")
console.log("It doesn't get simpler than this.")
"""
)
runtime.run()
when isMainModule:
main()
It's mostly meant to be used as a gentle introduction to using Bali.
Baby's First Scripts
Now, we'll learn how to write a Nim program that can load and evaluate JavaScript. You'll need to import two of Bali's components:
- The grammar
module, which as the name suggests, is responsible for lexing and parsing the JavaScript source code into an abstract syntax tree
- The runtime
module, which takes in the AST and converts it into bytecode that targets the Mirage/Pulsar interpreter.
However, all of the complexities are neatly abstracted away, so you needn't worry about all of that. The API is deceptively simple. Here's the minimal example.
import bali/grammar/prelude
import bali/runtime/prelude
# Create a parser.
let parser = newParser("""
console.log("Hello from JavaScript!")
""")
# Parse the source code into an AST.
let ast = parser.parse()
# Instantiate the runtime, which generates the bytecode and runs it.
# You can optionally pass on a filename, it only exists to keep the
# bytecode caching mechanism work.
let runtime = newRuntime("nameofyourfile.js", ast)
# Begin execution. This halts your code until execution is completed.
runtime.run()
Let's breakdown what each line does:
- We're creating a parser with newParser()
and feeding it our source code.
- We're parsing the source code into an abstract syntax tree.
- We're feeding the AST to the runtime.
- We're running the generated bytecode.
That's it! You just executed JavaScript with Bali! Yay.
Creating new types
Now, let us learn how to create new types. Bali can automatically "wrap" a Nim object into its representative atom. Bali can do this for primitives like integers, floats, strings and sequences of said primitives as well.
Wrapping Primitives
Let us try wrapping some primitives into atoms before wrapping a Nim object.
import std/options
import bali/runtime/prelude
let name = "John Doe"
let age = 24'u
let likes = @["Skating", "Tennis", "Programming"]
let aName = wrap(name)
let aAge = wrap(age)
let aLikes = wrap(likes)
assert aName.kind == String
assert aAge.kind == Integer
assert aLikes.kind == Sequence
assert aName.isSome and aName.getStr().get() == name
assert aAge.isSome and aAge.getInt().get() == age
NOTE: When you call wrap()
, Bali allocates the object's on the heap, which is normally controlled by the Boehm GC! Do not free the pointer unless you're 100% sure it's no longer needed. Due to how Bali is designed, deallocating a JSValue
can cause a ripple effect where other parts of the code holding onto the pointer won't notice it, and will subsequently perform a user-after-free! Henceforth, leave deallocations to the GC, because it's smarter than you.
Here, we just turned Nim types into atoms, which can be of any type, only dictated by their kind
field. We can also turn the first two atoms back into their original representation.
Unfortunately, it isn't as simple for the sequence, which can be dynamically typed with heterogenous types at this point.
The getXXX
functions return Option[requested type]
as they need to safely provide a way to tell if an atom can be of the requested primitive type.
Hold up, this isn't how JavaScript engines work!
Yep, they use coercion. You could implement coercion like this using the exception system, but Bali already handles some of the coercion functions that ECMAScript expects. They'll be covered later.
Wrapping our Type
Assuming you already have the runtime set up via newRuntime()
, you can define types before run()
is called.
type Person* = object
name*: string
age*: uint
likes*: seq[string]
runtime.registerType(prototype = Person, name = "Person")
runtime.setProperty(Person, "name", str("John Doe"))
runtime.setProperty(Person, "age", integer(24'u))
runtime.setProperty(Person, "likes", sequence(@[
str("Skating"),
str("Tennis"),
str("Programming")
]))
Here, we're exposing our Person
type to the JavaScript environment by specifying what fields it has, and also setting those fields.
Now, you can easily just run this in your JS code:
console.log(Person.name) // Log: John Doe
console.log(Person.age) // Log: 24
console.log(Person.likes) // Log: [Skating, Tennis, Programming]
Modifying our Prototype
Now, what if you wanted to add a function that can be called by every instance of your type?
In order to do that, you need to modify its prototype. Assume that we want multiple instances of the previously defined Person
class to be possible.
We need to define a constructor for it.
runtime.defineConstructor(
"Person",
proc =
let
name = runtime.ToString(&runtime.argument(1), required = true, message = "Expected `name` argument at pos 1, got {nargs}")
age = runtime.ToNumber(&runtime.argument(2), required = true, message = "Expected `age` argument at pos 2, got {nargs}")
let person = Person(name: name, age: age.uint)
ret person
)
Now, we can call new Person("John Doe", 24)
in JavaScript land and get an instance of the Person
class!
Let us assume you want a greet
function for all Person
(s) that returns "
runtime.definePrototypeFn(
Person,
"greet",
proc(value: JSValue) =
let
name = runtime.ToString(value["name"])
age = runtime.ToNumber(value["age"])
echo name & " greets you back."
)
Let us break that down, shall we?
definePrototypeFn
, as the name suggests, sets up a function for a type's prototype. This prototype is copied to all of the instances of that type, and all of the child classes that derive from this one, and so on and so forth.- You might be wondering what
value
here is. It's actually thePerson
type we calledret
on to pass it over to JS-land. It's now in the form of an atom, and unfortunately there's no way to easily turn it back into its original representation, atleast right now.
Now, the following code should work fine:
let john = new Person("John Doe", 23)
let jane = new Person("Jane Doe", 28)
john.greet()
jane.greet()
Supported ECMAScript APIs
Bali supports the following ECMAScript APIs.
The Math Type
The Math
type contains the following methods.
Math.random
This function uses the global RNG instance created via librng to generate a float between 0 and 1. WARNING: Do NOT tuse this function in places like cryptography! All of the PRNG algorithms used are highly predictable!
Using another RNG algorithm
You can pass the following values as arguments for --define:BaliRNGAlgorithm
:
xoroshiro128
: Xoroshiro128xoroshiro128pp
: Xoroshiro128++xoroshiro128ss
: Xoroshiro128**mersenne_twister
: Mersenne Twistermarsaglia
: Marsaglia 69069pcg
: PCGlehmer
: Lehmer64splitmix
: Splitmix64
All of them vary in terms of quality, footprint and speed. Bali defaults to xoroshiro128
as it provides a nice balance between all of those.
The rest of the functions
All of the functions apart from Math.hypot
have been implemented.
The JSON type
The JSON
type has been implemented using the jsony parser. It contains a routine to turn Nim's JsonNode
structs into JSValue
(s) on the fly.
It is not very spec-compliant yet, and just exists for my convenience.
Supported Routines
JSON.parse()
JSON.stringify()
The URL type
The URL
type is not part of the JavaScript spec, but we implement it anyways. It uses Ferus' sanchar parser under the hood.
Supported Routines
new URL()
constructorURL.parse()
The BigInt type
The BigInt
type has been implemented, but the vast majority of its routines have not. It uses the GNU MP BigNum library under the hood.
Supported Routines
new BigInt()
constructorBigInt.prototype.toString()
The String type
This is perhaps the most complete implemented-by-Bali ECMAScript type here. It uses Kaleidoscope and simdutf under the hood.
Supported Routines
new String()
constructorString.prototype.indexOf()
String.prototype.concat()
String.prototype.trimStart()
/String.prototype.trimLeft()
String.prototype.trimEnd()
/String.prototype.trimRight()
String.prototype.toLowerCase()
String.prototype.toUpperCase()
String.prototype.repeat()
String.fromCharCode()
String.prototype.codePointAt()
The Date type
This is yet another well-implemented-by-Bali ECMAScript type. It uses a mixture of LibICU and internal math-based logic, mostly translated from Ladybird's LibJS.
Supported Routines
new Date()
constructorDate.now()
Date.parse()
Date.prototype.getYear()
Date.prototype.getFullYear()
Date.prototype.toString()
Date.prototype.getDay()
Date.prototype.getDate()
The Number type
This is just a boxed representation of a number.
Supported Routines
Number.isFinite()
Number.isNaN()
Number.parseInt()
Number.NaN
Number.EPSILON
Number.prototype.toString()
Number.prototype.valueOf()
The Set type
The Set type uses a Sequence atom under the hood, with guards to ensure that no duplicated values can get lodged in on accident.
Supported Routines
new Set()
Set.prototype.toString()
Set.prototype.add()
Set.prototype.size()
WARNING: The Set
type can handle recursion easily, but other routines may not! Calling ToString()
on a recursive Set
is known to cause a segmentation fault caused due to a stack overflow caused by infinite recursion.
var x = new Set;
x.add(x) // this is just fine.
console.log(x) // This ends up calling `Set.prototype.toString()`, which ends up calling `ToString()` on its elements and... I think you get the point.
Using Balde
Balde, short for "Bali debugger" is a CLI tool that acts both as a script runner and a REPL.
Running Scripts
Running a JavaScript source file is as simple as:
$ balde path/to/your/file.js
Flags
Balde supports a few flags for easier debugging.
--dump-ast
This flag lets the runtime evaluate the AST for any errors, but does not allow for its execution. Instead, it prints out the AST instead.
This allows for semantic errors to be thrown out. If you want an immediate AST dump, use --dump-no-eval
.
$ balde path/to/your/file.js --dump-ast
<AST representation>
--dump-no-eval
This flag dumps the parsed representation (or AST) of the provided JavaScript source without any further evaluation. It also includes the parsing errors.
$ balde path/to/your/file.js --dump-no-eval
<AST representation>
--verbose
This flag allows all debug logs to be shown, which can be used to diagnose bugs in the engine's multiple phases (tokenization, parsing, bytecode generation and runtime). Beware that this gets very spammy.
$ balde path/to/your/file.js --verbose
<A boat load of logs>
--dump-tokens
This flag dumps all of the tokens of a JavaScript source file.
$ balde path/to/your/file.js --dump-tokens
Using the REPL
WARNING: The REPL is still a very unstable feature. It is known to have several bugs. To run the REPL, simply run Balde with no arguments.
Experiments
Experiments are unstable Bali features that are locked behind an interpreter flag. In order to use them, you need to use the --enable-experiments
flag.
--enable-experiments
expects a syntax like this:
--enable-experiments:<experiment1>;<experiment2>
Current Experiments
There are currently no active experimental features.
Controlling Codegen
Bali exposes some levers to turn on/off some code generation features. This features help Bali emit faster bytecode by skipping certain expensive portions of the provided JS source into code that does roughly the same thing.
If the optimized code does not perform the same behaviour as the unoptimized code, that is a bug. You should probably report it to me after you're 100% sure that it isn't a problem on your part.
Prelude
Bali exposes three code generation optimizations as of right now: - Loop Elider - Loop Allocation Eliminator - Return-value register scrubbing
Loop Elision
Say, we have some code like this (this is very stupid, most people don't write code like this):
var i = 0
while (i < 999999)
{
i++
}
Bali can successfully prove that the while-loop only exists to modify the loop's state controller (i
) in a particular way (increment it in this case)
Bali now knows that there is not a need to:
* Waste storage space by generating code for a loop
* Waste CPU cycles by iterating 999999 times
As such, it can compute the result that will be stored at the end in the i
variable and turns the code into the rough equivalent of this:
var i = 999999
This also works for decrements, as intended.
There are a few cases where the loop elision will correctly fallback and actually generate the loop's code if it detects that a loop does more than mutate its own state.
var i = 0
while (i < 999999)
{
i++ // This is fine.
console.log(i) // Woops: we can't elide that loop now!
}
Now, it realizes that it has to actually generate the loop. Loop elision makes Bali really fast against other JavaScript engines for very specifically cherry picked benchmarks, but serves next to no real-world usage. Yet.
Loop Allocation Elision
Say, we have some code like this (this actually happens in a lot of places):
while (true)
{
let x = "Programming se me utsahit hota hun. Me vyavsaik programmer hun, aur tum ise idhar dekh sakte ho. Ye mera kaushal he."
console.log(x)
}
AFAIK, other JavaScript engines like V8 and SpiderMonkey have an internal string interning "cache" to prevent allocating the same string again and again. Bali takes a different approach, because why not?
Before a loop is about to be generated, Bali runs an optimization pass over the body to check if any allocations can be moved outside the loop's body. The aforementioned code sample, hence, would be converted into this:
let x = "Programming se me utsahit hota hun. Me vyavsaik programmer hun, aur tum ise idhar dekh sakte ho. Ye mera kaushal he."
while (true)
{
console.log(x)
}
This prevents unnecessary allocations. Yay.
Return-Value Register Scrubber
This optimization essentially allows the engine to emit the ZRETV
instruction. This instruction clears the return-value register, which allows the garbage collector to quickly clear up the memory it might occupy. This makes some badly written code no longer result in an OOM. It, however, like most of Bali, is not battle tested and as such, might result in undefined behaviour. It is disabled by default.
Disabling Optimizations
If you don't want the bytecode generator to spend time optimizing code (you're 100% sure you've written very neat code that will always make your CPU happy or something) or the bytecode generator ends up performing optimization incorrectly (rare, but if it occurs, please file a bug report), then you can disable codegen optimizations.
Disabling Optimizations in Balde
Balde exposes the following three flags to control optimizations: * --disable-loop-elision * --disable-loop-allocation-elim * --aggressively-free-retvals * --disable-jit
Disabling Optimizations in Nim code
When instantiating the Runtime
, you can pass an InterpreterOpts
struct containing a CodegenOpts
struct.
var runtime = newRuntime(
fileName, ast,
InterpreterOpts(
codegen: CodegenOpts(
elideLoops: false,
loopAllocationEliminator: false,
aggressivelyFreeRetvals: false,
jitCompiler: false
)
)
)
Using the Native Interface
A small example
Bali exposes a pretty neat two-way interface for native code written in Nim to interact with JavaScript, and vice versa.
Here's a short example. Here's a brief summary of what it does:
- The JavaScript code has a function called "shoutify" which takes in an argument and returns an all-upper-case version of it.
- The Nim code has a function called "greet" which takes in an argument and prints "Hi there,
import pkg/bali/grammar/prelude
import pkg/bali/runtime/prelude
proc main =
let parser = newParser(
"""
let greeting = greet("tray") // Here, we're calling a native function written in Nim
console.log(greeting)
function shoutify(name)
{
// Make a name seem like it's been SHOUTED OUT.
// This is called by native code.
var x = new String(name);
var y = x.toUpperCase() // Fun fact: `toUpperCase()` is native code. So here, we have native code calling interpreted code, which in turn calls more native code. :^)
return y
}
"""
)
let ast = parser.parse()
var runtime = newRuntime("interop.js", ast)
runtime.defineFn(
"greet",
proc() =
let arg = runtime.ToString(&runtime.argument(1))
ret str("Hi there, " & arg)
,
)
runtime.run() # Execute what we have so far.
let fn = runtime.get("shoutify") # Get the `shoutify` value reference in the global scope
if !fn:
return
let retval = runtime.call(&fn, str("tray")) # Call the `shoutify` function in bytecode. Pass it the arguments it expects.
echo "I AM SHOUTING YOUR NAME AT YOU, " & runtime.ToString(retval) # Take its return value and print it out.