-
Notifications
You must be signed in to change notification settings - Fork 27
Architecture
Filbert takes as input a Python code string, and outputs a JavaScript abstract syntax tree. As you might guess, the interesting part is handling Python language features that do not exist in JavaScript.
The demo page helps visualize the process.
This is the main Python parser. Its parse()
method will return a Mozilla Parser API compliant AST, given valid Python code.
This is a very forgiving Python parser. Its parse_dammit()
method will try to return a valid AST for both valid and invalid Python code. It is a best-effort, and by no means foolproof.
As shown in the diagram above, the non-parser components are borrowed from filbert.js.
Tokenizing is the process of reading in text and identifying a piece of language grammar, which results in intermediary tokens that are used to construct a parse tree. For example, the string 'def' translates to the Python keyword def. After 'def' has been tokenized, we don't think of it as a string anymore, but an atomic keyword to be used in constructing a parse tree. Its _def
in filbert.js.
Filbert is a top-down parser. It processes tokens from start to end, in a top-down recursive fashion. It reads in a token (e.g. _def
), and decides what it's parsing (e.g. a function definition). If it's unclear from the current token, it reads another and tries again. And so on.
The main loop is basically a parseStatement()
call over and over again, which is done in parseTopLevel()
. There are a bunch of additional parse* functions that may be called depending on the token in hand. E.g. parseClass()
.
The runtime library is the glue that allows us to make JavaScript act like Python.
Important reminder: the runtime library is used by the output JavaScript, at runtime.
During the parsing phase, we create AST nodes that depend on the runtime library being available. However, it is up to the Filbert client to make sure the library can be accessed. For example, if the input Python code contains a call to range()
, the parser will happily convert that to a JavaScript call __pythonRuntime.functions.range(10);
. However, it won't run unless the runtime library containing the JavaScript implementation of range()
is available.
Provides JavaScript implementation of Python built-in functions. Python calls to these functions are translated into JavaScript calls to the library equivalents.
Provides JavaScript implementations for Python objects like lists, dictionaries, tuples, etc. Python usage of these objects are converted to their JavaScript library equivalents.
Some operators have to be overloaded at runtime in order to provide Python functionality. For example, we override addition.
Question: Why do we override addition, but not subtraction?
Input Python code
def add(a, b):
return a + b
num = add(7, 88)
print(num)
Output AST
{
"type": "Program",
"body": [
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "add"
},
"params": [
{
"type": "Identifier",
"name": "a"
},
{
"type": "Identifier",
"name": "b"
}
],
"body": {
"type": "BlockStatement",
"body": [
{
"type": "ReturnStatement",
"argument": {
"type": "CallExpression",
"arguments": [
{
"type": "Identifier",
"name": "a"
},
{
"type": "Identifier",
"name": "b"
}
],
"callee": {
"type": "MemberExpression",
"object": {
"type": "MemberExpression",
"object": {
"type": "Identifier",
"name": "__pythonRuntime"
},
"property": {
"type": "Identifier",
"name": "ops"
},
"computed": false
},
"property": {
"type": "Identifier",
"name": "add"
},
"computed": false
}
}
}
]
}
},
{
"type": "VariableDeclaration",
"kind": "var",
"declarations": [
{
"type": "VariableDeclarator",
"id": {
"type": "Identifier",
"name": "num"
},
"init": {
"type": "CallExpression",
"arguments": [
{
"type": "Literal",
"value": 7,
"raw": "7"
},
{
"type": "Literal",
"value": 88,
"raw": "88"
}
],
"callee": {
"type": "Identifier",
"name": "add"
}
}
}
]
},
{
"type": "ExpressionStatement",
"expression": {
"type": "CallExpression",
"arguments": [
{
"type": "Identifier",
"name": "num"
}
],
"callee": {
"type": "MemberExpression",
"object": {
"type": "MemberExpression",
"object": {
"type": "Identifier",
"name": "__pythonRuntime"
},
"property": {
"type": "Identifier",
"name": "functions"
},
"computed": false
},
"property": {
"type": "Identifier",
"name": "print"
},
"computed": false
}
}
}
]
}
JavaScript generated from AST by escodegen:
function add(a, b) {
return __pythonRuntime.ops.add(a, b);
}
var num = add(7, 88);
__pythonRuntime.functions.print(num);
Cool, right? :)