Skip to content
Matt Lott edited this page Jul 19, 2014 · 2 revisions

Filbert Architecture

Filbert takes as input a Python code string, and outputs a JavaScript abstract syntax tree. As you might guess, the interesting part is handling Python language features that do not exist in JavaScript.

The demo page helps visualize the process.

filbert.js

This is the main Python parser. Its parse() method will return a Mozilla Parser API compliant AST, given valid Python code.

filbert_loose.js

This is a very forgiving Python parser. Its parse_dammit() method will try to return a valid AST for both valid and invalid Python code. It is a best-effort, and by no means foolproof.

As shown in the diagram above, the non-parser components are borrowed from filbert.js.

Tokenizing

Tokenizing is the process of reading in text and identifying a piece of language grammar, which results in intermediary tokens that are used to construct a parse tree. For example, the string 'def' translates to the Python keyword def. After 'def' has been tokenized, we don't think of it as a string anymore, but an atomic keyword to be used in constructing a parse tree. Its _def in filbert.js.

Parsing

Filbert is a top-down parser. It processes tokens from start to end, in a top-down recursive fashion. It reads in a token (e.g. _def), and decides what it's parsing (e.g. a function definition). If it's unclear from the current token, it reads another and tries again. And so on.

The main loop is basically a parseStatement() call over and over again, which is done in parseTopLevel(). There are a bunch of additional parse* functions that may be called depending on the token in hand. E.g. parseClass().

Runtime Library

The runtime library is the glue that allows us to make JavaScript act like Python.

Important reminder: the runtime library is used by the output JavaScript, at runtime.

During the parsing phase, we create AST nodes that depend on the runtime library being available. However, it is up to the Filbert client to make sure the library can be accessed. For example, if the input Python code contains a call to range(), the parser will happily convert that to a JavaScript call __pythonRuntime.functions.range(10);. However, it won't run unless the runtime library containing the JavaScript implementation of range() is available.

Built-in Functions

Provides JavaScript implementation of Python built-in functions. Python calls to these functions are translated into JavaScript calls to the library equivalents.

Objects

Provides JavaScript implementations for Python objects like lists, dictionaries, tuples, etc. Python usage of these objects are converted to their JavaScript library equivalents.

Operators

Some operators have to be overloaded at runtime in order to provide Python functionality. For example, we override addition.

Question: Why do we override addition, but not subtraction?

Example

Input Python code

def add(a, b):
  return a + b
num = add(7, 88)
print(num)

Output AST

{
  "type": "Program",
  "body": [
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "add"
      },
      "params": [
        {
          "type": "Identifier",
          "name": "a"
        },
        {
          "type": "Identifier",
          "name": "b"
        }
      ],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": {
              "type": "CallExpression",
              "arguments": [
                {
                  "type": "Identifier",
                  "name": "a"
                },
                {
                  "type": "Identifier",
                  "name": "b"
                }
              ],
              "callee": {
                "type": "MemberExpression",
                "object": {
                  "type": "MemberExpression",
                  "object": {
                    "type": "Identifier",
                    "name": "__pythonRuntime"
                  },
                  "property": {
                    "type": "Identifier",
                    "name": "ops"
                  },
                  "computed": false
                },
                "property": {
                  "type": "Identifier",
                  "name": "add"
                },
                "computed": false
              }
            }
          }
        ]
      }
    },
    {
      "type": "VariableDeclaration",
      "kind": "var",
      "declarations": [
        {
          "type": "VariableDeclarator",
          "id": {
            "type": "Identifier",
            "name": "num"
          },
          "init": {
            "type": "CallExpression",
            "arguments": [
              {
                "type": "Literal",
                "value": 7,
                "raw": "7"
              },
              {
                "type": "Literal",
                "value": 88,
                "raw": "88"
              }
            ],
            "callee": {
              "type": "Identifier",
              "name": "add"
            }
          }
        }
      ]
    },
    {
      "type": "ExpressionStatement",
      "expression": {
        "type": "CallExpression",
        "arguments": [
          {
            "type": "Identifier",
            "name": "num"
          }
        ],
        "callee": {
          "type": "MemberExpression",
          "object": {
            "type": "MemberExpression",
            "object": {
              "type": "Identifier",
              "name": "__pythonRuntime"
            },
            "property": {
              "type": "Identifier",
              "name": "functions"
            },
            "computed": false
          },
          "property": {
            "type": "Identifier",
            "name": "print"
          },
          "computed": false
        }
      }
    }
  ]
}

JavaScript generated from AST by escodegen:

function add(a, b) {
    return __pythonRuntime.ops.add(a, b);
}
var num = add(7, 88);
__pythonRuntime.functions.print(num);

Cool, right? :)