implement and test SCFG->AST conversion #127

esc · 2024-05-29T09:34:51Z

This implements the creation of Python source code from a potentially restructured SCFG.

The main entry-point is:

from numba_rvsdg import SCFG2AST

And the round-trip test for the entry-points shows how to use the API:

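import ast
from unittest import TestCase

from numba_rvsdg import AST2SCFG, SCFG2AST

class TestEntryPoints(TestCase):

    def test_rondtrip(self):

        # Returns a tuple, so the annotation is tuple[int, int] rather than int.
        def function() -> tuple[int, int]: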
            x = 0
            for i in range(2):
                x += i
            return x, i

        scfg = AST2SCFG(function)
        scfg.restructure()
        ast_ = SCFG2AST(function, scfg)

        exec_locals = {}
        exec(ast.unparse(ast_), {}, exec_locals)
        transformed = exec_locals["transformed_function"]
        assert function() == transformed()

Special attention was paid to testing the transform. All test functions previously defined for the AST -> SCFG direction now also exercise the SCFG -> AST direction. To test a transformed function, we assert behavioral equivalence: the original and the transformed function are run on a given set of arguments and must always produce the same result. Additionally, a custom sys.monitoring setup checks that all lines of the test function are covered. (As a result, the package now needs at least Python 3.12 for testing.) This ensures that the set of arguments covers the original function and that the transformation back to Python doesn't create any dead code.

Special thanks to @stuartarchibald for the starter patch for the custom sys.monitoring based tracer.

Overall, the iteration over the SCFG is still somewhat unprincipled; however, the tests and overall coverage suggest that the approach works. Now that we can transform Python programs and have solid tests too, more thought ought to be invested into storing the SCFG in a non-recursive data structure and developing a more elegant API to traverse the graph and its regions depending on the use case.

Typing and annotations are a bit haphazard. There are multiple issues in the existing classes, so some parts of this code simply use Any and # type: ignore pragmas.



sklam · 2024-05-29T18:04:04Z

numba_rvsdg/core/datastructures/ast_transforms.py

+            if_break_node = ast.If(
+                test=if_beak_node_test,
+                body=[ast.Continue()],
+                orelse=[ast.Break()],
+            )


I found a way to avoid the use of continue and break.
Instead of:

while True:
    cont = <loop_body>
    if cont:
        continue
    else:
        break

emit this:

cont = True
while cont:
    cont = <loop_body>

You can try ea3ef44.

Does that mean we now eliminate the use of break and continue altogether for all restructured programs?
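As a small runnable illustration of the two loop shapes discussed in this thread (the function names and the countdown body are made up for the example and are not taken from the PR), both variants execute the body at least once and stop as soon as the body reports that it should not continue:

```python
def count_steps_with_break(n: int) -> int:
    """Loop shape that relies on continue/break."""
    steps = 0
    while True:
        # <loop_body>: do some work and decide whether to continue.
        steps += 1
        n -= 1
        cont = n > 0
        if cont:
            continue
        else:
            break
    return steps


def count_steps_with_flag(n: int) -> int:
    """Same loop, driven only by the continuation flag."""
    steps = 0
    cont = True
    while cont:
        # <loop_body>: identical work, the result feeds the loop test.
        steps += 1
        n -= 1
        cont = n > 0
    return steps


for n in range(-2, 6):
    assert count_steps_with_break(n) == count_steps_with_flag(n)
```

The flag-driven form needs no break or continue because the loop test itself consumes the value produced by the body.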


esc · 2024-05-31T06:05:03Z

@sklam one idea I had was to use structural pattern matching (https://peps.python.org/pep-0622/) instead of generating the if-cascade. It might make it a bit simpler. What do you think? Or is it better to generate programs using the smallest possible subset of the available syntax?


esc · 2024-05-31T06:56:25Z

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?


sklam · 2024-05-31T13:49:31Z

> @sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.

esc · 2024-05-31T14:17:35Z

> @sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?
>
> Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.

That's a good idea, it would make it easy to then ban any input program that uses variables that begin with __scfg_, just as a precaution.
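A minimal sketch of such a precaution, assuming the check is done on the source of the input program before conversion. The helper `find_reserved_names` and the constant `SCFG_PREFIX` are hypothetical and not part of numba_rvsdg:

```python
import ast

# Prefix reserved for variables introduced by the SCFG machinery.
SCFG_PREFIX = "__scfg_"


def find_reserved_names(source: str) -> list[str]:
    """Return all variable and argument names in source that use the reserved prefix."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id.startswith(SCFG_PREFIX):
            found.add(node.id)
        elif isinstance(node, ast.arg) and node.arg.startswith(SCFG_PREFIX):
            found.add(node.arg)
    return sorted(found)


clean = "def f(x):\n    return x + 1\n"
clashing = "def f(x):\n    __scfg_loop_cont__ = x\n    return __scfg_loop_cont__\n"

assert find_reserved_names(clean) == []
assert find_reserved_names(clashing) == ["__scfg_loop_cont__"]
```

A caller could reject the input program whenever the returned list is non-empty.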


esc · 2024-06-03T16:31:10Z

> @sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?
>
> Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.
>
> That's a good idea, it would make it easy to then ban any input program that uses variables that begin with __scfg_, just as a precaution.

Done in 40b9daa


sklam

I've been testing this PR against my pyasir work; the latest working commit is sklam/pyasir@40bf0ed.
I got 2D loops working and emitting correct LLVM, e.g.:

def sum1d(n: Int64) -> Int64:
    c = 0
    for i in range(n):
        for j in range(i):
            c += i + j
    return c

I was hoping to get to code like the example below, but I'm still messing up the variable dependency in the if-statement:

def sum1d(n: Int64) -> Int64:
    c = 0
    for i in range(n):
        for j in range(i):
            c += i * j
            if c > 100:
                break
    return c

As far as I can tell, the output from SCFG2ASTTransformer for the example above is correct. All previously mentioned issues are resolved. This PR is good to merge!

For future work, I'd suggest allowing the user of SCFG2ASTTransformer to specify how certain things are handled. Currently, I'm fixing things up after the fact with custom ASTTransformers (see https://github.com/sklam/pyasir/blob/40bf0eda7dde93acacfbefc6c4ffe630d40dd153/test_scfg_frontend.py#L227-L233).
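For readers unfamiliar with this kind of post-processing, here is a generic sketch of an after-the-fact fix-up using the standard library's ast.NodeTransformer. It does not reproduce the linked pyasir code; the class `RenameScfgNames` and the renaming rule are invented purely for illustration.

```python
import ast


class RenameScfgNames(ast.NodeTransformer):
    """Rewrite names such as __scfg_loop_cont__ to tmp_loop_cont (illustrative rule)."""

    def __init__(self, prefix: str = "__scfg_", replacement: str = "tmp_"):
        self.prefix = prefix
        self.replacement = replacement

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id.startswith(self.prefix):
            # Drop the reserved prefix and the trailing underscores.
            stem = node.id[len(self.prefix):].rstrip("_")
            new_node = ast.Name(id=self.replacement + stem, ctx=node.ctx)
            return ast.copy_location(new_node, node)
        return node


source = (
    "__scfg_loop_cont__ = True\n"
    "while __scfg_loop_cont__:\n"
    "    __scfg_loop_cont__ = False\n"
)
tree = RenameScfgNames().visit(ast.parse(source))
rewritten = ast.unparse(ast.fix_missing_locations(tree))

assert "__scfg_" not in rewritten
assert "tmp_loop_cont" in rewritten
```

A hook on the transformer itself, as suggested above, would let callers make such decisions during code generation rather than in a second pass over the emitted AST.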

esc · 2024-06-09T08:09:12Z

@sklam thank you for the review!

esc mentioned this pull request May 29, 2024

Write demonstrator of Python bytecode → Python AST #81

Closed

sklam reviewed May 29, 2024


esc added 2 commits May 29, 2024 20:35

avoid use of break and continue

ea3ef44

As title

update docs

6b31686

As title

esc added 2 commits May 31, 2024 08:51

fix running functions during test, increase coverage

f971cc0

As title

make sure the tracers picked something up

24b57b0

As title

esc added 2 commits May 31, 2024 08:58

fix coverage for test functions, run all branches

ab0b337

As title

secure codegen by not returning a default value

57ca469

As title

sklam reviewed May 31, 2024


esc added 3 commits June 1, 2024 15:42

fix codegen_view to return non-nested AST

3da4ece

As title

cleanup codegen_view

682aff7

As title

prefix all SCFG variables with __scfg_

40b9daa

As title

esc added 2 commits June 3, 2024 18:33

fix typo in docs

c520a29

As title

black the code

74b5a05

As title

sklam approved these changes Jun 7, 2024


esc merged commit f567eff into numba:main Jun 9, 2024
2 checks passed

esc deleted the scfg->ast branch June 10, 2024 13:58
