Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement and test SCFG->AST conversion #127

Merged
merged 12 commits into from
Jun 9, 2024
Merged

implement and test SCFG->AST conversion #127

merged 12 commits into from
Jun 9, 2024

Conversation

esc
Copy link
Member

@esc esc commented May 29, 2024

This implements the creation of Python source code from a potentially restructured SCFG.

The main entry-point is:

from numba_rvsdg import SCFG2AST

And the round-trip test for the entry-points shows how to use the API:

class TestEntryPoints(TestCase):

    def test_rondtrip(self):

        def function() -> int:
            x = 0
            for i in range(2):
                x += i
            return x, i

        scfg = AST2SCFG(function)
        scfg.restructure()
        ast_ = SCFG2AST(function, scfg)

        exec_locals = {}
        exec(ast.unparse(ast_), {}, exec_locals)
        transformed = exec_locals["transformed_function"]
        assert function() == transformed()

Special attention was paid to the testing of the transform. For all the previously defined testing functions to test the AST -> SCFG direction, the tests were augmented to also test the SCFG -> AST direction. To test a transformed function, we assert behavioral equivalence by running the original and the transformed through a set of given arguments and ensure they always produce the same result. Additionally we ensure that all lines of the test function are covered using a custom sys.monitoring setup. (As a result the package now needs at least 3.12 for testing). This ensures that that the set of arguments covers the original function and also that the transformation back to Python doesn't create any dead code.

Special thanks to @stuartarchibald for the stater patch for the custom sys.monitoring based tracer.

Overall the iteration over the SCFG is still somewhat unprincipled, however, the tests and overall coverage do seem to suggest the approach works. Now that we can transform Python programs and have solid tests too, more thought ought to be invested into storing the SCFG in a non-recursive data-structure and developing a more elegant API to traverse the graph and it's regions depending on the use-case.

Typing and annotations are a nbit haphazard, there are multiple issues in the existing classes so some parts of this code just choose to use Any and # type: ignore pragmas.

This implements the creation of Python source code from a potentially
restructured SCFG.

The main entry-point is:

```
from numba_rvsdg import SCFG2AST
```

And the round-trip test for the entry-points shows how to use the API:

```
class TestEntryPoints(TestCase):

    def test_rondtrip(self):

        def function() -> int:
            x = 0
            for i in range(2):
                x += i
            return x, i

        scfg = AST2SCFG(function)
        scfg.restructure()
        ast_ = SCFG2AST(function, scfg)

        exec_locals = {}
        exec(ast.unparse(ast_), {}, exec_locals)
        transformed = exec_locals["transformed_function"]
        assert function() == transformed()
```

Special attention was paid to the testing of the transform. For all the
previously defined testing functions to test the AST -> SCFG direction,
the tests were augmented to also test the SCFG -> AST direction. To test
a transformed function, we assert behavioral equivalence by running the
original and the transformed through a set of given arguments and ensure
they always produce the same result. Additionally we ensure that all
lines of the test function are covered using a custom `sys.monitoring`
setup. (As a result the package now needs at least 3.12 for testing).
This ensures that that the set of arguments covers the original function
and also that the transformation back to Python doesn't create any dead
code.

Special thanks to @stuartarchibald for the stater patch for the custom
`sys.monitoring` based tracer.

Overall the iteration over the SCFG is still somewhat unprincipled,
however, the tests and overall coverage do seem to suggest the approach
works. Now that we can transform Python programs and have solid tests
too, more thought ought to be invested into storing the SCFG in a
non-recursive data-structure and developing a more elegant API to
traverse the graph and it's regions depending on the use-case.

Typing and annotations are a nbit haphazard, there are multiple issues
in the existing classes so some parts of this code just choose to use
`Any` and `# type: ignore` pragmas.
Comment on lines 810 to 814
if_break_node = ast.If(
test=if_beak_node_test,
body=[ast.Continue()],
orelse=[ast.Break()],
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a way to avoid the use of continue and break.
Instead of:

while True:
    cont = <loop_body>
    if cont:
        continue
    else:
        break

emit this:

cont = True
while cont:
    cont = <loopbody>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can try ea3ef44

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does that we now eliminate the use of break and continue altogether for all restructured programs?

@esc
Copy link
Member Author

esc commented May 31, 2024

@sklam one idea I had was to use https://peps.python.org/pep-0622/ instead of generating the if-cascade? It might make it a bit simpler. What do you think? Or is it better to code generate programs with the smallest subset of the available syntax?

@esc
Copy link
Member Author

esc commented May 31, 2024

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

@sklam
Copy link
Member

sklam commented May 31, 2024

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.

@esc
Copy link
Member Author

esc commented May 31, 2024

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.

That's a good idea, it would make it easy to then ban any input program that uses variables that begin with __scfg_, just as a precaution.

@esc
Copy link
Member Author

esc commented Jun 3, 2024

@sklam the other thought I had was to surround all the introduced variables using dunder __. The special variables like __loop_cont__ and __sentinel__ use this convention, so maybe it would be good to be consistent here?

Maybe even with a prefix like __scfg_<name>__, so that it's easy to find what names are introduced.

That's a good idea, it would make it easy to then ban any input program that uses variables that begin with __scfg_, just as a precaution.

Done in 40b9daa

Copy link
Member

@sklam sklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been testing this PR against my pyasir work---latest working commit sklam/pyasir@40bf0ed
I got to 2d loops working and emitting correct LLVM; e.g.

def sum1d(n: Int64) -> Int64:
    c = 0
    for i in range(n):
        for j in range(i):
            c += i + j
    return c

I was hoping to get to code like below but I'm still messing up variable dependency in the if-statement:

def sum1d(n: Int64) -> Int64:
    c = 0
    for i in range(n):
        for j in range(i):
            c += i * j
            if c > 100:
                break
    return c

As far as I can tell, output from SCFG2ASTTransformer for the example above is correct. All previously mentioned issues are resolved. This PR is good to merge!


For future work, I'll suggest allowing the user of SCFG2ASTTransformer specify how certain things are handled. Currently, I'm fixing in after the fact with custom ASTTransformers (see https://github.com/sklam/pyasir/blob/40bf0eda7dde93acacfbefc6c4ffe630d40dd153/test_scfg_frontend.py#L227-L233).

@esc
Copy link
Member Author

esc commented Jun 9, 2024

@sklam thank you for the review!

@esc esc merged commit f567eff into numba:main Jun 9, 2024
2 checks passed
@esc esc deleted the scfg->ast branch June 10, 2024 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants