Macros without augmentations #4115

Open
lrhn opened this issue Oct 1, 2024 · 6 comments
Labels
feature Proposed language feature that solves one or more problems


@lrhn
Member

lrhn commented Oct 1, 2024

I'm not suggesting we don't add augmentations to the language. I just found this approach interesting enough that I want it documented, in case we ever reconsider. @mit-mit mentioned this approach as an alternative to augmentations, and it has some nice properties and some limitations, so I wanted to think it through and write it down. So, this.

TL;DR: Instead of adding augmentations to the language, we just let macros rewrite source by doing the same additions and code wrapping/replacement that augmentations would allow, and then let the result of that be the source for that URI that the compiler sees. By keeping it all inside the macro execution, the language doesn't have to care about ordering or having two versions of every declaration. But then it cannot be used for anything else, and in practice we still have to specify what makes a program "valid" before macro execution, so the saving might not be that big.

Macros without augmentations

Augmentations are mainly introduced to facilitate macros. They do have other uses (patch files and code generation).

If macros were the only reason for augmentations, and we did not expect users or code generators to write augmentations, we could consider not making augmentations a language feature, and instead making them a macro feature which rewrites the Dart source.

Macro rewriting

Augmentations can be used to add declarations, to add features to existing declarations, and to wrap code (function bodies and initializer expressions).

A rewriting functionality would allow macros to add declarations to libraries and class-like scopes, to add features to existing declarations, and to replace code, given access to an opaque representation of the existing code, which can be inserted into the replacing code if needed.

This corresponds to every place where an augmentation can add to or wrap the augmented definition, so the expressive power is the same, but rather than generating layers of augmentations, the modifying operation acts directly on the existing program, and the end result is a new program.

That is, pre-macro source code is parsed and used as input to macro implementations. Macro implementations can introspect on the existing source code, and can then add to the program or replace code (bodies and initializers), where each modification updates the program that is available to further introspection and modification.

A macro implementation can then:

  • Add imports and exports.
  • Maybe add parts (then it should also be able to generate the entire part file content). This only makes sense with enhanced parts, and even then it might not.
  • Add new top-level declarations.
  • Modify existing top-level declarations.
  • If modifying a class-like declaration:
      • Add new member declarations to class-like declarations.
      • Modify existing member declarations of class-like declarations.

The modifications are the same as those we would allow using augmentations: adding metadata and documentation, adding more super-interfaces or mixin applications where allowed, or even a missing superclass for class declarations, adding initializer list entries to constructors, adding enum values to enums, etc.
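A minimal sketch of the "each modification updates the program" model described above. Everything here is hypothetical: ClassModel, Macro, addId, addDescribe and expand are illustration names, not part of any real macro API, and a class is modelled as nothing more than a name plus member source strings.

```dart
/// Hypothetical, heavily simplified model of one class in the program.
class ClassModel {
  ClassModel(this.name);

  final String name;

  /// Member name -> member source text, kept in insertion order.
  final Map<String, String> members = {};

  /// Renders the current state of the class as Dart source text.
  String toSource() {
    final buffer = StringBuffer('class $name {\n');
    for (final source in members.values) {
      buffer.writeln('  $source');
    }
    buffer.write('}');
    return buffer.toString();
  }
}

/// A "macro" is modelled as a function that mutates the shared program.
typedef Macro = void Function(ClassModel target);

/// Adds an `id` field holding the class name.
void addId(ClassModel c) {
  c.members['id'] = 'static final String id = "${c.name}";';
}

/// Introspects the members present so far, including macro-added ones.
void addDescribe(ClassModel c) {
  final names = c.members.keys.join(', ');
  c.members['describe'] = 'String describe() => "${c.name}($names)";';
}

/// Runs the macros one after another on the same mutable model.
String expand(ClassModel c, List<Macro> macros) {
  for (final macro in macros) {
    macro(c); // Each macro sees the edits made by all earlier macros.
  }
  return c.toSource();
}

void main() {
  print(expand(ClassModel('Foo'), [addId, addDescribe]));
}
```

Swapping the two macros changes the output, because addDescribe would then introspect a class that has no members yet. That is the sense in which execution order is built into the final result.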

The biggest difference between this and augmentations is that there is only ever one parameter list for a function. It’s not possible to call augmented(...) with different arguments. However, it is possible to change the values of the parameter variables before inserting the existing body in the new body, which should give the same effect, or introduce new variables with the same names. (But it should be possible to change a final parameter to a var parameter to make it assignable. It’s up to the macro code to not break promotion by modifying the variable in a closure.)

Having only one parameter list per function may also reduce the complexity of compilation.
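As an illustration of wrapping a body without a second parameter list, here is a sketch of what a macro-expanded function could look like. The function half and the clamping behaviour are invented for the example. The original declaration appears only in a comment, and the expanded form is ordinary Dart.

```dart
// Original, pre-macro source as the user wrote it (shown for comparison):
//
//   int half(final int x) => x ~/ 2;
//
// Hypothetical macro-expanded result. The macro has dropped `final` from the
// parameter so that the code it inserts can assign to it.
int half(int x) {
  // Inserted by the (hypothetical) macro: adjust the argument first.
  if (x < 0) x = 0;
  // The original body, inserted unchanged after the new prelude.
  return x ~/ 2;
}

void main() {
  print(half(10)); // 5
  print(half(-4)); // 0
}
```

With augmentations, the same effect would come from calling the augmented function with a different argument. Here the one parameter variable is updated before the original body runs.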

Benefits

It’s simpler for the language. There is no need to specify augmentation declarations, which are a new kind of declaration for each kind of existing declaration, with slightly different syntax requirements than the non-augmenting declaration, and no need for the language to give a semantics to source code containing augmenting declarations. It can use the current language semantics, and macros are just a built-in source transformation performed before we assign semantics to the program.

Augmentations also allow declarations to omit parts that are required for a full declaration. That makes it easier to mistakenly allow an incomplete program, because the grammar must allow incomplete programs. (But we’ll be using the same parser for the pre-macro source code, which is also an incomplete program, so it may not make much difference whether we use augmentations or not for that.)

Because the program augmentation is done imperatively and sequentially, the language doesn’t have to worry about augmentation ordering, and therefore part order is also not semantically important. All code for a declaration is in the same file. Macro execution order still matters, but that’s handled during macro execution: the final result has that order built in, and the language doesn’t have to care about which declaration came from where.

No new files are added. Instead, the effect of a macro is to change the source code associated with a URI in the compilation process. Rather than using the file on disk as the code being compiled, the macro transformation of that file content is the code that is compiled. Any new code is added into the same file, inside the same class declaration, not in a distinct file with different imports and different scoping. (If a macro needs more imports, it adds them to the file where it uses them. If it needs to avoid conflicts, it should import with a fresh prefix.)

Parallel macro execution

Running macros in parallel becomes technically harder, since they are now modifying the same mutable state. There will have to be an API for working with a shared data structure, which may cause errors if two parallel macros both try to add something of which there can be only one, like adding a superclass to a class. At least unless it’s the same superclass.

That would also be a problem if both macros create augmenting class declarations with a declared superclass, so it’s not new. If anything, we can more easily make “add superclass if possible” an atomic operation since we know that we are working in parallel on a shared data structure. Maybe even have transactions on the data structure, where you add a superclass and a super constructor invocation on every real constructor, and only succeed if every operation works. With an approach that just creates an augmentation to do that, it’s impossible to know whether another augmentation will be added before yours and invalidate the state you were working from.
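A rough sketch of the transaction idea, under the assumption that the macro server serializes commits against the shared model. ClassModel, Transaction and their methods are hypothetical names made up for this example. A transaction queues checks and writes, and commit applies the writes only if every check passes.

```dart
/// Hypothetical shared model of a class being modified by macros.
class ClassModel {
  ClassModel(Iterable<String> constructors)
      : superCalls = {for (final c in constructors) c: null};

  String? superclass;

  /// Constructor name -> super constructor invocation (null if none yet).
  final Map<String, String?> superCalls;
}

/// Collects edits and applies them all-or-nothing.
class Transaction {
  Transaction(this._target);

  final ClassModel _target;
  final List<bool Function()> _checks = [];
  final List<void Function()> _writes = [];

  /// Succeeds if no superclass is set, or the same one is already set.
  void setSuperclass(String name) {
    _checks.add(
        () => _target.superclass == null || _target.superclass == name);
    _writes.add(() => _target.superclass = name);
  }

  /// Succeeds if the constructor exists and has no conflicting super call.
  void setSuperCall(String constructor, String call) {
    _checks.add(() {
      if (!_target.superCalls.containsKey(constructor)) return false;
      final existing = _target.superCalls[constructor];
      return existing == null || existing == call;
    });
    _writes.add(() => _target.superCalls[constructor] = call);
  }

  /// Applies every queued write only if every queued check passes.
  bool commit() {
    if (!_checks.every((check) => check())) return false;
    for (final write in _writes) {
      write();
    }
    return true;
  }
}

void main() {
  final foo = ClassModel(['', 'named']);
  final first = Transaction(foo)
    ..setSuperclass('SuperFoo')
    ..setSuperCall('', 'super()')
    ..setSuperCall('named', 'super()');
  print(first.commit()); // true
  final second = Transaction(foo)..setSuperclass('Other');
  print(second.commit()); // false
  print(foo.superclass); // SuperFoo
}
```

The second macro learns immediately that its edit was rejected and can fall back to something else.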

Issues

Macro only

A macro-only feature doesn’t allow code generators to use the same feature to augment code. They are stuck at the “generate a subclass and have the constructor create that” trick.

Most code generators could probably be made into macros, but there can be reasons for wanting the code to be generated once and for all, not on every compilation, and code generation may be able to do things macros do not allow, like running arbitrary code or making network requests.

A macro-only feature is not a direct replacement for patching of platform libraries. That’s an “us problem”, not something we should design user-facing features around. We can make “patch macros” that modify the platform libraries by adding private members and adding bodies, if we want to use the feature (and removing the external from declarations, which may then need to be a capability).

No new scopes, import or parameter

A macro can insert imports into an existing file, but it cannot create its own new import scope, as it would if it added a new part file.

That means that inserted code exists in the same scope as the surrounding class.

It’s not a strong expressiveness issue, as long as macros can be guaranteed one or more fresh names that they can use as import prefixes. A macro can import anything it needs to import with that prefix, or if that would conflict, some of the imports with other fresh prefixes. If the prefix is provided only symbolically, so the macro implementation can’t see the name, then the macro server can merge and remove prefixes, maybe add show and hide to the imports, when it knows which names are used, to make the output prettier. (It shouldn’t be necessary to do anything unless a macro-added import is completely unused, then it should be removed to avoid the dependency.)
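To make the fresh-prefix idea concrete, here is a sketch of what a macro-expanded file could look like. The prefix $m0, the Point class and the toJson member are invented for the example. The import and the member stand in for macro-added code, and the top-level jsonEncode stands in for a user declaration that would clash with an unprefixed import of dart:convert.

```dart
// Added by the (hypothetical) macro under a fresh prefix, so nothing it
// imports can conflict with names the user already has in scope.
import 'dart:convert' as $m0;

// User-written declaration whose name also exists in dart:convert.
String jsonEncode(Object? value) => 'user-defined: $value';

class Point {
  Point(this.x, this.y);

  final int x, y;

  // Macro-added member: it reaches its dependency through the fresh prefix.
  String toJson() => $m0.jsonEncode({'x': x, 'y': y});
}

void main() {
  print(Point(1, 2).toJson()); // {"x":1,"y":2}
  print(jsonEncode(1)); // user-defined: 1
}
```

The inserted member shares the class scope with user code, and the prefix is the only thing that keeps its references unambiguous.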

No separate source

There is no separate source file with only the macro code, and the original source code is not the actual source code that is compiled.

If we want to show the generated code to the user, say if they do “Go to definition” on a reference to a macro-generated member in their original source file, then we will have to show them the entire generated source file.

That is, we can show the alternative source code for the current source file (or another source file, if the modified declaration is in a different part of the same library), like a show-source:package:foo/foo.dart view. That file contains the entire modified version of that source file, with all macro additions and replacements, and users can look through it.

If we can show the generated declaration itself in a popup on hover, users may rarely have to go to the source, but they should still be able to.

The macro-expanded file is also shown during debugging, since it contains the actual code that was compiled and run. The file doesn’t preserve line numbers or position in a line, so the macro-expanded source should have a source-map pointing back to the original source. That should allow source-map aware debuggers to show the original code in the original file when that is running, while showing generated code in the expanded file when that is running.
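A toy sketch of the line-mapping part of such a source map, assuming a macro only inserts whole lines. Real source maps also track columns and are stored in a standard format. ExpandedLine, insertGenerated and locate are hypothetical names for this example.

```dart
/// One line of the macro-expanded file plus where it came from.
class ExpandedLine {
  const ExpandedLine(this.text, this.originalLine);

  final String text;

  /// 1-based line in the original file, or null for macro-generated lines.
  final int? originalLine;
}

/// Builds the expanded file by inserting [generated] lines after the
/// 1-based original line [after], remembering each line's origin.
List<ExpandedLine> insertGenerated(
    List<String> original, int after, List<String> generated) {
  final result = <ExpandedLine>[];
  for (var i = 0; i < original.length; i++) {
    result.add(ExpandedLine(original[i], i + 1));
    if (i + 1 == after) {
      for (final line in generated) {
        result.add(ExpandedLine(line, null));
      }
    }
  }
  return result;
}

/// Describes where a debugger should point for a 1-based expanded line.
String locate(List<ExpandedLine> file, int expandedLine) {
  final origin = file[expandedLine - 1].originalLine;
  return origin == null
      ? 'generated code (expanded line $expandedLine)'
      : 'original line $origin';
}

void main() {
  final file = insertGenerated(
    ['class Foo {', '  int x = 0;', '}'],
    2,
    ['  int get doubled => x * 2;'],
  );
  print(locate(file, 3)); // generated code (expanded line 3)
  print(locate(file, 4)); // original line 3
}
```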

(We can even choose to save the generated file next to the original with a name like original_file.dart.macro, or in a .macro/ subdirectory of the same directory, or even put all macro generated files in lib/.macro/. And put *.macro in .gitignore. Then tools that don’t want to run the compiler can still check for a corresponding file in the expected place, and run, or ask the user to run, the compiler only if it finds nothing. The macro files will have all relative and package-local URIs updated to point to macro expanded files if necessary, or to the original file if it’s not modified. If we do that, any file that isn’t modified itself will still be modified if it refers to a file that was modified, since its imports change. That saved file is still a representation of the macro-expanded program, it is not the program itself. The internal macro-expanded program uses the new source as the source code for the original URI, so libraries preserve their URIs. I don’t know if it’s possible to have a source map for a URI that points to a source file with the same URI, but maybe we can point to a file instead of a URI.)

The macro code generation can even choose to add comments around all macro generated code, like:

@foo
@More("bananas")
class Foo /*@foo{*/ extends SuperFoo /*}*/ 
   implements Baz /*@More(_){*/, MacroMore /*}*/ {
  static final String id /*@foo{*/ = "Foo" /*}*/;
}

Maybe not great for quickly scanning the code, but could be good for deeper readability because it makes it very clear when reading the code which parts were not in the original, and where they come from.
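One thing the markers would allow is recovering the user-written text, or listing what each macro added, with plain text processing. The sketch below assumes the exact marker shape used in the example above (an opening comment made of @, the macro name and an opening brace, closed by a comment holding a lone closing brace). It also assumes markers are never nested and that generated code never contains the closing marker. The function names are made up for the example.

```dart
/// Matches one marked region: opening marker, generated text, closing marker.
final _marked = RegExp(r'/\*(@[^{]*)\{\*/(.*?)/\*\}\*/', dotAll: true);

/// Removes every marked region, leaving only the user-written text.
String stripGenerated(String expanded) => expanded.replaceAll(_marked, '');

/// Lists "macro: generated text" for every marked region, in source order.
List<String> generatedRegions(String expanded) => [
      for (final match in _marked.allMatches(expanded))
        '${match.group(1)}: ${match.group(2)!.trim()}',
    ];

void main() {
  const expanded = 'class Foo /*@foo{*/ extends SuperFoo /*}*/ {}';
  print(stripGenerated(expanded));
  print(generatedRegions(expanded));
}
```

Stripping leaves the whitespace around each region in place, so the result would still need formatting to match the original file exactly.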

Since the macro-expanded source is a new program, and doesn’t have to preserve existing source lines, we should probably format it before showing it to users. (But if we don’t format it before compiling, error reports won’t be in the same place as in the formatted source, and we might not want to format everything as part of compiling, it could be unnecessarily expensive.)

Conclusion

I think this is a possible alternative to adding augmentations to the language. It will still support macros, but nothing else, because it is effectively an in-compiler program transformation, from an incomplete (and therefore invalid) Dart program (aka. not a Dart program) to a hopefully correct and complete Dart program. It's like a "file loader" that does macro expanding while loading, before the compiler ever sees the source. (Other than the macro process also using the compiler for introspection, but nothing requires it to, it's just more efficient.)

@lrhn lrhn added the feature Proposed language feature that solves one or more problems label Oct 1, 2024
@jakemac53
Contributor

This is in effect what pub transformers did. The modification of existing files was a big complaint in that system due to the debugging issues etc. I do not believe it is a better system (but maybe it would be easier to implement/specify).

@lrhn
Member Author

lrhn commented Oct 1, 2024

The biggest difference would be that pub transformers would have to save the modified files to disk, and thereby give them another URI, while modifying directly in-memory during compilation wouldn't need to do that. There is still only one source, and one program, it's just not precisely the same thing.

But yes, it can probably easily become confusing, and we'd have to work hard to present things in a way that minimizes that confusion.

@mit-mit
Member

mit-mit commented Oct 1, 2024

The strawman for the "Show what the macro produced" IDE experience I had in mind when we brainstormed this was that we'd show the full code after the macro had expanded, with its additions somehow highlighted (e.g. via a diff, or colorization).

But yes, it's quite plausible that the saved implementation cost from not doing augmentations would just turn into a similar (or higher) cost on the IDE experience side.

@rrousselGit

I was about to mention diff/colorization & stuff, but you were faster than me :p
Moving the complexity of visualizing generated code to the IDE instead of the language sounds useful though.

Making generated code nicely readable is IMO less important than reducing the complexity of macros from the Language PoV.
I still think we're placing too much importance on the readability of generated code.

@rrousselGit

rrousselGit commented Oct 1, 2024

This makes me think: I wonder if we could extract from IDEs some useful metrics with regards to how often generated code is looked at.

Maybe VS Code could periodically send "time spent on foo.dart" vs "time spent on foo.part.dart" when some *.*.dart files are detected in a project?

@jakemac53
Contributor

The biggest difference would be that pub transformers would have to save the modified files to disk, and thereby give them another URI, while modifying directly in-memory during compilation wouldn't need to do that. There is still only one source, and one program, it's just not precisely the same thing.

Transformers were purely in memory as well. It is exactly the same thing except there were zero limitations put on transformers, and the original file had to live within the current grammar rules.

What you end up with is line numbers that don't match up, which causes issues when looking at stack traces and also makes debugging weird.

I am extremely opposed to overwriting files with different contents that do not match what is on disk, it is a nightmare.
