
Research on Code Formatters Across Programming Languages #103

Open
3 of 6 tasks
Enter-tainer opened this issue Jul 7, 2024 · 3 comments

Comments

Enter-tainer (Owner) commented Jul 7, 2024

Research code formatting tools for various mainstream programming languages. Goals include, but are not limited to:

  1. Understand the working principles of each tool
  2. Compare features, pros, and cons of different formatters
  3. Learn from their designs and implementations
  4. Examine how they handle special cases and edge conditions

After completing the research, summarize the findings and consider how to apply them to typstyle.

Enter-tainer (Owner, Author) commented Jul 7, 2024

Ruff

Ruff has a ruff_formatter crate, which is forked from rome_formatter, so Ruff seems heavily inspired by Rome. It has a formatting IR similar to pretty.rs.

At first glance, Ruff's design is similar to typstyle's: parse the source code into an AST, transform the AST into formatter IR, then print the IR into a string. Notably, it has special handling for comments, such as finding the source range of every comment. I haven't figured out how that is used yet.
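As a rough illustration of the parse, lower, print pipeline, here is a minimal Rust sketch. It is not Ruff's or typstyle's code: the toy AST, the Doc enum and the function names are hypothetical, parsing is skipped by building the AST by hand, and the IR has no groups, so it never has to choose between flat and broken layouts.

```rust
// Stage 1 (parsing) is skipped: the hypothetical AST below is built by hand.
// It models a block of `let name = value;` statements.
enum Stmt {
    Let { name: String, value: String },
    Block(Vec<Stmt>),
}

// Hypothetical formatting IR: plain text, forced newlines, and indentation.
enum Doc {
    Text(String),
    HardLine,
    Indent(Vec<Doc>),
}

// Stage 2: lower the AST into IR. Layout is described here; nothing is printed yet.
fn lower(stmt: &Stmt) -> Vec<Doc> {
    match stmt {
        Stmt::Let { name, value } => vec![Doc::Text(format!("let {name} = {value};"))],
        Stmt::Block(body) => {
            let mut inner = Vec::new();
            for s in body {
                inner.push(Doc::HardLine);
                inner.extend(lower(s));
            }
            vec![
                Doc::Text("{".to_string()),
                Doc::Indent(inner),
                Doc::HardLine,
                Doc::Text("}".to_string()),
            ]
        }
    }
}

// Stage 3: print the IR into a string, two spaces per indentation level.
fn print_doc(docs: &[Doc], level: usize, out: &mut String) {
    for doc in docs {
        match doc {
            Doc::Text(s) => out.push_str(s),
            Doc::HardLine => {
                out.push('\n');
                out.push_str(&"  ".repeat(level));
            }
            Doc::Indent(inner) => print_doc(inner, level + 1, out),
        }
    }
}

// Run stages 2 and 3 on an already-parsed AST.
fn format_stmt(stmt: &Stmt) -> String {
    let mut out = String::new();
    print_doc(&lower(stmt), 0, &mut out);
    out
}
```

Keeping lowering and printing separate means the AST-specific code never deals with column counting; only the printer does.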

Ruff categorizes comments into three types: leading, dangling, and trailing. Python has no inline comments (such as Typst's /* */ block comments), so Ruff's job is easier than typstyle's.
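A simplified sketch of how leading/trailing/dangling placement could be decided. This is a hypothetical rule set, not Ruff's actual algorithm (which covers many more special cases): children are given as byte ranges, and own_line says whether the comment starts its own line. Because a Python comment always runs to the end of its line, this own-line vs end-of-line signal goes a long way; a Typst /* */ comment can sit in the middle of a line, which is where typstyle's extra difficulty comes from.

```rust
/// Where a comment ends up relative to the children of its enclosing node.
#[derive(Debug, PartialEq)]
enum Placement {
    Leading(usize),  // printed before the child at this index
    Trailing(usize), // printed after the child at this index
    Dangling,        // no child to attach to (e.g. an empty list): kept on the enclosing node
}

/// `children` are the byte ranges `(start, end)` of the enclosing node's children,
/// sorted by position. `own_line` is true when only whitespace precedes the
/// comment on its line. (Hypothetical rules, for illustration only.)
fn place_comment(children: &[(usize, usize)], comment_start: usize, own_line: bool) -> Placement {
    // Last child ending before the comment, first child starting after it.
    let preceding = children.iter().rposition(|&(_, end)| end <= comment_start);
    let following = children.iter().position(|&(start, _)| start > comment_start);
    match (preceding, following) {
        // An end-of-line comment sticks to the node before it: `a,  # note`.
        (Some(p), _) if !own_line => Placement::Trailing(p),
        // An own-line comment belongs to the node after it.
        (_, Some(f)) => Placement::Leading(f),
        // Nothing follows: fall back to the previous node.
        (Some(p), None) => Placement::Trailing(p),
        // No children at all, e.g. a comment inside an empty list.
        (None, None) => Placement::Dangling,
    }
}
```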

It also has a contribution.md describing the challenges of formatting comments.

For each AST node struct, Ruff seems to have a corresponding format struct. ast_node.format() turns an AST node into its format struct, and all these format structs implement a Format trait, which defines how each of them produces formatter IR.
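The one-format-struct-per-node pattern can be sketched as follows. Everything here is hypothetical and simplified: the node types, the FormatIdent/FormatCall names, and a plain String standing in for the IR buffer. Ruff's real types carry considerably more machinery.

```rust
// Hypothetical AST node structs.
struct Ident {
    name: String,
}
struct Call {
    callee: Ident,
    args: Vec<Ident>,
}

// Every format struct implements this trait. A real formatter would push IR
// elements; a String stands in for the IR buffer to keep the sketch short.
trait Format {
    fn fmt(&self, buf: &mut String);
}

// One format struct per AST node struct; each just borrows its node.
struct FormatIdent<'a>(&'a Ident);
struct FormatCall<'a>(&'a Call);

// `node.format()` maps an AST node to its format struct.
impl Ident {
    fn format(&self) -> FormatIdent<'_> {
        FormatIdent(self)
    }
}
impl Call {
    fn format(&self) -> FormatCall<'_> {
        FormatCall(self)
    }
}

impl Format for FormatIdent<'_> {
    fn fmt(&self, buf: &mut String) {
        buf.push_str(&self.0.name);
    }
}

impl Format for FormatCall<'_> {
    fn fmt(&self, buf: &mut String) {
        // Children are formatted through their own format structs.
        self.0.callee.format().fmt(buf);
        buf.push('(');
        for (i, arg) in self.0.args.iter().enumerate() {
            if i > 0 {
                buf.push_str(", ");
            }
            arg.format().fmt(buf);
        }
        buf.push(')');
    }
}

// Drive any format struct into a finished string.
fn render(f: impl Format) -> String {
    let mut buf = String::new();
    f.fmt(&mut buf);
    buf
}
```

The benefit of the wrapper structs is that formatting logic lives outside the AST types, so the AST crate does not need to know about the formatter.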

I feel Ruff's formatter is better than pretty.rs. Maybe we should investigate further and consider switching to it or to the Rome formatter, but first we should make sure it is actually more powerful than the current one.

Enter-tainer (Owner, Author) commented Jul 7, 2024

Prettier

I'm more familiar with Prettier. From what I've seen, Prettier implements Wadler's pretty printer and supports some more convenient extensions.
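Wadler's core idea: a group is printed flat if it fits in the remaining width (together with whatever follows it up to the next line break); otherwise its line breaks become newlines. Below is a compact, hypothetical Rust sketch in the spirit of Prettier's group/indent/line builders. The Doc and Mode names are made up, and Prettier's real printer handles many more cases (conditional groups, fill, line suffixes, and so on).

```rust
// Hypothetical document IR in the style of Wadler's printer.
enum Doc {
    Text(String),
    Line,                  // a space when flat, a newline (plus indentation) when broken
    Nest(usize, Vec<Doc>), // extra indentation for line breaks inside
    Group(Vec<Doc>),       // printed flat if it fits in the remaining width
}

#[derive(Clone, Copy, PartialEq)]
enum Mode {
    Flat,
    Break,
}

// Does `next`, printed flat, fit in `width` columns together with what follows
// it (`rest`, a stack whose top is the last element) up to the next newline?
fn fits<'a>(mut width: isize, next: &'a Doc, rest: &[(usize, Mode, &'a Doc)]) -> bool {
    let mut stack: Vec<(Mode, &'a Doc)> = vec![(Mode::Flat, next)];
    let mut rest_idx = rest.len();
    while width >= 0 {
        let (mode, doc) = match stack.pop() {
            Some(item) => item,
            None if rest_idx > 0 => {
                rest_idx -= 1;
                let (_, mode, doc) = rest[rest_idx];
                (mode, doc)
            }
            None => return true, // reached the end of the document
        };
        match doc {
            Doc::Text(s) => width -= s.len() as isize,
            Doc::Line if mode == Mode::Flat => width -= 1,
            Doc::Line => return true, // a real newline ends the line being measured
            Doc::Nest(_, docs) | Doc::Group(docs) => {
                stack.extend(docs.iter().rev().map(|d| (mode, d)));
            }
        }
    }
    false
}

fn pretty(width: usize, doc: &Doc) -> String {
    let mut out = String::new();
    let mut col = 0;
    // Work stack of (indentation, mode, doc); the next item to print is at the end.
    let mut stack: Vec<(usize, Mode, &Doc)> = vec![(0, Mode::Break, doc)];
    while let Some((indent, mode, doc)) = stack.pop() {
        match doc {
            Doc::Text(s) => {
                out.push_str(s);
                col += s.len();
            }
            Doc::Line if mode == Mode::Flat => {
                out.push(' ');
                col += 1;
            }
            Doc::Line => {
                out.push('\n');
                out.push_str(&" ".repeat(indent));
                col = indent;
            }
            Doc::Nest(n, docs) => {
                stack.extend(docs.iter().rev().map(|d| (indent + *n, mode, d)));
            }
            Doc::Group(docs) => {
                // Decide once per group: flat if the whole group fits, otherwise break.
                let remaining = width as isize - col as isize;
                let group_mode = if mode == Mode::Flat || fits(remaining, doc, &stack) {
                    Mode::Flat
                } else {
                    Mode::Break
                };
                stack.extend(docs.iter().rev().map(|d| (indent, group_mode, d)));
            }
        }
    }
    out
}
```

With the group `let x =` + nest(2, line + `foo + bar`), a width of 20 prints everything on one line, while a width of 10 breaks after `let x =` and indents the value by two spaces.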

How it handles comments is still unclear to me. It has a massive file (~1k LOC) just for handling comments: https://github.com/prettier/prettier/blob/main/src/language-js/comments/handle-comments.js It looks like it manually handles every case for each AST node type.
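Prettier's handle-comments.js appears to be organized as per-node-type special cases that are tried before a generic fallback. A hypothetical Rust sketch of that shape follows; the node kinds, the context struct, and the if/else rule are all invented for illustration.

```rust
// Hypothetical node kinds and attachment results.
#[derive(Clone, Copy, PartialEq)]
enum NodeKind {
    IfStatement,
    Other,
}

#[derive(Debug, PartialEq)]
enum Attach {
    Leading,  // before the following node
    Trailing, // after the preceding node
    Dangling, // on the enclosing node
}

// What a handler gets to look at (heavily simplified).
struct CommentCtx {
    enclosing: NodeKind,
    has_preceding: bool,
    has_following: bool,
}

// A handler returns Some(..) if it recognizes the situation, None to pass.
type Handler = fn(&CommentCtx) -> Option<Attach>;

// Hypothetical special case: inside an `if`, a comment between two children
// (e.g. just before `else`) stays with the earlier child.
fn handle_if_else(ctx: &CommentCtx) -> Option<Attach> {
    let between = ctx.has_preceding && ctx.has_following;
    (ctx.enclosing == NodeKind::IfStatement && between).then_some(Attach::Trailing)
}

// Generic rule used when no special case applies.
fn default_attach(ctx: &CommentCtx) -> Attach {
    if ctx.has_following {
        Attach::Leading
    } else if ctx.has_preceding {
        Attach::Trailing
    } else {
        Attach::Dangling
    }
}

fn attach(ctx: &CommentCtx) -> Attach {
    // Special cases are tried in order; each new node type adds a handler here.
    let handlers: [Handler; 1] = [handle_if_else];
    handlers
        .iter()
        .find_map(|handler| handler(ctx))
        .unwrap_or_else(|| default_attach(ctx))
}
```

The cost of this design is visible in the file size: every node type with unusual comment positions needs its own hand-written handler.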

Interestingly, it can also format Markdown. I wonder whether it does hard line wrapping and how it handles edge cases like #75 (comment). Judging from the code, it looks like it was written by Japanese or Chinese developers. The idea is interesting: newlines and spaces are (conditionally) interchangeable, and breaking is disallowed in certain cases. After reading Prettier's implementation, I feel hard line wrapping is feasible. https://github.com/prettier/prettier/blob/main/src/language-markdown/print-whitespace.js
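The interchangeable-whitespace idea can be sketched as a greedy fill: each gap between words becomes either a space or a newline, and a gap may only become a newline if the next word is allowed to start a line. This is a hypothetical simplification, not the logic in print-whitespace.js; must_not_start_line and its marker list are assumptions chosen for illustration.

```rust
// Hypothetical rule: a word that could turn into list or heading syntax if it
// started a line (as `-`, `+`, `=` or `#` can in Markdown or Typst markup)
// must stay on the current line.
fn must_not_start_line(word: &str) -> bool {
    matches!(word, "-" | "+" | "=" | "#")
}

// Greedy hard wrap of a single paragraph: every gap between words is either a
// space or a newline. Existing line breaks are treated like spaces.
fn wrap(text: &str, width: usize) -> String {
    let mut out = String::new();
    let mut col = 0;
    for word in text.split_whitespace() {
        // Byte length for simplicity; real code must measure display width
        // (a CJK character usually occupies two columns).
        let len = word.len();
        if col == 0 {
            out.push_str(word);
            col = len;
        } else if col + 1 + len <= width || must_not_start_line(word) {
            out.push(' ');
            out.push_str(word);
            col += 1 + len;
        } else {
            out.push('\n');
            out.push_str(word);
            col = len;
        }
    }
    out
}
```

For example, wrap("aaa bbb - ccc", 7) keeps the lone `-` on the first line even though it overflows, because moving it to the start of a line could create a list item.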

Enter-tainer (Owner, Author) commented

Rustfmt

Rustfmt doesn't use a formatting IR. Instead, it visits each AST node directly and produces strings. This gives it more flexibility: for example, rustfmt can set different line-width limits for different constructs, which is impossible in Prettier or Ruff.
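For contrast with the IR-based designs, a rustfmt-style formatter builds strings directly, so each construct can apply its own width budget before picking a layout. The sketch below is hypothetical (not rustfmt's code); the constants only loosely echo rustfmt's max_width and fn_call_width options.

```rust
// Illustrative limits: a call gets a tighter budget than the whole line.
const MAX_WIDTH: usize = 100;
const CALL_WIDTH: usize = 60;

// Hypothetical AST.
enum Expr {
    Ident(String),
    Call(String, Vec<Expr>),
}

// No IR: each construct is formatted straight into a String and can apply
// its own width rule before choosing a layout.
fn format_expr(expr: &Expr, indent: usize) -> String {
    match expr {
        Expr::Ident(name) => name.clone(),
        Expr::Call(callee, args) => {
            // Arguments are formatted for the vertical layout's indentation.
            let parts: Vec<String> = args.iter().map(|a| format_expr(a, indent + 4)).collect();
            let one_line = format!("{callee}({})", parts.join(", "));
            let fits = !one_line.contains('\n')
                && one_line.len() <= CALL_WIDTH
                && indent + one_line.len() <= MAX_WIDTH;
            if fits {
                one_line
            } else {
                // One argument per line, with a trailing comma.
                let pad = " ".repeat(indent + 4);
                let body: Vec<String> = parts.iter().map(|p| format!("{pad}{p},")).collect();
                format!("{callee}(\n{}\n{})", body.join("\n"), " ".repeat(indent))
            }
        }
    }
}
```

A call whose single-line form exceeds CALL_WIDTH falls back to one argument per line even when the full line would still fit within MAX_WIDTH, which is exactly the kind of per-construct rule that is awkward to express in a single-width IR printer.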

I don't think we can learn much from it. Building a rustfmt-style formatter is probably too hard and tedious.
