Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C Family] Full rewrite of C family syntaxes #4147

Draft
wants to merge 42 commits into
base: master
Choose a base branch
from

Conversation

camila314
Copy link
Contributor

@camila314 camila314 commented Feb 7, 2025

I am currently working on a full rewrite of the C family using modern Sublime Text syntax highlighting features. I plan to rewrite C, Objective-C, C++, and Objective-C++ in that order. Note that as of right now, only the C syntax has been edited and the syntax test file does not yet have any annotations.

This is currently a draft. Please feel free to let me know if I'm missing anything and I will add it to the roadmap.

C Roadmap

  • Numbers
  • Strings
  • Operators
  • Punctuation
  • Constants
  • Control Flow (keywords, labels)
  • Enclosings (parenthesis, brackets, braces)
  • Default Types
  • Storage modifiers and qualifiers
  • Function calls
  • Function definitions
  • Structs, Enums, Unions, Typedefs
  • Declspec and Attribute
  • Standard C Functions
  • Rewrite preprocessor
  • Rewrite comments

Changes made to C Highlighting

  • Removed #if 0 and #if 1 #else creating a commented out block
  • Added i and j to numerical suffixes (GNU extension)
  • Add _Alignof, alignof, and offsetof, _Static_assert, static_assert, _Pragma as word operators
  • Every double-underscored identifier without another scope is treated as support constant
  • __cplusplus is a support.constant
  • Add nullptr_t (C23) and max_align_t (C11) to list of default types
  • Add _Thread_local, thread_local, _Atomic, _Alignas and _Noreturn as storage modifiers
  • Fixed bug where a function parameter followed by a newline wouldn't be highlighted properly
  • Add more __declspec properties
  • Add common __attribute__ properties
  • Add digit separator ' (C23)
  • Add scopes for __VA_ARGS__ and __VA_OPT__ (C23)
  • Give defined macro a suport.macro.c scope
  • Add #embed, #elifdef, and #elifndef directives (C23)
  • Add multi-line double-slashed comments via \
  • Add typedef support for function ptrs
  • Function pointer arguments in a typedef now share the same scoping as regular function defs

Issues

@camila314
Copy link
Contributor Author

My motivations are simple
image

- include: expressions
prototype:
- include: preprocessor.comment
- match: \\$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heads up, you can test this in a test file for Sublime via:

// SYNTAX TEST "Packages/C/C.sublime-syntax" \

... \
//  ^ punctuation.separator.continuation \
...

@@ -1,1555 +1,185 @@
/* SYNTAX TEST "Packages/C++/C.sublime-syntax" */
// SYNTAX TEST "Packages/C++/C.sublime-syntax"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for line-continuation, change header to:

// SYNTAX TEST "Packages/C++/C.sublime-syntax" \

when testing:

... \
//  ^ punctuation.separator.continuation \
...

duplicate the line-separator for continuation to the line with the expected scopes as well.

- Add digit separator `'` (C23)
- Add scopes for __VA_ARGS__ and __VA_OPT__
- Give defined macro a `suport.macro.c` scope
- Add #embed, #elifdef, and #elifndef directives
- Add multi-line double-slashed comments via `\`
- Fix some bugs
@deathaxe
Copy link
Collaborator

Some general advices:

main and prototype are primary entry points and thus shouldn't be located somewhere in the middle of anywhere. They should be the first contexts.

I'd recommend to organize contexts in a logical tree-like top-down structure, grouping related things like preprocessors, declarations, statements, expressions, keywords, identifiers, constants, operators, variables etc. together.

Various syntaxes even use ###[ SECTION NAME]### comments to visually separate bigger blocks of related contexts. This also enables easy folding of irrelevant blocks.

All of that will certainly help to keep track of structure as syntax definition grows. Keeping same context order and structure especially is important when extending a syntax, which will be the casse for C++.

The more syntaxes in this repo follow a common schema the easier it is to maintain them.

Premature ordering can be kill of creativity, so order is not that important for prototyping. Maybe however don't wait too long to add some structure.


Please don't just add some code snippets to syntax test files, but also design assertions on the way going. It sometimes takes more time to write tests than syntax, they are however the only way to ensure not to break wanted/designed behavior by automated syntax tests.

When writing rules for certain constructs you have in mind what you find important. Tests are just an expression of those thoughts or tools to verify your ideas/expectations against reality and vice versa.

Tests can (or even should) also contain incomplete statements/expressions, to ensure code doesn't break too badly. At least it helps to ensure to limit breaking highlighting to small regions or makes you think about how to achieve it.

Assertions should also contain some negative tests to ensure meta scope boundaries are maintained correctly or to avoid unwanted overlapping meta scopes. These are the less obvious issues, which may however impact various functionalities such as syntax based folding.

@deathaxe
Copy link
Collaborator

FWIW, context order and structure of existing C syntaxes may not be ideal nor a good template to rely on. Better look at something more recent, such as Java, PHP or Python.

@michaelblyons
Copy link
Collaborator

michaelblyons commented Feb 10, 2025

I don't think those are the actual tests. 😉 I note the path to the syntax def in User, and the final scope being c-2.

May I recommend similar to your PackageDev clone, you look at the suggestions in the ReadMe of this repo? Then you can modify the C syntax directly. If you want to keep the old one for comparison, you can rename it or move that one to User.

I'm also curious why you changed the assertion comment from // to /* ... */. Edit: My mistake. I like line comments for syntax tests, but the old one is block comments.

@uncenter
Copy link

uncenter commented Feb 12, 2025

I can't tell if these have been fixed yet, but some issues I noticed for C++ files on the latest stable that don't seem to be mentioned above:

CleanShot 2025-02-12 at 09 30 46

The names of namespaces being used don't have any specific scopes relating them to namespaces.

CleanShot 2025-02-12 at 09 30 58

I would expect the Typedef type here to have some scope related to type names.

Comment on lines +376 to +378
- match: \}|\] # Fallback
scope: invalid.illegal.stray-bracket-end.c
pop: 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may easily cause owning contest to also break or not being terminated correctly due to brackets already being consumed here.

I'd recommend to use lookaheads and consume stray brackets only in top-level context.

@jrappen
Copy link
Contributor

jrappen commented Feb 13, 2025

To auto-close related issues upon merge of this PR, GitHub expects certain keywords:

https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests

or in other words, you need to change the following in your opening comment (which will be the commit message of the merge commit):

## Issues fixed

- fixes #1070
- fixes #1093
...

@michaelblyons
Copy link
Collaborator

Edited OP to add GitHub's format.

@jrappen
Copy link
Contributor

jrappen commented Feb 15, 2025

There's this old PR #1831, that hasn't been merged into master branch for ages. Maybe you can (partly) use that in your re-write here.

@deathaxe
Copy link
Collaborator

Not sure how useful #1831 still is. It came with some interesting parts, but scoping variable definitions entity.name is not that great. Maybe some things can even be solved with newer features.

Disclaimer: It's a while (some years) I had a look at it, the last time.

@jrappen
Copy link
Contributor

jrappen commented Feb 17, 2025

I just mentioned it cause it wasn't in Michael's "possibly related" list of issues/PRs.

captures:
1: entity.name.type.typedef.c
pop: 1
- match: ; # Fallback
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing scope? Alternatively use a lookahead and let a parent context consume/scope it, if it is required to handle multiple context pops.

property:
- match: \(
scope: punctuation.section.group.begin.objc
push: property.attrs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @property ()()() is not a valid statement, I'd suggest to set property.attr.

Suggested change
push: property.attrs
set: property.attrs

push: property.attrs
- include: else-pop
property.attrs:
- match: \)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All groups should receive a corresponding meta scope.

Suggested change
- match: \)
- meta_scope: meta.group.objc
- match: \)

Comment on lines +132 to +134
- include: expression-identifiers
- include: punctuation.comma
- include: operator.pointer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, always start contexts with meta_scope followed by real patterns.

Comment on lines +152 to +153
- match: (?=\b(self)\s*(\.))
push: self-access
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just...

Suggested change
- match: (?=\b(self)\s*(\.))
push: self-access
- match: \bself\b
scope: variable.language.objc
push: self-access
...
self-access:
- match: \s*(\.)\s*({{identifier}})
captures:
1: keyword.operator.accessor.objc
2: variable.other.member.objc
- include: pop-immediately

block-fn:
- match: \(
scope: punctuation.section.group.begin.objc
push: function.params
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this would also match void func()()() {}

Comment on lines +213 to +214
- match: .
scope: support.function.objc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A . pattern causes each character to become a single token, which is not ideal for various reasons and should be avoided if possible.

scope: variable.language.objc
- match: \@(property|synthesize|dynamic)\b
scope: keyword.other.objc
- match: '(@available)\b(\()'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A \b in between is not required as \( is already a word boundary. Optionally supporting whitespace via \s* may be useful however.

Suggested change
- match: '(@available)\b(\()'
- match: '(@available)\s*(\()'

1: keyword.operator.word.objc
2: punctuation.section.group.begin.objc
push: selector-obj
- match: '@(selector|encode|available)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny optimization, but avoiding unused capture groups helps with performance a little bit.

Suggested change
- match: '@(selector|encode|available)'
- match: '@(?:selector|encode|available)'

scope: storage.type.objc
- match: \@(?:interface|class|implementation|protocol|end)\b
scope: keyword.declaration.objc
- match: '(?=\bNS\w+\b)'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure, but the following might be enough

Suggested change
- match: '(?=\bNS\w+\b)'
- match: '(?=\bNS)'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment