Inline `ASTNode` bindings dependencies and observers #15098

ggiraldez · 2024-10-19T03:08:04Z

Every ASTNode contains two arrays used for type inference and checking: dependencies and observers. By default, these are created lazily, but most active (ie. effectively typed) ASTNodes end up creating them. Furthermore, on average both these lists contain less than 2 elements each.

This PR replaces the both Array(ASTNode)? references in ASTNode by inlined structs that can hold two elements and a tail array for the cases where more links are needed. This reduces number of allocations, bytes allocated, number of instructions executed and running time.

Some numbers from compiling the Crystal compiler itself without running codegen (since type binding occurs in the semantic phase anyway).

Running time (measured with hyperfine running with GC_DONT_GC=1): ~6% reduction
Before:

Benchmark 1: ./self-semantic-only.sh
  Time (mean ± σ):      3.398 s ±  0.152 s    [User: 2.264 s, System: 0.470 s]
  Range (min … max):    3.029 s …  3.575 s    10 runs

After:

Benchmark 1: ./self-semantic-only.sh
  Time (mean ± σ):      3.180 s ±  0.095 s    [User: 2.153 s, System: 0.445 s]
  Range (min … max):    3.046 s …  3.345 s    10 runs

Memory (as reported by the compiler itself, with GC): ~9.6% reduction
Before:

Parse:                             00:00:00.000038590 (   1.05MB)
Semantic (top level):              00:00:00.483357706 ( 174.13MB)
Semantic (new):                    00:00:00.002156811 ( 174.13MB)
Semantic (type declarations):      00:00:00.038313066 ( 174.13MB)
Semantic (abstract def check):     00:00:00.014283169 ( 190.13MB)
Semantic (restrictions augmenter): 00:00:00.010672651 ( 206.13MB)
Semantic (ivars initializers):     00:00:04.660611385 (1250.07MB)
Semantic (cvars initializers):     00:00:00.008343907 (1250.07MB)
Semantic (main):                   00:00:00.780627942 (1346.07MB)
Semantic (cleanup):                00:00:00.000961914 (1346.07MB)
Semantic (recursive struct check): 00:00:00.001121766 (1346.07MB)

After:

Parse:                             00:00:00.000044417 (   1.05MB)
Semantic (top level):              00:00:00.546445955 ( 190.03MB)
Semantic (new):                    00:00:00.002488975 ( 190.03MB)
Semantic (type declarations):      00:00:00.040234541 ( 206.03MB)
Semantic (abstract def check):     00:00:00.015473723 ( 222.03MB)
Semantic (restrictions augmenter): 00:00:00.010828366 ( 222.03MB)
Semantic (ivars initializers):     00:00:03.324639987 (1135.96MB)
Semantic (cvars initializers):     00:00:00.007359853 (1135.96MB)
Semantic (main):                   00:00:01.806822202 (1217.96MB)
Semantic (cleanup):                00:00:00.000626975 (1217.96MB)
Semantic (recursive struct check): 00:00:00.001435494 (1217.96MB)

Callgrind stats:
- Instruction refs: 17,477,865,704 -> 16,712,610,033 (~4.4% reduction)
- Estimated cycles: 26,835,733,874 -> 26,154,926,143 (~2.5% reduction)
- GC_malloc_kind call count: 35,161,616 -> 25,684,997 (~27% reduction)

This allows big savings in number and bytes allocated since most ASTNodes will have on average less than 2 elements in each of these lists.

ggiraldez · 2024-10-19T07:54:13Z

Damn... I wished I saw this before I even started 😬

The CI is failing anyway, and maybe the reason is the same as the last time. Will investigate further.

straight-shoota · 2024-10-19T08:53:05Z

Ah, I assumed you were aware of that. 🙈
I had definitely mentioned there had been a previous attempt that looked nice but didn't work out.

Anyway the issue here must be something relatively obvious. I presume it shouldn't be hard to fix. The interpreter failure was more subtle.

src/compiler/crystal/semantic/bindings.cr

ggiraldez · 2024-10-19T15:12:53Z

Ah, I assumed you were aware of that. 🙈 I had definitely mentioned there had been a previous attempt that looked nice but didn't work out.

No worries. It was pretty straightforward to implement. I know that you mentioned it, but it didn't occur to me that there was a trail to follow.

Anyway the issue here must be something relatively obvious. I presume it shouldn't be hard to fix. The interpreter failure was more subtle.

Not sure it's obvious 😬 So far, I spent more time trying to figure out what's broken than actually implementing the optimisation. The good news is that this doesn't reproduce #12769.

straight-shoota · 2024-10-19T16:20:40Z

Turning SmallNodeList into a class seems to fix the failure. So it's a pass-by-value issue.
We can dig through the source code to finde where we pass an instance. Or we could keep it a class, and maybe inline the allocation with ReferenceStorage 🤷

straight-shoota · 2024-10-21T17:14:12Z

src/compiler/crystal/semantic/bindings.cr

+      while true
+        dependencies = node.dependencies.select { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }


thought: I'm wondering if we could avoid the array allocation here (and maybe in similar locations). We're only interested in the first matching item, so Enumerable#find(&) should be sufficient.
This is an bit of a different change and maybe should go into a separate PR. But it could also fit in the scope here.

Indeed, I'm pretty sure the entire while block can be replaced with just the following:

while node = node.dependencies.find { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) } nil_reason = node.nil_reason if node.is_a?(MetaTypeVar) owner_trace << node if node visited.add node end

I pushed a commit to my fork, that way it's running CI immediately so we can confirm this works. oprypin@4203133

Or well, the CI got cancelled and overridden by the next commit on top of it - see my next comment.

Nice, thank you! Looks good, I'll pull both your commits.

straight-shoota · 2024-10-21T19:47:55Z

src/compiler/crystal/semantic/bindings.cr

-      while deps = node.dependencies
-        dependencies = deps.select { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }
+      while true
+        dependencies = node.dependencies.select { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }
        if dependencies.size > 0


suggestion: Use !dependencies.empty? here, it improves efficiency by not having to iterate the list.

straight-shoota · 2024-10-21T19:54:19Z

src/compiler/crystal/semantic/bindings.cr

-    def size
-      size = 0
-      size += 1 unless @first.nil?
-      size += 1 unless @second.nil?
-      if tail = @tail
-        size += tail.size
-      end
-      size
-    end


suggestion: We should keep the implementation of #size. The parent implementation of Enumerable#size iterates all elements which would be less efficient than adding the known tail size.

oprypin · 2024-10-21T22:06:25Z

src/compiler/crystal/semantic/bindings.cr

+      while true
+        dependencies = node.dependencies.select { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }


Indeed, I'm pretty sure the entire while block can be replaced with just the following:

while node = node.dependencies.find { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) } nil_reason = node.nil_reason if node.is_a?(MetaTypeVar) owner_trace << node if node visited.add node end

I pushed a commit to my fork, that way it's running CI immediately so we can confirm this works. oprypin@4203133

Or well, the CI got cancelled and overridden by the next commit on top of it - see my next comment.

oprypin · 2024-10-21T22:10:37Z

src/compiler/crystal/semantic/bindings.cr

+    def each(& : ASTNode ->)
+      if first = @first
+        yield first
+      end
+      if second = @second
+        yield second
+      end
+      if tail = @tail
+        tail.each { |node| yield node }
+      end
+    end
+
+    def push(node : ASTNode) : self
+      if @first.nil?
+        @first = node
+      elsif @second.nil?
+        @second = node
+      else
+        tail = @tail ||= Array(ASTNode).new
+        tail.push(node)
+      end
+      self
+    end
+
+    def reject!(& : ASTNode ->) : self
+      if first = @first
+        if yield first
+          @first = nil
+        end
+      end
+      if second = @second
+        if yield second
+          @second = nil
+        end
+      end
+      if tail = @tail
+        tail.reject! { |node| yield node }
+      end
+      self
+    end


I converted this collection so that it never has "holes". reject! needs to have a much more complicated implementation (so the code is harder to trust), but in turn everything else can have an early return. Also it preserves push order.

Feel free to ignore this, I just wanted to have fun :D

Suggested change

def each(& : ASTNode ->)

if first = @first

yield first

end

if second = @second

yield second

end

if tail = @tail

tail.each { |node| yield node }

end

end

def push(node : ASTNode) : self

if @first.nil?

@first = node

elsif @second.nil?

@second = node

else

tail = @tail ||= Array(ASTNode).new

tail.push(node)

end

self

end

def reject!(& : ASTNode ->) : self

if first = @first

if yield first

@first = nil

end

end

if second = @second

if yield second

@second = nil

end

end

if tail = @tail

tail.reject! { |node| yield node }

end

self

end

def each(& : ASTNode ->)

if first = @first

yield first

if second = @second

yield second

if tail = @tail

tail.each { |node| yield node }

end

end

end

end

def size

if @first.nil?

0

elsif @second.nil?

1

elsif (tail = @tail).nil?

2

else

2 + tail.size

end

end

def push(node : ASTNode) : self

if @first.nil?

@first = node

elsif @second.nil?

@second = node

elsif (tail = @tail).nil?

@tail = [node] of ASTNode

else

tail.push(node)

end

self

end

def reject!(& : ASTNode ->) : self

if first = @first

if second = @second

if tail = @tail

tail.reject! { |node| yield node }

end

if yield second

@second = tail.try &.shift?

end

end

if yield first

@first = @second

@second = tail.try &.shift?

end

end

self

end

I pushed a commit to my fork, that way it's running CI immediately so we can confirm this works. oprypin@f967ade?diff=split

Nice fun. I believe #each can actually be shortened a bit more:

def each(& : ASTNode ->) yield @first || return yield @second || return @tail.try(&.each { |node| yield node }) end

I incorporated this change as well, thanks!

Co-Authored-By: Johannes Müller <[email protected]>

Inline ASTNode bindings dependencies and observers

82cdf48

This allows big savings in number and bytes allocated since most ASTNodes will have on average less than 2 elements in each of these lists.

Blacksmoke16 added performance topic:compiler:semantic labels Oct 19, 2024

straight-shoota reviewed Oct 19, 2024

View reviewed changes

src/compiler/crystal/semantic/bindings.cr Outdated Show resolved Hide resolved

straight-shoota linked an issue Oct 19, 2024 that may be closed by this pull request

Optimize allocation for node dependencies and observers in compiler #12851

Closed

ggiraldez added 2 commits October 19, 2024 17:02

Fix for failing test and tighten some more code

3faa849

Remove Indexable module from SmallNodeList

e45f759

straight-shoota reviewed Oct 21, 2024

View reviewed changes

oprypin added 2 commits October 22, 2024 00:07

Optimize find_owner_trace

4203133

Optimize SmallNodeList

f967ade

oprypin reviewed Oct 21, 2024

View reviewed changes

Simplify SmallNodeList#each

2332977

Co-Authored-By: Johannes Müller <[email protected]>

straight-shoota approved these changes Oct 22, 2024

View reviewed changes

ysbaddaden approved these changes Oct 22, 2024

View reviewed changes

straight-shoota added this to the 1.15.0 milestone Oct 22, 2024

straight-shoota merged commit 9d8c2e4 into crystal-lang:master Oct 23, 2024
66 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inline `ASTNode` bindings dependencies and observers #15098

Inline `ASTNode` bindings dependencies and observers #15098

ggiraldez commented Oct 19, 2024

ggiraldez commented Oct 19, 2024

straight-shoota commented Oct 19, 2024

ggiraldez commented Oct 19, 2024

straight-shoota commented Oct 19, 2024

straight-shoota Oct 21, 2024 •

edited

Loading

oprypin Oct 21, 2024

ggiraldez Oct 22, 2024

straight-shoota Oct 21, 2024

straight-shoota Oct 21, 2024

oprypin Oct 21, 2024

oprypin Oct 21, 2024

straight-shoota Oct 21, 2024 •

edited

Loading

ggiraldez Oct 22, 2024

		while true
		dependencies = node.dependencies.select { \|dep\| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }

Inline ASTNode bindings dependencies and observers #15098

Inline ASTNode bindings dependencies and observers #15098

Conversation

ggiraldez commented Oct 19, 2024

ggiraldez commented Oct 19, 2024

straight-shoota commented Oct 19, 2024

ggiraldez commented Oct 19, 2024

straight-shoota commented Oct 19, 2024

straight-shoota Oct 21, 2024 • edited Loading

Choose a reason for hiding this comment

oprypin Oct 21, 2024

Choose a reason for hiding this comment

ggiraldez Oct 22, 2024

Choose a reason for hiding this comment

straight-shoota Oct 21, 2024

Choose a reason for hiding this comment

straight-shoota Oct 21, 2024

Choose a reason for hiding this comment

oprypin Oct 21, 2024

Choose a reason for hiding this comment

oprypin Oct 21, 2024

Choose a reason for hiding this comment

straight-shoota Oct 21, 2024 • edited Loading

Choose a reason for hiding this comment

ggiraldez Oct 22, 2024

Choose a reason for hiding this comment

Inline `ASTNode` bindings dependencies and observers #15098

Inline `ASTNode` bindings dependencies and observers #15098

straight-shoota Oct 21, 2024 •

edited

Loading

straight-shoota Oct 21, 2024 •

edited

Loading