Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline ASTNode bindings dependencies and observers #15098

Merged

Conversation

ggiraldez
Copy link
Contributor

Every ASTNode contains two arrays used for type inference and checking: dependencies and observers. By default, these are created lazily, but most active (ie. effectively typed) ASTNodes end up creating them. Furthermore, on average both these lists contain less than 2 elements each.

This PR replaces the both Array(ASTNode)? references in ASTNode by inlined structs that can hold two elements and a tail array for the cases where more links are needed. This reduces number of allocations, bytes allocated, number of instructions executed and running time.

Some numbers from compiling the Crystal compiler itself without running codegen (since type binding occurs in the semantic phase anyway).

  • Running time (measured with hyperfine running with GC_DONT_GC=1): ~6% reduction
    Before:

    Benchmark 1: ./self-semantic-only.sh
      Time (mean ± σ):      3.398 s ±  0.152 s    [User: 2.264 s, System: 0.470 s]
      Range (min … max):    3.029 s …  3.575 s    10 runs
    

    After:

    Benchmark 1: ./self-semantic-only.sh
      Time (mean ± σ):      3.180 s ±  0.095 s    [User: 2.153 s, System: 0.445 s]
      Range (min … max):    3.046 s …  3.345 s    10 runs
    
  • Memory (as reported by the compiler itself, with GC): ~9.6% reduction
    Before:

    Parse:                             00:00:00.000038590 (   1.05MB)
    Semantic (top level):              00:00:00.483357706 ( 174.13MB)
    Semantic (new):                    00:00:00.002156811 ( 174.13MB)
    Semantic (type declarations):      00:00:00.038313066 ( 174.13MB)
    Semantic (abstract def check):     00:00:00.014283169 ( 190.13MB)
    Semantic (restrictions augmenter): 00:00:00.010672651 ( 206.13MB)
    Semantic (ivars initializers):     00:00:04.660611385 (1250.07MB)
    Semantic (cvars initializers):     00:00:00.008343907 (1250.07MB)
    Semantic (main):                   00:00:00.780627942 (1346.07MB)
    Semantic (cleanup):                00:00:00.000961914 (1346.07MB)
    Semantic (recursive struct check): 00:00:00.001121766 (1346.07MB)
    

    After:

    Parse:                             00:00:00.000044417 (   1.05MB)
    Semantic (top level):              00:00:00.546445955 ( 190.03MB)
    Semantic (new):                    00:00:00.002488975 ( 190.03MB)
    Semantic (type declarations):      00:00:00.040234541 ( 206.03MB)
    Semantic (abstract def check):     00:00:00.015473723 ( 222.03MB)
    Semantic (restrictions augmenter): 00:00:00.010828366 ( 222.03MB)
    Semantic (ivars initializers):     00:00:03.324639987 (1135.96MB)
    Semantic (cvars initializers):     00:00:00.007359853 (1135.96MB)
    Semantic (main):                   00:00:01.806822202 (1217.96MB)
    Semantic (cleanup):                00:00:00.000626975 (1217.96MB)
    Semantic (recursive struct check): 00:00:00.001435494 (1217.96MB)
    
  • Callgrind stats:

    • Instruction refs: 17,477,865,704 -> 16,712,610,033 (~4.4% reduction)
    • Estimated cycles: 26,835,733,874 -> 26,154,926,143 (~2.5% reduction)
    • GC_malloc_kind call count: 35,161,616 -> 25,684,997 (~27% reduction)

This allows big savings in number and bytes allocated since most
ASTNodes will have on average less than 2 elements in each of these
lists.
@ggiraldez
Copy link
Contributor Author

Damn... I wished I saw this before I even started 😬

The CI is failing anyway, and maybe the reason is the same as the last time. Will investigate further.

@straight-shoota
Copy link
Member

Ah, I assumed you were aware of that. 🙈
I had definitely mentioned there had been a previous attempt that looked nice but didn't work out.

Anyway the issue here must be something relatively obvious. I presume it shouldn't be hard to fix. The interpreter failure was more subtle.

@ggiraldez
Copy link
Contributor Author

Ah, I assumed you were aware of that. 🙈 I had definitely mentioned there had been a previous attempt that looked nice but didn't work out.

No worries. It was pretty straightforward to implement. I know that you mentioned it, but it didn't occur to me that there was a trail to follow.

Anyway the issue here must be something relatively obvious. I presume it shouldn't be hard to fix. The interpreter failure was more subtle.

Not sure it's obvious 😬 So far, I spent more time trying to figure out what's broken than actually implementing the optimisation. The good news is that this doesn't reproduce #12769.

@straight-shoota
Copy link
Member

Turning SmallNodeList into a class seems to fix the failure. So it's a pass-by-value issue.
We can dig through the source code to finde where we pass an instance. Or we could keep it a class, and maybe inline the allocation with ReferenceStorage 🤷

Comment on lines 363 to 364
while true
dependencies = node.dependencies.select { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }
Copy link
Member

@straight-shoota straight-shoota Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I'm wondering if we could avoid the array allocation here (and maybe in similar locations). We're only interested in the first matching item, so Enumerable#find(&) should be sufficient.
This is an bit of a different change and maybe should go into a separate PR. But it could also fit in the scope here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I'm pretty sure the entire while block can be replaced with just the following:

      while node = node.dependencies.find { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }
        nil_reason = node.nil_reason if node.is_a?(MetaTypeVar)
        owner_trace << node if node
        visited.add node
      end

I pushed a commit to my fork, that way it's running CI immediately so we can confirm this works. oprypin@4203133

Or well, the CI got cancelled and overridden by the next commit on top of it - see my next comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you! Looks good, I'll pull both your commits.

while deps = node.dependencies
dependencies = deps.select { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }
while true
dependencies = node.dependencies.select { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }
if dependencies.size > 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Use !dependencies.empty? here, it improves efficiency by not having to iterate the list.

Comment on lines 17 to 25
def size
size = 0
size += 1 unless @first.nil?
size += 1 unless @second.nil?
if tail = @tail
size += tail.size
end
size
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: We should keep the implementation of #size. The parent implementation of Enumerable#size iterates all elements which would be less efficient than adding the known tail size.

Comment on lines 363 to 364
while true
dependencies = node.dependencies.select { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I'm pretty sure the entire while block can be replaced with just the following:

      while node = node.dependencies.find { |dep| dep.type? && dep.type.includes_type?(owner) && !visited.includes?(dep) }
        nil_reason = node.nil_reason if node.is_a?(MetaTypeVar)
        owner_trace << node if node
        visited.add node
      end

I pushed a commit to my fork, that way it's running CI immediately so we can confirm this works. oprypin@4203133

Or well, the CI got cancelled and overridden by the next commit on top of it - see my next comment.

Comment on lines 17 to 56
def each(& : ASTNode ->)
if first = @first
yield first
end
if second = @second
yield second
end
if tail = @tail
tail.each { |node| yield node }
end
end

def push(node : ASTNode) : self
if @first.nil?
@first = node
elsif @second.nil?
@second = node
else
tail = @tail ||= Array(ASTNode).new
tail.push(node)
end
self
end

def reject!(& : ASTNode ->) : self
if first = @first
if yield first
@first = nil
end
end
if second = @second
if yield second
@second = nil
end
end
if tail = @tail
tail.reject! { |node| yield node }
end
self
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I converted this collection so that it never has "holes". reject! needs to have a much more complicated implementation (so the code is harder to trust), but in turn everything else can have an early return. Also it preserves push order.

Feel free to ignore this, I just wanted to have fun :D

Suggested change
def each(& : ASTNode ->)
if first = @first
yield first
end
if second = @second
yield second
end
if tail = @tail
tail.each { |node| yield node }
end
end
def push(node : ASTNode) : self
if @first.nil?
@first = node
elsif @second.nil?
@second = node
else
tail = @tail ||= Array(ASTNode).new
tail.push(node)
end
self
end
def reject!(& : ASTNode ->) : self
if first = @first
if yield first
@first = nil
end
end
if second = @second
if yield second
@second = nil
end
end
if tail = @tail
tail.reject! { |node| yield node }
end
self
end
def each(& : ASTNode ->)
if first = @first
yield first
if second = @second
yield second
if tail = @tail
tail.each { |node| yield node }
end
end
end
end
def size
if @first.nil?
0
elsif @second.nil?
1
elsif (tail = @tail).nil?
2
else
2 + tail.size
end
end
def push(node : ASTNode) : self
if @first.nil?
@first = node
elsif @second.nil?
@second = node
elsif (tail = @tail).nil?
@tail = [node] of ASTNode
else
tail.push(node)
end
self
end
def reject!(& : ASTNode ->) : self
if first = @first
if second = @second
if tail = @tail
tail.reject! { |node| yield node }
end
if yield second
@second = tail.try &.shift?
end
end
if yield first
@first = @second
@second = tail.try &.shift?
end
end
self
end

I pushed a commit to my fork, that way it's running CI immediately so we can confirm this works. oprypin@f967ade?diff=split

Copy link
Member

@straight-shoota straight-shoota Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fun. I believe #each can actually be shortened a bit more:

def each(& : ASTNode ->)
  yield @first || return
  yield @second || return
  @tail.try(&.each { |node| yield node })
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I incorporated this change as well, thanks!

Co-Authored-By: Johannes Müller <[email protected]>
@straight-shoota straight-shoota added this to the 1.15.0 milestone Oct 22, 2024
@straight-shoota straight-shoota merged commit 9d8c2e4 into crystal-lang:master Oct 23, 2024
66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize allocation for node dependencies and observers in compiler
5 participants