Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added functionality to track overlapping accesses #258

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

sadrabt
Copy link
Contributor

@sadrabt sadrabt commented Oct 14, 2024

This pull request address the issue of overlapping accesses for DSA as discussed by issue #254

Remaining:

  • create a test for global overlapping (can use clang/jumptable)
  • create a test for stack overlapping

@l-kent
Copy link
Contributor

l-kent commented Oct 14, 2024

I don't think this is working correctly yet.

For indirect_calls/jumptable/clang:BAP, these are the relevant instructions with overlapping stack accesses:

00000413: mem := mem with [R31 + 0x10, el]:u128 <- V0

00000429: R8 := mem[R31 + 0x10, el]:u64

00000438: R8 := mem[R31 + 0x18, el]:u64

The 128 bit access to R31 + 16 overlaps with the 64 bit access to R31 + 16 and the 64 bit access to R31 + 24.

However, the analysis doesn't correctly determine that these overlap. The results it gives are that:

  • 00000429_R8 points to node 438 which contains pointers to add_six and add_two and is collapsed.
  • 00000438_R8 points to node 447 which is empty.

There is also node 450 which contains Stack_16 (of size 16) and Stack_12 (of size 4). It has three cells:

  • 0 points to node 438
  • 8 points to node 447
  • 12 points to node 439 (which is also empty).

This is incorrect - accesses to R31 + 16 and R31 + 12 never overlap, and these offsets don't make any sense.

Stack_12 and Stack_16 do not overlap and the analysis should not find that they do.
00000429_R8 should contain a pointer to add_two (this is already fine)
00000438_R8 should contain a pointer to add_six (this is not correct as-is)

It's alright if they both point to a collapsed node containing pointers both add_two and add_six, but it's even better if we can maintain the offsets if possible?

@sadrabt sadrabt force-pushed the points_to_analysis_overlapping_access branch from 4751106 to 159c5d9 Compare October 15, 2024 00:32
@l-kent
Copy link
Contributor

l-kent commented Oct 15, 2024

This seems to be working now so I'll merge it soon.

It would be nice to be able to improve the precision so it can tell that add_two is at offset 0 and add_six is at offset 8 in the combined node and then tell that 00000429_R8 is pointing to add_two, 00000438_R8 is pointing to add_six, and 0000040b_V0 is pointing to both, but that would probably be a fair bit more complex and isn't a must have.

@l-kent
Copy link
Contributor

l-kent commented Oct 15, 2024

Oh you were planning to write tests weren't you? I'll wait for you do to those before merging.

@sadrabt
Copy link
Contributor Author

sadrabt commented Oct 15, 2024

// 0x10DC0 contains a pointer to the procedure add_two: separate node with a single cell at the start which points to add_two
// 0x10DC8 contains a pointer to the procedure add_six: separate node with a single cell at the start which points to add_six

once the analysis process
V0 := mem[0x10DC0, el]:u128
it will merge the nodes for 0x10DC0 and 0x10DC8 with offset 8 so that the resulting node will have a cell at offset 0 which corresponds to 0x10DC0 and a cell at offset 8 corresponding to 0x10DC8. Each cell still has a distinct pointee

here currently the analysis will grow the cell corresponding to 0x10DC0 the size of the access. it causes that cell to overlap the cell corresponding to 0x10DC8 and as a result their outgoing edges (their pointees) are also merged. This causes the collapse

an alternative would be to not grow 0x10DC0's cell but instead do a multi load/write.
instead of growing the 0x10DC0's cell, we start from that cell and continue to merge each consecutive cell's pointee with a cell at a different offset in V0's Node. The problem here is that we don't have a way of telling which offsets are the correct ones in V0's node to be merged with the pointees. I think one possibility would be to increment a set number (64 bits or one pointer's length) since we are only concerned with pointers. However, I'm unsure of the full implications of this for soundness

@l-kent

This comment was marked as outdated.

@l-kent

This comment was marked as resolved.

@l-kent
Copy link
Contributor

l-kent commented Oct 15, 2024

instead of growing the 0x10DC0's cell, we start from that cell and continue to merge each consecutive cell's pointee with a cell at a different offset in V0's Node. The problem here is that we don't have a way of telling which offsets are the correct ones in V0's node to be merged with the pointees.

Can you explain this further? I don't quite follow what the problem is, why don't we have a way of telling which offsets to use?

@sadrabt
Copy link
Contributor Author

sadrabt commented Oct 15, 2024

after V0 := mem[0x10DC0, el]:u128
we want V0 node to look something like
V0 = {Cell1 == pointee(0x10DC0), cell2 == pointee(0x10DC8)},
I don't think we have a way of determining what offset cell2 begins at because we don't know the size of the pointee(0x10DC0)

@l-kent
Copy link
Contributor

l-kent commented Oct 15, 2024

We do actually know the sizes. The way it works is that:

  • add_two has address 0x754
  • add_six has address 0x768
  • 0x10DC0 is a location that contains 0x754, so it is a pointer to a pointer to add_two
  • 0x10DC8 is a location that contains 0x768, so it is a pointer to a pointer to add_six
  • These are all pointers which are 64-bit/8-byte values

Because 0x10DC8 is 8 bytes after 0x10DC0 and they are loaded together, we know that the offset for cell 2 is 8 just from that.

@sadrabt
Copy link
Contributor Author

sadrabt commented Oct 15, 2024

we would know the size if the pointee is a single (not merged with anything else) global object (possibly single stack objects aswell), because we can look it up in the symbol table and know its size and also know the object isn't merged with something else so we can rely on our known size. But I don't think we can tell the size in general. if pointee is not global or stack we won't be able to tell it's size, same thing follows if a stack or global object has been merged with some other nontrivial (not a place holder for SSA variables) object. But you are right we are able to get more precision in certain cases

@l-kent
Copy link
Contributor

l-kent commented Oct 15, 2024

We know the size because there is an object at 0x10DC0 and an object at 0x10DC8, and we are accessing both via a 128-bit access to 0x10DC0. 0x10DC8 is offset from 0x10DC0 by 8 bytes, so cell2 should also have an offset of 8 bytes. This applies broadly, whether we are accessing global data, or the stack, or a heap location. We don't need to know anything about the symbol table for this (except that those objects exist at those addresses in this example).

Of course, if those objects have already been merged with anything else that has caused their nodes to collapse then you can't possibly do anything except collapse these together too, but that's fine.

If you don't know the sizes, you can't even know that there are overlapping accesses to begin with.

@sadrabt
Copy link
Contributor Author

sadrabt commented Oct 15, 2024

You are so correct. OF COURSE we have sizes of the pointees because we use them to determine if there is an overlapping access. Thank you for clarifying that. My symbol table to global objects (Data Structure Nodes) conversion step is missing the middle indirection level (cell for 754 pointing to add_two) from 0x10DC0 to 754 to add_two. I'll fix that next and then should be able to get more precise overlapping accesses implemented.

@l-kent
Copy link
Contributor

l-kent commented Oct 15, 2024

Thanks, that makes sense. I agree that we need to add that extra indirection layer that isn't in there at present.

@sadrabt sadrabt force-pushed the points_to_analysis_overlapping_access branch from 0185072 to 879508a Compare October 21, 2024 01:11
@l-kent
Copy link
Contributor

l-kent commented Oct 21, 2024

You accidentally overwrote some of your previous fixes with this force push

@sadrabt sadrabt force-pushed the points_to_analysis_overlapping_access branch from 879508a to 0185072 Compare October 21, 2024 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants