added functionality to track overlapping accesses #258

sadrabt · 2024-10-14T05:49:07Z

This pull request address the issue of overlapping accesses for DSA as discussed by issue #254

Remaining:

create a test for global overlapping (can use clang/jumptable)
create a test for stack overlapping

l-kent · 2024-10-14T23:17:43Z

I don't think this is working correctly yet.

For indirect_calls/jumptable/clang:BAP, these are the relevant instructions with overlapping stack accesses:

00000413: mem := mem with [R31 + 0x10, el]:u128 <- V0

00000429: R8 := mem[R31 + 0x10, el]:u64

00000438: R8 := mem[R31 + 0x18, el]:u64

The 128 bit access to R31 + 16 overlaps with the 64 bit access to R31 + 16 and the 64 bit access to R31 + 24.

However, the analysis doesn't correctly determine that these overlap. The results it gives are that:

00000429_R8 points to node 438 which contains pointers to add_six and add_two and is collapsed.
00000438_R8 points to node 447 which is empty.

There is also node 450 which contains Stack_16 (of size 16) and Stack_12 (of size 4). It has three cells:

0 points to node 438
8 points to node 447
12 points to node 439 (which is also empty).

This is incorrect - accesses to R31 + 16 and R31 + 12 never overlap, and these offsets don't make any sense.

Stack_12 and Stack_16 do not overlap and the analysis should not find that they do.
00000429_R8 should contain a pointer to add_two (this is already fine)
00000438_R8 should contain a pointer to add_six (this is not correct as-is)

It's alright if they both point to a collapsed node containing pointers both add_two and add_six, but it's even better if we can maintain the offsets if possible?

l-kent · 2024-10-15T02:54:02Z

This seems to be working now so I'll merge it soon.

It would be nice to be able to improve the precision so it can tell that add_two is at offset 0 and add_six is at offset 8 in the combined node and then tell that 00000429_R8 is pointing to add_two, 00000438_R8 is pointing to add_six, and 0000040b_V0 is pointing to both, but that would probably be a fair bit more complex and isn't a must have.

src/main/scala/analysis/data_structure_analysis/Graph.scala

src/main/scala/analysis/data_structure_analysis/LocalPhase.scala

l-kent · 2024-10-15T02:59:07Z

Oh you were planning to write tests weren't you? I'll wait for you do to those before merging.

sadrabt · 2024-10-15T04:23:08Z

// 0x10DC0 contains a pointer to the procedure add_two: separate node with a single cell at the start which points to add_two
// 0x10DC8 contains a pointer to the procedure add_six: separate node with a single cell at the start which points to add_six

once the analysis process
V0 := mem[0x10DC0, el]:u128
it will merge the nodes for 0x10DC0 and 0x10DC8 with offset 8 so that the resulting node will have a cell at offset 0 which corresponds to 0x10DC0 and a cell at offset 8 corresponding to 0x10DC8. Each cell still has a distinct pointee

here currently the analysis will grow the cell corresponding to 0x10DC0 the size of the access. it causes that cell to overlap the cell corresponding to 0x10DC8 and as a result their outgoing edges (their pointees) are also merged. This causes the collapse

an alternative would be to not grow 0x10DC0's cell but instead do a multi load/write.
instead of growing the 0x10DC0's cell, we start from that cell and continue to merge each consecutive cell's pointee with a cell at a different offset in V0's Node. The problem here is that we don't have a way of telling which offsets are the correct ones in V0's node to be merged with the pointees. I think one possibility would be to increment a set number (64 bits or one pointer's length) since we are only concerned with pointers. However, I'm unsure of the full implications of this for soundness

l-kent · 2024-10-15T04:44:48Z

instead of growing the 0x10DC0's cell, we start from that cell and continue to merge each consecutive cell's pointee with a cell at a different offset in V0's Node. The problem here is that we don't have a way of telling which offsets are the correct ones in V0's node to be merged with the pointees.

Can you explain this further? I don't quite follow what the problem is, why don't we have a way of telling which offsets to use?

sadrabt · 2024-10-15T04:51:02Z

after V0 := mem[0x10DC0, el]:u128
we want V0 node to look something like
V0 = {Cell1 == pointee(0x10DC0), cell2 == pointee(0x10DC8)},
I don't think we have a way of determining what offset cell2 begins at because we don't know the size of the pointee(0x10DC0)

l-kent · 2024-10-15T05:03:14Z

We do actually know the sizes. The way it works is that:

add_two has address 0x754
add_six has address 0x768
0x10DC0 is a location that contains 0x754, so it is a pointer to a pointer to add_two
0x10DC8 is a location that contains 0x768, so it is a pointer to a pointer to add_six
These are all pointers which are 64-bit/8-byte values

Because 0x10DC8 is 8 bytes after 0x10DC0 and they are loaded together, we know that the offset for cell 2 is 8 just from that.

sadrabt · 2024-10-15T05:21:07Z

we would know the size if the pointee is a single (not merged with anything else) global object (possibly single stack objects aswell), because we can look it up in the symbol table and know its size and also know the object isn't merged with something else so we can rely on our known size. But I don't think we can tell the size in general. if pointee is not global or stack we won't be able to tell it's size, same thing follows if a stack or global object has been merged with some other nontrivial (not a place holder for SSA variables) object. But you are right we are able to get more precision in certain cases

l-kent · 2024-10-15T05:51:02Z

We know the size because there is an object at 0x10DC0 and an object at 0x10DC8, and we are accessing both via a 128-bit access to 0x10DC0. 0x10DC8 is offset from 0x10DC0 by 8 bytes, so cell2 should also have an offset of 8 bytes. This applies broadly, whether we are accessing global data, or the stack, or a heap location. We don't need to know anything about the symbol table for this (except that those objects exist at those addresses in this example).

Of course, if those objects have already been merged with anything else that has caused their nodes to collapse then you can't possibly do anything except collapse these together too, but that's fine.

If you don't know the sizes, you can't even know that there are overlapping accesses to begin with.

sadrabt · 2024-10-15T06:31:55Z

You are so correct. OF COURSE we have sizes of the pointees because we use them to determine if there is an overlapping access. Thank you for clarifying that. My symbol table to global objects (Data Structure Nodes) conversion step is missing the middle indirection level (cell for 754 pointing to add_two) from 0x10DC0 to 754 to add_two. I'll fix that next and then should be able to get more precise overlapping accesses implemented.

l-kent · 2024-10-15T06:35:37Z

Thanks, that makes sense. I agree that we need to add that extra indirection layer that isn't in there at present.

l-kent · 2024-10-21T01:19:52Z

You accidentally overwrote some of your previous fixes with this force push

sadrabt added 2 commits October 15, 2024 09:47

added functionality to track overlapping accesses

45fc3f6

fixed issue with outgoing cells in local phase

159c5d9

sadrabt force-pushed the points_to_analysis_overlapping_access branch from 4751106 to 159c5d9 Compare October 15, 2024 00:32

fixed bug with not growing cell size if the cell is stack/global

a998590

l-kent reviewed Oct 15, 2024

View reviewed changes

src/main/scala/analysis/data_structure_analysis/Graph.scala Outdated Show resolved Hide resolved

l-kent reviewed Oct 15, 2024

View reviewed changes

src/main/scala/analysis/data_structure_analysis/Graph.scala Outdated Show resolved Hide resolved

l-kent reviewed Oct 15, 2024

View reviewed changes

src/main/scala/analysis/data_structure_analysis/LocalPhase.scala Outdated Show resolved Hide resolved

minor fixes

0185072

This comment was marked as outdated.

Sign in to view

This comment was marked as resolved.

Sign in to view

sadrabt force-pushed the points_to_analysis_overlapping_access branch from 0185072 to 879508a Compare October 21, 2024 01:11

sadrabt force-pushed the points_to_analysis_overlapping_access branch from 879508a to 0185072 Compare October 21, 2024 02:20

sadrabt added 3 commits October 21, 2024 13:43

new dsa test programs

cf16bff

added a layer of indirection for global function entries

58c68c7

improved precision when handling overlapping accesses

586be74

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

added functionality to track overlapping accesses #258

added functionality to track overlapping accesses #258

sadrabt commented Oct 14, 2024

l-kent commented Oct 14, 2024

l-kent commented Oct 15, 2024

l-kent commented Oct 15, 2024

sadrabt commented Oct 15, 2024

This comment was marked as outdated.

This comment was marked as resolved.

l-kent commented Oct 15, 2024

sadrabt commented Oct 15, 2024 •

edited

Loading

l-kent commented Oct 15, 2024

sadrabt commented Oct 15, 2024

l-kent commented Oct 15, 2024 •

edited

Loading

sadrabt commented Oct 15, 2024 •

edited

Loading

l-kent commented Oct 15, 2024

l-kent commented Oct 21, 2024

added functionality to track overlapping accesses #258

Are you sure you want to change the base?

added functionality to track overlapping accesses #258

Conversation

sadrabt commented Oct 14, 2024

l-kent commented Oct 14, 2024

l-kent commented Oct 15, 2024

l-kent commented Oct 15, 2024

sadrabt commented Oct 15, 2024

This comment was marked as outdated.

This comment was marked as resolved.

l-kent commented Oct 15, 2024

sadrabt commented Oct 15, 2024 • edited Loading

l-kent commented Oct 15, 2024

sadrabt commented Oct 15, 2024

l-kent commented Oct 15, 2024 • edited Loading

sadrabt commented Oct 15, 2024 • edited Loading

l-kent commented Oct 15, 2024

l-kent commented Oct 21, 2024

sadrabt commented Oct 15, 2024 •

edited

Loading

l-kent commented Oct 15, 2024 •

edited

Loading

sadrabt commented Oct 15, 2024 •

edited

Loading