Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decompiler: fix indexing into arrays #6722

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kkots
Copy link

@kkots kkots commented Jul 15, 2024

Ghidra decompiler is sometimes struggling to show submember array access.
For example, given the following structs:

enum MyEnum {
    ELEMENT_1 = 0,
    ELEMENT_2 = 0xf1
};
struct MyStruct {
    int garbage;
    MyEnum ar[4];
};

and the following source code:

int main()
{
        // mem is a MyStruct*
	...
        int index = getc(stdin) - '0';
	mem->ar[index] = 0xf1;
        ...
}

that compiled into the following 32-bit x86 code:

// eax coming from getc call
        00414e41 83 e8 30        SUB        EAX,0x30
        00414e50 c7 44 81        MOV        dword ptr [ECX + EAX*0x4 + 0x4],0xf1
                 04 f1 00 
                 00 00

the decompiled output produced would be this:

// PTR_0041a1b8 is the 'mem' from source code, a MyStruct*
// local_18 is coming from a getc call
local_18 = local_18 + -0x30;
*(undefined4 *)((int)PTR_0041a1b8 + local_18 * 4 + 4) = 0xf1;

With this fix it should produce:

local_18 = local_18 + -0x30;
PTR_0041a1b8->ar[local_18] = New_Name_(1);

This is caused by RulePtrArith traveling through a CPUI_INT_MULT and latching onto a CPUI_INT_ADD operation happening on the line above the array access, which makes it distribute the members of the addition through the *4 multiplication, producing a local_18 * 4 and a -0x30 * 4 + 4, which doesn't lead to any member of the struct, and then it just gives up. This is fixed by letting it undo the 'distribute' that it did and try again in case if it didn't find the appropriate struct member.

EDIT: Another example

        00dd9359 8b b1 b0        MOV        ESI,dword ptr [ECX + 0xb0]
                 00 00 00

...
        00dd9368 2b c6           SUB        EAX,ESI

...
        00dd9379 8b 84 81        MOV        EAX,dword ptr [ECX + EAX*0x4 + 0x32c]
                 2c 03 00 00

It becomes a tree:

MOV EAX, CPUI_INT_ADD(
    ptr,
    CPUI_INT_ADD(
        CPUI_INT_MULT(
            CPUI_INT_ADD(
                CPUI_LOAD(...),
                CPUI_INT_MULT(
                    CPUI_LOAD(...),
                    -1
                )
            ),
            4
        ),
        0x32c
    )
)

When Ghidra sees the -1, it checks if 0xfffffffc is greater than the structure's size, which it is, and then short-circuits the whole expression tree as invalid and RulePtrArith just doesn't get applied.
By setting preventDistribution to true, this behavior can be stopped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Decompiler Status: Triage Information is being gathered
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants