Use QEMU version 3.1.0 #210

Open: wants to merge 4 commits into base: master

@Hamled Hamled commented Mar 25, 2019

Summary

This PR updates Qira's build scripts to support building QEMU version 3.1.0 for that tracer.

Notes

Warning: Until the related QEMU patch PR is merged in, this will break the QEMU build script.

The tracers/qemu_build.sh script assumes that the appropriately patched QEMU source code can be found on GitHub at geohot/qemu, branch v3.1.0-qira.

@Hamled
Author

Hamled commented Mar 25, 2019

The Travis CI build is failing for this PR: https://travis-ci.org/geohot/qira/builds/511206681

I thought it would be due to the build script trying to clone a non-existent branch from geohot/qemu, but it appears that the current failure is because it cannot install the libcapstone-dev package that is required for QEMU 3.1.0.

I'm not familiar enough with the Travis CI setup to know for sure, but my best guess is that libcapstone-dev was added after 14.04 and perhaps that's what the CI is configured to use? I didn't see anything in the Travis config file that was specifying a version.

@Hamled
Author

Hamled commented Mar 25, 2019

Okay now the Travis CI tests are failing for the Right Reasons.

@geohot
Owner

geohot commented Mar 29, 2019

Ok, QEMU upstreamed. Before I merge this, we should add a correctness check that both QEMU versions produce the same log for qira_tests/bin/loop. I will try to do it tonight, but if you beat me to it that's great :)

@Hamled
Author

Hamled commented Mar 29, 2019

I'll definitely check that out, however I don't think I'll have time until next week.

Please feel free to check and merge if things are working!

@Hamled
Author

Hamled commented Apr 7, 2019

Okay, so I think I understand what you've asked me to check regarding the logs. Please let me know if I'm incorrect.

What I've done is run the Docker build script on the current master branch and then the same for this PR's branch.

Since the Dockerfile involves running the tests after everything is built, I was able to copy out the log files those tests generate (I believe it's just one trace for the qira_tests/bin/loop executable).

Comparing the generated files from both branches, there are some differences:

  • Trivial differences in _base and _strace
  • Probably trivial differences in _env
  • Several differences in the binary file with no suffix (I assume this is the QEMU tracer output).

Due to my inexperience with Qira and the fact that the trace file is a binary format, I don't have any sense yet whether those differences are meaningful or just vary with every trace.

Here are the two log directories for your inspection: https://www.dropbox.com/sh/s69taki0lzgg27t/AACCggiu5Ynr-9Q-vkx-c8eLa?dl=0
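A rough sketch of how the byte-level differences between two trace directories could be located, instead of only noting that the files differ. This is not part of Qira; the function names `diff_ranges` and `compare_trace_dirs` are made up for illustration, and it assumes both directories hold files with matching names (such as the suffix-less trace file and the _base, _env, and _strace files).

```python
from pathlib import Path


def diff_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) byte ranges where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Everything past the end of the shorter input is one trailing range.
        ranges.append((common, max(len(a), len(b))))
    return ranges


def compare_trace_dirs(dir_a, dir_b):
    """Map each file name found in either directory to its differing ranges.

    A value of None means the file exists in only one of the directories.
    """
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    names = {p.name for p in dir_a.iterdir() if p.is_file()}
    names |= {p.name for p in dir_b.iterdir() if p.is_file()}
    report = {}
    for name in sorted(names):
        pa, pb = dir_a / name, dir_b / name
        if pa.is_file() and pb.is_file():
            report[name] = diff_ranges(pa.read_bytes(), pb.read_bytes())
        else:
            report[name] = None
    return report
```

An empty list for a file means the two copies are byte-identical. The offsets only say where the bytes differ; deciding whether a difference matters still requires knowing the trace format.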

@geohot
Owner

geohot commented Jun 17, 2019

I want to merge this, but I'm concerned about regressions. If someone diffs the binary file and makes sure the differences aren't important, I'll merge.

@Hamled
Author

Hamled commented Jun 17, 2019

I've got some time again this week. I'll dig more into what the binary files are and see if I can reach a conclusion about the differences.

@Hamled
Author

Hamled commented Jul 24, 2019

I've been working on this over the weekend and into this week, and I just wanted to give a progress update. My suspicion is that the differences in the QEMU trace file are not important, but I'm continuing to dig into it more so I can verify that.

I believe they're not likely to be important based on creating a much simpler version of the loop program used for testing, which uses the write syscall directly and doesn't allocate any heap or stack memory.

Comparing multiple runs of that program using QEMU 3.1.0 and 2.5.1, the only differences in the trace file are a couple of spots where the changes for a given changelist are re-ordered because the TCG ops resulting from the x86 instruction inc eax have also been re-ordered between the versions.

My hope is that this is also the case for the dynamically linked loop program using printf, but I'm still investigating my debug output (based on the original debug lines you have in the QEMU patch). I need to do additional filtering to separate out all of the changes and changelist log lines that are from libc, which comprise > 99.9% of the instructions executed, so I can at least confirm whether the executable's resulting changes are the same.
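The filtering step described above could look roughly like the following. This is only a sketch under assumptions: the debug log format is not shown in this thread, so the code assumes each relevant line contains the instruction address as a `0x…` hex literal, and the address range of the executable is passed in by the caller. It is not the script that was actually used for this analysis.

```python
import re

# Assumption: a log line mentions its instruction address as a 0x-prefixed hex number.
HEX_ADDR = re.compile(r"0x([0-9a-fA-F]+)")


def filter_by_address(lines, lo, hi):
    """Keep only lines whose first hex address lies in the range [lo, hi)."""
    kept = []
    for line in lines:
        match = HEX_ADDR.search(line)
        if match and lo <= int(match.group(1), 16) < hi:
            kept.append(line)
    return kept
```

For example, `filter_by_address(log_lines, 0x400000, 0x401000)` would keep lines for a small non-PIE x86_64 executable and drop lines for code mapped elsewhere, such as libc. The bounds here are illustrative; the real ones would come from the _base file for the trace.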

@Hamled
Author

Hamled commented Jul 25, 2019

After filtering out all of the instructions from library code in the QEMU debug logs I'm working from, there are three categories of differences I see between 2.5.1 and 3.1.0:

  1. The leave instruction at 0x400565 (changelist 129) has slightly different changes, because the TCG ops are different now. It looks like they maybe optimized it a bit to get rid of a temporary variable.
  2. The stack pointer is consistently at a different location when _start is called by the interpreter. It's the same between all runs for the same version of QEMU, but it's always at an address 16 bytes higher in version 2.5.1. This could be the result of something else being pushed onto the stack, but it feels more likely to be due to aligning the stack differently at the beginning of execution.
    This causes a lot of noise in terms of changes, since everything reading from or writing to the stack is then using a different address, but I don't think it's an important difference.
  3. There are a small number of spots where some CPU flag has a 1 read in some runs and a 0 in others. This is not consistent across runs for the same version (although two runs in 3.1.0 happened to have no such differences, and so have exactly the same trace file except for PID). The most of these that I've seen in my runs is 8.

I'll be digging more into that final case to understand better what's going on, but so far everything is confirming that this patch is consistent with the results of the 2.5.1 patch.

Please let me know if there are additional test cases to investigate, since I figure this loop code is too simple to exercise all the things that might have changed to cause regressions.

@Hamled
Author

Hamled commented Jul 25, 2019

Quick update:

  • The stack set up by QEMU's elf loader has had some padding in it for alignment purposes since this commit, which got merged in 2.9.0. This can be confirmed in the diff of the _env files (there are additional zero bytes between the platform name and the 16 bytes of random data, to get the random bytes on a 16-byte alignment, and again before the auxiliary, environment, and args vectors are put on the stack).
  • The value being read that differs between runs of the program on the same version is the cc_src2 member of CPUX86State, an emulator-internal variable used by TCG's custom condition code implementation.
    I doubt that we should even be tracking this in the trace file, because it is unlikely to mean anything for the program's analysis, and at best constitutes noise.
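To illustrate the alignment padding from the first bullet above, here is the arithmetic for placing a block on a downward-growing stack so that it starts on a 16-byte boundary. This is a generic sketch, not QEMU's loader code; the function name `reserve_aligned` and the example addresses are made up.

```python
def reserve_aligned(sp: int, size: int, align: int = 16):
    """Reserve `size` bytes below `sp` and align the result down to `align`.

    Returns (new_sp, padding), where padding is the number of extra bytes
    skipped to reach the alignment boundary. `align` must be a power of two.
    """
    unaligned = sp - size
    new_sp = unaligned & ~(align - 1)  # clear the low bits to round down
    return new_sp, unaligned - new_sp
```

With a hypothetical stack pointer of 0x7fff0009 and a 16-byte block, the block would start at 0x7ffefff0 with 9 bytes of padding above it. Padding like this shifts every later stack address, which is consistent with the constant offset seen between the two versions.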

I see that there are some conflicts now, I will rebase this branch to correct those, should you wish to merge this in.

Hamled added 4 commits July 24, 2019 21:55
  • QEMU now uses Capstone for its disassembly output, so that dependency has been added as well.
  • QEMU version 3.1.0 requires Capstone for disassembly and Ubuntu 14.04 does not have the libcapstone-dev package.

@janbbeck
Contributor

would you be willing to help port qemu 4.1 stable to qira?

@Hamled
Author

Hamled commented Jan 29, 2020

> would you be willing to help port qemu 4.1 stable to qira?

I'd be interested for sure. Going through all the stuff to verify (as much as possible, at least) the port to 3.1 helped me learn a lot more about how qemu works in user mode.

I probably won't be able to dedicate time for the next week or two, but I'd be happy to check out whatever you're working on!

@janbbeck
Contributor

Well, I just patched in your changes from 3.1 into 4.1 and it compiled ok and qira runs without errors, but the browser shows all fields empty. Have you seen this? Any ideas?

@janbbeck
Contributor

I should clarify. When running qira /bin/ls the terminal window also shows no output, i.e. the listing is not there.

@janbbeck
Contributor

Update: Careful melding of your changes with v4.0.0-rc0 did work fine.

@janbbeck
Contributor

janbbeck commented Jan 29, 2020 via email

@Hamled
Author

Hamled commented Jan 29, 2020

Yes I would be interested in your 4.0.0 port of the qira patch for qemu.

I'll also check out what they've done to implement the plugin system. After doing the 3.1 port, I was left with a distinct impression that a long-term improvement to qira would probably benefit from a re-architecture of the qemu integration.

Specifically, it would most likely be better to separate out the qemu patch/plugin entirely, and have a well-defined format for the trace log output, which could then be loaded by qira.

Porting the patch to 4.1+ and its new plugin system might be the ideal time to make such a change to qira's design.

As for alternatives, the only thing I've seen so far (outside of some academic work) is a project called rr, from Mozilla. It only works on Linux x86_64 (because it uses some hardware- and OS-specific features to achieve deterministic replay), but that might work for your use case. Check it out: https://rr-project.org/

It's primarily designed for debugging use by developers, rather than reverse engineering, but I've been wanting to see if it could be integrated into an RE tool. I was planning to focus on Binary Ninja and maybe REDasm.

@janbbeck
Contributor

Ok, figured it out:
https://github.com/janbbeck/qemu/tree/v4.0.0-qira

I did play with rr a while back. It's nice, but does rely on ptrace and is thus similarly vulnerable to ptrace based protection.

@geohot geohot mentioned this pull request Mar 23, 2020
@janbbeck
Contributor

janbbeck commented Mar 23, 2020

Hamled, can you explain in more detail what you did in terms of binary regression testing?

edit: typo

@Hamled
Author

Hamled commented Mar 30, 2020

@janbbeck It's been quite a bit since I've thought about the regression testing for the binary trace format for qira, but here's what I remember so far of my work:

There are two parts to what I did:

  • First was to create a minimal, statically linked binary that basically does the same thing as the code in loop.c, but with no code from libc.

    That code is here, it's based on linux-syscall-support (which is added to that branch in the parent commit). This means while it will compile for any platform that LSS supports, it only works on Linux.

    Honestly the simple assembly tests could also suffice for this purpose; the whole idea is to have a trace that is as small and as deterministic as possible with qira, through whatever means.

    This is important because the next step is to...

  • Manually compare the same trace as generated by qira/qemu-2.12 and qira/qemu-3.1.0. Comparing the binary trace files is doable but a huge pain, so I re-enabled, updated, and extended the debug logging code that @geohot had put into his qemu patch.

    The code changes to support this happened in both the qira project and the qemu fork:

    The summary of those changes is to fix compiler errors related to the format string portability issues in the original debug logs, make QIRA_DEBUG use the qemu logging system (this may be new since 2.12 idk), and update qira's qemu driver to pass the necessary command-line options to enable that logging.

    After that, I had some human-readable debug logs to actually compare. There were a lot of differences. After quite a bit of diff-fu using Beyond Compare, I was able to assign "blame" for each of them to particular changes in qemu's emulator implementation.

    Unfortunately that part is where my memory of the specifics is pretty hazy. All I can say is that Beyond Compare's ability to re-align diffs, filter out most of the lines, and other fanciness proved crucial to being able to get more than just noise out of it (other diff tools might support this, idk).

If you want to go through the above steps to generate a very minimal debug log for a qira/qemu-4.0.0 build and the same for qira/qemu-2.12, I can try to look at them again in Beyond Compare and let you know if any particular tricks come back to me.

@Hamled
Author

Hamled commented Mar 30, 2020

In case you should find it useful, I've uploaded the log files I actually used for my analysis: https://github.com/Hamled/qira/blob/qemu-3.1.0-debug-logs/logs/qemu-trace-logs.tar.gz

The comment for the commit adding that file explains its contents, but I'll paste it because it's in markdown format and will look nicer here:
Qemu trace and debug logs comparing qira patches

These are log files I used to compare the qemu traces and qira-related
debug logs to determine if there were any regressions in the binary
trace logs generated, since they were different between the patch for
qemu 2.12 and qemu 3.1.0.

The logs are committed as an archive, because in total they're over 600
MB uncompressed. The archive has the following structure:

The root has directories for each test case run: loop and minloop.
Loop is the standard loop test from test_auto/source-autogen/loop.c.
Minloop is the same as loop, but without using anything from libc to
achieve deterministic execution (as much as possible with qemu).

Within each of those is a directory for the platform used, either 16.04
or local. "16.04" is the docker container running on Ubuntu 16.04, and
"local" is when qemu was just on my local machine, running Arch Linux
updated to whatever was current in July of 2019.

Within each platform directory is the directory for the qemu version,
2.5.1 or 3.1.0, and within there are directories for traces from each
run of the binary.

Each trace directory has the following files:

  • 0 - the binary trace file
  • 0_base - The contents of /proc/self/maps for qemu when the trace
    was run. The program's maps are in there, but it clearly also has maps
    for things that only qemu uses (or are mapped in by qemu for some
    reason).
  • 0_env - Contents of the stack for the binary being traced, just
    before the entry point is executed.
  • 0_qemu - Debug log from qemu, including TCG-related logging about
    each operation executed, as well as qira-specific logging for each
    change that is included in the binary trace file, etc.
  • 0_qemu_chng - Same as above but filtered to only include the log
    lines that correspond to data which is output into the binary trace
    file (e.g. setting changelists and recording of changes within each
    changelist).
  • 0_strace - strace log for the binary being traced

The traces for the loop test also include 0_qemu_proc and
0_qemu_proc_chng, which are the same logs, but filtered to not include
log lines for instructions executed by library code. This cuts down on
the noise significantly, but for your own sanity if you're reviewing
these files, the minloop test traces are vastly smaller.

There are also a couple of other random files:

  • loop/16.04/qemu-2.5.1/ld-2.23.so - Cannot recall why this is here
    anymore, maybe I needed a specific version of ld inside of the docker
    container?
  • filter_proc_changelists.rb - Ruby script that filters library code
    out of the log files, as mentioned above.

@Hamled
Author

Hamled commented Mar 30, 2020

As an example for why the minloop test is useful, if you compare the changes-only log output from multiple traces on the same version and platform, there are zero differences.

This deterministic basis I think is necessary for then comparing the diffs of runs from the two versions to identify changes that are only due to differences in qemu's TCG implementation.
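A quick way to confirm that kind of run-to-run determinism is to hash the same log file from every run and see whether they all fall into one group. The sketch below is an illustration only; `group_identical` is a made-up helper, and the choice of which file to hash (for example 0_qemu_chng from each run directory) is up to the caller.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_identical(paths):
    """Group file paths by content; each returned list holds identical files."""
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    return list(groups.values())
```

If the result contains a single group, every run produced the same bytes. More than one group means at least one run diverged, and the smaller groups point to the runs worth diffing first.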

This doesn't apply to the full log, at least the ones I have, because of changes in where the memory is mapped due to ASLR. While qemu's user mode doesn't actually implement ASLR for the binary it is emulating, a lot of qira's debug logging (like read/write logs) includes addresses that are in the "host" address space.

I dunno why I didn't think of it at the time, but probably if you ran qemu with ASLR turned off, these would produce logs that were also exactly the same between multiple runs? It's probably worth doing either way.
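One way to try this on Linux, sketched here as an untested assumption rather than something done for this PR, is to launch the tracer through util-linux's `setarch` with its `-R` (`--addr-no-randomize`) option, which disables address space randomization for the child process. The helper name `without_aslr` is made up.

```python
import platform


def without_aslr(cmd):
    """Prefix a command so that setarch runs it with ASLR disabled (-R)."""
    return ["setarch", platform.machine(), "-R", *cmd]
```

For example, `subprocess.run(without_aslr(["qira", "/bin/ls"]), check=True)` would start qira with randomization turned off for it and its children. Whether that alone makes the full debug logs identical between runs would still need to be checked.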

The standard loop test case has changes being recorded which are just straight-up different in multiple runs from the same version, sometimes different data is being written than other times. I can't say for sure, but it might just be a result of how complicated the printf code is, like maybe it's doing heap allocations and then malloc and free have to walk a data structure which isn't exactly the same on each run? Dunno.

@janbbeck
Contributor

janbbeck commented Apr 3, 2020 via email
