Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include semantics for each instruction #72

Open
jrmuizel opened this issue Nov 7, 2017 · 5 comments
Open

Include semantics for each instruction #72

jrmuizel opened this issue Nov 7, 2017 · 5 comments

Comments

@jrmuizel
Copy link

jrmuizel commented Nov 7, 2017

ARM includes a full machine readable description of the semantics of each instruction in their xml files: https://developer.arm.com/products/architecture/a-profile/exploration-tools https://alastairreid.github.io/alastairreid.github.io/ARM-v8a-xml-release/

I'm not sure if it's practical to get to that level of accuracy/completeness but the more information available the better.

@markcharney
Copy link
Contributor

markcharney commented Nov 7, 2017

hi. yeah, as you might imagine I get this request quite often. I agree would be useful but it is way beyond anything I am able to do on my own. A few years ago I started Intel’s ZSIM...

@markcharney
Copy link
Contributor

markcharney commented Nov 7, 2017

...(hit wrong button, silly phone) which became Intel Simulation and Analysis Engine (Intel SAE) with years of work from many other dedicated and talented engineers. But that is not an open source project. And it is C code...

@pgoodman
Copy link

pgoodman commented Nov 10, 2017

Shameless plug, but Remill has C++ implementations of instructions that map to XED iforms. Also in progress is AArch64 support, based on the documents referenced by @jrmuizel.

For example, the semantics for ADDSD are:

template <typename D, typename S1, typename S2>
DEF_SEM(ADDSD, D dst, S1 src1, S2 src2) {
  auto lhs = FReadV64(src1);
  auto rhs = FReadV64(src2);
  auto sum = FAdd(FExtractV64(lhs, 0), FExtractV64(rhs, 0));
  auto res = FInsertV64(lhs, 0, sum);
  FWriteV64(dst, res);
  return memory;
}

And those semantics are mapped to XED iforms in the following way:

DEF_ISEL(ADDSD_XMMsd_MEMsd) = ADDSD<V128W, V128, MV64>;
DEF_ISEL(ADDSD_XMMsd_XMMsd) = ADDSD<V128W, V128, V128>;
IF_AVX(DEF_ISEL(VADDSD_XMMdq_XMMdq_MEMq) = ADDSD<VV128W, VV128, MV64>;)
IF_AVX(DEF_ISEL(VADDSD_XMMdq_XMMdq_XMMq) = ADDSD<VV128W, VV128, VV128>;)

This document describes a bit more about the process that Remill follows.

@jrmuizel
Copy link
Author

Neat. The fact that this description uses templates means that it's probably possible to extract the semantics for use in languages other than C++. I'll take a look.

It looks like https://github.com/StanfordPL/stoke also has a model for a lot of x86 semantics. However it seems like it's built more around a model of querying for an SMT formula for a given x86 instruction and not, for example, being able to generate an interpreter given the semantics.

It would be interesting to use the Remill templates to generate SMT formulas and check them against stoke.

@pgoodman
Copy link

@jrmuizel We use these Semantics in McSema, and some people run KLEE on McSema-produced bitcode. KLEE converts LLVM IR into SMT queries.

I also use these semantics in a dynamic binary translator. I have an issue filed with the Stoke team to use Remill semantics, because I believe we support more instructions than they do.

One way to get a better understanding of how to use Remill is to see how the test case systems lift instructions. You can see that here and here.

You can find out more about Remill or talk to me (username pag) in the #binary-lifting channel of the Empire Hacking slack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants