Include semantics for each instruction #72

jrmuizel · 2017-11-07T03:49:07Z

ARM includes a full machine readable description of the semantics of each instruction in their xml files: https://developer.arm.com/products/architecture/a-profile/exploration-tools https://alastairreid.github.io/alastairreid.github.io/ARM-v8a-xml-release/

I'm not sure if it's practical to get to that level of accuracy/completeness but the more information available the better.

markcharney · 2017-11-07T04:02:44Z

hi. yeah, as you might imagine I get this request quite often. I agree would be useful but it is way beyond anything I am able to do on my own. A few years ago I started Intel’s ZSIM...

markcharney · 2017-11-07T04:06:11Z

...(hit wrong button, silly phone) which became Intel Simulation and Analysis Engine (Intel SAE) with years of work from many other dedicated and talented engineers. But that is not an open source project. And it is C code...

pgoodman · 2017-11-10T19:17:12Z

Shameless plug, but Remill has C++ implementations of instructions that map to XED iforms. Also in progress is AArch64 support, based on the documents referenced by @jrmuizel.

For example, the semantics for ADDSD are:

template <typename D, typename S1, typename S2>
DEF_SEM(ADDSD, D dst, S1 src1, S2 src2) {
  auto lhs = FReadV64(src1);
  auto rhs = FReadV64(src2);
  auto sum = FAdd(FExtractV64(lhs, 0), FExtractV64(rhs, 0));
  auto res = FInsertV64(lhs, 0, sum);
  FWriteV64(dst, res);
  return memory;
}

And those semantics are mapped to XED iforms in the following way:

DEF_ISEL(ADDSD_XMMsd_MEMsd) = ADDSD<V128W, V128, MV64>;
DEF_ISEL(ADDSD_XMMsd_XMMsd) = ADDSD<V128W, V128, V128>;
IF_AVX(DEF_ISEL(VADDSD_XMMdq_XMMdq_MEMq) = ADDSD<VV128W, VV128, MV64>;)
IF_AVX(DEF_ISEL(VADDSD_XMMdq_XMMdq_XMMq) = ADDSD<VV128W, VV128, VV128>;)

This document describes a bit more about the process that Remill follows.

jrmuizel · 2017-11-10T20:44:24Z

Neat. The fact that this description uses templates means that it's probably possible to extract the semantics for use in languages other than C++. I'll take a look.

It looks like https://github.com/StanfordPL/stoke also has a model for a lot of x86 semantics. However it seems like it's built more around a model of querying for an SMT formula for a given x86 instruction and not, for example, being able to generate an interpreter given the semantics.

It would be interesting to use the Remill templates to generate SMT formulas and check them against stoke.

pgoodman · 2017-11-10T20:57:34Z

@jrmuizel We use these Semantics in McSema, and some people run KLEE on McSema-produced bitcode. KLEE converts LLVM IR into SMT queries.

I also use these semantics in a dynamic binary translator. I have an issue filed with the Stoke team to use Remill semantics, because I believe we support more instructions than they do.

One way to get a better understanding of how to use Remill is to see how the test case systems lift instructions. You can see that here and here.

You can find out more about Remill or talk to me (username pag) in the #binary-lifting channel of the Empire Hacking slack.

markcharney closed this as completed Nov 7, 2017

markcharney reopened this Nov 7, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include semantics for each instruction #72

Include semantics for each instruction #72

jrmuizel commented Nov 7, 2017

markcharney commented Nov 7, 2017 •

edited

Loading

markcharney commented Nov 7, 2017 •

edited

Loading

pgoodman commented Nov 10, 2017 •

edited

Loading

jrmuizel commented Nov 10, 2017

pgoodman commented Nov 10, 2017

Include semantics for each instruction #72

Include semantics for each instruction #72

Comments

jrmuizel commented Nov 7, 2017

markcharney commented Nov 7, 2017 • edited Loading

markcharney commented Nov 7, 2017 • edited Loading

pgoodman commented Nov 10, 2017 • edited Loading

jrmuizel commented Nov 10, 2017

pgoodman commented Nov 10, 2017

markcharney commented Nov 7, 2017 •

edited

Loading

markcharney commented Nov 7, 2017 •

edited

Loading

pgoodman commented Nov 10, 2017 •

edited

Loading