Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from Capstone to Zydis (x86 architecture) #4832

Open
wants to merge 7 commits into
base: dev
Choose a base branch
from

Conversation

tushar3q34
Copy link
Contributor

@tushar3q34 tushar3q34 commented Jan 10, 2025

Your checklist for this pull request

  • I've read the guidelines for contributing to this repository
  • I made sure to follow the project's coding style
  • I've documented or updated the documentation of every function and struct this PR changes. If not so I've explained why.
  • I've added tests that prove my fix is effective or that my feature works (if possible)
  • I've updated the rizin book with the relevant information (if needed)

Detailed description

  • Used ZydisDecodedInstruction and ZydisDecodedOperand instead of cs_x86 and cs_x86_op
  • Changed ESIL, RzIL and asm for x86 architecture
  • Currently, this PR fails ~300 tests due to different formatting and some x86-16 specific instructions

TODO in further commits :

  • Change Tests/code to account for differences in outputs presented by Capstone and Zydis
  • Find a workaround for x86-16 specific instructions (lcall,ljump etc)
  • Document functions of analysis_x86 and asm_x86 files

Test plan

...

Closing issues

closes #4709

...

@tushar3q34
Copy link
Contributor Author

tushar3q34 commented Jan 10, 2025

Changes in the tests/code :

  • Extra spaces, extra keywords (qword,dword) in disassembly. For the keywords part, I think there are some formatting options in Zydis so I will try looking at that
  • Changes in ESIL outputs
  • Handle ljmp and lcall. Zydis only uses call and jmp with deciding whether to far jump/call or not depending on operands. So I will look into it and handle the code and necessary testcases in further commits.
  • Handle warnings

@wargio
Copy link
Member

wargio commented Jan 10, 2025

Very well done.

@wargio
Copy link
Member

wargio commented Jan 12, 2025

Please fix the compilation issues. see the CI jobs. also run clang format (via the python script)

@tushar3q34
Copy link
Contributor Author

Please fix the compilation issues. see the CI jobs. also run clang format (via the python script)

Yes, I did not get time yesterday. I will do it asap.

Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice job! Looks good so far.

Regarding the renaming:
I think it is better if you add a file which maps ZYDIS_ enums to the X86_ enums. Simply for readability and less changes.

Building:
Capstone can be built without any x86 code. You just have to remove -DCAPSTONE_HAS_X86 in subprojects/packagefiles/capstone-*/meson.build. This also makes the binary way smaller.

cc @wargio

directory = zydis
patch-directory = zydis
depth = 1
revision = master
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any big change in the master branch? Otherwise sticking to a released package (and using wrap-file) is preferred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on this one :)
Did we agreed on something and I just forgot it?

@tushar3q34
Copy link
Contributor Author

Hello, I am working on these and I will commit the changes soon.
Also, in one of the jobs
subprojects/zydis/meson.build:1:0: ERROR: Got unknown keyword arguments "license_files"
This is because it uses meson==0.61.5 while the requirement for Zydis is meson_version >= 1.3
One of the solutions could be rewriting the build file of Zydis using a patch to support >=0.59 versions. Is there any better way to handle this?

@Rot127
Copy link
Member

Rot127 commented Jan 12, 2025

One of the solutions could be rewriting the build file of Zydis using a patch to support >=0.59 versions. Is there any better way to handle this?

Ouh, nice. Didn't know it uses Meson as well. Why is subprojects/packagefiles/zydis/meson.build needed then?
Doesn't it overwrite the original file?

cc @wargio For what to do about updating meson. But bumping the version seems fine to me. I build with 1.4.0 normally.

@Rot127
Copy link
Member

Rot127 commented Jan 12, 2025

Seems like we need the old meson version for TinyCC and old debian builds. So it must be the patch then :(

@tushar3q34
Copy link
Contributor Author

Doesn't it overwrite the original file?

It customises the build process, doesn't overwrite it.

So it must be the patch then

Okay, i will look into this. There are a few things that old version of meson doesn't have like enable_auto_if() and c_static_args etc as an argument in library. So I will patch them using .diff files while cloning using wrap-git or wrap-file

@wargio
Copy link
Member

wargio commented Jan 13, 2025

So I will patch them using .diff files while cloning using wrap-git or wrap-file

dont diff patch. just create a folder with the patched files and define it in the wrap file. this will overwrite the files in the pulled folder.

@tushar3q34
Copy link
Contributor Author

The latest version of Zydis (4.1.0) does not support meson build system so we cannot use a wrap-file.
The older versions support CMake and it is possible to run CMake command using meson (which will also not cause the issue with meson version ) but some of the CI jobs do not have CMake.
Since we cannot use wrap-file, we cannot directly patch files using a folder as wrap-git doesn't support it. What should be done? @wargio

@wargio
Copy link
Member

wargio commented Jan 13, 2025

so we cannot use a wrap-file.

capstone does not support meson neither but we use meson anyway. just create a meson.build file in the patch dir and add the sources and defines.

Examples:

@tushar3q34 tushar3q34 force-pushed the dist-capstone-to-zydis-x86 branch 2 times, most recently from 8854385 to ed87201 Compare January 14, 2025 23:27
@tushar3q34
Copy link
Contributor Author

@wargio So the current issue with the builds is that there is no Zycore dependency installed in Zydis by default. To do that, we would have to build Zycore inside Zydis using another meson.build file.
A better way would be to use amalgamated form of Zydis. So we only need to compile one file, Zydis.c using only one header Zydis.h. A disadvantage to that will be that we will not have a well defined structure of the subproject.
Which one do you suggest?

@wargio
Copy link
Member

wargio commented Jan 16, 2025

Why not just add a folder with the wrap file of the dependency within the patch folder of zydis? Or if available just use a source which contains all the deps needed. This is quite common

@Rot127
Copy link
Member

Rot127 commented Jan 16, 2025

Agree with @wargio. Meson has (luckily) the ability to make subprojects depend on subprojects.

@tushar3q34 tushar3q34 force-pushed the dist-capstone-to-zydis-x86 branch from ed87201 to 75ecbfd Compare January 21, 2025 16:54
@tushar3q34
Copy link
Contributor Author

tushar3q34 commented Jan 21, 2025

Hello. So I tried building zycore inside zydis but I faced the following issues:

  • When I try to get zycore from zydis as a subproject, it gets redirected as wrap-redirect and zycore is created outside. Hence we need to git clean subproject every time before rebuilding it.
  • Since zycore also uses meson ver >= 1.3, we have to write patch for it too and writing a patch in the subprojects directory of the patch of zydis does not work due to the wrap-redirect (it cannot identify the location of build file, according to meson docs, subproject of a subproject is always built outside and then transferred inside).

I think there must definitely be a solution but I cannot seem to get it due to my inexperience in meson build system. So I would really like some help in this direction. As of now, I have pushed the changes with amalgamated form of zydis.

In the meanwhile, I resolved ~150 tests that were mostly formatting related. There are still ~70 tests that I need to review and debug code accordingly. I'm working on it. Thanks.

Edit : I think I merged dev into my branch as well. Should I remove it?
After merging from dev, more tests fail. I will change tests and debug code accordingly.
For now, opex, x86_16 and RzIL code has to be fixed.

@tushar3q34 tushar3q34 force-pushed the dist-capstone-to-zydis-x86 branch from 3e72d6c to 187ebe4 Compare January 22, 2025 06:44
@tushar3q34 tushar3q34 force-pushed the dist-capstone-to-zydis-x86 branch 4 times, most recently from 134a850 to 2702f06 Compare February 20, 2025 13:47
@tushar3q34 tushar3q34 requested review from wargio and Rot127 February 20, 2025 15:22
@tushar3q34
Copy link
Contributor Author

tushar3q34 commented Feb 20, 2025

@wargio, @notxvilka, @Rot127 I have resolved the tests. Please review the changes.

Should I also refactor analysis_x86_zydis.c file? It had a context struct X86CSContext (now X86ZYDISContext) which was implemented but not used. As of now, I'm passing operands, instructions and pc value separately as parameters to all the functions while passing only the context object can be sufficient.

I will also remove the unused code asap.

76 * pointers
66 * pointers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check what exactly is different? you can use the fl commands and diff this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fl result remains unchanged

0x00400730 f3c3 repz ret
0x00400730 f3c3 ret
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, interesting change.

Copy link
Member

@wargio wargio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, so many changes, great work. most of them looks good. i will take some days to review everything.

Could you do us a favour and try to use rz-ghidra and jsdec on various x86 binaries just to see if they crashes?

@wargio
Copy link
Member

wargio commented Feb 20, 2025

Yes, you can refactor that file.

@tushar3q34
Copy link
Contributor Author

Could you do us a favour and try to use rz-ghidra and jsdec on various x86 binaries just to see if they crashes?

Yes, I will check.

@tushar3q34 tushar3q34 mentioned this pull request Feb 20, 2025
5 tasks
@notxvilka
Copy link
Contributor

@tushar3q34 please rebase.

@tushar3q34 tushar3q34 force-pushed the dist-capstone-to-zydis-x86 branch from 2702f06 to f9e83e3 Compare February 22, 2025 06:06
@github-actions github-actions bot removed the RzCore label Feb 22, 2025
@tushar3q34 tushar3q34 force-pushed the dist-capstone-to-zydis-x86 branch from f9e83e3 to b31ab81 Compare February 22, 2025 06:32
@tushar3q34 tushar3q34 force-pushed the dist-capstone-to-zydis-x86 branch from b31ab81 to 1e8b102 Compare February 22, 2025 07:09
Copy link
Member

@Rot127 Rot127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only reviewed the first two commits, but so far fantastic job!

I have mostly nitpicks (they should be really quick to resolve, so don't be scared of the number :) ).

I can't see who made the original edits, so apologies if I requested changes to code not from you. Please be so kind and fix them anyways, since it improves the overall stability.

Please also compile Rizin with ASAN enabled (see BUILDING.md) and run the asm tests with the LEAK sanitizer enabled.
Also the x86 esil tests (db/analysis/x86*).

You can ignore leaks which are not part of the x86 code (you can also suppress them: https://github.com/google/sanitizers/wiki/AddressSanitizerLeakSanitizer#suppressions).

Some general comments:

  • Take care of marking all pointers as const if you know they should not change (be it during initialization or for function parameters).
  • Use size_t instead of int if you know the value should never be negative.
  • Initialize indices (i) of loops always within the loop. If they are not used outside of it.

directory = zydis
patch-directory = zydis
depth = 1
revision = master
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping on this one :)
Did we agreed on something and I just forgot it?

struct Getarg {
ZydisDecodedInstruction *zydecode;
ZydisDecodedOperand *zydeop;
int bits;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be a size_t? bits shouldn't be negative.


static void hidden_op(ZydisDecodedInstruction *zydecode, ZydisDecodedOperand *zydeop, int mode) {
unsigned int mnemonic = zydecode->mnemonic;
int regsz = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, size_t.

static void opex(RzStrBuf *buf, X86ZYDISContext *zydx, int mode, ut64 addr) {
ZydisDecodedInstruction *zydecode = zydx->zydecode;
ZydisDecodedOperand *zydeop = zydx->zydeop;
int i;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not to be used outside of the loop. So please initialize it in the for loop.

pj_end(pj);
}

static bool is_xmm_reg(ZydisDecodedOperand op) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const ZydisDecodedOperand op

case ZYDIS_MNEMONIC_PUSHFQ:
switch (zydeop[0].type) {
case ZYDIS_OPERAND_TYPE_MEMORY:
if (zydeop[0].mem.disp.value && !zydeop[0].mem.base && !zydeop[0].mem.index) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't assume here zydeop[0].mem.base is 0 for the invalid case please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also for index

Comment on lines +3174 to +3179
char mnemonic[256];
ZydisFormatterInit(&formatter, ZYDIS_FORMATTER_STYLE_INTEL);
ZydisFormatterFormatInstruction(&formatter, zydx->zydecode, zydx->zydeop,
zydx->zydecode->operand_count, mnemonic, sizeof(mnemonic), addr, NULL);
ZydisFormatterSetProperty(&formatter, ZYDIS_FORMATTER_PROP_FORCE_SIZE, ZYAN_TRUE);
op->mnemonic = rz_str_dup(mnemonic);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can directly allocate to op->mnemonic and pass it instead.

Comment on lines +3154 to +3155
zydx->zydecode = RZ_NEW0(ZydisDecodedInstruction);
zydx->zydeop = RZ_NEWS(ZydisDecodedOperand, ZYDIS_MAX_OPERAND_COUNT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you free them before returning. Please don't allocate them on the heap then. Just allocate them on the stack and assign the pointer here. You can set them to NULL before returning.

@tushar3q34
Copy link
Contributor Author

tushar3q34 commented Feb 23, 2025

@Rot127
Thanks for the reviews :). I am working on resolving them.

  • A few of them have been resolved in the further commits already. Commented code has been removed.
  • For the build file/subprojects, check the latest commit.
  • The %s, %d, const and other details were carried over from the previous file (which I hadn't modified), but I’ll update them as suggested.
  • The default value for most things is zero except segment of memory operand. But I will change these everywhere to INVALID or NONE for clarity as suggested.
  • doxygen was not present in the previous file, so I will add doxygen for all necessary functions.
  • I ran the ASAN check already and there were no memory leaks (as of the latest commit)

Would it be possible to review from the latest commit? Lots of changes since the first few, especially in analysis_x86_zydis.c file. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate from Capstone to Zydis for x86 and x86_64 architectures
4 participants