Reversible binary patch format #23379
Comments
use the -r flag |
As of 2578ff0, using
|
you can also use
The proposal to create a decent, standardized and extensible binary patching file format in plain text looks quite interesting to me, and I would love to have support in r2 for that. By the way, there's also support for rapatch, but it's just part of r2, not a standalone tool. For consistency with radiff2, it probably makes sense to have a rapatch2 tool instead of an uppercase r2 flag.
You can read more about this in doc/rapatch.md |
Just created a new tool that can't be merged until r2-6.0. For now it is just a dummy thing, but I agree that your proposal is important and should be treated as a first-class tool. Would you like to improve radiff2 to support this output? I'm not 100% sure about the LE/BE values, because radiff2 just spots changed bytes and may not know whether the underlying data is a word or a qword. The patch format can specify that, or maybe we can make some happy assumptions here. I think we have time during 5.9.x, until we reach 6.0, to break the ABI and provide such a new tool with a proper manpage and a working patch format for unified binary patching. |
Thanks for taking a look. I'll need to think about the patch format to come up with a spec that makes sense first. You can always fall back to plain hex bytes if you don't know the data type, but the patch format should allow you to use a more human-friendly format.

The problem with the flow of saving the binary and then diffing it against the original is that you lose the information on how the user specified the changes. It may be better to generate a patch from r2 itself, when you know how those changes were made. This way you can store comments in the patch, such as the instruction being changed or its ASCII representation.

I suspect that for any w command that you run in radare2, you can always find the opposite command that would revert that change, and specify it using the same value format. For example,

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ -0x00100000,4 +0x00100000,4 @@
- wvf 1.23
+ wvf 3.21

This is nice for radare2 users because they will already be familiar with the commands, but it doesn't make a lot of sense for users of other tools. It also has the problem that there is no information about the byte order. Something like this:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ -0x00100000,4 +0x00100000,4 @@
- LE (float) 1.23
+ LE (float) 3.21

That may be more understandable, especially if we use well-known types like those in C. Another issue is that in the common case you will use the same LE/BE for the whole patch (although we should support the cases where that is not true), so you don't need to pollute every line with LE/BE. It is convenient to define a byte order at the start:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00100000,4 +0x00100000,4 @@
- (float) 1.23
+ (float) 3.21

Now, there is the case in which a user may specify integers in different bases (hex/dec/octal). In my above case:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00189ca8,4 +0x00189ca8,4 @@
- (uint32_t) 1799000 # Using a type that can map a constant into bytes
+ (uint32_t) 1920000 # Notice how I use decimal here

I think it may be good to use the C format for numbers too: 0123 = octal, 123 = dec, 0x123 = hex. This may also be valid:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00189ca8,12 +0x00189ca8,12 @@
- (uint32_t []) { 1799000, 1799001, 1799002 }
+ (uint32_t []) { 1920000, 0x123, 07 }

But it starts to complicate the syntax. Also, if I want to add another number, I would need to modify the 12 in the hunk header. So this may be simpler:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00189ca8,uint32_t +0x00189ca8,uint32_t @@
- 1799000 # Using a type that can map a constant into bytes
+ 1920000 # Notice how I use decimal here

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ LE @@
@@ -0x00189ca8,uint32_t[3] +0x00189ca8,uint32_t[3] @@
- 1799000, 1799001, 1799002
+ 1920000, 0x123, 0777 # Notice 0777 is octal

All those cases can be mapped to the basic format, where everything is a simple hex string. I don't like to just specify a hex string that could be confused with a number. Also, I think So maybe we can use something like this:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ -0x00189ca8,char[4] +0x00189ca8,char[4] @@
- '58 73 1b 00'
+ '00 4c 1d 00'

Notice there is no byte order, as hex strings don't need one. We can also probably specify that the default is LE, and only write BE when needed. There is also mixed endianness, but I think we can ignore it for now and fall back to hex strings if needed.

This basic hex format can probably be implemented in radiff2 without much effort, as we don't even need to output aligned words, just which bytes differ. I can try to modify radiff2, but I'm not familiar with the codebase, so it may take a while.

In a more advanced implementation, one could determine what type of data is placed at which addresses of a binary file, and then produce the appropriate representation in the patch when those bytes change. Otherwise, fall back to hex strings.

The hex format should allow you to split the lines as you want, so you can write instructions properly:

--- a.bin 2024-09-24 09:24:41.475235346 +0200
+++ b.bin 2024-09-24 09:24:41.475235346 +0200
@@ -0x00189ca8,char[4] +0x00189ca8,char[4] @@
- '01 46' # mov r1, r0
- '68 46' # mov r0, sp
+ '4f f2 ba fc' # bl 0x254526

It would also be nice if this format were a superset of the patch format, so you can also apply normal patches with rapatch2 (or even mix hunks). I think this can be easily done by using the "type" specifier of the hunk. So This may be useful if you have a mix of source code and blobs and you want to specify a patch to change both. |
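The claim that every typed form can be mapped to the basic hex-string format can be illustrated with a short Python sketch. It assumes C-style integer literals (0x for hex, a leading 0 for octal, decimal otherwise) and a byte order given as LE or BE. The type table and the function names are hypothetical, not part of any existing tool or of the proposed spec.

```python
import struct

# Hypothetical mapping from C-like type names to struct format codes;
# which type names a real patch format would accept is an assumption.
STRUCT_CODES = {
    "uint8_t": "B", "uint16_t": "H", "uint32_t": "I", "uint64_t": "Q",
    "float": "f", "double": "d",
}


def parse_c_int(token):
    """Parse an integer literal with C rules: 0x.. hex, 0.. octal, else decimal."""
    token = token.strip()
    if token.lower().startswith("0x"):
        return int(token, 16)
    if len(token) > 1 and token.startswith("0"):
        return int(token, 8)
    return int(token, 10)


def encode_values(ctype, values, byte_order="LE"):
    """Encode a comma-separated list of literals as raw bytes."""
    if byte_order not in ("LE", "BE"):
        raise ValueError("byte order must be LE or BE")
    prefix = "<" if byte_order == "LE" else ">"
    code = STRUCT_CODES[ctype]
    out = bytearray()
    for token in values.split(","):
        # Floating-point types take a float literal, the rest a C integer.
        number = float(token) if code in "fd" else parse_c_int(token)
        out += struct.pack(prefix + code, number)
    return bytes(out)


def to_hex_string(data):
    """Render bytes in file order as a space-separated hex string."""
    return " ".join(f"{byte:02x}" for byte in data)


print(to_hex_string(encode_values("uint32_t", "1920000")))  # 00 4c 1d 00
```

Since the hex string lists bytes in file order, the byte order only matters while encoding; once a value is in hex-string form it no longer needs one.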
This one-liner more or less implements the hex diff (addresses are decimal and start at 1):

% bindiff() { diff -u0p <(od -An -vtx1 -w1 $1) <(od -An -vtx1 -w1 $2) | sed '/^@@/s/,\([0-9]*\)/,char[\1]/g' }
% bindiff v2.bin v3.bin
--- /proc/self/fd/11 2024-09-26 22:10:25.813980894 +0200
+++ /proc/self/fd/13 2024-09-26 22:10:25.813980894 +0200
@@ -1612969,char[3] +1612969,char[3] @@
- 58
- 73
- 1b
+ 00
+ 4c
+ 1d |
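For comparison, here is a minimal Python sketch of the same byte-level diff that reports 0-based hexadecimal addresses and groups each run of differing bytes into one hunk. It assumes both inputs have the same length, so it does not handle inserted or removed bytes. The function names and the quoting of the hex strings are illustrative choices that follow the examples earlier in this thread.

```python
def hex_hunks(old, new):
    """Return (offset, old_bytes, new_bytes) for each run of differing bytes."""
    if len(old) != len(new):
        raise ValueError("this sketch only handles equal-length inputs")
    hunks = []
    start = None
    # Iterate one past the end so a run touching the last byte is closed.
    for i in range(len(old) + 1):
        differs = i < len(old) and old[i] != new[i]
        if differs and start is None:
            start = i
        elif not differs and start is not None:
            hunks.append((start, old[start:i], new[start:i]))
            start = None
    return hunks


def format_hunk(offset, old, new):
    """Format one hunk in the char[N] hex-string style discussed above."""
    header = (f"@@ -0x{offset:08x},char[{len(old)}] "
              f"+0x{offset:08x},char[{len(new)}] @@")
    return "\n".join([
        header,
        "- '" + old.hex(" ") + "'",
        "+ '" + new.hex(" ") + "'",
    ])


old = bytes.fromhex("aa 58 73 1b 00")
new = bytes.fromhex("aa 00 4c 1d 00")
for hunk in hex_hunks(old, new):
    print(format_hunk(*hunk))
```

Like the one-liner, this reports only the bytes that differ, so a changed 32-bit word whose last byte is unchanged shows up as a 3-byte hunk.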
Let's discuss it in here https://hackmd.io/@BCdr4EkGSKO51w6pf-JUow/r1h5idQRC/edit |
I have created a draft of the specification here: https://github.com/rodarima/xpatch/ For now I'm calling it "xpatch", as in extended patch. It is a superset of the patch(1) format, so you can mix plain text and binary patches in the same xpatch file. There is a simple xdiff program as an example, but to implement a full xpatch I'll probably need to bring in bison to parse the grammar properly and detect errors. I'm still thinking a bit about the format, but so far it seems suitable for all my use cases. I've also added support to parse hex strings directly by specifying the printf-like parsing format |
The rapatch2 tool has been merged into master, so now it's time to start implementing the required bits. Want to submit a PR? |
I added a simple proof-of-concept xpatch that parses a small subset of the specification, but it is not complete. I would like to get some feedback on the syntax first, before I attempt to implement the whole thing, especially the extended format. I think it covers all the use cases I wanted, and I also think it has enough expressive power to store all the file manipulations you may be doing with radare2 (or other tools). |
I'll find some time in the upcoming days to test your PoC and give you proper feedback on the proposal 👍 |
I finally reached the point of having time to look at it! Sorry for the huge delay :D |
Here's the initial work towards supporting these patches in rapatch2. I will be merging and pushing improvements; I'll clean up the logic and give you some feedback to try to clarify the standard. One of the things I saw is that in the +++ and --- lines, the tab character is used to separate the filename and the timestamp. radiff2 is still not generating this format, but I did some refactoring and cleanup to clarify the flags usage. It should be easy to support that, but I want radiff2 to generate a smart patch, in the sense that it will list aligned dwords instead of bytes so it's easier to read, and also add comments with the disassembly if there's code involved at that location, etc. Feel free to suggest your ideas and patches if you want to contribute too! I have so many other open fronts right now, but I wanted to get at least some basic work done here |
Nice!, thanks for the effort.
Thanks, I was following the POSIX manual but they don't seem to mention it. I think tab is reasonably safe, I should update the spec. |
It would be good to document all the data types that can be handled. I have some more questions, but I think I will solve them while implementing it. I'm currently thinking of ways to abstract the code to make reverting work cleanly without duplicating code. It would be good to have a bunch of tests for that too. Would you like to help with any of this in r2? |
I agree. I have a WIP implementation of a standalone xpatch which may help resolve some ambiguities, although it is probably more convenient to rely on the spec only. I'm not familiar with the r2 codebase, so I don't think I could help much on the r2 side. I plan to add some test cases. So far xpatch can apply a patch, and xdiff can then create a suitable .xpatch output that can be fed to xpatch again to generate the same changes:

% cat 1.xpatch
--- b860_102.bin foo
+++ b860_102.bin.2 bar
This is just a comment, it will be ignored.
@@ u8,u32 -0x1897d8,1 +0x1897d8,1 @@
- 0x1b7358 # 1799000
+ 0x1d4c00 # 1920000
% ls b860_102.bin*
b860_102.bin
# Apply original xpatch once
% src/xpatch < 1.xpatch 2>/dev/null
% sha1sum b860_102.bin*
5b465d58349393d7ac71ca71b26957988a016842 b860_102.bin
6e3ff9c4ce5a9334e46670810d01db81e9ab2494 b860_102.bin.2
# Recreate another .xpatch from the differences
% src/xdiff b860_102.bin b860_102.bin.2 > 2.xpatch
% cat 2.xpatch
--- b860_102.bin foo
+++ b860_102.bin.2 bar
@@ u8,u32 -0x1897d8,1 +0x1897d8,1 @@
- 0x1b7358
+ 0x1d4c00
# Remove modified binary and try again with the new 2.xpatch
% rm b860_102.bin.2
% src/xpatch < 2.xpatch 2>/dev/null
% sha1sum b860_102.bin*
5b465d58349393d7ac71ca71b26957988a016842 b860_102.bin
6e3ff9c4ce5a9334e46670810d01db81e9ab2494 b860_102.bin.2
# Checksums match |
I would like to generate some binary patches that can be read in plain text, in the same way I do with diff(1) and patch(1). In particular, I want to be able to do these operations:
The default radiff2(1) format is close to what I want.
It has the benefit that it can be reversed by a simple awk(1) program:
However, AFAIK this format doesn't seem to be accepted by any tool.
The r2 format outputs radare2(1) commands, but they ignore what was in that address before:
This is not enough, as I want to know if a given patch collides with another one. It also prevents reverting an applied patch.
There is also the rapatch.md format, but it seems to be different than these two. And it also seems to have the same problem, it cannot be reverted.
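To make the two requirements concrete, here is a minimal Python sketch of a hunk that records both the old and the new bytes. Because the old bytes are known, applying can refuse a hunk that does not match the file, and reverting only needs the two sides swapped. The names `apply_hunk` and `HunkRejected` are hypothetical and not part of radare2 or any other tool.

```python
class HunkRejected(Exception):
    """Raised when the bytes found in the data do not match the hunk."""


def apply_hunk(data, offset, old, new, reverse=False):
    """Replace `old` with `new` at `offset` in a bytearray, checking `old` first.

    With reverse=True the two sides are swapped, which undoes a hunk
    that was applied earlier.
    """
    if reverse:
        old, new = new, old
    end = offset + len(old)
    found = bytes(data[offset:end])
    if found != old:
        raise HunkRejected(
            f"at 0x{offset:08x}: expected {old.hex(' ')}, found {found.hex(' ')}"
        )
    # Slice assignment also covers hunks whose sides differ in length.
    data[offset:end] = new


image = bytearray.fromhex("aa 58 73 1b 00 bb")
old, new = bytes.fromhex("58 73 1b 00"), bytes.fromhex("00 4c 1d 00")
apply_hunk(image, 1, old, new)                # patch
apply_hunk(image, 1, old, new, reverse=True)  # and revert
```

Applying the same hunk twice raises `HunkRejected`. A write command that ignores the previous contents has no way to make that check.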
Maybe this problem can be solved by implementing a reversible operator like wx that swaps bytes instead of overwriting an address. The problem with this approach is that when a swap command fails, it should output that hunk into a reject file, which is probably not what you want from an r2 session.
Maybe it would be a better idea to have another tool just for this workflow (which could also work with multiple files at once). You first perform all the changes you want with r2 w commands, then you save the file and generate a patch that can be further edited and applied/reversed:
Here is an example of what that patch may look like, which is very close to what patch(1) expects:
The benefit of such format is that:
This format can also be used to insert or remove bytes, leaving a different sized file. It also prevents the problem of using multiple write commands for the same memory location if the hunk addresses are sorted. It also resembles the patch format closely enough that it gets the syntax colors of normal patches on GitHub.
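The point about sorted hunk addresses can be sketched in a few lines of Python: once hunks are sorted by address, two writes to the same location show up as overlapping ranges. The sketch assumes each hunk is described by an (offset, length) pair that refers to the original file. The function name is hypothetical.

```python
def find_collisions(hunks):
    """Return pairs of (offset, length) hunks whose byte ranges overlap."""
    collisions = []
    furthest = None  # the hunk that reaches the highest address so far
    for hunk in sorted(hunks):
        offset, length = hunk
        if furthest is not None and offset < furthest[0] + furthest[1]:
            collisions.append((furthest, hunk))
        if furthest is None or offset + length > furthest[0] + furthest[1]:
            furthest = hunk
    return collisions


hunks = [(0x20, 4), (0x10, 4), (0x12, 2)]
print(find_collisions(hunks))  # [((16, 4), (18, 2))]
```

Adjacent hunks, such as (0x10, 4) followed by (0x14, 4), are not reported because their ranges do not share any byte.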
Patches of patches are also readable:
I think I could adapt radiff2.c to output such format, and maybe modify patch(1) to accept them.