-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
--map-source / --unmap-output inner classes #33
Comments
how are you! lol so u've been poking around the repo :) it's been 3 months since i last committed. the thing was ready for a beta release (minus testing) but i couldn't get myself to document the release... sigh
thanks! there's a trick or two under the sleeve that i'm sure will surprise you there :) so dexptacher-tool... let me take a look at how things stand right now |
ok. class name there are technical reasons. a mapping functioning akin to 'iq4$* -> NotificationComponent$*` is not really a mapping, but a mapping template. say the internal structures of a map would allow this level of expression. applying the map would require hierarchical searches of the name space for each class REFERENCE (happens almost everywhere in bytecode, not just class def headers) and could make the mapping perform horribly. cache transformed names? well it adds to the working memory (now mapping PLUS cache) and i want the thing to be usable in phones (low memory, low performance). but also, mappings are applied with multithreading in certain cases (typical for debug builds of apps for fast turnarounds, the ones using but what happens when i need to instantiate a mapping that is the inverse of an existing mapping? with mapping templates, this can't be done; i need real maps. so nope, for many reasons, the right way would be adding a pre-process step where the read-in map template is transformed into a map. the pre-process step would also require access to the bytecode to be transformed. a hell of a lot of code for low returns, but it should work. right? hold on. so a transform is built based on a transform template + bytecode. once that is done you can invert the transform. only there's a problem: the inverse will not going to be applied to the same (transformed) code! the code has been patched. so where it's different, the template would not really apply as expected! ok, recalculate transforms for each bytecode to be processed. nope! because now inverses would not really invert the previous transform. so they wouldn't guarantee code restoration as they do now. the previous solution would be the best, with the ugly property of very obscure semantics for template expansion in weird cases. crappy... but better than the nothing there is now? not really. obfuscators i've seen don't preserve outer/inner class name relationships. why the hell your obfuscator produced so i'm really surprised to see this. i never expected it. sooooo........ that's to justify my lack of support for 'iq4$* -> NotificationComponent$*`. it was not laziness, it was a well weighted decision. that said, 'com.foo.iq4$d -> com.foo.NotificationComponent$NotificationMonitor' should totally work. maps don't bother with nested classes at all. in fact, the test 'suite' uses 3 simultaneous maps, and all of them operate on nested classes like that. tests pass because they all work. take a look: https://github.com/DexPatcher/dexpatcher-tool/blob/master/test/mapping.txt |
using |
Ah, brilliant. As much as it'd be convenient to automatically have all (just) the inner classes automatically translated that makes so much sense. And really, the mapping is already such that you explicitly list the things you want renamed, and inner classes being just different named classes that fits perfectly. I updated my mapping file to list all the inner classes, deleted my temp files and re-ran with I think last night when testing I probably left the So now it's working great, thanks! I do think there might be a minor glitch though, so I'm comparing the decompiled code and without mapping the inner classes, the code references are all going to the obf$a references as expected, like such:
public static final iq4$a n = new iq4$a((qg6) null);
public final String a;
public final iq4$c b = new iq4$c(this.h);
public static final HashMap<String, iq4$b> l = new HashMap<>();
public static final HashMap<String, iq4$i> m; Then, with the rest of the mapping filled out for the inner classes they're now back to being recognised as inner classes and get directly referenced public static final a n = new a((qg6) null);
public final String a;
public final c b = new c(this.h);```
public static final HashMap<String, iq4.b> l = new HashMap<>();
public static final HashMap<String, iq4.i> m; ... all except when the reference is inside They also appear to be wrong when it's a reference in a method arg list public final Object a(iq4.d dVar, iq4.b bVar, boolean z, xe6<? super cd6> xe6) |
this is by design. your decompiler is flawed. what are you using? java/dalvik bytecodes have 'signature' items all around. these are supposed to contain generic type info before type erasure happens in the compiler. the signatures are strings and are totally ignored by the VMs (except that reflection APIs can expose them). so obfuscated code typically is stripped from signatures. again i didn't expect signatures in obfuscated code, or maybe illegal signatures to hinder decompilers. in short, sigs cannot be trusted. decompilers should only use them if the match reality (they don't in this case) and usage should be toggleable. CFR decompiler makes an effort not to depend on 'optional' metadata, as it could be doctored to trick the decompiler. and if it does, it makes all sorts of sanity checks on the data. a signature in obfuscated code could spell "yo mama!" and the code would still run fine. so i obviously make no effort to interpret it or map it. the usual use case is that code is dual transformed forth and back, so sigs if required will still be valid. you'll have to make do with your decompilation, or switch decompiles, or configure your decompiler if possible. this is a wont fix, sorry. :( |
btw, ALWAYS use
|
Ah that's quite interesting. It does make sense that the My go-to decompiler is jadx, in my use-cases with the latest versions of jadx it fails on functions/classes the least and makes some of the neatest code compared to the others. To compare, I fed the same processed classes.dex file into bytecode-viewer (https://github.com/Konloch/bytecode-viewer) Jadx /* JADX WARNING: Removed duplicated region for block: B:15:0x00f2 */
/* JADX WARNING: Removed duplicated region for block: B:7:0x0021 */
public final /* synthetic */ java.lang.Object b(iq4.d r17, iq4.b r18, boolean r19, xe6<? super cd6> r20) { Fernflower // $FF: synthetic method
public final Object b(NotificationMonitor var1, NotificationComponent.b var2, boolean var3, xe6 var4) { CFR: /*
* Unable to fully structure code
*/
public final /* synthetic */ Object b(iq4.d var1_1, iq4.b var2_2, boolean var3_3, xe6<? super cd6> var4_4) { Procyon public final /* synthetic */ Object b(final iq4$d iq4$d, final iq4$b l$2, final boolean z$0, final xe6<? super cd6> xe6) { Krakatau: final public Object b(NotificationComponent$NotificationMonitor a0, NotificationComponent$b a1, boolean b0, xe6 a2) { So Jadx, CFR and Procyon all get the function args wrong, but Fernflower and Krakatau get them right. ps. thanks for the tip about the set -u flag, I didn't know that one. |
no maybe that CFR is old. but surely it has a command line arg to disable processing of signatures. |
I've just re-tested with the latest version of CFR, it gives the same results public final class NotificationComponent {
public static final HashMap<String, iq4.b> l;
public static final HashMap<String, iq4.i> m; public final /* synthetic */ Object b(iq4.d var1_1, xe6<? super cd6> var2_2) { There's a very long list of command line options for cfr, but I can't see anything that looks to me like it'd be related to naming/signatures. The only one I thought might be, |
hi Lee, so some software of mine maps (renames) dalvik bytecode names. the output is dedexed (eg: dex2jar), then fed to CFR. the mapping is to better handling obfuscated code. surprisingly, the obfuscator used on a piece of code does not remove signatures, and in fact it seems to map them "correctly" (meaning it obfuscates less than it could; ie, it could just strip the signatures). now maybe in java bytecode sigs are required, per-spec at least. so maybe this is why the obfuscator in question did that. but not in dalvik: in dalvik, the string content of signatures is unspecified. whatever may be the case, my mapping tool does not concern itself with signatures so they become stale. and CFR outputs incorrect generic source code that reflects the stale values of signatures. check out a comparison between decompilers here: (in this comparison, maybe the decompilers that get things right are the ones that do not parse signatures at all.) i was under the impression that CFR tried to use code and not metadata as much as possible to decompile. what would be the recommended way to make CFR behave better in this scenario? maybe disabling generics processing? is there an option with which CFR tries to regenerate erased types based on actual casts observed in code? thanks for your help! |
Hey -
Interesting - there's some places where I trust the signature if it passes
basic verification checks - do you have a small case that reproduces this
you can provide? Probably possible to spot a more aggressive verification
check I can add.
While there are a billion and one flags, they're almost always set
automatically, and I don't want to have to rely on getting people to set
the right ones if I can infer it, otherwise the amount of effort to use
shoots up.
…On Tue, Apr 7, 2020 at 5:40 AM Lanchon ***@***.***> wrote:
@leibnitz27 <https://github.com/leibnitz27>
hi Lee,
so some software of mine maps (renames) dalvik bytecode names. the output
is dedexed (eg: dex2jar), then fed to CFR.
the mapping is to better handling obfuscated code. surprisingly, the
obfuscator used on a piece of code does not remove signatures, and in fact
it seems to map them "correctly" (meaning it obfuscates less than it could;
ie, it could just strip the signatures).
now maybe in java bytecode sigs are required, per-spec at least. so maybe
this is why the obfuscator in question did that. but not in dalvik: in
dalvik, the string content of signatures is unspecified.
whatever may be the case, my mapping tool does not concern itself with
signatures so they become stale. and CFR outputs incorrect generic source
code that reflects the stale values of signatures.
check out a comparison between decompilers here:
#33 (comment)
<#33 (comment)>
(in this comparison, maybe the decompilers that get things right are the
ones that do not parse signatures at all.)
i was under the impression that CFR tried to use code and not metadata as
much as possible to decompile. what would be the recommended way to make
CFR behave better in this scenario? maybe disabling generics processing? is
there an option with which CFR tries to regenerate erased types based on
actual casts observed in code?
thanks for your help!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#33 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABFXCEHXLG7IG7VPO5K32DLRLKVB7ANCNFSM4MCEZTEQ>
.
|
The single class referenced in the example/comparison above can be found here: https://mega.nz/file/485BDS4J#iDsvgTrWVuBX9CtmQsHWPa4m9vCaiTnIHoKiFufvrbM It was originally named |
(Lee, please note that Andrew's code sample is very probably based on non-free software.) |
Sorry yes, this one is not open software, hence sharing the minimum possible on an external host. I accept responsibility for this and already host it and related code on a public gitlab repo, so I would presume/expect any liability should stop with me. |
oh dont worry Andrew. it's just that Lee tends to integrate these bits into his repo as test cases. we can easily create a clean snippet for him if he needs it. |
@Lanchon - leibnitz27/cfr@a986af3 (--usesignatures false will turn them off altogether, however I strongly recommend against doing that, and just let the more aggressive verification do its thing ). Having said that, I would strongly recommend, if you're rewriting descriptors anyway, to rewrite signatures if you possibly can. I know it's possible to lie in them (so I'm very happy to add more paranoia to CFR), but there are a few places (other than generics) where signatures and descriptors carry significantly different info , and the signature is preferable. Anonymous inner classes with captures is a very good example - the descriptors for those are a total mess. |
thanks!
i won't, for various reasons:
|
Hi @Lanchon so this is working pretty well for me now, I've got a my script running dexpatcher manually to handle the anon and map and create a renamed jar which is good for disassembly and inspection. I'm trying to use the mapped classes in Android Studio and not having as much luck there when trying to build again. In build.gradle:
If I understand correctly, when building through dexpatcher-gradle, it uses This jar hasn't been run through dexpatcher-tool though (I believe), so it doesn't know about any of the anon/map changes? To combat this I tried However, when trying to compile I get > Task :collectDexDebug
> Task :patchDexDebug FAILED
DexPatcher version 1.8.0-beta1 by Lanchon (https://dexpatcher.github.io/)
info: read 'C:\foo\build\intermediates\dexpatcher\decoded-app'
debug: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': file 'classes.dex': 10672 types
debug: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': file 'classes2.dex': 10099 types
debug: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': file 'classes3.dex': 525 types
debug: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': dex version '035'
stats: read 'C:\foo\build\intermediates\dexpatcher\decoded-app': 21296 types, 220 ms, 10 us/type
info: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug'
debug: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug': file 'classes.dex': 24 types
debug: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug': file 'classes2.dex': 5653 types
debug: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug': file 'classes3.dex': 4297 types
debug: read 'C:\foo\build\intermediates\dexpatcher\patch-dex\debug': file 'classes4.dex': 11346 types
fatal: exception:
lanchon.multidexlib2.DuplicateTypeException: Lcom/portfolio/platform/uirenew/alarm/AlarmActivity;
at lanchon.multidexlib2.MultiDexContainerBackedDexFile.<init>(MultiDexContainerBackedDexFile.java:42)
at lanchon.multidexlib2.MultiDexIO.readDexFile(MultiDexIO.java:34)
at lanchon.dexpatcher.Processor.readDex(Processor.java:299)
at lanchon.dexpatcher.Processor.processFiles(Processor.java:134)
at lanchon.dexpatcher.Processor.processFiles(Processor.java:80)
at lanchon.dexpatcher.Main.runWithExceptions(Main.java:71)
at lanchon.dexpatcher.Main.run(Main.java:49)
at lanchon.dexpatcher.Main.run(Main.java:44)
at lanchon.dexpatcher.Main.runWithUsage(Main.java:39)
at lanchon.dexpatcher.Main.main(Main.java:30)
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':patchDexDebug'.
> Process 'command 'C:\Program Files\Android\Android Studio\jre\bin\java.exe'' finished with non-zero exit value 3 It's complaining about a duplicate type: This is one that's in my patches java source, but hasn't been mapped itself. This class is extending a mapped type though Does this make sense to you? Do you know of a quick-fix way of getting the dexpatcher mapped class patches to compile in android studio? |
im really sorry you have to go though all this. of course dxp-gradle should be updated to do all this and more. btw you are getting pretty good at this. :) the problem is you are instructing gradle to import the symbols (and code) into your patch. so the patch has all the code from the original apk, plus your patch. in the log, the multidex read that starts here: this is happening to all the classes you are patching of course: the originals are also in the multidex. but i/o errors and other low-level stuff out of dxp's control cause nstant aborts (while dxp tries to continue as much as possible in the face of errors, as you already know). solution: import the symbols in gradle via a different scope (configuration). i think the right one is called |
btw it's cool that you've discovered and early adopted this stuff before i published it. but yeah, dxp-gradle is requires an overhaul. i'm no longer supporting old android plugins and gradle versions though. the amount of hacks and dynamic code needed is outrageous. so the current dxp-g is the last to support AAPT1. it's a drag, but i can't work against google deprecations. |
Hah, yeah I "should" have ample time and motivation to finish all my projects already too ;-) please, no apology necessary for open source/free work you haven't done yet! The wip I'm playing with is hugely beneficial even as is! Yep I reckon you're right. I didn't know about And yeah, I certainly wouldn't put in any more than bare minimum trying to maintain support for deprecated features/versions. I have a similar battle with an Android app I maintain (with the help of DexPatcher). About 3/4 of my development effort goes into compatibility with the latest API and security changes Google device to enforce; none of this effort is seen or really respected by end users but you kind of just have to suck it up as a developer these days. |
implementation and api both compile against and bundle the dependency in the build. the difference only shows when a 3rd piece of code uses your code as a dep: api provides your dep to the compile step of the 3rd code (as is part of your api), while implementation doesnt (no leaking of transitive deps). in both cases your dep is added to the build (as is used by your code). example:
on the other hand, compileOnly is compiled against but never included in builds. what about its transitivity? for reasons i ignore, there is no api/imp distinction for compileOnly deps. i suppose they are never transitive, IDK. |
Hi there,
I hope you're doing ok, weathering the storm as it were.
I've just started playing with
--map-source
,--unmap-output
,--map mapping.txt
from a master build to port my patches to a new version of the app with wildly different obfuscation.One the of the main classes being patches has inner (not anon) classes which I also need to patch.
Are these supported at all at the moment?
java -jar "dexpatcher-tool\tool\build\libs\dexpatcher-1.8.0-alpha4.jar" --deanon-source --main-plan "Anon[_Level]" --output Anon\classes.dex classes.dex --map-source --map ../mapping.txt
I've tried:
and
neither of which seem to work.
In a decompiled copy of the outputted dex I do get iq4.java replace by NotificationComponent.java with all the inner classes now split out as iq4$a.java and so on.
iq4$d.java
persists with both the mappings as shown above, where I'd prefer it to be NotificationComponent$NotificationMonitorThanks in advance, in general this new mapping looks fantastic!
The text was updated successfully, but these errors were encountered: