
Android APK D2D: Convert classes.dex and related VM/Dalvik bytecode to plain JARs #1372

Closed
Tracked by #1366
pombredanne opened this issue Aug 29, 2024 · 6 comments


@pombredanne
Contributor

pombredanne commented Aug 29, 2024

The goal is to reverse the Android bytecode into Java class files and JARs amenable to further analysis.

There are multiple tools that may help with this:

@JonoYang
Contributor

JonoYang commented Sep 9, 2024

I am able to use jadx to decompile the classes.dex file into Java source files and then map the resulting paths to the "from" codebase. However, we need to make jadx available to users when they install scancode.io. We can do this by creating a new plugin that vendors jadx.
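As a rough illustration of the decompile step, here is a minimal sketch that shells out to the jadx command-line tool. It assumes a `jadx` executable is on the PATH; the function names `build_jadx_command` and `run_jadx` are hypothetical and are not taken from scancode.io or any plugin. The only jadx option used is `-d`, which sets the output directory.

```python
import shutil
import subprocess


def build_jadx_command(input_path, output_dir, jadx_executable="jadx"):
    """Return the argument list to decompile ``input_path`` into ``output_dir``."""
    # ``-d`` is the jadx CLI option that sets the output directory.
    return [jadx_executable, "-d", str(output_dir), str(input_path)]


def run_jadx(input_path, output_dir, jadx_executable="jadx"):
    """Run jadx and return its exit code (hypothetical helper)."""
    # Assumption: jadx is installed separately and reachable on PATH.
    if shutil.which(jadx_executable) is None:
        raise FileNotFoundError(f"{jadx_executable!r} was not found on PATH")
    command = build_jadx_command(input_path, output_dir, jadx_executable)
    # The exit code is returned rather than raised so the caller can decide
    # whether partially decompiled output is still usable.
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

For example, `run_jadx("classes.dex", "decompiled/")` would write the decompiled Java sources under `decompiled/`.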

@JonoYang
Contributor

We should also have a way to separate the Kotlin runtime from the results.

@JonoYang
Copy link
Contributor

An Android APK d2d pipeline has been created here: https://github.com/aboutcode-org/android-inspector

All Android d2d-related code for scancode.io will be placed there.

@chinyeungli
Contributor

This is a short sample d2d note from compiling an APK from Kotlin sources.

I created an APK from a sample Android project that has 7 source files (excluding the XML files under res/): 6 .kt files and 1 .java file.

If I extract the APK, I find 5 classes.dex files, making it unclear which dex files contain the 'real' sources and which ones contain Google libraries.
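Since an APK is a ZIP archive, the dex files can be listed without extracting it. The sketch below uses only the Python standard library; the helper name `list_dex_entries` is hypothetical, and the pattern assumes the usual multidex naming (`classes.dex`, `classes2.dex`, and so on) at the archive root.

```python
import re
import zipfile

# Matches classes.dex, classes2.dex, classes3.dex, ... at the APK root only.
DEX_NAME = re.compile(r"^classes\d*\.dex$")


def list_dex_entries(apk_path):
    """Return the sorted names of the root-level classes*.dex entries."""
    # ``apk_path`` may be a filesystem path or a binary file-like object.
    with zipfile.ZipFile(apk_path) as apk:
        return sorted(name for name in apk.namelist() if DEX_NAME.match(name))
```

This only shows how many dex files there are; it does not tell which of them hold application code and which hold bundled libraries.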

On the other hand, if I run JADX directly on the APK file, it generates the file structure without my having to deal with the individual dex files.

This note indicates that we can use JADX directly on the APK file rather than the DEX file.

However, in either case, it generates many more files than we originally have in the sources.
It generated the following directories:

_COROUTINE/
android/support/v4/
androidx/
com/example/ <-- source location
com/google/common/
kotlin/
kotlinx/
org/intellij/
org/jetbrains/

As noted above, the only source location should be com/example/, but JADX also generates many other libraries. We need to identify and ignore these intelligently when we perform the D2D, since they will have no match in the sources.
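One possible way to drop these directories before matching is a simple prefix filter. This is only a sketch: the prefix list is copied from the directory listing in this comment (with the spellings `androidx` and `org/intellij` assumed), it is not a complete list of Android or Kotlin library packages, and `is_library_path` is a hypothetical name.

```python
from pathlib import PurePosixPath

# Assumption: prefixes come from the one sample APK described in this comment.
LIBRARY_PREFIXES = (
    "_COROUTINE",
    "android/support/v4",
    "androidx",
    "com/google/common",
    "kotlin",
    "kotlinx",
    "org/intellij",
    "org/jetbrains",
)


def is_library_path(path, prefixes=LIBRARY_PREFIXES):
    """Return True if ``path`` is under one of the known library prefixes."""
    parts = PurePosixPath(path).parts
    for prefix in prefixes:
        prefix_parts = PurePosixPath(prefix).parts
        # Compare whole path segments so "kotlin" does not match "kotliny/".
        if parts[: len(prefix_parts)] == prefix_parts:
            return True
    return False
```

With this filter, `com/example/MainActivity.java` is kept while `kotlin/Unit.java` is skipped. A static list like this will miss libraries that are not listed, so it is a starting point rather than a reliable classifier.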

At the file level, the following are the source files in a sample Android project:

empty_java.java
empty_kt.kt
MainActivity.kt
test_kt.kt

The following are the decompiled files generated by JADX:

ComposableSingletons$MainActivityKt.java
empty_java.java
empty_kotlin.java
MainActivity.java
MainActivityKt.java
R.java
test_kotlin.java
Test_ktKt.java

We already know that every R.java file is generated. For the others, it is important to note that not all of them have basenames that match their source files.
These are the matches:

Source: empty_java.java
JADX: empty_java.java

Source: empty_kt.kt
JADX: empty_kotlin.java

Source: MainActivity.kt
JADX: MainActivity.java, MainActivityKt.java

Source: test_kt.kt
JADX: test_kotlin.java, Test_ktKt.java

We need to find a pattern in order to increase the matching accuracy and avoid noise.
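Part of the pattern follows from how the Kotlin compiler names file-level classes: top-level declarations in `foo.kt` are compiled into a class named `FooKt` (first letter capitalized, `Kt` appended). The sketch below applies that rule in reverse, and also looks at each `$`-separated segment of the class name. The function names are hypothetical and the heuristic is based only on the sample files listed above.

```python
from pathlib import PurePosixPath


def candidate_source_stems(decompiled_name):
    """Return the source basenames (without extension) that could have
    produced the decompiled file ``decompiled_name``."""
    stem = PurePosixPath(decompiled_name).stem
    candidates = []
    # "Outer$Inner" style names: consider every segment separately.
    for segment in stem.split("$"):
        if not segment:
            continue
        options = [segment]
        if segment.endswith("Kt") and len(segment) > 2:
            base = segment[:-2]
            # The compiler capitalizes the first letter, so also try lowercase.
            options += [base, base[0].lower() + base[1:]]
        for option in options:
            if option not in candidates:
                candidates.append(option)
    return candidates


def match_sources(decompiled_names, source_names):
    """Map each decompiled file name to the source files it may come from."""
    by_stem = {}
    for source in source_names:
        by_stem.setdefault(PurePosixPath(source).stem, []).append(source)
    matches = {}
    for name in decompiled_names:
        found = []
        for stem in candidate_source_stems(name):
            found.extend(by_stem.get(stem, []))
        matches[name] = sorted(set(found))
    return matches
```

On the sample above, this matches `Test_ktKt.java` to `test_kt.kt` and both `MainActivityKt.java` and `ComposableSingletons$MainActivityKt.java` to `MainActivity.kt`. It finds nothing for `empty_kotlin.java` or `test_kotlin.java`, because those are classes whose names differ from the file that declares them; basename matching alone cannot recover that link.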

JonoYang added a commit that referenced this issue Sep 27, 2024
@JonoYang
Contributor

I have created a library named android_inspector that provides a scancode.io pipeline for Android APK d2d. One of its steps calls jadx on .dex files.

@pombredanne
Contributor Author

@chinyeungli I am creating a new issue for the Kotlin-specific parts, which were not originally part of this issue. The work tracked here is working and done.
See for the follow up:
