Skip to content

Latest commit

 

History

History
647 lines (496 loc) · 27.9 KB

README.adoc

File metadata and controls

647 lines (496 loc) · 27.9 KB

graal-docs

Rationale

This little repo’s goal is to collect scripts and tips on natively compiling Clojure programs with GraalVM.

GraalVM allows us to compile Java classes to native binaries. Because Clojure is hosted on the JVM, compiling Clojure programs to native binaries is also possible.

Native binaries have fast startup times. This makes them an attractive option for command line tools that are used for scripting and in editor integrations. Popular command line tool examples are babashka and clj-kondo. See resources for many more examples.

Most of our tips are related to Clojure. We will sometimes add a general tip because we didn’t easily find it elsewhere and feel it would be helpful to others.

GraalVM is more than just its native image compiler, but unless otherwise noted, when we refer to GraalVM, we are talking about the native image compiler.

If you are trying to decide if GraalVM is for you, the trade-offs are nicely explained by Oleg Šelajev in the "AOT vs JIT" section of his "Maximizing Java Application Performance with GraalVM video".

Community

👋 Need help or want to chat? Say hi on Clojurians Slack in #graalvm.

This is a team effort. We heartily welcome, and greatly appreciate, tips, tricks, corrections and improvements from you. Much thanks to all who have contributed.

Style guidance:

The current curators of this repository are: @borkdude and @lread.

This tutorial covers creating a native binary from a Clojure hello world program using GraalVM.

Tips and tricks

Clojure Version

Always use the current Clojure release, it includes several GraalVM specific fixes, including:

  • CLJ-1472 - locking macro creates monitor bytecode difficult to analyze in Graal native-image and ART runtime.

  • CLJ-2502 - Cannot use clojure.stracktrace/print-stack-trace with GraalVM.

  • CLJ-2571 - ex-cause is missing Throwable return tag.

  • CLJ-2572 - Prevent reflection in clojure.data namespace, make compatible with GraalVM.

  • CLJ-2636 - Get rid of reflection on java.util.Properties when defining *clojure-version*

Other issues of interest:

  • CLJ-2582 - Improve GraalVM native image size / compile time memory consumption when compiling clojure.pprint

  • TCHECK-157 - Randomization doesn’t work with GraalVM native-image

GraalVM Version

Use the latest stable GraalVM release. Choose the JDK version that makes sense for you.

GraalVM Version Scheme Change

The Graal team changed their versioning scheme. It used to be based on the year and quarter, but it now simply matches the JDK version.

We used to have, for example a v22.3.2 release for each supported JDK. We now have, for example GraalVM v17.0.8 and GraalVM v20.0.2.

This version change does make it awkward to refer to past versions of GraalVM. For the old version scheme we’ll prefix the version with legacy like so GraalVM legacy v22.

GraalVM Free Flavours

We now have 2 free flavours of GraalVM:

Oracle GraalVM is has more features but has a different license. Choose the flavour that makes sense for your project.

Tips should be generic to both flavours, but we’ll refer to Oracle GraalVM if/when appropriate.

Class Initialization

In most cases, Clojure compiled classes must be initialized at build time by GraalVM native-image. If this has not been done, when you attempt to run your resulting native binary, you will see an exception that includes:

java.io.FileNotFoundException: Could not locate clojure/core__init.class, clojure/core.clj or clojure/core.cljc on classpath

Fortunately, we have an easy solution for you:

  1. include clj-easy/graal-build-time on your native-image classpath

  2. specify --features=clj_easy.graal_build_time.InitClojureClasses on your native-image command line

Note: graal-build-time doesn’t work with single segment namespaces. A single segment namespace is one without any . characters in it, for example: (ns digest).

See graal-build-time docs for details.

Runtime Evaluation

A natively compiled application cannot use Clojure’s eval to evaluate Clojure code at runtime. If you want to dynamically evaluate Clojure code from your natively compiled app, consider using SCI, the Small Clojure Interpreter. The ultimate example of evaluating Clojure with a natively compiled Clojure application is babashka.

Reflection

Clojure can use reflection to determine what to call. But a GraalVM native image will only include what it thinks your program calls. We can either tweak Clojure to not use reflection, inform native-image compilation about reflective calls, or both.

Take this little contrived example:

  • deps.edn

    {:deps {org.clojure/clojure {:mvn/version "1.11.1"}
            com.github.clj-easy/graal-build-time {:mvn/version "1.0.5"}}}
  • src/refl/main.clj

    (ns refl.main
      (:gen-class))
    
    (defn refl-str [s]
      (.toUpperCase s)) ;; reflection on String happens here
    
    (defn -main [& _args]
      (println (refl-str "all good!")))

It will compile just fine:

$ mkdir -p classes
$ clojure -M -e "(compile 'refl.main)"
$ native-image -cp "$(clojure -Spath):classes" -H:Name=refl -H:+ReportExceptionStackTraces \
    --features=clj_easy.graal_build_time.InitClojureClasses --no-fallback refl.main

But when we go to run the native image, we’ll see the following failure:

$ ./refl
Exception in thread "main" java.lang.IllegalArgumentException: No matching field found: toUpperCase for class java.lang.String
	at clojure.lang.Reflector.getInstanceField(Reflector.java:397)
	at clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:440)
	at refl.main$refl_str.invokeStatic(main.clj:5)
	at refl.main$refl_str.invoke(main.clj:4)
	at refl.main$_main.invokeStatic(main.clj:8)
	at refl.main$_main.doInvoke(main.clj:7)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.RestFn.applyTo(RestFn.java:132)
	at refl.main.main(Unknown Source)

Use Type Hints to Avoid Reflection

Make sure you put (set! *warn-on-reflection* true) at the top of every namespace in your project. This tells the Clojure compiler to report cases where Clojure is using reflection.

(ns refl.main
  (:gen-class))

(set! *warn-on-reflection* true)

(defn refl-str [s]
  (.toUpperCase s))

(defn -main [& _args]
  (println (refl-str "all good!")))

If we recompile our Clojure source, we’ll see a warning:

$ clojure -M -e "(compile 'refl.main)"
Reflection warning, refl/main.clj:7:3 - reference to field toUpperCase can't be resolved.
refl.main

Let’s add a ^String type hint to avoid usage of Clojure reflection:

(ns refl.main
  (:gen-class))

(set! *warn-on-reflection* true)

(defn refl-str [^String s]
  (.toUpperCase s))

(defn -main [& _args]
  (println (refl-str "all good!")))

If we recompile our updated source:

$ mkdir -p classes
$ clojure -M -e "(compile 'refl.main)"
$ native-image -cp "$(clojure -Spath):classes" -H:Name=refl -H:+ReportExceptionStackTraces \
    --features=clj_easy.graal_build_time.InitClojureClasses --no-fallback refl.main

We no longer see our reflection warning and our native image now works just fine:

$ ./refl
ALL GOOD!
ℹ️
As an example, prior versions of Clojure’s own clojure.stacktrace made use of reflection (see JIRA CLJ-2502). But this has been addressed via type hints.

Enable or disable the warn-on-reflection depending on the alias, the following methods are available for each tool.

  • leiningen: Use :global-vars in project.clj

(defproject warn-on-refrection-test "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :profiles
  {:dev {:global-vars {*warn-on-reflection* true}}})
  • tools.deps: Use alter-var-root in user.clj

dev/user.clj

(ns user)

(alter-var-root #'*warn-on-reflection* (constantly true))

deps.edn

{:aliases
 {:dev {:extra-paths ["dev"]}}}

Specify a Reflection Config

When you cannot add type hints, you can specify a GraalVM config for classes that are reflected at runtime.

If we go back to our original src/refl/main.clj that is absent of any type hints:

(ns refl.main
  (:gen-class))

(defn refl-str [s]
  (.toUpperCase s)) ;; reflection on String happens here

(defn -main [& _args]
  (println (refl-str "all good!")))

And we create GraalVM reflect-config.json with:

[
  {
    "name":"java.lang.String",
    "allPublicMethods":true
  }
]

Then recompile specifying our reflection config:

$ mkdir -p classes
$ clojure -M -e "(compile 'refl.main)"
$ native-image -cp "$(clojure -Spath):classes" -H:Name=refl -H:+ReportExceptionStackTraces \
    -H:ReflectionConfigurationFiles=reflect-config.json \
    --features=clj_easy.graal_build_time.InitClojureClasses --no-fallback refl.main

We have success:

$ ./refl
ALL GOOD!

See the GraalVM docs on reflection for details on the reflection config format.

Reflection Config for Arrays

To configure reflection config for an array of Java objects, you need to specify [Lfully.qualified.class. For example a Statement[] would be specified as "[Ljava.sql.Statement".

You can discover this name by calling (.getClass instance) in a REPL. A contrived example:

❯ clj
Clojure 1.11.1
user=> (def foo (java.util.Locale/getAvailableLocales))
user=> (.getClass foo)
[Ljava.util.Locale;

Automatically Discovering Reflection Config

To automatically discover reflection, you can use the tracing agent.

To prevent false positives in the generated config, you can use a caller based filter. An example filter.json:

{
  "rules": [
    {
      "excludeClasses": "clojure.**"
    },
    {
      "includeClasses": "clojure.lang.Reflector"
    }
  ]
}

To invoke the agent, you run your program wth the GraalVM JVM and add the -agentlib:native-image-agent argument.

Let’s recompile our original reflection example app and then run it from GraalVM JVM with the tracing agent:

$ mkdir -p classes
$ clojure -M -e "(compile 'refl.main)"
refl.main
$ java -agentlib:native-image-agent=caller-filter-file=filter.json,config-output-dir=. \
    -cp $(clojure -Spath):classes refl.main
ALL GOOD!

This will output reflect-config.json:

[
{
  "name":"java.lang.String",
  "queryAllPublicMethods":true,
  "methods":[{"name":"toUpperCase","parameterTypes":[] }]
},
{
  "name":"java.lang.reflect.Method",
  "methods":[{"name":"canAccess","parameterTypes":["java.lang.Object"] }]
},
{
  "name":"java.util.concurrent.atomic.AtomicBoolean",
  "fields":[{"name":"value"}]
},
{
  "name":"java.util.concurrent.atomic.AtomicReference",
  "fields":[{"name":"value"}]
}
]

The entry for java.lang.reflect.Method is expected, see clojure.lang.Reflector.

You then feed this generated reflection config to native-image just like you would for a hand-coded one.

clojure.lang.Reflector

If you are suffering NoSuchMethodError: java.lang.reflect.AccessibleObject.canAccess exceptions, GraalVM needs a little help. Include the following to your reflect-config.json file:

{"name": "java.lang.reflect.AccessibleObject",
 "methods" : [{"name":"canAccess"}]}

Digging & Diagnosing

Sometimes you’ll want more details on what GraalVM has done or produced.

Report what is being analyzed

Use GraalVM’s native-image -H:+PrintAnalysisCallTree to to learn what packages, classes and methods are being analyzed. These details are written under ./reports.

Note that this option will greatly slow down compilation so it’s better to turn it off in production builds.

Visualize what is in your native image

To visualize what is in your native image, you can use -H:+DashboardAll and upload the .bgv file to the GraalVM Dashboard, here’s an example screenshot:

GraalVM Dashboard Screenshot

ℹ️
Apparently GraalVM is going to stop work on the dashboard and focus instead on HTML reports generated by -H:+BuildReport. At the time of this writing -H:+BuildReport is only available in Oracle GraalVM and not in the Community Edition (see GraalVM Free Flavours).

Resource Usage

native-image RAM usage

GraalVM’s native-image can sometimes consume more RAM than is available on free tiers of services such as CircleCI. To limit how much RAM native-image uses set max heap usage via the "-J-Xmx" option (for example "-J-Xmx3g" limits the heap to 3 gigabytes).

If you are suffering out of memory errors, experiment on your development computer with higher -J-Xmx values.

Refer to native-image output for Peak RSS for RAM usage.

Actual memory usage is an ideal. Once you have a successful build, you can experiment with lowering -J-Xmx below the ideal. The cost will be longer build times, and when -J-Xmx is too low, out of memory errors.

native-image compilation time

You can shorten the time it takes to compile a native image, and sometimes dramatically reduce the amount of RAM required, by using direct linking when compiling your Clojure code to JVM bytecode.

This is done by setting the Java system property clojure.compiler.direct-linking to true.

The most convenient place for you to set that system property will vary depending on what tool you’re using to compile your Clojure code:

  • If you’re using Leiningen, add :jvm-opts ["-Dclojure.compiler.direct-linking=true"] to the profile you’re using for compilation (the same one that includes :aot :all)

  • If you’re using tools.deps via the Clojure CLI tools, add :jvm-opts ["-Dclojure.compiler.direct-linking=true"] to the alias you’re using for compilation

    • You can alternatively specify this property at the command line when invoking clojure: clojure -J-Dclojure.compiler.direct-linking=true -M -e "(compile 'my.ns)"

Optional Transitive Dependencies

A Clojure app that optionally requires transitive dependencies can be made to work under GraalVM with dynaload. You’ll want to follow its advice for GraalVM.

Static Linking

Static linking vs DNS lookup

If you happen to need a DNS lookup in your program, you need to avoid statically linked images (at least on Linux). If you are building a minimal docker image, it is sufficient to add the linked libraries (like libnss*) to the resulting image. But be sure that those libraries have the same version as the ones used in the linking phase.

One way to achieve that is to compile within the docker image then scraping the intermediate files using the FROM scratch directive and COPY the executable and shared libraries linked to it into the target image.

Static linking with musl

Using musl for static builds is recommended by the official GraalVM docs. Usage of --static without specifying --libc=musl will use glibc instead. However, while this may look like a fully statically binary, this will still load some libraries (using dlopen) at runtime. This may result in some segmentation fault errors related to glibc version mismatches. See this section in official glibc documentation for more information on why glibc "static" builds are not really static.

With --static --libc=musl, you will have truly static binaries equivalent to Go’s with CGO_ENABLED=0 or Rust compiled with musl. This can be deployed almost anywhere and is also smaller than the glibc equivalent. However, keep in mind that musl builds still have some limitations:

  • Only works with Linux AMD64

  • You will need to either use a distro that already have musl and zlib statically compiled in the repositories or compile it yourself.

  • There is a known issue with stack sizes in musl being really small by default and main thread not respecting stack size settings. This may cause some stack overflow errors during runtime

If supporting non-glibc distros are not an issue for you, there is also an option of building a mostly static native image that should work in any glibc distro. Those binaries are very similar to Go binaries without CGO_ENABLED=0 and Rust images build with glibc (the default).

Writing GraalVM specific code

While it would be nice to have the same clojure code run within a GraalVM image as on the JVM, there may be times where a GraalVM specific workaround may be necessary. GraalVM provides a class to detect when running in a GraalVM environment:

This class provides the following methods:

static boolean inImageBuildtimeCode()
Returns true if (at the time of the call) code is executing in the context of image building (e.g. in a static initializer of class that will be contained in the image).

static boolean inImageCode()
Returns true if (at the time of the call) code is executing in the context of image building or during image runtime, else false.

static boolean inImageRuntimeCode()
Returns true if (at the time of the call) code is executing at image runtime.

static boolean isExecutable()
Returns true if the image is build as an executable.

static boolean isSharedLibrary()
Returns true if the image is build as a shared library.

Currently, the ImageInfo class is implemented by looking up specific keys using java.lang.System/getProperty. Below are the known relevant property names and values:

Property name: "org.graalvm.nativeimage.imagecode"
Values: "buildtime", "runtime"

Property name: "org.graalvm.nativeimage.kind"
Values: "shared", "executable"

Java Native Interface

JNI contains a suite of tools for transfering datatypes between Java and C. You can read about this API here for Java 17.

Watch for bugs

There have historically been bugs (example) in the GraalVM implementations of some JNI functions. If you encounter bugs with these API calls, you might try the latest development versions of GraalVM. If bugs persist please https://github.com/oracle/graal/issues [raise them with with the Graal project].

Interfacing with native libraries

For interfacing with native libraries you can use JNI.

To interface with C code using JNI:

  1. Write a java file is defining a class. This class contains public static native methods defining the C functions you would like, their arguments and the return types. An example .java file from Spire.

  2. Generate a C .h header file from this java file:

    • Java 11+ bundles this tool into javac. Run javac on your .java source file and specify a directory to store the header file: javac -h destination_dir Library.java

  3. Write a .c implementation file with function definitions that match the prototypes created in your generated .h file. You will need to #include your generated .h header file. An example .c file from Spire.

  4. Compile the C code into a shared library as follows (we assume JAVA_HOME is setup as per GraalVM installation instructions):

    • On linux:

      cc -I$JAVA_HOME/include -I$JAVA_HOME/include/linux -shared Library.c -o liblibrary.so -fPIC
    • On MacOS:

      cc -I$JAVA_HOME/Contents/Home/include -I$JAVA_HOME/Contents/Home/include/darwin -dynamiclib -undefined suppress -flat_namespace Library.c -o liblibrary.dylib -fPIC
  5. Load the generated library at runtime from clojure via (clojure.lang.RT/loadLibrary "library")

  6. The JVM will need to be able to find the library on the standard library path. This can be set via LD_LIBRARY_PATH environment variable or via the ld linker config file (/etc/ld.so.conf on linux). Alternately you can set the library path by passing -Djava.library.path="my_lib_dir" to the java command line or by setting it at runtime with (System/setProperty "java.library.path" "my_lib_dir")

  7. Functions may be called via standard Java interop in clojure via the interface specified in your Library.java file (from step 1): (Library/method args)

macOS

Startup performance on macOS

@borkdude noticed slower startup times for babashka on macOS when using GraalVM legacy v20. He elaborated in the @graalvm channel on Clojurians Slack:

The issue only happens with specific usages of certain classes that are somehow related to security, urls and whatnot. So not all projects will hit this issue.

Maybe it’s also related to enabling the SSL stuff. Likely, but I haven’t tested that hypothesis.

The Graal team closed the issue with the following absolutely reasonable rationales:

  • I don’t think we can do much on this issue. The problem is the inefficiency of the Apple dynamic linker/loader.

  • Yes, startup time is important, but correctness can of course never be compromised. You are correct that a more precise static analysis could detect that, but our current context insensitive analysis it too limited.

Apple may fix this issue in macOS someday, who knows? If you:

  • have measured a slowdown in startup time of your native-image produced app after moving to Graal legacy v20

  • want to restore startup app to what it was on macOS prior legacy v20 of Graal

  • are comfortable with a "caveat emptor" hack from the Graal team

then you may want to try incorporating this Java code with @borkdude’s tweaks into your project.

Targeting a minimum macOS version

On macOS, GraalVM’s native-image makes use of XCode command line tools. XCode creates native binaries that specify the minimum macOS version required for execution. This minimum version can change with each new release of XCode.

To explicitly tell XCode what minimum version is required for your native binary, you can set the MACOSX_DEPLOYMENT_TARGET environment variable.

Bonus tip: to check the the minimum macOS version required for a native binary, you can use otool. Example for babashka native binary at the time of this writing:

> bb --version
babashka v1.3.182
> otool -l $(which bb) | grep -B1 -A3 MIN_MAC
Load command 9
      cmd LC_VERSION_MIN_MACOSX
  cmdsize 16
  version 10.13
      sdk 12.3

GraalVM development builds

Development builds of GraalVM can be found here. Although, at the time of this writing, these builds seem to be tagged with the legacy version scheme, the artifacts follow the new version scheme. These builds are intended for early testing feedback, but can disappear after a proper release has been made, so don’t link to them from production CI builds.

License

Distributed under the EPL License, same as Clojure. See LICENSE.