[GLUTEN-7887][VL][DOC] Add usage doc about dynamic load jvm libhdfs and native libhdfs3 (#7982)
JkSelf authored Nov 19, 2024
1 parent ce45bc4 commit 5dda71a
Showing 2 changed files with 9 additions and 3 deletions.
5 changes: 5 additions & 0 deletions docs/developers/MicroBenchmarks.md
@@ -358,6 +358,11 @@ ShuffleWriteRead/iterations:1/process_time/real_time/threads:1 121637629714 ns
Unless `spark.gluten.sql.debug` is set in the INI file via `--conf`, the logging behavior is the same as with debug mode off.
Developers can use `--debug-mode` command line flag to turn on debug mode when needed, and set verbosity/severity level via command line flags `--v` and `--minloglevel`. Note that constructing and deconstructing log strings can be very time-consuming, which may cause benchmark times to be inaccurate.


## Enable HDFS support

After enabling dynamic loading of `libhdfs.so` at runtime for HDFS support, running the benchmark against an HDFS file requires the Hadoop classpath to be set. You can do this by running ``export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob` ``; otherwise, the HDFS connection will fail. If you have replaced `${HADOOP_HOME}/lib/native/libhdfs.so` with `libhdfs3.so`, there is no need to set `CLASSPATH`.
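As a sketch, an environment setup for running the benchmark against HDFS might look like the following (the install paths and the benchmark invocation are assumptions; adjust them to your deployment):

```shell
# Assumed install locations; adjust to your JDK and Hadoop setup.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export HADOOP_HOME=/opt/hadoop

# Required only for the JVM-backed libhdfs.so; libhdfs3.so does not read CLASSPATH.
export CLASSPATH=$("$HADOOP_HOME/bin/hdfs" classpath --glob)

# Hypothetical benchmark invocation against an HDFS path.
./generic_benchmark --data hdfs://namenode:8020/path/to/data
```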

## Simulate write tasks

The last operator for a write task is a file write operator, and the output from the Velox pipeline only
7 changes: 4 additions & 3 deletions docs/get-started/Velox.md
@@ -120,7 +120,8 @@ shared libraries into another jar named `gluten-thirdparty-lib-$LINUX_OS-$VERSIO

## HDFS support

Hadoop HDFS support is provided via the [libhdfs3](https://github.com/apache/hawq/tree/master/depends/libhdfs3) library. libhdfs3 offers a native API for Hadoop I/O without the drawbacks of JNI, and also supports advanced authentication such as Kerberos. Please note that this library has several dependencies which may require extra installation on the driver and worker nodes.
Gluten supports dynamically loading both `libhdfs.so` and `libhdfs3.so` at runtime via `dlopen`, allowing the JVM to load the appropriate shared library file as needed. This means you do not need to set the library path during the compilation phase.
To enable this functionality, you must set the `JAVA_HOME` and `HADOOP_HOME` environment variables. Gluten will then locate and load `${HADOOP_HOME}/lib/native/libhdfs.so` at runtime. If you prefer to use `libhdfs3.so` instead, simply replace the `${HADOOP_HOME}/lib/native/libhdfs.so` file with `libhdfs3.so`.
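The runtime setup described above can be sketched as follows (a minimal example; the JDK and Hadoop paths are assumptions, adjust them to your installation):

```shell
# Assumed paths; adjust to your JDK and Hadoop installation.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export HADOOP_HOME=/opt/hadoop

# Gluten will dlopen this library at runtime; verify it exists.
ls "$HADOOP_HOME/lib/native/libhdfs.so"

# Optional: use libhdfs3 instead, by placing it under the same file name
# so Gluten picks it up at the expected path.
# cp /path/to/libhdfs3.so "$HADOOP_HOME/lib/native/libhdfs.so"
```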

### Build with HDFS support

@@ -131,7 +132,7 @@
```
cd /path/to/gluten
./dev/buildbundle-veloxbe.sh --enable_hdfs=ON
```

- ### Configuration about HDFS support
+ ### Configuration about HDFS support in Libhdfs3

HDFS URIs (`hdfs://host:port`) are extracted from a valid HDFS file path to initialize the HDFS client; you do not need to specify them explicitly.

@@ -172,7 +173,7 @@ You also need to add configuration to the "hdfs-site.xml" as below:
</property>
```

- ### Kerberos support
+ ### Kerberos support in libhdfs3

Here are two steps to enable Kerberos.

