Opening output.csv file in all ranks can lead to very high file open overhead #131

jprorama · 2024-09-29T01:29:34Z

Bug Report

The output.csv file is currently opened during parameter parsing as a side effect of the read_config() called by all ranks in the main() function of the core benchmarks in h5bench_patterns/. This leads to a degenerate performance scenario when Lustre striping is set to -1 (stripe across all OSTs) and ranks coordinate opening the output.csv. This causes observable coordination overhead at small rank counts (64) and degrades rapidly with scale, easily leading to more than an hour of coordination overhead at just 512 ranks. The actual HDF5 write test does not start until the file open coordination is complete, which can lead to unexpected job time overruns when exploring benchmark scenarios.

Note this does not appear to affect benchmark results. It just seriously degrades process runtime with what is assumed to be metadata operations related to the file open from all ranks. This has been tested with collective I/O, although the output.csv is reported as part of STDIO module accounting according to Darshan.

This configuration arises when setting the -1 Lustre stripe pattern for the HDF5 write tests by applying it to the storage directory specified in the .json config file. This approach ensures newly created HDF5 files have the desired stripe config. If the benchmark config doesn't take care to move the output.csv file to a location where this stripe pattern is not active, the benchmark runtime grows due to file open overhead.

It seems reasonable to open the output.csv only on the rank that will write the file, currently rank 0. Because the file is opened during parameter parsing, there is no information available about the rank of the process at that point. This can be resolved by moving the csv_init() out of the _set_params() function in h5bench_util.c and into the main() body of the benchmark runner, which knows the required function params and the process rank. This also gives more control over when the file is opened. It is only written by rank 0 at the conclusion of all benchmark operations.
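A minimal sketch of the rank-gated open described above. The helper names open_csv_on_root() and write_csv_row_and_close() are hypothetical and are not part of h5bench; the rank is passed in as a plain integer so the sketch compiles without MPI. In the benchmark it would presumably come from MPI_Comm_rank() in main(), after parameter parsing has finished.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not h5bench API): open the CSV report only on the
 * rank that will write it. Every other rank gets NULL and never issues
 * an open against the file system. */
FILE *open_csv_on_root(int rank, int root, const char *csv_path)
{
    assert(csv_path != NULL);
    if (rank != root)
        return NULL;             /* non-writers skip the open entirely */
    return fopen(csv_path, "w"); /* can still be NULL if the open fails */
}

/* Hypothetical helper: write one "metric, value, unit" row and close.
 * Safe to call on every rank; it does nothing where csv is NULL.
 * Returns 1 if a row was written, 0 otherwise. */
int write_csv_row_and_close(FILE *csv, const char *metric, double value,
                            const char *unit)
{
    if (csv == NULL)
        return 0;
    fprintf(csv, "%s, %.3f, %s\n", metric, value, unit);
    fclose(csv);
    return 1;
}
```

With this shape, main() would call open_csv_on_root(my_rank, 0, path) once the rank is known, and only rank 0 would hold a non-NULL handle when results are written at the end of the run. The open could also be deferred until just before the final write, since nothing is written to the file earlier.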

To Reproduce

How are you building/running h5bench?

Running

mpirun -n 512 -ppn 64 --depth 1  --label --line-buffer /eagle/dist_relational_alg/nuio/bin//h5bench_write storage/f41141cb-2100189.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov/h5bench.cfg storage/test-2100189.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov.h5

I'm building h5bench on Polaris.

mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=${NUIO_INSTALL_DIR} \
         -DWITH_ASYNC_VOL=ON \
         -DCMAKE_C_FLAGS="-I$HDF5_DIR/include -L/$HDF5_DIR/lib"

make
make install

What is the input configuration file you use?

{
    "mpi": {
        "command": "mpirun",
        "ranks": "64",
        "configuration": "-n 512 -ppn 64 --depth 1  --label --line-buffer"
    },
    "vol": {},
    "file-system": {},
    "directory": "storage",
    "benchmarks": [
        {
            "benchmark": "write",
            "file": "test-2100189.polaris-pbs-01.hsn.cm.polaris.alcf.anl.gov.h5",
            "configuration": {
                "MEM_PATTERN": "CONTIG",
                "FILE_PATTERN": "CONTIG",
                "TIMESTEPS": "3",
                "DELAYED_CLOSE_TIMESTEPS": "2",
                "COLLECTIVE_DATA": "YES",
                "COLLECTIVE_METADATA": "YES",
                "EMULATED_COMPUTE_TIME_PER_TIMESTEP": "1 s",
                "NUM_DIMS": "1",
                "DIM_1": "375385",
                "STDEV_DIM_1": "0",
                "DIM_2": "1",
                "DIM_3": "1",
                "CSV_FILE": "output.csv",
                "MODE": "SYNC",
                "DATA_DIST_SCALE": "5"
            }
        }
    ]
}

Expected Behavior

Opening log files should have low overhead.

Software Environment

version of h5bench: 1.4
installed h5bench using: from source
operating system: SUSE in CrayOS
machine: Polaris
version of HDF5: 1.14
version of VOL-ASYNC: 1.8.1
name and version of MPI: cray-mpich/8.1.28


This avoids opening the file in all ranks, which causes file open contention between ranks. Contention increases with rank count and stripe count. Proposed partial fix for hpc-io#131 for base write benchmarks.

jprorama added the bug Something isn't working label Sep 29, 2024

jprorama changed the title ~~Opening CSV file in all ranks leads~~ Opening output.csv file in all ranks can lead to very high file open overhead Sep 29, 2024

jprorama mentioned this issue Oct 3, 2024

Move output.csv file open to the the main function and limit to rank 0 #132

Open
