Validation of basicFusion products
Give data users 99.999% confidence that all input files are properly processed into output files.
Data validation is a very important step in data production. How can you tell whether the final data product was processed correctly, without missing data? There are several ways to ensure that the contents of input files are migrated successfully to output files.
The main problem is the typical big data problem: volume, variety, and velocity.
- Compare the sum of input file sizes with the output file size: the two should be linearly proportional.
- Process error logs: any error message indicates that something went wrong.
- Perform image analysis: data values may differ because of scale/offset processing, but patterns in the imagery (such as a hurricane) should be the same.
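The three checks above can be sketched in a few small Python helpers. This is a minimal illustration, not the project's validation code: the function names, the 10% tolerance, the median-based outlier rule, and the error-matching regular expression are all hypothetical choices. Pearson correlation is used for the image check because it is unchanged when one image is a positive linear rescaling (scale/offset) of the other, so it compares patterns rather than raw values.

```python
import math
import os
import re


def total_size(paths):
    """Sum of the sizes, in bytes, of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def size_ratio(input_paths, output_path):
    """Output file size divided by the summed size of its input files."""
    return os.path.getsize(output_path) / float(total_size(input_paths))


def flag_ratio_outliers(ratios, tolerance=0.1):
    """Keys whose ratio differs from the median ratio by more than `tolerance` (relative).

    If output size is linearly proportional to input size, the ratios should
    cluster tightly; the 10% default tolerance is a hypothetical choice.
    """
    if not ratios:
        return []
    values = sorted(ratios.values())
    n = len(values)
    mid = n // 2
    median = values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2.0
    return sorted(k for k, r in ratios.items() if abs(r - median) > tolerance * median)


# Hypothetical pattern: adjust to the wording the processing code actually logs.
ERROR_PATTERN = re.compile(r"\b(errors?|fatal|fail(s|ed|ure)?)\b", re.IGNORECASE)


def error_lines(log_text):
    """Log lines that look like error messages."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences of pixel values.

    The result is unchanged by b = scale * a + offset when scale > 0 (and only
    flips sign when scale < 0), so it compares patterns rather than raw values.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = float(len(a))
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("constant input has no defined correlation")
    return cov / math.sqrt(va * vb)
```

In practice, `size_ratio` would be computed for every output file and the results collected in a dict keyed by file name before calling `flag_ratio_outliers`; a correlation close to 1.0 between an input field and the corresponding output field suggests the pattern survived processing.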
The final product should be interoperable with netCDF. Thus, the following validation will also help ensure data interoperability.
- Compare ncdump and h5dump output: both tools should report similar content.
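One way to make the ncdump/h5dump comparison concrete is to collect the variable names each tool reports in header-only mode (`ncdump -h`, `h5dump -H`) and list the names that only one tool sees. This is a sketch under assumptions: the regular expressions target the usual header layout of both tools, names are compared as leaf names only (so identically named variables in different groups collapse), and the helper names are hypothetical. Expect some legitimate differences; for example, netCDF-4 dimensions without a coordinate variable are stored as HDF5 datasets and so appear only in h5dump.

```python
import re
import subprocess

# Variable declarations in `ncdump -h` output, e.g. "\tfloat radiance(x, y) ;"
NC_VAR = re.compile(r"^\s+\w+\s+([^\s(;]+)\s*(\(.*\))?\s*;\s*$")
# Dataset headers in `h5dump -H` output, e.g. '   DATASET "radiance" {'
H5_DSET = re.compile(r'^\s*DATASET\s+"([^"]+)"')


def names_from(text, pattern):
    """Set of names captured by `pattern` (group 1), at most one per line."""
    names = set()
    for line in text.splitlines():
        match = pattern.match(line)
        if match:
            names.add(match.group(1))
    return names


def compare_dumps(path):
    """Return (only_in_h5dump, only_in_ncdump) leaf names for one output file."""
    nc_text = subprocess.check_output(["ncdump", "-h", path]).decode("utf-8", "replace")
    h5_text = subprocess.check_output(["h5dump", "-H", path]).decode("utf-8", "replace")
    nc_names = names_from(nc_text, NC_VAR)
    h5_names = names_from(h5_text, H5_DSET)
    return sorted(h5_names - nc_names), sorted(nc_names - h5_names)
```

Dimension lines (`x = 10 ;`) and attribute lines (`var:units = ... ;`) do not match `NC_VAR`, so only variable declarations are counted.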
To set up a Python environment with matplotlib on Roger, first run
module load python/2.7.10
and then run configureEnv.sh in the util/ directory. It creates a virtual environment in externLib/BFpyEnv with all the required dependencies. Activate it with
source externLib/BFpyEnv/bin/activate
and then run the inquireSize.py script.
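Before running the validation scripts, it can help to confirm that the virtual environment is actually active and that the needed packages import. The following is a small, hypothetical check; which packages configureEnv.sh installs is not listed here, so only matplotlib (named above) is tested by default. It should run under both Python 2.7 and Python 3.

```python
from __future__ import print_function

import sys


def in_virtualenv():
    """True when running inside a virtualenv/venv (e.g. after activating externLib/BFpyEnv)."""
    # virtualenv < 20 sets sys.real_prefix; venv and virtualenv >= 20 make sys.base_prefix differ from sys.prefix
    return hasattr(sys, "real_prefix") or getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("missing modules:", missing_modules(["matplotlib"]) or "none")
```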
To install python-hdf4, which is used to generate images from HDF4 files, first build and install HDF4 with -fPIC:
export CFLAGS=-fPIC && ./configure --disable-netcdf --prefix=/home/username && make && make install
Then export the HDF4 include and library directories under the same prefix and install python-hdf4:
export INCLUDE_DIRS=/home/username/include && export LIBRARY_DIRS=/home/username/lib && pip install python-hdf4
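Once python-hdf4 is installed (it provides the `pyhdf` module), an image of one scientific dataset (SDS) can be produced roughly as follows. This is a sketch, not the project's plotting code: the script name `plot_sds.py`, the choice to plot the first 2-D slice of a multi-band SDS, and the masking of `_FillValue` are assumptions. Raw stored values are plotted, since scale/offset does not change the spatial pattern; the `scale_factor` and `add_offset` attributes, if present, are only shown in the title because the formula that applies them differs between products.

```python
import sys


def list_sds(path):
    """Names of the scientific datasets (SDS) in an HDF4 file."""
    from pyhdf.SD import SD, SDC  # provided by python-hdf4; imported lazily
    sd = SD(path, SDC.READ)
    try:
        return sorted(sd.datasets().keys())
    finally:
        sd.end()


def read_sds(path, name):
    """Return (numpy array, attribute dict) for one SDS."""
    from pyhdf.SD import SD, SDC
    sd = SD(path, SDC.READ)
    try:
        sds = sd.select(name)
        data = sds.get()
        attrs = sds.attributes()
        sds.endaccess()
    finally:
        sd.end()
    return data, attrs


def scale_offset(attrs):
    """(scale_factor, add_offset) if present, else (1.0, 0.0)."""
    return float(attrs.get("scale_factor", 1.0)), float(attrs.get("add_offset", 0.0))


def save_image(data, out_png, title=""):
    """Render a 2-D array to a PNG file without needing a display."""
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend for cluster nodes
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    im = ax.imshow(data, interpolation="nearest")
    fig.colorbar(im, ax=ax)
    ax.set_title(title)
    fig.savefig(out_png, dpi=150)
    plt.close(fig)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        print("\n".join(list_sds(sys.argv[1])))
    elif len(sys.argv) == 4:
        path, name, out_png = sys.argv[1:4]
        data, attrs = read_sds(path, name)
        while data.ndim > 2:  # assumption: plot the first 2-D slice of a multi-band SDS
            data = data[0]
        fill = attrs.get("_FillValue")
        if fill is not None:
            import numpy as np
            data = np.ma.masked_equal(data, fill)  # keep fill values out of the color scale
        scale, offset = scale_offset(attrs)
        save_image(data, out_png, title="%s (scale_factor=%g, add_offset=%g)" % (name, scale, offset))
    else:
        print("usage: plot_sds.py FILE.hdf [SDS_NAME OUT.png]")
```

Running it with only a file name lists the SDS names; adding an SDS name and an output path writes the PNG, which can then be compared visually with the same field in the output product.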