- To list the structure (variables, dimensions and descriptions) of the netCDF file 'A2008DDD.HHMM.nc', run:
ncdump -h A2008DDD.HHMM.nc
-
To visualize the content of the netCDF file 'A2008DDD.HHMM.nc', we suggest using Ncview or Panoply.
-
To load a netcdf file and get a variable as a masked numpy.ndarray in python, run:
import netCDF4 as nc4
file = nc4.Dataset('A2008DDD.HHMM.nc', 'r', format='NETCDF4')
variable_content = file.variables['variable_name'][:]
Check out loader.py for loading utils. CUMULO's variables are categorized into:
- geographic coordinates
coordinates = ['latitude', 'longitude']
- calibrated radiances (training features)
radiances = ['ev_250_aggr1km_refsb_1', 'ev_250_aggr1km_refsb_2', 'ev_1km_emissive_29', 'ev_1km_emissive_33', 'ev_1km_emissive_34', 'ev_1km_emissive_35', 'ev_1km_emissive_36', 'ev_1km_refsb_26', 'ev_1km_emissive_27', 'ev_1km_emissive_20', 'ev_1km_emissive_21', 'ev_1km_emissive_22', 'ev_1km_emissive_23']
- computed cloud properties (derived from radiances)
properties = ['cloud_water_path', 'cloud_optical_thickness', 'cloud_effective_radius', 'cloud_phase_optical_properties', 'cloud_top_pressure', 'cloud_top_height', 'cloud_top_temperature', 'cloud_emissivity', 'surface_temperature']
- cloud binary mask (indicating whether a pixel is confidently identified as cloudy)
rois = 'cloud_mask'
- annotations and cloud information (from CloudSat, available only along the track of the satellite)
labels = 'cloud_layer_type'
additional_information = ['cloud_layer_base', 'cloud_layer_top', 'cloud_type_quality', 'precipitation_flag']
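The cloud mask above can be used to restrict any per-pixel variable to cloudy pixels via boolean indexing. A minimal sketch with hypothetical toy values — the real arrays come from the netCDF file and have swath dimensions:

```python
import numpy as np

# Toy stand-ins: 'cloud_mask' is assumed to be a (H, W) array of 0/1
# flags, and 'radiance' one (H, W) channel of the swath.
cloud_mask = np.array([[0, 1],
                       [1, 1]])
radiance = np.array([[270.0, 255.0],
                     [250.0, 245.0]])

# Boolean indexing keeps only the pixels flagged as cloudy.
cloudy_pixels = radiance[cloud_mask == 1]
print(cloudy_pixels)  # [255. 250. 245.]
```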
IMPORTANT:
All variables containing "layer" in their name have an additional vertical dimension (latitude - longitude - cloud layer), so each 2D pixel can take multiple values. These variables are defined on up to 10 distinct vertical cloud layers. Distinct layers are identified by splitting cloud clusters wherever a hydrometeor-free gap of at least 480 m separates them. Because the observed clouds vary over space and time in both type and extent, layers are not predefined height intervals of fixed size; their number and thickness vary from pixel to pixel.
In our work, we classified clouds by retaining, for each pixel, the most frequent label from cloud_layer_type, but there may be better choices (e.g., using the full distribution of labels per pixel, or weighting labels by layer thickness).
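The most-frequent-label reduction described above can be sketched as follows. This is an illustrative implementation, not the repository's code; it assumes cloud_layer_type is loaded as a masked integer array of shape (H, W, n_layers), with masked entries marking absent layers:

```python
import numpy as np

def most_frequent_label(cloud_layer_type):
    """Reduce (H, W, n_layers) layer labels to one label per pixel.

    Assumes a masked integer array as returned by netCDF4; masked
    entries mark absent layers. Pixels with no layer at all get -1.
    """
    filled = np.ma.filled(cloud_layer_type, -1)  # -1 marks missing layers
    h, w, _ = filled.shape
    out = np.full((h, w), -1, dtype=np.int64)
    for i in range(h):
        for j in range(w):
            labels = filled[i, j]
            labels = labels[labels >= 0]  # drop missing layers
            if labels.size:
                # keep the most frequent cloud type over the vertical layers
                vals, counts = np.unique(labels, return_counts=True)
                out[i, j] = vals[np.argmax(counts)]
    return out
```

A per-pixel loop is slow on a full 2030x1354 swath but keeps the reduction rule explicit; in practice only the labeled pixels along the CloudSat track need it.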
The provided methods (iResNet and LightGBM) are applied to 3x3 tiles extracted from the whole images using the following script:
python netcdf/nc_tile_extractor.py
Labeled tiles are sampled around each labeled pixel of an image, and an equal number of unlabeled tiles is sampled uniformly from the remaining cloudy portions of the image.
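The labeled-tile sampling can be sketched as below. The function name, argument shapes, and the -1 convention for unlabeled pixels are assumptions for illustration, not the exact interface of netcdf/nc_tile_extractor.py:

```python
import numpy as np

def extract_labeled_tiles(radiances, labels, tile_size=3):
    """Sample one tile centered on each labeled pixel.

    radiances: (channels, H, W) array of input features.
    labels:    (H, W) array, with -1 at unlabeled pixels (assumption).
    Pixels whose tile would cross the swath border are skipped.
    """
    r = tile_size // 2
    _, h, w = radiances.shape
    tiles, tile_labels = [], []
    ys, xs = np.nonzero(labels >= 0)
    for y, x in zip(ys, xs):
        if r <= y < h - r and r <= x < w - r:
            tiles.append(radiances[:, y - r:y + r + 1, x - r:x + r + 1])
            tile_labels.append(labels[y, x])
    return np.stack(tiles), np.array(tile_labels)
```

Unlabeled tiles would then be drawn the same way, but at positions sampled uniformly from cloudy pixels (per cloud_mask) that carry no label.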
-
The Jupyter notebook training provides the code for training a LightGBM model. See the documentation for installation instructions.
-
The script predicting provides the code for predicting over the whole swath using the trained model. As the model takes 3x3 tiles as input, it is applied to the 2030x1354 swath sequentially and without overlap.
The provided code is an adaptation of Invertible Residual Networks, ICML 2019.
-
The script training provides the code for training a hybrid iResNet on CUMULO.
-
The script predicting provides the code for predicting over the whole swath using the trained model. As the model takes 3x3 tiles as input, it is applied to the 2030x1354 swath sequentially and without overlap.
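Sweeping a swath with non-overlapping 3x3 tiles can be sketched as follows. This is an illustrative generator, not the repository's prediction code; note that 2030 and 1354 are not multiples of 3, and this sketch simply drops the trailing rows/columns that do not fill a complete tile (how the provided script handles that border is not specified here):

```python
import numpy as np

def iter_tiles(swath, tile_size=3):
    """Yield (y, x, tile) for non-overlapping tiles covering a swath.

    swath: array whose last two dimensions are (H, W); trailing
    pixels that do not fill a complete tile are skipped.
    """
    h, w = swath.shape[-2:]
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            yield y, x, swath[..., y:y + tile_size, x:x + tile_size]
```

Predictions for each tile can then be written back into an output map at the tile's (y, x) offset, reconstructing a label map for the full swath.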