ml-examples

Usage Examples

Reading NetCDF files

  1. To list the structure (variables, dimensions and descriptions) of the NetCDF file 'A2008DDD.HHMM.nc', run:
ncdump -h <AYYYYDDD.HHMM.nc>
  2. To visualize the content of the NetCDF file 'A2008DDD.HHMM.nc', we suggest using Ncview or Panoply.

  3. To load a NetCDF file and get a variable as a masked numpy.ndarray in Python, run:

import netCDF4 as nc4

# open the file read-only and read a variable into a masked numpy array
file = nc4.Dataset('A2008DDD.HHMM.nc', 'r', format='NETCDF4')
variable_content = file.variables['variable_name'][:]

CUMULO for Machine Learning

Check out loader.py for loading utilities. CUMULO's variables are categorized into the following groups; a minimal loading sketch follows the list:

  1. geographic coordinates
coordinates = ['latitude', 'longitude']
  2. calibrated radiances (training features)
radiances = ['ev_250_aggr1km_refsb_1', 'ev_250_aggr1km_refsb_2', 'ev_1km_emissive_29', 'ev_1km_emissive_33', 'ev_1km_emissive_34', 'ev_1km_emissive_35', 'ev_1km_emissive_36', 'ev_1km_refsb_26', 'ev_1km_emissive_27', 'ev_1km_emissive_20', 'ev_1km_emissive_21', 'ev_1km_emissive_22', 'ev_1km_emissive_23']
  3. computed cloud properties (derived from the radiances)
properties = ['cloud_water_path', 'cloud_optical_thickness', 'cloud_effective_radius', 'cloud_phase_optical_properties', 'cloud_top_pressure', 'cloud_top_height', 'cloud_top_temperature', 'cloud_emissivity', 'surface_temperature']
  4. cloud binary mask (indicating whether a pixel is certainly cloudy or not)
rois = 'cloud_mask'
  5. annotations and cloud information (from CloudSat, available only along the satellite track)
labels = 'cloud_layer_type'
additional_information = ['cloud_layer_base', 'cloud_layer_top', 'cloud_type_quality', 'precipitation_flag']
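
For illustration, a minimal loading sketch in the spirit of loader.py (the function name is ours, not the repository's), assuming each radiance variable is stored as a 2D (height, width) masked array:

import numpy as np
import netCDF4 as nc4

def load_radiances(path, radiances):
    # stack the 13 radiance channels into a single (13, height, width) masked array
    with nc4.Dataset(path, 'r', format='NETCDF4') as f:
        channels = [f.variables[name][:] for name in radiances]
    return np.ma.stack(channels)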

IMPORTANT:

All variables containing layer in their name have an additional vertical dimension (latitude - longitude - cloud layer), so each 2D pixel can take multiple values. These variables are defined on up to 10 vertical cloud layers. Distinct cloud layers are identified by splitting cloud clusters at hydrometeor-free gaps of at least 480 m. Because the observed clouds vary over space and time in both type and quantity, the layers are not predefined height intervals of fixed size; their number and thickness vary from pixel to pixel.

In our work, we classified clouds by retaining, for each pixel, the most frequent label from cloud_layer_type, but there may be better choices (e.g., using the full distribution of labels for each pixel, or weighting labels by layer thickness). A possible implementation of this per-pixel majority vote is sketched below.
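
A minimal sketch of that majority vote, assuming cloud_layer_type is a masked array of shape (height, width, layers) with small non-negative integer class labels and masked entries where no annotation is available (the function name and fill value are ours):

import numpy as np

def most_frequent_label(cloud_layer_type, n_classes=8, fill_value=-1):
    # replace masked entries with a value outside the class range
    arr = np.ma.filled(cloud_layer_type, fill_value)
    # count, per pixel, how often each class appears across the layer dimension
    counts = np.stack([(arr == c).sum(axis=-1) for c in range(n_classes)], axis=-1)
    labels = counts.argmax(axis=-1)
    # pixels with no valid layer label keep the fill value
    labels[counts.sum(axis=-1) == 0] = fill_value
    return labels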

Running Baselines

Tile extraction

The provided methods (iResNet and LightGBM) are applied to 3x3 tiles extracted from the whole images with the following script.

python netcdf/nc_tile_extractor.py

Labeled tiles are sampled around each labeled pixel of an image, and an equal number of unlabeled tiles is sampled uniformly from the remaining cloudy portions of the image.
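
For illustration only (netcdf/nc_tile_extractor.py is the reference implementation), a sketch of cropping one 3x3 tile around a labeled pixel; the function name and the channel-first layout are assumptions, and border pixels are ignored for brevity:

def extract_tile(swath, i, j, size=3):
    # swath: (channels, height, width) array; (i, j) is the centre pixel of the tile
    r = size // 2
    return swath[:, i - r:i + r + 1, j - r:j + r + 1]

# example: tile = extract_tile(radiance_array, row, col)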

ML Baselines

LightGBM

  1. The Jupyter notebook training provides the code for training a LightGBM model. See doc for installation.

  2. The script predicting provides the code for predicting over the whole swath using the trained model. As the model takes 3x3 tiles as input, it is applied to the 2030x1354 swath sequentially and without overlap, as sketched below.
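
A sketch of that sequential, non-overlapping tiling, assuming a (channels, 2030, 1354) swath array; trailing rows and columns that do not fill a complete tile are dropped, and the helper name is ours:

def iter_tiles(swath, size=3):
    # yield (i, j, tile) for every non-overlapping size x size tile of the swath
    _, height, width = swath.shape
    for i in range(0, height - size + 1, size):
        for j in range(0, width - size + 1, size):
            yield i, j, swath[:, i:i + size, j:j + size]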

iResNet

The provided code is an adaptation of Invertible Residual Networks, ICML 2019.

  1. The script training provides the code for training a hybrid iResNet on CUMULO.

  2. The script predicting provides the code for predicting over the whole swath using the trained model. As the model takes 3x3 tiles as input, it is applied to the 2030x1354 swath sequentially and without overlap; the per-tile predictions can then be stitched back into a full-swath map, as sketched below.
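
As a complement to the tiling sketch above, an assumed way of stitching per-tile class predictions back into a full-swath label map (one predicted class per 3x3 tile, broadcast to that tile's pixels); the helper is ours, not part of the repository:

import numpy as np

def stitch_predictions(tile_predictions, height=2030, width=1354, size=3, fill_value=-1):
    # tile_predictions: iterable of (i, j, predicted_class), e.g. produced by running
    # a trained classifier on the tiles yielded by iter_tiles above
    swath_labels = np.full((height, width), fill_value, dtype=int)
    for i, j, predicted_class in tile_predictions:
        swath_labels[i:i + size, j:j + size] = predicted_class
    return swath_labels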