
Adaptation of the WaSR Segmentation Network for Unmanned Surface Vehicles v0.01


WaSR - A water-obstacle separation and refinement network for unmanned surface vehicles

https://arxiv.org/abs/2001.01921 (ICRA 2020)

Obstacle detection by semantic segmentation shows great promise for autonomous navigation in unmanned surface vehicles (USV). However, existing methods suffer from poor estimation of the water edge in the presence of visual ambiguities, poor detection of small obstacles and a high false-positive rate on water reflections and wakes. We propose a new deep encoder-decoder architecture, a water-obstacle separation and refinement network (WaSR), to address these issues. Detection and water-edge accuracy are improved by a novel decoder that gradually fuses inertial information from the IMU with the visual features from the encoder. In addition, a novel loss function is designed to increase the separation between water and obstacle features early on in the network. Subsequently, the capacity of the remaining layers in the decoder is better utilised, leading to a significant reduction in false positives and an increase in true positives. Experimental results show that WaSR outperforms the current state-of-the-art by a large margin, yielding a 14% increase in F-measure over the second-best method.

Updates:

  • [March 2020] Thomas Clunie ported WaSR to Python 3 and TensorFlow 1.15.2
  • [February 2020] Initial commit

To-Do:

  • Port the IMU variation fully to Python
  • Upload requirements.txt file for quick installation
  • Re-upload weights
  • Update the read-me file

1. Installation

With Dockerfile

Requirements

  • The Dockerfile is created for tensorflow-gpu 1.8.0 with Python 2.7, based on the tensorflow/tensorflow:1.8.0-gpu Docker image
  • You need an NVIDIA graphics card compatible with at least CUDA 8.0
  • The NVIDIA Container Toolkit must be installed, following NVIDIA's instructions
  • If you do not have an NVIDIA graphics card, edit the Dockerfile to remove all the GPU requirements

Instructions

  • Download the Dockerfile and change into the folder where it is stored
  • Build the image; this installs all the requirements, downloads the pre-trained weights and runs the test script:
docker build -t wasr_docker .
  • Display the result:
docker run --gpus all -it --rm -e DISPLAY=${DISPLAY} -v /tmp/.X11-unix:/tmp/.X11-unix -v $HOME:/home/$USER wasr_docker bash -c "display /home/docker/wasr_network/test.jpg & display /home/docker/wasr_network/output/output_mask.png"

Requirements

To successfully run WaSR you will need Python 2.7, OpenCV and the Python packages listed in requirements.txt.

Execute the following sequence of commands to download and install the required packages and libraries (Ubuntu):

$ sudo apt-get update
$ sudo apt-get install python2.7
$ sudo apt-get install python-opencv
$ pip install -r requirements.txt
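
The requirements.txt file is still to be uploaded (see the To-Do list above). A minimal file consistent with the versions mentioned in this document might look like the following; the package set and versions are assumptions, not taken from the repository:

# hypothetical requirements.txt (package set and versions are assumptions)
tensorflow-gpu==1.8.0
numpy
Pillow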

2. Architecture overview

The WaSR architecture consists of a contracting path (encoder) and an expansive path (decoder). The purpose of the encoder is the construction of deep, rich features, while the primary task of the decoder is the fusion of inertial and visual information, increasing the spatial resolution and producing the segmentation output.

Encoder

Following the recent analysis [1] of deep networks on a maritime segmentation task, we base our encoder on the low-to-mid level backbone parts of DeepLab v2 [2], i.e., a ResNet-101 [3] backbone with atrous convolutions. In particular, the model is composed of four residual convolutional blocks (denoted as res2, res3, res4 and res5) combined with max-pooling layers. Hybrid atrous convolutions are added to the last two blocks to increase the receptive field and encode local context information into deep features.
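
As an illustration, an atrous (dilated) convolution of the kind used in these blocks can be written in TensorFlow 1.x as follows; the layer name, shapes and rate below are illustrative, not taken from the repository:

import tensorflow as tf  # TensorFlow 1.x

# Features at the res4 level; shapes are illustrative.
x = tf.placeholder(tf.float32, [None, 48, 64, 1024])
w = tf.get_variable('w_atrous', [3, 3, 1024, 1024])  # 3x3 kernel
# rate=2 samples the input on a dilated grid, enlarging the receptive
# field without extra parameters or loss of spatial resolution.
y = tf.nn.atrous_conv2d(x, w, rate=2, padding='SAME')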

Decoder

The primary task of the decoder is the fusion of visual and inertial information. We introduce the inertial information by constructing an IMU feature channel that encodes the location of the horizon at the pixel level. In particular, the camera-IMU projection [4] is used to estimate the horizon line, and a binary mask with all pixels below the horizon set to one is constructed. This IMU mask serves as a prior probability of the water location and improves the estimated location of the water edge in the output segmentation.
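
For illustration, a minimal sketch of constructing such a binary mask, assuming the camera-IMU projection has already produced the horizon as a line y = k*x + n in image coordinates (the actual horizon estimation of [4] is not shown):

import numpy as np

def imu_mask(height, width, k, n):
    xs = np.arange(width)
    horizon_y = k * xs + n                    # horizon row for each column
    rows = np.arange(height)[:, None]         # (H, 1) row indices
    # Pixels below the horizon (larger row index) get value 1: prior "water".
    return (rows > horizon_y[None, :]).astype(np.float32)

mask = imu_mask(384, 512, k=0.05, n=180.0)    # matches the default 384x512 input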

The IMU mask is treated as an externally generated feature channel, which is fused with the encoder features at multiple levels of the decoder. However, the values in the IMU channel and the encoder features are at different scales. To avoid manually adjusting the fusion weights, we apply the Attention Refinement Module (ARM) and the Feature Fusion Module (FFM) proposed in [5] to learn an optimal fusion strategy.
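
A rough TensorFlow 1.x sketch of an ARM-style block, following the general design of [5]; the exact configuration used in this repository may differ:

import tensorflow as tf  # TensorFlow 1.x

def attention_refinement(features, name):
    with tf.variable_scope(name):
        channels = features.get_shape().as_list()[-1]
        # Global context: average over the spatial dimensions.
        context = tf.reduce_mean(features, axis=[1, 2], keepdims=True)
        attention = tf.layers.conv2d(context, channels, 1)
        attention = tf.layers.batch_normalization(attention)
        attention = tf.sigmoid(attention)     # per-channel attention weights
        return features * attention           # re-weight the input features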

The final block of the decoder is an Atrous Spatial Pyramid Pooling (ASPP) module [2], followed by a softmax, which improves the segmentation of small structures (such as small buoys) and produces the final segmentation mask.
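
An illustrative ASPP head in the style of DeepLab v2 [2], with parallel atrous branches summed into per-class scores; the rates, shapes and class count below are assumptions:

import tensorflow as tf  # TensorFlow 1.x

def aspp(features, num_classes, rates=(6, 12, 18, 24)):
    branches = [
        tf.layers.conv2d(features, num_classes, 3, padding='same',
                         dilation_rate=r, name='aspp_r%d' % r)
        for r in rates
    ]
    return tf.add_n(branches)                 # fuse the pyramid branches

decoder_features = tf.placeholder(tf.float32, [None, 96, 128, 256])
logits = aspp(decoder_features, num_classes=3)   # e.g. water / sky / obstacles
probs = tf.nn.softmax(logits)                    # final segmentation scores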

Semantic separation loss

Since we would like to enforce clustering of water features, we approximate their distribution by a Gaussian with per-channel means and variances, where we assume channel independence for computational tractability. The similarity of all other pixels, corresponding to obstacles, can then be measured as a joint probability under this Gaussian, i.e.,
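
$$p = \prod_{i \in Q_o} \prod_{c=1}^{C} \exp\left(-\frac{(y_{ci} - \mu_c)^2}{2\sigma_c^2}\right),$$

where $Q_o$ is the set of obstacle pixels, $y_{ci}$ is channel $c$ of the feature at pixel $i$, and $(\mu_c, \sigma_c^2)$ are the per-channel mean and variance of the water features (normalization constants are omitted).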

We would like to enforce learning of features that minimize this probability. By expanding the equation for water per-channel standard deviations, taking the log of the above equation, flipping the sign and inverting, we arrive at the following equivalent obstacle-water separation loss
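
$$\mathcal{L}_{WS} = \left(\sum_{c=1}^{C}\sum_{i \in Q_o} \frac{(y_{ci}-\mu_c)^2}{\sigma_c^2}\right)^{-1},$$

where the water variance is expanded as $\sigma_c^2 = \frac{1}{|Q_w|}\sum_{j \in Q_w}(y_{cj}-\mu_c)^2$ over the water pixel set $Q_w$, and constant factors are dropped. Minimizing this loss pushes obstacle features away from the water distribution while keeping the water features compact.

A minimal NumPy sketch of this loss, reconstructed from the derivation above rather than taken from the repository code:

import numpy as np

def separation_loss(features, water_mask, obstacle_mask, eps=1e-6):
    # features: (N, C) per-pixel feature vectors; masks: (N,) booleans.
    water = features[water_mask]
    obstacles = features[obstacle_mask]
    mu = water.mean(axis=0)                   # per-channel water means
    var = water.var(axis=0) + eps             # per-channel water variances
    # Negative log-probability of obstacle features under the water Gaussian.
    neg_log_p = np.sum((obstacles - mu) ** 2 / (2.0 * var))
    return 1.0 / (neg_log_p + eps)            # invert: lower loss = better separation

feats = np.random.randn(1000, 32).astype(np.float32)
water = np.arange(1000) < 500                 # first half labelled as water
loss = separation_loss(feats, water, ~water)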

3. Running WaSR

Training

To train the network from scratch (or from pretrained weights) use the script wasr_train_noimu.py for the NO-IMU variant or wasr_train_imu.py for the IMU variant. Both scripts expect the same input arguments. When fine-tuning the network, make sure to freeze the pretrained parameters for the initial n iterations and train only the last layer. An example command is given after the argument list below.

Input Arguments

  • batch-size - number of images sent to the network in one step
  • data-dir - path to the directory containing the MODD2 dataset
  • data-list - path to the file listing the images in the dataset
  • grad-update-every - number of steps after which gradient update is applied
  • ignore-label - the value of the label to ignore during the training
  • input-size - comma-separated string with height and width of images (default: 384,512)
  • is-training - whether to update the running means and variances during the training
  • learning-rate - base learning rate for training with polynomial decay
  • momentum - moment component of the optimiser
  • not-restore-last - whether to not restore the last layers (when using weights from a pretrained encoder network)
  • num-classes - number of classes to predict
  • num-steps - number of training steps (note: these are steps, not epochs)
  • power - decay parameter to compute the learning rate
  • restore-from - path from which to restore the model parameters
  • snapshot-dir - where to save snapshots of the model
  • weight-decay - regularisation parameter for L2-loss
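
For example, a fine-tuning run of the NO-IMU variant from pretrained encoder weights might look like this; all paths and hyperparameter values are illustrative, not recommended defaults:

python wasr_train_noimu.py --data-dir /path/to/MODD2 --data-list train.txt --batch-size 2 --grad-update-every 8 --input-size 384,512 --learning-rate 1e-4 --num-classes 3 --num-steps 80000 --restore-from deeplab_resnet.ckpt --not-restore-last --snapshot-dir snapshots/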

Pretrained Weights

  • WaSR NO-IMU variant - weights are available for download here
  • WaSR IMU variant - To-Do

Inference

To perform inference on a single image use the script wasr_inference_noimu_general.py for the WaSR NO-IMU variant or wasr_inference_imu_general.py for the WaSR IMU variant. Both scripts expect the same input arguments and can be run on images from an arbitrary maritime dataset.

Input Arguments (General Inference)

  • dataset-path - path to MODD2 dataset files on which inference is performed
  • model-weights - path to the file with model weights
  • num-classes - number of classes to predict
  • save-dir - where to save the predicted mask
  • img-path - path to the image on which we want to run inference

Example usage:

python wasr_inference_noimu_general.py --img-path example_1.jpg

The above command will take the image example_1.jpg from the folder test_images/ and segment it. The segmentation result will be saved in the output/ folder by default.

Example input image and corresponding example segmentation output.

To run inference on the MODD2 dataset use the provided bash scripts wasr_inferences_noimu.sh for the WaSR NO-IMU variant or wasr_inferences_imu.sh for the WaSR IMU variant. The bash scripts run the corresponding Python scripts (wasr_inference_noimu.py and wasr_inference_imu.py); an example invocation is given after the argument list below.

Input Arguments (Python MODD2 inference script)

  • dataset-path - path to MODD2 dataset files on which inference is performed
  • model-weights - path to the file with model weights
  • num-classes - number of classes to predict
  • save-dir - where to save the predicted mask
  • seq - sequence number to evaluate
  • seq-txt - path to the file listing the images in the sequence
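
For example, evaluating a single sequence with the NO-IMU variant might look like this; the paths and sequence number are illustrative:

python wasr_inference_noimu.py --dataset-path /path/to/MODD2 --model-weights /path/to/weights.ckpt --num-classes 3 --save-dir output/ --seq 1 --seq-txt /path/to/seq01.txt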

4. References

[1] Bovcon et al., The MaSTr1325 Dataset for Training Deep USV Obstacle Detection Models, IROS 2019
[2] Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, TPAMI 2018
[3] He et al., Deep Residual Learning for Image Recognition, CVPR 2016
[4] Bovcon et al., Stereo Obstacle Detection for Unmanned Surface Vehicles by IMU-assisted Semantic Segmentation, RAS 2018
[5] Yu et al., BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, ECCV 2018
