Initial commit
egeozsoy committed Jun 15, 2023
Commit 37d1910 (0 parents)
Showing 102 changed files with 613,920 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Ege Özsoy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
80 changes: 80 additions & 0 deletions README.md
@@ -0,0 +1,80 @@
# LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms
<img align="right" src="figures/teaser.jpg" alt="teaser" width="30%" style="margin-left: 10px">
Official code of the paper LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms (https://arxiv.org/abs/2303.13293) to be published at MICCAI 2023. LABRAD-OR introduces a novel way of using temporal information for more accurate and consistent holistic OR modeling.
Specifically, we introduce memory scene graphs, where the scene graphs of previous time steps act as the temporal representation guiding the current prediction. We design an end-to-end architecture
that intelligently fuses the temporal information of our lightweight memory scene graphs with the visual information from point clouds and images. We evaluate our method on the 4D-OR dataset and
demonstrate that integrating temporality leads to more accurate and consistent results, achieving a +5% increase in macro F1 and a new SOTA of 0.88.
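
At a high level, the network consumes the scene graphs of previous time steps as a compact memory representation and fuses their embedding with the current visual features before predicting relations. A minimal, hypothetical sketch of this fusion step (module names and dimensions are illustrative, not taken from the repository; the actual architecture lives in `scene_graph_prediction`):

```python
import torch
import torch.nn as nn


class MemoryFusionSketch(nn.Module):
    """Hypothetical sketch: fuse an embedding of the memory scene graphs
    (previous time steps) with the visual features of the current scan."""

    def __init__(self, visual_dim=256, memory_dim=128, hidden_dim=256, num_relations=14):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(visual_dim + memory_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_relations),  # one logit per relation label
        )

    def forward(self, visual_feat, memory_feat):
        # visual_feat: point cloud / image features, shape (B, visual_dim)
        # memory_feat: embedding of previous scene graphs, shape (B, memory_dim)
        return self.fuse(torch.cat([visual_feat, memory_feat], dim=-1))
```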


**Authors**: [Ege Özsoy][eo], [Tobias Czempiel][tc], [Felix Holm][fh], [Chantal Pellegrini][cp], [Nassir Navab][nassir]

[eo]:https://www.cs.cit.tum.de/camp/members/ege-oezsoy/

[tc]:https://www.cs.cit.tum.de/camp/members/tobias-czempiel/

[fh]:https://www.cs.cit.tum.de/camp/members/felix-holm/

[cp]:https://www.cs.cit.tum.de/camp/members/chantal-pellegrini/

[nassir]:https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/

```
@inproceedings{Özsoy2023_LABRAD_OR,
    title={LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms},
    author={Ege Özsoy and Tobias Czempiel and Felix Holm and Chantal Pellegrini and Nassir Navab},
    booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
    year={2023},
    organization={Springer}
}
@inproceedings{Özsoy2022_4D_OR,
    title={4D-OR: Semantic Scene Graphs for OR Domain Modeling},
    author={Ege Özsoy and Evin Pınar Örnek and Ulrich Eck and Tobias Czempiel and Federico Tombari and Nassir Navab},
    booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
    year={2022},
    organization={Springer}
}
@inproceedings{Özsoy2021_MSSG,
    title={Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures},
    author={Ege Özsoy and Evin Pınar Örnek and Ulrich Eck and Federico Tombari and Nassir Navab},
    booktitle={arXiv preprint},
    year={2021}
}
```

## What is not included in this repository?

The 4D-OR dataset itself, the human and object pose prediction methods, and the downstream task of role prediction are not part of this repository. Please refer to the
original [4D-OR](https://github.com/egeozsoy/4D-OR) repository for information on downloading the dataset and on running 2D and 3D human pose prediction, 3D object pose prediction, and the
downstream role prediction task.

## Create Scene Graph Prediction Environment

- Recommended PyTorch version: `pytorch==1.10.0`
- `conda create --name labrad-or python=3.7`
- `conda activate labrad-or`
- `conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge`
- `cd` into `scene_graph_prediction` and run `pip install -r requirements.txt`
- Run `wget ` and unzip # TODO modify link
- (Optional) To use the pretrained models, move the files ending with `.ckpt` from the unzipped directory into `scene_graph_prediction/scene_graph_helpers/paper_weights`
- `cd` into `pointnet2_dir` and run `CUDA_HOME=/usr/local/cuda-11.3 pip install pointnet2_ops_lib/.`
- Run `pip install torch-scatter==2.0.9 torch-sparse==0.6.12 torch-cluster==1.5.9 torch-spline-conv==1.2.1 torch-geometric==2.0.2 -f https://data.pyg.org/whl/torch-1.10.0+cu113.html`
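
After these steps, a quick sanity check with standard PyTorch and PyG calls (not repository code) confirms the versions the instructions above expect:

```python
import torch
import torch_geometric

# Expected for this setup: torch 1.10.0 with CUDA 11.3, torch-geometric 2.0.2.
print(torch.__version__)            # e.g. '1.10.0'
print(torch.version.cuda)           # e.g. '11.3'
print(torch.cuda.is_available())    # True if the CUDA setup succeeded
print(torch_geometric.__version__)  # e.g. '2.0.2'
```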

## Scene Graph Prediction
<img src="figures/visual_abstract.pdf" alt="pipeline" width="75%"/>

As we build upon https://github.com/egeozsoy/4D-OR, the code is structured similarly.

- `cd` into `scene_graph_prediction`
- To train a new visual-only model that uses only point clouds, run `python -m scene_graph_prediction.main --config visual_only.json`.
- To train a new visual-only model that uses point clouds and images, run `python -m scene_graph_prediction.main --config visual_only_with_images.json`.
- To train LABRAD-OR using only point clouds, run `python -m scene_graph_prediction.main --config labrad_or.json`. This requires the pretrained visual-only model to be present.
- To train LABRAD-OR using point clouds and images, run `python -m scene_graph_prediction.main --config labrad_or_with_images.json`. This also requires the pretrained visual-only model to be present.
- We provide all four pretrained models TODO link. You can simply use them instead of training your own models, as described in the environment setup.
- To evaluate either a model you trained or one of our pretrained models, change the mode to `evaluate` in main.py and rerun using the same commands as before.
- If you want to replicate the results from the paper, you can hardcode the corresponding weight checkpoint as `checkpoint_path` in main.py.
- To infer on the test set, change the mode to `infer` and again run one of the four corresponding commands. As before, you can hardcode the corresponding weight checkpoint as `checkpoint_path` in main.py.
- By default, evaluation is done on the validation set and inference on the test set, but both can be changed.
- You can also evaluate on the test set by uploading your inferred predictions to https://bit.ly/4D-OR_evaluator. Be aware that, unlike the evaluation in the paper, this evaluation does not require human poses to be available and can therefore slightly overestimate the results: we get a macro F1 of 0.76 instead of 0.75 on the test set.
- If you want to continue with role prediction, please refer to the original 4D-OR repository. You can use the inferred scene graphs from the previous step as input for role prediction.
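
The inferred relation files (e.g. `scan_relations_labrad-or_with_images_val.json`) map each scan id to a list of relation triplets, as read by `consistency_calculator.py` below. A minimal inspection sketch (file name taken from that script; adjust to your own run):

```python
import json

with open('scan_relations_labrad-or_with_images_val.json') as f:
    data = json.load(f)

# Scan ids are '<take>_<scan>'; each value is a list of
# (subject, predicate, object) triplets.
for scan_id, triplets in sorted(data.items())[:3]:
    print(scan_id, triplets)
```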
64 changes: 64 additions & 0 deletions consistency_calculator.py
@@ -0,0 +1,64 @@
import json

from helpers.configurations import TAKE_SPLIT


def main():
    # Inferred relation files produced by the four model variants:
    # json_file_name = 'scan_relations_visual_only_val.json'  # visual_only
    # json_file_name = 'scan_relations_visual_only_with_images_val.json'  # visual_only_with_images
    # json_file_name = 'scan_relations_labrad-or_val.json'  # labrad-or
    json_file_name = 'scan_relations_labrad-or_with_images_val.json'  # labrad-or_with_images
    print(f'Calculating consistency for {json_file_name}...')
    with open(json_file_name) as f:
        data = json.load(f)
    macro_rel_consistency = []
    for take in TAKE_SPLIT['val']:
        take_rel_consistencies = []
        last_rels = None
        # Scan ids are '<take>_<scan>'; iterate over the scans of this take in order.
        for scan_id in sorted([key for key in data.keys() if int(key.split('_')[0]) == take]):
            unique_rels = {pred for _, pred, _ in data[scan_id]}
            if last_rels is not None:
                if len(unique_rels) == 0 and len(last_rels) == 0:
                    rel_consistency = 1
                else:
                    # Jaccard similarity between consecutive relation sets.
                    rel_consistency = len(unique_rels.intersection(last_rels)) / len(unique_rels.union(last_rels))
                if rel_consistency < 1:
                    print(f'{scan_id}: {unique_rels.symmetric_difference(last_rels)}')
                take_rel_consistencies.append(rel_consistency)

            last_rels = unique_rels
        take_rel_consistency = sum(take_rel_consistencies) / len(take_rel_consistencies)
        print(f'Take {take} rel consistency: {take_rel_consistency:.4f}')
        macro_rel_consistency.append(take_rel_consistency)

    print(f'Macro rel consistency: {sum(macro_rel_consistency) / len(macro_rel_consistency):.4f}')


def main_gt():
    # Same computation on the ground-truth annotations, for reference.
    json_file_name = 'data/relationships_validation.json'
    print(f'Calculating consistency for GT {json_file_name}...')
    with open(json_file_name) as f:
        data = json.load(f)
    macro_rel_consistency = []
    for take in TAKE_SPLIT['val']:
        take_rel_consistencies = []
        last_rels = None
        for scan in sorted([scan for scan in data['scans'] if scan['take_idx'] == take], key=lambda x: x['scan']):
            unique_rels = {pred for _, _, _, pred in scan['relationships']}
            if last_rels is not None:
                if len(unique_rels) == 0 and len(last_rels) == 0:
                    rel_consistency = 1
                else:
                    rel_consistency = len(unique_rels.intersection(last_rels)) / len(unique_rels.union(last_rels))
                take_rel_consistencies.append(rel_consistency)

            last_rels = unique_rels
        take_rel_consistency = sum(take_rel_consistencies) / len(take_rel_consistencies)
        print(f'Take {take} rel consistency: {take_rel_consistency:.4f}')
        macro_rel_consistency.append(take_rel_consistency)

    print(f'Macro rel consistency: {sum(macro_rel_consistency) / len(macro_rel_consistency):.4f}')


if __name__ == '__main__':
    main_gt()
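
For reference, the per-scan-pair consistency computed above is the Jaccard similarity between the relation sets of consecutive scans. A toy example with made-up relation sets:

```python
# Jaccard similarity between consecutive relation sets, as in the script above.
prev_rels = {'Holding', 'CloseTo', 'Sawing'}
curr_rels = {'Holding', 'CloseTo', 'Suturing'}

consistency = len(curr_rels & prev_rels) / len(curr_rels | prev_rels)
print(f'{consistency:.4f}')  # 2 shared / 4 in the union -> 0.5000
```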
12 changes: 12 additions & 0 deletions data/classes.txt
@@ -0,0 +1,12 @@
Patient
anesthesia_equipment
human_0
human_1
human_2
human_3
human_4
human_5
instrument
instrument_table
operating_table
secondary_table
14 changes: 14 additions & 0 deletions data/relationships.txt
@@ -0,0 +1,14 @@
Assisting
Cementing
Cleaning
CloseTo
Cutting
Drilling
Hammering
Holding
LyingOn
Operating
Preparing
Sawing
Suturing
Touching
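
These two files enumerate the 12 entity classes and 14 relation (predicate) labels of the dataset. One plausible way to load them as label vocabularies (illustrative only, not code from the repository):

```python
# Illustrative only: map class and relation names to integer indices.
with open('data/classes.txt') as f:
    classes = [line.strip() for line in f if line.strip()]
with open('data/relationships.txt') as f:
    relations = [line.strip() for line in f if line.strip()]

class_to_idx = {name: i for i, name in enumerate(classes)}
rel_to_idx = {name: i for i, name in enumerate(relations)}
print(len(classes), len(relations))  # 12 classes, 14 relation labels
```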