This is not an officially supported Google product.
- Troy-VIS is the first efficient foundation model family for open-vocabulary object perception. It can detect and segment objects of any class in images and track objects of any class in videos.
- Troy-VIS performs open-vocabulary video instance segmentation over more than 1K object categories in real time on A100 GPUs.
- Troy-VIS is trained on a large number of images and videos from different domains and shows strong zero-shot perception ability.
- Installation: Please refer to INSTALL.md for more details.
- Data preparation: Please refer to DATA.md for more details.
- Training: Please refer to TRAIN.md for more details.
- Testing: Please refer to TEST.md for more details.
- Model zoo: Please refer to MODEL_ZOO.md for more details.
- Thanks to GLEE for providing a strong object-level foundation model as our baseline.
- ops from Deformable-DETR
- pycocotools from cocoapi
- d2 from detectron2