google-research/troyvis

Troy-VIS: Towards Real-Time Open-Vocabulary Video Instance Segmentation

Demo animations: fishes.gif, wolf.gif, akkordion.gif, donkey.gif

This is not an officially supported Google product.

Highlights:

  • Troy-VIS is the first efficient foundation model family for open-vocabulary object perception. It can detect and segment objects of any class in images, and track objects of any class in videos.
  • Troy-VIS performs open-vocabulary video instance segmentation over more than 1K object categories in real time on A100 GPUs.
  • Troy-VIS is trained on a large collection of images and videos from different domains and shows strong zero-shot perception ability.

Getting started

  1. Installation: Please refer to INSTALL.md for more details.
  2. Data preparation: Please refer to DATA.md for more details.
  3. Training: Please refer to TRAIN.md for more details.
  4. Testing: Please refer to TEST.md for more details.
  5. Model zoo: Please refer to MODEL_ZOO.md for more details.

Acknowledgments

  • Thanks to GLEE for providing a strong object-level foundation model as our baseline.

Third Party
