Stars
Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
A collection of projects designed to help developers quickly get started with building deployable applications using the Anthropic API
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Code for the paper "Achieving Sparse Activation in Small Language Models"
Fast and memory-efficient exact attention
Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
[NeurIPS'24 Spotlight, ICLR'25] Speeds up long-context LLM inference by computing attention approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an …
Code for the ICCV 2023 Oral paper: Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification
Large-Vocabulary Video Instance Segmentation dataset
OVTrack: Open-Vocabulary Multiple Object Tracking [CVPR 2023]
SAM-PT: Extending SAM to zero-shot video segmentation with point-based tracking.
Official PyTorch implementation of FB-BEV & FB-OCC - Forward-backward view transformation for vision-centric autonomous driving perception
Associating Objects with Transformers for Video Object Segmentation
This repository implements continuous test-time adaptation algorithms for object detection on the SHIFT dataset.
Official PyTorch Code and Models of "RePaint: Inpainting using Denoising Diffusion Probabilistic Models", CVPR 2022
(TPAMI 2024) A Survey on Open Vocabulary Learning
[CVPR 2023] Unifying Short and Long-Term Tracking with Graph Hierarchies
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023
Official PyTorch implementation of ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [CVPR 2023 Highlight]
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model