forked from kookmin-sw/cap-template
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
8e4809e
commit 798b509
Showing
55 changed files
with
3,685 additions
and
0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
.ipynb_checkpoints | ||
*.pyc | ||
*.ipynb | ||
*.npy | ||
|
||
output/ | ||
datasets/ | ||
pretrained/ | ||
__pycache__/ | ||
condor_log/ | ||
.cache/ | ||
.nv/ | ||
docker_stderror |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
repos: | ||
|
||
  - repo: https://github.com/psf/black | ||
    rev: 20.8b1 # Replace by any tag/version: https://github.com/psf/black/tags | ||
    hooks: | ||
      - id: black | ||
        language_version: python3 # Should be a command that runs python3.6+ | ||
|
||
  # isort | ||
  - repo: https://github.com/timothycrosley/isort | ||
    rev: 5.6.4 | ||
    hooks: | ||
      - id: isort | ||
|
||
  # flake8 | ||
  - repo: https://github.com/PyCQA/flake8 | ||
    rev: 3.8.3 | ||
    hooks: | ||
      - id: flake8 | ||
        args: ["--config=setup.cfg", "--ignore=W504, W503, E501, E203, E741, F821"] | ||
|
||
  # pre-commit-hooks | ||
  - repo: https://github.com/pre-commit/pre-commit-hooks | ||
    rev: v3.2.0 | ||
    hooks: | ||
      - id: trailing-whitespace # Trim trailing whitespace | ||
      - id: check-merge-conflict # Check for files that contain merge conflict strings | ||
      - id: end-of-file-fixer # Make sure files end in a newline and only a newline | ||
      - id: requirements-txt-fixer # Sort entries in requirements.txt and remove incorrect entry for pkg-resources==0.0.0 | ||
      - id: fix-encoding-pragma # Remove the coding pragma: # -*- coding: utf-8 -*- | ||
        args: ["--remove"] | ||
      - id: mixed-line-ending # Replace or check mixed line ending | ||
        args: ["--fix=lf"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
# Text Based Person Search with Limited Data | ||
|
||
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/text-based-person-search-with-limited-data/nlp-based-person-retrival-on-cuhk-pedes)](https://paperswithcode.com/sota/nlp-based-person-retrival-on-cuhk-pedes?p=text-based-person-search-with-limited-data) | ||
|
||
This is the codebase for our [BMVC 2021 paper](https://arxiv.org/abs/2110.10807). | ||
|
||
Slides and video for the online presentation are now available at [BMVC 2021 virtual conference website](https://www.bmvc2021-virtualconference.com/conference/papers/paper_0044.html). | ||
|
||
## Updates | ||
- (10/12/2021) Add download link of trained models. | ||
- (06/12/2021) Refactored the code for easier reproduction. | ||
- (20/10/2021) Code released! | ||
|
||
## Abstract | ||
Text-based person search (TBPS) aims at retrieving a target person from an image gallery with a descriptive text query. | ||
Solving such a fine-grained cross-modal retrieval task is challenging, which is further hampered by the lack of large-scale datasets. | ||
In this paper, we present a framework with two novel components to handle the problems brought by limited data. | ||
Firstly, to fully utilize the existing small-scale benchmarking datasets for more discriminative feature learning, we introduce a cross-modal momentum contrastive learning framework to enrich the training data for a given mini-batch. Secondly, we propose to transfer knowledge learned from existing coarse-grained large-scale datasets containing image-text pairs from drastically different problem domains to compensate for the lack of TBPS training data. A transfer learning method is designed so that useful information can be transferred despite the large domain gap. Armed with these components, our method achieves new state of the art on the CUHK-PEDES dataset with significant improvements over the prior art in terms of Rank-1 and mAP. | ||
|
||
## Results | ||
![image](https://user-images.githubusercontent.com/37724292/144879635-86ab9c7b-0317-4b42-ac46-a37b06853d18.png) | ||
|
||
## Installation | ||
### Setup environment | ||
```bash | ||
conda create -n txtreid-env python=3.7 | ||
conda activate txtreid-env | ||
git clone https://github.com/BrandonHanx/TextReID.git | ||
cd TextReID | ||
pip install -r requirements.txt | ||
pre-commit install | ||
``` | ||
### Get CUHK-PEDES dataset | ||
- Request the images from [Dr. Shuang Li](https://github.com/ShuangLI59/Person-Search-with-Natural-Language-Description). | ||
- Download the pre-processed captions we provide from [Google Drive](https://drive.google.com/file/d/1V4d8OjFket5SaQmBVozFFeflNs6f9e1R/view?usp=sharing). | ||
- Organize the dataset as follows: | ||
```bash | ||
datasets | ||
└── cuhkpedes | ||
    ├── annotations | ||
    │   ├── test.json | ||
    │   ├── train.json | ||
    │   └── val.json | ||
    ├── clip_vocab_vit.npy | ||
    └── imgs | ||
        ├── cam_a | ||
        ├── cam_b | ||
        ├── CUHK01 | ||
        ├── CUHK03 | ||
        ├── Market | ||
        ├── test_query | ||
        └── train_query | ||
``` | ||
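Before training, it can help to confirm that every entry of this layout is in place. The sketch below is not part of the repository: `missing_dataset_paths` is a hypothetical helper, and the path list is simply copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory tree in the README.
EXPECTED_PATHS = [
    "annotations/test.json",
    "annotations/train.json",
    "annotations/val.json",
    "clip_vocab_vit.npy",
    "imgs/cam_a",
    "imgs/cam_b",
    "imgs/CUHK01",
    "imgs/CUHK03",
    "imgs/Market",
    "imgs/test_query",
    "imgs/train_query",
]


def missing_dataset_paths(root="datasets/cuhkpedes"):
    """Return the expected entries that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_dataset_paths()
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root so that the default `datasets/cuhkpedes` path resolves.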
|
||
### Download CLIP weights | ||
```bash | ||
mkdir -p pretrained/clip/ | ||
cd pretrained/clip | ||
wget https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt | ||
wget https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt | ||
cd - | ||
|
||
``` | ||
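The long hexadecimal segment in each download URL is the SHA-256 digest of the file, which OpenAI's CLIP loader compares against the downloaded bytes. A minimal sketch of the same check, assuming the files were saved under `pretrained/clip/` as in the commands above; `sha256_matches_url` and the other function names are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path

CLIP_URLS = [
    "https://openaipublic.azureedge.net/clip/models/afeb0e10f9e5a86da6080e35cf09123aca3b358a0c3e3b6c78a7b63bc04b6762/RN50.pt",
    "https://openaipublic.azureedge.net/clip/models/8fa8567bab74a42d41c5915025a8e4538c3bdbe8804a470a72f30b0d94fab599/RN101.pt",
]


def expected_sha256(url):
    """The digest is the second-to-last path segment of the URL."""
    return url.split("/")[-2]


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_matches_url(path, url):
    """True if the file's digest equals the digest embedded in the URL."""
    return sha256_of_file(path) == expected_sha256(url)


if __name__ == "__main__":
    for url in CLIP_URLS:
        target = Path("pretrained/clip") / url.split("/")[-1]
        if not target.exists():
            print(f"{target}: not downloaded")
        elif sha256_matches_url(target, url):
            print(f"{target}: checksum OK")
        else:
            print(f"{target}: checksum mismatch, download again")
```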
|
||
### Train | ||
```bash | ||
python train_net.py \ | ||
--config-file configs/cuhkpedes/moco_gru_cliprn50_ls_bs128_2048.yaml \ | ||
--use-tensorboard | ||
``` | ||
### Inference | ||
```bash | ||
python test_net.py \ | ||
--config-file configs/cuhkpedes/moco_gru_cliprn50_ls_bs128_2048.yaml \ | ||
--checkpoint-file output/cuhkpedes/moco_gru_cliprn50_ls_bs128_2048/best.pth | ||
``` | ||
You can download our trained models (with CLIP RN50 and RN101) from [Google Drive](https://drive.google.com/drive/folders/1MoceVsLiByg3Sg8_9yByGSvR3ru15hJL?usp=sharing). | ||
|
||
## TODO | ||
- [ ] Try larger pre-trained CLIP models. | ||
- [ ] Fix the multi-GPU running bug. | ||
- [ ] Add dataloader for [ICFG-PEDES](https://github.com/zifyloo/SSAN). | ||
|
||
## Citation | ||
If you find this project useful for your research, please use the following BibTeX entry. | ||
``` | ||
@inproceedings{han2021textreid, | ||
title={Text-Based Person Search with Limited Data}, | ||
author={Han, Xiao and He, Sen and Zhang, Li and Xiang, Tao}, | ||
booktitle={BMVC}, | ||
year={2021} | ||
} | ||
``` |
41 changes: 41 additions & 0 deletions
ai/TextReID/configs/cuhkpedes/baseline_gru_cliprn101_ls_bs128.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
MODEL: | ||
  WEIGHT: "imagenet" | ||
  FREEZE: False | ||
  VISUAL_MODEL: "m_resnet101" | ||
  TEXTUAL_MODEL: "bigru" | ||
  NUM_CLASSES: 11003 | ||
  GRU: | ||
    ONEHOT: "clip_vit" | ||
    EMBEDDING_SIZE: 512 | ||
    NUM_UNITS: 512 | ||
    VOCABULARY_SIZE: 512 | ||
    DROPOUT_KEEP_PROB: 1.0 | ||
    MAX_LENGTH: 100 | ||
  RESNET: | ||
    RES5_STRIDE: 1 | ||
  EMBEDDING: | ||
    EMBED_HEAD: 'simple' | ||
    FEATURE_SIZE: 256 | ||
    DROPOUT_PROB: 0.0 | ||
    EPSILON: 0.1 | ||
INPUT: | ||
  HEIGHT: 384 | ||
  WIDTH: 128 | ||
  USE_AUG: True | ||
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073] | ||
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711] | ||
DATASETS: | ||
  TRAIN: ("cuhkpedes_train", ) | ||
  TEST: ("cuhkpedes_test", ) | ||
SOLVER: | ||
  IMS_PER_BATCH: 128 | ||
  NUM_EPOCHS: 80 | ||
  BASE_LR: 0.0001 | ||
  WEIGHT_DECAY: 0.00004 | ||
  CHECKPOINT_PERIOD: 40 | ||
  LRSCHEDULER: 'step' | ||
  STEPS: (40, 70) | ||
  WARMUP_FACTOR: 0.1 | ||
  WARMUP_EPOCHS: 5 | ||
TEST: | ||
  IMS_PER_BATCH: 128 |
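The `SOLVER` keys describe a step schedule with warmup. The sketch below shows one plausible reading of them and is not the repository's scheduler: the decay factor `GAMMA = 0.1` does not appear in this config and is an assumption, as is the linear shape of the warmup; `lr_at_epoch` is a hypothetical name.

```python
BASE_LR = 0.0001
STEPS = (40, 70)
WARMUP_FACTOR = 0.1
WARMUP_EPOCHS = 5
GAMMA = 0.1  # assumption: the decay factor is not given in the config


def lr_at_epoch(epoch):
    """Learning rate at the start of `epoch` under the assumed schedule."""
    # Step decay: multiply by GAMMA once for every milestone already reached.
    decay = GAMMA ** sum(epoch >= s for s in STEPS)
    # Assumed linear warmup from WARMUP_FACTOR * BASE_LR up to BASE_LR.
    warmup = 1.0
    if epoch < WARMUP_EPOCHS:
        alpha = epoch / WARMUP_EPOCHS
        warmup = WARMUP_FACTOR * (1 - alpha) + alpha
    return BASE_LR * warmup * decay


for epoch in (0, 5, 39, 40, 70):
    print(epoch, lr_at_epoch(epoch))
```

Under these assumptions the rate starts at one tenth of `BASE_LR`, reaches `BASE_LR` at epoch 5, and drops by a factor of ten at epochs 40 and 70.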
41 changes: 41 additions & 0 deletions
ai/TextReID/configs/cuhkpedes/baseline_gru_cliprn50_ls_bs128.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
MODEL: | ||
  WEIGHT: "imagenet" | ||
  FREEZE: False | ||
  VISUAL_MODEL: "m_resnet50" | ||
  TEXTUAL_MODEL: "bigru" | ||
  NUM_CLASSES: 11003 | ||
  GRU: | ||
    ONEHOT: "clip_vit" | ||
    EMBEDDING_SIZE: 512 | ||
    NUM_UNITS: 512 | ||
    VOCABULARY_SIZE: 512 | ||
    DROPOUT_KEEP_PROB: 1.0 | ||
    MAX_LENGTH: 100 | ||
  RESNET: | ||
    RES5_STRIDE: 1 | ||
  EMBEDDING: | ||
    EMBED_HEAD: 'simple' | ||
    FEATURE_SIZE: 256 | ||
    DROPOUT_PROB: 0.0 | ||
    EPSILON: 0.1 | ||
INPUT: | ||
  HEIGHT: 384 | ||
  WIDTH: 128 | ||
  USE_AUG: True | ||
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073] | ||
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711] | ||
DATASETS: | ||
  TRAIN: ("cuhkpedes_train", ) | ||
  TEST: ("cuhkpedes_test", ) | ||
SOLVER: | ||
  IMS_PER_BATCH: 128 | ||
  NUM_EPOCHS: 80 | ||
  BASE_LR: 0.0001 | ||
  WEIGHT_DECAY: 0.00004 | ||
  CHECKPOINT_PERIOD: 40 | ||
  LRSCHEDULER: 'step' | ||
  STEPS: (40, 70) | ||
  WARMUP_FACTOR: 0.1 | ||
  WARMUP_EPOCHS: 5 | ||
TEST: | ||
  IMS_PER_BATCH: 128 |
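The `PIXEL_MEAN` and `PIXEL_STD` values in the CLIP-based configs match the per-channel normalization constants published with CLIP's image preprocessing. A minimal sketch of what they do to a single pixel, assuming RGB channel order and 8-bit inputs scaled to [0, 1] first; `normalize_pixel` is a hypothetical helper, not code from this repository.

```python
PIXEL_MEAN = [0.48145466, 0.4578275, 0.40821073]
PIXEL_STD = [0.26862954, 0.26130258, 0.27577711]


def normalize_pixel(rgb):
    """Normalize one RGB pixel given as 8-bit integers (0-255)."""
    scaled = [channel / 255.0 for channel in rgb]
    return [(c - m) / s for c, m, s in zip(scaled, PIXEL_MEAN, PIXEL_STD)]


print(normalize_pixel((0, 0, 0)))        # black maps to -mean/std per channel
print(normalize_pixel((255, 255, 255)))  # white maps to (1 - mean)/std per channel
```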
39 changes: 39 additions & 0 deletions
ai/TextReID/configs/cuhkpedes/baseline_gru_rn50_ls_bs128.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
MODEL: | ||
  WEIGHT: "imagenet" | ||
  FREEZE: False | ||
  VISUAL_MODEL: "resnet50" | ||
  TEXTUAL_MODEL: "bigru" | ||
  NUM_CLASSES: 11003 | ||
  GRU: | ||
    ONEHOT: "yes" | ||
    EMBEDDING_SIZE: 512 | ||
    NUM_UNITS: 512 | ||
    VOCABULARY_SIZE: 12000 | ||
    DROPOUT_KEEP_PROB: 1.0 | ||
    MAX_LENGTH: 100 | ||
  RESNET: | ||
    RES5_STRIDE: 1 | ||
  EMBEDDING: | ||
    EMBED_HEAD: 'simple' | ||
    FEATURE_SIZE: 256 | ||
    DROPOUT_PROB: 0.0 | ||
    EPSILON: 0.1 | ||
INPUT: | ||
  HEIGHT: 384 | ||
  WIDTH: 128 | ||
  USE_AUG: True | ||
DATASETS: | ||
  TRAIN: ("cuhkpedes_train", ) | ||
  TEST: ("cuhkpedes_test", ) | ||
SOLVER: | ||
  IMS_PER_BATCH: 128 | ||
  NUM_EPOCHS: 80 | ||
  BASE_LR: 0.0001 | ||
  WEIGHT_DECAY: 0.00004 | ||
  CHECKPOINT_PERIOD: 40 | ||
  LRSCHEDULER: 'step' | ||
  STEPS: (40, 70) | ||
  WARMUP_FACTOR: 0.1 | ||
  WARMUP_EPOCHS: 5 | ||
TEST: | ||
  IMS_PER_BATCH: 128 |
44 changes: 44 additions & 0 deletions
ai/TextReID/configs/cuhkpedes/moco_gru_cliprn101_ls_bs128_2048.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
MODEL: | ||
  WEIGHT: "imagenet" | ||
  FREEZE: False | ||
  VISUAL_MODEL: "m_resnet101" | ||
  TEXTUAL_MODEL: "bigru" | ||
  NUM_CLASSES: 11003 | ||
  GRU: | ||
    ONEHOT: "clip_vit" | ||
    EMBEDDING_SIZE: 512 | ||
    NUM_UNITS: 512 | ||
    VOCABULARY_SIZE: 512 | ||
    DROPOUT_KEEP_PROB: 1.0 | ||
    MAX_LENGTH: 100 | ||
  RESNET: | ||
    RES5_STRIDE: 1 | ||
  EMBEDDING: | ||
    EMBED_HEAD: 'moco' | ||
    FEATURE_SIZE: 256 | ||
    DROPOUT_PROB: 0.0 | ||
    EPSILON: 0.1 | ||
  MOCO: | ||
    FC: False | ||
    K: 2048 | ||
INPUT: | ||
  HEIGHT: 384 | ||
  WIDTH: 128 | ||
  USE_AUG: True | ||
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073] | ||
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711] | ||
DATASETS: | ||
  TRAIN: ("cuhkpedes_train", ) | ||
  TEST: ("cuhkpedes_test", ) | ||
SOLVER: | ||
  IMS_PER_BATCH: 128 | ||
  NUM_EPOCHS: 80 | ||
  BASE_LR: 0.0001 | ||
  WEIGHT_DECAY: 0.00004 | ||
  CHECKPOINT_PERIOD: 40 | ||
  LRSCHEDULER: 'step' | ||
  STEPS: (40, 70) | ||
  WARMUP_FACTOR: 0.1 | ||
  WARMUP_EPOCHS: 5 | ||
TEST: | ||
  IMS_PER_BATCH: 128 |
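With `EMBED_HEAD: 'moco'`, the key `K: 2048` suggests a momentum-contrast style queue that keeps features from earlier mini-batches as extra samples, in line with the README's description of enriching the training data for a mini-batch. The sketch below illustrates only the fixed-size FIFO bookkeeping in plain Python. `FeatureQueue` is a hypothetical class, the tuples stand in for real embeddings, and none of this is the repository's implementation.

```python
from collections import deque

K = 2048          # queue length, from MOCO.K
BATCH_SIZE = 128  # from SOLVER.IMS_PER_BATCH


class FeatureQueue:
    """Fixed-size FIFO of feature vectors; the oldest entries are dropped first."""

    def __init__(self, size):
        self.items = deque(maxlen=size)

    def enqueue(self, batch_features):
        # Appending past maxlen silently discards items from the front.
        self.items.extend(batch_features)

    def __len__(self):
        return len(self.items)


queue = FeatureQueue(K)
for step in range(20):
    # Stand-in features: (step, index) tuples instead of real embeddings.
    batch = [(step, i) for i in range(BATCH_SIZE)]
    queue.enqueue(batch)

print(len(queue))      # never exceeds K
print(queue.items[0])  # oldest surviving entry
```

With 128 samples per batch the queue is full after 2048 / 128 = 16 steps, and from then on each new batch displaces the oldest one.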
44 changes: 44 additions & 0 deletions
ai/TextReID/configs/cuhkpedes/moco_gru_cliprn50_ls_bs128_2048.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
MODEL: | ||
  WEIGHT: "imagenet" | ||
  FREEZE: False | ||
  VISUAL_MODEL: "m_resnet50" | ||
  TEXTUAL_MODEL: "bigru" | ||
  NUM_CLASSES: 11003 | ||
  GRU: | ||
    ONEHOT: "clip_vit" | ||
    EMBEDDING_SIZE: 512 | ||
    NUM_UNITS: 512 | ||
    VOCABULARY_SIZE: 512 | ||
    DROPOUT_KEEP_PROB: 1.0 | ||
    MAX_LENGTH: 100 | ||
  RESNET: | ||
    RES5_STRIDE: 1 | ||
  EMBEDDING: | ||
    EMBED_HEAD: 'moco' | ||
    FEATURE_SIZE: 256 | ||
    DROPOUT_PROB: 0.0 | ||
    EPSILON: 0.1 | ||
  MOCO: | ||
    FC: False | ||
    K: 2048 | ||
INPUT: | ||
  HEIGHT: 384 | ||
  WIDTH: 128 | ||
  USE_AUG: True | ||
  PIXEL_MEAN: [0.48145466, 0.4578275, 0.40821073] | ||
  PIXEL_STD: [0.26862954, 0.26130258, 0.27577711] | ||
DATASETS: | ||
  TRAIN: ("cuhkpedes_train", ) | ||
  TEST: ("cuhkpedes_test", ) | ||
SOLVER: | ||
  IMS_PER_BATCH: 128 | ||
  NUM_EPOCHS: 80 | ||
  BASE_LR: 0.0001 | ||
  WEIGHT_DECAY: 0.00004 | ||
  CHECKPOINT_PERIOD: 40 | ||
  LRSCHEDULER: 'step' | ||
  STEPS: (40, 70) | ||
  WARMUP_FACTOR: 0.1 | ||
  WARMUP_EPOCHS: 5 | ||
TEST: | ||
  IMS_PER_BATCH: 128 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
import json | ||
import re | ||
import random | ||
|
||
|
||
def encode(query): | ||
    file_path = "./datasets/cuhkpedes/annotations/test.json" | ||
    with open(file_path, "r") as file: | ||
        data = json.load(file) | ||
|
||
    word_dict = {}  # word : encode | ||
    max_onehot = -1 | ||
|
||
    for i in range(len(data["annotations"])): | ||
        words = re.sub(r'[^a-zA-Z0-9\s]', '', data["annotations"][i]["sentence"]) | ||
        words = words.split() | ||
        for word, onehot in zip(words, data["annotations"][i]["onehot"]): | ||
            if onehot > max_onehot: | ||
                max_onehot = onehot | ||
            if word.lower() not in word_dict: | ||
                word_dict[word.lower()] = onehot | ||
|
||
    output = [] | ||
    query = re.sub(r'[^a-zA-Z0-9\s]', '', query) | ||
    for w in query.split(): | ||
        try: | ||
            output.append(word_dict[w.lower()]) | ||
        except KeyError as e: | ||
            print("Key %s not found in the dictionary." % e.args[0]) | ||
            # word_dict[max_onehot+1] = e.args[0] | ||
            # word_dict[e.args[0]] = max_onehot + 1 | ||
            # output.append(word_dict[w.lower()]) | ||
            output.append("None")  # placeholder for an out-of-vocabulary word | ||
            max_onehot += 1 | ||
|
||
    # print(word_dict) | ||
    return output |
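As written, `encode` re-reads `test.json` and rebuilds the word dictionary on every call. One possible restructuring, sketched below, builds the vocabulary once and reuses it. It assumes each annotation carries the `sentence` and `onehot` fields used above; the function names and the sample annotations (including their token ids) are invented for illustration, and unknown words map to `None` rather than the string `"None"`.

```python
import re

_NON_ALNUM = re.compile(r"[^a-zA-Z0-9\s]")


def tokenize(text):
    """Strip punctuation, lowercase, and split on whitespace."""
    return _NON_ALNUM.sub("", text).lower().split()


def build_vocab(annotations):
    """Map each word to the first token id it is paired with."""
    vocab = {}
    for ann in annotations:
        for word, token_id in zip(tokenize(ann["sentence"]), ann["onehot"]):
            vocab.setdefault(word, token_id)
    return vocab


def encode_query(query, vocab):
    """Return one token id per word; None marks an out-of-vocabulary word."""
    return [vocab.get(word) for word in tokenize(query)]


# Made-up annotations standing in for data["annotations"].
sample = [
    {"sentence": "A man wearing a red shirt.", "onehot": [1, 2, 3, 1, 4, 5]},
    {"sentence": "The woman has a blue bag", "onehot": [6, 7, 8, 1, 9, 10]},
]
vocab = build_vocab(sample)
print(encode_query("a woman wearing a blue shirt", vocab))
```

In real use, `build_vocab` would be called once on the loaded annotation list and the resulting dictionary passed to every `encode_query` call.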
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
from .defaults import _C as cfg | ||
|
||
__all__ = ["cfg"] |