Image caption generator

  • The idea of this project is to build a deep learning model that generates textual descriptions of given photographs. It therefore combines techniques from both Computer Vision and Natural Language Processing.
  • I used the ResNet50 architecture with weights pre-trained on the ImageNet dataset to extract training features. An LSTM and ResNet50 are then combined into one deep CNN-LSTM model to train the caption generator.
  • The dataset used for this project can be downloaded from Kaggle.
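
The README does not spell out how captions are fed to the LSTM. A common scheme for this kind of model, assumed here and not confirmed by the project code, is to expand every caption into (prefix, next word) samples so the network learns to predict one word at a time given the image features and the words so far. The function name and the `startseq`/`endseq` markers below are hypothetical.

```python
def caption_to_training_pairs(caption, start="startseq", end="endseq"):
    """Expand one caption into (prefix words, next word) samples.

    The start/end marker names are illustrative assumptions, not taken
    from the project.
    """
    words = [start] + caption.lower().split() + [end]
    # Each sample pairs the words seen so far with the word to predict next.
    return [(words[:i], words[i]) for i in range(1, len(words))]


pairs = caption_to_training_pairs("A dog runs")
for prefix, target in pairs:
    print(" ".join(prefix), "->", target)
```

In a real pipeline each prefix would be converted to integer ids with the tokenizer and padded to a fixed length before being paired with the image feature vector.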

Before running this project

  1. Before running the notebook test_model_image_generators.ipynb or the Python script test_model_image_generators.py, several preprocessing stages must be completed and the model must be trained. The folder /preprocess contains Python scripts with the functions described below:

    • extract_image_features.py: extracts features from the training dataset with a pre-trained, fine-tuned ResNet50 using ImageNet weights. All extracted features are saved as a pickle file, /preprocess/features.pkl, for later use (the file is not uploaded because it is too large)
    • generate_tokenizer.py: generates a tokenizer and saves it as the pickle file /preprocess/tokenizer.pkl
    • preprocess_text_data: preprocesses the descriptions of every training image and saves them as /preprocess/descriptions.txt
  2. After obtaining the 3 files features.pkl, tokenizer.pkl, and descriptions.txt in the folder /preprocess, run train.py to start training the model. Note that training consumes a lot of time and computational power (a GPU and >8 GB of memory are required). After training finishes, the entire model and its weights are saved as the file image_captioning_model.h5 (this file is not uploaded because it is too large).
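
Since training is expensive, it can help to confirm that all three preprocessing outputs exist before launching train.py. The following is a minimal sketch, assuming it is run from the repository root and that the folder is named preprocess as above. The helper `missing_preprocess_files` is hypothetical and not part of the project.

```python
from pathlib import Path

# File names taken from the steps above.
REQUIRED_FILES = ("features.pkl", "tokenizer.pkl", "descriptions.txt")


def missing_preprocess_files(preprocess_dir="preprocess"):
    """Return the required files that are not present in preprocess_dir."""
    folder = Path(preprocess_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_preprocess_files()
    if missing:
        print("Missing in preprocess/:", ", ".join(missing))
    else:
        print("All preprocessing outputs found; train.py can be run.")
```

If anything is reported missing, rerun the corresponding script from step 1 before training.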