Follow these steps to train Qwen-2.5-0.5B with the EBAR (Embedding-Based Auto-Regression) and EBAE (Embedding-Based Auto-Encoding) training methods from the paper 'Llama2Vec: Unsupervised Adaptation of Large Language Models for Dense Retrieval'.
Run the following notebook to extract the first 1000 articles from Spanish Wikipedia and save them as a pickle file:
`get_spanish_wiki.ipynb`
This notebook handles the data extraction and saves the articles for further processing.
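For a rough idea of what the extraction step does, here is a minimal sketch. It assumes the Hugging Face `datasets` library and the `wikimedia/wikipedia` dump; the actual notebook may use a different source, dump date, or output filename:

```python
import pickle
from datasets import load_dataset

# Stream the Spanish Wikipedia dump so the full corpus is never downloaded.
# The dump name/date is an assumption; check the notebook for the exact source.
wiki = load_dataset("wikimedia/wikipedia", "20231101.es",
                    split="train", streaming=True)

# Keep only the first 1000 articles.
articles = []
for i, article in enumerate(wiki):
    if i >= 1000:
        break
    articles.append({"title": article["title"], "text": article["text"]})

# Save the raw articles as a pickle file for the preprocessing notebook.
# "spanish_wiki_1000.pkl" is a placeholder filename.
with open("spanish_wiki_1000.pkl", "wb") as f:
    pickle.dump(articles, f)
```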
Use this notebook to preprocess the Wikipedia data and prepare it for EBAR and EBAE training:
`prepare_dataset_for_ebar_ebae.ipynb`
The preprocessing includes (a minimal sketch follows this list):
- Tokenizing the articles.
- Splitting the token stream into chunks that serve as input prompts, each paired with the chunk that follows it as the "next sentence".
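As an illustration of the chunking logic, here is a hedged sketch. It assumes fixed-length token chunks where each chunk's successor serves as its next sentence for EBAR; the notebook's exact chunk size, pairing scheme, and filenames may differ:

```python
import pickle
from transformers import AutoTokenizer

# Tokenizer for the model being adapted.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

CHUNK_LEN = 128  # assumed chunk length in tokens; check the notebook

with open("spanish_wiki_1000.pkl", "rb") as f:  # placeholder filename
    articles = pickle.load(f)

pairs = []  # (input_chunk, next_chunk) token-id pairs
for article in articles:
    ids = tokenizer(article["text"], add_special_tokens=False)["input_ids"]
    chunks = [ids[i:i + CHUNK_LEN] for i in range(0, len(ids), CHUNK_LEN)]
    # Pair each chunk with the one that follows it: the input chunk is
    # reconstructed by EBAE, and the following chunk is predicted by EBAR.
    for cur, nxt in zip(chunks, chunks[1:]):
        pairs.append((cur, nxt))

with open("ebae_ebar_pairs.pkl", "wb") as f:  # placeholder filename
    pickle.dump(pairs, f)
```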
Finally, run the main training notebook to adapt the model with the EBAR and EBAE methods:
`llama_2_vec.ipynb`
This notebook will:
- Load the preprocessed data.
- Train the model using the EBAR and EBAE loss functions from the paper (a simplified sketch of the objective follows this list).
- Save the trained model for further use.
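To give a sense of the objective, here is a simplified, hedged sketch of the EBAE/EBAR losses. Following the Llama2Vec paper, the hidden state of the prompt's final token is treated as the text embedding and projected through the LM head to score the tokens of a target sequence: the input chunk itself for EBAE, the next chunk for EBAR. Prompt templates, pooling, and batching details are assumptions; the notebook is authoritative:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

def embedding_to_token_loss(prompt_ids, target_ids):
    """Score the target tokens from a single text embedding.

    Both arguments are 1-D LongTensors of token ids. The embedding is
    the last hidden state of the prompt's final token; it is projected
    through the LM head and every target token is predicted from it
    (a bag-of-tokens style objective, simplified from the paper).
    """
    out = model(input_ids=prompt_ids.unsqueeze(0), output_hidden_states=True)
    embedding = out.hidden_states[-1][0, -1]   # (hidden_size,)
    logits = model.lm_head(embedding)          # (vocab_size,)
    log_probs = F.log_softmax(logits, dim=-1)
    return -log_probs[target_ids].mean()

def ebae_ebar_loss(input_ids, next_ids):
    # EBAE: reconstruct the input chunk from its own embedding.
    ebae = embedding_to_token_loss(input_ids, input_ids)
    # EBAR: predict the next chunk from the input chunk's embedding.
    ebar = embedding_to_token_loss(input_ids, next_ids)
    return ebae + ebar
```

In practice this loss would be wrapped in a standard training loop with an optimizer, iterating over the (input, next) pairs produced by the preprocessing notebook.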
Notes:
- Ensure all dependencies are installed before running the notebooks.
- For detailed explanations of the training procedure, refer to the documentation in each notebook.
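Once training finishes, the saved model can be used as a dense encoder. The snippet below is a hedged usage sketch: the output directory name is a placeholder, and last-token pooling is an assumption carried over from the loss sketch above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "qwen2.5-0.5b-llama2vec" is a placeholder for wherever the notebook
# saved the adapted model.
tokenizer = AutoTokenizer.from_pretrained("qwen2.5-0.5b-llama2vec")
model = AutoModelForCausalLM.from_pretrained("qwen2.5-0.5b-llama2vec")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Return the last-token hidden state as the sentence embedding."""
    ids = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[-1][0, -1]

query = embed("¿Cuál es la capital de España?")
doc = embed("Madrid es la capital de España.")
score = torch.cosine_similarity(query, doc, dim=0)
print(f"similarity: {score.item():.3f}")
```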
Happy training!