Vec2Sent
Probing Sentence Embeddings with Natural Language Generation

We introspect black-box sentence embeddings by conditionally generating from them, with the objective of retrieving the underlying discrete sentence. We view this as a new unsupervised probing task and show that it correlates well with downstream task performance. We also illustrate how the language generated from different encoders differs, and we apply our approach to generate sentence analogies from sentence embeddings.

Quickstart

You can quickly install Vec2Sent using pip:

pip install "vec2sent @ git+https://github.com/maruker/vec2sent.git"

Vec2Sent provides three command-line entry points: vec2sent_generate to generate sentences, vec2sent_evaluate to evaluate them, and vec2sent_arithmetic to perform arithmetic in the vector space.

Vector Arithmetic

vec2sent_arithmetic -s infersent -c maruker/vec2sent-infersent

Please enter sentence a (Or nothing if done):his name is robert
Please enter sentence b (Or nothing if done):he is a doctor
Please enter sentence c (Or nothing if done):her name is julia
Please enter sentence d (Or nothing if done):
Please enter an arithmetic expression (e.g. (a + b) * c / 2):b-a+c
 she is a doctor
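The expression b-a+c in the session above is ordinary element-wise vector arithmetic on the three sentence embeddings. The following is a minimal sketch of that step only, using hand-made two-dimensional toy vectors rather than real embeddings. The function name `combine` and the vector values are assumptions made for illustration and are not part of Vec2Sent; in the real tool the encoder produces the vectors and the Vec2Sent model decodes the result back into a sentence.

```python
def combine(a, b, c):
    """Element-wise b - a + c over three equally sized vectors."""
    return [bi - ai + ci for ai, bi, ci in zip(a, b, c)]


# Toy vectors (hypothetical, NOT real embeddings).
# Dimension 0: -1.0 = male, +1.0 = female.
# Dimension 1:  0.0 = states a name, 1.0 = states a profession.
a = [-1.0, 0.0]  # "his name is robert"
b = [-1.0, 1.0]  # "he is a doctor"
c = [1.0, 0.0]   # "her name is julia"

target = combine(a, b, c)
print(target)  # [1.0, 1.0]: female + profession, in this toy space
```

In this toy space the result points at "female, states a profession", which matches the intuition behind the generated sentence "she is a doctor".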

Sentence Generation

For example, to generate outputs using the hierarchical sentence embedding:

vec2sent_generate -s hier -c maruker/vec2sent-hier -d data/test.en.2008 -o hier.txt

Evaluation

The outputs from the previous step can now be evaluated. For example, the following command computes the BLEU score:

vec2sent_evaluate --metric BLEU --file hier.txt

The following metrics are available:

Parameter	Explanation
ID	Fraction of all sentences where the output is identical to the input
PERM	Fraction of all output sentences that can be formed as a permutation of the input
ID_PERM	Fraction of all permutations that are identical to the input
BLEU	Document BLEU score
MOVER	Average Mover Score between input and output sentences
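To make the three reconstruction metrics concrete, here is a minimal sketch of how ID, PERM and ID_PERM could be computed from paired input and output sentences. It is not the package's implementation: the helper names, the whitespace tokenization, and the choice to count identical sentences as (trivial) permutations are all assumptions, and the example sentences are made up.

```python
from collections import Counter


def tokenize(sentence):
    # Assumption: plain whitespace tokenization.
    return sentence.split()


def reconstruction_metrics(inputs, outputs):
    """Compute ID, PERM and ID_PERM over paired input/output sentences."""
    pairs = list(zip(inputs, outputs))
    # ID: output has the same tokens in the same order as the input.
    identical = sum(tokenize(i) == tokenize(o) for i, o in pairs)
    # PERM: output uses exactly the input's tokens, in any order
    # (an identical output counts as a trivial permutation).
    permutations = sum(
        Counter(tokenize(i)) == Counter(tokenize(o)) for i, o in pairs
    )
    return {
        "ID": identical / len(pairs),
        "PERM": permutations / len(pairs),
        # ID_PERM: share of the permutations that are exact matches.
        "ID_PERM": identical / permutations if permutations else 0.0,
    }


inputs = ["he is a doctor", "her name is julia", "the cat sat down", "we like tea"]
outputs = ["he is a doctor", "julia is her name", "the cat sat", "we like tea"]

scores = reconstruction_metrics(inputs, outputs)
print(scores)
```

Here two of the four outputs are identical to their inputs (ID = 0.5), three are permutations of their inputs (PERM = 0.75), and two of those three permutations are exact matches (ID_PERM = 2/3).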

Tip

Vec2Sent needs to download several gigabytes of sentence embedding models. These files can be deleted with the command vec2sent_cleanup.

Available Models

We upload our models to the Hugging Face Hub. The following table shows which parameters to set in order to load each sentence embedding and its corresponding Vec2Sent model.

Sentence embedding name `-s`	Checkpoint `-c`	Explanation
avg	maruker/vec2sent-avg	Average pooling on BPEmb word embeddings
hier	maruker/vec2sent-hier	Hierarchical pooling on BPEmb
gem	maruker/vec2sent-gem	Geometric Embeddings
sent2vec	maruker/vec2sent-sent2vec	Sent2Vec
infersent	maruker/vec2sent-infersent	InferSent
sbert-large	maruker/vec2sent-sbert-large	SBERT
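The first two rows differ only in how word vectors are pooled into one sentence vector. The sketch below contrasts average pooling with one common formulation of hierarchical pooling (average within sliding windows, then take the element-wise maximum across windows). It is an illustration under assumptions: the function names, the window size of 2, and the toy vectors are hypothetical, and the actual avg and hier encoders operate on BPEmb embeddings and may differ in detail.

```python
def average_pool(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def hierarchical_pool(vectors, window=2):
    """Average inside each sliding window, then max across the windows."""
    if len(vectors) <= window:
        # Too short for more than one window: fall back to a plain average.
        return average_pool(vectors)
    windows = [
        average_pool(vectors[i:i + window])
        for i in range(len(vectors) - window + 1)
    ]
    dim = len(vectors[0])
    return [max(w[d] for w in windows) for d in range(dim)]


# Three toy word vectors (hypothetical values, not BPEmb embeddings).
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, -2.0]]

avg = average_pool(tokens)
hier = hierarchical_pool(tokens, window=2)
print(avg)   # [3.0, 0.0]
print(hier)  # [4.0, 1.0]
```

Average pooling ignores word order entirely, while the windowed variant retains some local order information because each window covers adjacent words.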

Additional sentence embeddings can be used by extending the class vec2sent.sentence_embeddings.abstract_sentence_embedding.AbstractEmbedding.

Installation

(Optional) Setup Virtual Environment

python -m venv venv
source venv/bin/activate

Download requirements

# Download git submodules (MoS model and some sentence embeddings)
git submodule update --init

Install

pip install .

Citation

If you find Vec2Sent useful in your academic work, please consider citing:

@inproceedings{kerscher-eger-2020-vec2sent,
    title = "{V}ec2{S}ent: Probing Sentence Embeddings with Natural Language Generation",
    author = "Kerscher, Martin  and
      Eger, Steffen",
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.152",
    pages = "1729--1736",
    abstract = "We introspect black-box sentence embeddings by conditionally generating from them with the objective to retrieve the underlying discrete sentence. We perceive of this as a new unsupervised probing task and show that it correlates well with downstream task performance. We also illustrate how the language generated from different encoders differs. We apply our approach to generate sentence analogies from sentence embeddings.",
}

Acknowledgments

The models are based on Mixture of Softmaxes with a context vector added to the inputs.
