
Embeddings #19

Open
ParisNeo opened this issue Jun 28, 2023 · 15 comments

Comments

@ParisNeo (Contributor)

Hi there. I am upgrading my bindings for the Lord of LLMs tool, and I now need to be able to vectorize text into the embedding space of the current model. Is there a way to access the latent space of the model? That is, I input a text and get the encoder output in latent space?

Best regards

@absadiki (Owner)

Hi,

I am a little bit confused; maybe you are mixing terms!
Do you want to get the embeddings of the text, or do you want the latent space of the encoder output?

@ParisNeo (Contributor, Author)

The latent space of the encoder output.

@absadiki (Owner)

I don't think the encoder output is exposed by the llama.cpp API header file, AFAIK. But let me know if you have an idea.

But why would you use llama.cpp for this task? Why not just use the original LLaMA Python code in this case? You could get the output of any transformer block you want.
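For reference, this is roughly what that would look like with the Hugging Face transformers API (a sketch only; the checkpoint name below is a placeholder and not something used by this project):

```python
# Illustrative sketch only, not part of this repo: pulling per-layer hidden
# states out of a Hugging Face LLaMA checkpoint, as suggested above.
# The checkpoint name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", output_hidden_states=True
)

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple: the input embeddings plus one tensor per
# transformer block, each of shape (batch, seq_len, hidden_size).
block_3_hidden = outputs.hidden_states[3]
```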

@ParisNeo (Contributor, Author)

I was thinking of building an animation that shows the model's encoder output moving inside a 2D or 3D projection of the latent space, against a background showing the distribution of the text chunks from the document used by the chat_with_document personality. This would let me see how the model explores ideas while generating its outputs, compared to the reference texts. So I kind of wanted to use the model that is already loaded instead of reloading another PyTorch model that is not exactly the same and is not quantized; that eats memory. I figured it would be better to have everything done by the same model.

Maybe I'll ask llama.cpp for this feature.

Thank you anyway.
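For what it's worth, a minimal sketch of that kind of projection, assuming the embedding vectors are already available as NumPy arrays (PCA via scikit-learn is just one possible choice of projection; all data below is placeholder):

```python
# Rough sketch of the projection idea: project high-dimensional embedding
# vectors into 2D with PCA and plot the generation trajectory over the
# document-chunk distribution. All data below is placeholder.
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

doc_chunk_embeddings = np.random.rand(200, 4096)   # placeholder document chunks
generation_embeddings = np.random.rand(20, 4096)   # placeholder generation states

pca = PCA(n_components=2).fit(doc_chunk_embeddings)
doc_2d = pca.transform(doc_chunk_embeddings)
gen_2d = pca.transform(generation_embeddings)

plt.scatter(doc_2d[:, 0], doc_2d[:, 1], s=5, alpha=0.3, label="document chunks")
plt.plot(gen_2d[:, 0], gen_2d[:, 1], "r.-", label="generation trajectory")
plt.legend()
plt.show()
```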

@ParisNeo (Contributor, Author)

I thought about it, and maybe just exposing the embed function of llama.cpp would already be useful for me.

@absadiki (Owner)

> I was thinking of building an animation that shows the model's encoder output moving inside a 2D or 3D projection of the latent space [...] Maybe I'll ask llama.cpp for this feature.

Yeah, I understand. Nice idea.
You can ask llama.cpp, and if you get any solution I'll be more than happy to integrate it into the bindings.

@absadiki (Owner)

> I thought about it, and maybe just exposing the embed function of llama.cpp would already be useful for me.

I think this is already exposed.
You are talking about llama_get_embeddings, aren't you?

@ParisNeo (Contributor, Author)

ParisNeo commented Jul 2, 2023

I need to give it text and it gives me the embeddings for the input text. Can you expose that in the model?

@absadiki (Owner)

absadiki commented Jul 3, 2023

OK, I will try to expose it in the model class.
But I have just noticed that this method does not take text as input, just the context; take a look.
So it is not the same as what you are looking for.

@ParisNeo (Contributor, Author)

ParisNeo commented Jul 3, 2023

In the llama-cpp-python binding, they have an embed function on their model:
https://abetlen.github.io/llama-cpp-python/

The ctransformers binding also has an embed method:
https://github.com/marella/ctransformers

I think they use llama.cpp in the background.
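For comparison, the llama-cpp-python API looks roughly like this (a sketch based on their docs; the model path is a placeholder and the exact signature may differ between versions):

```python
# Roughly how llama-cpp-python exposes embeddings; the model path is a
# placeholder and the API may differ between versions.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-7b.ggmlv3.q4_0.bin", embedding=True)
vector = llm.embed("Hello world")  # one embedding vector as a list of floats
```

ctransformers similarly exposes an embed() method on its model objects.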

@absadiki (Owner)

absadiki commented Jul 3, 2023

Yeah, if you want it just like what they did, then I can add it; they are using the same function under the hood.
The problem I was thinking about is that it is overkill to eval the whole transformer just to get the first block.

If you are using the generate function on a prompt, eval will already be called and the embedding vector will be updated, so you can read it without rerunning eval.

Anyway, I think I will add two functions: one to get the last embeddings and one to create them from a string as input?
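If it helps, here is a rough sketch of what that two-method interface could look like from the Python side (names and signatures are only placeholders for the proposal above, not the final API):

```python
# Hypothetical sketch of the two proposed methods; names and signatures
# are placeholders for the proposal above, not the final API.
class Model:
    def get_embeddings(self) -> list[float]:
        """Return the embedding vector left by the last eval/generate call,
        without running the transformer again."""
        ...

    def get_prompt_embeddings(self, prompt: str) -> list[float]:
        """Tokenize `prompt`, eval it, and return its embedding vector."""
        ...
```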

@ParisNeo (Contributor, Author)

ParisNeo commented Jul 4, 2023

Excellent!

absadiki added a commit that referenced this issue Jul 5, 2023
@absadiki (Owner)

absadiki commented Jul 5, 2023

@ParisNeo, here you go.

get_prompt_embeddings is what you are looking for.

If you don't have any other requests, shall I push a new version to PyPI?
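For anyone landing here later, usage looks roughly like this (a sketch only; the model path and constructor arguments are placeholders, see the project's README for the exact signature):

```python
# Usage sketch; the model path and constructor arguments are placeholders,
# see the project's README for the exact signature.
from pyllamacpp.model import Model

model = Model(model_path="./models/llama-7b.ggmlv3.q4_0.bin")
embeddings = model.get_prompt_embeddings("Hello world")
print(len(embeddings))  # size of the embedding vector (n_embd)
```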

@ParisNeo (Contributor, Author)

ParisNeo commented Jul 6, 2023

Thanks, no requests for now. I'll update my binding as soon as you push it to PyPI.

Thanks a lot.

@absadiki (Owner)

absadiki commented Jul 8, 2023

You are welcome.
The new version has been pushed to PyPI.
