gpt.py: how do I save the model after training, and how do I use it so that it returns text to me like ChatGPT? #31
Comments
As Andrej mentioned in the video, this is a decoder-only transformer. It will not respond conditioned on your input, since the architecture was not constructed for that. It would need an encoder part in the model that could later be used for conditioning, e.g. for Q&A.
To use
This isn't entirely accurate. ChatGPT is a decoder-only model, but that just means it's different from encoder-only models such as BERT and from seq2seq-style models. Decoder-only models do not actually need an encoder to perform their function. Saying that they need one isn't correct, because the input text you provide to a decoder-only LLM is simply the starting point for the autoregressive generation of subsequent tokens, as Andrej showed in the video when he explained that each training batch contains several examples, one per context length, pairing the tokens seen so far with the next token to predict. It's confusing, certainly, but I just wanted to point out that if this is trained correctly, it can become a very small version of ChatGPT without any serious modification aside from scaling up the Blocks.
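In practice, "conditioning" a decoder-only model is just using your prompt as the initial context passed to generation. A minimal sketch, assuming the names from the lecture's gpt.py (`encode`/`decode` from the character-level tokenizer, the trained model `m`, and `device` as set in the script):

```python
import torch

# Encode a prompt and use it as the starting context, shape (1, T).
prompt = "ROMEO:"  # any starting text; the model continues from here
context = torch.tensor(encode(prompt), dtype=torch.long, device=device)[None, ...]

# generate() appends one sampled token at a time to the running context,
# so the prompt itself is what "conditions" the decoder-only model.
out = m.generate(context, max_new_tokens=200)
print(decode(out[0].tolist()))
```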
Train on HuggingFace's OpenOrca dataset and add special tokens like <|imuser|> and <|imassistant|>.
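For illustration, one possible way to lay each Q&A pair out as a flat training string with those markers. The exact scheme (and the token names) is a design choice rather than anything from the lecture code, and the tokenizer would need these added as special tokens:

```python
# Hypothetical formatting for one OpenOrca-style example.
def format_example(question: str, answer: str) -> str:
    return f"<|imuser|>{question}<|imassistant|>{answer}<|endoftext|>"

print(format_example(
    "What is a transformer?",
    "A neural network architecture based on self-attention.",
))
```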
Actually, this model is pretty small. You'll need to bump up the hyperparameters, use a more meaningful sub-word tokenization technique like Byte Pair Encoding, and train the model on a good text dataset. But the most important step to get a conversational model is to fine-tune it on text conversations (question/query and answer/response pairs). The model will also need to know when to stop.
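A sketch of both points, assuming you swap the character-level vocabulary for GPT-2's BPE via tiktoken (this is a substitution, not what the lecture's gpt.py does) and append an end-of-text token to each training example so the model can learn a stopping point:

```python
import tiktoken

# GPT-2 Byte Pair Encoding tokenizer.
enc = tiktoken.get_encoding("gpt2")

# End every example with <|endoftext|>; at inference you stop sampling once
# the model emits this token.
text = "Q: What is attention?\nA: A weighted sum over token representations.<|endoftext|>"
ids = enc.encode(text, allowed_special={"<|endoftext|>"})

print(len(ids), enc.n_vocab)  # the model's vocab_size must match enc.n_vocab
print(enc.decode(ids))
```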
I have familiarized myself with the course gpt.py; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it later, feed some text into it, and see how it responds.
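A minimal sketch of the save/load/generate loop with plain PyTorch checkpointing. It assumes the same hyperparameters, the `encode`/`decode` helpers, and `device` from gpt.py are available when reloading, and that the model class is `GPTLanguageModel` as in the lecture code (adjust the name if your copy differs):

```python
import torch

# At the end of training in gpt.py, persist only the weights:
torch.save(model.state_dict(), 'gpt_model.pt')

# Later, in a separate script/session, rebuild the same architecture
# (same hyperparameters) and load the saved weights back in.
model = GPTLanguageModel()
model.load_state_dict(torch.load('gpt_model.pt', map_location=device))
model.to(device)
model.eval()

# Condition on your own text and sample a continuation:
prompt = "Hello, how are you?"
context = torch.tensor(encode(prompt), dtype=torch.long, device=device)[None, ...]
with torch.no_grad():
    out = model.generate(context, max_new_tokens=300)
print(decode(out[0].tolist()))
```

Note that without the conversational fine-tuning described in the comments above, the continuation will read like the training corpus (e.g. more Shakespeare), not like a ChatGPT-style answer.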