
gpt.py: how to save the model after training, and how to use it so that it returns text to me, like ChatGPT? #31

Open
MrKsiJ opened this issue Sep 15, 2023 · 5 comments

Comments

@MrKsiJ

MrKsiJ commented Sep 15, 2023

I have familiarized myself with gpt.py from the course; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed it some text, and see how it responds.

@touhi99

touhi99 commented Dec 10, 2023

> I have familiarized myself with gpt.py from the course; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed it some text, and see how it responds.

As Andrej mentioned in the video, this is a decoder-only transformer. It will not respond conditionally on a prompt, since the architecture was not constructed for that. The model would need an encoder part that could later be used for conditioning, e.g. for Q&A.

@exponentialXP

exponentialXP commented Dec 19, 2023

> I have familiarized myself with gpt.py from the course; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed it some text, and see how it responds.

Use torch.save() to save the model's and optimizer's state dicts, and torch.load() to load them.
Example: torch.save(model.state_dict(), 'params.pt'), and do the same for the optimizer.
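In terms of the video's gpt.py, a minimal sketch of the full round trip might look like this (assuming GPTLanguageModel, encode, decode, and device as defined there; the filenames are placeholders):

```python
import torch

# After training: persist the weights (and optionally the optimizer state).
torch.save(model.state_dict(), 'params.pt')
torch.save(optimizer.state_dict(), 'optim.pt')

# Later: rebuild the model with the same hyperparameters, then load the weights.
model = GPTLanguageModel()
model.load_state_dict(torch.load('params.pt'))
model.to(device)
model.eval()

# Prompt the model using the same character-level encode/decode from gpt.py.
context = torch.tensor([encode("ROMEO:")], dtype=torch.long, device=device)
out = model.generate(context, max_new_tokens=200)
print(decode(out[0].tolist()))
```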

@fasterinnerlooper

> I have familiarized myself with gpt.py from the course; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed it some text, and see how it responds.

> As Andrej mentioned in the video, this is a decoder-only transformer. It will not respond conditionally on a prompt, since the architecture was not constructed for that. The model would need an encoder part that could later be used for conditioning, e.g. for Q&A.

This isn't entirely accurate. ChatGPT is a decoder-only model, but that just means it is different from encoder-only models such as BERT and from seq2seq-style encoder-decoder models. Decoder-only models do not need an encoder to perform their function. Saying that they need one isn't correct, because the input text you provide to a decoder-only LLM is simply the starting point for the autoregressive generation of subsequent tokens, as Andrej showed in the video when he explained that each sampled chunk contains several training examples, one for every prefix length and its next-token target.

It's confusing, certainly, but I just wanted to point out that if this model is trained correctly, it can become a very small version of ChatGPT without any serious modification aside from scaling up the Blocks. To illustrate the prefix/target point, see the sketch below.
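Here is the prefix/target construction from the video, sketched under gpt.py's conventions (block_size; data is the encoded text):

```python
import torch

# One chunk of block_size+1 tokens yields block_size training examples:
# every prefix of the chunk predicts the token that follows it.
block_size = 8
data = torch.arange(100)          # stand-in for the encoded training text
x = data[:block_size]             # inputs
y = data[1:block_size + 1]        # targets, shifted by one position
for t in range(block_size):
    context, target = x[:t + 1], y[t]
    print(f"when input is {context.tolist()} the target is {target.item()}")
```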

@exponentialXP

Train on Hugging Face's OpenOrca dataset and add special tokens like <|imuser|> and <|imassistant|>.
But make sure to compute the loss only on the assistant turns, not on the user turns.
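A minimal sketch of that masking, assuming each training sequence comes with a boolean mask marking the assistant spans (the usual shift-by-one of targets is omitted for brevity):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # cross_entropy skips targets with this value by default

def masked_targets(tokens, is_assistant):
    """tokens: (T,) token ids; is_assistant: (T,) bool, True on assistant spans."""
    targets = tokens.clone()
    targets[~is_assistant] = IGNORE_INDEX  # no loss on user tokens
    return targets

# Inside the training step, with logits of shape (B, T, vocab_size) and
# targets of shape (B, T) built via masked_targets:
# loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
#                        targets.view(-1), ignore_index=IGNORE_INDEX)
```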

@ZainKhalidOfficial

Actually, this model is pretty small. You'll need to bump up the hyperparameters, use a more meaningful sub-word tokenization technique such as Byte Pair Encoding, and train the model on a good text dataset. But the most important step toward a conversational model is to fine-tune it on text conversations (question/query and answer/response pairs).

The model will also need to know when to stop.
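A minimal sketch of both points, using tiktoken's GPT-2 BPE encoding as an assumed drop-in for the character-level tokenizer in gpt.py:

```python
import tiktoken

# BPE tokenization: sub-word units instead of single characters.
enc = tiktoken.get_encoding("gpt2")
ids = enc.encode("How are you?")
assert enc.decode(ids) == "How are you?"   # encode/decode round-trips

# "Knowing when to stop": sampling loops typically break once the model
# emits an end-of-text token (id 50256 for the gpt2 encoding).
eot = enc.eot_token
sampled = enc.encode("I'm fine, thanks.") + [eot] + enc.encode("ignored tail")
reply = []
for tok in sampled:
    if tok == eot:
        break
    reply.append(tok)
print(enc.decode(reply))  # prints only the text before the end-of-text token
```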
