
gpt.py: how to save the model after training, and how to use it so that it returns text to me, like ChatGPT? #31

Open
MrKsiJ opened this issue Sep 15, 2023 · 5 comments

Comments

@MrKsiJ

MrKsiJ commented Sep 15, 2023

I have familiarized myself with gpt.py from the course; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed it some text, and see how it responds.

@touhi99

touhi99 commented Dec 10, 2023

> I have familiarized myself with gpt.py from the course; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed it some text, and see how it responds.

As Andrej mentioned in the video, this is a decoder-only transformer. It will not respond conditionally on a prompt, since the architecture was not constructed for that. The model would need an encoder part that could later be used for conditioning, e.g. for Q&A.

@exponentialXP

exponentialXP commented Dec 19, 2023

> I have familiarized myself with gpt.py from the course; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed it some text, and see how it responds.

Use torch.save() to save the model's and optimizer's state dicts, and torch.load() to load them.
Example: torch.save(model.state_dict(), 'params.pt'), and do the same for the optimizer.
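In terms of the video's gpt.py, a minimal sketch of the full round trip might look like this (assuming GPTLanguageModel, encode, decode, and device as defined there; the filenames are placeholders):

```python
import torch

# After training: persist the weights (and optionally the optimizer state).
torch.save(model.state_dict(), 'params.pt')
torch.save(optimizer.state_dict(), 'optim.pt')

# Later: rebuild the model with the same hyperparameters, then load the weights.
model = GPTLanguageModel()
model.load_state_dict(torch.load('params.pt'))
model.to(device)
model.eval()

# Prompt the model using the same character-level encode/decode from gpt.py.
context = torch.tensor([encode("ROMEO:")], dtype=torch.long, device=device)
out = model.generate(context, max_new_tokens=200)
print(decode(out[0].tolist()))
```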

@fasterinnerlooper

> I have familiarized myself with gpt.py from the course; in principle, everything is clear with the training data, and I have prepared a dataset. However, I want to save the resulting GPT model and then connect to it, feed it some text, and see how it responds.

> As Andrej mentioned in the video, this is a decoder-only transformer. It will not respond conditionally on a prompt, since the architecture was not constructed for that. The model would need an encoder part that could later be used for conditioning, e.g. for Q&A.

This isn't entirely accurate. ChatGPT is a decoder-only model, but that just means it is different from encoder-only models such as BERT and from seq2seq-style encoder-decoder models. Decoder-only models do not need an encoder to perform their function. Saying that they need one isn't correct, because the input text you provide to a decoder-only LLM is simply the starting point for the autoregressive generation of subsequent tokens, as Andrej showed in the video when he explained that each sampled chunk contains several training examples, one for every prefix length and its next-token target.

It's confusing, certainly, but I just wanted to point out that if this model is trained correctly, it can become a very small version of ChatGPT without any serious modification aside from scaling up the Blocks. To illustrate the prefix/target point, see the sketch below.
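Here is the prefix/target construction from the video, sketched under gpt.py's conventions (block_size; data is the encoded text):

```python
import torch

# One chunk of block_size+1 tokens yields block_size training examples:
# every prefix of the chunk predicts the token that follows it.
block_size = 8
data = torch.arange(100)          # stand-in for the encoded training text
x = data[:block_size]             # inputs
y = data[1:block_size + 1]        # targets, shifted by one position
for t in range(block_size):
    context, target = x[:t + 1], y[t]
    print(f"when input is {context.tolist()} the target is {target.item()}")
```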

@exponentialXP

Train on Hugging Face's OpenOrca dataset and add special tokens like <|imuser|> and <|imassistant|>.
But make sure to compute the loss only on the assistant turns, not on the user turns.
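A minimal sketch of that masking, assuming each training sequence comes with a boolean mask marking the assistant spans (the usual shift-by-one of targets is omitted for brevity):

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # cross_entropy skips targets with this value by default

def masked_targets(tokens, is_assistant):
    """tokens: (T,) token ids; is_assistant: (T,) bool, True on assistant spans."""
    targets = tokens.clone()
    targets[~is_assistant] = IGNORE_INDEX  # no loss on user tokens
    return targets

# Inside the training step, with logits of shape (B, T, vocab_size) and
# targets of shape (B, T) built via masked_targets:
# loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
#                        targets.view(-1), ignore_index=IGNORE_INDEX)
```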

@ZainKhalidOfficial

Actually, this model is pretty small. You'll need to bump up the hyperparameters, use a more meaningful sub-word tokenization technique such as Byte Pair Encoding, and train the model on a good text dataset. But the most important step toward a conversational model is to fine-tune it on text conversations (question/query and answer/response pairs).

The model will also need to know when to stop.
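A minimal sketch of both points, using tiktoken's GPT-2 BPE encoding as an assumed drop-in for the character-level tokenizer in gpt.py:

```python
import tiktoken

# BPE tokenization: sub-word units instead of single characters.
enc = tiktoken.get_encoding("gpt2")
ids = enc.encode("How are you?")
assert enc.decode(ids) == "How are you?"   # encode/decode round-trips

# "Knowing when to stop": sampling loops typically break once the model
# emits an end-of-text token (id 50256 for the gpt2 encoding).
eot = enc.eot_token
sampled = enc.encode("I'm fine, thanks.") + [eot] + enc.encode("ignored tail")
reply = []
for tok in sampled:
    if tok == eot:
        break
    reply.append(tok)
print(enc.decode(reply))  # prints only the text before the end-of-text token
```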
