RLHF #18
@sshivaditya2019 rfc. Roughly estimating a week for this, but I'm also curious if you can help clarify the implementation details. I'm assuming v1 of this can be used to affect our RAG/embeddings search first, and then v2 might eventually be used to fine-tune a specific model.
This feels like the wrong repo to open this task in, as it's a model-training task, but we'd add
Simpler to just compare the earliest and latest comment revisions. I don't think we need to complicate this by storing anything in a database or generating embeddings. Just add the diff to the context and say
Something like that. We can possibly optimize it to only show the diff instead of the entire original comment. This would clearly show it the filename of our
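A minimal sketch of that idea, assuming the `diff` (jsdiff) package and a hypothetical `CommentRevision` shape; fetching the comment edit history from GitHub is out of scope here:

```typescript
import { createPatch } from "diff";

interface CommentRevision {
  body: string;
  editedAt: string;
}

function buildEditFeedbackContext(revisions: CommentRevision[]): string | null {
  if (revisions.length < 2) return null; // the comment was never edited

  const earliest = revisions[0];
  const latest = revisions[revisions.length - 1];

  // Only the diff is injected, not the full original comment, to keep the context small.
  const diff = createPatch("comment.md", earliest.body, latest.body, earliest.editedAt, latest.editedAt);

  return [
    "A maintainer edited your previous answer. The diff below shows what was corrected;",
    "avoid repeating the same mistake.",
    "```diff",
    diff,
    "```",
  ].join("\n");
}
```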
I may be out of my depth, but we need to move away from thinking in terms of prompt injection when considering model training. We need to fine-tune a model with specific datasets to embed them into its foundational knowledge, separate from our prompts. This training will also be partner specific, so we should build with this in mind so we can white label it as a service for our partners' chatbots.
The first step is always prompting. Later comes RAG/true fine-tuning. Technically this may not qualify as true RLHF.
I think this could be designed like this: we keep track of the reactions and edits, as well as the edit diffs. We can then monitor both positive and negative examples within a repository and incorporate them into the prompt at query time. This process should be implemented at the repository level. For positive examples we could add this
For negative examples
For edits we would need to track what was changed in a particular edit:
With this approach, we can create a word/phrase weight dictionary, where certain words receive higher rewards while others incur greater penalties. We can provide this dictionary to the LLM as context and calculate the overall score for each generation. If the score falls below the organization-wide limit, we can trigger a restart of the prompt generation process.
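A rough sketch of the scoring loop described above; the `generate` callback, the weight values, and the threshold are all hypothetical placeholders, not existing plugin code:

```typescript
type PhraseWeights = Record<string, number>; // phrase -> reward (positive) or penalty (negative)

function scoreGeneration(text: string, weights: PhraseWeights): number {
  const lower = text.toLowerCase();
  let score = 0;
  for (const [phrase, weight] of Object.entries(weights)) {
    if (lower.includes(phrase.toLowerCase())) score += weight;
  }
  return score;
}

async function generateWithFeedback(
  prompt: string,
  weights: PhraseWeights,
  minScore: number, // the organization-wide limit
  generate: (prompt: string) => Promise<string>,
  maxRetries = 3
): Promise<string> {
  let output = await generate(prompt);
  for (let attempt = 0; attempt < maxRetries && scoreGeneration(output, weights) < minScore; attempt++) {
    // Restart generation whenever the weighted score falls below the threshold.
    output = await generate(prompt);
  }
  return output;
}
```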
GitHub does this for us already so we should not use our own DB to handle that aspect
This would imply that all corrections and edits made repository-wide are going to be pulled from a DB (I assumed you meant keeping track via DB storage) and they'd all be fed into the context window?
I don't think injecting all of the edits etc. into the systemMessage is the way to go, especially given our issues with the context window so far. Perhaps we build another LLM call, like we do for groundTruths, and have it summarize or succinctly embody all of the various edits. Our system message is already MASSIVE and littered with context from all sorts of tasks; I don't think injecting tens of diffs, full-body LLM responses, and the original context window it was fed is a great idea.
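A sketch of that "second LLM call" idea, mirroring how groundTruths are produced; `fetchEditDiffs` and `callLlm` are hypothetical helpers, not existing plugin functions:

```typescript
async function summarizeEditFeedback(
  repo: string,
  fetchEditDiffs: (repo: string) => Promise<string[]>,
  callLlm: (prompt: string) => Promise<string>
): Promise<string> {
  const diffs = await fetchEditDiffs(repo);
  if (diffs.length === 0) return "";

  // Compress many diffs into a few short lessons instead of injecting every diff
  // into the already-large system message.
  const prompt = [
    "Summarize the recurring corrections below into at most five short rules",
    "the assistant should follow in future replies:",
    ...diffs.map((diff, i) => `--- correction ${i + 1} ---\n${diff}`),
  ].join("\n");

  return callLlm(prompt);
}
```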
As I mentioned, we would only require the weights and not the edits or reactions.
We would only retain the word/phrase weight pairs to provide to the LLM, without any edits or diffs.
I don't think we can store inside plugins, so we might be limited to
I’m not sure how this approach differs from our current process. Our goal is to introduce feedback into the system, not to revisit the same set of issues and comments repeatedly.
We are not injecting edits; rather, we will be incorporating a weight dictionary that enables the model to select from those options. These weights will be maintained for each repository. This approach can also be easily extended for fine-tuning if we decide to implement it in the future.
Appreciate the response; that clarified things for me. I'm looking forward to seeing it implemented.
@0x4007 I think implementing this would take about a week. Also, rfc on my approach?
How are you dealing with the numbers? I'm assuming you're using TypeScript, because if they're being handled directly by the LLM it won't be great.
I am not sure what you mean by numbers; I am assuming you are referring to the weights. For now it would be up to the LLM to choose the high-reward phrases and avoid the low-reward ones. The top phrases, both positive and negative, would be added to the prompt for now, with the end goal being a fine-tuned model later. I think this is the closest we can get to an RLHF approach without actually modifying model weights as such.
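An illustrative sketch of injecting the top-weighted phrases into the prompt; the `PhraseWeights` shape matches the scoring sketch above and the counts are arbitrary:

```typescript
function buildWeightHints(weights: Record<string, number>, topK = 5): string {
  // Sort phrases by weight, highest reward first.
  const sorted = Object.entries(weights).sort((a, b) => b[1] - a[1]);
  const positive = sorted.filter(([, w]) => w > 0).slice(0, topK);
  const negative = sorted.filter(([, w]) => w < 0).slice(-topK);

  return [
    "Prefer phrasing similar to:",
    ...positive.map(([phrase, w]) => `- "${phrase}" (reward ${w})`),
    "Avoid phrasing similar to:",
    ...negative.map(([phrase, w]) => `- "${phrase}" (penalty ${w})`),
  ].join("\n");
}
```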
/start
@sshivaditya2019 the deadline is at Tue, Nov 5, 3:36 AM UTC |
Passed the deadline and no activity is detected, removing assignees: @sshivaditya2019. |
We should support three inputs for the model to learn from:
Edits could be useful to compare what the output should be. That way we have a chance to correct some specific mistakes. In the example below it incorrectly writes the `.ubiquity-os.config.yml` name, and this could be a good opportunity to teach it. ubiquity-os/ubiquity-os-kernel#111 (comment)