Chatbot: Dynamic ground truths #13
Ground truths act as essential guardrails for LLMs, influencing the assumptions the models make while generating outputs. These truths should be derived from the actual context of the task at hand, rather than from the input query, so that the LLM's responses remain relevant and accurate. For example, consider an organization that specializes in working with Rust files and employs tools like Rustup for managing Rust versions and Railway for deployment. The relevant ground truths for this scenario would encapsulate the key aspects of that technology stack and workflow: Rust, Rustup, and Railway. If someone then requests information about deployment, the model should make a reasonable assumption based on those ground truths and, in this case, offer Rust-based deployment examples.
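A minimal sketch of what injecting such ground truths into a system message could look like. The helper name `buildSystemMessage` and the prompt wording are hypothetical illustrations, not the plugin's actual implementation:

```typescript
// Hypothetical sketch: appending ground truths to a chat completion's
// system message so the model grounds its assumptions in the org's stack.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildSystemMessage(groundTruths: string[], basePrompt: string): ChatMessage {
  // Ground truths are listed as explicit constraints after the base prompt.
  const truthsBlock = groundTruths.map((t) => `- ${t}`).join("\n");
  return {
    role: "system",
    content: `${basePrompt}\n\nGround truths:\n${truthsBlock}`,
  };
}

const msg = buildSystemMessage(
  [
    "The org's codebase is written in Rust",
    "Rust versions are managed with Rustup",
    "Deployments target Railway",
  ],
  "You are a helpful assistant for this organization."
);
console.log(msg.content);
```

With this shape, a deployment question would reach the model alongside the Rust/Rustup/Railway constraints rather than relying on the query alone.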
"Task" here represents what, exactly: do you mean our GitHub issues/tasks, or the use case behind the query? E.g., my use case is performing code review. By your description, the ground truths should then embody the tech stack involved in that review? It seems more logical to me to guardrail the model to perform a review that is grounded in truth in relation to the task spec, as opposed to the code being reviewed. What do you suggest? Are truths sourced from the code being reviewed, or from the spec, i.e. the objective/guidelines/standards that the review must be performed to?
But a code-review-specific logic flow already has clear context on the tech stack and coding languages used, as it's parsing the entire raw text diff.
For `@ubqbot` we can pull the tech stacks dynamically; this covers tech stacks and frameworks. We could also leverage the org readme, not just the repo readme, which might include more. But each application deserves its own discussion as to where and how the "Ground Truths" are sourced and how they are injected into that flow's system message, so I've made this issue specifically for the chatbot.
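One possible sketch of sourcing a readme dynamically so its contents can seed the chatbot's ground truths. `ReadmeApi` is a hypothetical minimal interface mirroring the shape of Octokit's `repos.getReadme` response (which returns base64-encoded content); this is not the plugin's actual code:

```typescript
// Illustrative sketch: fetch a repository README via a GitHub-REST-shaped
// client and decode it for downstream ground-truth extraction.
interface ReadmeApi {
  getReadme(params: { owner: string; repo: string }): Promise<{ data: { content: string } }>;
}

async function fetchReadme(api: ReadmeApi, owner: string, repo: string): Promise<string | null> {
  try {
    const { data } = await api.getReadme({ owner, repo });
    // The GitHub API returns file contents base64-encoded.
    return Buffer.from(data.content, "base64").toString("utf8");
  } catch {
    // A repository without a README is a valid state, not an error.
    return null;
  }
}
```

The same call shape could target the org's `.github` profile repository instead of the target repo, which is where the "org readme, not the repo readme" idea would come in.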
Just over 4 hours at this point in terms of time; awaiting review, and I'm unsure of priority.
No price label has been set. Skipping permit generation.
@0x4007 Can you price and regen the reward please? |
Good catch, it needs to be wrapped. It should either be included in the next PR to be merged (mine or yours), or addressed in #17, whichever approach you think is best.
I can take care of this quick fix; I just wanted to confirm that I didn’t break anything, lol!
No, it was definitely me; I wrote that fetch call. Everyone makes mistakes; we're only human after all.
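A hedged sketch of the kind of fix being discussed, i.e. wrapping a network call so a failed request can't crash the plugin. `fetchRepoFile` and its error-handling policy are hypothetical stand-ins, not the actual call from the PR:

```typescript
// Illustrative sketch: a fetch call wrapped so network and HTTP failures
// surface as `null` instead of an unhandled exception.
async function fetchRepoFile(url: string): Promise<string | null> {
  try {
    const res = await fetch(url);
    if (!res.ok) {
      // Treat non-2xx responses as "file not available" rather than throwing.
      return null;
    }
    return await res.text();
  } catch (err) {
    // Network errors are logged and swallowed so the caller can fall back.
    console.error(`Failed to fetch ${url}:`, err);
    return null;
  }
}
```

The caller then only has to handle a `string | null`, which keeps the fallback logic in one place.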
These need to be approved and funded before work is done, if you're expecting to get paid for it.
|
| View | Contribution | Count | Reward |
|---|---|---|---|
| Issue | Comment | 3 | 12.1855 |
| Review | Comment | 9 | 0 |
Conversation Incentives
| Comment | Formatting | Relevance | Reward |
|---|---|---|---|
| Ground truths act as essential guardrails for LLMs, influencing … | 6.83 (wordCount: 144, wordValue: 0.1) | 0.75 | 5.1225 |
| @Keyrxng, could you take a look at this? If a repository is miss… | 6.85 (wordCount: 31, wordValue: 0.1, elements: +5) | 0.7 | 6.295 |
| I can take care of this quick fix; I just wanted to confirm that… | 1.28 (wordCount: 20, wordValue: 0.1) | 0.6 | 0.768 |
| Wouldn't it be better to check for files such as `requiremen… | 0 (wordCount: 25, wordValue: 0) | 0.9 | 0 |
| This is a suggestion: use capital letters for conditions to ensu… | 0 (wordCount: 19, wordValue: 0) | 0.5 | 0 |
| We currently support OpenRouter, so we should check if we have a… | 0 (wordCount: 45, wordValue: 0) | 0.8 | 0 |
| Is there a reason to create another OpenAI client here? If not, … | 0 (wordCount: 26, wordValue: 0) | 0.7 | 0 |
| ```jsDO NOT LIST every LANGUAGE or DEPENDENCY; foc… | 0 (wordCount: 29, wordValue: 0) | 0.6 | 0 |
| As I mentioned, it works at times, but when it does, it effectiv… | 0 (wordCount: 39, wordValue: 0) | 0.4 | 0 |
| Claude performs much better in real-world coding tasks, so I sug… | 0 (wordCount: 74, wordValue: 0) | 0.75 | 0 |
| Yes, all prompts related to code, such as pull prechecks, should… | 0 (wordCount: 59, wordValue: 0) | 0.65 | 0 |
| I was working on setting this up in the repo, sorry for the dela… | 0 (wordCount: 15, wordValue: 0) | 0.3 | 0 |
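The formatting scores in the tables appear to follow a word-decay formula. This is inferred from the numbers themselves, not from any documentation: formatting ≈ round(wordValue × wordCount^0.85, 2 decimals) plus any HTML element bonuses, with relevance multiplying only the word portion and element bonuses added in full (the Specification row below additionally appears to carry a ×3 multiplier: 12.65 × 3 = 37.95). A sketch under those assumptions:

```typescript
// Inferred (undocumented) scoring sketch that reproduces rows of the
// rewards tables. Assumption: the word portion decays as wordCount^0.85,
// is rounded to 2 decimals, then scaled by relevance; element bonuses
// (links, lists, etc.) are added without the relevance scaling.
function wordScore(wordCount: number, wordValue: number): number {
  return Math.round(wordValue * Math.pow(wordCount, 0.85) * 100) / 100;
}

function reward(
  wordCount: number,
  wordValue: number,
  relevance: number,
  elementScore = 0
): number {
  return wordScore(wordCount, wordValue) * relevance + elementScore;
}

// First row above: 144 words at wordValue 0.1, relevance 0.75.
console.log(reward(144, 0.1, 0.75).toFixed(4)); // → "5.1225"
```

The same formula reproduces e.g. 31 words at 0.1 with a +5 link bonus and relevance 0.7 giving 6.295, which matches the second row.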
[ 53.937 WXDAI ]
@Keyrxng
Contributions Overview
| View | Contribution | Count | Reward |
|---|---|---|---|
| Issue | Specification | 1 | 37.95 |
| Issue | Comment | 6 | 15.987 |
| Review | Comment | 25 | 0 |
Conversation Incentives
| Comment | Formatting | Relevance | Reward |
|---|---|---|---|
| https://github.com/ubiquity-os-marketplace/command-ask/blob/e45d… | 12.65 (wordCount: 190, wordValue: 0.1, elements: +4) | 1 | 37.95 |
| "task" here represents what, do you mean our github issues/tasks… | 7.63 (wordCount: 164, wordValue: 0.1) | 0.9 | 6.867 |
| for `@ubqbot` we can pull the tech stacks dynamically fo… | 8.06 (wordCount: 125, wordValue: 0.1, elements: +2) | 0.85 | 7.151 |
| Just over 4 hours at this point in terms of time - awaiting revi… | 1.22 (wordCount: 19, wordValue: 0.1) | 0.2 | 0.244 |
| @0x4007 Can you price and regen the reward please? | 0.65 (wordCount: 9, wordValue: 0.1) | 0.1 | 0.065 |
| Good catch, needs wrapped. It either should be included in the … | 1.75 (wordCount: 29, wordValue: 0.1) | 0.6 | 1.05 |
| No it was definitely me I wrote that fetch call, everyone makes … | 1.22 (wordCount: 19, wordValue: 0.1) | 0.5 | 0.61 |
| Resolves #13 | 0 (wordCount: 2, wordValue: 0) | 0.1 | 0 |
| This can be ignored as it's only relevant to #11 so this is all … | 0 (wordCount: 37, wordValue: 0) | 0.2 | 0 |
| I implemented this thinking it'd be too slow but I think `o1… | 2 (wordCount: 45, wordValue: 0, elements: +2) | 0.5 | 0 |
| Broke this into a fn to make it easier to see what we are pullin… | 0 (wordCount: 68, wordValue: 0) | 0.6 | 0 |
| We should either A. Just hardcode a list of these sorts of file… | 0 (wordCount: 104, wordValue: 0) | 0.4 | 0 |
| Can you show me what you mean and convert one or two of those li… | 0 (wordCount: 34, wordValue: 0) | 0.3 | 0 |
| ```1. ASSUME your output BUILDS the FOUNDATION for… | 0 (wordCount: 56, wordValue: 0) | 0.7 | 0 |
| The suggestion is to use claude when parsing code as was recomme… | 0 (wordCount: 121, wordValue: 0) | 0.8 | 0 |
| What should be fed to Claude is my base question: just code? or … | 0 (wordCount: 26, wordValue: 0) | 0.5 | 0 |
| I've used/seen it used sparingly throughout a prompt to enforce … | 0 (wordCount: 99, wordValue: 0) | 0.5 | 0 |
| So to clarify, for #11, the context window will contain: - raw … | 3 (wordCount: 95, wordValue: 0, elements: +3) | 0.6 | 0 |
| Okay this will need another task opened to alter the `@ubqbo… | 0 (wordCount: 33, wordValue: 0) | 0.3 | 0 |
| As am I but would you want me to still embed the truths that get… | 0 (wordCount: 34, wordValue: 0) | 0.7 | 0 |
| `@ubqbot` because it's pulling in context from multiple … | 2 (wordCount: 121, wordValue: 0, elements: +2) | 0.8 | 0 |
| I was referring to that also my question was referring to meanin… | 0 (wordCount: 68, wordValue: 0) | 0.4 | 0 |
| Resolving this thread as this requires it's own task as there is… | 0 (wordCount: 45, wordValue: 0) | 0.2 | 0 |
| Marking as ready for review to get eyes on it and opinions as-is… | 10 (wordCount: 125, wordValue: 0, elements: +10) | 0.5 | 0 |
| @0x4007 @sshivaditya2019 @gentlementlegen @rndquu requesting rev… | 0 (wordCount: 31, wordValue: 0) | 0.1 | 0 |
| I'm not manually adding anything it's GPT that's creating the ar… | 0 (wordCount: 85, wordValue: 0) | 0.4 | 0 |
| the @ubqbot command right now that's in `development` th… | 0 (wordCount: 83, wordValue: 0) | 0.3 | 0 |
| In my opinion "Ground Truths" should be considered in relation t… | 7 (wordCount: 124, wordValue: 0, elements: +7) | 0.6 | 0 |
| Dynamic chatbot ground truths QA used within my fork of this rep… | 0 (wordCount: 41, wordValue: 0) | 0.7 | 0 |
| @gentlementlegen @0x4007 @sshivaditya2019 @rndquu Why can't we… | 0 (wordCount: 22, wordValue: 0) | 0.2 | 0 |
| QA: https://github.com/ubq-testing/ask-plugin/issues/11#issuecom… | 1.5 (wordCount: 79, wordValue: 0, elements: +1.5) | 0.8 | 0 |
| lmk if there is anything holding back this PR and I'll push it f… | 0 (wordCount: 15, wordValue: 0) | 0.1 | 0 |
[ 11.243 WXDAI ]
@0x4007
Contributions Overview
| View | Contribution | Count | Reward |
|---|---|---|---|
| Issue | Comment | 1 | 0.61 |
| Review | Comment | 7 | 10.633 |
Conversation Incentives
| Comment | Formatting | Relevance | Reward |
|---|---|---|---|
| These need to be approved to be funded before work is done and e… | 1.22 (wordCount: 19, wordValue: 0.1) | 0.5 | 0.61 |
| OpenAI system prompts don't look like this so I'm inclined to be… | 1.54 (wordCount: 25, wordValue: 0.1) | 0.7 | 1.078 |
| Doesn't seem right to add this line | 0.59 (wordCount: 8, wordValue: 0.1) | 0.5 | 0.295 |
| In my deep experience, when I am building off of and integrating… | 6.91 (wordCount: 146, wordValue: 0.1) | 0.9 | 6.219 |
| We can try ground truths I was referring to capitalized words fo… | 0.88 (wordCount: 13, wordValue: 0.1) | 0.6 | 0.528 |
| Why are you adding redundant information in those arrays? | 0.65 (wordCount: 9, wordValue: 0.1) | 0.8 | 0.52 |
| I think we should not have redundant messages but they should be… | 1.28 (wordCount: 20, wordValue: 0.1) | 0.6 | 0.768 |
| @sshivaditya2019 you should review and decide when this pull is … | 1.75 (wordCount: 29, wordValue: 0.1) | 0.7 | 1.225 |
It never produced a $200 reward, and I claimed the $53, so I guess it worked out.
command-ask/src/adapters/openai/helpers/completions.ts (lines 40 to 42 in e45db81)
command-ask/src/handlers/ask-llm.ts (line 70 in e45db81)
I think these should be dynamic. I don't have super clear context on them, but from what I can surmise they'd benefit from it. What should be fed in: the `@ubqbot` command, or the fully built `formattedChat` that includes all context? I think we should have GPT-4o build our array based on the input it's given, because `o1` may be too long to wait for little added benefit.
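A sketch of the approach discussed above: having GPT-4o produce the ground-truths array from whatever input it's given. `CompletionsClient` is a hypothetical minimal interface over an OpenAI-style chat completions call, and the prompt wording is illustrative, not the plugin's actual prompt:

```typescript
// Illustrative sketch: ask a GPT-4o-style model for a JSON ground-truths
// array and parse it defensively, so a malformed response degrades to [].
interface CompletionsClient {
  complete(model: string, system: string, user: string): Promise<string>;
}

async function deriveGroundTruths(client: CompletionsClient, input: string): Promise<string[]> {
  const system =
    "Extract the ground truths (tech stack, tooling, deployment targets) from the input. " +
    'Respond with a JSON object: {"groundTruths": string[]}.';
  const raw = await client.complete("gpt-4o", system, input);
  try {
    const parsed = JSON.parse(raw) as { groundTruths?: unknown };
    // Keep only string entries; anything else is discarded.
    return Array.isArray(parsed.groundTruths)
      ? parsed.groundTruths.filter((t): t is string => typeof t === "string")
      : [];
  } catch {
    // An unparsable response yields an empty list rather than an exception.
    return [];
  }
}
```

Whether `input` should be just the `@ubqbot` command or the full `formattedChat` is exactly the open question above; this shape works for either, since the model sees whatever the caller passes.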