Chatbot: Dynamic ground truths #13

Closed
Keyrxng opened this issue Oct 24, 2024 · 13 comments · Fixed by #14

Comments

@Keyrxng
Member

Keyrxng commented Oct 24, 2024

"You Must obey the following ground truths: [" +
groundTruths.join(":") +
"]\n" +

["typescript", "github", "cloudflare worker", "actions", "jest", "supabase", "openai"],

I think these should be dynamic. I don't have super clear context on them, but from what I can surmise they'd benefit from it.

  • Are they restricted to being one-word strings? If not, what's the max length? How many truths can we provide, and what's the optimal amount?
  • Are they supposed to embody the org, the incoming query, or the task in relation to the current @ubqbot command, or the fully built formattedChat that includes all context?

I think we should have GPT-4o build our array based on the input it's given (rough sketch below), because:

  1. Truths are dynamic and based on either A) the fully built/initial incoming user query, B) the payload issue spec or similar, or C) the task spec when considering code review.
  2. We need a quick response imo; a huge spec-to-truth call followed by the spec + the entire diff, both sent to o1, may be too long a wait for little added benefit.
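
Something along these lines is what I have in mind. This is only a minimal sketch using the openai SDK; the function name, prompt wording, and JSON shape are all hypothetical:

```ts
// Hypothetical sketch: have a fast model derive the ground-truths array from
// whatever context we already have (user query, issue spec, or task spec).
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function buildGroundTruths(context: string): Promise<string[]> {
  const res = await openai.chat.completions.create({
    model: "gpt-4o", // quick and cheap compared to sending everything to o1
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Derive up to 10 short ground truths (tech stack, frameworks, conventions) from the context. " +
          'Respond with JSON: { "groundTruths": string[] }',
      },
      { role: "user", content: context },
    ],
  });

  const parsed = JSON.parse(res.choices[0].message.content ?? "{}");
  return Array.isArray(parsed.groundTruths) ? parsed.groundTruths : [];
}
```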
@sshivaditya2019
Collaborator

sshivaditya2019 commented Oct 24, 2024

Ground truths act as essential guardrails for LLMs, influencing the assumptions the models make while generating outputs. These truths should be derived from the actual context of the task at hand, rather than from the input query provided. The aim is to ensure that the LLM's responses remain relevant and accurate.

For example, consider an organization that specializes in working with Rust files and employs tools like Rustup for managing Rust versions and Railway for deployment. The relevant ground truths for this scenario might include terms that encapsulate the key aspects of their technology stack and workflow. In this case, the ground truths could be:
["rust", "rustup", "default-nightly-toolchain", "wasm32bindgen", "ffi", "railway"].

If someone requests information about deployment, the model should make a reasonable assumption based on the provided ground truths. In this case, it should offer Rust-based examples for deployment.
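
To make the injection concrete, here is a minimal sketch (using the hypothetical Rust org from the example above) of how those truths would land in the system prompt, following the same join pattern quoted at the top of this issue:

```ts
// Minimal sketch: the truths are injected verbatim into the system prompt, so a
// deployment question would get a Rust/Railway-flavoured answer.
const groundTruths = ["rust", "rustup", "default-nightly-toolchain", "wasm32bindgen", "ffi", "railway"];

const systemMessage = "You Must obey the following ground truths: [" + groundTruths.join(":") + "]\n";
```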

@Keyrxng
Member Author

Keyrxng commented Oct 24, 2024

These truths should be derived from the actual context of the task at hand, rather than from the input query provided.

"task" here represents what, do you mean our github issues/tasks? Or the use-case behind the query?

E.g: My use-case is performing code review. The ground truths then by your description should embody the tech stack involved in that review?

It seems more logical to me to guardrail the model to perform review that is grounded in truth in relation to the task spec as opposed to the code being reviewed.

What do you suggest? Truths are source from the code being reviewed or are sourced from the spec, which is the objective/guideline/standards that the review must be performed to?

@ubqbot grounded in truth with the tech stack as a general chatbot makes sense because we only ever work with those stacks you stated, so in general it should offer responses involving those.

But a code-review specific logic flow already has clear context on tech stack and coding languages used as it's parsing the entire raw text diff.

@Keyrxng
Member Author

Keyrxng commented Oct 24, 2024

For @ubqbot we can pull the tech stacks dynamically for the partner, per repo, by:

  • fetching the repo's language stats and passing them in; we could adjust the prompt to give more weight to the languages with a higher percentage
  • using the package.json to find out which libs/frameworks etc. the repo uses

This covers tech stacks and frameworks. We could also leverage the org readme (not the repo readme), which might include more, but package.json and language stats should do it imo. What do you think?
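
A rough sketch of pulling both (assumes an Octokit client is available in the plugin context; the helper name is made up):

```ts
// Hypothetical helper: derive tech-stack hints from the repo itself.
import { Octokit } from "@octokit/rest";

export async function fetchRepoTechStack(octokit: Octokit, owner: string, repo: string) {
  // Language stats come back as bytes per language, e.g. { TypeScript: 123456, ... },
  // so the larger values can be given more weight in the prompt.
  const { data: languages } = await octokit.rest.repos.listLanguages({ owner, repo });

  // package.json tells us which libs/frameworks the repo uses.
  const { data: pkgFile } = await octokit.rest.repos.getContent({ owner, repo, path: "package.json" });
  const pkg = "content" in pkgFile ? JSON.parse(Buffer.from(pkgFile.content, "base64").toString("utf8")) : {};

  return {
    languages,
    dependencies: Object.keys({ ...pkg.dependencies, ...pkg.devDependencies }),
  };
}
```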


Each application deserves its own discussion as to where the "Ground Truths" are sourced from and how they are injected into that flow's system message, so I've made this issue specifically for the chatbot (@ubqbot command).

@Keyrxng changed the title from "Dynamic ground truths" to "Chatbot: Dynamic ground truths" on Oct 25, 2024
@Keyrxng
Member Author

Keyrxng commented Oct 25, 2024

Just over 4 hours spent at this point - awaiting review. And I'm unsure of the priority.


ubiquity-os-beta bot commented Oct 27, 2024

! No price label has been set. Skipping permit generation.

@Keyrxng
Member Author

Keyrxng commented Oct 27, 2024

@0x4007 Can you price and regen the reward please?

@sshivaditya2019
Collaborator

@Keyrxng, could you take a look at this? If a repository is missing a package.json, it's causing a fatal error. Let me know if you are able to replicate this.

Link

@Keyrxng
Member Author

Keyrxng commented Oct 27, 2024

Good catch, it needs to be wrapped.

It should either be included in the next PR to be merged (mine or yours), or addressed in #17, whichever approach you think is best.
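
A rough sketch of the wrap (helper name hypothetical, same Octokit client assumed):

```ts
import { Octokit } from "@octokit/rest";

// A missing package.json should not be fatal; fall back to language stats only.
async function fetchPackageJson(octokit: Octokit, owner: string, repo: string): Promise<Record<string, unknown> | null> {
  try {
    const { data } = await octokit.rest.repos.getContent({ owner, repo, path: "package.json" });
    if (!("content" in data)) return null;
    return JSON.parse(Buffer.from(data.content, "base64").toString("utf8"));
  } catch {
    // e.g. a 404 because the repo has no package.json
    return null;
  }
}
```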

@sshivaditya2019
Collaborator

sshivaditya2019 commented Oct 27, 2024

Good catch, it needs to be wrapped.

It should either be included in the next PR to be merged (mine or yours), or addressed in #17, whichever approach you think is best.

I can take care of this quick fix; I just wanted to confirm that I didn’t break anything, lol!

@Keyrxng
Member Author

Keyrxng commented Oct 27, 2024

No, it was definitely me; I wrote that fetch call. Everyone makes mistakes, we're only human after all.

@0x4007
Member

0x4007 commented Oct 27, 2024

@0x4007 Can you price and regen the reward please?

These need to be approved and funded before work is done, if you're expecting to get paid for it.


ubiquity-os-beta bot commented Oct 27, 2024

[ 12.1855 WXDAI ]

@sshivaditya2019

Contributions Overview

| View | Contribution | Count | Reward |
| --- | --- | --- | --- |
| Issue | Comment | 3 | 12.1855 |
| Review | Comment | 9 | 0 |

Conversation Incentives

| Comment | Formatting | Relevance | Reward |
| --- | --- | --- | --- |
| Ground truths act as essential guardrails for LLMs, influencing … | 6.83 | 0.75 | 5.1225 |
| @Keyrxng, could you take a look at this? If a repository is miss… | 6.85 | 0.7 | 6.295 |
| I can take care of this quick fix; I just wanted to confirm that… | 1.28 | 0.6 | 0.768 |
| Wouldn't it be better to check for files such as `requiremen… | 0 | 0.9 | 0 |
| This is a suggestion: use capital letters for conditions to ensu… | 0 | 0.5 | 0 |
| We currently support OpenRouter, so we should check if we have a… | 0 | 0.8 | 0 |
| Is there a reason to create another OpenAI client here? If not, … | 0 | 0.7 | 0 |
| ```jsDO NOT LIST every LANGUAGE or DEPENDENCY; foc… | 0 | 0.6 | 0 |
| As I mentioned, it works at times, but when it does, it effectiv… | 0 | 0.4 | 0 |
| Claude performs much better in real-world coding tasks, so I sug… | 0 | 0.75 | 0 |
| Yes, all prompts related to code, such as pull prechecks, should… | 0 | 0.65 | 0 |
| I was working on setting this up in the repo, sorry for the dela… | 0 | 0.3 | 0 |

[ 53.937 WXDAI ]

@Keyrxng

Contributions Overview

| View | Contribution | Count | Reward |
| --- | --- | --- | --- |
| Issue | Specification | 1 | 37.95 |
| Issue | Comment | 6 | 15.987 |
| Review | Comment | 25 | 0 |

Conversation Incentives

| Comment | Formatting | Relevance | Reward |
| --- | --- | --- | --- |
| https://github.com/ubiquity-os-marketplace/command-ask/blob/e45d… | 12.65 | 1 | 37.95 |
| "task" here represents what, do you mean our github issues/tasks… | 7.63 | 0.9 | 6.867 |
| for `@ubqbot` we can pull the tech stacks dynamically fo… | 8.06 | 0.85 | 7.151 |
| Just over 4 hours at this point in terms of time - awaiting revi… | 1.22 | 0.2 | 0.244 |
| @0x4007 Can you price and regen the reward please? | 0.65 | 0.1 | 0.065 |
| Good catch, needs wrapped.It either should be included in the … | 1.75 | 0.6 | 1.05 |
| No it was definitely me I wrote that fetch call, everyone makes … | 1.22 | 0.5 | 0.61 |
| Resolves #13 | 0 | 0.1 | 0 |
| This can be ignored as it's only relevant to #11 so this is all … | 0 | 0.2 | 0 |
| I implemented this thinking it'd be too slow but I think `o1… | 2 | 0.5 | 0 |
| Broke this into a fn to make it easier to see what we are pullin… | 0 | 0.6 | 0 |
| We should eitherA. Just hardcode a list of these sorts of file… | 0 | 0.4 | 0 |
| Can you show me what you mean and convert one or two of those li… | 0 | 0.3 | 0 |
| ```1. ASSUME your output BUILDS the FOUNDATION for… | 0 | 0.7 | 0 |
| The suggestion is to use claude when parsing code as was recomme… | 0 | 0.8 | 0 |
| What should be fed to Claude is my base question: just code? or … | 0 | 0.5 | 0 |
| I've used/seen it used sparingly throughout a prompt to enforce … | 0 | 0.5 | 0 |
| So to clarify, for #11, the context window will contain:- raw … | 3 | 0.6 | 0 |
| Okay this will need another task opened to alter the `@ubqbo… | 0 | 0.3 | 0 |
| As am I but would you want me to still embed the truths that get… | 0 | 0.7 | 0 |
| `@ubqbot` because it's pulling in context from multiple … | 2 | 0.8 | 0 |
| I was referring to that also my question was referring to meanin… | 0 | 0.4 | 0 |
| Resolving this thread as this requires it's own task as there is… | 0 | 0.2 | 0 |
| Marking as ready for review to get eyes on it and opinions as-is… | 10 | 0.5 | 0 |
| @0x4007 @sshivaditya2019 @gentlementlegen @rndquu requesting rev… | 0 | 0.1 | 0 |
| I'm not manually adding anything it's GPT that's creating the ar… | 0 | 0.4 | 0 |
| the @ubqbot command right now that's in `development` th… | 0 | 0.3 | 0 |
| In my opinion "Ground Truths" should be considered in relation t… | 7 | 0.6 | 0 |
| Dynamic chatbot ground truths QA used within my fork of this rep… | 0 | 0.7 | 0 |
| @gentlementlegen @0x4007 @sshivaditya2019 @rndquu Why can't we… | 0 | 0.2 | 0 |
| QA: https://github.com/ubq-testing/ask-plugin/issues/11#issuecom… | 1.5 | 0.8 | 0 |
| lmk if there is anything holding back this PR and I'll push it f… | 0 | 0.1 | 0 |

[ 11.243 WXDAI ]

@0x4007

Contributions Overview

| View | Contribution | Count | Reward |
| --- | --- | --- | --- |
| Issue | Comment | 1 | 0.61 |
| Review | Comment | 7 | 10.633 |

Conversation Incentives

| Comment | Formatting | Relevance | Reward |
| --- | --- | --- | --- |
| These need to be approved to be funded before work is done and e… | 1.22 | 0.5 | 0.61 |
| OpenAI system prompts don't look like this so I'm inclined to be… | 1.54 | 0.7 | 1.078 |
| Doesn't seem right to add this line | 0.59 | 0.5 | 0.295 |
| In my deep experience, when I am building off of and integrating… | 6.91 | 0.9 | 6.219 |
| We can try ground truths I was referring to capitalized words fo… | 0.88 | 0.6 | 0.528 |
| Why are you adding redundant information in those arrays? | 0.65 | 0.8 | 0.52 |
| I think we should not have redundant messages but they should be… | 1.28 | 0.6 | 0.768 |
| @sshivaditya2019 you should review and decide when this pull is … | 1.75 | 0.7 | 1.225 |

@Keyrxng
Member Author

Keyrxng commented Oct 30, 2024

@0x4007 Can you price and regen the reward please?

These need to be approved and funded before work is done, if you're expecting to get paid for it.

It never produced a $200 reward and I claimed the $53, so I guess it worked out.
