feat: dynamic ground truths #14


Keyrxng (Member) commented Oct 24, 2024

Resolves #13


github-actions bot commented Oct 24, 2024

Unused types (1)

Filename: src/types/llm.ts
Types: GroundTruthsSystemMessage

Keyrxng marked this pull request as ready for review October 24, 2024 04:06
Keyrxng (Member, Author) commented Oct 24, 2024

Marking as ready for review to get eyes on it and opinions as-is.

Picked from e6586a4

QA: ubq-testing#6 (comment)

Each array corresponds to one review performed on the QA PR; that comment also contains the actual review (I haven't started refining the review prompt yet).

The spec it's sourcing truths from is here.

As I said, I don't have full context, so if the truths are better sourced from something else, let me know; this seems appropriate at least for the purpose of #11.

The ones currently in use in development look more like categories/themes/genres to me; I'm not sure whether this approach defeats their purpose.

[
  'The bot should initiate review when a pull request is created as a draft and finalized by the contributor.',
  'The bot should parse the issue specification and pull request diff to assess compliance.',
  'If the pull request does not meet the specification, the bot should provide actionable feedback and change the review state to requested changes.',
  'The bot should convert non-compliant pulls back to draft status if they fail the specification check.',
  "The bot should only leave a 'commented' state for pulls that meet the specification.",
  'If a collaborator re-finalizes a draft pull, the bot should stop further interventions.',
  'The inspection process should be triggered only during initial creation and when a draft is finalized by the pull author.'
]
[
  'The bot should verify that the pull request is initially opened as a draft.',
  'The bot should check for changes from draft to finalized pull request status for initiating review.',
  'The bot needs to check pull request diffs against the issue specification for compliance.',
  'The bot should provide actionable feedback for specification discrepancies in the review.',
  'If the pull request does not meet specifications, the bot should convert it back to draft and request changes.',
  "If the pull request meets specifications, the bot should mark it as 'commented' without approval.",
  'The bot must refrain from intervening if a collaborator changes the pull request back to finalized.',
  'The bot’s intervention should be limited to triggers on pull creation and author-led status changes.',
  'Optionally handle Continuous Integration (CI) checks separately due to external factors.',
  'Consider implementing a daily limit on bot reviews per user to prevent abuse of the review system.'
]
[
  'The contributor must initially open the pull request as a draft.',
  'When the pull request is ready for review, the contributor should convert it to a finalized pull request.',
  'The bot should analyze the issue specification along with the pull request diff.',
  'The bot should provide actionable feedback indicating any missing specifications.',
  "If the pull request doesn't meet the specification, the bot should require changes and revert the pull back to a draft.",
  'If the pull request meets the specification, the bot should leave a comment without approval.',
  'The bot must not intervene if a collaborator changes the pull request from draft to finalized.',
  'The bot should only conduct inspections upon pull creation and when the author finalizes a draft.',
  'Optional: Ensure CI passes, but account for potential external failures.',
  'Optional: Limit bot reviews to one per day per contributor to prevent excessive use for minor changes.'
]

Keyrxng (Member, Author) commented Oct 24, 2024

@0x4007 @sshivaditya2019 @gentlementlegen @rndquu requesting review

CI can be ignored, since it's used in #11 but not here; alternatively, I can comment it out so that CI passes.

Keyrxng mentioned this pull request Oct 24, 2024
0x4007 (Member) commented Oct 24, 2024

Why are you adding redundant information in those arrays?

Keyrxng (Member, Author) commented Oct 24, 2024

> Why are you adding redundant information in those arrays?

I'm not adding anything manually; GPT creates the array contents from the spec and the prompt, that's all.

Without the context and additional info I requested in #13, I'm not 100% sure how to refine and improve them in line with @sshivaditya2019's original intention for them. I know how I'd refine them personally, but this is not my show.

Right now GPT consumes the task spec and creates these outputs based on this prompt and the settings sent to the completions endpoint.
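The spec-in, array-out flow described above can be sketched roughly as follows. This is a minimal sketch, not the PR's code: `ChatMessage`, `buildSpecTruthMessages`, `parseGroundTruths`, the system prompt wording, and the cap of 10 are all hypothetical. It covers only the parts that need no network call: assembling the messages for a chat completions request and checking that the reply really is a JSON array of strings.

```typescript
// Hypothetical message shape, mirroring the role/content pairs used by chat completions.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical system prompt; the real prompt in the PR may differ.
const SPEC_TRUTHS_PROMPT = [
  'Using the task specification provided, produce an array of strings that represent "Ground Truths".',
  "Each ground truth should be a single, non-redundant requirement taken from the specification.",
  "Return a JSON parsable array of strings, without comment or directive.",
].join("\n");

// Builds the messages sent to the completions endpoint for a given task spec.
function buildSpecTruthMessages(spec: string): ChatMessage[] {
  return [
    { role: "system", content: SPEC_TRUTHS_PROMPT },
    { role: "user", content: spec },
  ];
}

// Parses the model's reply, rejecting anything that is not a JSON array of strings.
// Entries are trimmed, blank entries are dropped, and the result is capped at `max` items.
function parseGroundTruths(raw: string, max = 10): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.trim());
  } catch {
    throw new Error("Model reply is not valid JSON");
  }
  if (!Array.isArray(parsed) || !parsed.every((item) => typeof item === "string")) {
    throw new Error("Model reply is not an array of strings");
  }
  return (parsed as string[])
    .map((truth) => truth.trim())
    .filter((truth) => truth.length > 0)
    .slice(0, max);
}
```

Throwing on a malformed reply, rather than falling back to an empty list, makes a bad completion visible instead of silently running the review with no ground truths.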

Keyrxng (Member, Author) commented Oct 24, 2024

For the @ubqbot command currently in development, the prompt regarding ground truths looks like this:

You Must obey the following ground truths: ["typescript" : "github" : "cloudflare worker" : "actions" : "jest" : "supabase": "openai"]

This doesn't make a whole lot of sense to me without the additional context. Are these more like a classification of the tech-stack subject areas involved in the query/task/org?

If that's the true intention of "Ground Truths", then I know how to refactor. Or is the way I'm using them correct for my use case?

0x4007 (Member) commented Oct 24, 2024

I think we should not have redundant messages, but they should be more substantial than the keywords we have now.
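One way to act on the "no redundant messages" point is to drop a generated truth when it overlaps too heavily with one already kept. The sketch below uses a swapped-in heuristic (token-set Jaccard similarity) that the PR does not implement; `tokenSet`, `jaccard`, `dropRedundantTruths`, and the 0.6 threshold are hypothetical and would need tuning against real outputs.

```typescript
// Lowercases a sentence and splits it into a set of alphanumeric word tokens.
function tokenSet(text: string): Set<string> {
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return new Set<string>(tokens);
}

// Jaccard similarity |A ∩ B| / |A ∪ B|, in the range [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Keeps truths in their original order, skipping any whose similarity to an
// already-kept truth meets or exceeds the (hypothetical) threshold.
function dropRedundantTruths(truths: string[], threshold = 0.6): string[] {
  const kept: { text: string; tokens: Set<string> }[] = [];
  for (const text of truths) {
    const tokens = tokenSet(text);
    if (kept.every((k) => jaccard(k.tokens, tokens) < threshold)) {
      kept.push({ text, tokens });
    }
  }
  return kept.map((k) => k.text);
}
```

Because the repeated entries in the arrays above are paraphrases rather than exact copies, a lexical filter like this only catches close rewordings; a stricter prompt is likely the more effective fix.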

Keyrxng (Member, Author) commented Oct 24, 2024

In my opinion, "Ground Truths" should be considered in relation to the use case if they are intended to guardrail the model into conforming to a specific workflow; we can treat these as different applications, e.g. chatbot vs. code review.

  • @ubqbot: General organization chatbot. Its truths should be like "The org uses ...these tech stacks, only consider these in your response.", "This repo uses ...these frameworks/libs", etc.
  • pull precheck: Pull request review. Its truths should be based on the task spec (and maybe some of our contribution standards), as that's the source of truth for this application of the model: proving the spec is implemented. Its truths are dynamic; if we had a contributing.md or something similar in each repo, we could derive ground truths from it. E.g.: No JS files, No empty strings, etc.
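The split between the two applications could be modelled as a discriminated union, so that each use case declares where its truths come from. This is a hypothetical sketch: `GroundTruthSource`, its fields, and `describeTruthInput` are illustrative names, and the `contributingRules` input assumes a per-repo file like the contributing.md suggested above exists and has already been parsed into rules.

```typescript
// Hypothetical inputs for each ground-truth use case.
type GroundTruthSource =
  | {
      kind: "chatbot";
      // Language name -> share of the codebase (0..1).
      languages: Record<string, number>;
      dependencies: string[];
    }
  | {
      kind: "pull-precheck";
      taskSpec: string;
      // Optional repo-specific rules, e.g. parsed from a contributing.md.
      contributingRules?: string[];
    };

// Produces the user-message text the model would derive its ground truths from.
function describeTruthInput(source: GroundTruthSource): string {
  switch (source.kind) {
    case "chatbot":
      return JSON.stringify({ languages: source.languages, dependencies: source.dependencies });
    case "pull-precheck": {
      const rules = source.contributingRules ?? [];
      const rulesSection = rules.length > 0 ? `\n\nContribution rules:\n- ${rules.join("\n- ")}` : "";
      return `Task specification:\n${source.taskSpec}${rulesSection}`;
    }
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a new kind is added.
      const unreachable: never = source;
      throw new Error(`Unknown ground truth source: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Keeping the source type explicit means the chatbot never receives a task spec and the precheck never receives dependency lists, which matches the idea that each application has its own source of truth.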

See this comment for my suggestion on dynamically generating the chatbot ground truths.

Keyrxng (Member, Author) commented Oct 25, 2024

QA for the dynamic chatbot ground truths, run within my fork of this repo, so it pulls the deps and languages of this repo:

ubq-testing#8
ubq-testing#9

 [
  {
    role: 'system',
    content: '\n' +
      'Using the input provided, your goal is to produce an array of strings that represent "Ground Truths."\n' +
      'These ground truths are high-level abstractions that encapsulate the tech stack and dependencies of the repository.\n' +
      '  \n' +
      'Each ground truth should:\n' +
      '- Be succinct and easy to understand.\n' +
      '- Use only the information provided in the input.\n' +
      '- Focus on essential requirements, behaviors, or assumptions involved in the repository.\n' +
      '  \n' +
      'Example:\n' +
      'Languages: { TypeScript: 60%, JavaScript: 15%, HTML: 10%, CSS: 5%, ... }\n' +
      'Dependencies: Esbuild, Wrangler, React, Tailwind CSS, ms, React-carousel, React-icons, ...\n' +
      'Dev Dependencies: @types/node, @types/jest, @mswjs, @testing-library/react, @testing-library/jest-dom, @Cypress ...\n' +
      'Ground Truths:\n' +
      '- The repo predominantly uses TypeScript, with JavaScript, HTML, and CSS also present.\n' +
      '- The repo is a React project that uses Tailwind CSS.\n' +
      '- The project is built with Esbuild and deployed with Wrangler, indicating a Cloudflare Workers project.\n' +
      '- The repo tests use Jest, Cypress, mswjs, and React Testing Library.\n' +
      '  \n' +
      'Conditions:\n' +
      'Assume your output builds the foundation for a chatbot to understand the repository when asked an arbitrary query.\n' +
      'Do not list every language or dependency, focus on the most prevalent ones.\n' +
      'Focus on what is essential to understand the repository at a high level.\n' +
      'Brevity is key. Use zero formatting. Do not wrap in quotes, backticks, or other characters.\n' +
      'response === ["some", "array", "of", "strings"]\n' +
      '  \n' +
      'Generate similar ground truths adhering to a maximum of 10.\n' +
      '  \n' +
      'Return a JSON parsable array of strings representing the ground truths, without comment or directive.'
  },
  {
    role: 'user',
    content: '{"dependencies":{"@mswjs/data":"^0.16.2","@octokit/rest":"20.1.1","@octokit/webhooks":"13.2.7","@sinclair/typebox":"0.32.33","@supabase/supabase-js":"^2.45.4","@ubiquity-dao/ubiquibot-logger":"^1.3.0","dotenv":"^16.4.5","openai":"^4.63.0","typebox-validators":"0.3.5","voyageai":"^0.0.1-5"},"devDependencies":{"@actions/core":"^1.11.1","@actions/github":"^6.0.0","@commitlint/cli":"19.3.0","@commitlint/config-conventional":"19.2.2","@cspell/dict-node":"5.0.1","@cspell/dict-software-terms":"3.4.6","@cspell/dict-typescript":"3.1.5","@eslint/js":"9.5.0","@jest/globals":"29.7.0","@types/jest":"^29.5.12","@types/node":"20.14.5","cspell":"8.9.0","eslint":"9.5.0","eslint-config-prettier":"9.1.0","eslint-plugin-check-file":"2.8.0","eslint-plugin-prettier":"5.1.3","eslint-plugin-sonarjs":"1.0.3","husky":"9.0.11","jest":"29.7.0","jest-junit":"16.0.0","jest-md-dashboard":"0.8.0","knip":"5.21.2","lint-staged":"15.2.7","npm-run-all":"4.1.5","prettier":"3.3.2","ts-jest":"29.1.5","tsx":"4.15.6","typescript":"5.4.5","typescript-eslint":"7.13.1","wrangler":"^3.81.0"},"languages":[["TypeScript",0.9235672829913418],["PLpgSQL",0.03861807956191261],["JavaScript",0.03622889642996839],["Shell",0.00158574101677714]]}'
  }
]
languages:  [
  [ 'TypeScript', 0.9235672829913418 ],
  [ 'PLpgSQL', 0.03861807956191261 ],
  [ 'JavaScript', 0.03622889642996839 ],
  [ 'Shell', 0.00158574101677714 ]
]
Ground Truths:  [
  'The repository is primarily written in TypeScript, with some PLpgSQL and JavaScript code.',
  'The project uses Supabase for backend services.',
  'Integration with GitHub APIs is handled via Octokit.',
  "The application leverages OpenAI's API for AI functionalities.",
  'Jest is used as the testing framework, configured for TypeScript.',
  'ESLint and Prettier are employed for code linting and formatting.',
  'GitHub Actions manage the CI/CD workflows.',
  'Husky and lint-staged are set up for pre-commit hooks.',
  'The project is deployed using Wrangler, indicating deployment to Cloudflare Workers.',
  'Commit messages are enforced using Commitlint with conventional commit standards.'
]
Second run, with the same system prompt, user message, and languages as the first run:
Ground Truths:  [
  'The repository is primarily written in TypeScript with minor use of JavaScript and PLpgSQL.',
  'It integrates with Supabase for backend services.',
  'The project leverages OpenAI for AI functionalities.',
  'Environment variables are managed using dotenv.',
  'Deployment is handled with Wrangler, indicating Cloudflare Workers usage.',
  'The development setup includes Jest for testing and ESLint for linting.',
  'GitHub Actions are employed for continuous integration and deployment workflows.',
  'Commit messages are standardized using Commitlint and enforced with Husky hooks.',
  'The project uses @octokit libraries for GitHub API interactions and webhooks.',
  'TypeScript is utilized with typebox for schema validation and type safety.'
]
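The `languages` values logged above are fractions that sum to roughly 1, which is consistent with dividing the byte counts returned by GitHub's "List repository languages" REST endpoint by their total. The sketch below assumes that derivation (the PR does not confirm it) and shows how the user message could be assembled from a `package.json` and those byte counts; `PackageJson`, `toLanguageShares`, and `buildRepoTruthInput` are hypothetical names.

```typescript
// Minimal subset of package.json fields used here.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Converts language byte counts (e.g. { TypeScript: 1234, Shell: 5 }) into
// [language, share] pairs sorted from most to least prevalent.
function toLanguageShares(bytes: Record<string, number>): [string, number][] {
  const total = Object.values(bytes).reduce((sum, n) => sum + n, 0);
  if (total === 0) return [];
  return Object.entries(bytes)
    .map(([language, n]): [string, number] => [language, n / total])
    .sort((a, b) => b[1] - a[1]);
}

// Serializes the same payload shape seen in the logged user message.
function buildRepoTruthInput(pkg: PackageJson, languageBytes: Record<string, number>): string {
  return JSON.stringify({
    dependencies: pkg.dependencies ?? {},
    devDependencies: pkg.devDependencies ?? {},
    languages: toLanguageShares(languageBytes),
  });
}
```

With `@octokit/rest`, the byte counts would come from `octokit.rest.repos.listLanguages({ owner, repo })`, whose `data` maps language names to bytes.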

Keyrxng (Member, Author) commented Oct 25, 2024

@gentlementlegen @0x4007 @sshivaditya2019 @rndquu

Why can't we request reviews in this org, lmao? Anyway, this is ready for review, team. Thanks.

.gitignore (outdated): review thread resolved
src/types/llm.ts: review thread resolved
Keyrxng (Member, Author) commented Oct 25, 2024

QA: ubq-testing#11 (comment)

Future improvements:

  • Make categories such as testing, architecture, etc., so we get fuller-bodied results for specific areas. E.g., below it just says Jest is utilized, but it would be better if it also included @mswjs, since then it would know which testing DB setup to use. Each category could have its own small prompt.
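The per-category idea could look roughly like the following: each category owns a short prompt and a filter that picks the dependencies relevant to it, so the testing category sees Jest and @mswjs/data together. The category names, prompts, regular expressions, and the `TruthCategory`, `groupDependencies`, and `buildCategoryPrompts` names are all hypothetical and would need tuning.

```typescript
// Hypothetical category definition: a focused prompt plus a dependency filter.
interface TruthCategory {
  name: string;
  prompt: string;
  matches: (dependency: string) => boolean;
}

const CATEGORIES: TruthCategory[] = [
  {
    name: "testing",
    prompt: "Describe how this repository is tested, including test runners, mocks and test databases.",
    matches: (dep) => /jest|mswjs|cypress|testing-library/i.test(dep),
  },
  {
    name: "architecture",
    prompt: "Describe the runtime, deployment target and backend services of this repository.",
    matches: (dep) => /wrangler|supabase|octokit|openai/i.test(dep),
  },
];

// Groups dependency names by category; a dependency may appear in several categories.
function groupDependencies(dependencies: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const category of CATEGORIES) {
    const selected = dependencies.filter((dep) => category.matches(dep));
    if (selected.length > 0) groups.set(category.name, selected);
  }
  return groups;
}

// Builds one small prompt per category that matched at least one dependency.
function buildCategoryPrompts(dependencies: string[]): { category: string; content: string }[] {
  const groups = groupDependencies(dependencies);
  return CATEGORIES.filter((c) => groups.has(c.name)).map((c) => ({
    category: c.name,
    content: `${c.prompt}\nRelevant dependencies: ${(groups.get(c.name) ?? []).join(", ")}`,
  }));
}
```

Each prompt could then be sent as its own small completion, and the per-category results concatenated into the final ground truths.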

With every usage of @ubqbot we get the following (I've wrapped it in backticks so you can all see it):

<!-- Ubiquity - LLM Ground Truths and Token Usage - runPlugin - undefined
{
  "metadata": {
    "groundTruths": [
      "The repository is primarily written in TypeScript with some PLpgSQL and JavaScript.",
      "Supabase is used for backend services and database management.",
      "GitHub Actions are integrated for continuous integration and deployment workflows.",
      "Jest is utilized for testing the codebase.",
      "ESLint and Prettier are employed for code linting and formatting.",
      "The project leverages OpenAI APIs for its functionalities.",
      "Wrangler indicates that the project is deployed on Cloudflare Workers.",
      "TypeBox is used for type definitions and schema validations.",
      "Husky and lint-staged manage Git hooks and enforce code quality.",
      "Environment variables are handled using dotenv."
    ],
    "tokenUsage": {
      "input": 923,
      "output": 46,
      "total": 969
    }
  },
  "caller": "runPlugin"
}
-->
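The hidden comment above pairs a header line with a JSON body. A sketch of reading that shape back is below; the JSON fields are copied from the example, while `GroundTruthMetadata`, `readMetadataComment`, and `isUsageConsistent` are hypothetical helpers. The consistency check assumes `total` is input plus output tokens, which matches the numbers shown (923 + 46 = 969).

```typescript
// Shape of the JSON body in the hidden comment, as seen in the example above.
interface GroundTruthMetadata {
  metadata: {
    groundTruths: string[];
    tokenUsage: { input: number; output: number; total: number };
  };
  caller: string;
}

// Extracts and parses the JSON body of a "<!-- ... -->" comment:
// everything after the header line and before the closing "-->".
function readMetadataComment(comment: string): GroundTruthMetadata | null {
  const match = comment.match(/<!--[^\n]*\n([\s\S]*?)\n?-->/);
  if (!match) return null;
  try {
    return JSON.parse(match[1]) as GroundTruthMetadata;
  } catch {
    return null;
  }
}

// Checks that the reported total equals input + output tokens.
function isUsageConsistent(meta: GroundTruthMetadata): boolean {
  const { input, output, total } = meta.metadata.tokenUsage;
  return input + output === total;
}
```

Returning `null` on a missing or malformed comment lets a caller skip comments that were not produced by the plugin.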

0x4007 (Member) commented Oct 26, 2024

@sshivaditya2019, you should review and decide when this pull is ready. I encourage QA for changes to prove they work, and ideally you should also test as a reviewer.

Keyrxng mentioned this pull request Oct 27, 2024
Keyrxng (Member, Author) commented Oct 27, 2024

lmk if there is anything holding back this PR and I'll push it forward

sshivaditya2019 (Collaborator) commented Oct 27, 2024

> lmk if there is anything holding back this PR and I'll push it forward

I was working on setting this up in the repo, sorry for the delay. LGTM!

sshivaditya2019 merged commit 2a1e15b into ubiquity-os-marketplace:development Oct 27, 2024
2 checks passed
Successfully merging this pull request may close these issues.

Chatbot: Dynamic ground truths
3 participants