
feat: add workflow for daily knowledge json/csv file #20

Open
madjin opened this issue Jan 5, 2025 · 1 comment

Comments

madjin (Contributor) commented Jan 5, 2025

Have a file that any AI agent can add as knowledge, giving it good RAG abilities over the GitHub repo.

Update it daily. Each row could go something like: date, type (issue, pr, commit), title, author, etc.
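For illustration, a couple of rows in that format might look like this (values are made up, apart from this issue's own metadata):

date,type,title,author
2025-01-05,issue,"feat: add workflow for daily knowledge json/csv file",madjin
2025-01-03,commit,"fix: example commit message",octocat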

madjin (Contributor, Author) commented Jan 5, 2025

Here are some useful jq commands to transform your contributor activity data:

1. Basic activity timeline (commits, PRs, and issues):
jq -r '.[] | .activity | (.code.commits[] | ["commit", .created_at, .message, .author] | @csv), (.code.pull_requests[] | ["pr", .created_at, .title, .author] | @csv), (.issues.opened[] | ["issue", .created_at, .title, .author] | @csv)' input.json > activity_timeline.csv

2. Extract detailed PR information:
jq -r '.[] | .activity.code.pull_requests[] | [.created_at, "pr", .title, .author, .state, .body] | @csv' input.json > pr_details.csv

3. Create a commit activity summary:
jq -r '.[] | .activity.code.commits[] | [.created_at, .sha[0:7], .message, (.additions + .deletions | tostring + " changes"), .changed_files] | @csv' input.json > commit_summary.csv

4. Combine contributor summaries:
jq -r '.[] | [.contributor, .score, .summary, .activity.code.total_commits, .activity.code.total_prs] | @csv' input.json > contributor_summary.csv
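Since the point of this issue is to refresh the knowledge file daily, a small wrapper script could run the transforms above on a schedule (cron, or a scheduled GitHub Actions job) and commit the results. This is only a sketch; input.json, the output paths, and the commit-back step are assumptions about how the pipeline is wired up:

#!/usr/bin/env bash
# Sketch: daily refresh of the knowledge CSVs (paths and names are placeholders).
set -euo pipefail

# Regenerate the CSVs with the jq transforms above, e.g.:
jq -r '.[] | .activity | (.code.commits[] | ["commit", .created_at, .message, .author] | @csv), (.code.pull_requests[] | ["pr", .created_at, .title, .author] | @csv), (.issues.opened[] | ["issue", .created_at, .title, .author] | @csv)' input.json > activity_timeline.csv
# ...repeat for pr_details.csv, commit_summary.csv, and contributor_summary.csv.

# Commit the refreshed files back to the repo when anything changed.
git add ./*.csv
git commit -m "chore: refresh daily knowledge CSVs" || echo "No changes to commit"
git push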

You can then use these CSV files as input for Eliza's knowledge system using the following approach:

import { readFileSync } from 'fs';
import { parse } from 'csv-parse/sync';

// Read and parse the CSV. There is no header row; columns are positional,
// matching the jq output above: [0] type, [1] date, [2] title/message, [3] author.
const activityData = readFileSync('activity_timeline.csv');
const records = parse(activityData);

// Create knowledge items in chunks.
// `runtime` and `knowledge` come from the Eliza agent context.
for (const record of records) {
    await knowledge.set(runtime, {
        id: generateId(), // Use appropriate ID generation
        content: {
            text: `On ${record[1]}, ${record[3]} ${record[0]}ed: ${record[2]}`,
            type: record[0],
            author: record[3],
            date: record[1]
        }
    }, 512); // Using default chunk size
}

The knowledge system will:

  1. Preprocess the text (remove special characters, normalize whitespace, etc.)
  2. Split into chunks of appropriate size (default 512 tokens)
  3. Generate embeddings for each chunk
  4. Store in the knowledge database for later retrieval
