
feat: add workflow for daily knowledge json/csv file #20

Open
madjin opened this issue Jan 5, 2025 · 1 comment

Comments

madjin (Contributor) commented Jan 5, 2025

Have a file that any AI agent can add as knowledge, giving it good RAG abilities over the GitHub repo.

Update it daily. Each row could go something like: date, type (issue, pr, commit), title, author, etc.
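For illustration, a couple of rows in that format might look like this (values are made up, apart from this issue's own metadata):

date,type,title,author
2025-01-05,issue,"feat: add workflow for daily knowledge json/csv file",madjin
2025-01-03,commit,"fix: example commit message",octocat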

madjin (Contributor, Author) commented Jan 5, 2025

Here are some useful jq commands to transform your contributor activity data:

1. Basic activity timeline (commits, PRs, and issues):
jq -r '.[] | .activity | (.code.commits[] | ["commit", .created_at, .message, .author] | @csv), (.code.pull_requests[] | ["pr", .created_at, .title, .author] | @csv), (.issues.opened[] | ["issue", .created_at, .title, .author] | @csv)' input.json > activity_timeline.csv

2. Extract detailed PR information:
jq -r '.[] | .activity.code.pull_requests[] | [.created_at, "pr", .title, .author, .state, .body] | @csv' input.json > pr_details.csv

3. Create a commit activity summary:
jq -r '.[] | .activity.code.commits[] | [.created_at, .sha[0:7], .message, (.additions + .deletions | tostring + " changes"), .changed_files] | @csv' input.json > commit_summary.csv

4. Combine contributor summaries:
jq -r '.[] | [.contributor, .score, .summary, .activity.code.total_commits, .activity.code.total_prs] | @csv' input.json > contributor_summary.csv
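Since the point of this issue is to refresh the knowledge file daily, a small wrapper script could run the transforms above on a schedule (cron, or a scheduled GitHub Actions job) and commit the results. This is only a sketch; input.json, the output paths, and the commit-back step are assumptions about how the pipeline is wired up:

#!/usr/bin/env bash
# Sketch: daily refresh of the knowledge CSVs (paths and names are placeholders).
set -euo pipefail

# Regenerate the CSVs with the jq transforms above, e.g.:
jq -r '.[] | .activity | (.code.commits[] | ["commit", .created_at, .message, .author] | @csv), (.code.pull_requests[] | ["pr", .created_at, .title, .author] | @csv), (.issues.opened[] | ["issue", .created_at, .title, .author] | @csv)' input.json > activity_timeline.csv
# ...repeat for pr_details.csv, commit_summary.csv, and contributor_summary.csv.

# Commit the refreshed files back to the repo when anything changed.
git add ./*.csv
git commit -m "chore: refresh daily knowledge CSVs" || echo "No changes to commit"
git push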

You can then use these CSV files as input for Eliza's knowledge system using the following approach:

import { readFileSync } from 'fs';
import { parse } from 'csv-parse/sync';

// Read and parse the CSV. There is no header row; columns are positional,
// matching the jq output above: [0] type, [1] date, [2] title/message, [3] author.
const activityData = readFileSync('activity_timeline.csv');
const records = parse(activityData);

// Create knowledge items in chunks.
// `runtime` and `knowledge` come from the Eliza agent context.
for (const record of records) {
    await knowledge.set(runtime, {
        id: generateId(), // Use appropriate ID generation
        content: {
            text: `On ${record[1]}, ${record[3]} ${record[0]}ed: ${record[2]}`,
            type: record[0],
            author: record[3],
            date: record[1]
        }
    }, 512); // Using default chunk size
}

The knowledge system will:

  1. Preprocess the text (remove special characters, normalize whitespace, etc.)
  2. Split into chunks of appropriate size (default 512 tokens)
  3. Generate embeddings for each chunk
  4. Store in the knowledge database for later retrieval
