Turn code into Markdown for LLMs with one simple terminal command
Fetches all code files in the current directory, ignoring what's in `.gitignore` and `.codefetchignore`, then outputs them into a single Markdown file with line numbers.
Basic usage with output file and tree
npx codefetch
# Your codebase will be saved to `codefetch/codebase.md`
Include a default prompt:
npx codefetch -p improve
Include a tree with depth
npx codefetch -t 3
Filter by file extensions:
npx codefetch -e .ts,.js -o typescript-files.md --token-encoder cl100k
Include or exclude specific files and directories:
# Exclude test and public directories
npx codefetch --exclude-dir test,public
# Include only TypeScript files
npx codefetch --include-files "*.ts" -o typescript-only.md
# Include src directory, exclude test files
npx codefetch --include-dir src --exclude-files "*.test.ts" -o src-no-tests.md
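The directory options above can be pictured as a path filter. The following is a minimal sketch, not codefetch's implementation: `passesDirFilters` is a hypothetical name, and it assumes relative `/`-separated paths, that `--include-dir` matches a file's top-level directory, and that `--exclude-dir` matches any directory segment in the path.

```javascript
// Hypothetical sketch of directory filtering; codefetch's real rules may differ.
// Assumptions: paths are relative and "/"-separated, includeDirs keeps only
// files under one of the listed top-level directories, and excludeDirs drops
// a file if any of its directory segments is listed.
function passesDirFilters(filePath, { includeDirs = [], excludeDirs = [] } = {}) {
  const dirs = filePath.split("/").slice(0, -1); // drop the file name itself
  if (includeDirs.length > 0 && !includeDirs.includes(dirs[0])) return false;
  return !dirs.some((dir) => excludeDirs.includes(dir));
}

// Roughly: --include-dir src --exclude-dir test,public
const opts = { includeDirs: ["src"], excludeDirs: ["test", "public"] };
console.log(passesDirFilters("src/index.ts", opts)); // true
console.log(passesDirFilters("src/test/a.test.ts", opts)); // false
console.log(passesDirFilters("lib/util.ts", opts)); // false
```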
Dry run (only output to console)
npx codefetch -d
If no output file is specified (`-o` or `--output`), the output is written to `codefetch/codebase.md`.
| Option | Description |
|---|---|
| `-o, --output <file>` | Specify output filename (defaults to codebase.md) |
| `--dir <path>` | Specify the directory to scan (defaults to current directory) |
| `--max-tokens <number>` | Limit output tokens (default: 500,000) |
| `-e, --extension <ext,...>` | Filter by file extensions (e.g., .ts,.js) |
| `--token-limiter <type>` | Token limiting strategy when using --max-tokens (sequential, truncated) |
| `--include-files <pattern,...>` | Include specific files (supports patterns like *.ts) |
| `--exclude-files <pattern,...>` | Exclude specific files (supports patterns like *.test.ts) |
| `--include-dir <dir,...>` | Include specific directories |
| `--exclude-dir <dir,...>` | Exclude specific directories |
| `-v, --verbose [level]` | Show processing information (0=none, 1=basic, 2=debug) |
| `-t, --project-tree [depth]` | Generate visual project tree (optional depth, default: 2) |
| `--token-encoder <type>` | Token encoding method (simple, p50k, o200k, cl100k) |
| `--disable-line-numbers` | Disable line numbers in output |
| `-d, --dry-run` | Output markdown to stdout instead of file |
All options that accept multiple values use comma-separated lists. File patterns support simple wildcards:
- `*` matches any number of characters
- `?` matches a single character
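A minimal sketch of how such wildcards can be matched, assuming `*` and `?` behave exactly as listed above and nothing more (no `**`, character classes, or path awareness). `wildcardToRegExp` and `matchesAny` are hypothetical helpers for illustration; codefetch's own matcher may differ.

```javascript
// Hypothetical helper: converts a simple wildcard pattern into an anchored RegExp.
// '*' becomes ".*" (any run of characters), '?' becomes "." (exactly one character).
function wildcardToRegExp(pattern) {
  const body = pattern
    .split("")
    .map((ch) => {
      if (ch === "*") return ".*";
      if (ch === "?") return ".";
      // Escape every other regex metacharacter so it matches literally.
      return ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    })
    .join("");
  return new RegExp(`^${body}$`);
}

// Hypothetical helper: checks a name against a comma-separated list such as "*.ts,*.js".
function matchesAny(fileName, patternList) {
  return patternList
    .split(",")
    .map((p) => p.trim())
    .filter(Boolean)
    .some((p) => wildcardToRegExp(p).test(fileName));
}

console.log(matchesAny("index.ts", "*.ts,*.js")); // true
console.log(matchesAny("index.test.ts", "*.test.ts")); // true
console.log(matchesAny("ab.ts", "?.ts")); // false
```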
You can generate a visual tree representation of your project structure:
# Generate tree with default depth (2 levels)
npx codefetch --project-tree
# Generate tree with custom depth
npx codefetch -t 3
# Generate tree and save code to file
npx codefetch -t 2 -o output.md
Example output:
Project Tree:
└── my-project
    ├── src
    │   ├── index.ts
    │   ├── types.ts
    │   └── utils
    ├── tests
    │   └── index.test.ts
    └── package.json
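The depth limit can be sketched as a recursive walk that stops once it passes the requested number of levels below the root. This is an illustrative sketch only: `renderTree` is a hypothetical function, and it models directories as plain objects and files as `null` instead of reading the file system.

```javascript
// Hypothetical sketch: render a nested object as a box-drawing tree, showing at
// most maxDepth levels below the root. Directories are objects, files are null.
function renderTree(rootName, rootNode, maxDepth) {
  const lines = [`└── ${rootName}`];
  function walk(node, prefix, depth) {
    if (depth > maxDepth || node === null) return; // too deep, or a file
    const entries = Object.entries(node);
    entries.forEach(([name, child], i) => {
      const isLast = i === entries.length - 1;
      lines.push(`${prefix}${isLast ? "└── " : "├── "}${name}`);
      // Continue the vertical guide only if more siblings follow.
      walk(child, prefix + (isLast ? "    " : "│   "), depth + 1);
    });
  }
  walk(rootNode, "    ", 1);
  return lines.join("\n");
}

const project = {
  src: { "index.ts": null, "types.ts": null, utils: { "math.ts": null } },
  tests: { "index.test.ts": null },
  "package.json": null,
};
console.log(renderTree("my-project", project, 2)); // utils/ is listed but not expanded
```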
You can add predefined or custom prompts to your output:
# Use default prompt (looks for codefetch/prompts/default.md)
npx codefetch -p
npx codefetch --prompt
# Use built-in prompts
npx codefetch -p fix # fixes codebase
npx codefetch -p improve # improves codebase
npx codefetch -p codegen # generates code
npx codefetch -p testgen # generates tests
# Use custom prompts
npx codefetch --prompt custom-prompt.md
npx codefetch -p my-architect.txt
Create custom prompts in the `codefetch/prompts/` directory:
- Create a markdown file (e.g., `codefetch/prompts/my-prompt.md`)
- Use it with `--prompt my-prompt.md`
You can also set a default prompt in your `codefetch.config.mjs`:
export default {
defaultPromptFile: "dev", // Use built-in prompt
}
export default {
defaultPromptFile: "custom-prompt.md", // Use custom prompt file
}
The prompt resolution order is:
- CLI argument (`-p` or `--prompt`)
- Config file prompt setting
- No prompt if neither is specified
When using just `-p` or `--prompt` without a value, codefetch will look for `codefetch/prompts/default.md`.
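The resolution order above can be expressed as a small function. This sketch is hypothetical: `resolvePrompt` is not part of codefetch, it models a bare `-p` as the value `true`, and it reads the config key `defaultPromptFile` used in the config examples above.

```javascript
// Hypothetical sketch of the prompt resolution order; not codefetch's source.
// cliPrompt is undefined when -p/--prompt is absent, true when the flag is
// given without a value, or a string (built-in name or file) otherwise.
function resolvePrompt(cliPrompt, config = {}) {
  if (cliPrompt === true) return "codefetch/prompts/default.md"; // bare -p
  if (typeof cliPrompt === "string") return cliPrompt; // 1. CLI argument wins
  if (config.defaultPromptFile) return config.defaultPromptFile; // 2. config file
  return null; // 3. no prompt
}

console.log(resolvePrompt("fix", { defaultPromptFile: "dev" })); // fix
console.log(resolvePrompt(undefined, { defaultPromptFile: "dev" })); // dev
console.log(resolvePrompt(undefined)); // null
```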
When using `--max-tokens`, you can control how tokens are distributed across files using the `--token-limiter` option:
# Sequential mode - process files in order until reaching token limit
npx codefetch --max-tokens 500 --token-limiter sequential
# Truncated mode (default) - distribute tokens evenly across all files
npx codefetch --max-tokens 500 --token-limiter truncated
- `sequential`: Processes files in order until the total token limit is reached. Useful when you want complete content from the first files.
- `truncated`: Distributes tokens evenly across all files, showing partial content from each file. This is the default mode and is useful for getting an overview of the entire codebase.
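The difference between the two strategies is easiest to see on a toy input. In this hypothetical sketch each file is pre-split into an array of tokens, and `sequential` is assumed to cut off the file that crosses the limit and drop the rest; codefetch may handle that boundary differently. `limitSequential` and `limitTruncated` are illustrative names.

```javascript
// sequential: keep files in order; the file that crosses the limit is cut off
// and every later file is dropped (an assumption about the boundary case).
function limitSequential(files, maxTokens) {
  const out = [];
  let remaining = maxTokens;
  for (const { path, tokens } of files) {
    if (remaining <= 0) break;
    out.push({ path, tokens: tokens.slice(0, remaining) });
    remaining -= Math.min(tokens.length, remaining);
  }
  return out;
}

// truncated: give every file the same share of the budget.
function limitTruncated(files, maxTokens) {
  if (files.length === 0) return [];
  const perFile = Math.floor(maxTokens / files.length);
  return files.map(({ path, tokens }) => ({ path, tokens: tokens.slice(0, perFile) }));
}

// Hypothetical helper: builds a fake file with n placeholder tokens.
const mk = (path, n) => ({ path, tokens: Array.from({ length: n }, (_, i) => `t${i}`) });
const files = [mk("a.ts", 6), mk("b.ts", 6), mk("c.ts", 6)];
const summary = (result) => result.map((f) => `${f.path}:${f.tokens.length}`).join(", ");

console.log(summary(limitSequential(files, 10))); // a.ts:6, b.ts:4
console.log(summary(limitTruncated(files, 10))); // a.ts:3, b.ts:3, c.ts:3
```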
codefetch supports two ways to ignore files:
- `.gitignore`: Respects your project's existing `.gitignore` patterns
- `.codefetchignore`: Additional patterns specific to codefetch
The `.codefetchignore` file works exactly like `.gitignore` and is useful when you want to ignore files that aren't in your `.gitignore`.
Codefetch uses a set of default ignore patterns to exclude common files and directories that typically don't need to be included in code reviews or LLM analysis.
You can view the complete list of default patterns in default-ignore.ts.
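Conceptually, the default patterns, `.gitignore`, and `.codefetchignore` combine into one list. The sketch below is hypothetical: `parseIgnoreFile` and `collectIgnorePatterns` are illustrative names, and the default pattern shown is an example, not the actual contents of default-ignore.ts. It applies only the basic `.gitignore` rules that blank lines are skipped and lines starting with `#` are comments; negation (`!`) and anchoring are not handled.

```javascript
// Hypothetical sketch: turn ignore-file text into a list of patterns.
// Trailing whitespace is dropped; blank lines and "#" comments are skipped.
function parseIgnoreFile(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trimEnd())
    .filter((line) => line !== "" && !line.startsWith("#"));
}

// Merge all sources; a Set removes duplicates while keeping first-seen order.
function collectIgnorePatterns({ defaults = [], gitignore = "", codefetchignore = "" }) {
  return [...new Set([...defaults, ...parseIgnoreFile(gitignore), ...parseIgnoreFile(codefetchignore)])];
}

console.log(
  collectIgnorePatterns({
    defaults: ["node_modules/"], // example default, not the real list
    gitignore: "# deps\nnode_modules/\ndist/\n",
    codefetchignore: "\n*.snap\ndocs/\n",
  }),
); // node_modules/, dist/, *.snap, docs/
```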
Codefetch supports different token counting methods to match various AI models:
- `simple`: Basic word-based estimation (fastest, but not very accurate)
- `p50k`: GPT-3 style tokenization
- `o200k`: GPT-4o style tokenization
- `cl100k`: GPT-4 style tokenization
Select the appropriate encoder based on your target model:
# For GPT-4o
npx codefetch --token-encoder o200k
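To illustrate why `simple` is fast but approximate, here is a hypothetical word-based estimator. The exact heuristic codefetch uses is not described here, so `estimateTokensSimple` is an assumption: it counts each run of word characters and each punctuation mark as one token, whereas BPE encoders such as `cl100k` and `o200k` split text into learned subword units.

```javascript
// Hypothetical word-based estimator in the spirit of the "simple" encoder.
// One token per run of word characters, one per punctuation character.
function estimateTokensSimple(text) {
  const matches = text.match(/\w+|[^\w\s]/g);
  return matches ? matches.length : 0; // match() returns null when nothing matches
}

console.log(estimateTokensSimple("const x = 1;")); // 5
console.log(estimateTokensSimple("")); // 0
```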
By default (unless using `--dry-run`) codefetch will:
- Create a `codefetch/` directory in your project
- Store all output files in this directory
This ensures that:
- Your fetched code is organized in one place
- The output directory itself is never fetched, so previous output isn't pulled into the next run
Add `codefetch/` to your `.gitignore` file to avoid committing the fetched codebase.
You can use this command to turn your code into Markdown in bolt.new, cursor.com, ... and ask the AI chat for guidance about your codebase.
npm install -g codefetch
codefetch -o output.md
Initialize your project with codefetch:
npx codefetch init
This will:
- Create a `.codefetchignore` file for excluding files
- Generate a `codefetch.config.mjs` with your preferences
- Set up the project structure
Create a `codefetch.config.mjs` file in your project root:
export default {
// Output settings
outputPath: "codefetch", // Directory for output files
outputFile: "codebase.md", // Output filename
maxTokens: 999_000, // Token limit
disableLineNumbers: false, // Toggle line numbers in output
// Processing options
verbose: 1, // Logging level (0=none, 1=basic, 2=debug)
projectTree: 2, // Project tree depth
defaultIgnore: true, // Use default ignore patterns
gitignore: true, // Respect .gitignore
dryRun: false, // Output to console instead of file
// Token handling
tokenEncoder: "simple", // Token counting method (simple, p50k, o200k, cl100k)
tokenLimiter: "truncated", // Token limiting strategy
// File filtering
extensions: [".ts", ".js"], // File extensions to include
includeFiles: ["src/**/*.ts"], // Files to include (glob patterns)
excludeFiles: ["**/*.test.ts"], // Files to exclude
includeDirs: ["src", "lib"], // Directories to include
excludeDirs: ["test", "dist"], // Directories to exclude
// AI/LLM settings
trackedModels: [
"chatgpt-4o-latest",
"claude-3-5-sonnet-20241022",
"o1",
"deepseek-v3",
"gemini-exp-1206",
],
// Prompt handling
prompt: "dev", // Built-in prompt or custom prompt file
defaultChat: "https://chat.com", // Default chat URL
templateVars: {}, // Variables for template substitution
}
All configuration options are optional and will fall back to defaults if not specified. You can override any config option using CLI arguments.
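The fallback-and-override behavior can be sketched as a layered merge where later layers win. This is an assumption about precedence (built-in defaults, then the config file, then CLI arguments), and `resolveConfig` is a hypothetical function; the default values below are taken from the options table.

```javascript
// Hypothetical sketch of config precedence: defaults < config file < CLI.
// CLI options that were not passed (undefined) must not clobber file values.
function resolveConfig(defaults, fileConfig = {}, cliArgs = {}) {
  const definedCli = Object.fromEntries(
    Object.entries(cliArgs).filter(([, value]) => value !== undefined),
  );
  return { ...defaults, ...fileConfig, ...definedCli };
}

const defaults = { outputFile: "codebase.md", maxTokens: 500_000, projectTree: 2 };
const fileConfig = { maxTokens: 999_000 }; // from codefetch.config.mjs
const cliArgs = { outputFile: "output.md", projectTree: undefined }; // e.g. -o output.md

console.log(JSON.stringify(resolveConfig(defaults, fileConfig, cliArgs)));
// {"outputFile":"output.md","maxTokens":999000,"projectTree":2}
```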
- X/Twitter: @kregenrek
- Bluesky: @kevinkern.dev
- Learn Cursor AI: Ultimate Cursor Course
- Learn to build software with AI: AI Builder Hub
- codefetch - Turn code into Markdown for LLMs with one simple terminal command
- aidex - A CLI tool that provides detailed information about AI language models, helping developers choose the right model for their needs.
- codetie - XCode CLI
This project was inspired by