Skip to content

nickpending/honeycomb

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Honeycomb

Honeycomb is an experimental framework for autonomous log analysis, designed to explore prompt chaining and reasoning in real-world security use cases. Built entirely on Anthropic's APIs, Honeycomb focuses on providing transparency and control without relying on external abstractions like LangChain.

🚧 Work-in-Progress 🚧

Honeycomb is an evolving experiment in applying advanced prompt chaining techniques, including chain-of-density prompting, to honeypot log analysis. Current focus areas include:

  • Chunking large logs into manageable pieces for reliable processing
  • Refining prompt structures to maximize the quality and relevance of responses
  • Building autonomous analysis chains for statistical metrics, pattern detection, and threat intelligence
  • Implementing chain-of-density principles for progressive insight generation

Key Features

  • Chain-of-Density Implementation: Uses a modified version of Anthropic's chain-of-density prompting technique, adapted for security analysis instead of content generation
  • Progressive Analysis Chains: Each chain builds upon previous chains' outputs, creating increasingly sophisticated insights
  • Log Chunking: Breaks down large datasets into smaller, manageable pieces to improve processing accuracy
  • Statistical Metrics: Extracts raw statistics (e.g., request counts, IP distributions) without interpretation
  • Pattern Detection: Identifies routine and anomalous scanning activity using predefined prompts
  • End-to-End Control: Built solely with Anthropic's APIs, ensuring all functionality is explicit and customizable

Why Honeycomb?

Unlike frameworks that abstract complexity (e.g., LangChain), Honeycomb is built to expose the mechanics of prompt chaining. This makes it an excellent platform for:

  • Experimenting with autonomous reasoning in security contexts
  • Understanding the limits and strengths of Anthropic's APIs in practical applications
  • Building a foundation for more advanced AI-driven analysis workflows
  • Exploring novel applications of chain-of-density prompting in analytical tasks

Chain-of-Density Architecture

Honeycomb adapts the chain-of-density prompting technique, originally designed for content generation, into a security analysis framework:

  1. Base Layer - Metrics Collection (Chain 1):

    • Focuses on pure data extraction
    • Generates structured statistical measurements
    • Creates foundation for further analysis
  2. Pattern Recognition Layer (Chain 2):

    • Takes metrics from Chain 1
    • Identifies routine patterns and known behaviors
    • Groups related activities for context
  3. Anomaly Analysis Layer (Chain 3):

    • Uses output from previous chains
    • Identifies deviations from routine patterns
    • Focuses on novel or interesting behaviors
  4. Aggregation Layer (Chain 5):

    • Combines metrics across all chunks
    • Consolidates statistical insights
  5. Pattern Consolidation Layer (Chain 6):

    • Deduplicates and merges routine patterns
    • Creates unified view of activities
  6. Summary Layer (Chain 4):

    • Synthesizes all previous chain outputs
    • Produces human-readable insights
    • Maintains information density while ensuring clarity

Current Status

Implemented:

  1. Log Chunking: Divides large log files for better handling and processing
  2. Chain-of-Density Prompt Chains:
    • Base metrics collection (Chain 1)
    • Pattern recognition (Chain 2)
    • Anomaly detection (Chain 3)
    • Metric aggregation (Chain 5)
    • Pattern deduplication (Chain 6)
    • Summary generation (Chain 4)

In Progress:

  • Enhancing chunking logic for large and diverse datasets
  • Adding contextual correlation in pattern detection
  • Testing outputs with real-world honeypot data
  • Fine-tuning prompt designs for better chain integration
  • Optimizing chain-of-density implementation for security analysis

TODO:

  1. Chunk Size Optimization:

    • Implement quantitative metrics:
      • Pattern detection accuracy across chunk sizes
      • Token usage and performance metrics
      • Pattern coverage and uniqueness measures
    • Develop qualitative evaluation framework:
      • Analysis quality assessment
      • Pattern context preservation
      • Temporal relationship maintenance
    • Create test datasets with known patterns
    • Build automated chunk size evaluation tools
  2. Chain Enhancement:

    • Add feedback loops between chains
    • Implement cross-chunk pattern correlation
    • Develop chain-specific performance metrics
    • Create chain output validation mechanisms
  3. Analysis Improvements:

    • Enhanced temporal pattern detection
    • Geographic attack pattern correlation
    • Payload similarity analysis
    • Attack campaign identification
    • Multi-honeypot data correlation
  4. System Architecture:

    • Implement caching for intermediate results
    • Add support for distributed processing
    • Create visualization components
    • Develop real-time analysis capabilities
    • Add export formats for popular SIEM systems
  5. Documentation and Testing:

    • Create comprehensive API documentation
    • Add unit tests for each chain
    • Develop integration test suite
    • Write detailed deployment guides
    • Create example analysis notebooks

Installation

  1. Clone the repository:

    git clone https://github.com/nickpending/honeycomb.git
    cd honeycomb
  2. Install dependencies:

    pip install -r requirements.txt
  3. Setup your API key:

    export ANTHROPIC_API_KEY="your_api_key"

Usage

  1. Configure your config.yaml file with the prompt templates:

    prompt_chain_one: |
      # Metrics collection prompt...
    prompt_chain_two: |
      # Pattern recognition prompt...
    # ... other chain prompts
  2. Run the script with desired arguments:

    python honeycomb.py --chunk-size 100 --debug --input logs.json

    Available arguments:

    • --chunk-size: Number of log entries per analysis chunk
    • --debug: Enable debug logging
    • --input: Input log file path
    • --output: Output directory (optional)
  3. View outputs in the console or in the specified output directory.

Contributing

Contributions are welcome! To get started:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature-branch)
  3. Submit a pull request with your changes

Please ensure your PR:

  • Includes appropriate tests
  • Updates relevant documentation
  • Follows the project's coding style
  • Describes the changes made and their purpose

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Honeycomb draws inspiration from:

  • Anthropic's chain-of-density prompting research
  • Cybersecurity intelligence workflows
  • Honeypot log analysis best practices
  • Autonomous AI reasoning methodologies

Citing Honeycomb

If you use Honeycomb in your research or projects, please cite:

@software{honeycomb2024,
  title = {Honeycomb: Chain-of-Density Security Log Analysis},
  author = {[Your Name]},
  year = {2024},
  url = {https://github.com/nickpending/honeycomb}
}

Stay tuned for updates as Honeycomb evolves into a robust, intelligent log analysis platform!

About

AI-driven log analysis for HTTP-based honeypots

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages