
Releases: MultiTonic/thinking-dataset

Thinking Dataset v0.0.2: Enhanced XML Processing & Pipeline Optimization

02 Feb 02:39

Release v0.0.2 - Performance & Validation Update


A Framework for Strategic Business Insights

Overview

The Thinking Dataset Project v0.0.2 focuses on pipeline optimization and data validation improvements. This release introduces enhanced XML processing capabilities, increased batch processing efficiency, and refined template validation, making the framework more robust and performant for large-scale data operations.
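The 1000-row batch processing mentioned above can be illustrated with a simple chunking generator (a minimal sketch for illustration only; the function name and row representation are assumptions, not the project's actual API):

```python
from typing import Iterable, Iterator, List

BATCH_SIZE = 1000  # batch size introduced in this release

def batched(rows: Iterable[dict], size: int = BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield successive fixed-size batches from an iterable of rows."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch
```

Processing rows in fixed-size batches like this bounds memory use regardless of dataset size, which is the usual motivation for batched pipelines.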

Key Features

  • Improved Batch Processing: Support for 1000-row batches
  • Enhanced Validation: Schema-based element validation framework
  • XML Processing: Advanced formatting and validation for responses
  • Template System: Improved metadata handling and extraction logic
  • Asset Management: New initialization module for better resource control
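The schema-based XML validation described above might look roughly like the following stdlib-only sketch (the `REQUIRED_CHILDREN` schema and function name are hypothetical assumptions, not the project's actual validation framework):

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: the child elements a response is expected to contain.
REQUIRED_CHILDREN = {"situation", "task", "action", "result"}

def validate_response(xml_text: str) -> list:
    """Return a list of problems with an XML response; an empty list means valid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    present = {child.tag for child in root}
    return [f"missing element: <{tag}>"
            for tag in sorted(REQUIRED_CHILDREN - present)]
```

Returning a list of problems rather than raising on the first error lets a pipeline log every issue in a batch before deciding whether to retry or discard a response.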

Technical Improvements

  • Streamlined logging system
  • Unified method signatures
  • Enhanced error handling
  • Improved code documentation
  • Better memory management

Breaking Changes

None - All improvements maintain backward compatibility

Bug Fixes

  • Resolved "NoneNoneNone" output issue in export command @vashisthrahul13
  • Enhanced error handling in XML processing
  • Improved template validation reliability

Dependencies

  • Python 3.12+
  • Updated SQLite integration
  • Enhanced Ollama provider support
  • Latest HuggingFace API compatibility
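As an illustration of the Ollama integration, a minimal client for Ollama's REST `/api/generate` endpoint could be sketched as below (the model name `llama3` and function names are assumptions; the endpoint and request shape follow Ollama's public HTTP API, not necessarily this project's provider code):

```python
import json
import urllib.request

OLLAMA_SERVER_URL = "http://localhost:11434"  # default from the .env sample

def build_generate_request(prompt: str, model: str = "llama3") -> bytes:
    """Encode a non-streaming request body for Ollama's /api/generate endpoint."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(prompt: str, model: str = "llama3") -> str:
    """POST a prompt to a local Ollama server and return the generated text."""
    req = urllib.request.Request(
        f"{OLLAMA_SERVER_URL}/api/generate",
        data=build_generate_request(prompt, model),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream` set to `False`, the server returns a single JSON object whose `response` field holds the full completion, which keeps the client trivial at the cost of latency on long generations.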

What's New

  • Improved XML validation framework by @p3nGu1nZz
  • Enhanced batch processing capabilities
  • Streamlined logging system implementation
  • Template system improvements

This release focuses on stability, performance, and code quality while maintaining our commitment to robust data processing and validation.

Full Changelog: v0.0.1...v0.0.2


Thinking Dataset v0.0.1: A Foundation for Strategic AI-Driven Business Intelligence with STaR Case Study Generation

26 Jan 15:00


A Framework for Strategic Business Insights

Release v0.0.1 - Initial Release

Overview

The Thinking Dataset Project v0.0.1 introduces the foundational framework for generating strategic business insights and STaR (Situation, Task, Action, Result) case studies. This initial release sets up the basic infrastructure for analyzing complex strategic scenarios, ethical dilemmas, and decision-making processes.

Key Features

  • Basic Pipeline Infrastructure: Initial implementation of data ingestion and preprocessing pipelines
  • SQLite Database Setup: Basic data storage and management system
  • CLI Tool: Essential command-line interface for basic dataset operations
  • Initial Adapters: Basic support for Hugging Face and Ollama endpoints
  • Core Documentation: Basic documentation covering installation and usage

Components

Core Features

  • Data ingestion functionality
  • Preprocessing pipeline
  • Foundational case study format
  • Model evaluation framework
  • Adapter implementations
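The foundational STaR case study format listed above could be modeled roughly as follows (an illustrative sketch; the class and field names are assumptions, not the project's actual schema):

```python
from dataclasses import dataclass, asdict

@dataclass
class StarCaseStudy:
    """One STaR case study: Situation, Task, Action, Result."""
    situation: str  # the strategic scenario under analysis
    task: str       # the decision or objective at stake
    action: str     # the course of action taken
    result: str     # the observed or projected outcome

    def to_record(self) -> dict:
        """Flatten to a plain dict, e.g. for SQLite storage or dataset export."""
        return asdict(self)
```

Keeping the four STaR components as separate fields, rather than one free-text blob, makes downstream validation and per-field analysis straightforward.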

Technical Implementation

  • Python 3.12+ support
  • SQLite and SQLAlchemy integration
  • Pipeline configuration
  • Automatic environment setup
  • Robust logging

Installation

Prerequisites

  • Python 3.10 or later
  • Git
  • A cloud-based inference account (e.g., OpenAI), a GPU (RTX 3090 or greater), or both, for processing

Setup

  1. Clone the repository:

    git clone https://github.com/MultiTonic/thinking-dataset.git
    cd thinking-dataset
  2. Install uv package manager:

    First add the package into the global environment:

    pip install uv

    Then add uv tools directory to PATH*:

    uv tool update-shell
  3. Set up the project:

    uv run setup

    *You may need to restart your terminal session for the changes to update.

This will create a virtual environment, install the project dependencies, and activate it.

  4. Set up environment variables:

    Copy the .env.sample file to .env and change the values as needed:

    cp .env.sample .env

    Update the .env file with your credentials:

    # Required settings
    HF_ORG="my_huggingface_organization"
    HF_USER="my_huggingface_username"
    HF_READ_TOKEN="my_huggingface_read_access_token"
    HF_WRITE_TOKEN="my_huggingface_write_access_token"
    
    # Required configuration
    CONFIG_PATH="config/config.yaml"
    
    # One or more providers
    OLLAMA_SERVER_URL="http://localhost:11434"
    OPENAI_API_TOKEN="your_openai_api_token"
    RUNPOD_API_TOKEN="your_runpod_api_token"
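The `.env` file above is a plain file of `KEY="value"` lines. The project depends on python-dotenv (see Dependencies below) to load it; as a stdlib-only illustration of what such loading does, a minimal parser might look like:

```python
import os

def load_env(path: str = ".env") -> dict:
    """Parse KEY="value" lines from a .env-style file into a dict."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):  # skip blanks and comments
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"')
    return values
```

In practice you would call `dotenv.load_dotenv()` and read the values from `os.environ`; the sketch only shows the file format being parsed.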

Breaking Changes

None (Initial Release)

Bug Fixes

  • Initial release, no bug fixes to report

Dependencies

  • Python 3.10+
  • SQLite
  • pandas
  • scikit-learn
  • rich
  • python-dotenv
  • Hugging Face Transformers
  • Ollama
  • Runpod
  • Additional dependencies listed in thinking-dataset.toml

Security Updates

  • Initial security configurations implemented
  • Basic authentication and authorization flows established

Documentation Updates

  • Added initial project documentation
  • Included installation guide
  • Basic usage instructions
  • Architecture overview

Known Issues

  1. Limited to text-based data processing in this release
  2. GPU support requires RTX 3090 or greater
  3. Some advanced features planned for future releases
  4. Can only think; reasoning is coming in v0.0.2!

Upgrade Instructions

Initial release - no upgrade needed.

Contributors

Special thanks to our initial contributors:

  • Kara Rawson (Lead Engineer)
  • Joseph Pollack (Creator & Business Leader)
  • MultiTonic Team

Support

For support and questions:

Version

  • Release: v0.0.1
  • Date: 2024-01-25
  • Commit: 0716d8d (Initial commit)

Full Changelog: https://github.com/MultiTonic/thinking-dataset/commits/v0.0.1