Releases: MultiTonic/thinking-dataset
Thinking Dataset v0.0.2: Enhanced XML Processing & Pipeline Optimization
Release v0.0.2 - Performance & Validation Update
A Framework for Strategic Business Insights
Overview
The Thinking Dataset Project v0.0.2 focuses on pipeline optimization and data validation improvements. This release introduces enhanced XML processing capabilities, increased batch processing efficiency, and refined template validation, making the framework more robust and performant for large-scale data operations.
Key Features
- Improved Batch Processing: Support for 1000-row batches
- Enhanced Validation: Schema-based element validation framework
- XML Processing: Advanced formatting and validation for responses
- Template System: Improved metadata handling and extraction logic
- Asset Management: New initialization module for better resource control
Technical Improvements
- Streamlined logging system
- Unified method signatures
- Enhanced error handling
- Improved code documentation
- Better memory management
Breaking Changes
None - All improvements maintain backward compatibility
Bug Fixes
- Resolved "NoneNoneNone" output issue in export command @vashisthrahul13
- Enhanced error handling in XML processing
- Improved template validation reliability
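For context on the "NoneNoneNone" symptom, the sketch below shows one common way such output arises in Python: stringifying missing values while concatenating them. This is an assumption about the general failure mode, not a reproduction of the actual fix in #84; `join_naive` and `join_safe` are hypothetical names.

```python
from typing import Optional, Sequence


def join_naive(parts: Sequence[Optional[str]]) -> str:
    # str(None) is the literal text "None", so missing parts leak into the output.
    return "".join(str(p) for p in parts)


def join_safe(parts: Sequence[Optional[str]]) -> str:
    # Drop missing parts before joining so nothing is printed for them.
    return "".join(p for p in parts if p is not None)


print(join_naive([None, None, None]))       # NoneNoneNone
print(repr(join_safe([None, None, None])))  # ''
```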
Dependencies
- Python 3.12+
- Updated SQLite integration
- Enhanced Ollama provider support
- Latest HuggingFace API compatibility
What's New
- Improved XML validation framework by @p3nGu1nZz
- Enhanced batch processing capabilities
- Streamlined logging system implementation
- Template system improvements
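To make the idea of schema-based element validation concrete, here is a minimal sketch using only the standard library. The schema layout, the element names (borrowed from the STaR acronym), and the function `validate_response` are all assumptions for illustration; the project's real validation framework may look quite different.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical schema: root tag mapped to the child tags it must contain.
SCHEMA: Dict[str, List[str]] = {
    "response": ["situation", "task", "action", "result"],
}


def validate_response(xml_text: str, schema: Dict[str, List[str]] = SCHEMA) -> List[str]:
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag not in schema:
        return [f"unexpected root element <{root.tag}>"]
    errors: List[str] = []
    for tag in schema[root.tag]:
        child = root.find(tag)  # direct children only
        if child is None:
            errors.append(f"missing <{tag}>")
        elif not (child.text or "").strip():
            errors.append(f"empty <{tag}>")
    return errors


if __name__ == "__main__":
    doc = "<response><situation>s</situation><task> </task><action>a</action></response>"
    print(validate_response(doc))  # ['empty <task>', 'missing <result>']
```

Returning a list of problems rather than raising on the first one lets a pipeline log every defect in a response before deciding whether to retry or discard it.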
This release focuses on stability, performance, and code quality while maintaining our commitment to robust data processing and validation.
Full Changelog: v0.0.1...v0.0.2
What's Changed
- Update pylint.yml by @p3nGu1nZz in #81
- fix : update exceptions.py to avoid printing 'NoneNoneNoneNoneNoneNon… by @vashisthrahul13 in #84
New Contributors
- @vashisthrahul13 made their first contribution in #84
Thinking Dataset v0.0.1: A Foundation for Strategic AI-Driven Business Intelligence with STaR Case Study Generation
A Framework for Strategic Business Insights
Release v0.0.1 - Initial Release
Overview
The Thinking Dataset Project v0.0.1 introduces the foundational framework for generating strategic business insights and STaR (Situation, Task, Action, Result) case studies. This initial release sets up the basic infrastructure for analyzing complex strategic scenarios, ethical dilemmas, and decision-making processes.
Key Features
- Basic Pipeline Infrastructure: Initial implementation of data ingestion and preprocessing pipelines
- SQLite Database Setup: Basic data storage and management system
- CLI Tool: Essential command-line interface for basic dataset operations
- Initial Adapters: Basic support for Hugging Face and Ollama endpoints
- Core Documentation: Basic documentation covering installation and usage
Components
Core Features
- Data ingestion functionality
- Preprocessing pipeline
- Foundational case study format
- Model evaluation framework
- Adapter implementations
Technical Implementation
- Python 3.12+ support
- SQLite and SQLAlchemy integration
- Pipeline configuration
- Automatic environment setup
- Robust logging
Installation
Prerequisites
- Python 3.10 or later
- Git
- A cloud-based account (e.g., OpenAI) or a GPU (RTX 3090 or greater) for processing, or both
Setup
- Clone the repository:

      git clone https://github.com/MultiTonic/thinking-dataset.git
      cd thinking-dataset

- Install the uv package manager. First add the package into the global environment:

      pip install uv

  Then add the uv tools directory to PATH*:

      uv tool update-shell

  *You may need to restart your terminal session for the changes to take effect.

- Set up the project:

      uv run setup

  This will create a virtual environment, install the project dependencies, and activate the virtual environment.

- Set up environment variables. Copy the .env.sample file to .env and change the values as needed:

      cp .env.sample .env

  Update the .env file with your credentials:

      # Required settings
      HF_ORG="my_huggingface_organization"
      HF_USER="my_huggingface_username"
      HF_READ_TOKEN="my_huggingface_read_access_token"
      HF_WRITE_TOKEN="my_huggingface_write_access_token"

      # Required configuration
      CONFIG_PATH="config/config.yaml"

      # One or more providers
      OLLAMA_SERVER_URL="http://localhost:11434"
      OPENAI_API_TOKEN="your_openai_api_token"
      RUNPOD_API_TOKEN="your_runpod_api_token"
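The environment settings above can be sanity-checked before running the pipeline. The following is a minimal sketch, assuming the variables have already been loaded into the process environment (for example by python-dotenv, which is listed among the dependencies). The function `check_env` is hypothetical and not part of the project; the variable names are the ones shown in the .env sample.

```python
import os
from typing import List, Mapping

# Variable names copied from the .env sample in these notes.
REQUIRED = ["HF_ORG", "HF_USER", "HF_READ_TOKEN", "HF_WRITE_TOKEN", "CONFIG_PATH"]
PROVIDERS = ["OLLAMA_SERVER_URL", "OPENAI_API_TOKEN", "RUNPOD_API_TOKEN"]


def check_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of configuration problems (hypothetical helper)."""
    # Unset and empty values are both treated as missing.
    problems = [f"missing {name}" for name in REQUIRED if not env.get(name)]
    # The sample asks for one or more providers, so a single one is enough.
    if not any(env.get(name) for name in PROVIDERS):
        problems.append("set at least one provider: " + ", ".join(PROVIDERS))
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print(problem)
```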
Breaking Changes
None (Initial Release)
Bug Fixes
- Initial release, no bug fixes to report
Dependencies
- Python 3.10+
- SQLite
- pandas
- scikit-learn
- rich
- python-dotenv
- Hugging Face Transformers
- Ollama
- Runpod
- Additional dependencies listed in thinking-dataset.toml
Security Updates
- Initial security configurations implemented
- Basic authentication and authorization flows established
Documentation Updates
- Added initial project documentation
- Included installation guide
- Basic usage instructions
- Architecture overview
Known Issues
- Limited to text-based data processing in this release
- GPU support requires RTX 3090 or greater
- Some advanced features planned for future releases
- Can only think; reasoning is coming in v0.0.2!
Upgrade Instructions
Initial release - no upgrade needed.
Contributors
Special thanks to our initial contributors:
- Kara Rawson (Lead Engineer)
- Joseph Pollack (Creator & Business Leader)
- MultiTonic Team
Support
For support and questions:
- Create an issue on GitHub
- Join our Discord
- Email: [email protected]
Version
- Release: v0.0.1
- Date: 2024-01-25
- Commit: 0716d8d (Initial commit)
What's Changed
- Kev With Code - 🏆 by @Josephrp in #3
- Added Dynamic Variables into our configuration file. by @p3nGu1nZz in #45
- Renamed Prepare to Process by @Daksh2000 in #64
New Contributors
- @Josephrp made their first contribution in #3
- @p3nGu1nZz made their first contribution in #45
Full Changelog: https://github.com/MultiTonic/thinking-dataset/commits/v0.0.1