[Feature Request] Custom "chat" HF datasets #1088

Open
chimezie opened this issue Nov 3, 2024 · 0 comments

chimezie commented Nov 3, 2024

The LoRA tuner's local datasets support the following data format:

{"messages": [{"role": "system", "content": "You are a helpful assistant."}, 
              {"role": "user", "content": "Hello."}, 
              {"role": "assistant", "content": "How can I assistant you today."}]}

Some HF datasets, such as the UltraFeedback dataset used for Direct Preference Optimization (see: HF DPO trainer and #513), use a JSON data format such as the following:

[ { "content": "...", "role": "user" }, { "content": "...", "role": "assistant" } ]

To support such HF datasets, it would be helpful to generalize the use of prompt_feature, text_feature, and completion_feature to include a chat_feature, which indicates the HF dataset feature to use for the chat template structure. A minimal sketch of what consuming such a feature could look like is shown below.
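
For reference, a minimal Python sketch of how a chat_feature column could be rendered into the plain-text form the tuner already consumes, assuming the HF datasets and transformers libraries. The dataset, split, and tokenizer names below are illustrative, and chat_feature itself is the proposed option, not an existing one:

from datasets import load_dataset
from transformers import AutoTokenizer

# Illustrative only: dataset, split, and tokenizer names are placeholders,
# and "chat_feature" is the proposed option, not an existing config key.
chat_feature = "messages"

dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_sft")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def render_chat(example):
    # Apply the tokenizer's chat template to the list of
    # {"role": ..., "content": ...} messages named by chat_feature,
    # yielding a single text string for standard text-based tuning.
    return {"text": tokenizer.apply_chat_template(example[chat_feature], tokenize=False)}

dataset = dataset.map(render_chat)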

chimezie added a commit to chimezie/mlx-examples that referenced this issue Nov 4, 2024