
INFO logging seems more like DEBUG #223

Open
danmcp opened this issue Jul 27, 2024 · 0 comments
Labels
refactor Same results, different method

Comments

@danmcp
Member

danmcp commented Jul 27, 2024

Examples from: ilab generate --pipeline full

INFO 2024-07-27 00:46:23,461 instructlab.sdg.llmblock:50: LLM server supports batched inputs: False
INFO 2024-07-27 00:46:23,461 instructlab.sdg.pipeline:172: Running block: gen_questions
INFO 2024-07-27 00:46:23,461 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response'],
    num_rows: 5
})
INFO 2024-07-27 00:53:06,986 instructlab.sdg.pipeline:172: Running block: eval_questions
INFO 2024-07-27 00:53:06,987 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
    num_rows: 146
})
INFO 2024-07-27 00:58:05,372 instructlab.sdg.pipeline:172: Running block: gen_responses
INFO 2024-07-27 00:58:05,373 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response', 'question'],
    num_rows: 27
})
INFO 2024-07-27 00:58:29,652 instructlab.sdg.pipeline:172: Running block: evaluate_qa_pair
INFO 2024-07-27 00:58:29,652 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
    num_rows: 18
})
INFO 2024-07-27 00:59:09,898 instructlab.sdg.pipeline:172: Running block: filter_qa_pair
INFO 2024-07-27 00:59:09,899 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
    num_rows: 18
})
INFO 2024-07-27 00:59:10,443 instructlab.sdg.datamixing:123: Dataset columns: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'id', 'messages']

Opinions may vary on this, but INFO logging should generally be reserved for messages that are appropriate for end users and quiet enough to leave enabled all the time without becoming annoying. Some of these messages are probably fine but could be a little more user friendly. Examples:

LLM server supports batched inputs: False
Running block: gen_questions
Running block: eval_questions
Running block: gen_responses
Running block: evaluate_qa_pair
Running block: filter_qa_pair

These are probably all fine to keep at INFO, but they would read better as full sentences that give the user enough context to know what's going on; a rough sketch of what I mean is below.
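
For example, something along these lines. This is only a sketch; the function and parameter names are illustrative and not the actual instructlab.sdg code:

```python
import logging

logger = logging.getLogger(__name__)

# Rough sketch only: phrase the per-block INFO message as a sentence that
# tells the user what is happening. Names here are illustrative, not the
# actual SDG code.
def log_block_start(block_name: str, dataset) -> None:
    logger.info(
        "Generating synthetic data: running the '%s' step on %d samples",
        block_name,
        dataset.num_rows,
    )
```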

Others feel more like DEBUG. Examples:

INFO 2024-07-27 00:46:23,461 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response'],
    num_rows: 5
})
INFO 2024-07-27 00:59:10,443 instructlab.sdg.datamixing:123: Dataset columns: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'id', 'messages']

Perhaps there is some useful INFO in there, like how many tasks are being processed; a sentence stating that would probably be more appropriate for an INFO log. Otherwise it might be better to move a bunch of these down to DEBUG (see the sketch below).
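
Something like this sketch, for instance (again, names are illustrative and not the actual code): keep a one-sentence INFO summary with the row count, and demote the full Dataset dump to DEBUG.

```python
import logging

logger = logging.getLogger(__name__)

# Rough sketch: keep a one-sentence INFO summary with the count a user cares
# about, and log the full Dataset repr only at DEBUG. Names are illustrative.
def log_block_input(block_name: str, dataset) -> None:
    logger.info("The '%s' block is processing %d rows", block_name, dataset.num_rows)
    logger.debug("Input dataset for '%s': %s", block_name, dataset)
```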

@markmc markmc modified the milestones: 0.2.2, 0.2.3 Jul 27, 2024
@nathan-weinberg nathan-weinberg added the refactor Same results, different method label Aug 20, 2024