
INFO logging seems more like DEBUG #223

Open
danmcp opened this issue Jul 27, 2024 · 0 comments
Labels
refactor Same results, different method

Comments

@danmcp
Member

danmcp commented Jul 27, 2024

Examples from: ilab generate --pipeline full

INFO 2024-07-27 00:46:23,461 instructlab.sdg.llmblock:50: LLM server supports batched inputs: False
INFO 2024-07-27 00:46:23,461 instructlab.sdg.pipeline:172: Running block: gen_questions
INFO 2024-07-27 00:46:23,461 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response'],
    num_rows: 5
})
INFO 2024-07-27 00:53:06,986 instructlab.sdg.pipeline:172: Running block: eval_questions
INFO 2024-07-27 00:53:06,987 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response', 'num_samples', 'question'],
    num_rows: 146
})
INFO 2024-07-27 00:58:05,372 instructlab.sdg.pipeline:172: Running block: gen_responses
INFO 2024-07-27 00:58:05,373 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response', 'question'],
    num_rows: 27
})
INFO 2024-07-27 00:58:29,652 instructlab.sdg.pipeline:172: Running block: evaluate_qa_pair
INFO 2024-07-27 00:58:29,652 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response', 'question', 'response'],
    num_rows: 18
})
INFO 2024-07-27 00:59:09,898 instructlab.sdg.pipeline:172: Running block: filter_qa_pair
INFO 2024-07-27 00:59:09,899 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'evaluation', 'score'],
    num_rows: 18
})
INFO 2024-07-27 00:59:10,443 instructlab.sdg.datamixing:123: Dataset columns: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'id', 'messages']

Opinions may vary on this, but INFO logging should generally be reserved for messages that are appropriate for end users and quiet enough to leave enabled all the time without becoming annoying. Some of these messages are probably fine but could be a little more user friendly. Examples:

LLM server supports batched inputs: False
Running block: gen_questions
Running block: eval_questions
Running block: gen_responses
Running block: evaluate_qa_pair
Running block: filter_qa_pair

These are probably all fine to keep at INFO, but they would read better as full sentences that give the user enough context to know what's going on; a rough sketch of what I mean is below.
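
For example, something along these lines. This is only a sketch; the function and parameter names are illustrative and not the actual instructlab.sdg code:

```python
import logging

logger = logging.getLogger(__name__)

# Rough sketch only: phrase the per-block INFO message as a sentence that
# tells the user what is happening. Names here are illustrative, not the
# actual SDG code.
def log_block_start(block_name: str, dataset) -> None:
    logger.info(
        "Generating synthetic data: running the '%s' step on %d samples",
        block_name,
        dataset.num_rows,
    )
```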

Others feel more like DEBUG. Examples:

INFO 2024-07-27 00:46:23,461 instructlab.sdg.pipeline:173: Dataset({
    features: ['task_description', 'seed_question', 'seed_response'],
    num_rows: 5
})
INFO 2024-07-27 00:59:10,443 instructlab.sdg.datamixing:123: Dataset columns: ['task_description', 'seed_question', 'seed_response', 'question', 'response', 'id', 'messages']

Perhaps there is some useful INFO in there, like how many tasks are being processed; a sentence stating that would probably be more appropriate for an INFO log. Otherwise it might be better to move a bunch of these down to DEBUG (see the sketch below).
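
Something like this sketch, for instance (again, names are illustrative and not the actual code): keep a one-sentence INFO summary with the row count, and demote the full Dataset dump to DEBUG.

```python
import logging

logger = logging.getLogger(__name__)

# Rough sketch: keep a one-sentence INFO summary with the count a user cares
# about, and log the full Dataset repr only at DEBUG. Names are illustrative.
def log_block_input(block_name: str, dataset) -> None:
    logger.info("The '%s' block is processing %d rows", block_name, dataset.num_rows)
    logger.debug("Input dataset for '%s': %s", block_name, dataset)
```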

@markmc markmc modified the milestones: 0.2.2, 0.2.3 Jul 27, 2024
@nathan-weinberg nathan-weinberg added the refactor Same results, different method label Aug 20, 2024