-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Supports structured logging. #444
base: master
Are you sure you want to change the base?
Conversation
…rough using environment variables.
…in py311, py312, py313.
@TlexCypher Thanks for your contribution! Could you check the CI errors? |
@hirosassa Thank you for reply. |
Current workI'm working on fix CI errors. Error Content[gw12] darwin -- Python 3.9.6 /Users/araki/personal/gokart/.tox/py39/bin/python
self = <test.testing.test_run_with_empty_data_frame.TestTestFrameworkForPandasDataFrame testMethod=test_run_with_namespace>
def test_run_with_namespace(self):
argv = [
f'{__name__}.DummyWorkFlowWithoutError',
'--local-scheduler',
'--test-run-pandas',
f'--test-run-namespace={__name__}',
'--log-level=CRITICAL',
'--no-lock',
]
logger = logging.getLogger('gokart.testing.check_if_run_with_empty_data_frame')
with patch.object(logger, 'info') as mock_debug:
with self.assertRaises(SystemExit) as exit_code:
gokart.run(argv)
> log_str = mock_debug.call_args[0][0]
E TypeError: 'NoneType' object is not subscriptable
test/testing/test_run_with_empty_data_frame.py:102: TypeError
```python
_______________________________________ TestTestFrameworkForPandasDataFrame.test_run_with_error _______________________________________
[gw12] darwin -- Python 3.9.6 /Users/araki/personal/gokart/.tox/py39/bin/python
self = <test.testing.test_run_with_empty_data_frame.TestTestFrameworkForPandasDataFrame testMethod=test_run_with_error>
def test_run_with_error(self):
argv = [f'{__name__}.DummyWorkFlowWithError', '--local-scheduler', '--test-run-pandas', '--log-level=CRITICAL', '--no-lock']
logger = logging.getLogger('gokart.testing.check_if_run_with_empty_data_frame')
with patch.object(logger, 'info') as mock_debug:
with self.assertRaises(SystemExit) as exit_code:
gokart.run(argv)
> log_str = mock_debug.call_args[0][0]
E TypeError: 'NoneType' object is not subscriptable
test/testing/test_run_with_empty_data_frame.py:85: TypeError This contribution is related logging, also from error content, in both cases, the source of problem is log_str is None. How to reproduceExecutre following command on this branch(support-slog) If you are familiar with this kind of errors, way to reproduce on tox env, please tell me. |
In my local environment, all of the |
…ogger configuration file has been loaded.
Hi, @hirosassa . |
The reason why CI errors were occured.By disabling parallel test execution, I found that the test failures became consistent, with the same test failing every time. Based on these two points, I suspected that my newly added implementation and test might have somehow broken the Logger configuration. In the implementation I added in The reason the test initially appeared flaky was that, when running tests in parallel, the slog test would sometimes run after After applying the fix, I confirmed that the tests now pass consistently, regardless of whether the parallel execution option is enabled or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed investigation.
gokart/slog_config.py
Outdated
|
||
class SlogConfig(object): | ||
""" | ||
LoggerConfig is for logging configuration, Utility-class. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoggerConfig is for logging configuration, Utility-class. | |
SlogConfig is for logging configuration, Utility-class. |
gokart/slog_config.py
Outdated
On the other hand, set logging configuration as plain text. | ||
""" | ||
logger_mode = os.environ.get('GOKART_LOGGER_FORMAT') | ||
if not logger_mode or logger_mode.lower() == 'json': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep default behavior (especially for current users), if the GOKART_LOGGER_FORMAT
is not set, I prefer the log format should be text.
What do you think about this idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, structured logging is more useful than text simple logging as metrics, so default logging format should be structured logging.
If user wanna change log format, user should change GOKART_LOGGER_FORMAT
, or write logging
.ini, logging configuration file.
But you bet, I can understand your opinion, for current user, it might be breaking changes, so avoid to make slog configuration default.
As I checked, if user write their original logging configuration, that configuration is prioritized.
In conclusion, both are acceptable, and I follow maintainers opinion.
pyproject.toml
Outdated
@@ -26,6 +26,10 @@ dependencies = [ | |||
"dill", | |||
"backoff", | |||
"typing-extensions>=4.11.0; python_version<'3.13'", | |||
"python-json-logger>=3.2.1", | |||
"notebook>=7.3.2", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is unnecessary dependency. Could you please remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python-json-logger is necessary for adding JsonFormatter. But notebook is not, so removed.
pyproject.toml
Outdated
"tox>=4.24.1", | ||
"tox-uv>=1.25.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These dependencies are unnecessary for production use (only for tests). Could you move these items to test
dependecy-group?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this dependency is not necessary even for test.
I made a mistake, add tox and tox-uv...
So, remove this.
gokart/slog_config.py
Outdated
default_date_format = '%Y/%m/%d %H:%M:%S' | ||
|
||
@staticmethod | ||
def apply_slog_format(logger): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[IMO] This function selects log format and returns logger with the format. So there's the situation that the logger format is not "structured".
We should change the name of this function to such as apply_log_format
, switch_log_format
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly. I adopt swich_log_format
from your suggestion. thx.
gokart/slog_config.py
Outdated
last_resort_handler = logging.lastResort | ||
if not last_resort_handler: | ||
last_resort_handler = logging.StreamHandler() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last_resort_handler = logging.lastResort | |
if not last_resort_handler: | |
last_resort_handler = logging.StreamHandler() | |
last_resort_handler = logging.lastResort if logging.lastResort is not None else logging.StreamHandler() |
gokart/slog_config.py
Outdated
# plain text mode, so nothing is applied. | ||
elif logger_mode.lower() == 'text': | ||
return logger | ||
else: | ||
raise Exception(f'Unknown logger format: {logger_mode}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These elif
and else
block is short. Let's use "early return" on the start part of this function to improve readability.
[IMO] It seems that providing built-in logging functionality for first-party use may not fall within the scope of Gokart, as users are typically expected to choose their own logging libraries and formats. However, allowing Gokart (or Luigi) to switch log formats could be quite helpful, as configuring this can be a bit challenging for users. It might be worth considering an option that simplifies this customization without enforcing a specific logging solution. What do you think? |
Thank you for thoughtful and stimulating comments. And also, I think gokart should support structured logging as default. |
@TlexCypher By the way, python-json-logger is archived.... 😢 |
Source issue.
#124 (Thank you for @hirosassa )
Summary
This PR implements SlogConfig, which supports switch functionalities structured logging and plain text logging.
Of course, I confirmed end-to-end workflow with samples on provided issue, then structured logging and plain text logging works well.
Changes
Added SlogConfig and its tests.
If user export ${GOKART_LOGGING_FORMAT}=json, all logs are dumped as json, structured logging for machine.
On the other hand, the environment variables, ${GOKART_LOGGING_FORMAT}=text, all logs are dumped as plain text, for human. And also, no supported value is set, raise exception.
Also, we need user friendly interface, so implemented decorator function. So user import gokart.getLoggert instead of logging.getLogger.
Potential Discussion Points
The interface of logging might be the important discussion point.
This is my first OSS contribution, so please let me know how my code will be improved!