Supports structured logging. #444

TlexCypher · 2025-02-22T17:21:26Z

Source issue.

Summary

This PR implements SlogConfig, which switches between structured logging and plain-text logging.
I confirmed the end-to-end workflow with the samples from the source issue; both structured and plain-text logging work as expected.

Changes

Added SlogConfig and its tests.
If the user exports ${GOKART_LOGGING_FORMAT}=json, all logs are emitted as JSON (structured logging, for machines).
If the user exports ${GOKART_LOGGING_FORMAT}=text, all logs are emitted as plain text, for humans. If an unsupported value is set, an exception is raised.
To keep the interface user-friendly, I also implemented a decorator function, so users import gokart.getLogger instead of logging.getLogger.
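
For illustration, the switch described above could be sketched roughly as follows. This is not the code from this PR: `JsonLineFormatter`, `select_formatter`, and the `default` parameter are hypothetical names, and the JSON formatter is a small standard-library stand-in rather than python-json-logger. The variable name follows the PR description.

```python
import json
import logging
import os

DATE_FORMAT = '%Y/%m/%d %H:%M:%S'


class JsonLineFormatter(logging.Formatter):
    """Hypothetical stand-in for a JSON formatter: one JSON object per record."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            'time': self.formatTime(record, self.datefmt),
            'level': record.levelname,
            'name': record.name,
            'message': record.getMessage(),
        }
        return json.dumps(payload)


def select_formatter(env=None, default='json') -> logging.Formatter:
    """Pick a formatter from GOKART_LOGGING_FORMAT ('json' or 'text').

    `default` is an assumption of this sketch: it is used only when the
    variable is unset or empty.
    """
    env = os.environ if env is None else env
    mode = (env.get('GOKART_LOGGING_FORMAT') or default).lower()
    if mode == 'json':
        return JsonLineFormatter(datefmt=DATE_FORMAT)
    if mode == 'text':
        return logging.Formatter('%(asctime)s %(levelname)s %(name)s: %(message)s', datefmt=DATE_FORMAT)
    # Unsupported values are rejected rather than silently ignored.
    raise ValueError(f'Unknown logger format: {mode}')
```

Passing the environment as a mapping keeps the function easy to test without touching `os.environ`.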

Potential Discussion Points

The logging interface is probably the main point for discussion.

This is my first OSS contribution, so please let me know how my code can be improved!

hirosassa · 2025-02-23T23:13:16Z

@TlexCypher Thanks for your contribution! Could you check the CI errors?

TlexCypher · 2025-02-24T05:27:58Z

@hirosassa Thank you for the reply.
I will of course fix the CI errors as you said. However, I think the current implementation does not follow the Python convention for libraries that provide logging configuration, because it adds handlers. So I will convert this PR to a draft.
I will work on both the CI fixes and a new implementation, so please wait for the new code.

TlexCypher · 2025-02-24T08:03:42Z

Current work

I'm working on fixing the CI errors.
TestTestFrameworkForPandasDataFrame::test_run_with_error and TestTestFrameworkForPandasDataFrame.test_run_with_namespace fail when I execute tox -e py39, but all tests pass when I execute tox -e py312.
Also, when I create a Python 3.9 virtual env with uv and run the target tests there, the tests that failed under tox with Python 3.9 pass.
Do you know how to reproduce the tox environment? I want to fix these CI errors in the tox env with py3.9.

Error Content

```python
[gw12] darwin -- Python 3.9.6 /Users/araki/personal/gokart/.tox/py39/bin/python

self = <test.testing.test_run_with_empty_data_frame.TestTestFrameworkForPandasDataFrame testMethod=test_run_with_namespace>

    def test_run_with_namespace(self):
        argv = [
            f'{__name__}.DummyWorkFlowWithoutError',
            '--local-scheduler',
            '--test-run-pandas',
            f'--test-run-namespace={__name__}',
            '--log-level=CRITICAL',
            '--no-lock',
        ]
        logger = logging.getLogger('gokart.testing.check_if_run_with_empty_data_frame')
        with patch.object(logger, 'info') as mock_debug:
            with self.assertRaises(SystemExit) as exit_code:
                gokart.run(argv)
>       log_str = mock_debug.call_args[0][0]
E       TypeError: 'NoneType' object is not subscriptable

test/testing/test_run_with_empty_data_frame.py:102: TypeError
```

```python
_______________________________________ TestTestFrameworkForPandasDataFrame.test_run_with_error _______________________________________
[gw12] darwin -- Python 3.9.6 /Users/araki/personal/gokart/.tox/py39/bin/python

self = <test.testing.test_run_with_empty_data_frame.TestTestFrameworkForPandasDataFrame testMethod=test_run_with_error>

    def test_run_with_error(self):
        argv = [f'{__name__}.DummyWorkFlowWithError', '--local-scheduler', '--test-run-pandas', '--log-level=CRITICAL', '--no-lock']
        logger = logging.getLogger('gokart.testing.check_if_run_with_empty_data_frame')
        with patch.object(logger, 'info') as mock_debug:
            with self.assertRaises(SystemExit) as exit_code:
                gokart.run(argv)
>       log_str = mock_debug.call_args[0][0]
E       TypeError: 'NoneType' object is not subscriptable

test/testing/test_run_with_empty_data_frame.py:85: TypeError
```

This contribution is related to logging. In both cases the traceback shows the same cause: mock_debug.call_args is None, meaning the patched logger.info was never called.
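
The TypeError can be reproduced in isolation. The following minimal sketch (the logger name `demo.never_called` and the variable names are made up for illustration) shows that a patched method that is never called leaves `call_args` as None, while a called one records its positional arguments.

```python
import logging
from unittest.mock import patch

logger = logging.getLogger('demo.never_called')

# Case 1: nothing logs through logger.info while it is patched.
with patch.object(logger, 'info') as mock_info:
    pass
never_called = mock_info.call_args  # None, so call_args[0][0] would raise TypeError

# Case 2: logger.info is called once while it is patched.
with patch.object(logger, 'info') as mock_info:
    logger.info('task finished: %s', 'ok')
first_positional_arg = mock_info.call_args[0][0]  # the format string passed to info()
```

So the failing tests indicate that the code under test did not emit the expected `info` message on that logger object, rather than a problem with the assertion itself.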

How to reproduce

Execute the following commands on this branch (support-slog):
$ tox -e py39 # [Expected] Exactly two test cases fail.
$ tox -e py312 # [Expected] All test cases pass.

If you are familiar with this kind of error or know how to reproduce it in the tox env, please tell me.
I'm still working on fixing the CI errors.

hirosassa · 2025-02-24T10:01:46Z

In my local environment, all of the tox tests run successfully 🤔
I'll take a deeper look at this.

TlexCypher · 2025-03-02T12:34:35Z

Hi, @hirosassa.
I found the cause of the CI errors and succeeded in fixing it.
So I'll re-open this PR.
Please review it.
Sorry for the late response.
Below is a description of how I found the fix for the CI errors.

TlexCypher · 2025-03-02T12:47:06Z

Why the CI errors occurred

By disabling parallel test execution, I found that the test failures became consistent, with the same test failing every time.
Additionally, as expected, there were no test failures on the master branch.

Based on these two points, I suspected that my newly added implementation and test might have somehow broken the Logger configuration.
Upon inspecting the source file of the failing test, test/testing/test_run_with_empty_data_frame.py, I realized that it was reusing the test Logger configuration defined in gokart/testing/check_if_run_with_empty_data_frame.py.

In the implementation I added in test/test_slog_config.py, I had removed the existing Logger configuration. This caused the failure because gokart/testing/check_if_run_with_empty_data_frame.py could no longer reuse the predefined Logger configuration.

The reason the test initially appeared flaky was that, when running tests in parallel, the slog test would sometimes run after test/testing/test_run_with_empty_data_frame.py, in which case the removal of the existing Logger configuration did not cause a failure. This made it seem flaky.

After applying the fix, I confirmed that the tests now pass consistently, regardless of whether the parallel execution option is enabled or not.
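
The thread does not show the actual fix, but one common way to keep a test that resets logging from leaking into other tests is to snapshot and restore the logger state around it. A minimal sketch, assuming a hypothetical helper `preserved_logger_state` and a made-up logger name `demo.shared`:

```python
import logging
from contextlib import contextmanager


@contextmanager
def preserved_logger_state(name: str):
    """Restore a logger's handlers, level, propagate and disabled flags on exit."""
    logger = logging.getLogger(name)
    saved_handlers = list(logger.handlers)
    saved_level = logger.level
    saved_propagate = logger.propagate
    saved_disabled = logger.disabled
    try:
        yield logger
    finally:
        # Put back exactly what was there before the block ran.
        logger.handlers[:] = saved_handlers
        logger.setLevel(saved_level)
        logger.propagate = saved_propagate
        logger.disabled = saved_disabled


# Simulate configuration that another test module relies on.
shared = logging.getLogger('demo.shared')
original_handler = logging.NullHandler()
shared.addHandler(original_handler)
shared.setLevel(logging.INFO)

with preserved_logger_state('demo.shared') as lg:
    lg.handlers.clear()            # what a config-resetting test might do
    lg.setLevel(logging.CRITICAL)
# After the block, `shared` has its original handler and level again.
```

With this pattern the outcome no longer depends on which test happens to run first, which is the order dependence described above.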

hirosassa

Thanks for the detailed investigation.

hirosassa · 2025-03-02T14:54:42Z

gokart/slog_config.py

+
+class SlogConfig(object):
+    """
+    LoggerConfig is for logging configuration, Utility-class.


Suggested change

-    LoggerConfig is for logging configuration, Utility-class.
+    SlogConfig is for logging configuration, Utility-class.

hirosassa · 2025-03-02T15:00:04Z

gokart/slog_config.py

+        On the other hand, set logging configuration as plain text.
+        """
+        logger_mode = os.environ.get('GOKART_LOGGER_FORMAT')
+        if not logger_mode or logger_mode.lower() == 'json':


To keep the default behavior (especially for current users), I would prefer the log format to be text if GOKART_LOGGER_FORMAT is not set.
What do you think about this idea?

In my opinion, structured logging is more useful than plain-text logging for metrics, so the default logging format should be structured logging.
If users want to change the log format, they can set GOKART_LOGGER_FORMAT or write a logging configuration file (logging.ini).
That said, I understand your point: for current users it could be a breaking change, so it makes sense to avoid making the slog configuration the default.
As far as I checked, if users write their own logging configuration, that configuration takes priority.
In conclusion, both are acceptable, and I will follow the maintainers' opinion.
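
How this PR detects an existing user configuration is not shown in the thread. As a rough sketch of the "user configuration takes priority" idea, one could skip the library default whenever the logger already has handlers. The helper name `apply_default_handler` and the use of `Logger.hasHandlers()` as the detection rule are assumptions of this example.

```python
import logging


def apply_default_handler(logger: logging.Logger, formatter: logging.Formatter) -> bool:
    """Attach a stream handler only when nothing is configured yet.

    Returns True if the default was applied, False if existing
    configuration (on this logger or a propagating ancestor) was kept.
    """
    if logger.hasHandlers():
        return False
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return True


demo = logging.getLogger('demo.priority')
demo.propagate = False  # isolate the demo from any handlers on the root logger

first = apply_default_handler(demo, logging.Formatter('%(message)s'))
second = apply_default_handler(demo, logging.Formatter('%(levelname)s %(message)s'))
# first is True (default applied); second is False (existing handler kept).
```

Under this rule the choice of default format only matters for users who have not configured logging at all, which narrows the breaking-change concern to that group.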

hirosassa · 2025-03-02T15:03:56Z

pyproject.toml

@@ -26,6 +26,10 @@ dependencies = [
  "dill",
  "backoff",
  "typing-extensions>=4.11.0; python_version<'3.13'",
+  "python-json-logger>=3.2.1",
+  "notebook>=7.3.2",


I think this is an unnecessary dependency. Could you please remove it?

python-json-logger is necessary for adding JsonFormatter, but notebook is not, so I removed it.

hirosassa · 2025-03-02T15:05:06Z

pyproject.toml

+  "tox>=4.24.1",
+  "tox-uv>=1.25.0",


These dependencies are unnecessary for production use (only for tests). Could you move these items to the test dependency-group?

Ah, these dependencies are not necessary even for tests.
I added tox and tox-uv by mistake...
So I will remove them.

hirosassa · 2025-03-02T15:17:30Z

gokart/slog_config.py

+    default_date_format = '%Y/%m/%d %H:%M:%S'
+
+    @staticmethod
+    def apply_slog_format(logger):


[IMO] This function selects the log format and returns a logger with that format, so there are situations where the logger format is not "structured".
We should rename this function to something like apply_log_format, switch_log_format, etc.

Exactly. I adopted switch_log_format from your suggestion. Thanks.

hirosassa · 2025-03-02T15:20:12Z

gokart/slog_config.py

+                last_resort_handler = logging.lastResort
+                if not last_resort_handler:
+                    last_resort_handler = logging.StreamHandler()


Suggested change

-                last_resort_handler = logging.lastResort
-                if not last_resort_handler:
-                    last_resort_handler = logging.StreamHandler()
+                last_resort_handler = logging.lastResort if logging.lastResort is not None else logging.StreamHandler()

hirosassa · 2025-03-02T15:21:49Z

gokart/slog_config.py

+        # plain text mode, so nothing is applied.
+        elif logger_mode.lower() == 'text':
+            return logger
+        else:
+            raise Exception(f'Unknown logger format: {logger_mode}')


These elif and else blocks are short. Let's use "early return" at the start of this function to improve readability.
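
The early-return shape suggested here might look like the sketch below. It is only an outline under assumptions: the JSON branch is passed in as a hypothetical `apply_json` callback because its body is not shown in this thread, the environment is passed as a mapping for testability, and `ValueError` is used in place of the bare `Exception` from the diff.

```python
import logging


def switch_log_format(logger: logging.Logger, env, apply_json) -> logging.Logger:
    """Handle the short cases first, then fall through to the JSON path."""
    mode = env.get('GOKART_LOGGER_FORMAT')

    # Guard clause 1: plain text mode, nothing to apply.
    if mode and mode.lower() == 'text':
        return logger

    # Guard clause 2: reject anything that is set but not supported.
    if mode and mode.lower() != 'json':
        raise ValueError(f'Unknown logger format: {mode}')

    # Only 'json' (or an unset variable, as in the diff above) reaches here.
    apply_json(logger)
    return logger
```

With the guards at the top, the main JSON path is no longer nested inside an if/elif/else chain.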

hiro-o918 · 2025-03-02T15:44:40Z

[IMO]

It seems that providing built-in logging functionality for first-party use may not fall within the scope of Gokart, as users are typically expected to choose their own logging libraries and formats.

However, allowing Gokart (or Luigi) to switch log formats could be quite helpful, as configuring this can be a bit challenging for users. It might be worth considering an option that simplifies this customization without enforcing a specific logging solution.

What do you think?

TlexCypher · 2025-03-02T15:57:23Z

> [IMO]
>
> It seems that providing built-in logging functionality for first-party use may not fall within the scope of Gokart, as users are typically expected to choose their own logging libraries and formats.
>
> However, allowing Gokart (or Luigi) to switch log formats could be quite helpful, as configuring this can be a bit challenging for users. It might be worth considering an option that simplifies this customization without enforcing a specific logging solution.
>
> What do you think?

Thank you for the thoughtful and stimulating comments.
As you said, built-in structured logging functionality is not within the scope of gokart, because gokart is just an ML pipeline library.
So my implementation follows this thinking.
If the user writes logging.ini as usual, that configuration takes priority.

I also think gokart should support structured logging by default.
Many programming languages and libraries have added structured logging support in recent years.
For example, Go's standard library did not support structured logging at first, so the community developed many structured logging libraries such as zap and zerolog.
As Go's history shows, demand for structured logging is growing, and especially for ML we need a way to handle and gather large numbers of parameters and logs easily.
So in my opinion, some kind of specific logging solution is very important, especially for ML libraries, but flexibility is also important, and my implementation reflects both.

hiro-o918 · 2025-03-03T05:24:58Z

@TlexCypher
I understand.
You mean users can choose between gokart.getLogger and another logger, don't you?

By the way, python-json-logger is archived... 😢
So we should drop this dependency.

https://github.com/madzak/python-json-logger

TlexCypher added 3 commits February 23, 2025 01:28

feat: Add structured logging feature, configuration can be changed through using environment variables.

513be51

test: add test.

d78de2d

CI: fix CI. Applied mypy and ruff. Also confirm passed all testcases in py311, py312, py313.

7f009cc

TlexCypher changed the title ~~Support structured logging.~~ Supports structured logging. Feb 22, 2025

hirosassa requested review from hirosassa, yokomotod, Hi-king, kitagry and mski-iksm February 23, 2025 11:46

TlexCypher marked this pull request as draft February 24, 2025 05:28

TlexCypher added 2 commits February 24, 2025 16:26

WIP: Add GokartLogger and wrap original logging.Logger class.

3a212a0

CI: pass mypy, ruff.

48b79c4

TlexCypher added 2 commits March 2, 2025 18:57

feat: Add an implementation to skip applying the slog config if the logger configuration file has been loaded.

163acc0

feat: remove redundant print debug.

60e6f93

feat: fix ci errors.

2b1e5c3

TlexCypher marked this pull request as ready for review March 2, 2025 12:36

hirosassa reviewed Mar 2, 2025

TlexCypher added 3 commits March 3, 2025 01:16

feat: change to early return, remove redundant if statement.

69155b8

feat: change test method name as following system under test method name.

d5ef44e

feat: remove redundant dependencies.

b3d2207

TlexCypher requested a review from hirosassa March 2, 2025 16:17
