Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENG-833: Create config file for additional configuration options. #138

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

timmyjose
Copy link
Contributor

Changes:

  • Current Config API is maintained as is (with the change to Command).
  • Added a new configuration mode - via a TOML file.
  • Added a config directory with sample config.toml.example file.
  • Added core and extra sections in the TOML config file. Only core is used for now; also using static keys (struct)
    for TOML instead of as HashMap (say) for better type checking.
  • The order of priority (CLI > ENV > TOML > DEFAULTS) is maintained.
  • Converted the existing Config to use clap builder mode.
  • Change BLOCK_CACHE_SIZE to BLOCK_CACHE_COUNT to make it uniform.
  • Changed BLOCK_TXN_COUNT env to BLOCK_TXNS_COUNT to make it uniform.
  • Added missing help for BLOCK_CACHE_COUNT.
  • Added error type for configuration errors.
  • Renamed Command in config to PolybaseCommand to avoid confusion.
  • Updated uses in main.rs.`

Status: Open for review.

  - Added a new configuration mode - via a TOML file.
  - Added a config directory with sample `config.toml.example` file.
  - Added `core` and `extra` sections in the TOML config file.
  - The order of priority (CLI > ENV > TOML > DEFAULTS) is maintained.
  - Converted the existing `Config` to use `clap` builder mode.
  - Change `BLOCK_CACHE_SIZE` to `BLOCK_CACHE_COUNT` to make it uniform.
  - Changed `BLOCK_TXN_COUNT` env to `BLOCK_TXNS_COUNT` to make it
    uniform.
  - Added missing help for `BLOCK_CACHE_COUNT`.
  - Added error type for configuration errors.
  - Renamed `Command` in config to `PolybaseCommand` to avoid confusion.
  - Updated uses in `main.rs`.`
@linear
Copy link

linear bot commented Jun 17, 2023

ENG-833 Create config file for additional configuration options

Problem

Currently we use clap to enable configuration via command line (e.g. --option) and env vars (e.g. ENV_NAME). However, we will probably end up with a large number of configurable options, therefore it would be useful to also allow configuration via a file.

Requirements

  • All configuration options would be available via the config file, but only key config options will be available via cli and env. For now, all existing configs will be available in all places, as we don't have too many configuration options.
  • Configuration should be searched for in config dir of the rootDir provided.
  • Priority order of configuration would be (in case of duplicates): clienvfile (with cli being the highest priority)
  • We should use TOML for the file based configuration.

@timmyjose
Copy link
Contributor Author

Keeping the PR open here while I work on the failing tests and clippy fixes, plus thorough testing to ensure existing behaviour is maintained.

  - Fixes failing api tests: changed `restrict-namespaces to flag
    instead of an option.`
@timmyjose timmyjose requested review from calummoore and mateuszmlc and removed request for calummoore June 17, 2023 12:55
@timmyjose
Copy link
Contributor Author

Keeping the PR open here while I work on the failing tests and clippy fixes, plus thorough testing to ensure existing behaviour is maintained.

Tests fixed. Will test thoroughly tomorrow.

@timmyjose timmyjose requested a review from calummoore June 17, 2023 12:58
  - Fixed incorrect path for TOML config file - moved to
    `root_dir/config/config.toml`
  - Removed `root_dir` option from TOML config (due to the above
    constraint)
@timmyjose
Copy link
Contributor Author

Fixed basic bug in config file location, tested. Ready for review.

@calummoore
Copy link
Contributor

One of the things I wanted to avoid, was having the clap/toml in two places, as that means you have to update to places for every config change. Is there any way around that?

@timmyjose
Copy link
Contributor Author

timmyjose commented Jun 19, 2023

One of the things I wanted to avoid, was having the clap/toml in two places, as that means you have to update to places for every config change. Is there any way around that?

(Assuming you are referring to the duplication of the Config struct in two places).

I did explore that as well while experimenting. I saw a few issues unifying the config into a single one:

The biggest potential change would be the API. Basically, all of the possible options would have to become Option types in the Config struct. The reason being that since TOML (as by extensions the toml crate) simply detects the absence of a key-value pair as a None value unlike clap. So, suppose we had pub log_format: LogFormat, clap would read it fine, but unless we had an entry like log_format = INFO in the TOML file, the toml parsing itself would fail since it could not find the log_format key-value pair.

NOTE: This may be possible if we simply read the contents of the TOML file as raw toml crate values (i.e., toml::Value values, and seek out the keys that we expect to be present. However, this would mean that we would have to make the TOML parsing "dynamic" instead of directly deserialising into the Config struct ("static"). thereby losing a degree of static type-checking. I can explore this option.

  - Changed the toml config logic to use the common `Config` struct.
  - Added better validation and error-handling for TOML.
@timmyjose
Copy link
Contributor Author

I made a few changes in my local branch:

  • changed the toml config module to use the common Config struct, and
  • added better TOML validation and schema.

However, from the usage perspective, if we make changes to the "core" (i.e., current) set of configurations, we will still need to make code changes in the toml config as well, but the changes to the Config struct will be common.

Ideally, it would have been nice to be able to extend clap's (derive mode) from:

    /// Log level
    #[arg(value_enum, long, env = "LOG_LEVEL", default_value = "INFO")]
    pub log_level: LogLevel,

to something like:

    /// Log level
    #[arg(value_enum, long, env = "LOG_LEVEL", toml = "log_level", default_value = "INFO")] // new `toml` attribute
    pub log_level: LogLevel,

but unfortunately clap does not, from what I could gather, provide any means of extending the arg attribute. There were discussions around adding support for it, but nothing came of it in the end.

@calummoore
Copy link
Contributor

calummoore commented Jun 19, 2023

Yeah, we could write our own directive macro that runs on top, something like this:

 #[toml(path = "log_level", default_value="INFO")]
 #[arg(value_enum, long, env = "LOG_LEVEL", default_value = "INFO")] // new `toml` attribute
 pub log_level: LogLevel,

Or another option could be to implement our own directive macro, that converts the struct into some code that generates clap code + toml code in one:

 #[our_macro(value_enum, long, env = "LOG_LEVEL", toml = "log_level", default_value = "INFO")] 
 pub log_level: LogLevel,

You might be able to get chat-gpt to help you with this (you have to be persistent with it, and keep asking for more detail):

https://sharegpt.com/c/6yFeUnt
https://sharegpt.com/c/1M4Rzeu

This might be quite complex to code though, so maybe do some initial investigations and report back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants