Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot override env var when creating model #180

Open
chbndrhnns opened this issue Oct 30, 2023 · 10 comments
Open

Cannot override env var when creating model #180

chbndrhnns opened this issue Oct 30, 2023 · 10 comments
Assignees
Labels
enhancement New feature or request

Comments

@chbndrhnns
Copy link

I expect a field value given while creating a model instance takes precedence over an existing environment variable.

It seems this is not possible here:

from pydantic import AnyHttpUrl, Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    abc: AnyHttpUrl = Field(validation_alias="my_abc")

    model_config = SettingsConfigDict(populate_by_name=True, extra="allow")


def test_(monkeypatch):
    monkeypatch.setenv("MY_ABC", "http://localhost.com/")

    assert (
        str(Settings(abc="http://prod.localhost.com/").abc)
        == "http://prod.localhost.com/"
    )

One workaround is to use the validation alias when creating the instance:

from pydantic import AnyHttpUrl, Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    abc: AnyHttpUrl = Field(validation_alias="my_abc")

    model_config = SettingsConfigDict(populate_by_name=True, extra="allow")


def test_(monkeypatch):
    monkeypatch.setenv("MY_ABC", "http://localhost.com/")

    assert (
        str(Settings(my_abc="http://prod.localhost.com/").abc)
        == "http://prod.localhost.com/"
    )
@hramezani
Copy link
Member

I expect a field value given while creating a model instance takes precedence over an existing environment variable.

This is right.

It seems this is not possible here:

It's because you initialize the field with its name. you have to use the validation_alias defined.

So, this example works:

from pydantic import AnyHttpUrl, Field
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    abc: AnyHttpUrl = Field(validation_alias="my_abc")



def test_(monkeypatch):
    monkeypatch.setenv("MY_ABC", "http://localhost.com/")

    assert (
        str(Settings(my_abc="http://prod.localhost.com/").abc)
        == "http://prod.localhost.com/"
    )

@chbndrhnns
Copy link
Author

Thanks for the reply!

If I understand you correctly, there is currently no way to instruct a model to look for a specific env var name but take the declared field name when working just in Python?

Looking at it together with some other features, this becomes even more strange to me, e.g. If I want to force upper case env vars, I need to rename the fields to uppercase and then violate pep8 aka use uppercase field names for class members?
That feels like we are leaking some OS-level implementation detail (env vars are typically upper case) right into Python.

@hramezani
Copy link
Member

Here is the data that comes to Pydantic
{'my_abc': 'http://localhost.com/', 'abc': 'http://prod.localhost.com/'}

and Pydantic picks the my_abc because of validation_alias. Actually, it is not on pydantic-settings side.

As I mentioned above you can initialize Settings class by alias like Settings(my_abc="http://prod.localhost.com/") then the data that comes to Pydantic would be {'my_abc': 'http://prod.localhost.com/'}.

@chbndrhnns
Copy link
Author

And how come that populate_by_name does not work in this case?

@hramezani
Copy link
Member

It does work. but Pydantic first collects value by validation_alias and then goes with the name of field if populate_by_name is enabled.

So, in your case, if the validation_alias doesn't present in the data(MY_ABC env variable), then pydantic picks the value by name.

@chbndrhnns
Copy link
Author

I would argue that when I explicitly pass a value for a field name, pydantic-settings should rather not look for environment variables at all. Because the current logic is:

  • provide pydantic with {"abc": "myval"}
  • if an env var MY_ABC exists:
    • provide pydantic with {"my_abc": "val"}
    • ignore any incoming field with the name abc

It's an unexpected side effect for me that an env var changes the parsing logic. Production code potentially needs to change based on some outside condition.

Are you open to discuss a change? I believe it can be local to pydantic-settings.
It would be like:

  • if a value is provided during instantiation for either alias or field name, ignore any env var.

@hramezani
Copy link
Member

hramezani commented Nov 2, 2023

Right now pydantic-settings has source classes like env source, dotenv source and init source is also one of the sources classes that does a simple thing and return all the initial params passed to class.

return self.init_kwargs

Source classes don't have any idea about each other and they just collect and return the data related to them and at the end pydantic decides about the data that has to be assigned to the field.

We have the source priority concept that you can change the priority of sources over each other. but in your case, you have multiple data with different names.

But this is a special case, honestly, I don't want to add source-specific logic to pydantic-settings main. So, the best way to implement is somehow pass the init data to other sources and let them decide to ignore collecting data based on the init data(which seems not good to me).

when looking at your setting class

class Settings(BaseSettings):
    abc: AnyHttpUrl = Field(validation_alias="my_abc")

    model_config = SettingsConfigDict(populate_by_name=True, extra="allow")

You defined a validation_alias which means that I want to collect data for my field with my_abc key. Also, by enabling the validation_alias config, you defined I want to be able to collect model fields by name as well. then you provided two values for the same field. IMO, pydantic-settings is responsible for collecting values from different sources and passing them to Pydantic.

I think the simple and correct way to do it is by passing my_abc in your setting class as I mentioned in my earlier comment.

BTW, I am not going to say I don't want to change it. just trying to think louder.
@samuelcolvin what is your opinion here?

@chbndrhnns
Copy link
Author

I am mostly arguing from a user perspective, knowing only a little about pydantic internals.

IMO, a strong argument is that env var names are currently coupled to field names (at least in my example). If I modify an env name, I need to update places that do not deal with env vars at all.

The documentation does not reflect this coupling well. It says:

I'll wait for other's opinions now.

@hramezani
Copy link
Member

@chbndrhnns FYI, after discussion with the Pydantic team we decided to do something clever in pydantic-settings.

So, PR welcome.

@matthewgrossman
Copy link

Anyone have a suggestion for a workaround in this issue? I have a similar issue:

class Settings(BaseSettings):
    model_config = SettingsConfigDict(populate_by_name=True, extra="allow")

    otel_endpoint: Optional[str] = Field(
        alias="OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", default=None
    )

When running this via my standard API server, it will initiate via settings = Settings(), and properly pull from the envvar specified in alias. This all works great.

However, there's sometimes instances where I want to manually instantiate settings (tests, scripts, etc), like

settings = Settings(otel_endpoint="http://localhost:4317")

As I understand it in this thread, doing the above right now is not possible; I must instead specify the kwarg as Settings(otel_exporter_otlp_traces_endpoint="http://localhost:4317").


Intuitively, I'd expect that when constructing the settings object manually in python, any kwargs I pass should:

  1. Match the actual variable on the BaseSettings itself; in this case the otel_endpoint, not the alias=
  2. Override any implicit envvar scraping

Few other thoughts:

  1. This functionality is exactly what I'd expect populate_by_name=True to do, so if we're open to special-casing BaseSettings, maybe that's the place to do it? Otherwise, what does that really do for settings classes?
  2. A simple workaround is to do AliasChoices where one of them is name of the var as well:
    otel_endpoint: Optional[str] = Field(
        validation_alias=AliasChoices("otel_endpoint", "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"), default=None
    )

Obviously this isn't particularly pretty, but has unblocked me for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants