.wait_for_idle says OK while charm is installing agent #1204

Open
dimaqq opened this issue Nov 22, 2024 · 4 comments
Labels
kind/bug indicates a bug in the project

Comments

@dimaqq
Contributor

dimaqq commented Nov 22, 2024

Description

Orfeas Kourkakis

[wait_for_idle() passes while apps are in waiting] As the title says. We also tried hardcoding all application names in the wait_for_idle() call, and that resulted in (missing) logs which timed out. Note that this happens intermittently, but very often. We have tracked some debugging attempts in this issue. Any clues about what could be causing this? (juju 3.4.6, pylibjuju 3.5.2.0)

Running juju.status with the sh package from inside the Python test file showed that, although the test succeeded, the charms were still in waiting status.
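A minimal sketch of that kind of cross-check, using the standard-library `subprocess` module instead of `sh` (a substitution; the original check used `sh`). It asks the Juju CLI for `juju status --format=json` and lists every unit whose workload status is not `active`. The helper names `juju_status` and `units_not_active` are hypothetical, and the JSON layout (`applications` → `units` → `workload-status`) is assumed from the CLI's JSON output; subordinate units are not inspected.

```python
from __future__ import annotations

import json
import subprocess


def juju_status(model: str | None = None) -> dict:
    """Return `juju status` as parsed JSON (requires the juju CLI on PATH)."""
    cmd = ["juju", "status", "--format=json"]
    if model:
        cmd += ["-m", model]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def units_not_active(status: dict) -> list[tuple[str, str, str]]:
    """List (unit, workload status, message) for every unit that is not active."""
    problems = []
    for app in status.get("applications", {}).values():
        # Assumption: principal units live under "units"; subordinates are skipped.
        for name, unit in (app.get("units") or {}).items():
            workload = unit.get("workload-status", {})
            current = workload.get("current", "unknown")
            if current != "active":
                problems.append((name, current, workload.get("message", "")))
    return problems
```

Calling `units_not_active(juju_status())` right after `wait_for_idle()` returns, and asserting that the list is empty, should surface a unit such as `kfp-ui/0` that is still `waiting` / `installing agent`.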

This is a simple test that deploys 14 charms and then makes this call:

    await ops_test.model.wait_for_idle(
        status="active",
        raise_on_blocked=False,  # These apps block while waiting for each other to deploy/relate
        raise_on_error=True,
        timeout=3600,
        idle_period=30,
    )

Nicolas Vinuesa

Just to be clear: the tests are passing even though the charms are not in active status yet?

I didn't understand whether the problem is that the charms are in waiting status or that the test passes.

Daniela Plascencia

The charms are in waiting status, and the test case that calls wait_for_idle passes because wait_for_idle apparently reports "charms are active and idle", but in reality they are not ^

So, after calling ops_test.model.wait_for_idle(), the charms are still in waiting status, with the message "installing agent".

Nicolas Vinuesa

Maybe it's an issue in pylibjuju?
Ping Dima Tisnek

Dima Tisnek

Let me create a python-libjuju issue for that.

Urgency

Annoying bug in our test suite

Python-libjuju version

3.5.2.0

Juju version

3.4.6

Reproduce / Test

# deploy 14 apps, then

    await ops_test.model.wait_for_idle(
        status="active",
        raise_on_blocked=False,  # These apps block while waiting for each other to deploy/relate
        raise_on_error=True,
        timeout=3600,
        idle_period=30,
    )

@dimaqq
Contributor Author

dimaqq commented Nov 22, 2024

Downstream: canonical/kfp-operators#601

@dimaqq
Contributor Author

dimaqq commented Nov 22, 2024

One possible cause is that Juju defines a large number of status values:

https://github.com/juju/juju/blob/3.6/core/status/status.go

While this library only processes these few:

    severities = {
        "error": 100,
        "blocked": 90,
        "waiting": 80,
        "maintenance": 70,
        "active": 60,
        "terminated": 50,
        "unknown": 40,
    }

This means that the calculation of "worst status across units" doesn't effectively take "installing" into account.

It's a supposition for now.
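To make the supposition concrete, here is a hypothetical sketch (not the library's actual code) of a "worst status across units" fold that copies the `severities` table above. A status missing from the table, such as `allocating` or `unset`, has no severity; in this sketch it is simply skipped, so it can never be reported as the worst. The function name `worst_status` and the skip-unknown rule are assumptions for illustration.

```python
severities = {
    "error": 100,
    "blocked": 90,
    "waiting": 80,
    "maintenance": 70,
    "active": 60,
    "terminated": 50,
    "unknown": 40,
}


def worst_status(statuses: list[str]) -> str:
    """Return the most severe known status, ignoring values not in `severities`.

    Hypothetical illustration: an unrecognised status contributes nothing,
    so a unit reporting it cannot drag the aggregate away from "active".
    """
    worst = "unknown"
    for status in statuses:
        if status not in severities:
            continue  # unrecognised status is silently dropped
        if severities[status] > severities[worst]:
            worst = status
    return worst


print(worst_status(["active", "waiting"]))     # waiting
print(worst_status(["active", "allocating"]))  # active
```

Under this assumed rule, a model whose only non-active unit reports a status outside the table would still aggregate to `active`.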

@orfeas-k

This means that the calculation of "worst status across units" doesn't effectively take "installing" into account.

Maybe I misunderstand something, but when wait_for_idle says OK, the charms are in waiting, as mentioned in canonical/kfp-operators#601. For example:

    kfp-ui                            waiting    0/1  kfp-ui                                   0                  no       installing agent
    ...
    kfp-ui/0                   waiting      allocating                      installing agent

@dimaqq
Contributor Author

dimaqq commented Nov 26, 2024

It's possible that there's an outright bug.

It's also possible that the Juju CLI shows a computed status, while the jrpc API reports the raw status "unset" in a case like this. After all, the workload cannot possibly report its status before the agent is installed.

(Edit: This library tries to offer a computed status, which I suspect was intended to be on par with the Juju CLI. The library is clearly behind the times in many respects, though.)
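A hypothetical sketch of that second theory: if the API returned a raw workload status of `unset` for a unit whose agent is still `allocating`, a client would need its own rule to turn that into what `juju status` prints (`waiting` / `installing agent`). The `Status` type, the function name, and the mapping itself are assumptions for illustration, not taken from Juju's source.

```python
from typing import NamedTuple


class Status(NamedTuple):
    current: str
    message: str = ""


def displayed_workload_status(raw: Status, agent: Status) -> Status:
    """Hypothetical: derive a CLI-style workload status from raw API values."""
    if raw.current == "unset" and agent.current == "allocating":
        # The charm cannot set its own status before its agent is installed,
        # so present the unit as waiting instead of leaving the status unset.
        return Status("waiting", "installing agent")
    return raw
```

A client that aggregates only `raw.current` would see `unset` here, not `waiting`, which is exactly the kind of gap the supposition describes.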

Note that Charm Tech is working on a new library for integration/e2e testing that will wrap the Juju CLI in a nice IDE-friendly typed API. Your solution of shelling out is probably the best way forward. Ping @benhoyt to be on the list of early adopters when we're ready for that.
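Until then, a shell-out replacement for `wait_for_idle()` might look like the sketch below. It polls a status callable (for instance, one that runs `juju status --format=json`) and returns only after every principal unit has reported workload `active` and agent `idle` for `idle_period` consecutive seconds. The function names, the polling interval, and the JSON keys (`workload-status`, `juju-status`) are assumptions; the clock and sleep are injected so the loop can be exercised without a controller.

```python
import time
from typing import Callable, Optional


def all_active_and_idle(status: dict) -> bool:
    """True if there is at least one unit and every one is active with an idle agent."""
    units = [
        unit
        for app in status.get("applications", {}).values()
        for unit in (app.get("units") or {}).values()
    ]
    return bool(units) and all(
        u.get("workload-status", {}).get("current") == "active"
        and u.get("juju-status", {}).get("current") == "idle"
        for u in units
    )


def wait_for_active_idle(
    get_status: Callable[[], dict],
    timeout: float = 3600,
    idle_period: float = 30,
    poll: float = 5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until all units stay active/idle for idle_period seconds, else raise."""
    deadline = clock() + timeout
    ok_since: Optional[float] = None  # when the desired state was first observed
    while clock() < deadline:
        if all_active_and_idle(get_status()):
            if ok_since is None:
                ok_since = clock()
            if clock() - ok_since >= idle_period:
                return
        else:
            ok_since = None  # any regression restarts the idle window
        sleep(poll)
    raise TimeoutError("units did not become active/idle before the timeout")
```

Taking `get_status` as a callable keeps the loop independent of how status is fetched (`subprocess`, `sh`, or a future typed CLI wrapper) and lets a fake clock drive it in unit tests.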

This bug will surely get looked at, I just can't be sure when.
