pending allocations stuck in pending state after adoption by a new deployment #23305
Comments
Hi @lattwood and thanks for raising this issue with the detail included. I'll add this to our backlog to investigate further. In the meantime, if you discover any other information or a concrete reproduction, please let us know.
More log spelunking: after this shows up in the logs on an agent/client, no further alloc updates occur, and the drain issue with the pending allocs also starts occurring.
Grabbed a goroutine stack dump and found a clue. The same node is blocked here, and has been for 1489 minutes, which ends up being just after the "error performing RPC to server" messages in the comment above: https://github.com/hashicorp/yamux/blob/6c5a7317d6e3b6e7f85db696d6c83ed353e7cb4c/stream.go#L145-L153 Maybe timeouts aren't getting set on retries? Edit: the deserialization methods for a jobspec are also evident in the stack trace.
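For anyone wanting to capture the same kind of dump: any Go program can emit one via `runtime/pprof`, and Nomad can expose equivalent pprof endpoints when debug options are enabled on the agent. The sketch below is generic Go, not Nomad-specific tooling.

```go
// Minimal, generic sketch of producing a goroutine dump like the one quoted
// above (plain Go, not Nomad's actual debug tooling).
package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// debug=2 prints a full stack per goroutine, including how long blocked
	// goroutines have been waiting (e.g. "[select, 1489 minutes]").
	pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
}
```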
Confirmed that hashicorp/yamux#127 fixes the issue. We'll be running a forked Nomad build until there's a release of Nomad with the yamux fix in it. Edit: idk if I'm out of line for saying this, but the yamux fix probably should be backported everywhere it's in use 😅
Did you have TLS configured for Nomad's RPC?
@schmichael yup. (Our Nomad agents are distributed around the world.)
Specifically to debug hashicorp/nomad#23305, but tests should probably run against multiple net.Conn implementations, as yamux is sensitive to net.Conn behaviors such as concurrent Read|Write|Close and what errors are returned.
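As a rough illustration of that suggestion, the harness below runs one test body against several net.Conn implementations. It is a generic sketch using only the standard library, not yamux's actual test suite.

```go
// Sketch: run the same test body against multiple net.Conn implementations,
// since behaviors like concurrent Read/Write/Close and the errors returned
// differ between transports.
package conncheck

import (
	"net"
	"testing"
)

type connPair func(t *testing.T) (client, server net.Conn)

func pipeConns(t *testing.T) (net.Conn, net.Conn) {
	c, s := net.Pipe() // in-memory, fully synchronous
	t.Cleanup(func() { c.Close(); s.Close() })
	return c, s
}

func tcpConns(t *testing.T) (net.Conn, net.Conn) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { ln.Close() })

	accepted := make(chan net.Conn, 1)
	go func() {
		s, _ := ln.Accept()
		accepted <- s
	}()
	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		t.Fatal(err)
	}
	s := <-accepted
	t.Cleanup(func() { c.Close(); s.Close() })
	return c, s
}

func TestWriteAfterPeerClose(t *testing.T) {
	for name, pair := range map[string]connPair{
		"pipe": pipeConns,
		"tcp":  tcpConns,
	} {
		t.Run(name, func(t *testing.T) {
			client, server := pair(t)
			// A session layer under test (e.g. yamux) would be exercised
			// here; this just shows that the error surfaced by Write
			// differs per transport.
			server.Close()
			_, err := client.Write([]byte("x"))
			t.Logf("%s: write after peer close returned: %v", name, err)
		})
	}
}
```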
Just saw another issue opened on yamux this week that could be responsible for what we're seeing here as well: hashicorp/yamux#133
In #20165 we fixed a bug where a partially configured `client.template` retry block would set any unset fields to nil instead of their default values. But this patch introduced a regression in the default values, so we were now defaulting to unlimited retries if the retry block was unset. Restore the correct behavior and add better test coverage in both the config parsing and the template configuration code. Ref: #20165 Ref: #23305 (comment)
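Purely illustrative of the bug class described in that commit message: this is not Nomad's actual config code, and the type, field names, and default values below are invented. When optional fields are pointers, a merge that forgets to fall back to defaults leaves unset fields nil, which downstream code can then interpret as unlimited retries.

```go
// Hypothetical sketch of the default-vs-nil merge bug class (names and
// values invented, not Nomad's real implementation).
package main

import "fmt"

// RetryConfig models a config block whose optional fields are pointers so
// that "unset by the user" (nil) is distinguishable from an explicit zero.
type RetryConfig struct {
	Attempts *int
	Backoff  *string
}

func defaultRetry() *RetryConfig {
	attempts, backoff := 12, "250ms" // example defaults only
	return &RetryConfig{Attempts: &attempts, Backoff: &backoff}
}

// merge overlays the user's partial block on top of the defaults, field by
// field. The regression described above amounts to skipping this fallback,
// so unset fields stayed nil and were later treated as unlimited.
func merge(def, user *RetryConfig) *RetryConfig {
	if user == nil {
		return def
	}
	out := *def
	if user.Attempts != nil {
		out.Attempts = user.Attempts
	}
	if user.Backoff != nil {
		out.Backoff = user.Backoff
	}
	return &out
}

func main() {
	one := 1
	user := &RetryConfig{Attempts: &one} // Backoff deliberately left unset
	merged := merge(defaultRetry(), user)
	fmt.Println(*merged.Attempts, *merged.Backoff) // 1 250ms
}
```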
Fix for the template config is up here: #25113
Hey @lattwood, just by way of follow-up: we're nudging along the folks who need to approve that go-msgpack PR. Once that's done I'll open a PR updating the dependency, and that PR will close this issue.
@tgross thanks for the update, completely understand.
So, turns out we put them in [...]. Would be really nice if Nomad loudly complained about stuff like that 😅
Accidentally closed.
It turns out when `net/rpc` calls `WriteResponse`, it [**doesn't do anything with the error**](https://github.com/golang/go/blob/cb47156e90cac6b1030d747f1ccd433c00494dfc/src/net/rpc/server.go#L359-L362)! (It does correctly propagate errors when `WriteRequest` is called, though.) This means this codec needs to handle connection disposal, because there's no way for someone using `net/rpc` to find out about the error and handle it; we're in the twilight zone. Additionally, because `net/rpc` is frozen upstream, there is a lack of desire to fix even the logging aspect, let alone the underlying defect: golang/go#71559

The `WriteResponse` and `WriteRequest` methods both use the same underlying `write` method, so this fix took three commits:

* adding the first test coverage to this package, to prove de-DRYing doesn't break anything
* de-DRYing by inlining `write` into `WriteResponse` and `WriteRequest`
* the actual fix, along with a test for it

This puts an end to two years of "spooky behaviour at a distance" on our large and geographically distributed Nomad cluster.

Fixes: hashicorp/nomad#23305
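To make the disposal pattern concrete, here is a rough sketch of a `net/rpc` server codec that closes its connection when a write fails. It is not the actual go-msgpack code: it uses `encoding/gob` as a stand-in encoder and invented names, but it shows why the codec has to do the cleanup itself when `net/rpc` throws the error away.

```go
// Sketch only: a ServerCodec that disposes of the connection on write error,
// since net/rpc discards the error returned from WriteResponse. The real
// codec in go-msgpack uses msgpack, not gob.
package rpccodec

import (
	"bufio"
	"encoding/gob"
	"io"
	"net/rpc"
)

type serverCodec struct {
	rwc io.ReadWriteCloser
	dec *gob.Decoder
	enc *gob.Encoder
	buf *bufio.Writer
}

func NewServerCodec(rwc io.ReadWriteCloser) rpc.ServerCodec {
	buf := bufio.NewWriter(rwc)
	return &serverCodec{
		rwc: rwc,
		dec: gob.NewDecoder(rwc),
		enc: gob.NewEncoder(buf),
		buf: buf,
	}
}

func (c *serverCodec) ReadRequestHeader(r *rpc.Request) error { return c.dec.Decode(r) }
func (c *serverCodec) ReadRequestBody(body interface{}) error { return c.dec.Decode(body) }
func (c *serverCodec) Close() error                           { return c.rwc.Close() }

func (c *serverCodec) WriteResponse(r *rpc.Response, body interface{}) error {
	if err := c.writeResponse(r, body); err != nil {
		// net/rpc does nothing with this error, so the codec must close the
		// connection itself; otherwise the peer's pending read blocks forever.
		c.Close()
		return err
	}
	return nil
}

func (c *serverCodec) writeResponse(r *rpc.Response, body interface{}) error {
	if err := c.enc.Encode(r); err != nil {
		return err
	}
	if err := c.enc.Encode(body); err != nil {
		return err
	}
	return c.buf.Flush()
}
```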
Fixes a bug where connections would not be closed on write errors in the msgpack encoder, which would cause the reader end of RPC connections to hang indefinitely. This resulted in clients in widely-distributed geographies being unable to poll for allocation updates. Fixes: #23305
Closed by #25201
Nomad version
Operating system and Environment details
All Linux
Issue
We scale task groups via `nomad job scale JOB TG COUNT` when it changes, and may run another `nomad job scale` command immediately afterwards.

Reproduction steps
Expected Result
Actual Result
Screenshots of Absurdity
In this screenshot, all the allocs are part of the same job, but different task groups. The colours correspond to the actual task groups. Note the varying versions, and that some have a client assigned and some do not.
Additionally, you can see the `Modified` time is the same for the ones staying up to date, but isn't changing on the other ones; you can also see the `Created` times are all over the place.

In this screenshot, you can see we have a pending allocation that has been rescheduled, and that rescheduled allocation is marked pending as well. Neither of the allocations has been assigned to a client as far as the Nomad WebUI is concerned.
Reporter's speculation
Maybe it has something to do with how the allocations are being adopted by the rapid deployments? This definitely reeks of a race condition.
Nomad logs
Useless, unfortunately, due to this bug: #22431