test the NBE v2.0 prototype #48
Prior to installing v2, I downed the previous version (v1.16.2) with this result:
|
the WARNING is because you installed that NB with peripherals, but then you only "downed" the core components. Either pass the remaining compose files at "down" time, or add the flag. The ERROR is a known problem in v1: sometimes you need to "down" twice because, during the shutdown, the system-manager tries to self-heal without knowing that an intended shutdown is in progress. v2 solves that |
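For reference, a minimal sketch of passing all the compose files at "down" time (the peripheral overlay file name here is an assumption; use whichever overlay files you installed with):

```
# Hypothetical example: bring down core + peripheral components together.
# "docker-compose.usb.yml" is a placeholder for your actual peripheral overlay.
docker-compose -f docker-compose.yml -f docker-compose.usb.yml down
```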
We need to decide on how to guide users migrating from v1 -> v2. Options include:
Other options? |
Error when adding a new SSH key:
|
was this in push or pull mode? |
Push |
Ok, found it. This was a bug introduced in the job-engine and job-engine-lite, which has since been fixed, but it's still in a PR: nuvla/job-engine#188 Since it's a blocking bug, I've applied the fix to master already and released a new version of the job-engine. The fix has been applied to NBE v1.16.2 and v2.0.0, so all jobs in pull mode should work. For jobs in push mode, this fix needs to be deployed in nuvla.io (wait for the next release) |
My NB is reported as offline, but it isn't. Looking at the logs of the
Is it related? |
Update v2.0.0 -> v2.0.0 fails:
See details: job/07891fa5-4b9d-4853-ba1e-97b36042bfe5 The working directory is set to |
Update v2.0.0 -> v2.0.0 timed out
Noticed that the VPN container didn't come back:
|
After the above failure, the |
Supervise the agent. On a shaky network, the agent fails to start if it can't reach Nuvla. |
fixed |
hmm, do you still have this NB? what's the ID? |
With the latest agent:
|
I think your OS is acting up again, with a Python issue similar to the one in the past... it's claiming we are using Python 2, but we aren't |
|
that's your host, not the system-manager container |
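A quick way to check where the Python 2 report comes from (a sketch; the container name is an assumption and may differ on your install):

```
# Host interpreter (older distros may still map "python" to Python 2).
python --version
# Interpreter inside the system-manager container (name is a placeholder).
docker exec nuvlabox-system-manager python3 --version
```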
With
|
Re-launching the command worked. |
fixed then |
|
nuvlabox/89a2923e-8baf-4936-91ea-15d443d01da8 |
working-dir is set to |
This is where I ran the docker-compose command from. The /opt/nuvlabox folder didn't exist, and I got an error saying the folder does not exist:
|
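If the install expects that path, a simple workaround (a sketch, assuming the default /opt/nuvlabox directory mentioned above) is to create it before running compose:

```
# Create the expected working directory before bringing the stack up.
sudo mkdir -p /opt/nuvlabox
```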
Failed leaving cluster:
|
and as a result:
|
Agent died:
and
|
can you reproduce this? |
This is interesting. It is a Docker issue that can happen. The question is: what state did your NB end up in? I see you've followed up your comment with a DEGRADED state (which is normal when your host changes modes), but is it still DEGRADED? |
also, isn't the system-manager restarting the agent? Can you check both logs? |
Here's the system-manager logs:
The NB is in the same degraded state. |
As for the working directory:
|
can you try to pull and re-deploy the nuvladev/system-manager:master? It would be nice if we could reproduce this issue (leaving the cluster)... |
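A minimal sketch of that pull-and-redeploy, assuming the service is named system-manager in the compose files you installed with:

```
# Fetch the latest dev image and recreate only the system-manager service.
docker pull nuvladev/system-manager:master
docker-compose -f docker-compose.yml up -d --force-recreate system-manager
```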
Attempting to join a cluster fails with this error: Job nuvlabox_cluster_join_worker failed |
This was "hydro-engine-9" joining "Studio KOH NuvlaBox Lionel test". Both are single-node cluster managers. |
Sharing a NB that is a manager doesn't share the cluster it belongs to. I think it should. |
well, this is server-side, and basically just means you didn't "pass" a token for the NB to join. The clustering action wasn't even attempted. This can happen for one of two possible reasons:
|
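For context, the underlying Docker Swarm mechanics that the cluster-join action drives look roughly like this (standard Docker CLI, shown for illustration only, not the Nuvla API itself):

```
# On the manager: print the token a worker needs to join the swarm.
docker swarm join-token worker
# On the joining node: use that token (placeholder values shown).
docker swarm join --token <WORKER-TOKEN> <MANAGER-IP>:2377
```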
Yes, I think so too. Please put it as a TODO feature ticket on the api-server |
How to test
use https://swarm.nuvla.io
clone https://github.com/nuvlabox/deployment, and you'll find a `docker-compose.test.yml` file under the `test` folder. Use that one to install the NBE.
All v2.0 related development is in `clustering` branches, for the following repos: `api-server`, `job-engine`, `ui`, `agent`, `system-manager`, `deployment`, `on-stop`
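A sketch of the install sequence implied above (branch and folder names taken from the instructions; any extra environment variables your setup needs are not shown):

```
# Clone the deployment repo and use the test compose file to install the NBE.
git clone https://github.com/nuvlabox/deployment.git
cd deployment
git checkout clustering          # v2.0 work lives in the clustering branches
docker-compose -f test/docker-compose.test.yml up -d
```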
What to test
Please provide your feedback on the following topics:
ability to update. Has it improved? Does it work from v1.x to v2?
deployment of apps into a cluster. Confirm that deployments are only listed under the managing NB they were deployed to, and not under the actual NBs where they are running. Feel free to propose a solution for this
use of the Data Gateway. Try to use MQTT communication when the NB is a manager, a worker, or a standalone node (see the sketch after this list)
test the NB actions, both the old ones and the new cluster operation
try fiddling with the NBs physically (reboot them, cut their network, etc.) and see if the cluster recovers by itself
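For the Data Gateway item above, a minimal MQTT check (a sketch; the data-gateway broker host, port 1883, and the nuvlabox-status topic are assumed NBE defaults and may differ on your install):

```
# Subscribe to the NB's telemetry topic from a container on the same network.
# Host, port, and topic name are assumed defaults; adjust to your setup.
mosquitto_sub -h data-gateway -p 1883 -t nuvlabox-status
```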
needs #54 (We need to decide on how to guide users migrating from v1 -> v2)