Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPEs are sometimes not populated - solvable changed to 0 #96

Closed
Affie opened this issue Apr 6, 2022 · 10 comments · Fixed by #221
Closed

PPEs are sometimes not populated - solvable changed to 0 #96

Affie opened this issue Apr 6, 2022 · 10 comments · Fixed by #221
Assignees
Labels
bug Something isn't working downstream
Milestone

Comments

@Affie
Copy link
Member

Affie commented Apr 6, 2022

I have a graph with one of the variables not producing a PPE and the result is skewed.

vars[5] = Dict{String, Any}(
"label" => "x1", 
"ppes" => Any[], 
"variableType" => "RoME.Pose2", 
"tags" => Any["VARIABLE"], 
"timestamp" => "2022-04-04T08:28:01.159Z", 
"_version" => "0.18.1")

image

My theory is: solvable is changed to 0 and therefore it's not included in the solve, hence no ppe.

Reference sessions:
robotId = "SDKjl_f97d"
sessionId = "Tutorial2_b376"
and
robotId = SDKjl_cdfe
sessionId = Tutorial1_5bfa

@jim-hill-r
Copy link
Contributor

I have seen this once before. @dehann if a solve crashed mid-way would it be possible solve was set to zero and never reverted? This is my first guess at the cause.

@dehann
Copy link
Member

dehann commented Apr 7, 2022

if the solve crashed then all bets are off. We currently have no safety net for this. I think we should do two fixes for these cases:

  • Status to user the solve failed and a new one is restarting -- results might be skewed message,
  • Something in the solver to overcome potential skewness -- this is a little tricky, as the easiest is to always start the solve from an init backup solveKey. But if it crashed, it may well just crash again at the same place.

(not for today) This will naturally lead to the need for the cloud system to compile a crash report with the session name and the call stack error message.

@GearsAD
Copy link
Contributor

GearsAD commented Apr 7, 2022

Do we set solvable=0 inside the solver during solving though? If so, why do we do this?

Was interesting, because I can't think of a reason we would do that

@jim-hill-r
Copy link
Contributor

Do we set solvable=0 inside the solver during solving though? If so, why do we do this?

Was interesting, because I can't think of a reason we would do that

I have no idea if we are. It was just a theory and likely a red herring, hence the question.

if the solve crashed then all bets are off.

Solvers writing to the production db need to be transactional/atomic in the cloud. Let's do an engineering session on how to make that possible.

But back to the actual issue at hand. Does the solver write solvable=0 at any point or is that a red herring?

@Affie
Copy link
Member Author

Affie commented Apr 7, 2022

But back to the actual issue at hand. Does the solver write solvable=0 at any point or is that a red herring?

I checked and our theory in the discussion looks correct.

If a variable is orphaned, ie with no connected factors, it is marked as solvable=0.

So I think it is as we suspect that a solve is triggered with the factor pushed to the back of the queue.

@Affie
Copy link
Member Author

Affie commented Apr 7, 2022

It is also possible to cause an error from the solver. I'll open an issue with IIF.

@jim-hill-r
Copy link
Contributor

Nice find Johan. So as long as we fail on waitForCompletion, we should be good for ICRA. I think handling this in Caesar would be the long term fix yes? Not sure the fix but something like ignore the node but leave it solvable...

@Affie
Copy link
Member Author

Affie commented Apr 7, 2022

Luckily it wasn't too hard to test, IIF emits a warning:
Warning: solveTree! dissallows solvable variables without any connected solvable factors -- forcing solvable=0 on [:x1]

So as long as we fail on waitForCompletion, we should be good for ICRA.

I think so, yes.

I think handling this in Caesar would be the long term fix yes?

I'll think about it more in the morning. We should also catch islands with no priors on them JuliaRobotics/IncrementalInference.jl#1512.

@jim-hill-r
Copy link
Contributor

If there is an issue on orphans I can move the conversation there, but I like the idea of solvable being, "the user wants this to be included" and maybe a new field called "orphaned" that is the solver engine saying, "I can't use this, sorry, maybe next time". As we think through this, we should consider how these fields might be used for "ghost" variables in the context of out of order factor injection. In the end maybe we end up with three fields? Solvable, Orphaned, and Ghost (ie placeholder).

@dehann
Copy link
Member

dehann commented Apr 10, 2022

maybe a new field called "orphaned" that is the solver engine saying, "I can't use this, sorry, maybe next time"

I'd avoid a dedicated orphan flag and rather use the field i previously developed (not in the new GQL versions yet) which back then was called backendselect. orphaned status is already stored in the graph structure and don't want to duplicate state.

The solve should only touch variables for which the intersect exists, solvable=1 (i.e. what the user wants), and backendset=1 which is atomically flashed to 1 in DB just before solve on all nodes in that solver scope. When we recently started talking about this again, it was a bit more more complicated with the addition of multiple solver on solveKeys. In terms of MVP, we can assume there will be only one solver per session, so the backendset logic will work. When we have two solvers working the graph, more will need to be done, which will pull solveKeys into the conversation.

BTW, being able to solve orphaned graph fragments is a requirement. The issue is if a graph has no priors to gauge/ground the solution. There is no problem with two disconnected graph fragments, each with their own priors. The CSM logic in the tree caters for that case.

Check this out DFG v0.2: https://github.com/JuliaRobotics/DistributedFactorGraphs.jl/blob/9f0834d71e43c9db9ef20adaaa1adc103f76ed29/src/entities/DFGVariable.jl#L93
Also, `ready` later became `solvable`

So as long as we fail on waitForCompletion, we should be good for ICRA.

I think so, yes.

Yes, that would be the best way to work for ICRA.

Do we set solvable=0 inside the solver during solving though? If so, why do we do this?
Was interesting, because I can't think of a reason we would do that
I have no idea if we are. It was just a theory and likely a red herring, hence the question.

Strictly speaking, solvable should only be under the user's control. If the solver is bumping it to 0, then it's a corner case or workaround. However, the solver should NEVER automatically bump solvable=1.

The idea here is the user should be able to peacefully modify the graph with variables and factors, while the solver is working hard on parts of the graph the user has deemed solvable.

if the solve crashed then all bets are off.

Perhaps I should clarify, if there is a corner case where the solver did bump down a solvable=0 (and I'm not immediately remembering if detected priorless graph fragments are bumped), and the solve crashed because there was a priorless graph fragment failed numerically. It is possible (bets off part) that when calling the solve a second time on the graph would then find that some of these bumped nodes are now solvable=0. I'd have to really dig to remember if this is true or not.

However, calling solveGraph on the same graph over and over will either break at the same place, or for the same reason if solvable=0 needs to happen in many places. The idea is that if the user pushed in a bad graph with solvable=1, don't be surprised if the solver is struggling. I agree, good error messages will help. Attempt 1:

Luckily it wasn't too hard to test, IIF emits a warning:
Warning: solveTree! dissallows solvable variables without any connected solvable factors -- forcing solvable=0 on [:x1]

@Affie Affie added this to the v0.6.0 milestone Apr 19, 2023
@Affie Affie linked a pull request Apr 19, 2023 that will close this issue
12 tasks
@Affie Affie added bug Something isn't working downstream labels Apr 19, 2023
@Affie Affie self-assigned this Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working downstream
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants