Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FCL collision checking crashes during motion #3083

Closed
rr-tom-noble opened this issue Nov 11, 2024 · 17 comments
Closed

FCL collision checking crashes during motion #3083

rr-tom-noble opened this issue Nov 11, 2024 · 17 comments
Labels
bug Something isn't working

Comments

@rr-tom-noble
Copy link
Contributor

rr-tom-noble commented Nov 11, 2024

Description

In some situations (I haven't been able to find a common cause), the move group crashes when executing a motion. Backtracing with gdb suggests that the crash is occuring within the FCL collision checker.

ROS Distro

Humble

OS and version

Ubuntu 22.04 (via docker)

Source or binary build?

Both
(source for moveit_core and pilz_industrial_motion_planner, binaries for everything else)

If binary, which release version?

2.5.5
(for everything besides moveit_core and pilz_industrial_motion_planner)

If source, which branch?

https://github.com/rr-tom-noble/moveit2/tree/bugfix/pilz-fix (humble plus #3077, #3055, #3070)
(only for moveit_core and pilz_industrial_motion_planner)

Which RMW are you using?

FastRTPS

Steps to Reproduce

Unsure, as the error appears to be sporadic. I'll update with details as I investigate.

Expected behavior

Valid motion plans should not crash the move group

Actual behavior

When attempting a motion, the move group (sometimes) crashes. Backtracing with gdb suggests that this is happening within the FCL collision checker.

Backtrace or Console output

Image

@rr-tom-noble rr-tom-noble added the bug Something isn't working label Nov 11, 2024
@TSNoble
Copy link
Contributor

TSNoble commented Nov 11, 2024

Some additional findings:

  • Our robot performs a move on start-up. This motion sometimes crashes and sometimes succeeds.
    • The crash seems to be related to collision objects, so I wonder if the success case occurs when the robot moves before the object have loaded
    • I believe our xacro contains some typos in names when disabling collisions. I wonder if the collision checker is looking up an object that doesn't exist, and dereferencing a nullptr somewhere. I'll fix the typos tomorrow and test.
  • When we manually attempt to move the robot to a pose, the motion consistently crashes.
  • I don't have the logs to hand at the moment, but there was a message indicating the plan succeeded before the crash, so it may be execution that's crashing

@rr-tom-noble
Copy link
Contributor Author

I just checked and the typos in collision disabling have been fixed, so that rules out that as a cause.

I'm fairly confident the issue in in these lines, since the backtrace suggests the crash occurs within std::_Rb_tree::find(), which is the underlying type of std::set. active_components_only_ uses a std::set.

These are the logs leading up to the failure:

[ns_motion_motion_server-30] [INFO] [1731398504.992854780] [move_group_interface]: MoveGroup action client/server ready
[move_group-2] [INFO] [1731398504.993094281] [moveit_move_group_default_capabilities.move_action_capability]: Received request
[move_group-2] [INFO] [1731398504.993187273] [moveit_move_group_default_capabilities.move_action_capability]: executing..
[ns_motion_motion_server-30] [INFO] [1731398504.993182201] [move_group_interface]: Planning request accepted
[move_group-2] [INFO] [1731398505.025682103] [moveit_move_group_default_capabilities.move_action_capability]: Planning request received for MoveGroup action. Forwarding to planning pipeline.
[move_group-2] [INFO] [1731398505.025921357] [moveit_move_group_capabilities_base.move_group_capability]: Using planning pipeline 'pilz_industrial_motion_planner'
[move_group-2] [INFO] [1731398505.025978619] [moveit.pilz_industrial_motion_planner.trajectory_generator_ptp]: Initialized Point-to-Point Trajectory Generator.
[move_group-2] [INFO] [1731398505.025998497] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Generating PTP trajectory...
[move_group-2] [INFO] [1731398505.026005017] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Validating request...
[move_group-2] [INFO] [1731398505.026015528] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Request validated.
[move_group-2] [INFO] [1731398505.026020302] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Validating command specific request...
[move_group-2] [INFO] [1731398505.026024897] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Command specific request validated.
[move_group-2] [INFO] [1731398505.026088138] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Extracting motion plan info...
[move_group-2] [INFO] [1731398505.026095275] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Motion plan info extracted.
[move_group-2] [INFO] [1731398505.026099727] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Planning trajectory...
[move_group-2] [INFO] [1731398505.026106010] [moveit.pilz_industrial_motion_planner.trajectory_generator_ptp]: Goal already reached, set one goal point explicitly.
[move_group-2] [INFO] [1731398505.026113184] [moveit.pilz_industrial_motion_planner.trajectory_generator]: Trajectory planned.
[move_group-2] [INFO] [1731398505.026253898] [moveit_move_group_default_capabilities.move_action_capability]: Motion plan was computed successfully.
[ns_motion_motion_server-30] [INFO] [1731398505.026461550] [move_group_interface]: Planning request complete!
[ns_motion_motion_server-30] [INFO] [1731398505.026839870] [move_group_interface]: time taken to generate plan: 0.000130243 seconds
[move_group-2] [INFO] [1731398505.027109540] [moveit_ros.trajectory_execution_manager]: Received event 'stop'
[move_group-2] [INFO] [1731398505.059315208] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Received goal request
[move_group-2] [INFO] [1731398505.059413687] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution request received
[move_group-2] [INFO] [1731398505.059433556] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.059449191] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.059493330] [moveit_ros.trajectory_execution_manager]: Validating trajectory with allowed_start_tolerance 0.01
[ns_motion_motion_server-30] [INFO] [1731398505.059393960] [move_group_interface]: Execute request accepted
[move_group-2] [INFO] [1731398505.092037996] [moveit_ros.trajectory_execution_manager]: Starting trajectory execution ...
[move_group-2] [INFO] [1731398505.092065627] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.092078071] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.092422814] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: sending trajectory to fake_manipulator_controller
[ros2_control_node-4] [INFO] [1731398505.092516203] [fake_manipulator_controller]: Received new action goal
[ros2_control_node-4] [INFO] [1731398505.092557186] [fake_manipulator_controller]: Accepted new action goal
[move_group-2] [INFO] [1731398505.092727880] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: fake_manipulator_controller started execution
[move_group-2] [INFO] [1731398505.092740935] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Goal request accepted!
[ros2_control_node-4] [INFO] [1731398505.258340515] [fake_manipulator_controller]: Goal reached, success!
[move_group-2] [INFO] [1731398505.292939760] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Controller 'fake_manipulator_controller' successfully finished
[move_group-2] [INFO] [1731398505.392107285] [moveit_ros.trajectory_execution_manager]: Completed trajectory execution with status SUCCEEDED ...
[move_group-2] [INFO] [1731398505.392147886] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution completed: SUCCEEDED
[ns_motion_motion_server-30] [INFO] [1731398505.392399971] [move_group_interface]: Execute request success!
[move_group-2] [INFO] [1731398505.575211349] [moveit_ros.trajectory_execution_manager]: Received event 'stop'
[move_group-2] [INFO] [1731398505.592394086] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Received goal request
[move_group-2] [INFO] [1731398505.592457884] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution request received
[move_group-2] [INFO] [1731398505.592475875] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.592502318] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.592531202] [moveit_ros.trajectory_execution_manager]: Validating trajectory with allowed_start_tolerance 0.01
[ns_motion_motion_server-30] [INFO] [1731398505.592511270] [move_group_interface]: Execute request accepted
[move_group-2] [INFO] [1731398505.625645950] [moveit_ros.trajectory_execution_manager]: Starting trajectory execution ...
[move_group-2] [INFO] [1731398505.625676937] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.625686477] [moveit.plugins.moveit_simple_controller_manager]: Returned 2 controllers in list
[move_group-2] [INFO] [1731398505.626042684] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: sending trajectory to fake_manipulator_controller
[ros2_control_node-4] [INFO] [1731398505.626138961] [fake_manipulator_controller]: Received new action goal
[ros2_control_node-4] [INFO] [1731398505.626179536] [fake_manipulator_controller]: Accepted new action goal
[move_group-2] [INFO] [1731398505.626258316] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: fake_manipulator_controller started execution
[move_group-2] [INFO] [1731398505.626268719] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Goal request accepted!
[ros2_control_node-4] [INFO] [1731398505.691653330] [fake_manipulator_controller]: Goal reached, success!
[move_group-2] [INFO] [1731398505.726385311] [moveit.simple_controller_manager.follow_joint_trajectory_controller_handle]: Controller 'fake_manipulator_controller' successfully finished
[move_group-2] [INFO] [1731398505.825666208] [moveit_ros.trajectory_execution_manager]: Completed trajectory execution with status SUCCEEDED ...
[move_group-2] [INFO] [1731398505.825708487] [moveit_move_group_default_capabilities.execute_trajectory_action_capability]: Execution completed: SUCCEEDED
[ns_motion_motion_server-30] [INFO] [1731398505.825938128] [move_group_interface]: Execute request success!
[ERROR] [move_group-2]: process has died [pid 63, exit code -11, cmd '/opt/ros/humble/lib/moveit_ros_move_group/move_group --ros-args -r __ns:=/move_group --params-file /tmp/launch_params_bnavf_bb --params-file /tmp/launch_params_m7qcuz37 --params-file /tmp/launch_params_42mjwpxx --params-file /tmp/launch_params_1u87_tmt --params-file /tmp/launch_params_c8b4stge --params-file /tmp/launch_params_ebym3ekg --params-file /tmp/launch_params_hai0u7e8 --params-file /tmp/launch_params_ky6e47uj --params-file /tmp/launch_params_67fn9m32 --params-file /tmp/launch_params_3ev073xe'].
[move_group-2] 

@rr-tom-noble
Copy link
Contributor Author

rr-tom-noble commented Nov 12, 2024

Ah actually, our version of moveit builds moveit_core and pilz_industrial_motion_planner from source, with some fixes from recent PRs. The branch is https://github.com/rr-tom-noble/moveit2/tree/bugfix/pilz-fix, which is based on humble plus #3077, #3055, #3070. None of those PRs touch collision checking and I believe FCL runs after the motion has been planned, so I think this represents a new issue we're running into now that those PRs have fixed our usage of pilz and the plan is reaching the collision checking stage

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

Backtrace with line numbers:


Thread 22 "move_group" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f7a7fe00640 (LWP 672)]
0x00007f7aa4b8cd79 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 
(gdb) bt
#0  0x00007f7aa4b8cd79 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f7aa4271fd7 in std::char_traits<char>::compare (
    __n=<optimized out>, __s2=<optimized out>, 
    __s1=0x6177616b736179 <error: Cannot access memory at address 0x6177616b736179>) at /usr/include/c++/11/bits/char_traits.h:389
#2  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare (this=0x556107bbe0c0, __str="diagonal_wall", this=0x556107bbe0c0, 
    __str="diagonal_wall") at /usr/include/c++/11/bits/basic_string.h:2879
#3  std::operator< <char, std::char_traits<char>, std::allocator<char> > (
    __rhs="diagonal_wall", 
    __lhs=<error: Cannot access memory at address 0x6177616b736179>)
    at /usr/include/c++/11/bits/basic_string.h:6343
#4  std::operator< <std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__y={...}, __x={...})
    at /usr/include/c++/11/bits/stl_pair.h:490
#5  std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::operator() (__y={...}, __x={...}, 
    this=0x7f7a7fdfecf8) at /usr/include/c++/11/bits/stl_function.h:400
#6  std::_Rb_tree<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::pair<std::__cxx11::basic_string<c--Type <RET> for more, q to quit, c to continue without paging--
har, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > >, std::_Select1st<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::_M_lower_bound (
    __k={...}, __y=0x7f7a7fdfed00, __x=0x556107bbe0a0, this=0x7f7a7fdfecf8)
    at /usr/include/c++/11/bits/stl_tree.h:1905
#7  std::_Rb_tree<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > >, std::_Select1st<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits--Type <RET> for more, q to quit, c to continue without paging--
<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::find (
    this=this@entry=0x7f7a7fdfecf8, __k={...})
    at /usr/include/c++/11/bits/stl_tree.h:2523
#8  0x00007f7aa426df29 in std::map<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> >, std::less<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, std::allocator<std::pair<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, std::vector<collision_detection::Contact, std::allocator<collision_detection::Contact> > > > >::find (
    __x={...}, this=0x7f7a7fdfecf8) at /usr/include/c++/11/bits/stl_map.h:1170
#9  collision_detection::collisionCallback (o1=<optimized out>, 
    o2=0x7f7a38002000, data=0x7f7a7fdfe960)
--Type <RET> for more, q to quit, c to continue without paging--
    at /root/workspace/src/moveit2/moveit_core/collision_detection_fcl/src/collision_common.cpp:167
#10 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#11 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#12 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7
#13 0x00007f7aa34a185e in ?? () from /lib/x86_64-linux-gnu/libfcl.so.0.7

#14 0x00007f7aa427e8ef in collision_detection::CollisionEnvFCL::checkSelfCollisionHelper (this=0x7f7a6c0e2b60, req=..., res=..., state=..., acm=0x5561080b1350)
    at /root/workspace/src/moveit2/moveit_core/collision_detection_fcl/src/collision_env_fcl.cpp:269
#15 0x00007f7a9e34902a in move_group::MoveGroupStateValidationService::computeService(std::shared_ptr<rmw_request_id_s> const&, std::shared_ptr<moveit_msgs::srv::GetStateValidity_Request_<std::allocator<void> > > const&, std::shared_ptr<moveit_msgs::srv::GetStateValidity_Response_<std::allocator<void> > > const&) ()
   from /opt/ros/humble/lib/libmoveit_move_group_default_capabilities.so
#16 0x00007f7a9e3455bf in ?? ()
   from /opt/ros/humble/lib/libmoveit_move_group_default_capabilities.so
#17 0x00007f7aa4fcc376 in ?? () from /opt/ros/humble/lib/librclcpp.so
#18 0x00007f7aa4fc9d5a in rclcpp::Executor::execute_service(std::shared_ptr<rclcpp::ServiceBase>) () from /opt/ros/humble/lib/librclcpp.so
#19 0x00007f7aa4fca0c6 in rclcpp::Executor::execute_any_executable(rclcpp::AnyExecutable&) () from /opt/ros/humble/lib/librclcpp.so
#20 0x00007f7aa4fd134a in rclcpp::executors::MultiThreadedExecutor::run(unsigned--Type <RET> for more, q to quit, c to continue without paging--
 long) () from /opt/ros/humble/lib/librclcpp.so
#21 0x00007f7aa4d18253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#22 0x00007f7aa4a87ac3 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x00007f7aa4b18a04 in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) 

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

Probably not related to the current problem, but line 199 of collision_common.cpp is suspicious:

      const std::pair<std::string, std::string>& pc = cd1->getID() < cd2->getID() ?
                                                          std::make_pair(cd1->getID(), cd2->getID()) :
                                                          std::make_pair(cd2->getID(), cd1->getID());

Why is this storing the output of std::make_pair by reference?

@rhaschke
Copy link
Contributor

@rr-mark, indeed collision_common.cpp:199 is wrong.
However, probably it's unrelated to the observed segfault, which occurs earlier, in collision_common.cpp:167.

The crash is also unrelated to execution: The log clearly shows that execution was successfully finished.
Only later, in MoveGroupStateValidationService, the crash occurs. Do you explicitly call this service?

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

Yes, we call /move_group/check_state_validity for checking pre-planned trajectories are still valid.

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

And yes, you're right, it is during that service call that it crashes.

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

I'm adding some debug prints now to try and work out what's going wrong.

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

It looks like cdata->res_->contacts is being used uninitialised. Printing cdata->res_->contacts.size() gives 97998116997072
(I'm assuming this is just random memory), and trying to iterate across cdata->res_->contacts throws

terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_M_construct null not valid

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

Both cdata and cdata->res_ look like valid pointers.

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

(Fixing line 199 did not fix the crash)

@rhaschke
Copy link
Contributor

Objects are automatically initialized by calling the default constructor. I think cdata->res_->contacts is corrupted (or released) after creation. You need to figure out why/when.

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

I'm confused.

  • The CollisionResult is default initialised in MoveGroupStateValidationService::computeService.
  • In PlanningScene::checkCollision, res.contacts.size() is already junk.

I'm just rejigging my dependency stack so I can get debugging and prints from moveit_ros.

@rr-mark
Copy link

rr-mark commented Nov 12, 2024

Hmm, adding moveit_ros.move_group to our list of source dependencies seems to have fixed the problem. Maybe this is a problem with our mix of source and binary dependencies, rather than a problem with moveit.

@rhaschke
Copy link
Contributor

Probably. The collision checking code is rather old and segfaults weren't reported for quite some time...

@rr-tom-noble
Copy link
Contributor Author

As @rr-mark mentioned, building the additional packages seems to have fixed the issue. I think the cause was that our binaries are from 2.5.5, whereas our build-from-source packages are based on the moveit2 humble branch, which contains backports, so we were likely linking between incompatible shared objects.

Closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants