Add auto-reconnect to ROS for the robot browser #76

Merged
merged 6 commits into from
Jul 18, 2024

hello-amal
Collaborator

@hello-amal hello-amal commented Jul 16, 2024

Description

This PR depends on roslibjs#1 in our fork of roslibjs. For backwards compatibility, I have also ported that code locally.

This PR addresses the following related bugs:

  1. If the robot browser launches before rosbridge is running, it will never join the robot room, which means the operator will never be allowed to join the operator room.
  2. If the robot browser launches too quickly after rosbridge, it sometimes invokes the /get_joint_states service before the service is actually ready, which causes the call to hang. After that point, rosbridge receives no other service invocation or topic publication from the web app, and the only way to resolve it is to run pm2 restart start_robot_browser.

This PR addresses them, respectively, by:

  1. Adding an auto-reconnect every 1 second if the ROS connection gets closed or encounters an error.
  2. This was resolved with two changes:
    1. Having rosbridge launch services in a separate thread, so that even if one service invocation hangs, it doesn't harm other rosbridge communications.
    2. Waiting until several key topics have at least one publisher, before the robot browser proceeds.
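
The reconnect and wait-for-publishers behaviors described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `RosLike` is a hypothetical interface covering only the `on` and `connect` methods of a roslibjs `Ros` object, and `enableAutoReconnect`, `waitForPublishers`, `PublisherQuery`, and the injected `schedule` parameter are names invented for this example. It assumes that calling `connect` again on the same object opens a fresh socket, and that the caller supplies its own way of counting publishers on a topic.

```typescript
// Hypothetical sketch; names and structure are assumptions, not the PR's code.

// Subset of a roslibjs-style `Ros` object that this sketch relies on.
interface RosLike {
  on(event: "connection" | "error" | "close", handler: () => void): void;
  connect(url: string): void;
}

// Injected timer so the logic can be exercised without real delays.
type Scheduler = (fn: () => void, delayMs: number) => unknown;

// Caller-supplied query: reports how many publishers a topic currently has.
type PublisherQuery = (topic: string, done: (publisherCount: number) => void) => void;

// Fix 1: retry the connection `delayMs` after it closes or errors.
function enableAutoReconnect(
  ros: RosLike,
  url: string,
  delayMs: number = 1000,
  schedule: Scheduler = (fn, ms) => setTimeout(fn, ms),
): void {
  let retryPending = false;

  const scheduleRetry = (): void => {
    // An "error" is often followed by a "close"; queue only one retry for both.
    if (retryPending) return;
    retryPending = true;
    schedule(() => {
      retryPending = false;
      ros.connect(url);
    }, delayMs);
  };

  ros.on("error", scheduleRetry);
  ros.on("close", scheduleRetry);
  ros.connect(url);
}

// Fix 2.2: poll until every key topic has at least one publisher, then call `onReady`.
function waitForPublishers(
  topics: string[],
  query: PublisherQuery,
  onReady: () => void,
  pollMs: number = 500,
  schedule: Scheduler = (fn, ms) => setTimeout(fn, ms),
): void {
  const poll = (): void => {
    let pending = topics.length;
    let allHavePublishers = true;
    if (pending === 0) {
      onReady();
      return;
    }
    topics.forEach((topic) => {
      query(topic, (publisherCount) => {
        if (publisherCount < 1) allHavePublishers = false;
        pending -= 1;
        // Decide only once every topic in this round has reported back.
        if (pending === 0) {
          if (allHavePublishers) onReady();
          else schedule(poll, pollMs);
        }
      });
    });
  };
  poll();
}
```

In a browser, this might be used as `enableAutoReconnect(ros, "ws://localhost:9090")`, where 9090 is rosbridge's default WebSocket port and the host is a placeholder. Note that `waitForPublishers` stalls if a query never calls back, so a real implementation would likely want a per-query timeout.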

Testing procedure

  • Recreate the issue: Pull the code on master.
    • To recreate the first issue, add an artificial delay in launching rosbridge by adding import time; time.sleep(5.0) to web_interface.launch.py before it returns the launch description, then re-build the workspace. Run ./launch_interface.sh, load the operator browser, and verify that it shows only the loading icon forever (e.g., even if you refresh after 5 seconds, once rosbridge has loaded, it still won't load).
    • Recreating the second issue is difficult. I'd recommend repeatedly running ./launch_interface.sh and ./stop_interface until you experience the issue on the operator browser. A tell-tale sign of this issue is that the tool-specific features you expect to be enabled (e.g., click-to-pregrasp) are not (because that means rosbridge did not return the tool parameter). I had to run the interface 7 times before experiencing the issue, but once I did, I verified that no commands from the web app were going through to rosbridge. At that stage, even if you refresh the page, ROS commands still don't go through.
  • Verify the solution: Pull the code from this branch and re-build your workspace. Also, update npm's roslibjs dependency to point to the branch in roslibjs#1 (or the same branch if that has been merged in).
    • Add the artificial delay in the launchfile (see above) and re-build the workspace. Run ./launch_interface.sh. Load the operator interface asap. Verify that it shows the loading icon initially, but after ~5 seconds when rosbridge loads, it loads correctly (without needing you to refresh the page).
    • Remove the artificial delay and re-build. Launch the interface and verify all video streams are live and motion commands are executed. Attach to the screen session (screen -r web_teleop_ros) and terminate it (Ctrl-c). Verify the video streams have stopped updating. Now, launch a new screen session (screen -dm -S "web_teleop_ros" ros2 launch stretch_web_teleop web_interface.launch.py). Verify that without needing to refresh the operator page, the video streams start back up again and commands start executing again.
    • Repeat the following 10 times: launch the interface, load the operator browser, and then stop the interface. Verify that the aforementioned issue never arises (in the few cases where the page doesn't show the features you expect, refreshing the operator interface should fix it).

Before opening a pull request

From the top-level of this repository, run:

  • pre-commit run --all-files

To merge

  • Squash & Merge

@hello-amal hello-amal changed the title from "Address the bug where initial ROS communications don't work" to "Add auto-reconnect to ROS for the robot browser" Jul 16, 2024
@hello-amal hello-amal merged commit 18b18a4 into master Jul 18, 2024
1 check passed
@hello-amal hello-amal deleted the bugfix_no_initial_ros_comms branch July 18, 2024 18:21