Fuzzy-matching Trajectory Cache Injectable Traits refactor 🔥🔥 #2941

Open
wants to merge 62 commits into
base: main

methylDragon
Contributor

@methylDragon methylDragon commented Jul 31, 2024

As requested in the original PR, this PR refactors the TrajectoryCache to allow users to inject their own behaviors (which will allow them to cache on and sort by any arbitrary feature, as long as it can be represented as a series of warehouse_ros columns).

Depends on:

Builds on top of:

TODOs:

  • Fix integration-test
  • Formatting pass (will be done after review)
  • Fix tutorials
  • Fix bugs

Preamble

I apologize that this PR is even larger than the one it builds on top of. Most of the added lines are docstrings or tests, and boilerplate to support the behavior injection pattern this refactor is concerned with.

On the other hand, the average file length has decreased, so the code is MUCH more modular and hopefully easy to digest.

I can't split up this PR into multiple smaller ones since technically speaking, in order to preserve cache functionality, all the feature extractors and cache insert policies introduced in this PR will need to go in together.

I would suggest looking at the tests to aid in review (they run the gamut of unit and integration tests).

You can also build and run the example:

PS: If the size is still prohibitive, and we are okay with partial implementations living in moveit2 while reviews are pending, let me know and I will split this into smaller PRs (though I suspect a logical split would end up being somewhere around 5-10 PRs).

Navigating This PR

Since this PR is so large, here is a proposed order for comprehending the PR.

  • Fully read this PR description and the example code in the description
  • Build and run the demo to convince yourself that the cache works (instructions in that PR)
  • Look at the new interfaces introduced
    • features/features_interface.hpp, cache_insert_policies/cache_insert_policy_interface.hpp
  • Then look at their implementations and tests
    • features/, cache_insert_policies/
  • Then look at the main TrajectoryCache class
    • trajectory_cache.hpp, trajectory_cache.cpp
  • Then tie it all together by looking at the example usage of the classes in this PR in the demo code in Add Trajectory Cache Example For Refactor moveit2_tutorials#940

Additionally, all docstrings are filled, including file ones, as appropriate. So hopefully any clarificatory questions would have already been pre-empted and answered.

Description

This PR builds on top of the fuzzy-matching Trajectory Cache introduced in:

The implementation in that PR coupled the cache tightly with assumptions about what features to extract and sort by (i.e., a specific set of start/goal constraints, and pruning by execution time.)

This means that users who might want to store different features or a subset of those features, or who might want to fetch and prune on different features (e.g., minimum jerk, path length, etc.) will be unable to.

This PR refactors the cache to allow users to inject their own feature extractors and pruning/insertion logic!

This is done by introducing two new abstract classes that can be injected into the cache methods, acting a lot like class "traits":

  • FeaturesInterface<>: Governs what query/metadata items to extract and append to the warehouse_ros query.
  • CacheInserterInterface<>: Governs the pruning and insertion logic.

For more details on FeaturesInterface, see the Docstrings: https://github.com/moveit/moveit2/blob/cc0feb37cf423076e133523ccdbbf3038b84a01e/moveit_ros/trajectory_cache/include/moveit/trajectory_cache/features/features_interface.hpp

Some notes:

  • I decided to not go with lambdas, because there should be tight coupling between the query and metadata insertion logic for a particular feature.
  • Similarly, cache insertion logic benefits heavily from being stateful, and from keeping those chunks of logic coupled together.
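
To make the two notes above concrete, here is a minimal sketch of why a feature is modelled as a class rather than a pair of lambdas. Everything in it is hypothetical: `ToyQuery`, `ToyMetadata`, `ToyRequest`, `ToyFeature`, and `ToyMaxVelocityFeature` are stand-ins invented for illustration, not the actual `FeaturesInterface<>` or warehouse_ros types (see the linked header for the real interface).

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a warehouse_ros metadata record and range query.
using ToyMetadata = std::map<std::string, double>;                  // column -> stored value
using ToyQuery = std::map<std::string, std::pair<double, double>>;  // column -> [lo, hi]

struct ToyRequest
{
  double max_velocity_scaling;
};

// One class owns BOTH directions for a feature, so the column name used when
// fetching cannot drift away from the one used when inserting.
template <typename RequestT>
class ToyFeature
{
public:
  virtual ~ToyFeature() = default;
  virtual void appendToQuery(ToyQuery& query, const RequestT& request) const = 0;
  virtual void appendToMetadata(ToyMetadata& metadata, const RequestT& request) const = 0;
};

class ToyMaxVelocityFeature final : public ToyFeature<ToyRequest>
{
public:
  explicit ToyMaxVelocityFeature(double tolerance) : tolerance_(tolerance) {}

  void appendToQuery(ToyQuery& query, const ToyRequest& request) const override
  {
    query[column()] =
        std::make_pair(request.max_velocity_scaling - tolerance_, request.max_velocity_scaling + tolerance_);
  }

  void appendToMetadata(ToyMetadata& metadata, const ToyRequest& request) const override
  {
    metadata[column()] = request.max_velocity_scaling;
  }

private:
  static std::string column() { return "max_velocity_scaling"; }
  double tolerance_;
};

// True if every queried column exists in the entry and lies within its range.
bool matches(const ToyQuery& query, const ToyMetadata& entry)
{
  for (const auto& kv : query)
  {
    const auto it = entry.find(kv.first);
    if (it == entry.end() || it->second < kv.second.first || it->second > kv.second.second)
      return false;
  }
  return true;
}

bool exampleRoundTrip()
{
  std::vector<std::unique_ptr<ToyFeature<ToyRequest>>> features;
  features.emplace_back(std::make_unique<ToyMaxVelocityFeature>(/*tolerance=*/0.05));

  ToyMetadata stored;
  ToyQuery query;
  for (const auto& feature : features)
  {
    feature->appendToMetadata(stored, ToyRequest{ 0.50 });  // insert side
    feature->appendToQuery(query, ToyRequest{ 0.52 });      // fetch side, slightly different request
  }
  return matches(query, stored);
}
```

Because the column name lives in exactly one place, a fetch built from a feature always targets the column that the same feature wrote on insert.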

Example

In other words, before this refactor, the cache was used like this:

auto traj_cache = std::make_shared<TrajectoryCache>(node);
traj_cache->init(/*db_host=*/":memory:", /*db_port=*/0, /*exact_match_precision=*/1e-6);

auto fetched_trajectory =
    traj_cache->fetchBestMatchingTrajectory(*move_group_interface, robot_name, motion_plan_req_msg,
                                            _cache_start_match_tolerance, _cache_goal_match_tolerance,
                                            /*sort_by=*/"execution_time_s", /*ascending=*/true);

if (fetched_trajectory)
{
  // Great! We got a cache hit
  // Do something with the plan.
}
else
{
  // Plan... And put it for posterity!
  traj_cache->insertTrajectory(
      *move_group_interface, robot_name, std::move(motion_plan_req_msg), std::move(res->result.trajectory),
      rclcpp::Duration(res->result.trajectory.joint_trajectory.points.back().time_from_start).seconds(),
      res->result.planning_time, /*delete_worse_trajectories=*/true);
}

Now the cache is used like this:

auto traj_cache = std::make_shared<TrajectoryCache>(node);
traj_cache->init(/*db_host=*/":memory:", /*db_port=*/0, /*exact_match_precision=*/1e-6);

std::vector<std::unique_ptr<FeaturesInterface<MotionPlanRequest>>> features;
features.emplace_back(std::make_unique<WorkspaceFeatures>());
features.emplace_back(std::make_unique<StartStateJointStateFeatures>(start_tolerance));
// ...

auto fetched_trajectory =
    traj_cache->fetchBestMatchingTrajectory(move_group, robot_name, motion_plan_req_msg,
                                            /*features=*/features,
                                            /*sort_by=*/TrajectoryCache::getDefaultSortFeature(),
                                            /*ascending=*/true);

// Or more simply, if you want the same feature set as before the refactor, instead of painfully listing the features one by one:
// Type: std::vector<std::unique_ptr<FeaturesInterface<MotionPlanRequest>>>
auto default_features = TrajectoryCache::getDefaultFeatures(_cache_start_match_tolerance, _cache_goal_match_tolerance);

if (fetched_trajectory)
{
  // Great! We got a cache hit
  // Do something with the plan.
}
else
{
  // Plan... And put it for posterity!
  //
  // NOTE: Now instead of passing a trajectory, pass the plan result,
  // it'll contain the execution time and planning time we need!
  //
  // cache_inserter is a CacheInserterInterface<MotionPlanRequest, MotionPlanResponse, msg::RobotTrajectory>
  // It will tell the trajectory cache:
  //   - how to fetch "matching entries"
  //   - how to determine if they should be pruned,
  //   - how to determine when to insert the candidate cache entry
  //   - and what metadata to attach 
  //
  // additional_features allows a user to further add more metadata features for use with fetching
  // though they will not be considered by the cache_inserter
  traj_cache->insertTrajectory(
      move_group, robot_name, std::move(motion_plan_req_msg), std::move(plan),
      /*cache_inserter=*/BestSeenExecutionTimePolicy(),
      /*prune_worse_trajectories=*/true, /*additional_features=*/{});
}

See the motion plan request features here: 79b7f95

The Feature Contract

Users may use an instance of FeaturesInterface<> to fetch a cache entry only if that feature was supported by the CacheInserterInterface<> instance used to insert the entry (or the feature was added via additional_features on insert).

I could not think of a way to create a coupling between uses of the cache inserters and the features. This is the cost of generality and allowing users to inject arbitrary logic into the cache.

As such, users must take care to check the docs of the cache inserter to see which features can be used with it.

(This can be mitigated by adding helper methods to get "standard" bundles of features and a "standard" CacheInserter.)
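
A minimal sketch of this contract, using a hypothetical in-memory store (`ToyCache`, `Entry`, `ColumnQuery`, and the column names are invented for illustration and are not part of this PR or warehouse_ros): an entry can only match a fetch if every queried column was written at insert time, either by the insert policy or as an additional feature.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Entry = std::map<std::string, double>;  // metadata columns of one cache entry

struct ColumnQuery
{
  std::string column;
  double center;
  double tolerance;
};

class ToyCache
{
public:
  // `policy_columns` models what the insert policy writes; `additional` models
  // extra metadata the caller opts into at insert time.
  void insert(const Entry& policy_columns, const Entry& additional = {})
  {
    Entry entry = policy_columns;
    entry.insert(additional.begin(), additional.end());
    entries_.push_back(entry);
  }

  // Counts entries where every queried column exists and is within tolerance.
  std::size_t countMatches(const std::vector<ColumnQuery>& queries) const
  {
    std::size_t hits = 0;
    for (const Entry& entry : entries_)
    {
      bool ok = true;
      for (const ColumnQuery& q : queries)
      {
        const auto it = entry.find(q.column);
        ok = ok && it != entry.end() && std::abs(it->second - q.center) <= q.tolerance;
      }
      hits += ok ? 1 : 0;
    }
    return hits;
  }

private:
  std::vector<Entry> entries_;
};

struct ContractResult
{
  std::size_t supported;    // fetch on a column the policy wrote
  std::size_t unsupported;  // fetch on a column nobody wrote
  std::size_t opted_in;     // same column, after inserting it as an additional feature
};

ContractResult demonstrateContract()
{
  ToyCache cache;
  cache.insert({ { "start_joint_0", 0.10 } });
  const std::size_t supported = cache.countMatches({ { "start_joint_0", 0.10, 0.01 } });
  const std::size_t unsupported = cache.countMatches({ { "payload_kg", 2.0, 0.5 } });

  cache.insert({ { "start_joint_0", 0.10 } }, { { "payload_kg", 2.0 } });
  const std::size_t opted_in = cache.countMatches({ { "payload_kg", 2.0, 0.5 } });
  return { supported, unsupported, opted_in };
}
```

In this toy model, fetching on an unsupported feature does not fail loudly; it simply never hits, which is why the pairing has to be checked against the inserter's documentation.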

Bonus

I added new features to the default feature extractors set and cleaned up some utilities!

There are now FeaturesInterface<> implementations that can handle path and trajectory constraints!
Multiple goal constraints are also handled (at the cost of increased cardinality.)

I also added "atomic" features that wrap the basic ops you can do with warehouse_ros, to allow users to specify their own metadata to tag cache entries with.

Here: cc0feb3
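
As a rough illustration of what such user-specified tagging could look like, the sketch below separates a metadata-only tag (written on insert, never constrains a fetch) from a query-only constraint (filters a fetch, writes nothing). The class names `ToyMetadataOnlyTag` and `ToyQueryOnlyEquals`, the `ToyRecord` type, and the `planner_id` key are assumptions made up for this example; the real implementations are in the commit referenced above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using ToyRecord = std::map<std::string, std::string>;  // hypothetical metadata/query record

// Tags an entry on insert but adds no constraint when fetching.
class ToyMetadataOnlyTag
{
public:
  ToyMetadataOnlyTag(std::string key, std::string value) : key_(std::move(key)), value_(std::move(value)) {}
  void appendToQuery(ToyRecord& /*query*/) const {}
  void appendToMetadata(ToyRecord& metadata) const { metadata[key_] = value_; }

private:
  std::string key_;
  std::string value_;
};

// Constrains a fetch to an exact value but writes nothing on insert.
class ToyQueryOnlyEquals
{
public:
  ToyQueryOnlyEquals(std::string key, std::string value) : key_(std::move(key)), value_(std::move(value)) {}
  void appendToQuery(ToyRecord& query) const { query[key_] = value_; }
  void appendToMetadata(ToyRecord& /*metadata*/) const {}

private:
  std::string key_;
  std::string value_;
};

// True if every queried key exists in the entry with an equal value.
bool matchesAll(const ToyRecord& query, const ToyRecord& entry)
{
  for (const auto& kv : query)
  {
    const auto it = entry.find(kv.first);
    if (it == entry.end() || it->second != kv.second)
      return false;
  }
  return true;
}

bool exampleTagging()
{
  ToyRecord entry;
  ToyMetadataOnlyTag("planner_id", "RRTConnect").appendToMetadata(entry);

  ToyRecord same_planner_query;
  ToyRecord other_planner_query;
  ToyQueryOnlyEquals("planner_id", "RRTConnect").appendToQuery(same_planner_query);
  ToyQueryOnlyEquals("planner_id", "PRM").appendToQuery(other_planner_query);
  return matchesAll(same_planner_query, entry) && !matchesAll(other_planner_query, entry);
}
```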

Pre-Existing Support

The package now provides some starter implementations that cover most general cases of motion planning.

For more information, see the implementations of:

  • FeaturesInterface
  • CacheInsertPolicyInterface

Cache Keying Features

The following are features of the plan request and response that you can key the cache on.

These support separately configurable fuzzy lookup on start and goal conditions!
Additionally, these features "canonicalize" the inputs (e.g., by restating poses relative to the planning frame) to reduce the cardinality of the cache, increasing the chances of cache hits.
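
The idea behind canonicalization plus fuzzy lookup can be sketched as follows. This is a simplified model, not the PR's implementation: `JointTarget`, `canonicalize`, and `fuzzyMatches` are hypothetical names, and canonicalization is reduced here to sorting joint targets by name so that permuted but equivalent requests compare equal.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using JointTarget = std::pair<std::string, double>;  // (joint name, position)

// Canonicalize: sort by joint name so permuted-but-equal requests look identical.
std::vector<JointTarget> canonicalize(std::vector<JointTarget> targets)
{
  std::sort(targets.begin(), targets.end(),
            [](const JointTarget& a, const JointTarget& b) { return a.first < b.first; });
  return targets;
}

// Fuzzy match: same joints in canonical order, each position within `tolerance`.
bool fuzzyMatches(const std::vector<JointTarget>& stored, const std::vector<JointTarget>& queried,
                  double tolerance)
{
  const std::vector<JointTarget> a = canonicalize(stored);
  const std::vector<JointTarget> b = canonicalize(queried);
  if (a.size() != b.size())
    return false;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    if (a[i].first != b[i].first || std::abs(a[i].second - b[i].second) > tolerance)
      return false;
  }
  return true;
}

struct FuzzyDemo
{
  bool loose;  // result with a 0.01 tolerance
  bool tight;  // result with a 0.001 tolerance
};

FuzzyDemo exampleFuzzyLookup()
{
  // Same targets in a different order, with small numeric differences.
  const std::vector<JointTarget> stored = { { "shoulder", 0.500 }, { "elbow", 1.000 } };
  const std::vector<JointTarget> queried = { { "elbow", 0.998 }, { "shoulder", 0.505 } };
  return { fuzzyMatches(stored, queried, 0.01), fuzzyMatches(stored, queried, 0.001) };
}
```

Passing different tolerances for the start and goal sides is what makes the lookup separately configurable; a looser tolerance trades match fidelity for more cache hits.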

Supported Features:

  • "Start"
    • WorkspaceFeatures: Planning workspace
    • StartStateJointStateFeatures: Starting robot joint state
  • "Goal"
    • MaxSpeedAndAccelerationFeatures: Max velocity, acceleration, and cartesian speed limits
    • GoalConstraintsFeatures: Planning request goal_constraints
      • This includes ALL joint, position, and orientation constraints (but not constraint regions)!
    • PathConstraintsFeatures: Planning request path_constraints
    • TrajectoryConstraintsFeatures: Planning request trajectory_constraints

Additionally, support for user-specified features is provided, covering query-only features and constant features that tag cache entries with metadata.

Similar support exists for the cartesian variants of these.

Cache Insert and Pruning Policies

The following are cache insertion and pruning policies to govern when cache entries are inserted, and how they are (optionally) pruned.

Supported Cache Insert Policies:

  • BestSeenExecutionTimePolicy: Only insert when the execution time is the best seen so far; optionally prune entries with worse execution times.
  • AlwaysInsertNeverPrunePolicy: Always insert, never prune
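
The difference between the two policies can be sketched on a toy model in which the cache entries matching a candidate are reduced to their execution times. The function names and the strict "better than every match" comparison are assumptions for illustration; the actual policy classes carry more logic (matching, metadata, and so on).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: execution times (seconds) of cached entries matching the candidate.
using MatchingEntries = std::vector<double>;

// Sketch of "best seen execution time": insert only if the candidate is strictly
// better than every matching entry; optionally prune the entries it beats.
bool insertBestSeen(MatchingEntries& entries, double candidate_s, bool prune_worse)
{
  const bool is_best =
      std::all_of(entries.begin(), entries.end(), [&](double t) { return candidate_s < t; });
  if (!is_best)
    return false;

  if (prune_worse)
  {
    entries.erase(std::remove_if(entries.begin(), entries.end(),
                                 [&](double t) { return t > candidate_s; }),
                  entries.end());
  }
  entries.push_back(candidate_s);
  return true;
}

// Sketch of "always insert, never prune".
bool insertAlways(MatchingEntries& entries, double candidate_s)
{
  entries.push_back(candidate_s);
  return true;
}
```

With matching entries of 2.0 s and 3.0 s, a 2.5 s candidate is rejected by `insertBestSeen`, while a 1.5 s candidate is inserted and, with pruning enabled, becomes the only remaining entry.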

Caveat

The increased functionality no longer has 100% test coverage, but I added tests where I had time to. I am unfortunately running out of time to iterate on this, so let's be targeted with the improvements!

@methylDragon methylDragon force-pushed the ch3/trajectory-cache-refactor branch 12 times, most recently from 91b48e8 to cc0feb3 Compare August 1, 2024 05:50
@methylDragon methylDragon changed the title (DO NOT MERGE) Ch3/trajectory cache refactor (WIP) Pluggable Fuzzy Matching TrajectoryCache refactor Aug 1, 2024
@methylDragon methylDragon changed the title (WIP) Pluggable Fuzzy Matching TrajectoryCache refactor (WIP) Pluggable fuzzy-matching Trajectory Cache refactor Aug 1, 2024
@methylDragon methylDragon changed the title (WIP) Pluggable fuzzy-matching Trajectory Cache refactor (WIP) Fuzzy-matching Trajectory Cache Traits refactor Aug 1, 2024
@methylDragon methylDragon changed the title (WIP) Fuzzy-matching Trajectory Cache Traits refactor (WIP) Fuzzy-matching Trajectory Cache Injectable Traits refactor Aug 1, 2024
@methylDragon
Contributor Author

methylDragon commented Aug 1, 2024

Quick question: clang-format/tidy is erroneously editing the template parameters.

How do I get around it?
Related bug: llvm/llvm-project#46097

Is it acceptable to throw in NOLINT directives?

@stephanie-eng
Contributor

You'd be far from the first use-case of NOLINT, but I would recommend judicious use of it.
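
One way to keep suppressions narrow is clang-tidy's `NOLINTNEXTLINE(<check>)` form, which silences a single named check on a single line, together with `// clang-format off` and `// clang-format on` for formatting. The sketch below assumes the offending check is `readability-identifier-naming`; that check name, `NamedFeature`, `FeatureSourceT`, and `kIdentity` are assumptions for illustration and are not taken from this PR.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Suppress one named check on exactly one line instead of using a blanket NOLINT.
// NOLINTNEXTLINE(readability-identifier-naming)
template <typename FeatureSourceT>
class NamedFeature
{
public:
  explicit NamedFeature(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

private:
  std::string name_;
};

// clang-format off
// Hand-aligned code that clang-format should leave untouched.
constexpr int kIdentity[2][2] = { { 1, 0 },
                                  { 0, 1 } };
// clang-format on
```

Naming the check in parentheses means any other diagnostic on that line still fires, which keeps the suppression easy to audit later.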

@methylDragon methylDragon force-pushed the ch3/trajectory-cache-refactor branch 9 times, most recently from 77789f3 to 1feb211 Compare August 3, 2024 10:11
@methylDragon
Contributor Author

Rebased!

@methylDragon
Contributor Author

CI is failing due to the jump_threshold arg being called on computeCartesianPlan. But the cache needs to consider it due to the field's presence in the GetCartesianPath service... How do I disable the warning?

mergify bot commented Sep 16, 2024

This pull request is in conflict. Could you fix it @methylDragon?

@methylDragon
Contributor Author

methylDragon commented Oct 17, 2024

Are we good to merge? (:
@sjahr

Quoting from before:

CI is failing due to the jump_threshold arg being called on computeCartesianPlan. But the cache needs to consider it due to the field's presence in the GetCartesianPath service... How do I disable the warning?

@sjahr
Contributor

sjahr commented Oct 18, 2024

🙈 I've been busy but maybe I'll get to it on the weekend, sorry. The API of computeCartesianPlan got changed #2916. Can you adapt the cache to the new API? I think you don't need to change the service for

@methylDragon
Contributor Author

Small poke @sjahr (:

Contributor

@sjahr sjahr left a comment

Nice work! Tests seem to work now, thanks.

@sjahr sjahr self-requested a review November 8, 2024 08:36
Contributor

@sjahr sjahr left a comment

Whoops, I did not read the log correctly. Any idea what might cause this error:

$ ( source /home/runner/work/moveit2/moveit2/.work/upstream_ws/install/setup.bash && cd /home/runner/work/moveit2/moveit2/.work/target_ws && colcon test-result --verbose; )
build/moveit_ros_trajectory_cache/Testing/20241107-2346/Test.xml: 7 tests, 0 errors, 1 failure, 0 skipped
- test_utils
  <<< failure message
    -- run_test.py: invoking following command in '/home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test':
     - /home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test/test_utils --gtest_output=xml:/home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test_results/moveit_ros_trajectory_cache/test_utils.gtest.xml
    [==========] Running 3 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from WarehouseFixture
    [ RUN      ] WarehouseFixture.QueryAppendCenterWithToleranceWorks
    [ERROR] [1731023211.703482413] [warehouse_ros_sqlite.query]: Preparing Query failed: no such column: M_unrelated_metadata
    [       OK ] WarehouseFixture.QueryAppendCenterWithToleranceWorks (14 ms)
    [----------] 1 test from WarehouseFixture (14 ms total)
    
    [----------] 2 tests from TestUtils
    [ RUN      ] TestUtils.GetExecutionTimeWorks
    [       OK ] TestUtils.GetExecutionTimeWorks (0 ms)
    [ RUN      ] TestUtils.ConstraintSortingWorks
    [       OK ] TestUtils.ConstraintSortingWorks (0 ms)
    [----------] 2 tests from TestUtils (0 ms total)
    
    [----------] Global test environment tear-down
    [==========] 3 tests from 2 test suites ran. (14 ms total)

https://github.com/moveit/moveit2/actions/runs/11733210217/job/32686999595

@methylDragon
Contributor Author

methylDragon commented Nov 9, 2024

Whoops, I did not read the log correctly. Any idea what might cause this error:

$ ( source /home/runner/work/moveit2/moveit2/.work/upstream_ws/install/setup.bash && cd /home/runner/work/moveit2/moveit2/.work/target_ws && colcon test-result --verbose; )
build/moveit_ros_trajectory_cache/Testing/20241107-2346/Test.xml: 7 tests, 0 errors, 1 failure, 0 skipped
- test_utils
  <<< failure message
    -- run_test.py: invoking following command in '/home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test':
     - /home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test/test_utils --gtest_output=xml:/home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test_results/moveit_ros_trajectory_cache/test_utils.gtest.xml
    [==========] Running 3 tests from 2 test suites.
    [----------] Global test environment set-up.
    [----------] 1 test from WarehouseFixture
    [ RUN      ] WarehouseFixture.QueryAppendCenterWithToleranceWorks
    [ERROR] [1731023211.703482413] [warehouse_ros_sqlite.query]: Preparing Query failed: no such column: M_unrelated_metadata
    [       OK ] WarehouseFixture.QueryAppendCenterWithToleranceWorks (14 ms)
    [----------] 1 test from WarehouseFixture (14 ms total)
    
    [----------] 2 tests from TestUtils
    [ RUN      ] TestUtils.GetExecutionTimeWorks
    [       OK ] TestUtils.GetExecutionTimeWorks (0 ms)
    [ RUN      ] TestUtils.ConstraintSortingWorks
    [       OK ] TestUtils.ConstraintSortingWorks (0 ms)
    [----------] 2 tests from TestUtils (0 ms total)
    
    [----------] Global test environment tear-down
    [==========] 3 tests from 2 test suites ran. (14 ms total)

https://github.com/moveit/moveit2/actions/runs/11733210217/job/32686999595

This is coming from this test of the warehouse_ros utils:

Query::Ptr unrelated_query = coll.createQuery();
moveit_ros::trajectory_cache::queryAppendCenterWithTolerance(*unrelated_query, "unrelated_metadata", 1.0, 10.0);
EXPECT_TRUE(coll.queryList(unrelated_query).empty());

This double-checks that a query for a metadata key that doesn't exist returns an empty result, so it is working as expected. I think the message is just warehouse_ros logging the failed lookup, since we're deliberately querying a non-existent column in the test.


If we're instead talking about this segfault on cleanup:

    Stack trace (most recent call last):
    #15   Object "", at 0xffffffffffffffff, in 
    #14   Object "/home/runner/work/moveit2/moveit2/.work/target_ws/build/moveit_ros_trajectory_cache/test/test_utils", at 0x55def50858a4, in _start
    #13   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f535e3f, in __libc_start_main
    #12   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f535d96, in 
    #11   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f55160f, in exit
    #10   Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f551494, in 
    #9    Object "/home/runner/work/moveit2/moveit2/.work/target_ws/install/moveit_ros_warehouse/lib/libmoveit_warehouse.so.2.11.0", at 0x7fe64f1fa814, in std::unique_ptr<warehouse_ros::DatabaseLoader, std::default_delete<warehouse_ros::DatabaseLoader> >::~unique_ptr()
    #8    Object "/opt/ros/humble/lib/libwarehouse_ros.so", at 0x7fe64ead80c1, in warehouse_ros::DatabaseLoader::~DatabaseLoader()
    #7    Object "/opt/ros/humble/lib/libwarehouse_ros.so", at 0x7fe64eae1748, in 
    #6    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a181d, in class_loader::MultiLibraryClassLoader::~MultiLibraryClassLoader()
    #5    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a177a, in class_loader::MultiLibraryClassLoader::shutdownAllClassLoaders()
    #4    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a15b5, in class_loader::MultiLibraryClassLoader::unloadLibrary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    #3    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a0bd2, in class_loader::ClassLoader::unloadLibraryInternal(bool)
    #2    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e9a0575, in class_loader::impl::unloadLibrary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, class_loader::ClassLoader*)
    #1    Object "/opt/ros/humble/lib/libclass_loader.so", at 0x7fe64e99e111, in class_loader::impl::findLoadedLibrary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
    #0    Object "/usr/lib/x86_64-linux-gnu/libc.so.6", at 0x7fe64f6a5a92, in 
    Segmentation fault (Address not mapped to object [0x55dba8977341])

I think it's an issue with the warehouse_ros database loader that is specific to humble, so not an issue with the tests here.
You can see that I'm not doing anything special with the warehouse_ros fixture the test uses.

@methylDragon
Contributor Author

@sjahr

Sorry to keep poking; is it possible to exclude the trajectory cache targets/tests from the humble CI? Or is there anything that I need to do on my end?

Contributor

@sjahr sjahr left a comment

I tried to find the cause of the error but have not succeeded so far, and since I'm short on time, I'd say just skip this test on humble. warehouse_ros throws when the test fixture is torn down 🤷 Memory management is not trivial, I guess 🙈
