Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Top-level loop for compiler #1576

Open
wants to merge 23 commits into
base: master
Choose a base branch
from
Open

Top-level loop for compiler #1576

wants to merge 23 commits into from

Conversation

wmdi
Copy link
Collaborator

@wmdi wmdi commented Jan 15, 2025

Description of changes:
Implement the top-level loop for compiler.

TODO:

  • Unit tests

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

Before merging:

  • Did you update the flexflow-third-party repo, if modifying any of the Cmake files, the build configs, or the submodules?

This change is Reviewable

@wmdi wmdi added the compiler Compiler library label Jan 15, 2025
@wmdi wmdi requested a review from lockshaw January 15, 2025 22:09
Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 25 of 46 files at r1, 19 of 19 files at r2, all commit messages.
Reviewable status: 44 of 46 files reviewed, 27 unresolved discussions (waiting on @wmdi)


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 54 at r2 (raw file):

        for (parallel_layer_guid_t layer : get_parallel_layers(pcg)) {
          // NOTE(@wmdi): Assume layers with the same key have the same allowed
          // machine views

It would be cleaner to instead bake the allowed machine views into the leaves of the MachineMappingProblemTree itself to avoid any ambiguity, so that the allowed_machine_views function isn't even taken as an argument, but is instead all handled by get_machine_mapping_problem_tree

Suggestion:

          // NOTE(@wmdi): Assume layers with the same key have the same allowed
          // machine view

lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 59 at r2 (raw file):

        }
        return mapping;
      }();

Replace with generate_map from utils/containers.h

Code quote:

  std::unordered_map<UnmappedOpCostEstimateKey, parallel_layer_guid_t>
      mapping_from_unmapped_op_cost_estimate_key_parallel_layer = [&] {
        std::unordered_map<UnmappedOpCostEstimateKey, parallel_layer_guid_t>
            mapping;
        for (parallel_layer_guid_t layer : get_parallel_layers(pcg)) {
          // NOTE(@wmdi): Assume layers with the same key have the same allowed
          // machine views
          mapping.insert(
              {get_unmapped_op_cost_estimate_key_for_layer(pcg, layer), layer});
        }
        return mapping;
      }();

lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 63 at r2 (raw file):

  MachineMappingCache cached_subgraph_costs = MachineMappingCache{
      {},
  };

Suggestion:

  MachineMappingCache cached_subgraph_costs = empty_machine_mapping_cache();

lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 91 at r2 (raw file):

    }

    PCGBinarySPDecomposition sp_decomp = maybe_sp_decomp.value();

Or alternatively, add a function

template <typename T>
T expect(std::optional<T> const &x, std::string const &err) {
  return unwrap(x, []() { throw mk_runtime_error(err); });
}

to optional.h and then use expect instead of unwrap for a slightly cleaner syntax

Suggestion:

    if (!maybe_sp_decomp.has_value()) {
      throw std::runtime_error("Fail to SP-ize PCG");
    }

    PCGBinarySPDecomposition sp_decomp = unwrap(
      get_pcg_balanced_binary_sp_decomposition(pcg),
      []() {
        throw std::runtime_error("Failed to get SP decomposition of PCG"); 
      },
    );

lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 95 at r2 (raw file):

    MachineMappingConstraints constraints = MachineMappingConstraints{
        /*machine_views=*/{},
    };

Suggestion:

    MachineMappingProblemTree problem_tree = get_machine_mapping_problem_tree(pcg, sp_decomp);
    MachineMappingConstraints constraints = 
      get_unconstrained_solution_for_layers(get_all_leaf_paths(problem_tree));

lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 109 at r2 (raw file):

    } else {
      runtime_with_optimal_mm = mm_result.raw_result.value().runtime;
    }

Pull out into a separate get_runtime_cost(MachineMappingResult const &) that's responsible for this

Code quote:

    float runtime_with_optimal_mm;
    if (mm_result.raw_result == std::nullopt) {
      runtime_with_optimal_mm = std::numeric_limits<float>::infinity();
    } else {
      runtime_with_optimal_mm = mm_result.raw_result.value().runtime;
    }

lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 155 at r2 (raw file):

      /*machine_mapping=*/optimize_pcg(best_state.pcg).second,
  };
}

Would be nice to pull a bunch of this out into a separate generic backtracking_search function. This function could either take std::functions as arguments for the operations, or you could use dtgen to create a struct of std::functions along the lines of GenericBinarySPDecompositionTreeImplementation. Would just make things a lot cleaner to read as the core search algorithm wouldn't be all mixed up with the domain-specific logic

Code quote:

  GraphOptimizeState best_state = optimize_pcg(pcg).first;
  candidates.push(best_state);

  for (int iteration = 0;
       !candidates.empty() && iteration < search_config.budget;
       ++iteration) {
    GraphOptimizeState current_state = candidates.top();
    candidates.pop();

    if (current_state < best_state) {
      best_state = current_state;
    } else if (current_state.runtime_with_optimal_mm >
               best_state.runtime_with_optimal_mm * search_config.alpha) {
      continue;
    }

    for (ParallelComputationGraph const &new_pcg :
         all_pcgs_obtained_by_applying_a_substitution(current_state.pcg,
                                                      substitutions)) {
      std::optional<GraphOptimizeState> new_pcg_optimize_result =
          optimize_pcg(new_pcg).first;
      if (new_pcg_optimize_result == std::nullopt) {
        continue;
      }
      GraphOptimizeState new_state = new_pcg_optimize_result.value();
      if (new_state.runtime_with_optimal_mm <= search_config.threshold &&
          get_nodes(new_pcg.raw_graph).size() <= search_config.max_num_ops) {
        candidates.push(new_state);
      }
    }
  }

  return SearchResult{
      /*pcg=*/best_state.pcg,
      /*machine_mapping=*/optimize_pcg(best_state.pcg).second,
  };
}

lib/pcg/include/pcg/parallel_computation_graph/parallel_computation_graph.h line 71 at r2 (raw file):

ParallelComputationGraph
    parallel_computation_graph_from_computation_graph(ComputationGraph const &);

Add unit tests for this


lib/compiler/include/compiler/machine_mapping/machine_mapping_problem_tree/get_machine_mapping_problem_tree_result.struct.toml line 0 at r2 (raw file):
I'm not seeing this actually used anywhere? Should this be removed?


lib/compiler/include/compiler/compiler.h line 21 at r2 (raw file):

                      SearchAlgorithm,
                      AlgorithmConfig const &,
                      DeviceType);

AlgorithmConfig already specifies the search algorithm to be used, so the extra SearchAlgorithm parameter is unnecessary

Suggestion:

SearchResult optimize(ComputationGraph const &,
                      MachineSpecification const &,
                      CostEstimator const &,
                      AlgorithmConfig const &,
                      DeviceType);

lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 19 at r2 (raw file):

}

MachineMapping get_machine_mapping_from_machine_mapping_result(

Add tests


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 24 at r2 (raw file):

  BinarySPDecompositionTree sp_tree =
      binary_sp_tree_from_pcg_sp_tree(sp_decomposition);

Ideally operate over a PCGBinarySPDecomposition instead (the underlying functions just require a GenericBinarySPDecompositionTreeImplementation, which is available for PCGBinarySPDecomposition here:

GenericBinarySPDecompositionTreeImplementation<PCGBinarySPDecomposition,
PCGBinarySeriesSplit,
PCGBinaryParallelSplit,
parallel_layer_guid_t>
generic_impl_for_pcg_sp_tree();


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 26 at r2 (raw file):

      binary_sp_tree_from_pcg_sp_tree(sp_decomposition);

  auto get_layer_from_path =

Make this a top-level function in PCGBinarySPDecomposition


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 31 at r2 (raw file):

        binary_sp_decomposition_tree_get_subtree_at_path(sp_tree, path);
    if (!subtree_optional.has_value()) {
      throw std::runtime_error("Invalid tree path");

Suggestion:

    throw std::runtime_error(fmt::format("Invalid tree path {}", path));

lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 35 at r2 (raw file):

    BinarySPDecompositionTree subtree = subtree_optional.value();
    if (!subtree.is_node()) {
      throw std::runtime_error("Invalid tree path to a leaf");

Suggestion:

    throw std::runtime_error(fmt::format("Invalid tree path to a leaf: found {} instead", subtree));

lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 38 at r2 (raw file):

    }
    return parallel_layer_guid_t{
        subtree.get<Node>(),

Suggestion:

subtree.require_leaf(),

lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 53 at r2 (raw file):

  }

  return MachineMapping{mm};

Replace with map_keys from utils/containers

Code quote:

  std::unordered_map<parallel_layer_guid_t, MachineView> mm;

  if (mm_result.raw_result) {
    FeasibleMachineMappingResult const &feasible_mm_result =
        mm_result.raw_result.value();
    for (auto const &[path, mv] :
         feasible_mm_result.machine_mapping.raw_mapping) {
      mm.insert({get_layer_from_path(path), mv});
    }
  }

  return MachineMapping{mm};

lib/compiler/include/compiler/unity_algorithm/graph_optimize_state.h line 12 at r2 (raw file):

                     float runtime_with_optimal_mm);

  ParallelComputationGraph pcg;

Why the change from GraphOptimizeResult to ParallelComputationGraph?


lib/compiler/include/compiler/unity_algorithm/unity_algorithm.h line 17 at r2 (raw file):

                            std::vector<Substitution> const &substitutions,
                            UnitySearchConfig const &search_config,
                            DeviceType device_type);

Remove this argument and just hardcode DeviceType::GPU for now--everything in this function is almost certainly going to assume GPU anyway, so this parameter is kind of a lie

Code quote:

                            DeviceType device_type);

lib/pcg/src/pcg/operator_task_space.cc line 41 at r2 (raw file):

OperatorTaskSpace get_operator_task_space(ParallelComputationGraph const &pcg,
                                          parallel_layer_guid_t const &layer) {
  NOT_IMPLEMENTED();

Use the version from #1565


lib/compiler/src/compiler/unity_algorithm/graph_optimize_state.cc line 31 at r2 (raw file):

  // https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash.html#networkx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash
  size_t seed = 0;
  auto layers = topological_ordering(state.pcg);

Suggestion:

 std::vector<parallel_layer_guid_t> layers = topological_ordering(state.pcg);

lib/compiler/src/compiler/unity_algorithm/graph_optimize_state.cc line 33 at r2 (raw file):

  auto layers = topological_ordering(state.pcg);
  ::FlexFlow::hash_combine(seed, layers.size());
  for (auto layer : layers) {

Suggestion:

for (parallel_layer_guid_t const &layer : layers) {

lib/compiler/src/compiler/unity_algorithm/graph_optimize_state.cc line 35 at r2 (raw file):

  for (auto layer : layers) {
    ::FlexFlow::hash_combine(seed, get_parallel_layer_attrs(state.pcg, layer));
    auto inputs = get_incoming_tensors(state.pcg, layer);

Suggestion:

    std::vector<parallel_tensor_guid_t> inputs = get_incoming_tensors(state.pcg, layer);

lib/compiler/src/compiler/unity_algorithm/graph_optimize_state.cc line 39 at r2 (raw file):

    for (auto input : inputs) {
      for (size_t i = 0; i < layers.size(); ++i) {
        if (get_source_layer(input) == layers[i]) {

Suggestion:

        if (get_source_layer(input) == layers.at(i)) {

lib/compiler/include/compiler/search_result.struct.toml line 8 at r1 (raw file):

includes = [
  "pcg/parallel_computation_graph/parallel_computation_graph.h",
  "machine_mapping/machine_mapping.h",

Prefer absolute includes

Suggestion:

  "compiler/machine_mapping/machine_mapping.h",

lib/compiler/src/compiler/compiler.cc line 6 at r2 (raw file):

namespace FlexFlow {

SearchResult optimize(ComputationGraph const &computation_graph,

This type doesn't match the one declared in the header?


lib/compiler/src/compiler/compiler.cc line 15 at r2 (raw file):

    case SearchAlgorithm::DATA_PARALLEL:
      throw std::runtime_error(
          "Data parallel search algorithm is not implemented yet");

Can you take care of adding this in another PR? Created an issue to track it: #1577


lib/compiler/src/compiler/compiler.cc line 28 at r2 (raw file):

    }
    default:
      throw std::runtime_error("Unknown search algorithm");

Suggestion:

      throw std::runtime_error(fmt::format("Unknown search algorithm {}", search_algorithm));

Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 7.14286% with 130 lines in your changes missing coverage. Please review.

Project coverage is 60.40%. Comparing base (de7fa32) to head (efc7a9a).

Files with missing lines Patch % Lines
...er/src/compiler/unity_algorithm/unity_algorithm.cc 0.00% 70 Missing ⚠️
...er/src/compiler/machine_mapping/machine_mapping.cc 0.00% 21 Missing ⚠️
...c/compiler/unity_algorithm/graph_optimize_state.cc 34.61% 17 Missing ⚠️
lib/compiler/src/compiler/compiler.cc 0.00% 13 Missing ⚠️
...compiler/machine_mapping/machine_mapping_result.cc 0.00% 3 Missing ⚠️
...el_computation_graph/parallel_computation_graph.cc 0.00% 2 Missing ⚠️
lib/utils/include/utils/optional.h 0.00% 2 Missing ⚠️
...decomposition_tree/binary_sp_decomposition_tree.cc 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1576      +/-   ##
==========================================
- Coverage   60.94%   60.40%   -0.55%     
==========================================
  Files         618      619       +1     
  Lines       14978    15068      +90     
==========================================
- Hits         9129     9102      -27     
- Misses       5849     5966     +117     
Flag Coverage Δ
unittests 60.40% <7.14%> (-0.55%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...de/compiler/unity_algorithm/graph_optimize_state.h 100.00% <ø> (ø)
.../compiler/machine_mapping/allowed_machine_views.cc 95.31% <ø> (ø)
...ping_problem_tree/unmapped_op_cost_estimate_key.cc 100.00% <100.00%> (ø)
...el_computation_graph/parallel_computation_graph.cc 76.42% <0.00%> (-1.27%) ⬇️
lib/utils/include/utils/optional.h 66.66% <0.00%> (-33.34%) ⬇️
...decomposition_tree/binary_sp_decomposition_tree.cc 93.75% <0.00%> (-6.25%) ⬇️
...compiler/machine_mapping/machine_mapping_result.cc 92.95% <0.00%> (-4.11%) ⬇️
lib/compiler/src/compiler/compiler.cc 0.00% <0.00%> (ø)
...c/compiler/unity_algorithm/graph_optimize_state.cc 34.61% <34.61%> (ø)
...er/src/compiler/machine_mapping/machine_mapping.cc 16.00% <0.00%> (-84.00%) ⬇️
... and 1 more

... and 4 files with indirect coverage changes

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 44 of 46 files reviewed, 27 unresolved discussions (waiting on @lockshaw)


lib/compiler/include/compiler/compiler.h line 21 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

AlgorithmConfig already specifies the search algorithm to be used, so the extra SearchAlgorithm parameter is unnecessary

Done.


lib/compiler/include/compiler/machine_mapping/machine_mapping_problem_tree/get_machine_mapping_problem_tree_result.struct.toml line at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

I'm not seeing this actually used anywhere? Should this be removed?

Done.


lib/compiler/src/compiler/compiler.cc line 6 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This type doesn't match the one declared in the header?

Done.


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 53 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Replace with map_keys from utils/containers

Done.


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 109 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Pull out into a separate get_runtime_cost(MachineMappingResult const &) that's responsible for this

Done.


lib/compiler/include/compiler/search_result.struct.toml line 8 at r1 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Prefer absolute includes

Done.


lib/compiler/include/compiler/unity_algorithm/graph_optimize_state.h line 12 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Why the change from GraphOptimizeResult to ParallelComputationGraph?

I feel it's better to remove the machine mapping information from GraphOptimizeResult since it can be derived from the PCG by applying get_optimal_machine_mapping. It seems redundant to have this field.


lib/compiler/include/compiler/unity_algorithm/unity_algorithm.h line 17 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Remove this argument and just hardcode DeviceType::GPU for now--everything in this function is almost certainly going to assume GPU anyway, so this parameter is kind of a lie

Done.


lib/compiler/src/compiler/unity_algorithm/graph_optimize_state.cc line 31 at r2 (raw file):

  // https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash.html#networkx.algorithms.graph_hashing.weisfeiler_lehman_graph_hash
  size_t seed = 0;
  auto layers = topological_ordering(state.pcg);

Done.


lib/compiler/src/compiler/unity_algorithm/graph_optimize_state.cc line 33 at r2 (raw file):

  auto layers = topological_ordering(state.pcg);
  ::FlexFlow::hash_combine(seed, layers.size());
  for (auto layer : layers) {

Done.


lib/compiler/src/compiler/unity_algorithm/graph_optimize_state.cc line 35 at r2 (raw file):

  for (auto layer : layers) {
    ::FlexFlow::hash_combine(seed, get_parallel_layer_attrs(state.pcg, layer));
    auto inputs = get_incoming_tensors(state.pcg, layer);

Done.


lib/compiler/src/compiler/unity_algorithm/graph_optimize_state.cc line 39 at r2 (raw file):

    for (auto input : inputs) {
      for (size_t i = 0; i < layers.size(); ++i) {
        if (get_source_layer(input) == layers[i]) {

Done.


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 31 at r2 (raw file):

        binary_sp_decomposition_tree_get_subtree_at_path(sp_tree, path);
    if (!subtree_optional.has_value()) {
      throw std::runtime_error("Invalid tree path");

Done.


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 35 at r2 (raw file):

    BinarySPDecompositionTree subtree = subtree_optional.value();
    if (!subtree.is_node()) {
      throw std::runtime_error("Invalid tree path to a leaf");

Done.


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 38 at r2 (raw file):

    }
    return parallel_layer_guid_t{
        subtree.get<Node>(),

Done.


lib/compiler/src/compiler/compiler.cc line 28 at r2 (raw file):

    }
    default:
      throw std::runtime_error("Unknown search algorithm");

Done.


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 63 at r2 (raw file):

  MachineMappingCache cached_subgraph_costs = MachineMappingCache{
      {},
  };

Done.


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 95 at r2 (raw file):

    MachineMappingConstraints constraints = MachineMappingConstraints{
        /*machine_views=*/{},
    };

Done.

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 36 of 46 files reviewed, 27 unresolved discussions (waiting on @lockshaw)


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 24 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Ideally operate over a PCGBinarySPDecomposition instead (the underlying functions just require a GenericBinarySPDecompositionTreeImplementation, which is available for PCGBinarySPDecomposition here:

GenericBinarySPDecompositionTreeImplementation<PCGBinarySPDecomposition,
PCGBinarySeriesSplit,
PCGBinaryParallelSplit,
parallel_layer_guid_t>
generic_impl_for_pcg_sp_tree();

I don't quite understand this. Could you elaborate more on the relations between the classes involved?

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 36 of 46 files reviewed, 27 unresolved discussions (waiting on @lockshaw)


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 54 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

It would be cleaner to instead bake the allowed machine views into the leaves of the MachineMappingProblemTree itself to avoid any ambiguity, so that the allowed_machine_views function isn't even taken as an argument, but is instead all handled by get_machine_mapping_problem_tree

Seems we cannot do that, since allowed machine views depend on the amount of resources for each sub-problem, and therefore we cannot generate those at the beginning of the search


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 59 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Replace with generate_map from utils/containers.h

We cannot replace it since the map is generated from value to key

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 36 of 46 files reviewed, 27 unresolved discussions (waiting on @lockshaw)


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 91 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Or alternatively, add a function

template <typename T>
T expect(std::optional<T> const &x, std::string const &err) {
  return unwrap(x, []() { throw mk_runtime_error(err); });
}

to optional.h and then use expect instead of unwrap for a slightly cleaner syntax

Done.


lib/pcg/src/pcg/operator_task_space.cc line 41 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Use the version from #1565

Done.

Copy link
Collaborator

@lockshaw lockshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 46 files at r1, 4 of 8 files at r3, 30 of 30 files at r4, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @wmdi)


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 24 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

I don't quite understand this. Could you elaborate more on the relations between the classes involved?

This is the "vtable hack" I mentioned--essentially, rather than using C++ inheritance (which is full of sharp edges), we just specify a .struct.toml of std::functions that specify the interface (let's call this template <typename T> struct Vtable<T>;, and then can define template <typename T> functions that take in a Vtable<T> const & to allow the polymorphism. Here you're using a function (get_subtree_at_path) that uses the vtable hack to operate over any type that has a valid GenericBinarySPDecompositionTreeImplementation. Right now you're first converting to a BinarySPDecompositionTree, but you can just operate directly over a PCGBinarySPDecomposition


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 54 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

Seems we cannot do that, since allowed machine views depend on the amount of resources for each sub-problem, and therefore we cannot generate those at the beginning of the search

Good point. To fix this, you could bake in the allowed_machine_views for the whole possible set of resources, and then filter them at the leaf level?


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 59 at r2 (raw file):

Previously, wmdi (Mengdi Wu) wrote…

We cannot replace it since the map is generated from value to key

True--would unordered_map_from_pairs combined with transform (and optionally zip) work?


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 20 at r4 (raw file):

}

MachineMapping get_machine_mapping_from_machine_mapping_result(

Suggestion:

std::optional<MachineMapping>

lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 53 at r4 (raw file):

  }

  return MachineMapping{mm};

Suggestion:

  return transform(mm_result.raw_result, 
                   [&](FeasibleMachineMappingResult const &feasible_mm_result) {
                     return MachineMapping{
                       map_keys(feasible_mm_result.machine_mapping.raw_mapping,
                                get_layer_from_path),
                     };
                   });

lib/compiler/test/src/unity_algorithm.cc line 5 at r4 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  // TODO: to be udpated

Update


lib/compiler/test/src/graph_optimize_state.cc line 8 at r4 (raw file):

TEST_SUITE(FF_TEST_SUITE) {
  // TODO(@wmdi): to be udpated

Update


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 84 at r4 (raw file):

  };

  auto get_runtime_cost = [](MachineMappingResult const &mm_result) {

Move this to machine_mapping_result.h

@lockshaw
Copy link
Collaborator

lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 155 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Would be nice to pull a bunch of this out into a separate generic backtracking_search function. This function could either take std::functions as arguments for the operations, or you could use dtgen to create a struct of std::functions along the lines of GenericBinarySPDecompositionTreeImplementation. Would just make things a lot cleaner to read as the core search algorithm wouldn't be all mixed up with the domain-specific logic

@victorli2002

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @lockshaw and @victorli2002)


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 24 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

This is the "vtable hack" I mentioned--essentially, rather than using C++ inheritance (which is full of sharp edges), we just specify a .struct.toml of std::functions that specify the interface (let's call this template <typename T> struct Vtable<T>;, and then can define template <typename T> functions that take in a Vtable<T> const & to allow the polymorphism. Here you're using a function (get_subtree_at_path) that uses the vtable hack to operate over any type that has a valid GenericBinarySPDecompositionTreeImplementation. Right now you're first converting to a BinarySPDecompositionTree, but you can just operate directly over a PCGBinarySPDecomposition

Done.


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 26 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Make this a top-level function in PCGBinarySPDecomposition

Done.


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 54 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Good point. To fix this, you could bake in the allowed_machine_views for the whole possible set of resources, and then filter them at the leaf level?

Done.


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 59 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

True--would unordered_map_from_pairs combined with transform (and optionally zip) work?

Done.


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 155 at r2 (raw file):

Previously, lockshaw (Colin Unger) wrote…

@victorli2002

Done.


lib/compiler/src/compiler/unity_algorithm/unity_algorithm.cc line 84 at r4 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Move this to machine_mapping_result.h

Done. BTW, how does [[nodiscard]] work in machine_mapping_result.h?


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 20 at r4 (raw file):

}

MachineMapping get_machine_mapping_from_machine_mapping_result(

Done.


lib/compiler/src/compiler/machine_mapping/machine_mapping.cc line 53 at r4 (raw file):

  }

  return MachineMapping{mm};

Done.

Copy link
Collaborator Author

@wmdi wmdi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 17 of 57 files reviewed, 12 unresolved discussions (waiting on @lockshaw and @victorli2002)


lib/compiler/test/src/graph_optimize_state.cc line 8 at r4 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Update

Done.


lib/compiler/test/src/unity_algorithm.cc line 5 at r4 (raw file):

Previously, lockshaw (Colin Unger) wrote…

Update

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler Compiler library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants