Potentially a bug: Fail to find field_name from input_schema in MixtureOfAgentsConvert #62

Open
chjuncn opened this issue Jan 15, 2025 · 0 comments

Collaborator

chjuncn commented Jan 15, 2025

When I run
python demos/paperDemo.py --datasetid=biofabric-medium --workload=medical-schema-matching --executor=SEQUENTIAL --policy=maxquality
I get the following error:

  File "/Users/chjun/Documents/GitHub/code/palimpzest/src/palimpzest/query/operators/mixture_of_agents_convert.py", line 272, in _call_proposer
    proposer_prompt = self._construct_proposer_prompt(fields_to_generate=fields, model=proposer_model)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chjun/Documents/GitHub/code/palimpzest/src/palimpzest/query/operators/mixture_of_agents_convert.py", line 147, in _construct_proposer_prompt
    field_desc = getattr(self.input_schema, field_name).desc
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'Table' has no attribute 'contents'

The original code is as follows:

>         multiline_input_field_description = ""
>         input_fields = (
>             self.input_schema.field_names()
>             if not self.depends_on
>             else [field.split(".")[-1] for field in self.depends_on]
>         )
>         for field_name in input_fields:
>             field_desc = getattr(self.input_schema, field_name).desc
>             multiline_input_field_description += prompts.INPUT_FIELD.format(
>                 field_name=field_name, field_desc=field_desc
>             )

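Tracing through the code, I found where node._depends_on is assigned in optimizer.py. In the fallback branch, we add every field of every upstream schema to depends_on for the current op. As a result, depends_on can apparently contain a field such as contents that belongs to an upstream schema but not to the current input_schema (Table), and getattr then fails on it.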

            # otherwise, make the node depend on all upstream nodes
            node._depends_on = set()
            for upstream_node in dataset_nodes[:node_idx]:
                node._depends_on.update(upstream_node.schema.field_names(unique=True, id=upstream_node.universal_identifier()))
            node._depends_on = list(node._depends_on)

This appears to be intentional, so I'm not sure where the actual bug is.
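
To make the suspected mismatch concrete, here is a minimal standalone sketch. It does not use palimpzest: Field, Table, describe_input_fields, and the "File.contents" style of depends_on entries are hypothetical stand-ins, chosen only to match the split(".")[-1] logic quoted above. It shows one possible defensive variant that skips names missing from input_schema instead of raising; whether skipping is the right behavior for MixtureOfAgentsConvert is an open question.

```python
class Field:
    """Hypothetical stand-in for a schema field that carries a description."""

    def __init__(self, desc):
        self.desc = desc


class Table:
    """Hypothetical stand-in for the input schema; note it has no `contents` field."""

    name = Field("The name of the table")
    rows = Field("The rows of the table")

    @classmethod
    def field_names(cls):
        # Collect the Field attributes declared directly on this class.
        return [k for k, v in vars(cls).items() if isinstance(v, Field)]


def describe_input_fields(input_schema, depends_on):
    """Return (described, skipped) instead of raising on unknown field names."""
    names = (
        input_schema.field_names()
        if not depends_on
        else [field.split(".")[-1] for field in depends_on]
    )
    described, skipped = [], []
    for field_name in names:
        # getattr with a default avoids the AttributeError from the traceback.
        field = getattr(input_schema, field_name, None)
        if field is None:
            skipped.append(field_name)
            continue
        described.append((field_name, field.desc))
    return described, skipped


# Assumed shape: depends_on holds fields from *all* upstream schemas.
depends_on = ["File.contents", "Table.name", "Table.rows"]
described, skipped = describe_input_fields(Table, depends_on)
print("described:", described)
print("skipped:", skipped)
```

With this assumed input, contents ends up in skipped because it exists only on the upstream schema, which matches the attribute named in the traceback. An alternative would be to look up each description on the schema that actually defines the field rather than on input_schema.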
