
Error running SynthesizRR: No file ending in ".predictions-params.json" #14

Open
EdenHuji opened this issue Feb 8, 2025 · 0 comments


EdenHuji commented Feb 8, 2025

I'm trying to run SynthesizRR on the Polarity task from the example main.py.
My main file is:

```python
from typing import *
from synthesizrr.base.data.FileMetadata import FileMetadata
from synthesizrr.base.framework import ChainExecution, Tracker
from synthesizrr.base.constants import Status
from synthesizrr.expt.common import ModelName, Experiment, DatasetName, Retriever, Corpus, RESULTS_DIR
from synthesizrr.expt.driver import run_chain

import synthesizrr
import ray
from ray.util.dask import ray_dask_get, enable_dask_on_ray, disable_dask_on_ray
from pprint import pprint
pprint(ray.init(
    # address='ray://127.0.0.1:6379',  ## MODIFY THIS
    ignore_reinit_error=True,
    _temp_dir=str('/tmp/ray/'),
    runtime_env={"py_modules": [
        synthesizrr,
    ]},
))
enable_dask_on_ray()
pprint(ray.cluster_resources())  ## Shows you number of cpus and gpus to make sure it is setup properly.


TRACKER = Tracker.of('log', path='./synthesizrr_run.log')  ## Execution outputs will get logged to this file.
BACKGROUND: bool = False
CART_FRAC: Optional[float] = 0.83  ## Set to None to disable Cartography filtering.

if __name__ == '__main__':
    """
      ___       _             _  _
     | _ \ ___ | | __ _  _ _ (_)| |_  _  _
     |  _// _ \| |/ _` || '_|| ||  _|| || |
     |_|  \___/|_|\__,_||_|  |_| \__| \_, |
                                      |__/
    """

    SYNTHESIZRR_NUM_SHOTS_LIST = [32]
    synthesizrr_no_retr_icl_amazon_polarity_llama_2_13b_chat_exn = run_chain(
        results_dir=RESULTS_DIR,
        expt=Experiment.SynthesizRR,
        dataset_name=DatasetName.AmazonReviewsPolarity,
        model_name=ModelName.LLaMa_2_13B_Chat,
        num_shots_list=SYNTHESIZRR_NUM_SHOTS_LIST,

        corpus=Corpus.AmazonProducts,
        retriever=Retriever.Contriever,
        #retriever=Retriever.BM25Okapi,

        num_samples_per_label=5_000,
        seed_type='train_set',
        seed_set_stratify_on_ground_truth=False,

        icl_type='seed',

        llm_batch_size=1,
        llm_submission_batch_size=12,
        llm_num_models=48,
        llm_num_concurrent_preds=2,

        metrics_overall_num_samples_per_label=4_000,
        metrics_max_parallel=3,
        metrics_label_distribution='train_set',
        # metrics_to_evaluate=None,
        icl_and_prompt_template=dict(
            icl_template="""
Review: {{icl[example_text]}}""".strip() + ' ',
            prompt_template="""
{{icl_examples}}

Product details:
{{retrieved_context}}

Write a review about the above product on Amazon which discusses {label_verbalization}. Include relevant product details which are mentioned above. The review should only be a single short sentence, or a single paragraph of 3 to 4 sentences. Add very minor typos.
Review: """.strip() + ' ',
        ),

        tracker=TRACKER,
        background=BACKGROUND,
        verbosity=5,
        step_wait=5,
        cart_frac=CART_FRAC,

        # dry_run=True,
        notifier=None
    )
```
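Both templates in the config above are built with `.strip() + ' '`, so each one ends with exactly one trailing space after `Review:`, which is where the LLM's continuation starts. The sketch below illustrates that formatting step on an abridged copy of `prompt_template`, filling the placeholders with a toy `render` helper. The helper and all sample values are hypothetical; SynthesizRR's actual template engine (including how it treats `{{...}}` versus `{label_verbalization}`) is not shown here and may behave differently.

```python
# Toy illustration only: `render` is a hypothetical helper, NOT SynthesizRR's template engine.
# Abridged copy of the prompt_template from the config above.
PROMPT_TEMPLATE = """
{{icl_examples}}

Product details:
{{retrieved_context}}

Write a review about the above product on Amazon which discusses {label_verbalization}.
Review: """.strip() + ' '  # strip() drops surrounding whitespace; one space is re-added


def render(template: str, values: dict) -> str:
    """Naive literal substitution of '{{name}}' and '{name}' placeholders."""
    out = template
    for name, value in values.items():
        out = out.replace('{{' + name + '}}', value)  # double-brace placeholders
        out = out.replace('{' + name + '}', value)    # single-brace placeholders
    return out


prompt = render(PROMPT_TEMPLATE, {
    'icl_examples': 'Review: Works great, would buy again. ',           # hypothetical ICL example
    'retrieved_context': 'Stainless steel water bottle, 750 ml.',       # hypothetical product text
    'label_verbalization': 'what the reviewer liked about the product',  # hypothetical verbalization
})
print(repr(prompt[-8:]))  # last 8 characters of the rendered prompt
```

The trailing space matters because the model is expected to continue directly after `Review: `; without `.strip()`, the leading newline of the triple-quoted string would also end up in the prompt.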

The error I'm getting is from the RetrieveFromSeedSet step:

ERROR_LITMUS:root:[ERROR]: ValueError: "No file ending in ".predictions-params.json" was found in "~/Experiments/synthesizrr/src/results/retrieval-corpus/amazon_products/contriever-embeddings/"; this file is required to create a Predictions object; please check the directory is correct."

It seems the .predictions-params.json file is never actually created.
My results directory tree looks like this:
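One way to confirm this independently of SynthesizRR is to list the directory named in the error and look for files with the suffix the error says is required. The snippet below is a standalone diagnostic using only `pathlib`; `find_params_files` is a hypothetical helper, not part of SynthesizRR, and it only checks files directly inside the directory (whether the library also searches subdirectories is not known from the error). `expanduser()` is applied in case the `~` in the path is passed through literally, since Python's file APIs do not expand `~` automatically.

```python
from pathlib import Path
from typing import List


def find_params_files(directory: str, suffix: str = '.predictions-params.json') -> List[Path]:
    """Return files directly inside `directory` whose names end with `suffix` (hypothetical helper)."""
    root = Path(directory).expanduser()  # turn a literal '~' into the home directory
    if not root.is_dir():
        raise FileNotFoundError(f'Not a directory: {root}')
    # Only regular files count; a directory that happens to match the suffix is ignored.
    return sorted(p for p in root.iterdir() if p.is_file() and p.name.endswith(suffix))


if __name__ == '__main__':
    # Path copied from the error message; adjust it to your own checkout.
    emb_dir = '~/Experiments/synthesizrr/src/results/retrieval-corpus/amazon_products/contriever-embeddings/'
    try:
        matches = find_params_files(emb_dir)
        print(f'{len(matches)} matching file(s):', [m.name for m in matches])
    except FileNotFoundError as err:
        print(err)
```

An empty result here, with the directory itself present, matches the error message: the directory exists but the embedding output was never written into it.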

```
results
|-- retrieval-augmented-dataset-generation
|   `-- amazon_polarity
|       `-- retrieval-data
|           `-- amazon_products
|               `-- contriever
|                   `-- retrieval_top_k=500
|-- retrieval-corpus
|   `-- amazon_products
|       `-- contriever-embeddings
`-- seed-set
    `-- amazon_polarity
        `-- train_set
            |-- train_set_seed_set-dataset=amazon_polarity-seed_size=200-stratified=not_gt_stratified-seed=42.dataset-params.json
            `-- train_set_seed_set-dataset=amazon_polarity-seed_size=200-stratified=not_gt_stratified-seed=42.parquet
```
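In this tree, only `seed-set/` contains files; `contriever-embeddings/` and `retrieval_top_k=500/` are empty, which is consistent with the missing `.predictions-params.json`. A small standard-library helper (hypothetical, not part of SynthesizRR) that lists every empty directory under the results root makes such gaps easy to spot in larger runs. The `'results'` path is an assumption; substitute the value of `RESULTS_DIR` from your setup.

```python
import os
from typing import List


def empty_dirs(root: str) -> List[str]:
    """Return paths (relative to `root`) of directories that contain no files and no subdirectories."""
    root = os.path.expanduser(root)
    found = []
    # os.walk silently yields nothing if `root` does not exist.
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames and not filenames:
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)


if __name__ == '__main__':
    # Assumed location of the results tree; replace with your RESULTS_DIR.
    for d in empty_dirs('results'):
        print(d)
```

On the tree shown above, this would be expected to report the `contriever-embeddings` and `retrieval_top_k=500` directories.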

Attaching the full log, including the debug messages: synthesizrr-error.log
