
any inference code or something to check the model #8

Open

Occupying-Mars opened this issue Nov 16, 2023 · 6 comments
@Occupying-Mars

No description provided.

@truebit

truebit commented Nov 23, 2023

After some investigation, I replicated the inference code, using the same goal together with one or more supplied screenshots and the history of previous actions.
It does not work very well in zero-shot situations.
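
For context, the per-step input assembly looks roughly like the sketch below. This is a simplified illustration, not my exact code: the "Goal: ... Previous Actions: ..." prompt layout and the `build_step_input` helper are illustrative assumptions, with the image features coming from a BLIP-2 encoder as elsewhere in this thread.

```python
# Simplified sketch of assembling one inference step from a goal, the
# current screenshot's features, and the history of previous actions.
# The "Goal: ... Previous Actions: ..." layout is illustrative only.

def build_step_input(goal, action_history, image_features):
    # Flatten the previous actions into a single prompt string.
    history_text = " ".join(
        f"{i}: {act}" for i, act in enumerate(action_history)
    )
    prompt = f"Goal: {goal} Previous Actions: {history_text}"
    return {
        "text": prompt,           # language-side input
        "image": image_features,  # vision-side input (e.g. pooled BLIP-2 features)
    }

# Zero-shot case: no prior actions in the history.
step = build_step_input("How to login?", [], image_features=None)
print(step["text"])
```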

@YiDa858

YiDa858 commented Dec 25, 2023

@truebit Can you publish your inference code? I would appreciate it!

@kirtishrinkhala

@truebit Please share the inference code if possible.

@kirtishrinkhala

kirtishrinkhala commented Jan 9, 2024

I have been working on writing the inference code; here is what I have so far. I wrote a function that produces the processed input for an image and the goal. However, I am now not sure how to use that as input to a pretrained model.

This is the code that I wrote to process the image file and the goal:

```python
import argparse
import json
import pickle

import torch
from PIL import Image
from transformers import AutoProcessor, Blip2Model

# Load BLIP-2 once at import time so repeated calls reuse the weights.
# Use fp16 on GPU; fall back to fp32 on CPU, where fp16 ops are poorly supported.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
model = Blip2Model.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=dtype)
model.to(device)
processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")


def parse_image(image_file_path):
    # Goal and step id are hardcoded for this single-image test.
    output_ep = {
        "goal": "How to login?",
        "step_id": "123",
    }

    img = Image.open(image_file_path)

    # Extract pooled BLIP-2 image features for the screenshot.
    with torch.no_grad():
        inputs = processor(images=img, return_tensors="pt").to(device, dtype)
        image_features = model.get_image_features(**inputs).pooler_output[0]
    output_ep["image"] = image_features.detach().cpu()

    # Wrap the single step in the episode structure used by the dataset.
    parsed_episode = [{"episode_id": 123, "data": [output_ep]}]
    return parsed_episode


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--output_dir', type=str, default='dataset')
    parser.add_argument('--file_path', type=str, default='sample.png')
    return parser.parse_args()


if __name__ == '__main__':
    args = parse_args()
    print('====Input Arguments====')
    print(json.dumps(vars(args), indent=2, sort_keys=False))

    all_parsed_episode = parse_image(args.file_path)

    with open(f"{args.output_dir}_test_val.obj", "wb") as wp:
        pickle.dump(all_parsed_episode, wp)
```
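
For the consuming side, my best guess is something along the lines below, though I have not been able to verify it. Everything model-specific here is an assumption: it presumes the checkpoint is a T5-style seq2seq model from the mm-cot lineage, so the `T5ForMultimodalGeneration` class, its `image_ids` keyword, and the `cooelf/Auto-UI-base` path are placeholders that may not match the released code.

```python
# Hypothetical sketch: feed the pickled features to an Auto-UI-style checkpoint.
# ASSUMPTIONS: the repo's model.py exposes T5ForMultimodalGeneration (mm-cot
# lineage) whose generate() forwards an `image_ids` tensor of image features,
# and "cooelf/Auto-UI-base" stands in for the real checkpoint path.
import pickle

import torch
from transformers import AutoTokenizer

from model import T5ForMultimodalGeneration  # assumed module from the repo

tokenizer = AutoTokenizer.from_pretrained("cooelf/Auto-UI-base")
# Constructor arguments (e.g. feature dimensions) may differ in the real class.
model = T5ForMultimodalGeneration.from_pretrained("cooelf/Auto-UI-base")
model.eval()

# Load the episode produced by the preprocessing script above.
with open("dataset_test_val.obj", "rb") as rp:
    episodes = pickle.load(rp)
step = episodes[0]["data"][0]

inputs = tokenizer(f"Goal: {step['goal']}", return_tensors="pt")  # assumed prompt layout
with torch.no_grad():
    output_ids = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        image_ids=step["image"].unsqueeze(0),  # assumed kwarg for the image features
        max_new_tokens=64,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```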

@Jiayi-Pan

Hi friends,

We’ve got AutoUI running and tested its end-to-end performance in our recent paper. You can find the inference code here:

https://github.com/Berkeley-NLP/Agent-Eval-Refine/tree/main/exps/android_exp/models/Auto-UI

@Yingrjimsch

> Hi friends,
>
> We’ve got AutoUI running and tested its end-to-end performance in our recent paper. You can find the inference code here:
>
> https://github.com/Berkeley-NLP/Agent-Eval-Refine/tree/main/exps/android_exp/models/Auto-UI

Great job, thanks, will try that 👍 Any insights into how well it works for zero-shot approaches?
