`VisualReplayStrategy` with replay instructions #610

abrichr · 2024-03-12T01:32:03Z

This PR adds support for natural language replay instructions, passed via `python -m openadapt.replay <ReplayStrategy> --instructions <natural language replay instructions>`.

Get active element descriptions

For each processed event, if it is a mouse event:
- segment the event's active window
- get a natural language description of each element
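
A minimal sketch of this step, not the code in this PR. The `segment` and `describe` callables, the dict-shaped events, and the set of mouse event names are all hypothetical stand-ins chosen for illustration:

```python
from typing import Any, Callable

# Assumed event names, for illustration only.
MOUSE_EVENT_NAMES = {"click", "scroll", "move"}


def add_element_descriptions(
    events: list[dict[str, Any]],
    segment: Callable[[Any], list[Any]],
    describe: Callable[[list[Any]], list[str]],
) -> list[dict[str, Any]]:
    """Attach natural language element descriptions to each mouse event."""
    for event in events:
        if event["name"] not in MOUSE_EVENT_NAMES:
            continue  # non-mouse events are left untouched
        masks = segment(event["window"])  # one mask per element in the window
        event["descriptions"] = describe(masks)  # one description per mask
    return events
```

Passing the segmentation and description steps in as callables keeps the sketch independent of any particular segmentation model or completion provider.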

Modify actions according to replay instructions

Get a natural language description of what the active element should be, given the replay instructions.

Replay modified events

For each modified event:

a. Convert descriptions to coordinates

If it is a click, scroll, or the last in a sequence of isolated events:
- segment the current active window
- determine the coordinates of the modified active element

b. Replay modified event
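
Step (a) could be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: masks are plain nested lists of booleans, and the target description is matched against the current descriptions by string similarity (`difflib`) as a stand-in for whatever matching the strategy actually performs. `mask_center` and `locate` are hypothetical names.

```python
import difflib

Mask = list[list[bool]]


def mask_center(mask: Mask) -> tuple[float, float]:
    """Return the (x, y) centroid of the True pixels in a 2D boolean mask."""
    xs = [x for row in mask for x, on in enumerate(row) if on]
    ys = [y for y, row in enumerate(mask) for on in row if on]
    if not xs:
        raise ValueError("mask has no True pixels")
    return sum(xs) / len(xs), sum(ys) / len(ys)


def locate(target: str, descriptions: list[str], masks: list[Mask]) -> tuple[float, float]:
    """Return the center of the mask whose description best matches target."""
    # cutoff=0.0 means the closest description is returned even if it is a poor match.
    matches = difflib.get_close_matches(target, descriptions, n=1, cutoff=0.0)
    if not matches:
        raise LookupError("no descriptions to match against")
    return mask_center(masks[descriptions.index(matches[0])])
```

For example, with the descriptions logged below, a target of `"Number button '8'"` would resolve to the center of the mask described as the 8 button.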

See prompts for details:

openadapt/prompts/system.j2
openadapt/prompts/description.j2
openadapt/prompts/apply_replay_instructions.j2

python -m openadapt.replay VisualReplayStrategy --instructions "Multiply 6x8"
...

6x8.mov

Raw image:

Segmented image:

Raw masks:

Refined masks:

Masked image descriptions:

gpt-4-vision-preview:

gpt-4-turbo-2024-04-09:

gemini-1.5-pro-latest:

claude-3-opus-20240229:

Event location:

2024-03-20 17:20:59.334 | INFO     | openadapt.adapters.openai:prompt:111 - result=
('```json\n'
 '{\n'
 '  "descriptions": [\n'
 '    "Number button \'2\'",\n'
 '    "Number button \'9\'",\n'
 '    "Arithmetic operation button for subtraction \'-\'.",\n'
 '    "Title bar showing window controls (minimize, maximize, close) and the '
 'application display area that shows the numeral \'0\'.",\n'
 '    "Number button \'8\'",\n'
 '    "Arithmetic operation button for division \'÷\'.",\n'
 '    "Number button \'6\'",\n'
 '    "Number button \'3\'",\n'
 '    "Number button \'4\'",\n'
 '    "Number button \'1\'",\n'
 '    "Number button \'7\'",\n'
 '    "Number button \'5\'",\n'
 '    "Function button for clearing the calculator input \'AC\'.",\n'
 '    "Arithmetic operation button for addition \'+\'.",\n'
 '    "Function button for toggling positive/negative sign \'+/–\'.",\n'
 '    "Decimal point button \'.\'",\n'
 '    "Arithmetic operation button for equal \'=\'.",\n'
 '    "Wide number button \'0\'",\n'
 '    "Arithmetic operation button for multiplication \'×\'.",\n'
 '    "Function button for calculating percentage \'%\'."\n'
 '  ]\n'
 '}\n'
 '```')
2024-03-20 17:20:59.334 | INFO     | openadapt.strategies.visual:prompt_for_descriptions:363 - descriptions=["Number button '2'", "Number button '9'", "Arithmetic operation button for subtraction '-'.", "Title bar showing window controls (minimize, maximize, close) and the application display area that shows the numeral '0'.", "Number button '8'", "Arithmetic operation button for division '÷'.", "Number button '6'", "Number button '3'", "Number button '4'", "Number button '1'", "Number button '7'", "Number button '5'", "Function button for clearing the calculator input 'AC'.", "Arithmetic operation button for addition '+'.", "Function button for toggling positive/negative sign '+/–'.", "Decimal point button '.'", "Arithmetic operation button for equal '='.", "Wide number button '0'", "Arithmetic operation button for multiplication '×'.", "Function button for calculating percentage '%'."]
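
The raw result above is a JSON object wrapped in a Markdown code fence, while the second log line shows the plain list. A minimal sketch of that conversion, assuming the reply is either fenced or bare JSON (`parse_descriptions` is a hypothetical name, not necessarily what `prompt_for_descriptions` does internally):

```python
import json

FENCE = "`" * 3  # built this way to avoid a literal fence inside this example


def parse_descriptions(result: str) -> list[str]:
    """Extract the "descriptions" list from a fenced or bare JSON reply."""
    text = result.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        # Drop both fences, then the rest of the opening line (e.g. "json").
        text = text[len(FENCE):-len(FENCE)]
        text = text.split("\n", 1)[1] if "\n" in text else ""
    return json.loads(text)["descriptions"]
```

A reply that is not valid JSON raises `json.JSONDecodeError`, which the caller would need to handle (for example by retrying the prompt).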

python -m openadapt.record "typing my name"
...
richard
...
python -m openadapt.replay VisualReplayStrategy --instructions "write everything in UPPER CASE"
...
2024-03-12 09:51:28.960 | INFO     | openadapt.strategies.base:run:89 - action_event=                                                                                                                                                                     
{'canonical_text': 'R-I-C-H-A-R-D',                                                                                                                                                                                                                       
 'children': [{'canonical_key_char': 'R',                                                                                                                                                                                                                 
               'key_char': 'R',                                                                                                                                                                                                                           
               'key_vk': '15',                                                                                                                                                                                                                            
               'name': 'press'},                                                                                                                                                                                                                          
              {'canonical_key_char': 'R',                                                                                                                                                                                                                 
               'key_char': 'R',                                                                                                                                                                                                                           
               'key_vk': '15',                                                                                                                                                                                                                            
               'name': 'release'},                                                                                                                                                                                                                        
              {'canonical_key_char': 'I',                                                                                                                                                                                                                 
               'key_char': 'I',                                                                                                                                                                                                                           
               'key_vk': '34',                                                                                                                                                                                                                            
               'name': 'press'},                                                                                                                                                                                                                          
              {'canonical_key_char': 'I',                                                                                                                                                                                                                 
               'key_char': 'I',                                                                                                                                                                                                                           
               'key_vk': '34',                                                                                                                                                                                                                            
               'name': 'release'},                                                                                                                                                                                                                        
              {'canonical_key_char': 'C',                                                                                                                                                                                                                 
               'key_char': 'C',                                                                                                                                                                                                                           
               'key_vk': '8',                                                                                                                                                                                                                             
               'name': 'press'},                                                                                                                                                                                                                          
              {'canonical_key_char': 'C',                                                                                                                                                                                                                 
               'key_char': 'C',                                                                                                                                                                                                                           
               'key_vk': '8',                                                                                                                                                                                                                             
               'name': 'release'},                                                                                                                                                                                                                        
              {'canonical_key_char': 'H',                                                                                                                                                                                                                 
               'key_char': 'H',                                                                                                                                                                                                                                                                             
               'key_vk': '4',                                                                                                                                                                                                                                                                               
               'name': 'press'},                                                                                                                                                                                                                          
              {'canonical_key_char': 'A',                                                                                                                                                                                                                 
               'key_char': 'A',                                                                                                                                                                                                                           
               'key_vk': '0',                                                                                                                                                                                                                             
               'name': 'press'},                                                                                                                                                                                                                          
              {'canonical_key_char': 'H',                                                                                                                                                                                                                                                                   
               'key_char': 'H',                                                                                                                                                                                                                                                                             
               'key_vk': '4',                                                                                                                                                                                                                             
               'name': 'release'},                                                                                                                                                                                                                        
              {'canonical_key_char': 'R',                                                                                                                                                                                                                 
               'key_char': 'R',                                                                                                                                                                                                                           
               'key_vk': '15',                                                                                                                                                                                                                            
               'name': 'press'},                                                                                                                                                                                                                          
              {'canonical_key_char': 'A',                                                                                                             
               'key_char': 'A',                                                                                                                                                                                                                                                                             
               'key_vk': '0',                                                                                                                         
               'name': 'release'},                                                                                                                                                                                                                                                                          
              {'canonical_key_char': 'D',                                                                                                             
               'key_char': 'D',                                                                                                                                                                                                                                                                             
               'key_vk': '2',                                                                                                                         
               'name': 'press'},                                                                                                                                                                                                                                                                            
              {'canonical_key_char': 'R',                                                                                                             
               'key_char': 'R',                                                                                                                                                                                                                                                                             
               'key_vk': '15',                                                                                                                                                                                                                                                                              
               'name': 'release'},                                                                                                                                                                                                                                                                          
              {'canonical_key_char': 'D',                                                                                                             
               'key_char': 'D',                                                                                                                                                                                                                                                                             
               'key_vk': '2',                                              
               'name': 'release'}],                                        
 'text': 'R-I-C-H-A-R-D'}  
...
RICHARD
...
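
To make the logged structure easier to read: the `text` field appears to be the characters of the `press` children joined by a separator. A small sketch under that assumption. The separator value is inferred from the log above, and `uppercase_children` is only a hand-written stand-in for the modification that the model produces from the replay instructions:

```python
ACTION_TEXT_SEP = "-"  # assumed value, inferred from the logged 'R-I-C-H-A-R-D'


def text_from_children(children: list[dict[str, str]], sep: str = ACTION_TEXT_SEP) -> str:
    """Join the key_char of each press event, in order, using sep."""
    return sep.join(c["key_char"] for c in children if c["name"] == "press")


def uppercase_children(children: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return copies of the key events with their characters upper-cased."""
    return [
        {
            **c,
            "key_char": c["key_char"].upper(),
            "canonical_key_char": c["canonical_key_char"].upper(),
        }
        for c in children
    ]
```

Note that presses and releases can interleave (in the log, `A` is pressed before `H` is released), so only `press` events contribute to the text.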

abrichr · 2024-03-22T17:35:50Z

Currently blocked on:

2024-03-22 13:32:13.086 | INFO     | openadapt.adapters.openai:get_completion:90 - result=                                                                                                                                                                
{'error': {'code': 'content_policy_violation',                                                                                                                                                                                                            
           'message': 'Your input image may contain content that is not '                                                    
                      'allowed by our safety system.',                                                                                                                                                                                                    
           'param': None,                                     
           'type': 'invalid_request_error'}}   
...
  File "/Users/abrichr/oa/OpenAdapt/openadapt/adapters/openai.py", line 91, in get_completion
    choices = result["choices"]
              -> {'error': {'message': 'Your input image may contain content that is not allowed by our safety system.', 'type': 'invalid_requ...

KeyError: 'choices'

We often come across the ‘content_policy_violation’ error for content that doesn’t seem to violate any known policies or be offensive. For instance, we received this error for a picture showing only a Razer brand computer mouse on a mouse pad. However, when we used the same image with Gemini Pro, we didn’t encounter any problems. So, despite my reservations about discussing this on an OpenAI forum, I will say our approach now is to use Gemini Pro Vision as a fallback upon this error, and if that doesn’t work, we turn to LLaVA as our final recourse.

Upon retrying:

2024-03-22 13:50:16.296 | INFO     | openadapt.adapters.openai:get_completion:90 - result=                 
{'error': {'code': 'sanitizer_server_error',                                                                                                                                                                                                              
           'message': 'You uploaded an unsupported image. Please make sure '
                      'your image is below 20 MB in size and is of one the '
                      "following formats: ['png', 'jpeg', 'gif', 'webp'].",                                                                                                                                                                               
           'param': None,                                    
           'type': 'invalid_request_error'}}

https://community.openai.com/t/400-errors-on-gpt-vision-api-since-today/534538/23
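
One way to avoid the bare `KeyError: 'choices'` and to apply the fallback idea quoted above is to check for an `error` payload before reading `choices`, then try the next provider. This is a sketch only: `CompletionError`, `get_choices`, and `prompt_with_fallback` are hypothetical names, and each provider is assumed to be a zero-argument callable returning a response dict shaped like the ones logged above.

```python
from typing import Any, Callable


class CompletionError(RuntimeError):
    """Raised when a provider returns an error payload instead of choices."""


def get_choices(result: dict[str, Any]) -> list[Any]:
    """Return result["choices"], raising a descriptive error if the call failed."""
    if "error" in result:
        error = result["error"]
        raise CompletionError(f"{error.get('code')}: {error.get('message')}")
    return result["choices"]


def prompt_with_fallback(providers: list[Callable[[], dict[str, Any]]]) -> list[Any]:
    """Try each provider in order and return the first successful choices."""
    errors = []
    for provider in providers:
        try:
            return get_choices(provider())
        except CompletionError as exc:
            errors.append(str(exc))  # remember why this provider failed
    raise CompletionError("all providers failed: " + "; ".join(errors))
```

Collecting the per-provider messages keeps the original error code (for example `content_policy_violation`) visible even when a later provider also fails.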

abrichr · 2024-03-22T20:59:15Z

2024-03-22 16:57:27.693 | INFO     | openadapt.adapters.openai:get_completion:93 - result=                                   
{'error': {'code': 'rate_limit_exceeded',                                                                                                                                                                                                                 
           'message': 'Request too large for gpt-4-vision-preview in '                                                                                                                                                                                    
                      'organization org-2UaOg7seSEeLWYW73QCY94YS on tokens per '                                             
                      'min (TPM): Limit 40000, Requested 55277. The input or '                                                                                                                                                                            
                      'output tokens must be reduced in order to run '                                 
                      'successfully. Visit '                                                                                 
                      'https://platform.openai.com/account/rate-limits to '                                                                                                                                                                               
                      'learn more.',                                                                                                                                                                                                                      
           'param': None,                                                                                                    
           'type': 'tokens'}}
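
The request above needed 55277 tokens against a 40000 tokens-per-minute limit, so it has to be split or shrunk. A possible sketch of splitting, assuming a per-item token estimate is available (how to estimate tokens for an image is not covered here, and `batch_by_token_budget` is a hypothetical helper):

```python
def batch_by_token_budget(token_costs: list[int], budget: int) -> list[list[int]]:
    """Group item indices into consecutive batches whose total cost fits budget."""
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for index, cost in enumerate(token_costs):
        if cost > budget:
            raise ValueError(f"item {index} alone needs {cost} tokens (budget {budget})")
        if used + cost > budget:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Because the limit is per minute, the resulting batches would also need to be spaced out in time rather than sent back to back.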


abrichr added 10 commits March 11, 2024 21:30

work in progress

4f25558

add prompts/, adapters/openai.py, strategies/visual.py (wip)

c3147df

fixes

563feaa

adapters.anthropic

1754a78

add anthropic.py

3023879

wip

c91b64c

wip

e1a1776

wip

1182809

prompt with active segment descriptions

9827878

wip

5b064e7

working modulo openai error

c2c8fdf

Cody-DV and others added 9 commits March 22, 2024 17:10

Set-of-Mark Prompting Adapter (#612)

45bfed4

* Cleanup
* Add predict for image wrapper. Cleanup
* cleanup
* Resolve conflicts
* Cleanup
* Update openadapt/config.py

Co-authored-by: Cody DeVilliers
Co-authored-by: Richard Abrich

wip

04e317a

wip (working completions)

b7acb4d

working

bd57862

add missing openadapt/prompts/description.j2

e28204b

remove dead code

9c93cb8

add TODO

361cb11

remove_move_before_click

70dc614

started_counter

07667a3

abrichr changed the title ~~Feat/replayparam~~ VisualReplayStrategy with replay instructions Apr 9, 2024

abrichr mentioned this pull request Apr 9, 2024

Avoid unnecessary segmentation + description in VisualReplayStrategy #614

Open

abrichr added 6 commits April 9, 2024 17:16

started_counter; adapters.ultralytics

f2dbef5

add missing vision.py

b209b84

add openadapt/adapters/google.py

bec47b5

various fixes

83131e4

filter_masks_by_size

5df193a

documentationg

6e96949

abrichr and others added 22 commits April 13, 2024 10:50

simplify filter_masks_by_size

f43cc0f

documentation

f9641a6

caching, error handling

e841932

update README

a45987d

Merge branch 'main' into feat/replayparam

de42432

dashboard -> install-dashboard

ac4a519

Checkout code once, with the current branch

49048ef

pull_request_target -> pull_request

e8741df

add ultralytics

adb981f

black

4f55a55

flake8

2d8a7c7

Merge branch 'main' into feat/replayparam

7b5718c

flake8

4ce9a3e

fix mss import

8ac523a

fix typo

f572ed8

fixes

cc626fd

exclude alembic

eecb165

exclude .venv

c1f004b

disable som adapter; remove logging

e8c7899

Merge branch 'main' into feat/replayparam

2ed23f1

use config.ACTION_TEXT_SEP, ACTION_TEXT_NAME_PREFIX/SUFFIX in ActionEvent.from_dict

5ead019

black

ede2a2e

abrichr merged commit 250943f into main Apr 16, 2024
1 check passed

abrichr deleted the feat/replayparam branch April 16, 2024 17:40

This was referenced Apr 16, 2024

Implement Gemini Vision #551

Closed

Increase the number of objects/elements detected microsoft/SoM#28

Closed

This was referenced May 30, 2024

It it possible to increase grid density in FastSAM? ultralytics/ultralytics#13250

Closed

I would like to ask how to do a visual grounding (REC) task directly using GPTY4v? microsoft/SoM#41

Open
