VisualReplayStrategy with replay instructions #610

Merged (48 commits) on Apr 16, 2024

abrichr commented Mar 12, 2024

This PR implements support for user instructions via python -m openadapt.replay <ReplayStrategy> --instructions <natural language replay instructions>

  1. Get active element descriptions

For each processed event, if it is a mouse event:
- segment the event's active window
- get a natural language description of each element

  2. Modify actions according to replay instructions

Get a natural language description of what the active element should be given the replay instructions.

  3. Replay modified events

For each modified event:

a. Convert descriptions to coordinates

If it is a click, scroll, or the last in a sequence of isolated events:

  • segment the current active window
  • determine the coordinates of the modified active element

b. Replay modified event
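
The "convert descriptions to coordinates" step can be sketched as below. This is a minimal illustration, not the PR's implementation: `Segment`, `find_target_center`, the bounding-box layout, and the case-insensitive exact match are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A segmented window element (hypothetical structure)."""

    description: str  # natural language description of the element
    left: int
    top: int
    width: int
    height: int


def find_target_center(segments, target_description):
    """Return the (x, y) center of the segment matching the description.

    Matching is case-insensitive string equality, an assumption made for
    this sketch only.
    """
    wanted = target_description.strip().lower()
    for segment in segments:
        if segment.description.strip().lower() == wanted:
            x = segment.left + segment.width / 2
            y = segment.top + segment.height / 2
            return x, y
    raise LookupError(f"no segment matches {target_description!r}")


# Hypothetical segments of the current active window.
segments = [
    Segment("Number button '6'", left=0, top=100, width=50, height=40),
    Segment("Number button '8'", left=50, top=60, width=50, height=40),
]
print(find_target_center(segments, "Number button '8'"))  # (75.0, 80.0)
```

Because the window is segmented again at replay time, the coordinates come from the current layout rather than from the recording.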

See prompts for details:

  • openadapt/prompts/system.j2
  • openadapt/prompts/description.j2
  • openadapt/prompts/apply_replay_instructions.j2
python -m openadapt.replay VisualReplayStrategy --instructions "Multiply 6x8"
...
6x8.mov

Raw image:

image

Segmented image:

image

Raw masks:

image

Refined masks:

image

Masked image descriptions:

gpt-4-vision-preview:

image

gpt-4-turbo-2024-04-09:

image

gemini-1.5-pro-latest:

image

claude-3-opus-20240229:

image

Event location:

image
2024-03-20 17:20:59.334 | INFO     | openadapt.adapters.openai:prompt:111 - result=
('```json\n'
 '{\n'
 '  "descriptions": [\n'
 '    "Number button \'2\'",\n'
 '    "Number button \'9\'",\n'
 '    "Arithmetic operation button for subtraction \'-\'.",\n'
 '    "Title bar showing window controls (minimize, maximize, close) and the '
 'application display area that shows the numeral \'0\'.",\n'
 '    "Number button \'8\'",\n'
 '    "Arithmetic operation button for division \'÷\'.",\n'
 '    "Number button \'6\'",\n'
 '    "Number button \'3\'",\n'
 '    "Number button \'4\'",\n'
 '    "Number button \'1\'",\n'
 '    "Number button \'7\'",\n'
 '    "Number button \'5\'",\n'
 '    "Function button for clearing the calculator input \'AC\'.",\n'
 '    "Arithmetic operation button for addition \'+\'.",\n'
 '    "Function button for toggling positive/negative sign \'+/–\'.",\n'
 '    "Decimal point button \'.\'",\n'
 '    "Arithmetic operation button for equal \'=\'.",\n'
 '    "Wide number button \'0\'",\n'
 '    "Arithmetic operation button for multiplication \'×\'.",\n'
 '    "Function button for calculating percentage \'%\'."\n'
 '  ]\n'
 '}\n'
 '```')
2024-03-20 17:20:59.334 | INFO     | openadapt.strategies.visual:prompt_for_descriptions:363 - descriptions=["Number button '2'", "Number button '9'", "Arithmetic operation button for subtraction '-'.", "Title bar showing window controls (minimize, maximize, close) and the application display area that shows the numeral '0'.", "Number button '8'", "Arithmetic operation button for division '÷'.", "Number button '6'", "Number button '3'", "Number button '4'", "Number button '1'", "Number button '7'", "Number button '5'", "Function button for clearing the calculator input 'AC'.", "Arithmetic operation button for addition '+'.", "Function button for toggling positive/negative sign '+/–'.", "Decimal point button '.'", "Arithmetic operation button for equal '='.", "Wide number button '0'", "Arithmetic operation button for multiplication '×'.", "Function button for calculating percentage '%'."]
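
The two log lines above show a fenced JSON reply being turned into a plain list of descriptions. A minimal sketch of that parsing step follows; `parse_fenced_json` is a hypothetical helper, not necessarily how `prompt_for_descriptions` does it.

```python
import json
import re

# Built indirectly so the example does not contain a literal code fence.
FENCE = "`" * 3
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_fenced_json(result):
    """Decode a model reply that is bare JSON or JSON inside a code fence."""
    match = FENCED_JSON.search(result)
    payload = match.group(1) if match else result
    return json.loads(payload)


# A shortened reply in the same shape as the logged result.
reply = "\n".join(
    [
        FENCE + "json",
        "{",
        '  "descriptions": [',
        "    \"Number button '2'\",",
        "    \"Number button '9'\"",
        "  ]",
        "}",
        FENCE,
    ]
)
descriptions = parse_fenced_json(reply)["descriptions"]
print(descriptions)
```

Falling back to the raw string when no fence is found keeps the helper usable with models that return bare JSON.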
python -m openadapt.record "typing my name"
...
richard
...
python -m openadapt.replay VisualReplayStrategy --instructions "write everything in UPPER CASE"
...
2024-03-12 09:51:28.960 | INFO     | openadapt.strategies.base:run:89 - action_event=
{'canonical_text': 'R-I-C-H-A-R-D',
 'children': [{'canonical_key_char': 'R',
               'key_char': 'R',
               'key_vk': '15',
               'name': 'press'},
              {'canonical_key_char': 'R',
               'key_char': 'R',
               'key_vk': '15',
               'name': 'release'},
              {'canonical_key_char': 'I',
               'key_char': 'I',
               'key_vk': '34',
               'name': 'press'},
              {'canonical_key_char': 'I',
               'key_char': 'I',
               'key_vk': '34',
               'name': 'release'},
              {'canonical_key_char': 'C',
               'key_char': 'C',
               'key_vk': '8',
               'name': 'press'},
              {'canonical_key_char': 'C',
               'key_char': 'C',
               'key_vk': '8',
               'name': 'release'},
              {'canonical_key_char': 'H',
               'key_char': 'H',
               'key_vk': '4',
               'name': 'press'},
              {'canonical_key_char': 'A',
               'key_char': 'A',
               'key_vk': '0',
               'name': 'press'},
              {'canonical_key_char': 'H',
               'key_char': 'H',
               'key_vk': '4',
               'name': 'release'},
              {'canonical_key_char': 'R',
               'key_char': 'R',
               'key_vk': '15',
               'name': 'press'},
              {'canonical_key_char': 'A',
               'key_char': 'A',
               'key_vk': '0',
               'name': 'release'},
              {'canonical_key_char': 'D',
               'key_char': 'D',
               'key_vk': '2',
               'name': 'press'},
              {'canonical_key_char': 'R',
               'key_char': 'R',
               'key_vk': '15',
               'name': 'release'},
              {'canonical_key_char': 'D',
               'key_char': 'D',
               'key_vk': '2',
               'name': 'release'}],
 'text': 'R-I-C-H-A-R-D'}
...
RICHARD
...
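
To make the shape of the modified event concrete, here is a small deterministic sketch. In the PR the modification is produced by the model from the replay instructions; `to_upper_case` and `text_from_key_events` are hypothetical helpers, and the lower-case recorded events are an assumption because the recording itself is not shown.

```python
def text_from_key_events(children, separator="-"):
    """Join the characters of key press events, ignoring releases."""
    return separator.join(
        child["key_char"] for child in children if child["name"] == "press"
    )


def to_upper_case(children):
    """Return copies of the key events with upper-cased characters."""
    return [
        {
            **child,
            "key_char": child["key_char"].upper(),
            "canonical_key_char": child["canonical_key_char"].upper(),
        }
        for child in children
    ]


# Overlapping presses, as in the log: 'a' is pressed before 'h' is released.
recorded = [
    {"name": "press", "key_char": "h", "canonical_key_char": "h", "key_vk": "4"},
    {"name": "press", "key_char": "a", "canonical_key_char": "a", "key_vk": "0"},
    {"name": "release", "key_char": "h", "canonical_key_char": "h", "key_vk": "4"},
    {"name": "release", "key_char": "a", "canonical_key_char": "a", "key_vk": "0"},
]
modified = to_upper_case(recorded)
print(text_from_key_events(modified))  # H-A
```

Counting only press events reproduces the `R-I-C-H-A-R-D` style of text even when presses and releases interleave.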

abrichr commented Mar 22, 2024

Currently blocked on:

2024-03-22 13:32:13.086 | INFO     | openadapt.adapters.openai:get_completion:90 - result=
{'error': {'code': 'content_policy_violation',
           'message': 'Your input image may contain content that is not '
                      'allowed by our safety system.',
           'param': None,
           'type': 'invalid_request_error'}}
...
  File "/Users/abrichr/oa/OpenAdapt/openadapt/adapters/openai.py", line 91, in get_completion
    choices = result["choices"]
              -> {'error': {'message': 'Your input image may contain content that is not allowed by our safety system.', 'type': 'invalid_requ...

KeyError: 'choices'
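
The `KeyError` arises because the response carries an `error` object instead of `choices`. One way to surface the API error directly is sketched below; `CompletionError` and `extract_choices` are hypothetical names, not code from `openadapt/adapters/openai.py`.

```python
class CompletionError(RuntimeError):
    """Raised when the API response contains an error payload."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def extract_choices(result):
    """Return result["choices"], raising CompletionError on an error payload."""
    error = result.get("error")
    if error is not None:
        raise CompletionError(error.get("code"), error.get("message"))
    return result["choices"]


# The error payload from the log above.
result = {
    "error": {
        "code": "content_policy_violation",
        "message": "Your input image may contain content that is not "
        "allowed by our safety system.",
        "param": None,
        "type": "invalid_request_error",
    }
}
try:
    extract_choices(result)
except CompletionError as exc:
    print(exc.code)  # content_policy_violation
```

Keeping the error code on the exception lets callers decide whether to retry, fall back to another model, or abort.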

Related:

https://community.openai.com/t/gpt4v-and-content-policy-violation/507576/4

We often come across the ‘content_policy_violation’ error for content that doesn’t seem to violate any known policies or be offensive. For instance, we received this error for a picture showing only a Razer brand computer mouse on a mouse pad. However, when we used the same image with Gemini Pro, we didn’t encounter any problems. So, despite my reservations about discussing this on an OpenAI forum, I will say our approach now is to use Gemini Pro Vision as a fallback upon this error, and if that doesn’t work, we turn to LLaVA as our final recourse.
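
The fallback approach described in the quoted post could look roughly like this. It is a sketch under assumptions: `prompt_with_fallback` is a hypothetical helper, and the providers are stand-in callables rather than real adapter functions.

```python
def prompt_with_fallback(providers, prompt, images):
    """Try each (name, callable) provider in order; return the first success.

    Returns a (provider_name, reply) tuple. Raises RuntimeError if every
    provider fails.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt, images)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration only.
def rejecting_provider(prompt, images):
    raise RuntimeError("content_policy_violation")


def accepting_provider(prompt, images):
    return "description of the image"


name, reply = prompt_with_fallback(
    [("primary", rejecting_provider), ("fallback", accepting_provider)],
    "Describe the image.",
    [],
)
print(name, reply)
```

In practice the `except` clause would likely be narrowed to the specific errors that justify switching providers.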

Upon retrying:

2024-03-22 13:50:16.296 | INFO     | openadapt.adapters.openai:get_completion:90 - result=
{'error': {'code': 'sanitizer_server_error',
           'message': 'You uploaded an unsupported image. Please make sure '
                      'your image is below 20 MB in size and is of one the '
                      "following formats: ['png', 'jpeg', 'gif', 'webp'].",
           'param': None,
           'type': 'invalid_request_error'}}
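
The two conditions named in this message can be checked locally before sending a request. The sketch below is illustrative only: `check_image` and `detect_image_format` are hypothetical helpers, "20 MB" is read as 20 * 1024 * 1024 bytes (an assumption), and passing these checks does not guarantee that the API accepts the image.

```python
MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumed reading of "below 20 MB"


def detect_image_format(data):
    """Identify png, jpeg, gif, or webp from the file signature, else None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def check_image(data):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if len(data) >= MAX_IMAGE_BYTES:
        problems.append("image is not below 20 MB")
    if detect_image_format(data) is None:
        problems.append("format is not one of png, jpeg, gif, webp")
    return problems


png_like = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
bmp_like = b"BM" + b"\x00" * 16
print(check_image(png_like))  # []
print(check_image(bmp_like))
```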

Related:

https://community.openai.com/t/maximum-number-of-images-in-a-gpt-4v-request/480233/4

https://community.openai.com/t/400-errors-on-gpt-vision-api-since-today/534538/23

abrichr commented Mar 22, 2024

2024-03-22 16:57:27.693 | INFO     | openadapt.adapters.openai:get_completion:93 - result=
{'error': {'code': 'rate_limit_exceeded',
           'message': 'Request too large for gpt-4-vision-preview in '
                      'organization org-2UaOg7seSEeLWYW73QCY94YS on tokens per '
                      'min (TPM): Limit 40000, Requested 55277. The input or '
                      'output tokens must be reduced in order to run '
                      'successfully. Visit '
                      'https://platform.openai.com/account/rate-limits to '
                      'learn more.',
           'param': None,
           'type': 'tokens'}}
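
Since a single request asked for 55277 tokens against a limit of 40000, retrying unchanged cannot succeed; the request has to be made smaller. One option is to split the images across several requests, sketched below. `batch_by_token_budget` is a hypothetical helper and the per-image token costs are made-up numbers chosen to sum to 55277.

```python
def batch_by_token_budget(token_costs, budget):
    """Group item indices into consecutive batches that each fit the budget."""
    batches, current, used = [], [], 0
    for index, cost in enumerate(token_costs):
        if cost > budget:
            raise ValueError(f"item {index} alone needs {cost} tokens")
        # Start a new batch when adding this item would exceed the budget.
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += cost
    if current:
        batches.append(current)
    return batches


# Hypothetical per-image token estimates totalling 55277.
token_costs = [15000, 15000, 15000, 10277]
print(batch_by_token_budget(token_costs, budget=40000))  # [[0, 1], [2, 3]]
```

Because the limit is per minute, batches sent in quick succession can still exceed it, so the requests would also need to be spaced out.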

Cody-DV and others added 9 commits March 22, 2024 17:10
* Cleanup

* Add predict for image wrapper. Cleanup

* cleanup

* Resolve conflicts

* Cleanup

* Update openadapt/config.py

---------

Co-authored-by: Cody DeVilliers
Co-authored-by: Richard Abrich
abrichr changed the title from "Feat/replayparam" to "VisualReplayStrategy with replay instructions" on Apr 9, 2024