Documentation and fix prompt #101

Merged
merged 37 commits into from
Jun 28, 2024
37 commits
f537ba6
Merge pull request #93 from microsoft/pre-release
vyokky Jun 24, 2024
84027eb
Merge pull request #97 from microsoft/pre-release
ShilinHe Jun 25, 2024
62e2807
fix prompt
vyokky Jun 25, 2024
e2a9f73
fix prompt
vyokky Jun 25, 2024
36f6656
Update README.md
TigeR0se Jun 26, 2024
010f77b
readme
vyokky Jun 26, 2024
04aaf2c
Merge pull request #98 from TigeR0se/patch-1
vyokky Jun 26, 2024
970babb
bug fixed
vyokky Jun 26, 2024
19d1412
Merge pull request #99 from microsoft/main
vyokky Jun 26, 2024
8fd6e92
doc
vyokky Jun 26, 2024
d2b66c0
doc deployment
vyokky Jun 26, 2024
ebdb5c7
doc deployment
vyokky Jun 26, 2024
4eb996d
doc deployment
vyokky Jun 26, 2024
86cf045
doc deployment
vyokky Jun 26, 2024
233aa6d
doc deployment
vyokky Jun 26, 2024
4526731
doc deployment
vyokky Jun 26, 2024
b7fa3f3
doc deployment
vyokky Jun 26, 2024
5f16e5c
doc deployment
vyokky Jun 26, 2024
1ecc452
doc deployment
vyokky Jun 26, 2024
a7c05ad
doc deployment
vyokky Jun 26, 2024
3269e28
doc deployment
vyokky Jun 26, 2024
fc1beea
documentation
vyokky Jun 27, 2024
7488dce
documentation
vyokky Jun 27, 2024
a5d60fb
doc
vyokky Jun 27, 2024
b77315b
doc
vyokky Jun 28, 2024
6052a81
doc
vyokky Jun 28, 2024
2666ad8
doc
vyokky Jun 28, 2024
e9dc19d
doc
vyokky Jun 28, 2024
e0d5822
doc
vyokky Jun 28, 2024
661d430
doc
vyokky Jun 28, 2024
ed7ddbb
doc
vyokky Jun 28, 2024
22a6996
doc
vyokky Jun 28, 2024
b375caa
doc
vyokky Jun 28, 2024
2db9f5e
doc
vyokky Jun 28, 2024
ff88134
doc
vyokky Jun 28, 2024
4e428cb
documentation
vyokky Jun 28, 2024
71f84aa
documentation
vyokky Jun 28, 2024
35 changes: 35 additions & 0 deletions .github/workflows/document_deploy.yml
@@ -0,0 +1,35 @@
name: Deploy MkDocs site

on:
  push:
    branches:
      - main  # Trigger on pushes to the main branch
      - vyokky/dev  # Trigger on pushes to the vyokky/dev branch
    paths:
      - 'documents/**'  # Trigger when files under documents/ change

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'

      - name: Install MkDocs and dependencies
        run: |
          pip install mkdocs mkdocs-material mkdocstrings mkdocstrings[python]

      - name: Deploy to GitHub Pages
        run: |
          cd documents
          mkdocs gh-deploy --config-file mkdocs.yml --force
        env:
          github_token: ${{ secrets.GITHUB_TOKEN }}
3 changes: 2 additions & 1 deletion .gitignore
@@ -26,8 +26,9 @@ vectordb/docs/*
vectordb/experience/*
vectordb/demonstration/*

# Ignore the data files
# Ignore the data files and scripts
tasks/*
scripts/*

# Don't ignore the example files
!vectordb/docs/example/
41 changes: 22 additions & 19 deletions README.md
@@ -8,23 +8,25 @@
[![arxiv](https://img.shields.io/badge/Paper-arXiv:202402.07939-b31b1b.svg)](https://arxiv.org/abs/2402.07939) 
![Python Version](https://img.shields.io/badge/Python-3776AB?&logo=python&logoColor=white-blue&label=3.10%20%7C%203.11) 
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) 
![Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat) 
[![Documentation](https://img.shields.io/badge/Documentation-%230ABAB5?style=flat&logo=readthedocs&logoColor=black)](https://microsoft.github.io/UFO/) 
[![YouTube](https://img.shields.io/badge/YouTube-white?logo=youtube&logoColor=%23FF0000)](https://www.youtube.com/watch?v=QT_OhygMVXU) 
<!-- [![X (formerly Twitter) Follow](https://img.shields.io/twitter/follow/UFO_Agent)](https://twitter.com/intent/follow?screen_name=UFO_Agent) -->
<!-- ![Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)&ensp; -->

</div>

**UFO** is a **UI-Focused** dual-agent framework to fulfill user requests on **Windows OS** by seamlessly navigating and operating within individual or spanning multiple applications.
**UFO** is a **UI-Focused** multi-agent framework to fulfill user requests on **Windows OS** by seamlessly navigating and operating within individual or spanning multiple applications.

<h1 align="center">
<img src="./assets/overview_n.png"/>
</h1>


## 🕌 Framework
<b>UFO</b> <img src="./assets/ufo_blue.png" alt="UFO Image" width="24"> operates as a dual-agent framework, encompassing:
- <b>HostAgent (Previously AppAgent) 🤖</b>, tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application.
- <b>AppAgent (Previously ActAgent) 👾</b>, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application.
- <b>Control Interaction 🎮</b>, is tasked with translating actions from HostAgent and AppAgent into interactions with the application and its UI controls. It's essential that the targeted controls are compatible with the Windows **UI Automation** API.
<b>UFO</b> <img src="./assets/ufo_blue.png" alt="UFO Image" width="24"> operates as a multi-agent framework, encompassing:
- <b>HostAgent 🤖</b>, tasked with choosing an application for fulfilling user requests. This agent may also switch to a different application when a request spans multiple applications, and the task is partially completed in the preceding application.
- <b>AppAgent 👾</b>, responsible for iteratively executing actions on the selected applications until the task is successfully concluded within a specific application.
- <b>Control Interaction 🎮</b>, is tasked with translating actions from HostAgent and AppAgent into interactions with the application and its UI controls. It's essential that the targeted controls are compatible with the Windows **UI Automation** or **Win32** API.

Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend the application UI and fulfill the user's request. For more details, please consult our [technical report](https://arxiv.org/abs/2402.07939).
<h1 align="center">
@@ -33,10 +35,11 @@ Both agents leverage the multi-modal capabilities of GPT-Vision to comprehend th


## 📢 News
- 📅 2024-06-28: We are thrilled to announce that our official introduction video is now available on [YouTube](https://www.youtube.com/watch?v=QT_OhygMVXU)! Additionally, you can check out the early version of our [documentation](https://microsoft.github.io/UFO/). We welcome your contributions and feedback!
- 📅 2024-06-25: **New Release for v0.2.1!** We are excited to announce the release of version 0.2.1! This update includes several new features and improvements:
1. **HostAgent Refactor:** We've refactored the HostAgent to enhance its efficiency in managing AppAgents within UFO.
2. **Evaluation Agent:** Introducing an evaluation agent that assesses task completion and provides real-time feedback.
3. **Google Gemini Support:** UFO now supports Google Gemini as the inference engine. Refer to our detailed guide in [README.md](/model_worker/readme.md).
3. **Google Gemini Support:** UFO now supports Google Gemini as the inference engine. Refer to our detailed guide in [Documentation](https://microsoft.github.io/UFO/supported_models/gemini/).
4. **Customized User Agents:** Users can now create customized agents by simply answering a few questions.
- 📅 2024-05-21: We have reached 5K stars!✨
- 📅 2024-05-08: **New Release for v0.1.1!** We've made some significant updates! Previously known as AppAgent and ActAgent, we've rebranded them to HostAgent and AppAgent to better align with their functionalities. Explore the latest enhancements:
@@ -70,10 +73,11 @@ These sources provide insights into the evolving landscape of technology and the
## 💥 Highlights

- [x] **First Windows Agent** - UFO is the pioneering agent framework capable of translating user requests in natural language into actionable operations on Windows OS.
- [x] **RAG Enhanced** - UFO is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources to promote its ability, including offling help documents and online search engine.
- [x] **Interactive Mode** - UFO facilitates multiple sub-requests from users within the same session, enabling the completion of complex tasks seamlessly.
- [x] **Action Safeguard** - UFO incorporates safeguards to prompt user confirmation for sensitive actions, enhancing security and preventing inadvertent operations.
- [x] **Easy Extension** - UFO offers extensibility, allowing for the integration of additional functionalities and control types to tackle diverse and intricate tasks with ease.
- [x] **Agent as an Expert** - UFO is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including offline help documents, online search engines, and human demonstrations, making the agent an application "expert".
- [x] **Rich Skill Set** - UFO is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native API, and "Copilot".
- [x] **Interactive Mode** - UFO facilitates multiple sub-requests from users within the same session, enabling the seamless completion of complex tasks.
- [x] **Agent Customization** - UFO allows users to customize their own agents by providing additional information. The agent will proactively query users for details when necessary to better tailor its behavior.
- [x] **Scalable AppAgent Creation** - UFO offers extensibility, allowing users and app developers to create their own AppAgents in an easy and scalable way.


## ✨ Getting Started
@@ -105,7 +109,7 @@ API_TYPE: "openai" , # The API type, "openai" for the OpenAI API.
API_BASE: "https://api.openai.com/v1/chat/completions", # The OpenAI API endpoint.
API_KEY: "sk-", # The OpenAI API key, begin with sk-
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview", # The only OpenAI model by now that accepts visual input
API_MODEL: "gpt-4-vision-preview", # The only OpenAI model
```

#### Azure OpenAI (AOAI)
@@ -115,7 +119,7 @@ API_TYPE: "aoai" , # The API type, "aoai" for the Azure OpenAI.
API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
API_KEY: "YOUR_KEY", # The aoai API key
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview", # The only OpenAI model by now that accepts visual input
API_MODEL: "gpt-4-vision-preview", # The only OpenAI model
API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment id for the AOAI API
```
You can also use a non-visual model (e.g., GPT-4) for each agent by setting `VISUAL_MODE: False` and the appropriate `API_MODEL` (openai) and `API_DEPLOYMENT_ID` (aoai). You can also optionally set a backup LLM engine in the `BACKUP_AGENT` field in case the above engines fail during inference.
@@ -129,8 +133,8 @@ You can utilize non-visual models (e.g., GPT-4) for each agent by configuring th

Optionally, you can set a backup language model (LLM) engine in the `BACKUP_AGENT` field to handle cases where the primary engines fail during inference. Ensure you configure these settings accurately to leverage non-visual models effectively.
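
As a rough illustration of the two paragraphs above, a non-visual setup with a backup engine might look like the sketch below. The field names are the ones mentioned in this README; the concrete values and the nesting of fields under `BACKUP_AGENT` are assumptions made for illustration, so check them against your own `config.yaml` before relying on them.

```yaml
# Sketch only: field names come from the text above; the values and the
# nesting under BACKUP_AGENT are assumptions for illustration.
VISUAL_MODE: False                 # do not send screenshots to the model
API_TYPE: "openai"
API_BASE: "https://api.openai.com/v1/chat/completions"
API_KEY: "sk-"                     # placeholder, as in the examples above
API_MODEL: "gpt-4"                 # a non-visual model

BACKUP_AGENT:                      # used only if the primary engine fails
  VISUAL_MODE: False
  API_TYPE: "aoai"
  API_BASE: "YOUR_ENDPOINT"
  API_KEY: "YOUR_KEY"
  API_VERSION: "2024-02-15-preview"
  API_MODEL: "gpt-4"
  API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT"
```

Pointing the backup at a different provider than the primary engine, as sketched here, means a single provider outage is less likely to take down both.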

#### NOTE
💡 UFO also supports other LLMs and advanced configurations, such as customize your own model, please check the [documents](./model_worker/README.md) for more details. Because of the limitations of model input, a lite version of the prompt is provided to allow users to experience it, which is configured in `config_dev`.yaml.
#### NOTE 💡
UFO also supports other LLMs and advanced configurations, such as customizing your own model; please check the [documents](https://microsoft.github.io/UFO/supported_models/overview/) for more details. Because of model input limitations, a lite version of the prompt is provided so that users can try it out; it is configured in `config_dev.yaml`.

### 📔 Step 3: Additional Setting for RAG (optional).
If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the `ufo/config/config.yaml` file.
@@ -219,8 +223,9 @@ You may use them to debug, replay, or analyze the agent output.


## ❓Get help
* Please first check our documentation [here](https://microsoft.github.io/UFO/).
* ❔GitHub Issues (preferred)
* For other communications, please contact [email protected]
* For other communications, please contact [[email protected]](mailto:[email protected]).
---

## 🎬 Demo Examples
@@ -249,8 +254,6 @@ https://github.com/microsoft/UFO/assets/11352048/aa41ad47-fae7-4334-8e0b-ba71c4f
Please consult the [WindowsBench](https://arxiv.org/pdf/2402.07939.pdf) provided in Section A of the Appendix within our technical report. Here are some tips (and requirements) to aid in completing your request:

- Prior to UFO execution of your request, ensure that the targeted application is active (though it may be minimized).
- Occasionally, requests to GPT-V may trigger content safety measures. UFO will attempt to retry regardless, but adjusting the size or scale of the application window may prove helpful. We are actively solving this issue.
- Currently, UFO supports a limited set of applications and UI controls that are compatible with the Windows **UI Automation** API. Our future plans include extending support to the Win32 API to enhance its capabilities.
- Please note that the output of GPT-V may not consistently align with the same request. If unsuccessful with your initial attempt, consider trying again.


@@ -261,7 +264,7 @@ If you use UFO in your research, please cite our paper:
```
@article{ufo,
title={{UFO: A UI-Focused Agent for Windows OS Interaction}},
author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
author={Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
journal={arXiv preprint arXiv:2402.07939},
year={2024}
}
17 changes: 4 additions & 13 deletions SUPPORT.md
@@ -1,13 +1,3 @@
# TODO: The maintainer of this repo has not yet edited this file

**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?

- **No CSS support:** Fill out this template with information about how to file issues and get help.
- **Yes CSS support:** Fill out an intake form at [aka.ms/onboardsupport](https://aka.ms/onboardsupport). CSS will work with/help you to determine next steps.
- **Not sure?** Fill out an intake as though the answer were "Yes". CSS will help you decide.

*Then remove this first heading from this SUPPORT.MD file before publishing your repo.*

# Support

## How to file issues and get help
@@ -16,9 +6,10 @@ This project uses GitHub Issues to track bugs and feature requests. Please searc
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.

For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE
FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
CHANNEL. WHERE WILL YOU HELP PEOPLE?**.
You may use [GitHub Issues](https://github.com/microsoft/UFO/issues) to raise questions, bug reports, and feature requests.

For help and questions about using this project, please contact [[email protected]](mailto:[email protected]).


## Microsoft Support Policy

9 changes: 9 additions & 0 deletions documents/docs/about/CODE_OF_CONDUCT.md
@@ -0,0 +1,9 @@
# Microsoft Open Source Code of Conduct

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).

Resources:

- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
- Contact [[email protected]](mailto:[email protected]) with questions or concerns
14 changes: 14 additions & 0 deletions documents/docs/about/CONTRIBUTING.md
@@ -0,0 +1,14 @@
# Contributing

This project welcomes contributions and suggestions. Most contributions require you to
agree to a Contributor License Agreement (CLA) declaring that you have the right to,
and actually do, grant us the rights to use your contribution. For details, visit
https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need
to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the
instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
or contact [[email protected]](mailto:[email protected]) with any additional questions or comments.
33 changes: 33 additions & 0 deletions documents/docs/about/DISCLAIMER.md
@@ -0,0 +1,33 @@
# Disclaimer: Code Execution and Data Handling Notice

By choosing to run the provided code, you acknowledge and agree to the following terms and conditions regarding the functionality and data handling practices:

## 1. Code Functionality:
The code you are about to execute has the capability to capture screenshots of your working desktop environment and active applications. These screenshots will be processed and sent to the GPT model for inference.


## 2. Data Privacy and Storage:
It is crucial to note that Microsoft, the provider of this code, explicitly states that it does not collect or save any of the transmitted data. The captured screenshots are processed in real-time for the purpose of inference, and no permanent storage or record of this data is retained by Microsoft.

## 3. User Responsibility:
By running the code, you understand and accept the responsibility for the content and nature of the data present on your desktop during the execution period. It is your responsibility to ensure that no sensitive or confidential information is visible or captured during this process.

## 4. Security Measures:
Microsoft has implemented security measures to safeguard the action execution. However, it is recommended that you run the code in a secure and controlled environment to minimize potential risks. Ensure that you are running the latest security updates on your system.

## 5. Consent for Inference:
You explicitly provide consent for the GPT model to analyze the captured screenshots for the purpose of generating relevant outputs. This consent is inherent in the act of executing the code.

## 6. No Guarantee of Accuracy:
The outputs generated by the GPT model are based on patterns learned during training and may not always be accurate or contextually relevant. Microsoft does not guarantee the accuracy or suitability of the inferences made by the model.

## 7. Indemnification:
Users agree to defend, indemnify, and hold Microsoft harmless from and against all damages, costs, and attorneys' fees in connection with any claims arising from the use of this Repo.

## 8. Reporting Infringements:
If anyone believes that this Repo infringes on their rights, please notify the project owner via the provided project owner email. Microsoft will investigate and take appropriate actions as necessary.

## 9. Modifications to the Disclaimer:
Microsoft reserves the right to update or modify this disclaimer at any time without prior notice. It is your responsibility to review the disclaimer periodically for any changes.

By proceeding to execute the code, you acknowledge that you have read, understood, and agreed to the terms outlined in this disclaimer. If you do not agree with these terms, refrain from running the provided code.
21 changes: 21 additions & 0 deletions documents/docs/about/LICENSE.md
@@ -0,0 +1,21 @@
Copyright (c) Microsoft Corporation.

## MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED **AS IS**, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
16 changes: 16 additions & 0 deletions documents/docs/about/SUPPORT.md
@@ -0,0 +1,16 @@
# Support

## How to file issues and get help

This project uses GitHub Issues to track bugs and feature requests. Please search the existing
issues before filing new issues to avoid duplicates. For new issues, file your bug or
feature request as a new Issue.

You may use [GitHub Issues](https://github.com/microsoft/UFO/issues) to raise questions, bug reports, and feature requests.

For help and questions about using this project, please contact [[email protected]](mailto:[email protected]).


## Microsoft Support Policy

Support for this **PROJECT or PRODUCT** is limited to the resources listed above.