
Add an agent that targets Omen markets that are still open but have a known outcome #33

Merged: 8 commits merged into main on Mar 20, 2024

Conversation

@evangriffiths (Contributor) commented Mar 20, 2024

This agent is designed to target markets that are still open, but whose outcome is already knowable with a high degree of certainty.

e.g. https://aiomen.eth.limo/#/0x859a6b465ee1e4a73aab0f2da4428c6255da466c

Will Donald Trump officially become the GOP nominee for the 2024 presidential elections by 22 March 2024?

At the time of writing (20/03/2024), this market was open, but the answer was already known to be 'yes'.

The agent works as follows:

In a loop, perform a web search and scrape the results to find whether the answer to the question is already known. Break as soon as an answer is found; if no definite answer is found after a set number of tries, return an 'unknown' answer.
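In outline, a condensed sketch of this loop (simplified from the PR's get_known_outcome; generate_search_query and answer_from_webscrape below are stand-ins for the two LLM prompt calls, not real functions in the PR):

```python
def get_known_outcome_sketch(question: str, max_tries: int) -> Answer:
    # Condensed sketch only. The real implementation lives in
    # known_outcome_agent.py; the two helpers below stand in for its LLM prompts.
    seen_urls: list[str] = []
    for _ in range(max_tries):
        query = generate_search_query(question)  # LLM: question -> search query
        for result in web_search(query=query, max_results=5):
            if result.url in seen_urls:
                continue
            seen_urls.append(result.url)
            content = web_scrape(url=result.url)
            answer = answer_from_webscrape(question, content)  # LLM: YES / NO / UNKNOWN
            if answer.result is not Result.UNKNOWN:
                return answer
    return Answer(result=Result.UNKNOWN, reasoning="Max tries exceeded.")
```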

Also added scripts for benchmarking (see results in comments) and deploying the agent.
Also did some re-organization of the agent tools. I've put tools with the same functionality into sub-folders (web search, web scraping); the resulting layout is sketched below.
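Based on the file paths touched by this PR, the tools layout now looks roughly like this (only files named in this PR are shown; the relocated Google search and structured-scraping tools sit under the same two folders):

```
prediction_market_agent/tools/
├── web_scrape/
│   ├── basic_summary.py
│   ├── markdown.py
│   └── structured_summary.py
└── web_search/
    └── tavily.py
```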

Summary by CodeRabbit

  • New Features
    • Introduced a known outcome prediction agent that determines outcomes for questions, places bets, and deploys on Google Cloud Platform.
    • Added functionality for web scraping HTML content, cleaning it, and converting it to Markdown format.

coderabbitai bot commented Mar 20, 2024

Walkthrough

The recent update focuses on enhancing the prediction market agent's capabilities and organizational structure. It reorganizes import paths for web scraping and search tools across several agent files, introduces a new agent for markets with known outcomes along with deployment capabilities, and adds new tools for web scraping to Markdown and for performing web searches via the Tavily API. This update streamlines the agent's operation and expands its functionality, particularly in handling known outcome predictions and improving data extraction methods.

Changes

| File(s) | Change Summary |
|---------|----------------|
| .../autogen_agent.py, .../custom_agent.py, .../llamaindex_agent.py | Reorganized imports related to web scraping and search tools. Adjusted import paths for GoogleSearchTool, WebScrapingTool, google_search, and web_scrape_structured to reflect the new module structure. |
| .../known_outcome_agent/benchmark.py, .../known_outcome_agent/deploy.py, .../known_outcome_agent/known_outcome_agent.py | Introduced a known outcome prediction market agent, including classes for handling questions with known outcomes, making predictions, running benchmarks, and deploying to Google Cloud Platform for xDai markets. |
| .../tools/web_scrape/markdown.py | Added functions for fetching HTML content, scraping and cleaning it, extracting text, and converting it to Markdown. |
| .../tools/web_search/tavily.py | Introduced a web_search function using the Tavily API, including a WebSearchResult model. Implements a retry mechanism with caching, takes a query and maximum results as input, and returns a list of search results with URLs and queries. |


@coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 5c9fa13 and 6f2be42.
Files ignored due to path filters (2)
  • poetry.lock is excluded by: !**/*.lock
  • pyproject.toml is excluded by: !**/*.toml
Files selected for processing (8)
  • prediction_market_agent/agents/autogen_agent.py (1 hunks)
  • prediction_market_agent/agents/custom_agent.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py (1 hunks)
  • prediction_market_agent/agents/llamaindex_agent.py (1 hunks)
  • prediction_market_agent/tools/web_scrape/markdown.py (1 hunks)
  • prediction_market_agent/tools/web_search/tavily.py (1 hunks)
Additional comments: 8
prediction_market_agent/tools/web_search/tavily.py (2)
  • 13-14: The retry mechanism is correctly implemented using the tenacity library, ensuring resilience against transient failures.
  • 16-16: Using @persistent_inmemory_cache is a good practice for reducing redundant web search calls, enhancing performance. (A sketch of this retry-plus-cache decorator stack follows the list.)
prediction_market_agent/agents/llamaindex_agent.py (1)
  • 8-9: Reorganizing imports for web_scrape and google_search modules improves the clarity and maintainability of the code.
prediction_market_agent/tools/web_scrape/markdown.py (2)
  • 9-11: The retry mechanism using tenacity is correctly applied to enhance the reliability of the fetch_html function.
  • 12-13: Applying @persistent_inmemory_cache to fetch_html is a good practice for improving performance by avoiding redundant network calls.
prediction_market_agent/agents/known_outcome_agent/deploy.py (1)
  • 17-18: The market_is_saturated function is a simple and effective way to filter out markets with a high probability of a known outcome.
prediction_market_agent/agents/autogen_agent.py (1)
  • 10-11: Reorganizing imports for WebScrapingTool and GoogleSearchTool improves the clarity and maintainability of the code.
prediction_market_agent/agents/custom_agent.py (1)
  • 20-23: Reorganizing imports for web_scrape_structured_and_summarized and google_search modules improves the clarity and maintainability of the code.
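For context, the retry-plus-cache decorator stack these comments refer to looks roughly like the sketch below. The exact tenacity arguments and the import path of persistent_inmemory_cache are assumptions, not taken from the PR:

```python
import tenacity

# Hypothetical import path: persistent_inmemory_cache is the project's own
# caching decorator, and its real module location is not shown in this PR.
from prediction_market_agent.tools.cache import persistent_inmemory_cache


@tenacity.retry(stop=tenacity.stop_after_attempt(3), wait=tenacity.wait_fixed(1))
@persistent_inmemory_cache
def web_search(query: str, max_results: int) -> list:
    """Retried on transient failures; repeated identical queries hit the cache."""
    ...  # Tavily call, as in tools/web_search/tavily.py (quoted below)
```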

Comment on lines 17 to 38
```python
def web_search(query: str, max_results: int) -> list[WebSearchResult]:
    """
    Web search using Tavily API.
    """
    tavily_api_key = secret_str_from_env("TAVILY_API_KEY")
    tavily = TavilyClient(api_key=tavily_api_key.get_secret_value())
    response = tavily.search(
        query=query,
        search_depth="advanced",
        max_results=max_results,
        include_raw_content=True,
    )

    results = [
        WebSearchResult(
            url=result["url"],
            query=query,
        )
        for result in response["results"]
    ]

    return results
```

The web_search function correctly utilizes the Tavily API for performing web searches. However, consider handling potential exceptions from the TavilyClient's search method to ensure robust error handling.

```python
try:
    response = tavily.search(
        query=query,
        search_depth="advanced",
        max_results=max_results,
        include_raw_content=True,
    )
except Exception as e:
    # Log the error and handle it appropriately
    print(f"Error during Tavily search: {e}")
    return []
```


Comment on lines +21 to +53
```python
def web_scrape(url: str, timeout: int = 10) -> str:
    """
    Taken from polywrap/predictionprophet

    https://github.com/polywrap/predictionprophet/blob/97aeea8f87e9b42da242d00d93ed5754bd64f21e/prediction_prophet/functions/web_scrape.py
    """
    try:
        response = fetch_html(url=url, timeout=timeout)

        if "text/html" in response.headers.get("Content-Type", ""):
            soup = BeautifulSoup(response.content, "html.parser")

            [x.extract() for x in soup.findAll("script")]
            [x.extract() for x in soup.findAll("style")]
            [x.extract() for x in soup.findAll("noscript")]
            [x.extract() for x in soup.findAll("link")]
            [x.extract() for x in soup.findAll("head")]
            [x.extract() for x in soup.findAll("image")]
            [x.extract() for x in soup.findAll("img")]

            text: str = soup.get_text()
            text = markdownify(text)
            text = " ".join([x.strip() for x in text.split("\n")])
            text = " ".join([x.strip() for x in text.split(" ")])

            return text
        else:
            print("Non-HTML content received")
            return ""

    except requests.RequestException as e:
        print(f"HTTP request failed: {e}")
        return ""
```

The web_scrape function correctly fetches and processes HTML content. However, consider adding error handling for the requests.get call and validating the response status code to ensure robustness.

```diff
try:
    response = requests.get(url, headers=headers, timeout=timeout)
+   response.raise_for_status()  # Ensure the request was successful
except requests.RequestException as e:
    print(f"HTTP request failed: {e}")
    return ""
```


Comment on lines 21 to 78
```python
class DeployableKnownOutcomeAgent(DeployableAgent):
    model = "gpt-3.5-turbo-0125"

    def load(self) -> None:
        self.markets_with_known_outcomes: dict[str, Result] = {}

    def pick_markets(self, markets: list[AgentMarket]) -> list[AgentMarket]:
        picked_markets: list[AgentMarket] = []
        for market in markets:
            # Assume very high probability markets are already known, and have
            # been correctly bet on, and therefore the value of betting on them
            # is low.
            if not market_is_saturated(market=markets[0]):
                answer = get_known_outcome(
                    model=self.model,
                    question=market.question,
                    max_tries=3,
                )
                if answer.has_known_outcome():
                    picked_markets.append(market)
                    self.markets_with_known_outcomes[market.id] = answer.result

        return picked_markets

    def answer_binary_market(self, market: AgentMarket) -> bool:
        # The answer has already been determined in `pick_markets` so we just
        # return it here.
        return self.markets_with_known_outcomes[market.id].to_boolean()

    def calculate_bet_amount(self, answer: bool, market: AgentMarket) -> BetAmount:
        if market.currency == Currency.xDai:
            return BetAmount(amount=0.1, currency=Currency.xDai)
        else:
            raise NotImplementedError("This agent only supports xDai markets")


if __name__ == "__main__":
    agent = DeployableKnownOutcomeAgent()
    github_repo_url = "https://github.com/gnosis/prediction-market-agent"
    agent.deploy_gcp(
        repository=f"git+{github_repo_url}.git@{get_current_git_commit_sha()}",
        market_type=MarketType.OMEN,
        labels={OWNER_KEY: getpass.getuser()},
        secrets={
            "OPENAI_API_KEY": "EVAN_OPENAI_API_KEY:latest",
            "TAVILY_API_KEY": "GNOSIS_AI_TAVILY_API_KEY:latest",
            "BET_FROM_PRIVATE_KEY": "0x3666DA333dAdD05083FEf9FF6dDEe588d26E4307:latest",
        },
        memory=1024,
        api_keys=APIKeys(
            BET_FROM_ADDRESS="0x3666DA333dAdD05083FEf9FF6dDEe588d26E4307",
            BET_FROM_PRIVATE_KEY=None,
            OPENAI_API_KEY=None,
            MANIFOLD_API_KEY=None,
        ),
        cron_schedule="0 */4 * * *",
        timeout=540,
    )
```

The DeployableKnownOutcomeAgent class is well-structured and includes essential methods for deploying the agent. Ensure that the deployment script correctly handles any potential exceptions or errors during deployment to avoid partial or failed deployments.

```python
try:
    agent.deploy_gcp(
        ...
    )
except Exception as e:
    print(f"Deployment failed: {e}")
```


Comment on lines 121 to 162
```python
def get_known_outcome(model: str, question: str, max_tries: int) -> Answer:
    """
    In a loop, perform web search and scrape to find if the answer to the
    question is known. Break if the answer is found, or after a certain number
    of tries, and no definite answer is found, return an 'unknown' answer.
    """
    tries = 0
    date_str = datetime.now().strftime("%d %B %Y")
    previous_urls = []
    llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0.4)
    while tries < max_tries:
        search_prompt = ChatPromptTemplate.from_template(
            template=GENERATE_SEARCH_QUERY_PROMPT
        ).format_messages(date_str=date_str, question=question)
        search_query = llm.invoke(search_prompt).content.strip('"')
        search_results = web_search(query=search_query, max_results=5)
        if not search_results:
            raise ValueError("No search results found.")

        for result in search_results:
            if result.url in previous_urls:
                continue
            previous_urls.append(result.url)

            scraped_content = web_scrape(url=result.url)

            prompt = ChatPromptTemplate.from_template(
                template=ANSWER_FROM_WEBSCRAPE_PROMPT
            ).format_messages(
                date_str=date_str,
                question=question,
                scraped_content=scraped_content,
            )
            answer = llm.invoke(prompt).content
            parsed_answer = Answer.model_validate(completion_str_to_json(answer))

            if parsed_answer.result is not Result.UNKNOWN:
                return parsed_answer

        tries += 1

    return Answer(result=Result.UNKNOWN, reasoning="Max tries exceeded.")
```

The get_known_outcome function correctly implements the logic for determining the known outcome of a market. Consider adding more detailed logging or error handling for better observability and robustness, especially in the web scraping and search parts.

```diff
+ import logging
...
try:
    ...
except Exception as e:
+   logging.error(f"Error while determining known outcome: {e}")
    return Answer(result=Result.UNKNOWN, reasoning="Error occurred.")
```


Comment on lines 75 to 163
```python
if __name__ == "__main__":
    load_dotenv()
    tomorrow_str = (datetime.now(tz=pytz.UTC) + timedelta(days=1)).strftime("%d %B %Y")

    # Fetch questions from existing markets, or make some up, where the
    # outcome is known.
    qs_with_known_outcome: list[QuestionWithKnownOutcome] = [
        QuestionWithKnownOutcome(
            question=f"Will 'Barbie' win an Academy Award for best original song by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0xceb2a4ecc217cab440acf60737a9fcfd6d3fbf4b",
            result=Result.YES,
            notes="Happened on 10th March 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will the 2024 Oscars winner for Best Picture be announced by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0xb88e4507709148e096bcdfb861b17db7b4d54e6b",
            result=Result.YES,
            notes="Happened on 10th March 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will Liverpool win against Atalanta in the Europa League quarter-finals by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0x1d5a462c801360b4bebbda2b9656e52801a27a3b",
            result=Result.NO,
            notes="The match is scheduled for 11 April 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will Donald Trump officially become the GOP nominee for the 2024 presidential elections by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0x859a6b465ee1e4a73aab0f2da4428c6255da466c",
            result=Result.YES,
            notes="Happened on 10th March 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will SpaceX successfully test a Starship reentry without losing contact by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0xcc9123af8db309e0c60c63f9e2b8b82fc86f458b",
            result=Result.NO,
            notes="The only scheduled test flight occured, and contact was lost during the test.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will Arsenal reach the Champions League semi-finals on {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0x606efd175b245cd60282a98cef402d4f5e950f92",
            result=Result.NO,
            notes="They are scheduled to play the first leg of the quarter-finals on 9 April 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will the jury deliver a verdict on James Crumbley's 'bad parenting' case on {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0xe55171beda0d60fd45092ff8bf93d5cb566a2510",
            result=Result.NO,
            notes="The verdict was announced on 15th March 2024.",
        ),
        QuestionWithKnownOutcome(
            question="Will Lewis Hamilton win the 2024/2025 F1 drivers champtionship?",
            result=Result.UNKNOWN,
            notes="Outcome is uncertain.",
        ),
        QuestionWithKnownOutcome(
            question="Will the cost of grain in the Spain increase by 20% by 19 July 2024?",
            result=Result.UNKNOWN,
            notes="Outcome is uncertain.",
        ),
        QuestionWithKnownOutcome(
            question="Will over 360 pople have died while climbing Mount Everest by 1st Jan 2028?",
            result=Result.UNKNOWN,
            notes="Outcome is uncertain.",
        ),
    ]

    benchmarker = Benchmarker(
        markets=[q.to_market() for q in qs_with_known_outcome],
        agents=[
            KnownOutcomeAgent(
                agent_name="known_outcome",
                model="gpt-3.5-turbo-0125",
                max_tries=3,
                max_workers=1,
            ),
        ],
        cache_path="./benchmark_cache.json",
    )
    benchmarker.run_agents()
    md = benchmarker.generate_markdown_report()

    output = f"./known_outcome_agent_benchmark_report.{int(time.time())}.md"
    with open(output, "w") as f:
        print(f"Writing benchmark report to: {output}")
        f.write(md)

    # Check all predictions are correct, i.e. mean-squared-error == 0
    metrics = benchmarker.compute_metrics()
    assert metrics["MSE for `p_yes`"][0] == 0.0
```
@coderabbitai bot commented Mar 20, 2024

The benchmarking script is well-structured and includes a comprehensive list of test cases. Ensure that the benchmarking process includes error handling for cases where the agent might fail to predict an outcome, to provide a complete picture of the agent's performance.

```python
try:
    benchmarker.run_agents()
    metrics = benchmarker.compute_metrics()
    assert metrics["MSE for `p_yes`"][0] == 0.0
except Exception as e:
    print(f"Benchmarking failed: {e}")
```


@kongzii (Contributor) commented:

That's a bold assertion! Cool!

@coderabbitai bot replied:

@kongzii, thank you for the enthusiasm! It's great to see positive feedback. If you have any more suggestions or need further assistance, feel free to ask. 😊

```python
from prediction_market_agent_tooling.benchmark.agents import AbstractBenchmarkedAgent
from prediction_market_agent_tooling.benchmark.benchmark import Benchmarker
from prediction_market_agent_tooling.benchmark.utils import (
    Market,
```
@evangriffiths (author) commented:

Will release a new version of prediction-market-agent-tooling, and bump it in this repo and update this code. Happy to do as part of this PR or as a follow-up PR.

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 6f2be42 and 40a8aaf.
Files selected for processing (5)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py (1 hunks)
  • prediction_market_agent/tools/web_scrape/structured_summary.py (1 hunks)
  • prediction_market_agent/tools/web_search/tavily.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • prediction_market_agent/tools/web_scrape/structured_summary.py
Files skipped from review as they are similar to previous changes (4)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py
  • prediction_market_agent/agents/known_outcome_agent/deploy.py
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py
  • prediction_market_agent/tools/web_search/tavily.py

@evangriffiths (author) commented:

Benchmarking output:

# Comparison Report

## Market Results

|   Number of markets |   Proportion resolved |   Proportion YES |   Proportion NO |
|--------------------:|----------------------:|-----------------:|----------------:|
|                  10 |                     0 |              0.3 |             0.7 |

## Agent Results

### Summary Statistics

| Agents        |   MSE for `p_yes` |   Mean confidence |   % within +-0.05 |   % within +-0.1 |   % within +-0.2 |   % correct outcome |   % precision for `yes` |   % precision for `no` |   % recall for `yes` |   % recall for `no` |   confidence/p_yes error correlation | Mean info_utility   |   Proportion answerable |   Proportion answered |   Mean cost ($) |   Mean time (s) |
|:--------------|------------------:|------------------:|------------------:|-----------------:|-----------------:|--------------------:|------------------------:|-----------------------:|---------------------:|--------------------:|-------------------------------------:|:--------------------|------------------------:|----------------------:|----------------:|----------------:|
| known_outcome |                 0 |                 1 |               100 |              100 |              100 |                 100 |                     100 |                    100 |                  100 |                 100 |                                  nan |                     |                     0.7 |                   0.7 |        0.147053 |         23.7117 |

### Markets

| Market Question                                                                                                                                                                   | known_outcome p_yes   | reference p_yes   |
|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------|:------------------|
| [Will 'Barbie' win an Academy Award for best original song by 21 March 2024?](https://aiomen.eth.limo/#/0xceb2a4ecc217cab440acf60737a9fcfd6d3fbf4b)                               | 1.00 [yes]            | 1.00 [yes]        |
| [Will the 2024 Oscars winner for Best Picture be announced by 21 March 2024?](https://aiomen.eth.limo/#/0xb88e4507709148e096bcdfb861b17db7b4d54e6b)                               | 1.00 [yes]            | 1.00 [yes]        |
| [Will Liverpool win against Atalanta in the Europa League quarter-finals by 21 March 2024?](https://aiomen.eth.limo/#/0x1d5a462c801360b4bebbda2b9656e52801a27a3b)                 | 0.00 [no]             | 0.00 [no]         |
| [Will Donald Trump officially become the GOP nominee for the 2024 presidential elections by 21 March 2024?](https://aiomen.eth.limo/#/0x859a6b465ee1e4a73aab0f2da4428c6255da466c) | 1.00 [yes]            | 1.00 [yes]        |
| [Will SpaceX successfully test a Starship reentry without losing contact by 21 March 2024?](https://aiomen.eth.limo/#/0xcc9123af8db309e0c60c63f9e2b8b82fc86f458b)                 | 0.00 [no]             | 0.00 [no]         |
| [Will Arsenal reach the Champions League semi-finals on 21 March 2024?](https://aiomen.eth.limo/#/0x606efd175b245cd60282a98cef402d4f5e950f92)                                     | 0.00 [no]             | 0.00 [no]         |
| [Will the jury deliver a verdict on James Crumbley's 'bad parenting' case on 21 March 2024?](https://aiomen.eth.limo/#/0xe55171beda0d60fd45092ff8bf93d5cb566a2510)                | 0.00 [no]             | 0.00 [no]         |
| [Will Lewis Hamilton win the 2024/2025 F1 drivers champtionship?]()                                                                                                               | S                     | 0.50 [no]         |
| [Will the cost of grain in the Spain increase by 20% by 19 July 2024?]()                                                                                                          | S                     | 0.50 [no]         |
| [Will over 360 pople have died while climbing Mount Everest by 1st Jan 2028?]()                                                                                                   | S                     | 0.50 [no]         |

### Expected value

| Agent         |   Mean expected returns |   Median expected returns |   Total expected returns |
|:--------------|------------------------:|--------------------------:|-------------------------:|
| known_outcome |                     100 |                       100 |                      700 |

| Market Question                                                                                           |   known_outcome |
|:----------------------------------------------------------------------------------------------------------|----------------:|
| Will 'Barbie' win an Academy Award for best original song by 21 March 2024?                               |             100 |
| Will the 2024 Oscars winner for Best Picture be announced by 21 March 2024?                               |             100 |
| Will Liverpool win against Atalanta in the Europa League quarter-finals by 21 March 2024?                 |             100 |
| Will Donald Trump officially become the GOP nominee for the 2024 presidential elections by 21 March 2024? |             100 |
| Will SpaceX successfully test a Starship reentry without losing contact by 21 March 2024?                 |             100 |
| Will Arsenal reach the Champions League semi-finals on 21 March 2024?                                     |             100 |
| Will the jury deliver a verdict on James Crumbley's 'bad parenting' case on 21 March 2024?                |             100 |
| Will Lewis Hamilton win the 2024/2025 F1 drivers champtionship?                                           |             nan |
| Will the cost of grain in the Spain increase by 20% by 19 July 2024?                                      |             nan |
| Will over 360 pople have died while climbing Mount Everest by 1st Jan 2028?                               |             nan |

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 40a8aaf and 9cf4684.
Files selected for processing (1)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py


```python
    def calculate_bet_amount(self, answer: bool, market: AgentMarket) -> BetAmount:
        if market.currency == Currency.xDai:
            return BetAmount(amount=Decimal(0.1), currency=Currency.xDai)
```
A contributor commented:

Should we make this configurable (e.g. ENV from Google Cloud) and thus editable after deployment?

@evangriffiths (author) replied:

We currently have a way of changing agent source code after deployment via gcp, so I don't think that is needed. ATM we can update the commit hash that the agent's container installs (see pic), and click 'save and redeploy'
[Screenshot 2024-03-20 15:18:58: GCP deployment config showing the commit hash field that can be updated before 'save and redeploy']
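For reference, the ENV-based alternative floated above would look something like the sketch below. It is purely illustrative: BET_AMOUNT_XDAI is a hypothetical variable name, and the merged PR keeps the hard-coded 0.1 xDai instead.

```python
import os
from decimal import Decimal

# Sketch of DeployableKnownOutcomeAgent.calculate_bet_amount reading the stake
# from a (hypothetical) BET_AMOUNT_XDAI environment variable instead of
# hard-coding it. AgentMarket, BetAmount and Currency are the types used in deploy.py.
def calculate_bet_amount(self, answer: bool, market: AgentMarket) -> BetAmount:
    if market.currency == Currency.xDai:
        amount = Decimal(os.environ.get("BET_AMOUNT_XDAI", "0.1"))
        return BetAmount(amount=amount, currency=Currency.xDai)
    raise NotImplementedError("This agent only supports xDai markets")
```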

```python
                scraped_content=scraped_content,
            )
            answer = str(llm.invoke(prompt).content)
            parsed_answer = Answer.model_validate(completion_str_to_json(answer))
```
A contributor commented:

Isn't it a bit risky to assume the LLM will answer using the correct format?
I would expect some try-catch block because of this, but I might be wrong.

@kongzii (Contributor) commented Mar 20, 2024:

I'd say it depends on what comes after this.

Is this needed (if so, let's error out and make it more robust in another PR), or can we skip it and continue?

@evangriffiths (author) replied:

GPT-4 is very reliable at producing JSON output!
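For what it's worth, the guard suggested above would be small. A sketch, using the Answer/Result types and completion_str_to_json from this PR (the fallback behaviour is illustrative, not what was merged):

```python
try:
    parsed_answer = Answer.model_validate(completion_str_to_json(answer))
except Exception as e:
    # If the LLM reply is not valid JSON, or does not match the Answer schema,
    # treat it as no answer for this page rather than crashing the whole run.
    print(f"Failed to parse LLM answer: {e}")
    parsed_answer = Answer(result=Result.UNKNOWN, reasoning="Unparseable LLM response.")
```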

```python
            BET_FROM_ADDRESS=verify_address(
                "0xb611A9f02B318339049264c7a66ac3401281cc3c"
            ),
            BET_FROM_PRIVATE_KEY=None,
```
A contributor commented:

BET_FROM_PRIVATE_KEY should (can) be defined here instead of manually in secrets. Deployment will automatically treat it like a secret thanks to:

[Screenshot by Dropbox Capture]

@evangriffiths (author) replied:

Ah nice. It feels a bit counterintuitive that we have two places where private keys can be passed in. If we wanted all keys to be specified in one place (assuming APIKeys can never contain all the key types an agent might want), the way to do this would be to remove APIKeys altogether. Got any opinions on that?

A contributor replied:

> then the way to do this would be to remove APIKeys altogether. Got any opinions on that?

Yep, I believe we have a GitHub issue for that 😄 Looking forward to someone picking it up.

My current reasoning for using APIKeys here is that we use it heavily across the tooling repo, so APIKeys is enforced here to force the right variable names. Anything agent-specific (Tavily, for example) then goes into secrets.

@evangriffiths (author) replied:

Okay coool

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 9cf4684 and c7a93d0.
Files selected for processing (4)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py (1 hunks)
  • prediction_market_agent/tools/web_scrape/basic_summary.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py
  • prediction_market_agent/agents/known_outcome_agent/deploy.py
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py
Additional comments: 2
prediction_market_agent/tools/web_scrape/basic_summary.py (2)
  • 9-14: The addition of the separators parameter with a default value enhances the flexibility of the _summary function, allowing text splitting to be customized to the content structure. This is a positive change for handling different types of content more effectively. (An illustrative sketch of this pattern follows the list.)
  • 12-12: Updating the ChatOpenAI model parameter to "gpt-3.5-turbo-0125" from a previous version indicates an upgrade to a potentially more efficient or accurate model. It's important to verify that this new model version is compatible with the existing codebase and that it meets the performance and accuracy requirements for the task at hand.
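For illustration only (this is not the PR's actual _summary implementation), a sketch of the pattern described above: a text splitter with configurable separators feeding a "gpt-3.5-turbo-0125" summarisation call.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI


def summarize(text: str, objective: str, separators: list[str] | None = None) -> str:
    # Default separators chosen for illustration; the PR exposes them as a parameter.
    splitter = RecursiveCharacterTextSplitter(
        separators=separators or ["\n\n", "\n", " "],
        chunk_size=3000,
        chunk_overlap=200,
    )
    llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0.0)
    partial_summaries = [
        llm.invoke(
            f"Summarise the following text with respect to '{objective}':\n\n{chunk}"
        ).content
        for chunk in splitter.split_text(text)
    ]
    return "\n".join(str(s) for s in partial_summaries)
```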

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between c7a93d0 and e1e232e.
Files selected for processing (2)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py

@evangriffiths merged commit 5cd7cfb into main on Mar 20, 2024
6 checks passed
@evangriffiths deleted the evan/known-outcome-agent branch on March 20, 2024 at 15:29