
Add an agent that targets Omen markets that are still open but have a known outcome #33

Merged: 8 commits merged into main on Mar 20, 2024

Conversation

@evangriffiths (Contributor) commented Mar 20, 2024

This agent is designed to target markets that are still open, but whose outcome is already knowable with a high degree of certainty.

e.g. https://aiomen.eth.limo/#/0x859a6b465ee1e4a73aab0f2da4428c6255da466c

Will Donald Trump officially become the GOP nominee for the 2024 presidential elections by 22 March 2024?

At the time of writing (20/03/2024), this market was open, but the answer was already known to be 'yes'.

The agent works as follows:

In a loop, perform a web search and scrape the results to find whether the answer to the question is already known. Break as soon as an answer is found; if no definite answer is found after a set number of tries, return an 'unknown' answer.
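In outline, a condensed sketch of this loop (simplified from the PR's get_known_outcome; generate_search_query and answer_from_webscrape below are stand-ins for the two LLM prompt calls, not real functions in the PR):

```python
def get_known_outcome_sketch(question: str, max_tries: int) -> Answer:
    # Condensed sketch only. The real implementation lives in
    # known_outcome_agent.py; the two helpers below stand in for its LLM prompts.
    seen_urls: list[str] = []
    for _ in range(max_tries):
        query = generate_search_query(question)  # LLM: question -> search query
        for result in web_search(query=query, max_results=5):
            if result.url in seen_urls:
                continue
            seen_urls.append(result.url)
            content = web_scrape(url=result.url)
            answer = answer_from_webscrape(question, content)  # LLM: YES / NO / UNKNOWN
            if answer.result is not Result.UNKNOWN:
                return answer
    return Answer(result=Result.UNKNOWN, reasoning="Max tries exceeded.")
```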

Also added scripts for benchmarking (see results in comments) and deploying the agent.
Also did some re-organization of the agent tools. I've put tools with the same functionality into sub-folders (web search, web scraping); the resulting layout is sketched below.
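Based on the file paths touched by this PR, the tools layout now looks roughly like this (only files named in this PR are shown; the relocated Google search and structured-scraping tools sit under the same two folders):

```
prediction_market_agent/tools/
├── web_scrape/
│   ├── basic_summary.py
│   ├── markdown.py
│   └── structured_summary.py
└── web_search/
    └── tavily.py
```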

Summary by CodeRabbit

  • New Features
    • Introduced a known outcome prediction agent that determines outcomes for questions, places bets, and deploys on Google Cloud Platform.
    • Added functionality for web scraping HTML content, cleaning it, and converting it to Markdown format.

coderabbitai bot commented Mar 20, 2024

Walkthrough

The recent update focuses on enhancing the prediction market agent's capabilities and organizational structure. It reorganizes import paths for web scraping and search tools across several agent files, introduces a new agent for markets with known outcomes along with deployment capabilities, and adds new tools for web scraping to Markdown and for performing web searches via the Tavily API. This update streamlines the agent's operation and expands its functionality, particularly in handling known outcome predictions and improving data extraction methods.

Changes

| File(s) | Change Summary |
|---------|----------------|
| .../autogen_agent.py, .../custom_agent.py, .../llamaindex_agent.py | Reorganized imports related to web scraping and search tools. Adjusted import paths for GoogleSearchTool, WebScrapingTool, google_search, and web_scrape_structured to reflect the new module structure. |
| .../known_outcome_agent/benchmark.py, .../known_outcome_agent/deploy.py, .../known_outcome_agent/known_outcome_agent.py | Introduced a known outcome prediction market agent, including classes for handling questions with known outcomes, making predictions, running benchmarks, and deploying to Google Cloud Platform for xDai markets. |
| .../tools/web_scrape/markdown.py | Added functions for fetching HTML content, scraping and cleaning it, extracting text, and converting it to Markdown. |
| .../tools/web_search/tavily.py | Introduced a web_search function using the Tavily API, including a WebSearchResult model. Implements a retry mechanism with caching, takes a query and maximum results as input, and returns a list of search results with URLs and queries. |


@coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 5c9fa13 and 6f2be42.
Files ignored due to path filters (2)
  • poetry.lock is excluded by: !**/*.lock
  • pyproject.toml is excluded by: !**/*.toml
Files selected for processing (8)
  • prediction_market_agent/agents/autogen_agent.py (1 hunks)
  • prediction_market_agent/agents/custom_agent.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py (1 hunks)
  • prediction_market_agent/agents/llamaindex_agent.py (1 hunks)
  • prediction_market_agent/tools/web_scrape/markdown.py (1 hunks)
  • prediction_market_agent/tools/web_search/tavily.py (1 hunks)
Additional comments: 8
prediction_market_agent/tools/web_search/tavily.py (2)
  • 13-14: The retry mechanism is correctly implemented using the tenacity library, ensuring resilience against transient failures.
  • 16-16: Using @persistent_inmemory_cache is a good practice for reducing redundant web search calls, enhancing performance. (A sketch of this retry-plus-cache decorator stack follows the list.)
prediction_market_agent/agents/llamaindex_agent.py (1)
  • 8-9: Reorganizing imports for web_scrape and google_search modules improves the clarity and maintainability of the code.
prediction_market_agent/tools/web_scrape/markdown.py (2)
  • 9-11: The retry mechanism using tenacity is correctly applied to enhance the reliability of the fetch_html function.
  • 12-13: Applying @persistent_inmemory_cache to fetch_html is a good practice for improving performance by avoiding redundant network calls.
prediction_market_agent/agents/known_outcome_agent/deploy.py (1)
  • 17-18: The market_is_saturated function is a simple and effective way to filter out markets with a high probability of a known outcome.
prediction_market_agent/agents/autogen_agent.py (1)
  • 10-11: Reorganizing imports for WebScrapingTool and GoogleSearchTool improves the clarity and maintainability of the code.
prediction_market_agent/agents/custom_agent.py (1)
  • 20-23: Reorganizing imports for web_scrape_structured_and_summarized and google_search modules improves the clarity and maintainability of the code.
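For context, the retry-plus-cache decorator stack these comments refer to looks roughly like the sketch below. The exact tenacity arguments and the import path of persistent_inmemory_cache are assumptions, not taken from the PR:

```python
import tenacity

# Hypothetical import path: persistent_inmemory_cache is the project's own
# caching decorator, and its real module location is not shown in this PR.
from prediction_market_agent.tools.cache import persistent_inmemory_cache


@tenacity.retry(stop=tenacity.stop_after_attempt(3), wait=tenacity.wait_fixed(1))
@persistent_inmemory_cache
def web_search(query: str, max_results: int) -> list:
    """Retried on transient failures; repeated identical queries hit the cache."""
    ...  # Tavily call, as in tools/web_search/tavily.py (quoted below)
```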

Comment on lines 17 to 38
```python
def web_search(query: str, max_results: int) -> list[WebSearchResult]:
    """
    Web search using Tavily API.
    """
    tavily_api_key = secret_str_from_env("TAVILY_API_KEY")
    tavily = TavilyClient(api_key=tavily_api_key.get_secret_value())
    response = tavily.search(
        query=query,
        search_depth="advanced",
        max_results=max_results,
        include_raw_content=True,
    )

    results = [
        WebSearchResult(
            url=result["url"],
            query=query,
        )
        for result in response["results"]
    ]

    return results
```

The web_search function correctly utilizes the Tavily API for performing web searches. However, consider handling potential exceptions from the TavilyClient's search method to ensure robust error handling.

```python
try:
    response = tavily.search(
        query=query,
        search_depth="advanced",
        max_results=max_results,
        include_raw_content=True,
    )
except Exception as e:
    # Log the error and handle it appropriately
    print(f"Error during Tavily search: {e}")
    return []
```


Comment on lines +21 to +53
```python
def web_scrape(url: str, timeout: int = 10) -> str:
    """
    Taken from polywrap/predictionprophet

    https://github.com/polywrap/predictionprophet/blob/97aeea8f87e9b42da242d00d93ed5754bd64f21e/prediction_prophet/functions/web_scrape.py
    """
    try:
        response = fetch_html(url=url, timeout=timeout)

        if "text/html" in response.headers.get("Content-Type", ""):
            soup = BeautifulSoup(response.content, "html.parser")

            [x.extract() for x in soup.findAll("script")]
            [x.extract() for x in soup.findAll("style")]
            [x.extract() for x in soup.findAll("noscript")]
            [x.extract() for x in soup.findAll("link")]
            [x.extract() for x in soup.findAll("head")]
            [x.extract() for x in soup.findAll("image")]
            [x.extract() for x in soup.findAll("img")]

            text: str = soup.get_text()
            text = markdownify(text)
            text = " ".join([x.strip() for x in text.split("\n")])
            text = " ".join([x.strip() for x in text.split(" ")])

            return text
        else:
            print("Non-HTML content received")
            return ""

    except requests.RequestException as e:
        print(f"HTTP request failed: {e}")
        return ""
```

The web_scrape function correctly fetches and processes HTML content. However, consider adding error handling for the requests.get call and validating the response status code to ensure robustness.

```diff
try:
    response = requests.get(url, headers=headers, timeout=timeout)
+   response.raise_for_status()  # Ensure the request was successful
except requests.RequestException as e:
    print(f"HTTP request failed: {e}")
    return ""
```


Comment on lines 21 to 78
```python
class DeployableKnownOutcomeAgent(DeployableAgent):
    model = "gpt-3.5-turbo-0125"

    def load(self) -> None:
        self.markets_with_known_outcomes: dict[str, Result] = {}

    def pick_markets(self, markets: list[AgentMarket]) -> list[AgentMarket]:
        picked_markets: list[AgentMarket] = []
        for market in markets:
            # Assume very high probability markets are already known, and have
            # been correctly bet on, and therefore the value of betting on them
            # is low.
            if not market_is_saturated(market=markets[0]):
                answer = get_known_outcome(
                    model=self.model,
                    question=market.question,
                    max_tries=3,
                )
                if answer.has_known_outcome():
                    picked_markets.append(market)
                    self.markets_with_known_outcomes[market.id] = answer.result

        return picked_markets

    def answer_binary_market(self, market: AgentMarket) -> bool:
        # The answer has already been determined in `pick_markets` so we just
        # return it here.
        return self.markets_with_known_outcomes[market.id].to_boolean()

    def calculate_bet_amount(self, answer: bool, market: AgentMarket) -> BetAmount:
        if market.currency == Currency.xDai:
            return BetAmount(amount=0.1, currency=Currency.xDai)
        else:
            raise NotImplementedError("This agent only supports xDai markets")


if __name__ == "__main__":
    agent = DeployableKnownOutcomeAgent()
    github_repo_url = "https://github.com/gnosis/prediction-market-agent"
    agent.deploy_gcp(
        repository=f"git+{github_repo_url}.git@{get_current_git_commit_sha()}",
        market_type=MarketType.OMEN,
        labels={OWNER_KEY: getpass.getuser()},
        secrets={
            "OPENAI_API_KEY": "EVAN_OPENAI_API_KEY:latest",
            "TAVILY_API_KEY": "GNOSIS_AI_TAVILY_API_KEY:latest",
            "BET_FROM_PRIVATE_KEY": "0x3666DA333dAdD05083FEf9FF6dDEe588d26E4307:latest",
        },
        memory=1024,
        api_keys=APIKeys(
            BET_FROM_ADDRESS="0x3666DA333dAdD05083FEf9FF6dDEe588d26E4307",
            BET_FROM_PRIVATE_KEY=None,
            OPENAI_API_KEY=None,
            MANIFOLD_API_KEY=None,
        ),
        cron_schedule="0 */4 * * *",
        timeout=540,
    )
```

The DeployableKnownOutcomeAgent class is well-structured and includes essential methods for deploying the agent. Ensure that the deployment script correctly handles any potential exceptions or errors during deployment to avoid partial or failed deployments.

```python
try:
    agent.deploy_gcp(
        ...
    )
except Exception as e:
    print(f"Deployment failed: {e}")
```


Comment on lines 121 to 162
```python
def get_known_outcome(model: str, question: str, max_tries: int) -> Answer:
    """
    In a loop, perform web search and scrape to find if the answer to the
    question is known. Break if the answer is found, or after a certain number
    of tries, and no definite answer is found, return an 'unknown' answer.
    """
    tries = 0
    date_str = datetime.now().strftime("%d %B %Y")
    previous_urls = []
    llm = ChatOpenAI(model="gpt-4-1106-preview", temperature=0.4)
    while tries < max_tries:
        search_prompt = ChatPromptTemplate.from_template(
            template=GENERATE_SEARCH_QUERY_PROMPT
        ).format_messages(date_str=date_str, question=question)
        search_query = llm.invoke(search_prompt).content.strip('"')
        search_results = web_search(query=search_query, max_results=5)
        if not search_results:
            raise ValueError("No search results found.")

        for result in search_results:
            if result.url in previous_urls:
                continue
            previous_urls.append(result.url)

            scraped_content = web_scrape(url=result.url)

            prompt = ChatPromptTemplate.from_template(
                template=ANSWER_FROM_WEBSCRAPE_PROMPT
            ).format_messages(
                date_str=date_str,
                question=question,
                scraped_content=scraped_content,
            )
            answer = llm.invoke(prompt).content
            parsed_answer = Answer.model_validate(completion_str_to_json(answer))

            if parsed_answer.result is not Result.UNKNOWN:
                return parsed_answer

        tries += 1

    return Answer(result=Result.UNKNOWN, reasoning="Max tries exceeded.")
```

The get_known_outcome function correctly implements the logic for determining the known outcome of a market. Consider adding more detailed logging or error handling for better observability and robustness, especially in the web scraping and search parts.

```diff
+ import logging
...
try:
    ...
except Exception as e:
+   logging.error(f"Error while determining known outcome: {e}")
    return Answer(result=Result.UNKNOWN, reasoning="Error occurred.")
```


Comment on lines 75 to 163
```python
if __name__ == "__main__":
    load_dotenv()
    tomorrow_str = (datetime.now(tz=pytz.UTC) + timedelta(days=1)).strftime("%d %B %Y")

    # Fetch questions from existing markets, or make some up, where the
    # outcome is known.
    qs_with_known_outcome: list[QuestionWithKnownOutcome] = [
        QuestionWithKnownOutcome(
            question=f"Will 'Barbie' win an Academy Award for best original song by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0xceb2a4ecc217cab440acf60737a9fcfd6d3fbf4b",
            result=Result.YES,
            notes="Happened on 10th March 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will the 2024 Oscars winner for Best Picture be announced by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0xb88e4507709148e096bcdfb861b17db7b4d54e6b",
            result=Result.YES,
            notes="Happened on 10th March 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will Liverpool win against Atalanta in the Europa League quarter-finals by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0x1d5a462c801360b4bebbda2b9656e52801a27a3b",
            result=Result.NO,
            notes="The match is scheduled for 11 April 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will Donald Trump officially become the GOP nominee for the 2024 presidential elections by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0x859a6b465ee1e4a73aab0f2da4428c6255da466c",
            result=Result.YES,
            notes="Happened on 10th March 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will SpaceX successfully test a Starship reentry without losing contact by {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0xcc9123af8db309e0c60c63f9e2b8b82fc86f458b",
            result=Result.NO,
            notes="The only scheduled test flight occured, and contact was lost during the test.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will Arsenal reach the Champions League semi-finals on {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0x606efd175b245cd60282a98cef402d4f5e950f92",
            result=Result.NO,
            notes="They are scheduled to play the first leg of the quarter-finals on 9 April 2024.",
        ),
        QuestionWithKnownOutcome(
            question=f"Will the jury deliver a verdict on James Crumbley's 'bad parenting' case on {tomorrow_str}?",
            url="https://aiomen.eth.limo/#/0xe55171beda0d60fd45092ff8bf93d5cb566a2510",
            result=Result.NO,
            notes="The verdict was announced on 15th March 2024.",
        ),
        QuestionWithKnownOutcome(
            question="Will Lewis Hamilton win the 2024/2025 F1 drivers champtionship?",
            result=Result.UNKNOWN,
            notes="Outcome is uncertain.",
        ),
        QuestionWithKnownOutcome(
            question="Will the cost of grain in the Spain increase by 20% by 19 July 2024?",
            result=Result.UNKNOWN,
            notes="Outcome is uncertain.",
        ),
        QuestionWithKnownOutcome(
            question="Will over 360 pople have died while climbing Mount Everest by 1st Jan 2028?",
            result=Result.UNKNOWN,
            notes="Outcome is uncertain.",
        ),
    ]

    benchmarker = Benchmarker(
        markets=[q.to_market() for q in qs_with_known_outcome],
        agents=[
            KnownOutcomeAgent(
                agent_name="known_outcome",
                model="gpt-3.5-turbo-0125",
                max_tries=3,
                max_workers=1,
            ),
        ],
        cache_path="./benchmark_cache.json",
    )
    benchmarker.run_agents()
    md = benchmarker.generate_markdown_report()

    output = f"./known_outcome_agent_benchmark_report.{int(time.time())}.md"
    with open(output, "w") as f:
        print(f"Writing benchmark report to: {output}")
        f.write(md)

    # Check all predictions are correct, i.e. mean-squared-error == 0
    metrics = benchmarker.compute_metrics()
    assert metrics["MSE for `p_yes`"][0] == 0.0
```
@coderabbitai bot commented Mar 20, 2024

The benchmarking script is well-structured and includes a comprehensive list of test cases. Ensure that the benchmarking process includes error handling for cases where the agent might fail to predict an outcome, to provide a complete picture of the agent's performance.

```python
try:
    benchmarker.run_agents()
    metrics = benchmarker.compute_metrics()
    assert metrics["MSE for `p_yes`"][0] == 0.0
except Exception as e:
    print(f"Benchmarking failed: {e}")
```


@kongzii (Contributor) commented:

That's a bold assertion! Cool!

@coderabbitai bot replied:

@kongzii, thank you for the enthusiasm! It's great to see positive feedback. If you have any more suggestions or need further assistance, feel free to ask. 😊

```python
from prediction_market_agent_tooling.benchmark.agents import AbstractBenchmarkedAgent
from prediction_market_agent_tooling.benchmark.benchmark import Benchmarker
from prediction_market_agent_tooling.benchmark.utils import (
    Market,
```
@evangriffiths (author) commented:

Will release a new version of prediction-market-agent-tooling, and bump it in this repo and update this code. Happy to do as part of this PR or as a follow-up PR.

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 6f2be42 and 40a8aaf.
Files selected for processing (5)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py (1 hunks)
  • prediction_market_agent/tools/web_scrape/structured_summary.py (1 hunks)
  • prediction_market_agent/tools/web_search/tavily.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • prediction_market_agent/tools/web_scrape/structured_summary.py
Files skipped from review as they are similar to previous changes (4)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py
  • prediction_market_agent/agents/known_outcome_agent/deploy.py
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py
  • prediction_market_agent/tools/web_search/tavily.py

@evangriffiths (author) commented:

Benchmarking output:

# Comparison Report

## Market Results

|   Number of markets |   Proportion resolved |   Proportion YES |   Proportion NO |
|--------------------:|----------------------:|-----------------:|----------------:|
|                  10 |                     0 |              0.3 |             0.7 |

## Agent Results

### Summary Statistics

| Agents        |   MSE for `p_yes` |   Mean confidence |   % within +-0.05 |   % within +-0.1 |   % within +-0.2 |   % correct outcome |   % precision for `yes` |   % precision for `no` |   % recall for `yes` |   % recall for `no` |   confidence/p_yes error correlation | Mean info_utility   |   Proportion answerable |   Proportion answered |   Mean cost ($) |   Mean time (s) |
|:--------------|------------------:|------------------:|------------------:|-----------------:|-----------------:|--------------------:|------------------------:|-----------------------:|---------------------:|--------------------:|-------------------------------------:|:--------------------|------------------------:|----------------------:|----------------:|----------------:|
| known_outcome |                 0 |                 1 |               100 |              100 |              100 |                 100 |                     100 |                    100 |                  100 |                 100 |                                  nan |                     |                     0.7 |                   0.7 |        0.147053 |         23.7117 |

### Markets

| Market Question                                                                                                                                                                   | known_outcome p_yes   | reference p_yes   |
|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------|:------------------|
| [Will 'Barbie' win an Academy Award for best original song by 21 March 2024?](https://aiomen.eth.limo/#/0xceb2a4ecc217cab440acf60737a9fcfd6d3fbf4b)                               | 1.00 [yes]            | 1.00 [yes]        |
| [Will the 2024 Oscars winner for Best Picture be announced by 21 March 2024?](https://aiomen.eth.limo/#/0xb88e4507709148e096bcdfb861b17db7b4d54e6b)                               | 1.00 [yes]            | 1.00 [yes]        |
| [Will Liverpool win against Atalanta in the Europa League quarter-finals by 21 March 2024?](https://aiomen.eth.limo/#/0x1d5a462c801360b4bebbda2b9656e52801a27a3b)                 | 0.00 [no]             | 0.00 [no]         |
| [Will Donald Trump officially become the GOP nominee for the 2024 presidential elections by 21 March 2024?](https://aiomen.eth.limo/#/0x859a6b465ee1e4a73aab0f2da4428c6255da466c) | 1.00 [yes]            | 1.00 [yes]        |
| [Will SpaceX successfully test a Starship reentry without losing contact by 21 March 2024?](https://aiomen.eth.limo/#/0xcc9123af8db309e0c60c63f9e2b8b82fc86f458b)                 | 0.00 [no]             | 0.00 [no]         |
| [Will Arsenal reach the Champions League semi-finals on 21 March 2024?](https://aiomen.eth.limo/#/0x606efd175b245cd60282a98cef402d4f5e950f92)                                     | 0.00 [no]             | 0.00 [no]         |
| [Will the jury deliver a verdict on James Crumbley's 'bad parenting' case on 21 March 2024?](https://aiomen.eth.limo/#/0xe55171beda0d60fd45092ff8bf93d5cb566a2510)                | 0.00 [no]             | 0.00 [no]         |
| [Will Lewis Hamilton win the 2024/2025 F1 drivers champtionship?]()                                                                                                               | S                     | 0.50 [no]         |
| [Will the cost of grain in the Spain increase by 20% by 19 July 2024?]()                                                                                                          | S                     | 0.50 [no]         |
| [Will over 360 pople have died while climbing Mount Everest by 1st Jan 2028?]()                                                                                                   | S                     | 0.50 [no]         |

### Expected value

| Agent         |   Mean expected returns |   Median expected returns |   Total expected returns |
|:--------------|------------------------:|--------------------------:|-------------------------:|
| known_outcome |                     100 |                       100 |                      700 |

| Market Question                                                                                           |   known_outcome |
|:----------------------------------------------------------------------------------------------------------|----------------:|
| Will 'Barbie' win an Academy Award for best original song by 21 March 2024?                               |             100 |
| Will the 2024 Oscars winner for Best Picture be announced by 21 March 2024?                               |             100 |
| Will Liverpool win against Atalanta in the Europa League quarter-finals by 21 March 2024?                 |             100 |
| Will Donald Trump officially become the GOP nominee for the 2024 presidential elections by 21 March 2024? |             100 |
| Will SpaceX successfully test a Starship reentry without losing contact by 21 March 2024?                 |             100 |
| Will Arsenal reach the Champions League semi-finals on 21 March 2024?                                     |             100 |
| Will the jury deliver a verdict on James Crumbley's 'bad parenting' case on 21 March 2024?                |             100 |
| Will Lewis Hamilton win the 2024/2025 F1 drivers champtionship?                                           |             nan |
| Will the cost of grain in the Spain increase by 20% by 19 July 2024?                                      |             nan |
| Will over 360 pople have died while climbing Mount Everest by 1st Jan 2028?                               |             nan |

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 40a8aaf and 9cf4684.
Files selected for processing (1)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py


```python
    def calculate_bet_amount(self, answer: bool, market: AgentMarket) -> BetAmount:
        if market.currency == Currency.xDai:
            return BetAmount(amount=Decimal(0.1), currency=Currency.xDai)
```
A contributor commented:

Should we make this configurable (e.g. ENV from Google Cloud) and thus editable after deployment?

@evangriffiths (author) replied:

We currently have a way of changing agent source code after deployment via gcp, so I don't think that is needed. ATM we can update the commit hash that the agent's container installs (see pic), and click 'save and redeploy'
[Screenshot 2024-03-20 15:18:58: GCP deployment config showing the commit hash field that can be updated before 'save and redeploy']
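For reference, the ENV-based alternative floated above would look something like the sketch below. It is purely illustrative: BET_AMOUNT_XDAI is a hypothetical variable name, and the merged PR keeps the hard-coded 0.1 xDai instead.

```python
import os
from decimal import Decimal

# Sketch of DeployableKnownOutcomeAgent.calculate_bet_amount reading the stake
# from a (hypothetical) BET_AMOUNT_XDAI environment variable instead of
# hard-coding it. AgentMarket, BetAmount and Currency are the types used in deploy.py.
def calculate_bet_amount(self, answer: bool, market: AgentMarket) -> BetAmount:
    if market.currency == Currency.xDai:
        amount = Decimal(os.environ.get("BET_AMOUNT_XDAI", "0.1"))
        return BetAmount(amount=amount, currency=Currency.xDai)
    raise NotImplementedError("This agent only supports xDai markets")
```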

```python
                scraped_content=scraped_content,
            )
            answer = str(llm.invoke(prompt).content)
            parsed_answer = Answer.model_validate(completion_str_to_json(answer))
```
A contributor commented:

Isn't it a bit risky to assume the LLM will answer using the correct format?
I would expect some try-catch block because of this, but I might be wrong.

@kongzii (Contributor) commented Mar 20, 2024:

I'd say it depends on what comes after this.

Is this needed (if so, let's error out and make it more robust in another PR), or can we skip it and continue?

@evangriffiths (author) replied:

GPT-4 is very reliable at producing JSON output!
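For what it's worth, the guard suggested above would be small. A sketch, using the Answer/Result types and completion_str_to_json from this PR (the fallback behaviour is illustrative, not what was merged):

```python
try:
    parsed_answer = Answer.model_validate(completion_str_to_json(answer))
except Exception as e:
    # If the LLM reply is not valid JSON, or does not match the Answer schema,
    # treat it as no answer for this page rather than crashing the whole run.
    print(f"Failed to parse LLM answer: {e}")
    parsed_answer = Answer(result=Result.UNKNOWN, reasoning="Unparseable LLM response.")
```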

```python
            BET_FROM_ADDRESS=verify_address(
                "0xb611A9f02B318339049264c7a66ac3401281cc3c"
            ),
            BET_FROM_PRIVATE_KEY=None,
```
A contributor commented:

BET_FROM_PRIVATE_KEY should (can) be defined here instead of manually in secrets. Deployment will automatically treat it like a secret thanks to:

[Screenshot by Dropbox Capture]

@evangriffiths (author) replied:

Ah nice. It feels a bit counterintuitive that we have two places where private keys can be passed in. If we wanted all keys to be specified in one place (assuming APIKeys can never contain all the key types an agent might want), the way to do this would be to remove APIKeys altogether. Got any opinions on that?

A contributor replied:

> then the way to do this would be to remove APIKeys altogether. Got any opinions on that?

Yep, I believe we have a GitHub issue for that 😄 Looking forward to someone picking it up.

My current reasoning for using APIKeys here is that we use it heavily across the tooling repo, so APIKeys is enforced here to force the right variable names. Anything agent-specific (Tavily, for example) then goes into secrets.

@evangriffiths (author) replied:

Okay coool

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 9cf4684 and c7a93d0.
Files selected for processing (4)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py (1 hunks)
  • prediction_market_agent/tools/web_scrape/basic_summary.py (1 hunks)
Files skipped from review as they are similar to previous changes (3)
  • prediction_market_agent/agents/known_outcome_agent/benchmark.py
  • prediction_market_agent/agents/known_outcome_agent/deploy.py
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py
Additional comments: 2
prediction_market_agent/tools/web_scrape/basic_summary.py (2)
  • 9-14: The addition of the separators parameter with a default value enhances the flexibility of the _summary function, allowing text splitting to be customized to the content structure. This is a positive change for handling different types of content more effectively. (An illustrative sketch of this pattern follows the list.)
  • 12-12: Updating the ChatOpenAI model parameter to "gpt-3.5-turbo-0125" from a previous version indicates an upgrade to a potentially more efficient or accurate model. It's important to verify that this new model version is compatible with the existing codebase and that it meets the performance and accuracy requirements for the task at hand.
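For illustration only (this is not the PR's actual _summary implementation), a sketch of the pattern described above: a text splitter with configurable separators feeding a "gpt-3.5-turbo-0125" summarisation call.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI


def summarize(text: str, objective: str, separators: list[str] | None = None) -> str:
    # Default separators chosen for illustration; the PR exposes them as a parameter.
    splitter = RecursiveCharacterTextSplitter(
        separators=separators or ["\n\n", "\n", " "],
        chunk_size=3000,
        chunk_overlap=200,
    )
    llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0.0)
    partial_summaries = [
        llm.invoke(
            f"Summarise the following text with respect to '{objective}':\n\n{chunk}"
        ).content
        for chunk in splitter.split_text(text)
    ]
    return "\n".join(str(s) for s in partial_summaries)
```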

@coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between c7a93d0 and e1e232e.
Files selected for processing (2)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py (1 hunks)
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • prediction_market_agent/agents/known_outcome_agent/deploy.py
  • prediction_market_agent/agents/known_outcome_agent/known_outcome_agent.py

@evangriffiths merged commit 5cd7cfb into main on Mar 20, 2024
6 checks passed
@evangriffiths deleted the evan/known-outcome-agent branch on March 20, 2024 at 15:29