Agent that thinks more thoroughly about question and considers possible outcomes #47

gabrielfior · 2024-04-03T22:33:00Z

Closes #40

Summary by CodeRabbit

New Features
- Introduced benchmarking functionality for CrewAI agents in prediction markets.
- Added a DeployableThinkThoroughlyAgent for selecting and betting on prediction markets.
- New functionalities for creating outcomes, determining probabilities, and making decisions in prediction markets.
Refactor
- Improved organization by moving the market_is_saturated function to a utility module for better reusability.

coderabbitai · 2024-04-03T22:33:04Z

Walkthrough

The update enhances the prediction market agents by introducing benchmarking, subquestion handling, and deployment strategies. New tools and utilities improve decision-making by incorporating detailed subquestion analysis and outcome probability management.

Changes

File Path	Change Summary
`.../crewai_subsequential_agent/benchmark.py`	Introduces benchmarking for CrewAI agents with binary market questions.
`.../crewai_subsequential_agent/crewai_agent_subquestions.py`	Adds handling of subquestions in prediction markets.
`.../crewai_subsequential_agent/deploy.py`	New deployment functionalities for selecting and betting on markets.
`.../known_outcome_agent/benchmark.py`	Adds a directive to ignore type checking.
`.../known_outcome_agent/deploy.py`	Refactors market saturation check to an external module.
`.../agents/utils.py`	Introduces a new saturation check function and API key management.
`.../crewai_subsequential_agent/prompts.py`	Adds functionality for creating and evaluating outcomes based on probabilities.
`.../tools/crewai_tools.py`	Introduces a new tool for internet searches using the Tavily search API.

Assessment against linked issues

Objective	Addressed	Explanation
Make one of the agents think about the question more thoroughly (#40)	✅

Recent Review Details

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 64d8ac6 and 7d2fcc3.

Files selected for processing (1)

prediction_market_agent/agents/crewai_subsequential_agent/deploy.py (1 hunks)

Files skipped from review as they are similar to previous changes (1)

prediction_market_agent/agents/crewai_subsequential_agent/deploy.py

gabrielfior · 2024-04-03T22:33:27Z

Note that the notebooks were added to help the discussion and will not be merged into main.

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

evangriffiths

Nice work. This looks like fun to build!

I think this approach is good for improving one of the areas that Martin has mentioned - that the prediction is consistent with predictions for the rest of the probability space.
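One way to read "consistent with the rest of the probability space": if a probability is estimated separately for each mutually exclusive outcome, the raw estimates rarely sum to 1, so they can be rescaled before reading off `p_yes`. A minimal sketch of that idea, not the PR's implementation; the outcome names and numbers are made up for illustration.

```python
def normalize_outcomes(raw: dict[str, float]) -> dict[str, float]:
    """Rescale independently estimated outcome probabilities so they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one outcome needs a positive probability")
    return {outcome: p / total for outcome, p in raw.items()}


# Hypothetical raw estimates for three mutually exclusive outcomes.
raw_estimates = {"player A wins": 0.5, "player B wins": 0.4, "someone else wins": 0.35}
normalized = normalize_outcomes(raw_estimates)

# The raw estimates sum to 1.25, so the YES outcome shrinks from 0.5 to about 0.4.
p_yes = normalized["player A wins"]
```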

But I don't think it helps with improving 'depth' of reasoning. I suspect the prediction for each sub-outcome would involve similarly shallow reasoning (as with our existing agents like the evo agent). Martin was also interested in the idea of getting the agent to reason more deeply about the question by generating sub-questions whose answers the main question depends on. For example, for "Will Carlos Alcaraz win the Miami Open by 5 April 2024?", the agent would ask about the probabilities of winning the quarter-final and semi-final, Alcaraz's win rate against the other finalist, etc., and then combine these probabilities in a kind of Bayesian way.
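The sub-question idea can be sketched with the chain rule: the probability of winning the tournament is the product of the probabilities of winning each remaining round, each conditioned on having reached that round. This is only an illustration with made-up numbers; the round names and probabilities are assumptions, and a real agent would have to estimate them (and might combine them differently).

```python
from math import prod

# Hypothetical conditional probabilities, each given that the previous round was won.
round_win_probabilities = {
    "quarter-final": 0.8,
    "semi-final": 0.7,
    "final": 0.6,
}


def p_wins_tournament(conditional_probs: dict[str, float]) -> float:
    """Chain rule: P(win all rounds) = product of P(win round | reached round)."""
    return prod(conditional_probs.values())


# 0.8 * 0.7 * 0.6 is roughly 0.336, well below any single round's probability.
p_yes = p_wins_tournament(round_win_probabilities)
```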

Luckily I think these two improvements can be made together, so I'm not saying it should be one or the other. But I think it's worth thinking about this other type of enhancement at this point.

Also, curious to know, what is the token cost per prediction that you're seeing?

prediction_market_agent/agents/crewai_subsequential_agent/prompts.py

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

…utcomes # Conflicts: # poetry.lock

gabrielfior · 2024-04-04T21:28:41Z

I also added a benchmark script for the agent; an excerpt of its output is below.

Comparison Report

Market Results

Number of markets	Proportion resolved	Proportion YES	Proportion NO
1	0	0	1

Agent Results

Summary Statistics

Agents	MSE for `p_yes`	Mean confidence	% within +-0.05	% within +-0.1	% within +-0.2	% correct outcome	% precision for `yes`	% precision for `no`	% recall for `yes`	% recall for `no`	confidence/p_yes error correlation	Mean info_utility	Proportion answerable	Proportion answered	Mean cost ($)	Mean time (s)
subsequential_questions	0.25	1	0	0	0	100	0	100	0	100	nan		1	1	0.000727	3.28748

Markets

Market Question	subsequential_questions p_yes	reference p_yes
Will the stock price of Donald Trump's media company exceed $100 on 1 April 2024?	0.00 [no]	0.50 [no]
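The `MSE for p_yes` of 0.25 in the summary statistics follows directly from this single row: (0.00 - 0.50)^2 = 0.25. A minimal sketch of that check, assuming the metric is the plain mean squared error between predicted and reference `p_yes`, and that "% within +-x" means the share of markets whose absolute error is at most x. The function names are illustrative, not taken from the benchmark code.

```python
def mean_squared_error(predicted: list[float], reference: list[float]) -> float:
    """Mean of squared differences between predicted and reference p_yes."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)


def percent_within(predicted: list[float], reference: list[float], tolerance: float) -> float:
    """Percentage of predictions whose absolute error is at most `tolerance`."""
    hits = sum(abs(p - r) <= tolerance for p, r in zip(predicted, reference))
    return 100 * hits / len(predicted)


agent_p_yes = [0.00]      # subsequential_questions p_yes from the table
reference_p_yes = [0.50]  # reference p_yes from the table

mse = mean_squared_error(agent_p_yes, reference_p_yes)         # 0.25
within_02 = percent_within(agent_p_yes, reference_p_yes, 0.2)  # 0.0
```

With a single market, every summary statistic is determined by this one row.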

Expected value

Agent	Mean expected returns	Median expected returns	Total expected returns
subsequential_questions	0	0	0

Market Question	subsequential_questions
Will the stock price of Donald Trump's media company exceed $100 on 1 April 2024?	0

prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py

coderabbitai

Actionable comments posted: 4

Review Status

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 3ccf592 and a89ec31.

Files ignored due to path filters (2)

poetry.lock is excluded by !**/*.lock, !**/*.lock
pyproject.toml is excluded by !**/*.toml

Files selected for processing (8)

crewai_multiple_agent.ipynb (1 hunks)
prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py (1 hunks)
prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py (1 hunks)
prediction_market_agent/agents/crewai_subsequential_agent/deploy.py (1 hunks)
prediction_market_agent/agents/crewai_subsequential_agent/prompts.py (1 hunks)
prediction_market_agent/agents/known_outcome_agent/benchmark.py (1 hunks)
prediction_market_agent/agents/known_outcome_agent/deploy.py (2 hunks)
prediction_market_agent/agents/utils.py (1 hunks)

Files not summarized due to errors (1)

crewai_multiple_agent.ipynb: Error: Message exceeds token limit

Files skipped from review due to trivial changes (1)

prediction_market_agent/agents/known_outcome_agent/benchmark.py

Additional comments not posted (30)

prediction_market_agent/agents/crewai_subsequential_agent/deploy.py (3)

15-18: Ensure the model version (gpt-3.5-turbo) is up-to-date and aligns with the project's requirements for AI models.

44-48: The method calculate_bet_amount only supports xDai markets. Ensure that this limitation is documented and consider implementing support for additional currencies if required by the project.

51-56: The main block uses hard-coded values for deployment parameters. Consider externalizing these values to configuration files or environment variables for better maintainability.
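One way to act on this suggestion is to read the deployment parameters from environment variables with defaults. A minimal sketch only: the variable names (`AGENT_MARKET_TYPE`, `AGENT_SLEEP_TIME_S`), the default values, and the `DeployConfig` container are hypothetical and not taken from `deploy.py`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployConfig:
    """Hypothetical deployment settings; field names are illustrative only."""
    market_type: str
    sleep_time_s: int


def load_deploy_config(env=None) -> DeployConfig:
    """Read settings from a mapping (default: os.environ), falling back to defaults."""
    env = os.environ if env is None else env
    return DeployConfig(
        market_type=env.get("AGENT_MARKET_TYPE", "example-market"),
        sleep_time_s=int(env.get("AGENT_SLEEP_TIME_S", "600")),
    )


# Passing an explicit mapping keeps the loader easy to test without touching os.environ.
config = load_deploy_config({"AGENT_SLEEP_TIME_S": "30"})
```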

prediction_market_agent/agents/known_outcome_agent/deploy.py (4)

23-23: The import of market_is_saturated from utils is a good practice for code reusability. Ensure that the moved function is no longer used within this file to avoid redundancy.

1-1: The use of # type: ignore at the top of the file suggests there might be type hinting issues. Ensure that all type hints are correct and consider removing this directive if it's no longer necessary.

1-4: 📝 NOTE: This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [37-89]

The method answer_binary_market has complex logic for determining the market answer. Ensure that this logic is thoroughly tested, especially the error handling and the fallback to None when an answer cannot be determined.

1-4: 📝 NOTE: This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [94-137]

The main block for deploying the agent contains hard-coded values and paths. Consider externalizing these to configuration files or environment variables for better maintainability and flexibility.

prediction_market_agent/agents/crewai_subsequential_agent/prompts.py (2)

1-98: Ensure that the prompts and expected output formats are aligned with the requirements of the CrewAI framework and are correctly formatted for the LLM to understand. Pay special attention to placeholders like [SCENARIO] and {scenario} to ensure they are used consistently and correctly.

1-98: Consider adding more examples to the prompts to cover a wider range of scenarios and improve the LLM's understanding of the task.

prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py (2)

1-159: Ensure that the benchmarking script accurately reflects the performance of the CrewAI agent by verifying the correctness of the market building, prediction generation, and the final assertion on the mean-squared-error for p_yes.

1-159: Consider adding documentation or comments explaining the benchmarking process, especially the significance of the mean-squared-error assertion and how the benchmark results should be interpreted.

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py (3)

25-152: Ensure that the CrewAIAgentSubquestions class and its methods are well-documented, especially the interaction between tasks, agents, and crews within the CrewAI framework. This will help future developers understand and maintain the code.

25-152: Consider adding error handling for the CrewAI framework interactions, especially for cases where tasks fail or return unexpected results. This will improve the robustness of the agent's decision-making process.

25-152: Verify that the asynchronous execution of tasks (async_execution=True) is correctly managed and that the results are correctly aggregated before making a final decision. This is crucial for the accuracy of the agent's predictions.

crewai_multiple_agent.ipynb (16)

4-15: Imports are correctly organized and necessary for the notebook's functionality.

19-37: Loading environment variables using load_dotenv() is a secure practice for configuration.

40-47: Ensure SerperDevTool is effectively utilized in tasks where appropriate.

51-74: Consider adding comments to explain the purpose and functionality of each agent for clarity, especially if the notebook is intended for educational purposes or wider distribution.

51-115: Clarify the use of tools in the research_task definition. If SerperDevTool or other tools are intended to be used, consider uncommenting and properly integrating them.

126-130: Consider including other agents (analyst, writer) and their respective tasks in report_crew if applicable to the simulation's goals, to fully utilize the multi-agent system.

150-152: Consider enhancing result handling for clarity and context, especially if the notebook is intended for production or broader educational use.

185-230: Ensure the alternative approach for breaking down scenarios into possible outcomes is consistently integrated with the rest of the notebook's logic and objectives.

286-287: Consider providing additional examples or explanations to further showcase the alternative approach, especially if the notebook is intended for educational purposes.

426-440: Ensure that the tools assigned to agents, such as search_tool, are utilized effectively in their tasks to fully leverage the capabilities of the multi-agent system.

458-512: Review the verbose logging level in the Crew definition to ensure it's appropriate for the intended use case, as it may produce extensive output that could overwhelm users or obscure important information.

865-871: Enhance the result handling for clarity and context, especially if the notebook is intended for production or broader educational use. Consider using more structured output or visualizations to present the results.

891-896: Add comments to explain the purpose and functionality of the result handling and condition evaluation for clarity, especially if the notebook is intended for educational purposes or wider distribution.

908-908: Ensure that the report's conclusions are based on accurate and up-to-date information, especially if the notebook's analysis is used for decision-making or educational purposes.

925-932: Ensure that the outlined improvements for sentence generation and script execution are implemented systematically and tested thoroughly to enhance the notebook's functionality and accuracy.

940-969: Consider adding more examples or explanations to further illustrate the approach to analyzing prediction market questions, especially if the notebook is intended for a broad audience.

kongzii · 2024-04-05T07:37:35Z

Seems like the benchmark was run on only 1 market, which isn't very useful. I guess at least 50 would be nice.

prediction_market_agent/agents/crewai_subsequential_agent/deploy.py

…utcomes # Conflicts: # poetry.lock

coderabbitai

Actionable comments posted: 1

Review Status

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between a89ec31 and 766380e.

Files ignored due to path filters (1)

pyproject.toml is excluded by !**/*.toml

Files selected for processing (6)

prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py (1 hunks)
prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py (1 hunks)
prediction_market_agent/agents/crewai_subsequential_agent/deploy.py (1 hunks)
prediction_market_agent/agents/crewai_subsequential_agent/prompts.py (1 hunks)
prediction_market_agent/agents/known_outcome_agent/deploy.py (2 hunks)
prediction_market_agent/tools/crewai_tools.py (1 hunks)

Files skipped from review as they are similar to previous changes (5)

prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py
prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py
prediction_market_agent/agents/crewai_subsequential_agent/deploy.py
prediction_market_agent/agents/crewai_subsequential_agent/prompts.py
prediction_market_agent/agents/known_outcome_agent/deploy.py

Additional comments not posted (1)

prediction_market_agent/tools/crewai_tools.py (1)

9-11: LGTM! The TavilyDevToolSchema class correctly uses Pydantic for input validation, ensuring that search_query is mandatory.

prediction_market_agent/tools/crewai_tools.py

…utcomes # Conflicts: # poetry.lock

coderabbitai

Actionable comments posted: 2

coderabbitai

Actionable comments posted: 2

gabrielfior · 2024-04-11T10:21:35Z

Here are a few benchmarks for 20 markets and 50 markets.

20 markets

Comparison Report

Market Results

Number of markets	Proportion resolved	Proportion YES	Proportion NO
20	0	0.4	0.6

Agent Results

Summary Statistics

Agents	MSE for `p_yes`	Mean confidence	% within +-0.05	% within +-0.1	% within +-0.2	% correct outcome	% precision for `yes`	% precision for `no`	% recall for `yes`	% recall for `no`	confidence/p_yes error correlation	Proportion answerable	Proportion answered
subsequential-questions-crewai	0.0866736	0.8675	10	15	45	65	55.5556	72.7273	62.5	66.6667	0.0750478	1	1
random	0.172009	0.521477	20	30	50	55	33.3333	58.8235	12.5	83.3333	-0.272218	1	1
fixed-no	0.278555	1	15	25	25	60	0	60	0	100	nan	1	1
fixed-yes	0.415312	1	15	15	15	40	40	0	100	0	nan	1	1

Markets

Market Question	subsequential-questions-crewai p_yes	random p_yes	fixed-yes p_yes	reference p_yes
Will the LK-99 room temp, ambient pressure superconductivity pre-print replicate before 2025?	0.20 [NO]	0.07 [NO]	1.00 [YES]	0.03 [NO]
Will Biden be the 2024 Democratic Nominee?	0.40 [NO]	0.21 [NO]	1.00 [YES]	0.96 [YES]
Will Joe Biden win the 2024 US Presidential Election?	0.69 [YES]	0.54 [YES]	1.00 [YES]	0.48 [NO]
Will Andrew Tate be found guilty of human (sex) trafficking?	0.10 [NO]	0.38 [NO]	1.00 [YES]	0.60 [YES]
Will AI be a major topic during the 2024 presidential debates in the United States? (please read criteria)	0.82 [YES]	0.09 [NO]	1.00 [YES]	0.34 [NO]
In 2028, will an AI be able to generate a full high-quality movie to a prompt?	0.32 [NO]	0.33 [NO]	1.00 [YES]	0.36 [NO]
Will Donald Trump be the Republican nominee for president in 2024?	0.95 [YES]	0.12 [NO]	1.00 [YES]	0.98 [YES]
Will Donald Trump win the 2024 presidential election?	0.40 [NO]	0.37 [NO]	1.00 [YES]	0.51 [YES]
Will an AI get gold on any International Math Olympiad by 2025?	0.68 [YES]	0.02 [NO]	1.00 [YES]	0.20 [NO]
In 2028, will AI be at least as big a political issue as abortion?	0.20 [NO]	0.38 [NO]	1.00 [YES]	0.45 [NO]
Will OpenAI hint at or claim to have AGI by 2025 end?	0.00 [NO]	0.00 [NO]	1.00 [YES]	0.22 [NO]
Will either Joe Biden or Donald Trump be elected President in 2024?	0.80 [YES]	0.18 [NO]	1.00 [YES]	0.97 [YES]
Will the average global temperature in 2024 exceed 2023?	0.75 [YES]	0.58 [YES]	1.00 [YES]	0.60 [YES]
Will GPT-5 be released before 2025?	0.90 [YES]	0.26 [NO]	1.00 [YES]	0.67 [YES]
Will Congress pass a bill in 2024 to ban TikTok in the US or force it to change ownership?	0.80 [YES]	0.50 [NO]	1.00 [YES]	0.49 [NO]
Will Joe Biden get impeached in his first term?	0.20 [NO]	0.23 [NO]	1.00 [YES]	0.08 [NO]
Will Aella be romantically or sexually involved with Destiny by the end of 2024?	0.13 [NO]	0.39 [NO]	1.00 [YES]	0.07 [NO]
Did COVID-19 come from a laboratory?	0.95 [YES]	0.25 [NO]	1.00 [YES]	0.55 [YES]
Will AI wipe out humanity before the year 2030?	0.35 [NO]	0.97 [YES]	1.00 [YES]	0.03 [NO]
Will Threads have more daily active users than Twitter by the end of 2024?	0.20 [NO]	0.21 [NO]	1.00 [YES]	0.04 [NO]
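The classification columns in the summary statistics for `subsequential-questions-crewai` can be reproduced from the [YES]/[NO] labels in the table above. A minimal sketch, assuming the usual definitions of precision and recall with the reference label treated as ground truth; the helper names are illustrative, not the benchmark's own.

```python
# (agent label, reference label) for the 20 markets, in table order.
labels = [
    ("NO", "NO"), ("NO", "YES"), ("YES", "NO"), ("NO", "YES"), ("YES", "NO"),
    ("NO", "NO"), ("YES", "YES"), ("NO", "YES"), ("YES", "NO"), ("NO", "NO"),
    ("NO", "NO"), ("YES", "YES"), ("YES", "YES"), ("YES", "YES"), ("YES", "NO"),
    ("NO", "NO"), ("NO", "NO"), ("YES", "YES"), ("NO", "NO"), ("NO", "NO"),
]


def precision(pairs: list[tuple[str, str]], label: str) -> float:
    """Of the markets where the agent said `label`, the percentage the reference agrees with."""
    refs = [ref for agent, ref in pairs if agent == label]
    return 100 * sum(ref == label for ref in refs) / len(refs)


def recall(pairs: list[tuple[str, str]], label: str) -> float:
    """Of the markets whose reference is `label`, the percentage the agent also labelled so."""
    agents = [agent for agent, ref in pairs if ref == label]
    return 100 * sum(agent == label for agent in agents) / len(agents)


correct_outcome = 100 * sum(agent == ref for agent, ref in labels) / len(labels)

print(f"% correct outcome: {correct_outcome:.4f}")  # 65.0000
print(f"precision yes/no:  {precision(labels, 'YES'):.4f} / {precision(labels, 'NO'):.4f}")
print(f"recall yes/no:     {recall(labels, 'YES'):.4f} / {recall(labels, 'NO'):.4f}")
```

These match the reported 65, 55.5556, 72.7273, 62.5 and 66.6667.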

Expected value

Agent	Mean expected returns	Median expected returns	Total expected returns
subsequential-questions-crewai	29.2825	24.017	585.651
random	6.0579	5.99164	121.158
fixed-no	13.6758	6.53544	273.515
fixed-yes	-13.6758	-6.53544	-273.515

Market Question	subsequential-questions-crewai	random	fixed-no	fixed-yes
Will the LK-99 room temp, ambient pressure superconductivity pre-print replicate before 2025?	94.2138	94.2138	94.2138	-94.2138
Will Biden be the 2024 Democratic Nominee?	-92.3114	-92.3114	-92.3114	92.3114
Will Joe Biden win the 2024 US Presidential Election?	-3.07088	-3.07088	3.07088	-3.07088
Will Andrew Tate be found guilty of human (sex) trafficking?	-20	-20	-20	20
Will AI be a major topic during the 2024 presidential debates in the United States? (please read criteria)	-32.847	32.847	32.847	-32.847
In 2028, will an AI be able to generate a full high-quality movie to a prompt?	27.8136	27.8136	27.8136	-27.8136
Will Donald Trump be the Republican nominee for president in 2024?	95.2844	-95.2844	-95.2844	95.2844
Will Donald Trump win the 2024 presidential election?	-1.37533	-1.37533	-1.37533	1.37533
Will an AI get gold on any International Math Olympiad by 2025?	-59.1427	59.1427	59.1427	-59.1427
In 2028, will AI be at least as big a political issue as abortion?	10	10	10	-10
Will OpenAI hint at or claim to have AGI by 2025 end?	55.4266	55.4266	55.4266	-55.4266
Will either Joe Biden or Donald Trump be elected President in 2024?	93.5522	-93.5522	-93.5522	93.5522
Will the average global temperature in 2024 exceed 2023?	20.2203	20.2203	-20.2203	20.2203
Will GPT-5 be released before 2025?	34.0547	-34.0547	-34.0547	34.0547
Will Congress pass a bill in 2024 to ban TikTok in the US or force it to change ownership?	-1.98329	1.98329	1.98329	-1.98329
Will Joe Biden get impeached in his first term?	84.8219	84.8219	84.8219	-84.8219
Will Aella be romantically or sexually involved with Destiny by the end of 2024?	85.3588	85.3588	85.3588	-85.3588
Did COVID-19 come from a laboratory?	10	-10	-10	10
Will AI wipe out humanity before the year 2030?	93.3281	-93.3281	93.3281	-93.3281
Will Threads have more daily active users than Twitter by the end of 2024?	92.3068	92.3068	92.3068	-92.3068
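The aggregate row for `subsequential-questions-crewai` can be recomputed from its per-market column above. A minimal sketch using only the standard library; the values are copied from the table, so small differences from the report come from rounding.

```python
from statistics import mean, median

# Per-market expected returns for subsequential-questions-crewai, in table order.
returns = [
    94.2138, -92.3114, -3.07088, -20, -32.847,
    27.8136, 95.2844, -1.37533, -59.1427, 10,
    55.4266, 93.5522, 20.2203, 34.0547, -1.98329,
    84.8219, 85.3588, 10, 93.3281, 92.3068,
]

total_return = sum(returns)
mean_return = mean(returns)
median_return = median(returns)

print(f"mean={mean_return:.4f} median={median_return:.3f} total={total_return:.3f}")
```

The result agrees with the reported mean of 29.2825, median of 24.017 and total of 585.651.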

50 markets

Comparison Report

Market Results

Number of markets	Proportion resolved	Proportion YES	Proportion NO
50	0	0.36	0.64

Agent Results

Summary Statistics

Agents	MSE for `p_yes`	Mean confidence	% within +-0.05	% within +-0.1	% within +-0.2	% correct outcome	% precision for `yes`	% precision for `no`	% recall for `yes`	% recall for `no`	confidence/p_yes error correlation	Proportion answerable	Proportion answered
subsequential-questions-crewai	0.139858	0.85125	8	16	38	62	48.2759	80.9524	77.7778	53.125	0.167525	1	1
random	0.16555	0.426501	10	24	46	52	39.2857	68.1818	61.1111	46.875	-0.0381193	1	1
fixed-no	0.238667	1	10	20	28	64	0	64	0	100	nan	1	1
fixed-yes	0.440751	1	10	10	12	36	36	0	100	0	nan	1	1

Markets

Market Question	subsequential-questions-crewai p_yes	random p_yes	fixed-yes p_yes	reference p_yes
Will the LK-99 room temp, ambient pressure superconductivity pre-print replicate before 2025?	0.20 [NO]	0.12 [NO]	1.00 [YES]	0.03 [NO]
Will Biden be the 2024 Democratic Nominee?	0.85 [YES]	0.09 [NO]	1.00 [YES]	0.96 [YES]
Will Joe Biden win the 2024 US Presidential Election?	0.40 [NO]	0.67 [YES]	1.00 [YES]	0.49 [NO]
Will Andrew Tate be found guilty of human (sex) trafficking?	0.95 [YES]	0.53 [YES]	1.00 [YES]	0.60 [YES]
Will AI be a major topic during the 2024 presidential debates in the United States? (please read criteria)	0.80 [YES]	0.15 [NO]	1.00 [YES]	0.34 [NO]
In 2028, will an AI be able to generate a full high-quality movie to a prompt?	0.20 [NO]	0.33 [NO]	1.00 [YES]	0.36 [NO]
Will Donald Trump be the Republican nominee for president in 2024?	0.95 [YES]	0.80 [YES]	1.00 [YES]	0.98 [YES]
Will Donald Trump win the 2024 presidential election?	0.80 [YES]	0.31 [NO]	1.00 [YES]	0.51 [YES]
Will an AI get gold on any International Math Olympiad by 2025?	0.55 [YES]	0.70 [YES]	1.00 [YES]	0.20 [NO]
In 2028, will AI be at least as big a political issue as abortion?	0.80 [YES]	0.88 [YES]	1.00 [YES]	0.45 [NO]
Will OpenAI hint at or claim to have AGI by 2025 end?	0.70 [YES]	0.19 [NO]	1.00 [YES]	0.22 [NO]
Will either Joe Biden or Donald Trump be elected President in 2024?	0.80 [YES]	0.55 [YES]	1.00 [YES]	0.97 [YES]
Will the average global temperature in 2024 exceed 2023?	0.99 [YES]	0.93 [YES]	1.00 [YES]	0.60 [YES]
Will GPT-5 be released before 2025?	0.90 [YES]	0.29 [NO]	1.00 [YES]	0.67 [YES]
Will Congress pass a bill in 2024 to ban TikTok in the US or force it to change ownership?	0.20 [NO]	0.59 [YES]	1.00 [YES]	0.49 [NO]
Will Joe Biden get impeached in his first term?	0.21 [NO]	0.02 [NO]	1.00 [YES]	0.08 [NO]
Will Aella be romantically or sexually involved with Destiny by the end of 2024?	0.20 [NO]	0.07 [NO]	1.00 [YES]	0.07 [NO]
Did COVID-19 come from a laboratory?	0.91 [YES]	0.47 [NO]	1.00 [YES]	0.55 [YES]
Will AI wipe out humanity before the year 2030?	0.20 [NO]	0.55 [YES]	1.00 [YES]	0.03 [NO]
Will Threads have more daily active users than Twitter by the end of 2024?	0.15 [NO]	0.39 [NO]	1.00 [YES]	0.04 [NO]
Will AI pass the Longbets version of the Turing test by the end of 2029?	0.26 [NO]	0.65 [YES]	1.00 [YES]	0.57 [YES]
Will a large language model beat a super grandmaster playing chess by 2028?	0.20 [NO]	0.30 [NO]	1.00 [YES]	0.40 [NO]
Will AI wipe out humanity before the year 2100	0.80 [YES]	0.60 [YES]	1.00 [YES]	0.12 [NO]
Will Jimmy Carter become a centenarian?	0.80 [YES]	0.71 [YES]	1.00 [YES]	0.57 [YES]
Will a Democrat win the 2024 US presidential election?	0.00 [NO]	0.63 [YES]	1.00 [YES]	0.48 [NO]
Will a room-temperature, atmospheric pressure superconductor be discovered before 2030?	0.70 [YES]	0.51 [YES]	1.00 [YES]	0.08 [NO]
Will the US enter a recession by the end of 2024?	0.35 [NO]	0.15 [NO]	1.00 [YES]	0.15 [NO]
Will Linda Yaccarino be the CEO of X on April 13, 2024?	0.90 [YES]	0.60 [YES]	1.00 [YES]	0.98 [YES]
By the end of 2026, will we have transparency into any useful internal pattern within a Large Language Model whose semantics would have been unfamiliar to AI and cognitive science in 2006?	0.85 [YES]	0.20 [NO]	1.00 [YES]	0.49 [NO]
Will the Apple Vision Pro be successful enough to revive interest in mixed reality and the metaverse?	0.85 [YES]	0.46 [NO]	1.00 [YES]	0.28 [NO]
Will the Meissner effect be confirmed near room temperature in copper-substituted lead apatite?	0.85 [YES]	0.69 [YES]	1.00 [YES]	0.04 [NO]
Will this Yudkowsky tweet on AI video generation hold up in 2024?	0.80 [YES]	0.38 [NO]	1.00 [YES]	0.32 [NO]
Will Kizaru betray/defect the Marines?	0.81 [YES]	0.87 [YES]	1.00 [YES]	0.56 [YES]
Will there be an AI language model that surpasses ChatGPT and other OpenAI models before the end of 2024?	0.85 [YES]	0.59 [YES]	1.00 [YES]	0.30 [NO]
In a year, will we think that Sam Altman leaving OpenAI reduced AI risk?	0.80 [YES]	0.97 [YES]	1.00 [YES]	0.11 [NO]
Will Donald Trump win the 2024 US Presidential Election?	0.20 [NO]	0.89 [YES]	1.00 [YES]	0.51 [YES]
Will Sam Altman be the CEO of OpenAI at the end of 2024?	0.85 [YES]	0.19 [NO]	1.00 [YES]	0.96 [YES]
Will Hezbollah directly engage in combat operations against Israel?	0.21 [NO]	0.85 [YES]	1.00 [YES]	0.50 [YES]
Will China launch a full-scale invasion of Taiwan before 2030?	0.20 [NO]	0.49 [NO]	1.00 [YES]	0.23 [NO]
Will Sam Altman be a co-founder of a serious OpenAI competitor by EOY 2024?	0.80 [YES]	0.77 [YES]	1.00 [YES]	0.03 [NO]
Will there be a very reliable way of reading human thoughts by the end of 2024?🧠🕵️	0.20 [NO]	1.00 [YES]	1.00 [YES]	0.12 [NO]
Is the "100% effective against solid tumors" cancer pill AOH1996 paper legit? [see description]	0.00 [NO]	0.57 [YES]	1.00 [YES]	0.23 [NO]
Will Google mostly catch up to OpenAI in LLM quality and neutralize ChatGPT's lead by the end of 2024?	0.85 [YES]	0.70 [YES]	1.00 [YES]	0.42 [NO]
Will this Yudkowsky tweet hold up?	0.00 [NO]	0.86 [YES]	1.00 [YES]	0.84 [YES]
Will GPT-4 be available to ChatGPT Free Users in 2024?	0.80 [YES]	0.33 [NO]	1.00 [YES]	0.70 [YES]
Will a reliable and general household robot be developed before January 1st, 2030?	0.85 [YES]	0.20 [NO]	1.00 [YES]	0.37 [NO]
Will AI wipe out humanity before the year 2040?	0.10 [NO]	0.98 [YES]	1.00 [YES]	0.06 [NO]
Was Ilya spooked by a capabilities advance?	0.00 [NO]	0.79 [YES]	1.00 [YES]	0.08 [NO]
Will Superalignment succeed? (self assessment)	0.20 [NO]	0.30 [NO]	1.00 [YES]	0.23 [NO]
Will Benjamin Netanyahu (Bibi) be the prime minister of Israel at the end of 2024	0.70 [YES]	0.45 [NO]	1.00 [YES]	0.58 [YES]

Expected value

Agent	Mean expected returns	Median expected returns	Total expected returns
subsequential-questions-crewai	15.9045	13.6694	795.226
random	-0.435562	0.802601	-21.7781
fixed-no	20.2083	22.7631	1010.42
fixed-yes	-20.2083	-22.7631	-1010.42

Market Question	subsequential-questions-crewai	random	fixed-no	fixed-yes
Will the LK-99 room temp, ambient pressure superconductivity pre-print replicate before 2025?	94.1633	94.1633	94.1633	-94.1633
Will Biden be the 2024 Democratic Nominee?	92.3114	-92.3114	-92.3114	92.3114
Will Joe Biden win the 2024 US Presidential Election?	2.95499	-2.95499	2.95499	-2.95499
Will Andrew Tate be found guilty of human (sex) trafficking?	20	20	-20	20
Will AI be a major topic during the 2024 presidential debates in the United States? (please read criteria)	-32.847	32.847	32.847	-32.847
In 2028, will an AI be able to generate a full high-quality movie to a prompt?	27.8136	27.8136	27.8136	-27.8136
Will Donald Trump be the Republican nominee for president in 2024?	95.2844	95.2844	-95.2844	95.2844
Will Donald Trump win the 2024 presidential election?	2	-2	-2	2
Will an AI get gold on any International Math Olympiad by 2025?	-59.1427	-59.1427	59.1427	-59.1427
In 2028, will AI be at least as big a political issue as abortion?	-10	-10	10	-10
Will OpenAI hint at or claim to have AGI by 2025 end?	-55.4266	55.4266	55.4266	-55.4266
Will either Joe Biden or Donald Trump be elected President in 2024?	93.5522	93.5522	-93.5522	93.5522
Will the average global temperature in 2024 exceed 2023?	20.2203	20.2203	-20.2203	20.2203
Will GPT-5 be released before 2025?	33.8315	-33.8315	-33.8315	33.8315
Will Congress pass a bill in 2024 to ban TikTok in the US or force it to change ownership?	1.98329	-1.98329	1.98329	-1.98329
Will Joe Biden get impeached in his first term?	84.8219	84.8219	84.8219	-84.8219
Will Aella be romantically or sexually involved with Destiny by the end of 2024?	85.3588	85.3588	85.3588	-85.3588
Did COVID-19 come from a laboratory?	9.67587	-9.67587	-9.67587	9.67587
Will AI wipe out humanity before the year 2030?	93.3281	-93.3281	93.3281	-93.3281
Will Threads have more daily active users than Twitter by the end of 2024?	92.3068	92.3068	92.3068	-92.3068
Will AI pass the Longbets version of the Turing test by the end of 2029?	-14.1888	14.1888	-14.1888	14.1888
Will a large language model beat a super grandmaster playing chess by 2028?	19.3749	19.3749	19.3749	-19.3749
Will AI wipe out humanity before the year 2100	-75.4511	-75.4511	75.4511	-75.4511
Will Jimmy Carter become a centenarian?	14.9105	14.9105	-14.9105	14.9105
Will a Democrat win the 2024 US presidential election?	3.00503	-3.00503	3.00503	-3.00503
Will a room-temperature, atmospheric pressure superconductor be discovered before 2030?	-84.3113	-84.3113	84.3113	-84.3113
Will the US enter a recession by the end of 2024?	70.7833	70.7833	70.7833	-70.7833
Will Linda Yaccarino be the CEO of X on April 13, 2024?	96.5069	96.5069	-96.5069	96.5069
By the end of 2026, will we have transparency into any useful internal pattern within a Large Language Model whose semantics would have been unfamiliar to AI and cognitive science in 2006?	-2.02417	2.02417	2.02417	-2.02417
Will the Apple Vision Pro be successful enough to revive interest in mixed reality and the metaverse?	-44.1657	44.1657	44.1657	-44.1657
Will the Meissner effect be confirmed near room temperature in copper-substituted lead apatite?	-92.9598	-92.9598	92.9598	-92.9598
Will this Yudkowsky tweet on AI video generation hold up in 2024?	-36.7308	36.7308	36.7308	-36.7308
Will Kizaru betray/defect the Marines?	12.4284	12.4284	-12.4284	12.4284
Will there be an AI language model that surpasses ChatGPT and other OpenAI models before the end of 2024?	-39.1446	-39.1446	39.1446	-39.1446
In a year, will we think that Sam Altman leaving OpenAI reduced AI risk?	-78.3376	-78.3376	78.3376	-78.3376
Will Donald Trump win the 2024 US Presidential Election?	-1.45681	1.45681	-1.45681	1.45681
Will Sam Altman be the CEO of OpenAI at the end of 2024?	92.8701	-92.8701	-92.8701	92.8701
Will Hezbollah directly engage in combat operations against Israel?	-0.148389	0.148389	-0.148389	0.148389
Will China launch a full-scale invasion of Taiwan before 2030?	54.3198	54.3198	54.3198	-54.3198
Will Sam Altman be a co-founder of a serious OpenAI competitor by EOY 2024?	-94.4222	-94.4222	94.4222	-94.4222
Will there be a very reliable way of reading human thoughts by the end of 2024?🧠🕵️	75.2882	-75.2882	75.2882	-75.2882
Is the "100% effective against solid tumors" cancer pill AOH1996 paper legit? [see description]	54.7358	-54.7358	54.7358	-54.7358
Will Google mostly catch up to OpenAI in LLM quality and neutralize ChatGPT's lead by the end of 2024?	-15.0331	-15.0331	15.0331	-15.0331
Will this Yudkowsky tweet hold up?	-67.6783	67.6783	-67.6783	67.6783
Will GPT-4 be available to ChatGPT Free Users in 2024?	39.473	-39.473	-39.473	39.473
Will a reliable and general household robot be developed before January 1st, 2030?	-26.1514	26.1514	26.1514	-26.1514
Will AI wipe out humanity before the year 2040?	88.9162	-88.9162	88.9162	-88.9162
Was Ilya spooked by a capabilities advance?	83.4589	-83.4589	83.4589	-83.4589
Will Superalignment succeed? (self assessment)	53.6813	53.6813	53.6813	-53.6813
Will Benjamin Netanyahu (Bibi) be the prime minister of Israel at the end of 2024	15.4881	-15.4881	-15.4881	15.4881
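
The per-market rows above can be totaled per agent column to compare the agents at a glance. This is a minimal sketch, assuming the table is available as tab-separated text. `column_totals` is a hypothetical helper and not part of this PR; the three sample rows are copied from the table.

```python
def column_totals(tsv_text: str) -> dict[str, float]:
    """Sum every numeric column of a tab-separated table.

    Assumes the first column holds the row label (the market question)
    and the first non-empty line is the header. Hypothetical helper,
    not taken from the PR.
    """
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    totals = {name: 0.0 for name in header[1:]}
    for line in lines[1:]:
        cells = line.split("\t")
        if len(cells) != len(header):
            continue  # skip malformed rows instead of guessing at their layout
        for name, cell in zip(header[1:], cells[1:]):
            totals[name] += float(cell)
    return totals


# Three rows copied from the table above, used here as sample input.
sample = (
    "Market Question\tsubsequential-questions-crewai\trandom\tfixed-no\tfixed-yes\n"
    "Will Donald Trump win the 2024 presidential election?\t2\t-2\t-2\t2\n"
    "In 2028, will AI be at least as big a political issue as abortion?\t-10\t-10\t10\t-10\n"
    "Will Andrew Tate be found guilty of human (sex) trafficking?\t20\t20\t-20\t20\n"
)

totals = column_totals(sample)
for agent, total in totals.items():
    print(f"{agent}: {total:+.4f}")
```

Splitting on tabs directly, rather than using a CSV parser with quote handling, keeps market titles that contain double quotes intact.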

…utcomes # Conflicts: # poetry.lock

gabrielfior added 2 commits March 28, 2024 19:20

initial notebook

905a5e8

Added agent that asks subsequential questions

a3100f5

gabrielfior linked an issue Apr 3, 2024 that may be closed by this pull request

Make one of the agents think about the question more thoroughly #40

Closed

kongzii reviewed Apr 4, 2024

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

kongzii reviewed Apr 4, 2024

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

kongzii reviewed Apr 4, 2024

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

kongzii reviewed Apr 4, 2024

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

kongzii reviewed Apr 4, 2024

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

kongzii reviewed Apr 4, 2024

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

evangriffiths reviewed Apr 4, 2024

gabrielfior added 3 commits April 4, 2024 17:06

Agent working locally

5767944

Merge remote-tracking branch 'origin/main' into gabriel/alternative-o…

8d32eb6

…utcomes # Conflicts: # poetry.lock

Updated poetry.lock

a552ece

Added benchmark to agent_subquestions

3f18c0c

gabrielfior commented Apr 4, 2024

prediction_market_agent/agents/crewai_subsequential_agent/benchmark.py

Finishing touches

a89ec31

gabrielfior marked this pull request as ready for review April 4, 2024 21:52

gabrielfior requested review from kongzii and evangriffiths April 4, 2024 21:53

coderabbitai bot reviewed Apr 4, 2024

kongzii reviewed Apr 5, 2024

prediction_market_agent/agents/crewai_subsequential_agent/deploy.py

gabrielfior added 3 commits April 5, 2024 11:09

Merge remote-tracking branch 'origin/main' into gabriel/alternative-o…

909e047

…utcomes # Conflicts: # poetry.lock

Fixing model gpt-4-turbo-preview

996f94a

Adding benchmark.py EVO-style

766380e

coderabbitai bot reviewed Apr 5, 2024

prediction_market_agent/tools/crewai_tools.py

gabrielfior added 3 commits April 5, 2024 20:08

Updating dependencies

d44a7f0

Merge remote-tracking branch 'origin/main' into gabriel/alternative-o…

0aeb3c2

…utcomes # Conflicts: # poetry.lock

Benchmark generated for subsequential agent (5 markets)

f1cfadc

coderabbitai bot reviewed Apr 9, 2024

prediction_market_agent/agents/crewai_subsequential_agent/deploy.py

prediction_market_agent/agents/crewai_subsequential_agent/crewai_agent_subquestions.py

kongzii reviewed Apr 10, 2024

prediction_market_agent/agents/crewai_subsequential_agent/deploy.py

kongzii reviewed Apr 10, 2024

prediction_market_agent/tools/crewai_tools.py

(WIP) Implemented PR comments, not yet final

6513138

coderabbitai bot reviewed Apr 10, 2024

prediction_market_agent/tools/crewai_tools.py

prediction_market_agent/utils.py

Executed benchmark for 50 markets

787d670