-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for closed markets benchmarks + a little cleaning up #17
Conversation
Warning Rate Limit Exceeded@kongzii has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 5 minutes and 1 seconds before requesting another review. How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. WalkthroughThe updates focus on enhancing the prediction market agent tooling, particularly around agent benchmarking and market analysis. Key modifications include the introduction of new methods and properties for more nuanced prediction capabilities and market analysis, the addition of enums for market filtering and resolution, and refinements in how predictions are evaluated and reported. These changes aim to provide more detailed, flexible, and accurate tools for analyzing and interacting with prediction markets. Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 7
Configuration used: CodeRabbit UI
Files selected for processing (4)
- prediction_market_agent_tooling/benchmark/agents.py (3 hunks)
- prediction_market_agent_tooling/benchmark/benchmark.py (7 hunks)
- prediction_market_agent_tooling/benchmark/utils.py (7 hunks)
- tests/test_benchmark.py (4 hunks)
Additional comments: 7
prediction_market_agent_tooling/benchmark/agents.py (3)
- 16-20: The introduction of the
is_predictable
method in place ofevaluate
is a significant change. Ensure that all subclasses and usages of the previously existingevaluate
method have been updated accordingly to prevent any runtime errors.- 28-32: The
check_and_predict
method provides a streamlined way to check predictability before making a prediction. This is a good practice as it encapsulates the predictability check and prediction within a single method, reducing the risk of errors and improving code readability.- 58-74: The
check_and_predict_restricted
method effectively combines the predictability check with the restricted prediction process. This method ensures that predictions adhere to the specified time restrictions, which is crucial for maintaining the integrity of predictions in scenarios with time-sensitive information.prediction_market_agent_tooling/benchmark/benchmark.py (4)
- 16-16: The addition of
MarketResolution
to the import statements indicates the introduction of a new enum to handle market resolution statuses. This change aligns with the PR objectives to refine benchmarking logic based on market resolution status.- 272-274: The comparison logic for determining the correct outcome percentage has been updated to use the
probable_resolution
field from both prediction and market objects. This change is part of the refinement of benchmarking logic based on market resolution status and is a logical update to ensure outcome comparison is aligned with the probable resolution of markets.- 289-300: The method
_compute_precision_and_recall_percentages
now uses theprobable_resolution
field to calculate precision and recall percentages foryes
andno
outcomes. This update is part of the benchmarking logic refinement and ensures that the evaluation metrics are accurately calculated based on the probable resolution of markets. The logic appears correct and aligns with the objectives of enhancing the benchmarking process.- 352-352: The method
_compute_ratio_evaluated_as_answerable
now calculates the proportion of predictions evaluated as answerable based on theis_predictable
field. This change supports the enhancement of agent predictability by accurately reflecting the proportion of markets where agents could make predictions.
prediction = ( | ||
agent.check_and_predict(market_question=market.question) | ||
if not market.is_resolved | ||
else agent.check_and_predict_restricted( | ||
market_question=market.question, | ||
time_restriction_up_to=market.created_time, # TODO: Add support for resolved_at and any time in between. | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic within get_prediction_result
method has been updated to differentiate between resolved and unresolved markets, using check_and_predict
for the latter and check_and_predict_restricted
for the former. This change is part of enhancing agent predictability in various market conditions. However, there's a TODO
comment regarding adding support for resolved_at
and any time in between, which suggests incomplete implementation for handling different time restrictions.
Consider implementing the handling for resolved_at
and other time restrictions to fully support the new prediction capabilities in restricted or closed market scenarios.
f"{p.outcome_prediction.p_yes:.2f} [{p.outcome_prediction.probable_resolution.value}]" | ||
if p.is_predictable | ||
and p.outcome_prediction # Is answerable and answered | ||
else "N/A" | ||
if not p.evaluation | ||
and not p.outcome_prediction # Not evaluated for some reason | ||
else "S" | ||
if p.evaluation | ||
and not p.evaluation.is_predictable # Skipped (evaluated to be not predictable) | ||
else "F" | ||
if p.evaluation | ||
and p.evaluation.is_predictable | ||
and not p.outcome_prediction # Failed (no prediction) | ||
else should_not_happen( | ||
f"Unexpected case in get_markets_summary() for {p}." | ||
else ( | ||
"S" | ||
if not p.is_predictable # Skipped (evaluated to be not predictable) | ||
else ( | ||
"F" | ||
if p.is_predictable | ||
and not p.outcome_prediction # Failed (no prediction) | ||
else should_not_happen( | ||
f"Unexpected case in get_markets_summary() for {p}." | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get_markets_summary
method has been updated to include the probable_resolution
in the market summary generation. This change enhances the summary information by providing insights into the resolution status of markets. However, the handling of unexpected cases using should_not_happen
might not be the best approach for error handling. It's recommended to log unexpected cases or handle them more gracefully to avoid potential crashes or uninformative error messages.
Consider improving the error handling in the get_markets_summary
method to log unexpected cases or handle them more gracefully, ensuring the application remains robust and informative in case of errors.
- else should_not_happen(f"Unexpected case in get_markets_summary() for {p}.")
+ else "Unexpected case" # Log or handle unexpected cases more gracefully
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
f"{p.outcome_prediction.p_yes:.2f} [{p.outcome_prediction.probable_resolution.value}]" | |
if p.is_predictable | |
and p.outcome_prediction # Is answerable and answered | |
else "N/A" | |
if not p.evaluation | |
and not p.outcome_prediction # Not evaluated for some reason | |
else "S" | |
if p.evaluation | |
and not p.evaluation.is_predictable # Skipped (evaluated to be not predictable) | |
else "F" | |
if p.evaluation | |
and p.evaluation.is_predictable | |
and not p.outcome_prediction # Failed (no prediction) | |
else should_not_happen( | |
f"Unexpected case in get_markets_summary() for {p}." | |
else ( | |
"S" | |
if not p.is_predictable # Skipped (evaluated to be not predictable) | |
else ( | |
"F" | |
if p.is_predictable | |
and not p.outcome_prediction # Failed (no prediction) | |
else should_not_happen( | |
f"Unexpected case in get_markets_summary() for {p}." | |
) | |
) | |
f"{p.outcome_prediction.p_yes:.2f} [{p.outcome_prediction.probable_resolution.value}]" | |
if p.is_predictable | |
and p.outcome_prediction # Is answerable and answered | |
else ( | |
"S" | |
if not p.is_predictable # Skipped (evaluated to be not predictable) | |
else ( | |
"F" | |
if p.is_predictable | |
and not p.outcome_prediction # Failed (no prediction) | |
else "Unexpected case" # Log or handle unexpected cases more gracefully |
if prediction.outcome_prediction.probable_resolution == MarketResolution.YES | ||
and market.yes_outcome_price > 0 | ||
else 0 | ||
) | ||
no_shares = ( | ||
bet_units / 0.5 # market.no_outcome_price | ||
if prediction.outcome_prediction.p_yes <= buy_yes_threshold | ||
if prediction.outcome_prediction.probable_resolution == MarketResolution.NO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The calculate_expected_returns
method has been updated to include logic based on the probable_resolution
field. This change is part of the updates to expected returns calculation based on the probable resolution of markets. The logic appears correct and aligns with the objectives of refining the benchmarking process. However, the hardcoded bet units and the assumption of a 50/50 market state might limit the flexibility and accuracy of the expected returns calculation.
Consider adding support for dynamic bet sizes and more accurate share price calculations to improve the flexibility and accuracy of the expected returns calculation.
def is_predictable_restricted( | ||
self, | ||
market_question: str, | ||
time_restriction_up_to: datetime, | ||
) -> bool: | ||
""" | ||
Override if the agent can decide to not predict the question, before doing the hard work. | ||
Data used for the evaluation must be restricted to the time_restriction_up_to. | ||
""" | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_predictable_restricted
method introduces time-based restrictions for predictions. It's crucial to ensure that all data used for prediction respects the time_restriction_up_to
parameter to avoid leaking future information into the prediction process.
Consider adding a concrete implementation example or guidance in the method's documentation to help developers understand how to enforce the time restriction correctly.
def predict_restricted( | ||
self, | ||
market_question: str, | ||
time_restriction_up_to: datetime, | ||
) -> Prediction: | ||
""" | ||
Predict the outcome of the market question. | ||
Data used for the prediction must be restricted to the time_restriction_up_to. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The predict_restricted
method correctly raises NotImplementedError
, indicating that subclasses must provide their own implementation. This is consistent with the abstract nature of the AbstractBenchmarkedAgent
class. However, it would be beneficial to include documentation or examples on how to implement this method, especially regarding the handling of time_restriction_up_to
.
Would you like me to provide an example implementation or further guidance on handling the time_restriction_up_to
parameter?
def predict(self, market_question: str) -> Prediction: | ||
p_yes, confidence = random.random(), random.random() | ||
return Prediction( | ||
evaluation=evaluated, | ||
outcome_prediction=OutcomePrediction( | ||
p_yes=p_yes, | ||
confidence=confidence, | ||
info_utility=None, | ||
), | ||
) | ||
|
||
def predict_restricted( | ||
self, market_question: str, time_restriction_up_to: datetime | ||
) -> Prediction: | ||
return self.predict(market_question) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RandomAgent
class's predict_restricted
method simply delegates to the predict
method without considering the time_restriction_up_to
parameter. This might not align with the intended use of predict_restricted
, which is supposed to restrict data based on time. Ensure that the implementation of predict_restricted
in RandomAgent
(and other agents if applicable) correctly handles the time restriction.
Consider implementing a time-restricted version of the prediction logic in RandomAgent.predict_restricted
to ensure it adheres to the intended behavior of making predictions with time-based restrictions.
def predict(self, market_question: str) -> Prediction: | ||
p_yes, confidence = 1.0 if self.fixed_answer else 0.0, 1.0 | ||
return Prediction( | ||
evaluation=evaluated, | ||
outcome_prediction=OutcomePrediction( | ||
p_yes=p_yes, | ||
confidence=confidence, | ||
info_utility=None, | ||
), | ||
) | ||
|
||
def predict_restricted( | ||
self, market_question: str, time_restriction_up_to: datetime | ||
) -> Prediction: | ||
return self.predict(market_question) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the RandomAgent
, the FixedAgent
class's predict_restricted
method does not account for the time_restriction_up_to
parameter and directly calls the predict
method. This could lead to incorrect behavior if the prediction logic is supposed to be time-sensitive.
It's recommended to adjust the FixedAgent.predict_restricted
method to ensure it respects the time_restriction_up_to
parameter, providing accurate predictions based on the specified time restrictions.
@@ -13,51 +13,84 @@ def __init__(self, agent_name: str, max_workers: t.Optional[int] = None): | |||
self.agent_name = agent_name | |||
self.max_workers = max_workers # Limit the number of workers that can run this worker in parallel threads | |||
|
|||
def evaluate(self, market_question: str) -> EvaluatedQuestion: | |||
raise NotImplementedError | |||
def is_predictable(self, market_question: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduced the evaluate research predict
pattern to just the is-predictable predict
pattern, where is-predictable
defaults to True now. So agents are free to be implemented anyhow they want.
return Prediction(is_predictable=is_predictable) | ||
return self.predict(market_question=market_question) | ||
|
||
def is_predictable_restricted( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus, there are _restricted
versions of them, they are called instead of the original ones, if we are doing benchmark on resolved markets.
|
||
@property | ||
def is_resolved(self) -> bool: | ||
return self.resolution is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the original is_resolved
and changed it to this property to not have two sources of the same information.
else MarketResolution.YES | ||
if self.p_yes > 0.5 | ||
else MarketResolution.NO | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added probable_resolution
, which can be used always, no matter if it is an open or resolved market.
Motivation is that if the market is resolved, it doesn't mean that if the final p_yes is >0.5, the market was resolved to True.
def binary_answer(self) -> bool: | ||
return self.p_yes > 0.5 | ||
def probable_resolution(self) -> MarketResolution: | ||
return MarketResolution.YES if self.p_yes > 0.5 else MarketResolution.NO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it to the outcome prediction as well, just for consistency + we don't need to always compare >0.5.
@@ -233,7 +298,3 @@ def get_llm_api_call_cost( | |||
model_cost += model_costs[model]["completion_tokens"] * completion_tokens | |||
model_cost /= 1000 | |||
return model_cost | |||
|
|||
|
|||
def should_not_happen(message: str, E: t.Type[Exception] = RuntimeError) -> t.NoReturn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was duplicate, it's defined in the other utils as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- prediction_market_agent_tooling/benchmark/utils.py (8 hunks)
Files skipped from review as they are similar to previous changes (1)
- prediction_market_agent_tooling/benchmark/utils.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- prediction_market_agent_tooling/benchmark/utils.py (8 hunks)
Files skipped from review as they are similar to previous changes (1)
- prediction_market_agent_tooling/benchmark/utils.py
Summary by CodeRabbit
MarketResolution
enum for improved prediction evaluation based on market resolution.