Carl/marketing test diagnostics #1

Open
wants to merge 8 commits into main

Conversation

carl-offerfit (Owner)

No description provided.

carl-offerfit and others added 5 commits November 12, 2024 09:51
_ortho_learner: A diagnostic_level parameter controls more detailed output. Uses logging.
_rlearner: Add R-squared and Pearson r correlation analysis to score. Pass in the diagnostic_level argument.
dml: Pass the diagnostic_level argument.
linear_model: Use the logger and print info on the get_optimal_alpha optimization.
- Includes collinearity-based feature reduction and log scaling of inputs
- Balanced downsampling by treatment (maintains the conversion balance for each treatment)
- XGBoost wrappers with more interpretable score functions for imbalanced data (see the sketch after this list)
- Cross-validation loop over all models (first stage and final) supporting SparseLinearDML and CausalForestDML; saves score results to a file
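As a rough illustration of the wrapper idea only: the names XGBClassAUCScore and XGBRegR2Score come from the diff below, but the bodies here are a minimal sketch, not the PR's actual implementation. A scikit-learn-compatible wrapper can override score() so that cross-validation reports ROC AUC (robust to class imbalance) instead of accuracy:

# Hedged sketch: the class names appear in this PR; the implementations are assumed.
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier, XGBRegressor

class XGBClassAUCScore(XGBClassifier):
    # Report ROC AUC instead of accuracy; more informative when classes are imbalanced.
    def score(self, X, y, sample_weight=None):
        proba = self.predict_proba(X)[:, 1]
        return roc_auc_score(y, proba, sample_weight=sample_weight)

class XGBRegR2Score(XGBRegressor):
    # XGBRegressor.score already returns R^2 via sklearn's RegressorMixin;
    # the subclass just makes the chosen metric explicit in the class name.
    pass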
@kevinofferfit (Collaborator) left a comment:
Added one comment to finish my sample test with a sample client.

T_feat = data['a_processed']
y = data['real_reward']

treat_df = pd.DataFrame(T_feat)
Collaborator:

Based on my local debugger, T_feat is already a pd.DataFrame. Do we need to make it a DataFrame again?
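(For what it's worth, a minimal guard would make the conversion a no-op when T_feat is already a DataFrame; this is a sketch, not a claim about the PR's intent:)

import pandas as pd

# Only wrap when T_feat is not already a DataFrame; pd.DataFrame(df) would
# otherwise just build a redundant shallow copy.
treat_df = T_feat if isinstance(T_feat, pd.DataFrame) else pd.DataFrame(T_feat)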

model_y=XGBClassAUCScore(**y_params),
discrete_outcome=True,
cv=cv_folds,
**dml_params,
@kevinofferfit (Collaborator) commented on Nov 26, 2024:

What's the rationale behind using XGBRegR2Score() for model_t and XGBClassAUCScore for model_y? I was under the impression that we want to use apples-to-apples estimators for T and Y.
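(Context for the question, as a hedged sketch assuming a recent econml version: in DML the two nuisance models solve different prediction problems, so the estimator types can legitimately differ. Plain XGBRegressor/XGBClassifier stand in for the PR's wrappers here:)

from econml.dml import CausalForestDML
from xgboost import XGBClassifier, XGBRegressor

est = CausalForestDML(
    model_t=XGBRegressor(),    # nuisance model: predict treatment T from features (regression, R^2)
    model_y=XGBClassifier(),   # nuisance model: predict binary outcome Y from features (classification, AUC)
    discrete_outcome=True,     # matches the diff; assumes an econml version that supports it
    cv=5,
)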

# TODO: Confirm with Carl that redefining treat_est_combo is fine since we removed
# collinear features.
treat_est_combo = pd.concat([X, T], axis=1)
noncausal_est.fit(treat_est_combo, y)
Collaborator:

Do we need to fit the standard model? We will be fitting CausalForestDML() below, and based on the documentation, it has the parameter honest=True by default. This means est = CausalForestDML(...) will already train by splitting into two equal subsets. I'm basing the definition of honesty on this paper (Section 2.4).
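(For reference, a minimal sketch of that default; the nuisance-model arguments are omitted:)

from econml.dml import CausalForestDML

# honest=True is the documented default: within each tree, one half of the
# sample chooses the splits and the other half estimates the leaf effects.
est = CausalForestDML(honest=True)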

output_results.write(f"{fold_score},")
for fold_scores in est.nuisance_scores_t[0]:
    for score in fold_scores:
        output_results.write(f"{score},")
@kevinofferfit (Collaborator) commented on Nov 26, 2024:

I'm still trying to understand why we want to save nuisance_scores_t for each treatment. For example, we can directly get a summary of the mean point estimates using est.summary().
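(A hedged sketch of the suggested alternative; assumes est is the fitted CausalForestDML from this diff:)

# Summary of the point estimates directly from the fitted estimator.
print(est.summary())

# versus the PR's approach: nuisance_scores_t appears to be nested as
# [monte_carlo_iteration][cv_fold], with one entry per treatment model.
print(est.nuisance_scores_t)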

[Screenshot: est.summary() output]

@kevinofferfit (Collaborator) left a comment:

Added additional comments after I started reading the causal forest paper.

Use concat instead of merge to make the combined old/new model data
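(A minimal sketch of the difference, with placeholder frames that share a row index:)

import pandas as pd

# Hypothetical old- and new-model score frames aligned on the same index.
old_scores = pd.DataFrame({"score": [0.61, 0.58]}, index=["fold0", "fold1"])
new_scores = pd.DataFrame({"score": [0.64, 0.60]}, index=["fold0", "fold1"])

# pd.concat aligns on the index directly, with no join keys, unlike merge.
combined = pd.concat(
    [old_scores.add_suffix("_old"), new_scores.add_suffix("_new")], axis=1
)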