Carl/marketing test diagnostics #1
base: main
Conversation
- _otholearn: a diagnostic level parameter controls more detailed output; uses logging
- _rlearner: add r-squared and Pearson r correlation analysis to score; pass in the diagnostic_level argument
- dml: pass the diagnostic_level argument
- linear_model: use the logger and print info on the get_optimal_alpha optimization
- Includes collinearity-based feature reduction and log scaling of inputs
- Balanced downsampling by treatment (maintains conversion balance for each treatment)
- XGBoost wrappers with more interpretable score functions for imbalanced data
- Cross-validation loop on all models (first stage, final) supporting SparseLinearDML and CausalForestDML; saves score results to a file
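As a rough illustration of the balanced-downsampling idea above (a sketch, not the PR's implementation; the function name, column names, and the pure-pandas approach are all assumptions):

```python
import pandas as pd

def balanced_downsample(df, treatment_col, outcome_col, seed=0):
    """Downsample every treatment arm to the size of the smallest arm,
    preserving each arm's conversion rate by sampling within outcome strata."""
    n_target = df.groupby(treatment_col).size().min()
    parts = []
    for _, arm in df.groupby(treatment_col):
        frac = n_target / len(arm)
        # sample proportionally from each outcome stratum so the arm's
        # conversion rate is (approximately) unchanged
        for _, stratum in arm.groupby(outcome_col):
            n_take = max(1, round(len(stratum) * frac))
            parts.append(stratum.sample(n=n_take, random_state=seed))
    # shuffle rows so arms are interleaved
    return pd.concat(parts).sample(frac=1.0, random_state=seed)
```

After this step, each arm has (close to) the same row count while keeping its own conversion rate, which is what "maintains conversion balance for each treatment" suggests.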
Added one comment to finish my sample test with a sample client.
scripts/marketing_data_test.py
Outdated
T_feat = data['a_processed']
y = data['real_reward']

treat_df = pd.DataFrame(T_feat)
Based on my local debugger, T_feat is already a pd.DataFrame. Do we need to make it a dataframe again?
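If the re-wrapping is kept for safety, it can be made an explicit no-op guard instead (a sketch; as_dataframe is a hypothetical helper, not part of the PR):

```python
import pandas as pd

def as_dataframe(obj):
    """Return obj unchanged if it is already a DataFrame; otherwise wrap it.

    Skips the redundant pd.DataFrame(...) constructor call when the input
    (like T_feat in the debugger observation above) is already a DataFrame.
    """
    return obj if isinstance(obj, pd.DataFrame) else pd.DataFrame(obj)
```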
Marketing Data Test - Miscellaneous Updates
model_y=XGBClassAUCScore(**y_params),
discrete_outcome=True,
cv=cv_folds,
**dml_params,
What's the rationale behind using XGBRegR2Score() for model_t and XGBClassAUCScore for model_y? I was under the impression that we want to use apples-to-apples estimators for T and Y.
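One way to frame the question: the two nuisance models target different variables (T vs. Y), so their families can legitimately differ when, say, T is continuous and Y is binary. A sketch that picks the model family from the target's type (nuisance_model_for is a hypothetical helper, using sklearn models in place of the PR's XGBoost wrappers):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def nuisance_model_for(target, max_classes=10):
    """Pick a classifier for small-cardinality integer targets (e.g. a binary
    conversion outcome) and a regressor otherwise (e.g. a continuous spend
    treatment). The max_classes cutoff is an arbitrary assumption."""
    target = np.asarray(target)
    is_discrete = (np.issubdtype(target.dtype, np.integer)
                   and len(np.unique(target)) <= max_classes)
    return GradientBoostingClassifier() if is_discrete else GradientBoostingRegressor()
```

Under this framing, an R²-scored regressor for model_t and an AUC-scored classifier for model_y is consistent, provided the treatment really is continuous and the outcome really is discrete.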
# TODO: Confirm with Carl that redefining treat_est_combo is fine since we removed
# collinear features.
treat_est_combo = pd.concat([X, T], axis=1)
noncausal_est.fit(treat_est_combo, y)
Do we need to fit the standard model? We will be fitting CausalForestDML() below, and based on the documentation, it has the parameter honest=True by default. This means est = CausalForestDML(...) will train by splitting into 2 equal subsets anyway. I'm basing the definition of honesty on this paper here (section 2.4).
output_results.write(f"{fold_score},")
for fold_scores in est.nuisance_scores_t[0]:
    for score in fold_scores:
        output_results.write(f"{score},")
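The flatten-and-write loop above could also go through the csv module, which handles delimiters and line endings and avoids the trailing comma (a sketch; write_nuisance_scores is a hypothetical helper, not part of the PR):

```python
import csv

def write_nuisance_scores(fh, nuisance_scores):
    """Flatten per-fold nuisance scores (a list of lists, shaped like
    est.nuisance_scores_t[0]) into a single CSV row on fh."""
    row = [score for fold_scores in nuisance_scores for score in fold_scores]
    csv.writer(fh).writerow(row)
```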
Added additional comments after I started reading the causal forest paper.
Use concat instead of merge to make the combined old/new model data
No description provided.