Carl/marketing test diagnostics #1
base: main
Conversation
- _otholearn: a diagnostic level parameter controls more detailed output; uses logging
- _rlearner: add r-squared and Pearson r correlation analysis to score; pass in the diagnostic_level argument
- dml: pass the diagnostic_level argument
- linear_model: use the logger and print info on the get_optimal_alpha optimization
- Includes collinearity-based feature reduction and log scaling of inputs
- Balanced downsampling by treatment (maintains conversion balance for each treatment)
- XGBoost wrappers with more interpretable score functions for imbalanced data
- Cross-validation loop on all models (first stage, final) supporting SparseLinearDML and CausalForestDML; saves score results to a file
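As a rough illustration of the balanced-downsampling idea above (a sketch, not the PR's implementation; the function name, column names, and the pure-pandas approach are all assumptions):

```python
import pandas as pd

def balanced_downsample(df, treatment_col, outcome_col, seed=0):
    """Downsample every treatment arm to the size of the smallest arm,
    preserving each arm's conversion rate by sampling within outcome strata."""
    n_target = df.groupby(treatment_col).size().min()
    parts = []
    for _, arm in df.groupby(treatment_col):
        frac = n_target / len(arm)
        # sample proportionally from each outcome stratum so the arm's
        # conversion rate is (approximately) unchanged
        for _, stratum in arm.groupby(outcome_col):
            n_take = max(1, round(len(stratum) * frac))
            parts.append(stratum.sample(n=n_take, random_state=seed))
    # shuffle rows so arms are interleaved
    return pd.concat(parts).sample(frac=1.0, random_state=seed)
```

After this step, each arm has (close to) the same row count while keeping its own conversion rate, which is what "maintains conversion balance for each treatment" suggests.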
Added one comment to finish my sample test with a sample client.
scripts/marketing_data_test.py
Outdated
T_feat = data['a_processed']
y = data['real_reward']

treat_df = pd.DataFrame(T_feat)
Based on my local debugger, T_feat is already a pd.DataFrame. Do we need to make it a dataframe again?
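If the re-wrapping is kept for safety, it can be made an explicit no-op guard instead (a sketch; as_dataframe is a hypothetical helper, not part of the PR):

```python
import pandas as pd

def as_dataframe(obj):
    """Return obj unchanged if it is already a DataFrame; otherwise wrap it.

    Skips the redundant pd.DataFrame(...) constructor call when the input
    (like T_feat in the debugger observation above) is already a DataFrame.
    """
    return obj if isinstance(obj, pd.DataFrame) else pd.DataFrame(obj)
```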
Marketing Data Test - Miscellaneous Updates
model_y=XGBClassAUCScore(**y_params),
discrete_outcome=True,
cv=cv_folds,
**dml_params,
What's the rationale behind using XGBRegR2Score() for model_t and XGBClassAUCScore for model_y? I was under the impression that we want to use apples-to-apples estimators for T and Y.
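One way to frame the question: the two nuisance models target different variables (T vs. Y), so their families can legitimately differ when, say, T is continuous and Y is binary. A sketch that picks the model family from the target's type (nuisance_model_for is a hypothetical helper, using sklearn models in place of the PR's XGBoost wrappers):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

def nuisance_model_for(target, max_classes=10):
    """Pick a classifier for small-cardinality integer targets (e.g. a binary
    conversion outcome) and a regressor otherwise (e.g. a continuous spend
    treatment). The max_classes cutoff is an arbitrary assumption."""
    target = np.asarray(target)
    is_discrete = (np.issubdtype(target.dtype, np.integer)
                   and len(np.unique(target)) <= max_classes)
    return GradientBoostingClassifier() if is_discrete else GradientBoostingRegressor()
```

Under this framing, an R²-scored regressor for model_t and an AUC-scored classifier for model_y is consistent, provided the treatment really is continuous and the outcome really is discrete.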
# TODO: Confirm with Carl that redefining treat_est_combo is fine since we removed
# collinear features.
treat_est_combo = pd.concat([X, T], axis=1)
noncausal_est.fit(treat_est_combo, y)
Do we need to fit the standard model? We will be fitting CausalForestDML() below, and based on the documentation, it has the parameter honest=True by default. This means est = CausalForestDML(...) will train by splitting into 2 equal subsets anyway. I'm basing the definition of honesty on this paper here (section 2.4).
output_results.write(f"{fold_score},")
for fold_scores in est.nuisance_scores_t[0]:
    for score in fold_scores:
        output_results.write(f"{score},")
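The flatten-and-write loop above could also go through the csv module, which handles delimiters and line endings and avoids the trailing comma (a sketch; write_nuisance_scores is a hypothetical helper, not part of the PR):

```python
import csv

def write_nuisance_scores(fh, nuisance_scores):
    """Flatten per-fold nuisance scores (a list of lists, shaped like
    est.nuisance_scores_t[0]) into a single CSV row on fh."""
    row = [score for fold_scores in nuisance_scores for score in fold_scores]
    csv.writer(fh).writerow(row)
```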
Added additional comments after I started reading the causal forest paper.
Use concat instead of merge to make the combined old/new model data
No description provided.