Replies: 2 comments 3 replies
-
This might defeat the purpose of Shapley values, as they try to 'fairly' attribute effects to different combinations of variables. If you have smaller sub-graphs, you might not get a fair attribution across these graphs. Instead, you might want to experiment with the parameters for computing the Shapley values (e.g., you can drastically reduce the number of permutations). You can pass a ShapleyConfig object to the method (https://github.com/py-why/dowhy/blob/main/dowhy/gcm/shapley.py#L43). For example, you can set the number of permutations to a small value such as 20.
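A minimal sketch of what that could look like, assuming the `ShapleyConfig` constructor exposes a `num_permutations` parameter and that the `ShapleyApproximationMethods` enum has a `PERMUTATION` member, and that attribution methods such as `gcm.intrinsic_causal_influence` accept a `shapley_config` argument; check the linked `shapley.py` for the exact signatures in your installed version:

```python
import networkx as nx
import numpy as np
import pandas as pd

from dowhy import gcm
from dowhy.gcm.shapley import ShapleyApproximationMethods, ShapleyConfig

# Toy SCM over a tiny DAG; in practice this would be your 20-30 node graph.
causal_model = gcm.StructuralCausalModel(nx.DiGraph([("X", "Y"), ("Y", "Z")]))
data = pd.DataFrame({"X": np.random.normal(size=1000)})
data["Y"] = 2 * data["X"] + np.random.normal(size=1000)
data["Z"] = 3 * data["Y"] + np.random.normal(size=1000)

gcm.auto.assign_causal_mechanisms(causal_model, data)
gcm.fit(causal_model, data)

# Permutation-based approximation with drastically fewer permutations.
# NOTE: parameter and enum names are assumed from dowhy/gcm/shapley.py;
# verify them against the version you have installed.
shapley_config = ShapleyConfig(
    approximation_method=ShapleyApproximationMethods.PERMUTATION,
    num_permutations=20,
)

# Methods that internally compute Shapley values typically accept the config,
# e.g. intrinsic causal influence attribution:
contributions = gcm.intrinsic_causal_influence(
    causal_model, target_node="Z", shapley_config=shapley_config
)
print(contributions)
```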
-
Setting the number of permutations to 20 does not mean you only evaluate 20 possible subsets: each permutation yields one marginal-contribution evaluation per variable, so with ~20 variables you evaluate roughly ~20^2 subsets. While this is still far from 2^n, it often provides a surprisingly good approximation of all possible subsets. So, while you are correct that the number of evaluations is very far from 2^n, the results are often not that far off (of course, this heavily depends on how complex the true interactions really are). At least you would (randomly) consider interactions between all variables. Further, you avoid the issue of introducing hidden confounders by breaking up the SCM (unless it has a nice tree structure or something similar).
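A quick back-of-the-envelope comparison of the evaluation counts; this is my reading of the "~20^2" remark, assuming one marginal-contribution evaluation per variable per sampled permutation:

```python
# Rough cost comparison for a DAG with n variables.
n = 25                    # number of variables in the DAG
num_permutations = 20     # drastically reduced permutation count

# Permutation sampling: one marginal-contribution evaluation per variable
# in each sampled permutation (hence ~20^2 when n is also around 20).
sampled_evaluations = num_permutations * n

# Exact Shapley values: every subset of the n variables.
exact_evaluations = 2 ** n

print(sampled_evaluations)  # 500
print(exact_evaluations)    # 33554432
```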
-
If I have 20-30 variables in the DAG, tasks using Shapley values could be very slow at runtime due to the permutations over those variables.
Instead, can I decompose this big graph into several smaller ones? The smaller subgraphs would still belong to the hierarchy of the big graph. The idea is that I can run Shapley values on the small graphs and, if needed, go up the hierarchy and run Shapley values again on the upper graphs.