-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pod DNS error and Pod DNS spoof litmus tests validations and TOTAL_CHAOS_DURATION issue #564
Comments
Some applications cache DNS results; if the results are cached before the chaos experiment is injected, you will not see the experiment's effects. How did you validate if the DNS error was successful or not? Will be good if you could share that shell script you used and also your chaos engine spec for the experiment |
Thanks @gdsoumya for the response. #!/bin/bash
while :
do
curl [nginx](http://google.com/) >> /tmp/outputtrace.log && sleep 0.01;
curl -LI [nginx](http://google.com/) -o /dev/null -w '%{http_code}\n' -s >> /tmp/outputstatus.log && sleep 0.01;
done Following is the chaos engine spec for the experiment: apiVersion: [litmuschaos.io/v1alpha1](http://litmuschaos.io/v1alpha1)
kind: ChaosEngine
metadata:
name: dns-error
spec:
engineState: "active"
annotationCheck: "false"
appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
chaosServiceAccount: pod-dns-error-sa
jobCleanUpPolicy: retain
experiments:
- name: pod-dns-error
spec:
components:
env:
- name: CONTAINER_RUNTIME
value: containerd
## comma separated list of host names
## if not provided, all hostnames/domains will be targeted
- name: TARGET_HOSTNAMES
value: '["nginx"]'
- name: TOTAL_CHAOS_DURATION
value: '500' |
Are you running the DNS chaos on the nginx pod or on some other pod that is accessing to nginx? Also maybe try using the fully hostname for the service like |
Running the DNS chaos on the same nginx pod. |
If you run the experiment on the same nginx pod then it might not show the effect properly, you need to run it on the pod where you want the DNS requests to fail for example start a new pod and run the chaos on that and try using dig/curl in that pod to access nginx. Any other domain/host will not be affected because you mentioned target as just nginx so google.com will not be affected. Also just confirming that you updated the chaos engine with the full service hostname? |
We specify the app on which the chaos should run as following in the chaos engine spec right? appinfo:
appns: "default"
applabel: "app=nginx"
appkind: "deployment"
So in this case the expected behavior should be that we should see the DNS errors for the hostnames specified in TARGET_HOSTNAMES, when we try to run the curl from pods itself? Also just confirming that you updated the chaos engine with the full service hostname? yes, I also tried removing the TARGET_HOSTNAMES variable completely(which should target all hostnames) but not able to validate the chaos. |
which container env are you using? is it containerd? |
yes containerd. |
containerd has had some issues with DNS chaos can you check the logs of the helper pod and confirm if there are any errors being reported in there. |
The helper pod is getting deleted immediately. I tried this experiment multiple times, sometimes the helper pods also went in the error state but the final chaos result was still showing as passed. |
Is it getting deleted immediately as soon as the experiment starts? |
yes |
@gdsoumya , I just created a new cluster with docker runtime and the DNS chaos worked as expected. It looks like it has some issues with containerd. |
it should affect all pods as far as I know, can you set the pod affect percentage to 100% and see. Tagging @ispeakc0de for further support on pods affected. |
For Pod DNS error litmus experiment, we followed the steps(https://litmuschaos.github.io/litmus/experiments/categories/pods/pod-dns-error/#ramp-time) to generate a chaos for the target hostname(nginx), we also ran a shell script to validate if the chaos is injected. But we were not able to identify the chaos injection for the application pods, since the hostname DNS was still working during the entire duration of chaos.
We also wanted to debug this more by increasing the TOTAL_CHAOS_DURATION to a higher value(like 300 seconds), but even after increasing the chaos duration, the chaos experiment completes within 30-40 seconds. Can you please confirm if there is any other configuration we can use to increase chaos duration or if we can validate the chaos experiment? We also noticed the similar behavior for POD DNS Spoof experiment.
The text was updated successfully, but these errors were encountered: