
Merge pull request #1113 from amitbhatt818/cherryPick
[Cherry pick into v1.0.x for GA]
Karthik Satchitanand authored Jan 14, 2020
2 parents bf8104e + 4d82014 commit 00949a1
Showing 13 changed files with 98 additions and 728 deletions.
2 changes: 2 additions & 0 deletions experiments/generic/container_kill/README.md
@@ -1,3 +1,5 @@
## Experiment Metadata

<table>
<tr>
<th> Name </th>
@@ -7,8 +7,8 @@
<th> Documentation Link </th>
</tr>
<tr>
<td> Drain Node </td>
<td> Node Drain </td>
<td> This experiment drains the node where the application pod is running and verifies whether it gets scheduled on another available node. </td>
<td> <a href="https://docs.litmuschaos.io/docs/drain-node/"> Here </a> </td>
<td> <a href="https://docs.litmuschaos.io/docs/node-drain/"> Here </a> </td>
</tr>
</table>
@@ -46,4 +46,4 @@ spec:
value: ''

command: ["/bin/bash"]
args: ["-c", "ansible-playbook ./experiments/generic/drain_node/drain_node_ansible_logic.yml -i /etc/ansible/hosts -vv; exit 0"]
args: ["-c", "ansible-playbook ./experiments/generic/node_drain/node_drain_ansible_logic.yml -i /etc/ansible/hosts -vv; exit 0"]
20 changes: 13 additions & 7 deletions experiments/generic/pod_network_corruption/README.md
@@ -1,9 +1,15 @@
## Experiment Metadata

| Type | Description | K8s Platform |
| ----- | ------------------------------------------------------------ | ------------ |
| Chaos | Inject network packet corruption into application pod | Any |

## Experiment documentation

The corresponding documentation can be found [here](https://docs.litmuschaos.io/docs/pod-network-corruption/)
<table>
<tr>
<th> Name </th>
<th> Description </th>
<th> Documentation Link </th>
</tr>
<tr>
<td> Pod Network Corruption </td>
<td> Inject network packet corruption into application pod
</td>
<td> <a href="https://docs.litmuschaos.io/docs/pod-network-corruption/"> Here </a> </td>
</tr>
</table>
14 changes: 14 additions & 0 deletions experiments/kafka/kafka-broker-disk-failure/README.md
@@ -0,0 +1,14 @@
## Experiment Metadata

<table>
<tr>
<th> Name </th>
<th> Description </th>
<th> Documentation Link </th>
</tr>
<tr>
<td> Kafka Broker Disk Failure </td>
<td> Fail the Kafka broker disk/storage. This experiment causes a forced detach of the specified disk serving as storage for the Kafka broker pod </td>
<td> <a href="https://docs.litmuschaos.io/docs/kafka-broker-disk-failure/"> Here </a> </td>
</tr>
</table>
69 changes: 14 additions & 55 deletions experiments/kafka/kafka-broker-pod-failure/README.md
@@ -1,55 +1,14 @@
### Sample ChaosEngine manifest to execute kafka broker kill experiment

- To override experiment defaults, add the ENV variables in `spec.components` of the experiment.

```yml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
name: kafka-chaos
namespace: default
spec:
appinfo:
appns: default
applabel: 'app=cp-kafka'
appkind: statefulset
chaosServiceAccount: kafka-sa
monitoring: false
experiments:
- name: kafka-broker-pod-failure
spec:
components:
# choose based on available kafka broker replicas
- name: KAFKA_REPLICATION_FACTOR
value: '3'

# get via "kubectl get pods --show-labels -n <kafka-namespace>"
- name: KAFKA_LABEL
value: 'app=cp-kafka'

- name: KAFKA_NAMESPACE
value: 'default'

# get via "kubectl get svc -n <kafka-namespace>"
- name: KAFKA_SERVICE
value: 'kafka-cp-kafka-headless'

# get via "kubectl get svc -n <kafka-namespace>
- name: KAFKA_PORT
value: '9092'

- name: ZOOKEEPER_NAMESPACE
value: 'default'

# get via "kubectl get pods --show-labels -n <zk-namespace>"
- name: ZOOKEEPER_LABEL
value: 'app=cp-zookeeper'

# get via "kubectl get svc -n <zk-namespace>
- name: ZOOKEEPER_SERVICE
value: 'kafka-cp-zookeeper-headless'

# get via "kubectl get svc -n <zk-namespace>
- name: ZOOKEEPER_PORT
value: '2181'
```
## Experiment Metadata

<table>
<tr>
<th> Name </th>
<th> Description </th>
<th> Documentation Link </th>
</tr>
<tr>
<td> Kafka Broker Pod Failure </td>
<td> Fail kafka leader-broker pods. This experiment causes (forced/graceful) pod failure of specific/random Kafka broker pods</td>
<td> <a href="https://docs.litmuschaos.io/docs/kafka-broker-pod-failure/"> Here </a> </td>
</tr>
</table>
121 changes: 9 additions & 112 deletions experiments/openebs/openebs-pool-container-failure/README.md
@@ -2,118 +2,15 @@

<table>
<tr>
<th> Type </th>
<th> Description </th>
<th> Storage </th>
<th> K8s Platform </th>
</tr>
<tr>
<td> Chaos </td>
<td> Kill the pool container and check if it gets scheduled again </td>
<td> OPENEBS </td>
<td> Any </td>
</tr>
</table>

## Entry-Criteria

- Application services are accessible & pods are healthy
- Application writes are successful

## Exit-Criteria

- Application services are accessible & pods are healthy
- Data written prior to chaos is successfully retrieved/read
- Database consistency is maintained as per db integrity check utils
- Storage target pods are healthy

## Notes

- Typically used as a disruptive test to cause loss of access to the storage pool by killing it.
- The pool pod should start again and become healthy.

## Associated Utils

- [pumba/pod_failure_by_sigkill.yaml](/chaoslib/pumba/pod_failure_by_sigkill.yaml)
- [cstor_pool_kill.yml](/experiments/openebs/openebs-pool-container-failure/cstor_pool_kill.yml)

### Procedure

This scenario validates the behaviour of the application and OpenEBS persistent volumes amidst chaos induced on the storage pool. The litmus experiment fails the specified pool, thereby causing loss of access to the volumes created on it.

After injecting chaos into the component specified via environment variable, the litmus experiment observes the behaviour of the corresponding OpenEBS PV and of the application which consumes the volume.

Based on the value of the env DATA_PERSISTENCE, the corresponding data consistency util is executed. At present, only busybox and percona-mysql are supported. Along with specifying the env in the litmus experiment, the user needs to pass a name for the configmap and the data-consistency-specific parameters via the configmap, in the following format:

parameters.yml: |
blocksize: 4k
blockcount: 1024
testfile: difiletest

It is recommended to name the configmap after the test and to mount the configmap as a volume in the litmus pod. The above snippet holds the parameters required for validating data consistency in the busybox application.
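
For illustration, a minimal sketch of such a configmap for the busybox case is shown below, wrapping the parameters above into a manifest; the metadata name and namespace are placeholders chosen here (it is recommended to use the test name), not values mandated by the experiment.

```yml
apiVersion: v1
kind: ConfigMap
metadata:
  # placeholder name; it is recommended to name the configmap after the test
  name: openebs-pool-container-failure
  namespace: litmus
data:
  parameters.yml: |
    blocksize: 4k
    blockcount: 1024
    testfile: difiletest
```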

For percona-mysql, the following parameters are to be injected into the configmap.

parameters.yml: |
dbuser: root
dbpassword: k8sDem0
dbname: tdb

The configmap data is utilised by the litmus experiment as variables while executing the scenario. Based on the data provided, litmus checks whether the data is consistent after recovering from the induced chaos.
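
A minimal sketch of mounting that configmap as a volume in the litmus pod spec is shown below; the configmap name, container name, and mount path are illustrative assumptions, not values fixed by the experiment.

```yml
# illustrative pod-spec fragment (placeholder names and mount path)
volumes:
  - name: parameters
    configMap:
      name: openebs-pool-container-failure
containers:
  - name: litmus-experiment
    volumeMounts:
      - name: parameters
        mountPath: /mnt/parameters.yml
        subPath: parameters.yml
```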

## Litmusbook Environment Variables

### Application

<table>
<tr>
<th> Parameter </th>
<th> Description </th>
</tr>
<tr>
<td> APP_NAMESPACE </td>
<td> Namespace in which application pods are deployed </td>
</tr>
<tr>
<td> APP_LABEL </td>
<td> Unique Labels in `key=value` format of application deployment </td>
</tr>
<tr>
<td> APP_PVC </td>
<td> Name of persistent volume claim used for app's volume mounts </td>
</tr>
</table>

### Chaos

<table>
<tr>
<th> Parameter </th>
<th> Description </th>
</tr>
<tr>
<td> CHAOS_ITERATIONS </td>
<td> The number of chaos iterations </td>
</tr>
</table>

### Health Checks
<table>
<tr>
<th> Parameter </th>
<th> Description </th>
</tr>
<tr>
<td> LIVENESS_APP_NAMESPACE </td>
<td> Namespace in which external liveness pods are deployed, if any </td>
</tr>
<tr>
<td> LIVENESS_APP_LABEL </td>
<td> Unique Labels in `key=value` format for external liveness pod, if any </td>
</tr>
<tr>
<td> DATA_PERSISTENCE </td>
<td> Data accessibility & integrity verification post recovery. To check against busybox, set the value to "busybox"; for percona, set the value to "mysql" </td>
</tr>
</table>
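
To make the tables above concrete, the fragment below sketches how these parameters can be supplied as name/value pairs, following the same pattern used in the kafka ChaosEngine manifest earlier in this diff; the enclosing manifest and all values shown are illustrative placeholders, not defaults.

```yml
# illustrative env overrides for this experiment; values are placeholders
- name: APP_NAMESPACE
  value: 'default'

# get via "kubectl get pods --show-labels -n <app-namespace>"
- name: APP_LABEL
  value: 'app=percona'

- name: APP_PVC
  value: 'percona-mysql-claim'

- name: CHAOS_ITERATIONS
  value: '2'

# set to "busybox" or "mysql" to enable the data persistence checks
- name: DATA_PERSISTENCE
  value: 'mysql'
```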
<td> OpenEBS Pool Container Failure </td>
<td> Kill the pool container and check if it gets scheduled again. This scenario validates the behaviour of the application and OpenEBS persistent volumes when chaos is induced on the storage pool. The litmus experiment fails the specified pool, thereby causing loss of access to the volume replicas created on it.
</td>
<td> <a href="https://docs.litmuschaos.io/docs/openebs-pool-container-failure/"> Here </a> </td>
</tr>
</table>

122 changes: 8 additions & 114 deletions experiments/openebs/openebs-pool-pod-failure/README.md
@@ -2,120 +2,14 @@

<table>
<tr>
<th> Type </th>
<th> Description </th>
<th> Storage </th>
<th> K8s Platform </th>
</tr>
<tr>
<td> Chaos </td>
<td> Kill the pool pod and check if it gets scheduled again </td>
<td> OPENEBS </td>
<td> Any </td>
</tr>
</table>

## Entry-Criteria

- Application services are accessible & pods are healthy
- Application writes are successful

## Exit-Criteria

- Application services are accessible & pods are healthy
- Data written prior to chaos is successfully retrieved/read
- Database consistency is maintained as per db integrity check utils
- Storage target pods are healthy

## Notes

- Typically used as a disruptive test to cause loss of access to the storage pool by killing it.
- The pool pod should start again and become healthy.

## Associated Utils

- [cstor_pool_delete.yml](/experiments/openebs/openebs-pool-container-failure/cstor_pool_delete.yml)
- [cstor_pool_health_check.yml](/experiments/openebs/openebs-pool-container-failure/cstor_pool_health_check.yml)
- [cstor_verify_pool_provisioning.yml](/experiments/openebs/openebs-pool-container-failure/cstor_verify_pool_provisioning.yml)
- [cstor_delete_and_verify_pool_deployment.yml](/experiments/openebs/openebs-pool-container-failure/cstor_delete_and_verify_pool_deployment.yml)

### Procedure

This scenario validates the behaviour of the application and OpenEBS persistent volumes amidst chaos induced on the storage pool. The litmus experiment fails the specified pool, thereby causing loss of access to the volumes created on it.

After injecting chaos into the component specified via environment variable, the litmus experiment observes the behaviour of the corresponding OpenEBS PV and of the application which consumes the volume.

Based on the value of the env DATA_PERSISTENCE, the corresponding data consistency util is executed. At present, only busybox and percona-mysql are supported. Along with specifying the env in the litmus experiment, the user needs to pass a name for the configmap and the data-consistency-specific parameters via the configmap, in the following format:

parameters.yml: |
blocksize: 4k
blockcount: 1024
testfile: difiletest

It is recommended to name the configmap after the test and to mount the configmap as a volume in the litmus pod. The above snippet holds the parameters required for validating data consistency in the busybox application.

For percona-mysql, the following parameters are to be injected into the configmap.

parameters.yml: |
dbuser: root
dbpassword: k8sDem0
dbname: tdb

The configmap data is utilised by the litmus experiment as variables while executing the scenario. Based on the data provided, litmus checks whether the data is consistent after recovering from the induced chaos.
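
Analogous to the busybox case in the previous file, a minimal sketch of the percona-mysql configmap is shown below; the metadata name and namespace are placeholders (it is recommended to use the test name), and the credential values are the sample values quoted above.

```yml
apiVersion: v1
kind: ConfigMap
metadata:
  # placeholder name; it is recommended to name the configmap after the test
  name: openebs-pool-pod-failure
  namespace: litmus
data:
  parameters.yml: |
    dbuser: root
    dbpassword: k8sDem0
    dbname: tdb
```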

## Litmusbook Environment Variables

### Application

<table>
<tr>
<th> Parameter </th>
<th> Description </th>
</tr>
<tr>
<td> APP_NAMESPACE </td>
<td> Namespace in which application pods are deployed </td>
</tr>
<tr>
<td> APP_LABEL </td>
<td> Unique Labels in `key=value` format of application deployment </td>
</tr>
<tr>
<td> APP_PVC </td>
<td> Name of persistent volume claim used for app's volume mounts </td>
</tr>
</table>

### Chaos

<table>
<tr>
<th> Parameter </th>
<th> Description </th>
</tr>
<tr>
<td> CHAOS_ITERATIONS </td>
<td> The number of chaos iterations </td>
</tr>
</table>

### Health Checks
<table>
<tr>
<th> Parameter </th>
<th> Description </th>
</tr>
<tr>
<td> LIVENESS_APP_NAMESPACE </td>
<td> Namespace in which external liveness pods are deployed, if any </td>
</tr>
<tr>
<td> LIVENESS_APP_LABEL </td>
<td> Unique Labels in `key=value` format for external liveness pod, if any </td>
</tr>
<tr>
<td> DATA_PERSISTENCE </td>
<td> Data accessibility & integrity verification post recovery. To check against busybox, set the value to "busybox"; for percona, set the value to "mysql" </td>
</tr>
</table>
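
Complementing the health-check table above, the fragment below sketches how the optional liveness parameters might be supplied as name/value pairs in the same style as the kafka manifest earlier in this diff; the label and namespace values are placeholders for illustration.

```yml
# illustrative health-check env overrides; values are placeholders
- name: LIVENESS_APP_NAMESPACE
  value: 'default'

- name: LIVENESS_APP_LABEL
  value: 'liveness=percona-liveness'

# set to "busybox" or "mysql" to enable the data persistence checks
- name: DATA_PERSISTENCE
  value: 'busybox'
```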
<td> OpenEBS Pool Pod Failure </td>
<td> Kill the pool pod and check if it gets scheduled again. This scenario validates the behaviour of the application and OpenEBS persistent volumes when chaos is induced on the storage pool. The litmus experiment fails the specified pool, thereby causing loss of access to the volumes created on it.
</td>
<td> <a href="https://docs.litmuschaos.io/docs/openebs-pool-pod-failure/"> Here </a> </td>
</tr>
</table>