From 4c6792c97ee7bd145f95271d489d4e9a295de9f2 Mon Sep 17 00:00:00 2001
From: saurabh3460
Date: Wed, 6 Nov 2024 11:05:04 +0530
Subject: [PATCH 1/3] add runwhen concept docs
---
docs/runwhen/concepts.md | 101 +++++++++++++++++++++++++++++++++++++++
docs/runwhen/contrib.md | 56 ++++++++++++++++++++++
2 files changed, 157 insertions(+)
create mode 100644 docs/runwhen/concepts.md
create mode 100644 docs/runwhen/contrib.md
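
Note on the `uploadInfo` section of docs/runwhen/concepts.md: the doc warns against committing the token to git. The following is a minimal sketch of a pre-commit style check, assuming the values file keeps `token` as a direct child of a top-level `uploadInfo` key, as in the documented example. The name `has_committed_token` and the `EMPTY_VALUES` set are hypothetical, and the line-based parsing is a simplification rather than a full YAML parser.

```python
import re

# Values treated as "no token set" (assumption: covers common YAML empties).
EMPTY_VALUES = {"", '""', "''", "null", "~"}


def has_committed_token(values_text: str) -> bool:
    """Return True if `uploadInfo.token` carries a non-empty value."""
    in_upload_info = False
    for raw in values_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments (simplified)
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new top-level key starts; remember whether it is uploadInfo.
            in_upload_info = line.startswith("uploadInfo:")
            continue
        match = re.match(r"\s+token:\s*(.*)$", line)
        if in_upload_info and match and match.group(1).strip() not in EMPTY_VALUES:
            return True
    return False
```

Running this against the values file before each commit would flag a filled-in token, while the documented placeholder (`token:` followed only by a comment) passes.
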
diff --git a/docs/runwhen/concepts.md b/docs/runwhen/concepts.md
new file mode 100644
index 0000000..f8f1fe7
--- /dev/null
+++ b/docs/runwhen/concepts.md
@@ -0,0 +1,101 @@
+# RunWhen Concepts
+- [RunWhen Concepts](#runwhen-concepts)
+- [RunWhen Local](#runwhen-local)
+  - [CheatSheet Generator](#cheatsheet-generator)
+  - [Uploading Cluster Topology to the Platform](#uploading-cluster-topology-to-the-platform)
+- [CodeCollections](#codecollections)
+- [CodeBundles](#codebundles)
+
+# RunWhen Local
+- [source-code](https://github.com/runwhen-contrib/runwhen-local)
+- [Helm Chart](https://github.com/runwhen-contrib/helm-charts/tree/main/charts/runwhen-local)
+- [Upstream docs](https://docs.runwhen.com/public/v/runwhen-local/)
+
+RunWhen Local has two core functions:
+- Generate remediation scripts / CheatSheets from included templates for your local cluster
+- Upload Cluster Topology to the RunWhen Platform
+
+## CheatSheet Generator
+At the moment, RunWhen Local **does not possess the ability to discover issues** in
+your cluster and suggest mitigation runbooks / codebundles.
+
+**However, it does discover your Kubernetes resources and object names.**
+Using these, it generates a wide set of runbooks that are useful once you already
+know the root cause. These runbooks contain documentation and pastable shell script
+snippets for the searched issue. The scripts / cheatsheets are pre-templated
+with your namespaces and Kubernetes resource names.
+
+This collection of cheatsheets / runbooks, although not exhaustive, covers a significant portion
+of recurring issues and healthcheck failures and can be useful to SREs for quick
+resolution of incidents.
+
+[Upstream Examples](https://docs.runwhen.com/public/v/runwhen-local/user-guide/features/user_guide-feature_overview)
+
+## Uploading Cluster Topology to the Platform
+The second core function of RunWhen Local is to upload the cluster topology to the
+RunWhen Platform so you can visualize the cluster workload map from a configured
+RunWhen workspace.
+
+- First, follow the documentation at [Upload to RunWhen Platform](https://docs.runwhen.com/public/v/runwhen-local/user-guide/features/upload-to-runwhen-platform#upload-from-the-cli)
+  - To generate the `uploadInfo.yaml` file
+- Next, copy the contents of that YAML object into the `uploadInfo:[]` section
+of the Helm [`values.yaml` file](https://github.com/runwhen-contrib/helm-charts/blob/main/charts/runwhen-local/values.yaml#L121)
+- Once configured, it should look like this:
+  ```YAML
+  uploadInfo:
+    workspaceName:
+    token: # Do NOT add token and commit to git
+    workspaceOwnerEmail: tester@my-company.com
+    papiURL: https://papi.beta.runwhen.com
+    defaultLocation: location-01-us-west1 # available runwhen locations
+  ```
+- Pass the token from the Helm CLI so that it is never leaked via git:
+  ```bash
+  helm upgrade --install "${HELM_RELEASE_NAME}" runwhen-contrib/runwhen-local \
+    --set uploadInfo.token="${RUNWHEN_PLATFORM_TOKEN}" \
+    -f "${VALUES_FILE}" -n "${NAMESPACE}"
+  ```
+
+# CodeCollections
+A CodeCollection is a group of CodeBundles that can be referenced and used in the RunWhen Platform.
+
+*N.B. Currently, CodeCollections cannot be imported explicitly and run against your local cluster using RunWhen Local.*
+
+RunWhen has currently published two CodeCollections:
+- [runwhen-public-codecollection](https://github.com/runwhen-contrib/rw-public-codecollection)
+  - It contains CodeBundles that are usually run against services and don't involve a shell / CLI component.
+- [runwhen-cli-codecollection](https://github.com/runwhen-contrib/rw-cli-codecollection)
+  - Its CodeBundles are generally targeted towards SRE workloads and wrap various shell scripts and CLI tooling.
+
+# CodeBundles
+CodeBundles are specific detectors/mitigators of known SLI/SLO violations in a live software stack.
+
+A CodeBundle comprises:
+- Robot files
+  - Scripts / Playbooks / tasksets written using [Robot Framework](), that either
+    - Create and enforce RunWhen SLIs - `sli.robot`
+    - Create mitigation runbooks in response to an SLO/SLI violation - `runbook.robot`
+- Platform definitions of `{SLX, SLO, SLI, Runbook}` as `YAML` configurations
+  - These do not need to be located in your repo; however, it is good practice to commit them to git.
+  - These configurations wrap standard behaviors for interacting with the RunWhen Platform API, `papi`
+    - Endpoint: `https://papi.beta.runwhen.com`
+  - The RunWhen `YAML` configurations are only pertinent when your CodeBundle is live on the RunWhen Platform; they currently play no role in either local testing or RunWhen Local.
+- Test resources / scripts
+
+In a local testing environment, you only need to execute the `*.robot` files inside the provided container configurations:
+- [Dockerfile](../../Dockerfile)
+- [vscode/devcontainer](../../.devcontainer.json)
+
+
+The usual call chain is as follows:
+- Robot Scripts
+ - User variable and secret injection
+ - RunWhen Libraries
+ - RunWhen Services
+ - Wrapped shell CLI command / Platform SDK code execution
+ - or, direct shims to your shell scripts / python code when services are unavailable
+ - These tasks fetch the current value of a metric / state
+ - This metric value is then compared against the defined thresholds at `sli/slo.yaml` in the platform.
+ - If the Robot script just runs a set of tasks as a mitigation step, it returns either success or failure.
+
+More concepts and non-trivial FAQs around writing CodeBundles are explained at [Contributing to CodeCollections/CodeBundles](contrib.md)
\ No newline at end of file
diff --git a/docs/runwhen/contrib.md b/docs/runwhen/contrib.md
new file mode 100644
index 0000000..f79a7c7
--- /dev/null
+++ b/docs/runwhen/contrib.md
@@ -0,0 +1,56 @@
+# Contributing to CodeCollections/CodeBundles
+
+## Creating a New CodeCollection
+### Forking the template repository
+
+## Writing a Non-trivial CodeBundle
+### Directory structure / Scaffolding
+
+
+
+#########
+Repository Setup
+Introduction to Robot Framework Scripts (how it interacts with RunWhen)
+Calling bash with relative paths
+Secret handling
+Suite Initialization
+Library usage
+Explain the call chain
+Library Setup
+How to get an exhaustive list of available libraries
+CLI repo
+Public repo
+Explain what libraries would be auto-fetched by devcontainer tooling
+Core
+CLI
+What needs to be added for specific libraries that are used in a robot script
+Paths
+Running a test with local docker
+Adding additional binaries to devcontainer as needed
+Mysql-client
+Postgres-client
+Redis-client
+Configuring Env / secrets
+Expose endpoints
+Local docker network
+Expose from test cluster
+Test by using docker run on localhost
+Test in your live environment
+Deploy as a k8s job
+Give an example
+Testing on Runwhen Platform
+Connecting test env/cluster to runwhen
+Runwhen-local upload
+If a Robot script needs additional dependencies such as CLI tools, inform the RunWhen developers; for now they handle the update on the platform side
+Mysql-client
+Postgres-client
+Redis-client
+Registering your first codecollection to Runwhen-platform
+Mention that this may be in private as per developer discretion
+How to configure the YAML to test
+Branch name length limitations
+Expose metric endpoints so that they are accessible to runwhen-platform codebundles
+Configuring Env / secrets
+Running the test
+Checking logs
+Checking for errors
From 29962c2aef9d00d52c3dad02916bc577362d0ce3 Mon Sep 17 00:00:00 2001
From: saurabh3460
Date: Wed, 6 Nov 2024 11:05:51 +0530
Subject: [PATCH 2/3] update README.md
---
README.md | 35 +++++++++++++----------------------
1 file changed, 13 insertions(+), 22 deletions(-)
diff --git a/README.md b/README.md
index c969b75..ac9f264 100644
--- a/README.md
+++ b/README.md
@@ -8,32 +8,23 @@
-
-# codecollection-template
-A hello-world-style template for codecollection authors to get started writing codebundles. This template contains the minimum file structure expected by the RunWhen platform.
-
[![Build](https://github.com/runwhen-contrib/codecollection-template/actions/workflows/build.yaml/badge.svg)](https://github.com/runwhen-contrib/codecollection-template/actions/workflows/build.yaml)
-## Getting Started
-Looking to be a contributor for CodeCollections or start your own? We'd love to collaborate! Head on over to our [public docs](https://docs.runwhen.com/public/runwhen-authors/getting-started-with-codecollection-development) to get started.
-File Structure overview of devcontainer:
-```
--/app/
- |- auth/ #store secrets here, it should already be properly gitignored for you
- |- codecollection/
- | |- codebundles/ # stores codebundles that can be run
- | |- libraries/ # stores python keyword libraries used by codebundles
- |- dev_facade/ # provides interfaces equivalent to those used on the platform, but just dry runs the keywords to assist with development
- ...
-```
+[Upstream Docs - CodeCollection Template](https://github.com/runwhen-contrib/codecollection-template/blob/main/README.md)
-The included script `ro` wraps the `robot` RobotFramework binary, and includes some extra functionality to write logs to a consistent location for viewing in a HTTP server at http://localhost:3000/ that is always running as part of the devcontainer.
+# InfraCloud RunWhen CodeCollection
-### Quickstart
+This CodeCollection aims to create a repository of CodeBundles that can address the various reproducible incident scenarios at [Infracloud/sre-stack](https://github.com/infracloudio/sre-stack/)
-Navigate to the codebundle directory
-`cd codecollection/codebundles/hello_world/`
+- Set meaningful SLOs on Services and their dependencies
+  - DBs
+  - Queues
+  - Caches
+  - Gateways and proxies
+- Create SLIs to continuously monitor the health of services and dependencies
+- Create mitigation runbooks for scenarios where the root cause can be identified deterministically
-Run the codebundle
-`ro sli.robot`
+## Additional Docs
+- [RunWhen Concepts](docs/runwhen/concepts.md)
+- [Contributing to CodeCollections/CodeBundles](docs/runwhen/contrib.md)
\ No newline at end of file
From 28458527250aff713d85bda68ad2a119db254a56 Mon Sep 17 00:00:00 2001
From: saurabh3460
Date: Wed, 6 Nov 2024 11:06:12 +0530
Subject: [PATCH 3/3] update rds-mysql-conn-count/README.md
---
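
Note on the SLX and SLO blocks in this README: the SLX statement asks for connections to stay within 80% of the maximum, and the SLO uses `threshold: 48` with `operand: lt`. The sketch below shows one plausible reading of how those two numbers relate. The maximum of 60 connections is an assumption chosen so that 80% gives 48 (the README does not state the instance's limit), the helper names are hypothetical, and `gt` is included only for illustration since the README uses `lt` alone. How the platform actually evaluates an SLO is not described here.

```python
import operator

# Only `lt` appears in the README; `gt` is a hypothetical extra for illustration.
OPERANDS = {"lt": operator.lt, "gt": operator.gt}


def connection_threshold(max_connections: int, percent: int = 80) -> int:
    """Return the connection count at `percent` of the maximum (integer math)."""
    return max_connections * percent // 100


def meets_objective(value: float, threshold: float, operand: str = "lt") -> bool:
    """Return True if a single measurement satisfies `value <operand> threshold`."""
    return OPERANDS[operand](value, threshold)


# Assumed maximum of 60 connections, which yields the README's threshold of 48.
threshold = connection_threshold(60)
healthy = meets_objective(30, threshold)
breached = not meets_objective(48, threshold)
```

With these assumed inputs, a reading of 30 connections satisfies the objective, and a reading of exactly 48 does not, because `lt` is a strict comparison.
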
codebundles/rds-mysql-conn-count/README.md | 94 ++++++++++++++++++++++
1 file changed, 94 insertions(+)
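
Note on the SLI block in this README: `PROMETHEUS_HOSTNAME` ends in `/prometheus/api/v1` and `QUERY` holds a PromQL expression. To check a query by hand before pasting the config into the platform, the request URL can be assembled as sketched below. This assumes the standard Prometheus HTTP API instant-query endpoint (`/query` with a `query` parameter). Whether the codebundle issues an instant or a range query is not stated in the README, and `build_query_url` and the example host are hypothetical.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, query: str) -> str:
    """Join a Prometheus API base URL and a PromQL query into an instant-query URL."""
    return f"{base_url.rstrip('/')}/query?{urlencode({'query': query})}"


# Hypothetical endpoint; substitute the value of PROMETHEUS_HOSTNAME.
url = build_query_url(
    "http://prometheus.example.com/prometheus/api/v1",
    'aws_rds_database_connections_average{dimension_DBInstanceIdentifier="robotshopmysql"} > 1',
)
```

The resulting URL can be fetched with `curl` or a browser to confirm that the endpoint is reachable and the query returns data.
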
diff --git a/codebundles/rds-mysql-conn-count/README.md b/codebundles/rds-mysql-conn-count/README.md
index e69de29..70a57ba 100644
--- a/codebundles/rds-mysql-conn-count/README.md
+++ b/codebundles/rds-mysql-conn-count/README.md
@@ -0,0 +1,94 @@
+# CodeBundle - RDS MySQL Connection Count
+
+This CodeBundle aims to detect and resolve incidents caused by too many sleeping connections in MySQL.
+
+- Target Service - MySQL
+- Cloud Platform - AWS/RDS
+
+## SLX
+```YAML
+statement: RDS MySql connections should be within 80% of total max connection.
+alias: RDS MySql Connections Count
+metricType: gauge
+asMeasuredBy: Score based on Prometheus query
+icon: Cloud
+owners:
+  - saurabh.yadav@infracloud.io
+imageURL: >-
+  https://storage.googleapis.com/runwhen-nonprod-shared-images/icons/kubernetes/resources/labeled/ns.svg
+
+```
+## SLO / Service Level Objective
+Example:
+```YAML
+codeBundle:
+  repoUrl: https://github.com/infracloudio/ifc-rw-codecollection
+  pathToYaml: codebundles/slo-default/queries.yaml
+  ref: main
+sloSpecType: simple-mwmb
+objective: 95
+threshold: 48
+operand: lt
+```
+
+## SLI / Service Level Indicator
+```YAML
+displayUnitsLong: OK
+displayUnitsShort: ok
+locations:
+  - location-01-us-west1
+description: >-
+  Watch RDS MySql connection count
+codeBundle:
+  repoUrl: https://github.com/infracloudio/ifc-rw-codecollection
+  ref: main
+  pathToRobot: codebundles/rds-mysql-conn-count/sli.robot
+# read more about intervalStrategy here: https://docs.runwhen.com/public/runwhen-platform/feature-overview/points-on-the-map-slxs/service-level-indicators-slis/interval-strategies
+intervalStrategy: intermezzo
+intervalSeconds: 30
+configProvided:
+  # Change PROMETHEUS_HOSTNAME to your endpoint; currently the endpoint needs to be publicly exposed.
+  - name: PROMETHEUS_HOSTNAME
+    value: >-
+      http://aeccfb7ff9bfb4705b6218294a7346c3-2081802229.us-west-2.elb.amazonaws.com/prometheus/api/v1
+  - name: QUERY
+    value: >-
+      aws_rds_database_connections_average{dimension_DBInstanceIdentifier="robotshopmysql"} > 1
+  - name: TRANSFORM
+    value: RAW
+  - name: STEP
+    value: '30'
+  - name: DATA_COLUMN
+    value: '1'
+  - name: NO_RESULT_OVERWRITE
+    value: 'Yes'
+  - name: NO_RESULT_VALUE
+    value: '0'
+servicesProvided:
+  - name: curl
+    locationServiceName: curl-service.shared
+```
+
+## RunBook / Mitigation
+
+```YAML
+location: location-01-us-west1
+codeBundle:
+  repoUrl: https://github.com/infracloudio/ifc-rw-codecollection
+  ref: main
+  pathToRobot: codebundles/rds-mysql-conn-count/runbook.robot
+servicesProvided:
+  - name: curl
+    locationServiceName: curl-service.shared
+configProvided:
+  - name: MYSQL_USER
+    value: admin
+  - name: MYSQL_HOST
+    value: robotshopmysql.example.us-west-2.rds.amazonaws.com
+  - name: PROCESS_USER
+    value: shipping
+```
+
+### Assumptions & Pitfalls
+
+These configs are placeholder YAML. Modify them to suit your environment, then paste them into the RunWhen Platform.
\ No newline at end of file