Address reviews
EmanElsaban committed Sep 12, 2024
1 parent 7c020ad commit 4346cda
Showing 19 changed files with 196 additions and 281 deletions.
21 changes: 16 additions & 5 deletions docs/source/about/glossary.rst
@@ -13,6 +13,11 @@ PaaSTA uses.
`Kubernetes <https://kubernetes.io/>`_ (a.k.a. k8s) is the open-source system on which Yelp runs many compute workloads.
In Kubernetes, tasks are distributed to and run by servers called Kubelets (but a.k.a. kube nodes or Kubernetes agents) from the Kubernetes control plane.

**Kubernetes Deployment**
~~~~~~~~~~~~~~~~~~~~~~~~~

A Kubernetes resource that represents a collection of pods running the same application. A Deployment is responsible for creating and updating instances of your application.
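
As a rough orientation only (the service name, labels, and replica count below are made-up illustrations, not a PaaSTA convention), a minimal Deployment manifest looks something like::

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-service
    spec:
      replicas: 3                     # desired number of identical pods
      selector:
        matchLabels:
          app: example-service
      template:
        metadata:
          labels:
            app: example-service
        spec:
          containers:
            - name: example-service
              image: example-registry/example-service:latest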

**Kubernetes Node**
~~~~~~~~~~~~~~~~~~~

@@ -22,33 +27,39 @@ In our case, it's usually a virtual machine provisioned via AWS EC2 Fleets or Au
**Kubernetes Horizontal Pod Autoscaler (HPA)**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's a Kubernetes feature that automatically scales the number of pods in a deployment based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).
A Kubernetes feature that automatically scales the number of pods in a deployment based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).

**clustername**
~~~~~~~~~~~~~~~

A shortname used to describe a PaaSTA cluster. Use ``paasta
list-clusters`` to see them all.

**Kubernetes pod**
**Kubernetes Pod**
~~~~~~~~~~~~~~~~~~~

Atomic deployment unit for PaaSTA workloads at Yelp and all Kubernetes clusters. Can be thought of as a collection of 1 or more related containers.
Pods can be seen as one or more containers that share a network namespace. At Yelp, these are individual instances of one of our services; many can run on each server.

**Kubernetes Namespace**
~~~~~~~~~~~~~~~~~~~~~~~~

It provides a mechanism for isolating groups of resources within a single cluster. Each K8s Namespace can contain resources like
Pods and Deployments, and it allows for management and access controls to be applied at the Namespace level.

**instancename**
~~~~~~~~~~~~~~~~

Logical collection of Kubernetes pods that comprise a Kubernetes Deployment. service
Logical collection of Kubernetes pods that comprise an application (a Kubernetes Deployment) deployed on Kubernetes. service
name + instancename = Kubernetes Deployment. Examples: main, canary. Each instance represents a running
version of a service with its own configuration and resources.

**namespace**
~~~~~~~~~~~~~

An haproxy/SmartStack concept grouping backends that listen on a
particular port. A namespace may route to many healthy paaSTA
instances. By default, the namespace in which a Kubernetes deployment appears is
particular port. A namespace may route to many healthy PaaSTA
instances. By default, the namespace in which a PaaSTA instance appears is
its instancename.

**Nerve**
107 changes: 7 additions & 100 deletions docs/source/about/smartstack_interaction.rst
@@ -1,15 +1,13 @@
How PaaSTA Interacts with SmartStack
====================================
SmartStack Service Discovery and PaaSTA Integration
===================================================

PaaSTA uses SmartStack configuration to influence the **deployment** and
**monitoring** of services. This document assumes some prior knowledge
about SmartStack; see http://nerds.airbnb.com/smartstack-service-discovery-cloud/.
This document assumes some prior knowledge about SmartStack; see http://nerds.airbnb.com/smartstack-service-discovery-cloud/ for more information.

.. contents:: Table of Contents
:depth: 2

How SmartStack Settings Influence Deployment
--------------------------------------------
SmartStack Service Discovery and Latency Zones
----------------------------------------------

In SmartStack, a service can be configured to be *discovered* at a particular
latency zone.
@@ -35,104 +33,13 @@ A-C. This is great for latency -- only talk to habitats that are
topographically "nearby" -- but reduces availability since only three habitats
can be reached.
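
As a hedged illustration only (the instance name, port, and values below are assumptions, not taken from a real service), the ``discover`` setting lives in a service's ``smartstack.yaml`` and might look like::

    main:
      proxy_port: 20973
      discover: region      # latency zone at which backends are discovered
      advertise:
        - region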

What Would Happen if PaaSTA Were Not Aware of SmartStack
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PaaSTA uses `Kubernetes <https://kubernetes.io/>`_ to deploy
long-running services. At Yelp, PaaSTA clusters are deployed at the
``superregion`` level. This means that a service could potentially be deployed
on any available host in that ``superregion`` that has resources to run it. If
PaaSTA were unaware of the Smartstack ``discover:`` settings, Kubernetes scheduler would
naively deploy pods in a potentially "unbalanced" manner:

.. image:: unbalanced_distribution.svg
:width: 700px

With the naive approach, there is a total of six pods for the superregion, but
four landed in ``region 1``, and two landed in ``region 2``. If
the ``discover`` setting were set to ``habitat``, there would be habitats
**without** pods available to serve anything, likely causing an outage.

In a world with configurable SmartStack discovery settings, the deployment
system (Kubernetes) must be aware of these and deploy accordingly.

How to set PaaSTA to be aware of SmartStack
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PaaSTA's SmartStack Unawareness and Pod Spreading Strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PaaSTA is not natively aware of SmartStack. To make it (or, more specifically, the Kubernetes scheduler) aware, we use Pod Topology Spread Constraints.
To balance pods across Availability Zones (AZs) in Kubernetes, we use `topology spread constraints <https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/>`_,
assigned to each instance of a service via the ``topology_spread_constraints`` key in soa-configs.

How SmartStack Settings Influence Monitoring
--------------------------------------------

If a service is in SmartStack, PaaSTA uses the same ``discover`` setting
referenced above to decide how the service should be monitored. When a service
author sets a particular setting, say ``discover: region``, it implies that the
system should enforce availability of that service in every region. If there
are regions that lack tasks to serve that service, then PaaSTA should alert.

Example: Checking Each Habitat When ``discover: habitat``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If SmartStack is configured to ``discover: habitat``, PaaSTA configures
Kubernetes to balance tasks to each habitat. But what if it is unable to do that?

.. image:: replication_alert_habitat.svg
:width: 700px

In this case, there are no tasks in habitat F. This is a problem because
``discover: habitat`` implies that any clients in habitat F will not
be able to find the service. It is *down* in habitat F.

To detect and alert on this, PaaSTA uses the ``discover`` setting to decide
which unique locations to look at (e.g. ``habitat``). Paasta iterates over
each unique location (e.g. habitats A-F) and inspects the replication levels
in each location. It finds that there is at least one habitat with too few
instances (habitat F, which has 0 out of 1) and alerts.

The output of the alert or ``paasta status`` looks something like this::

Smartstack:
habitatA - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatB - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatC - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatD - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatE - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatF - Critical - in haproxy with (0/1) total backends UP in this namespace.

In this case the service authors have a few actions they can take:

- Increase the total instance count to have more tasks per habitat.
(In this example, each habitat contains a single point of failure!)
- Change the ``discovery`` setting to ``region`` to increase availability
at the cost of latency.
- Investigate *why* tasks can't run in habitat F.
(Lack of resources? Improper configs? Missing service dependencies?)

Example: Checking Each Region When ``discover: region``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If SmartStack is configured to ``discover: region``, PaaSTA configures
Kubernetes to balance tasks to each region. But what if it is unable to launch
all the tasks, but there were tasks running in that region?

.. image:: replication_noalert_region.svg
:width: 700px

The output of the alert or ``paasta status`` looks something like this::

Smartstack:
region1 - Healthy - in haproxy with (3/3) total backends UP in this namespace.
region2 - Warning - in haproxy with (2/3) total backends UP in this namespace.

Assuming a threshold of 50%, an alert would not be sent to the team in this case.

Even if some habitats do not have tasks for this service, ``discover: region``
ensures that clients can be satisfied by tasks in the same region if not by
tasks in the same habitat.


The Relationship Between Nerve "namespaces" and PaaSTA "instances"
------------------------------------------------------------------

12 changes: 6 additions & 6 deletions docs/source/adhoc_instances.rst
@@ -15,17 +15,17 @@ the `yelpsoa configs documentation <yelpsoa_configs.html>`_.
Running an adhoc instance
=========================

Adhoc instances can be run using ``paasta local-run`` like any other instance.
Adhoc instances can be run using ``PaaSTA local-run`` like any other instance.
A sample use case where one needs to ssh onto an adhoc batch machine and run
the adhoc instance ``example_instance`` for the service ``example_service``
would use the command:

``paasta local-run --pull --service example_service --instance example_instance``
``PaaSTA local-run --pull --service example_service --instance example_instance``

The 'interactive' instance
--------------------------

Running ``paasta local-run`` without specifying the ``--instance`` flag
Running ``PaaSTA local-run`` without specifying the ``--instance`` flag
launches an interactive instance of a service running a bash shell. This
interactive instance can be used to run adhoc jobs that aren't run frequently
enough to be added to ``soa_configs``. The default values for the cpu, mem and
@@ -52,7 +52,7 @@ files on the host::
cmd: "python -m batch.adhoc.backfill_batch --dest=/tmp/backfill.csv"

Example "interactive" definition that users will get when they run
``paasta local-run --pull --interactive``. It needs lots of ram and
``PaaSTA local-run --pull --interactive``. It needs lots of ram and
defaults to an ipython repl. Also uses the canary version of the code::

# This is the default config that is run when you don't specify an instance
@@ -67,11 +67,11 @@ Assuming service role from another AWS account

If you need to locally run your instance using a role from an AWS account that differs from your current environment's (e.g. you want to use a role from our production account in one of our dev environments), you will need to specify ``--cluster`` and ``--assume-role-aws-account``.

In the example below, specifying ``--cluster`` in the ``local-run`` command will use instance configurations for ``pnw-prod`` and ``--assume-pod-identity`` will use the configured role from the prod account, no matter your current environment.
In the example below, specifying ``--cluster`` in the ``local-run`` command will use instance configurations for ``pnw-prod`` and ``--assume-Pod-identity`` will use the configured role from the prod account, no matter your current environment.


.. code-block:: sh
paasta local-run --service <service-name> --pull --assume-pod-identity --instance <service-instance> --cluster pnw-prod --interactive
PaaSTA local-run --service <service-name> --pull --assume-Pod-identity --instance <service-instance> --cluster pnw-prod --interactive
Here, the ``--interactive`` flag is used to get the interactive shell.
12 changes: 6 additions & 6 deletions docs/source/autoscaling.rst
@@ -2,7 +2,7 @@
Autoscaling PaaSTA Instances
====================================

PaaSTA allows programmatic control of the number of replicas (pods) a service has.
PaaSTA allows programmatic control of the number of replicas (Pods) a service has.
It uses Kubernetes' Horizontal Pod Autoscaler (HPA) to watch a service's load and scale up or down.

How to use autoscaling
@@ -24,9 +24,9 @@ This behavior may mean that your service is scaled up unnecessarily when you fir
Don't worry - the autoscaler will soon learn what the actual load on your service is, and will scale back down to the appropriate level.

If you use autoscaling it is highly recommended that you make sure your service has a readiness probe.
If your service is registered in Smartstack, each pod automatically gets a readiness probe that checks whether that pod is available in the service mesh.
If your service is registered in Smartstack, each Pod automatically gets a readiness probe that checks whether that Pod is available in the service mesh.
Non-smartstack services may want to configure a ``healthcheck_mode``, and either ``healthcheck_cmd`` or ``healthcheck_uri`` to ensure they have a readiness probe.
The HPA will ignore the load on your pods between when they first start up and when they are ready.
The HPA will ignore the load on your Pods between when they first start up and when they are ready.
This ensures that the HPA doesn't incorrectly scale up due to this warm-up CPU usage.

Autoscaling parameters are stored in an ``autoscaling`` attribute of your instances as a dictionary.
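
For example, a hedged sketch of what this might look like for one instance (the instance name, provider, and setpoint are illustrative assumptions, and key names may differ between PaaSTA versions)::

    main:
      min_instances: 2
      max_instances: 10
      autoscaling:
        metrics_provider: cpu   # which signal the HPA watches
        setpoint: 0.8           # target utilization the autoscaler converges on
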
@@ -66,7 +66,7 @@ The currently available metrics providers are:
Measures the CPU usage of your service's container.

:uwsgi:
With the ``uwsgi`` metrics provider, Paasta will configure your pods to be scraped from your uWSGI master via its `stats server <http://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html>`_.
With the ``uwsgi`` metrics provider, Paasta will configure your Pods to be scraped from your uWSGI master via its `stats server <http://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html>`_.
We currently only support uwsgi stats on port 8889, and Prometheus will attempt to scrape that port.

.. note::
@@ -75,7 +75,7 @@


:gunicorn:
With the ``gunicorn`` metrics provider, Paasta will configure your pods to run an additional container with the `statsd_exporter <https://github.com/prometheus/statsd_exporter>`_ image.
With the ``gunicorn`` metrics provider, Paasta will configure your Pods to run an additional container with the `statsd_exporter <https://github.com/prometheus/statsd_exporter>`_ image.
This sidecar will listen on port 9117 and receive stats from the gunicorn service. The ``statsd_exporter`` will translate the stats into Prometheus format, which Prometheus will scrape.

:active-requests:
@@ -150,7 +150,7 @@ There are a few restrictions on using multiple metrics for scaling your service,
providers, if one of the metrics providers is CPU scaling. You must explicitly opt out of autotuning by setting a
``cpus`` value for this service instance.

If you run ``paasta validate`` for your service, it will check these conditions for you.
If you run ``PaaSTA validate`` for your service, it will check these conditions for you.
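
A hedged sketch of what scaling on two metrics while opting out of autotuning might look like (the key names and values below are assumptions for illustration only)::

    main:
      cpus: 2.0               # explicit value, since CPU autotuning must be disabled
      min_instances: 2
      max_instances: 20
      autoscaling:
        metrics_providers:
          - type: cpu
            setpoint: 0.7
          - type: uwsgi
            setpoint: 0.5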


How to create a custom (bespoke) autoscaling method
6 changes: 3 additions & 3 deletions docs/source/bouncing.rst
@@ -17,7 +17,7 @@ A "Bounce" can happen for one of these reasons:
* ``max_instances``
* ``backoff_seconds``

* An issue of a ``paasta restart`` (or partially from a start/stop)
* An issue of a ``PaaSTA restart`` (or partially from a start/stop)
* A change in system-wide PaaSTA configuration (defaults for volumes, ram, cpu, etc)

By default, PaaSTA will do the safest thing possible and favors service uptime
@@ -32,7 +32,7 @@ predictable bouncing behavior is desired.
Read more in the next section for exact details and differences amongst these
bounce methods.

Note that only in the case of ``paasta mark-for-deployoment --auto-rollback``
Note that only in the case of ``PaaSTA mark-for-deployment --auto-rollback``
will PaaSTA revert code back to previous versions after a failed
bounce. In any other case, PaaSTA will continue to try to move forward forever
until the bounce proceeds, always trying to converge on the desired state.
@@ -147,7 +147,7 @@ See the docs on the `marathon config <yelpsoa_configs.html#marathon-clustername-

Additionally, a service author can configure how the bounce code determines
which instances are healthy by setting ``bounce_health_params``. This
dictionary is passed in as keyword arguments to `get_happy_tasks <generated/paasta_tools.bounce_lib.html#bounce_lib.get_happy_tasks>`_.
dictionary is passed in as keyword arguments to `get_happy_tasks <generated/PaaSTA_tools.bounce_lib.html#bounce_lib.get_happy_tasks>`_.
Valid options are:

* ``min_task_uptime``: Minimum number of seconds that a task must be running
10 changes: 5 additions & 5 deletions docs/source/contributing.rst
@@ -22,7 +22,7 @@ You can run ``make itest`` to execute them.
Example Cluster
^^^^^^^^^^^^^^^^^
There is a docker compose configuration based on our itest containers that you
can use to run the paasta code against a semi-realistic cluster whilst you are
can use to run the PaaSTA code against a semi-realistic cluster whilst you are
developing. More instructions `here <./installation/example_cluster.html>`_

System Package Building / itests
@@ -37,19 +37,19 @@ Making new versions
-------------------
* Make a branch. WRITE TESTS FIRST (TDD)! Add features.

* Submit your branch for review. Include the "paasta" group. Communicate with
* Submit your branch for review. Include the "PaaSTA" group. Communicate with
the team to select a single designated Primary Reviewer.

* After ShipIts, merge your branch to master.

* This version will become live *automatically* if the test suite passes.

* If you *do not want this*, go to Puppet and pin the ``paasta_tools``
* If you *do not want this*, go to Puppet and pin the ``PaaSTA_tools``
package to the current (without your changes) version. The ``mesosstage``
cluster will still pick up your changes (due to that cluster's explicit
hiera override of version to ``latest``) so you can test them there.

* If you do pin a specific version, email paasta@yelp.com to let the rest of the team know.
* If you do pin a specific version, email PaaSTA@yelp.com to let the rest of the team know.

* Edit ``yelp_package/Makefile`` and bump the version in ``RELEASE``.

@@ -68,6 +68,6 @@ it is a little tricky.
* You can load the appropriate rules into your shell. Note that it is sensitive
to the exact path you use to invoke the command getting autocomplete hints:

* ``eval "$(.tox/py27/bin/register-python-argcomplete ./tox/py27/bin/paasta)"``
* ``eval "$(.tox/py27/bin/register-python-argcomplete ./tox/py27/bin/PaaSTA)"``

* There is a simple integration test. See the itest/ folder.