Address reviews
EmanElsaban committed Sep 12, 2024
1 parent 7c020ad commit 4346cda
Showing 19 changed files with 196 additions and 281 deletions.
21 changes: 16 additions & 5 deletions docs/source/about/glossary.rst
@@ -13,6 +13,11 @@ PaaSTA uses.
`Kubernetes <https://kubernetes.io/>`_ (a.k.a. k8s) is the open-source system on which Yelp runs many compute workloads.
In Kubernetes, tasks are distributed to and run by servers called Kubelets (but a.k.a. kube nodes or Kubernetes agents) from the Kubernetes control plane.

**Kubernetes Deployment**
~~~~~~~~~~~~~~~~~~~~~~~~~

A Kubernetes resource that represents a collection of pods running the same application. A Deployment is responsible for creating and updating instances of your application.
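
As a rough orientation only (the service name, labels, and replica count below are made-up illustrations, not a PaaSTA convention), a minimal Deployment manifest looks something like::

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example-service
    spec:
      replicas: 3                     # desired number of identical pods
      selector:
        matchLabels:
          app: example-service
      template:
        metadata:
          labels:
            app: example-service
        spec:
          containers:
            - name: example-service
              image: example-registry/example-service:latest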

**Kubernetes Node**
~~~~~~~~~~~~~~~~~~~

@@ -22,33 +27,39 @@ In our case, it's usually a virtual machine provisioned via AWS EC2 Fleets or Au
**Kubernetes Horizontal Pod Autoscaler (HPA)**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It's a Kubernetes feature that automatically scales the number of pods in a deployment based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).
A Kubernetes feature that automatically scales the number of pods in a deployment based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).

**clustername**
~~~~~~~~~~~~~~~

A shortname used to describe a PaaSTA cluster. Use ``paasta
list-clusters`` to see them all.

**Kubernetes pod**
**Kubernetes Pod**
~~~~~~~~~~~~~~~~~~~

Atomic deployment unit for PaaSTA workloads at Yelp and all Kubernetes clusters. Can be thought of as a collection of 1 or more related containers.
Pods can be seen as one or more containers that share a network namespace. At Yelp, these are individual instances of one of our services; many can run on each server.

**Kubernetes Namespace**
~~~~~~~~~~~~~~~~~~~~~~~~

It provides a mechanism for isolating groups of resources within a single cluster. Each K8s Namespace can contain resources like
Pods and Deployments, and it allows for management and access controls to be applied at the Namespace level.

**instancename**
~~~~~~~~~~~~~~~~

Logical collection of Kubernetes pods that comprise a Kubernetes Deployment. service
Logical collection of Kubernetes pods that comprise an application (a Kubernetes Deployment) deployed on Kubernetes. service
name + instancename = Kubernetes Deployment. Examples: main, canary. Each instance represents a running
version of a service with its own configuration and resources.

**namespace**
~~~~~~~~~~~~~

An haproxy/SmartStack concept grouping backends that listen on a
particular port. A namespace may route to many healthy paaSTA
instances. By default, the namespace in which a Kubernetes deployment appears is
particular port. A namespace may route to many healthy PaaSTA
instances. By default, the namespace in which a PaaSTA instance appears is
its instancename.

**Nerve**
107 changes: 7 additions & 100 deletions docs/source/about/smartstack_interaction.rst
@@ -1,15 +1,13 @@
How PaaSTA Interacts with SmartStack
====================================
SmartStack Service Discovery and PaaSTA Integration
===================================================

PaaSTA uses SmartStack configuration to influence the **deployment** and
**monitoring** of services. This document assumes some prior knowledge
about SmartStack; see http://nerds.airbnb.com/smartstack-service-discovery-cloud/.
This document assumes some prior knowledge about SmartStack; see http://nerds.airbnb.com/smartstack-service-discovery-cloud/ for more information.

.. contents:: Table of Contents
:depth: 2

How SmartStack Settings Influence Deployment
--------------------------------------------
SmartStack Service Discovery and Latency Zones
----------------------------------------------

In SmartStack, a service can be configured to be *discovered* at a particular
latency zone.
@@ -35,104 +33,13 @@ A-C. This is great for latency -- only talk to habitats that are
topographically "nearby" -- but reduces availability since only three habitats
can be reached.
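
As a hedged illustration only (the instance name, port, and values below are assumptions, not taken from a real service), the ``discover`` setting lives in a service's ``smartstack.yaml`` and might look like::

    main:
      proxy_port: 20973
      discover: region      # latency zone at which backends are discovered
      advertise:
        - region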

What Would Happen if PaaSTA Were Not Aware of SmartStack
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PaaSTA uses `Kubernetes <https://kubernetes.io/>`_ to deploy
long-running services. At Yelp, PaaSTA clusters are deployed at the
``superregion`` level. This means that a service could potentially be deployed
on any available host in that ``superregion`` that has resources to run it. If
PaaSTA were unaware of the Smartstack ``discover:`` settings, Kubernetes scheduler would
naively deploy pods in a potentially "unbalanced" manner:

.. image:: unbalanced_distribution.svg
:width: 700px

With the naive approach, there is a total of six pods for the superregion, but
four landed in ``region 1``, and two landed in ``region 2``. If
the ``discover`` setting were set to ``habitat``, there would be habitats
**without** pods available to serve anything, likely causing an outage.

In a world with configurable SmartStack discovery settings, the deployment
system (Kubernetes) must be aware of these and deploy accordingly.

How to set PaaSTA to be aware of SmartStack
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PaaSTA's SmartStack Unawareness and Pod Spreading Strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

PaaSTA is not natively aware of SmartStack. To make it (or, more specifically, the Kubernetes scheduler) aware, we use Pod Topology Spread Constraints.
To balance pods across Availability Zones (AZs) in Kubernetes, we use `topology spread constraints <https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/>`_,
assigned to each instance of a service via the ``topology_spread_constraints`` key in soa-configs.

How SmartStack Settings Influence Monitoring
--------------------------------------------

If a service is in SmartStack, PaaSTA uses the same ``discover`` setting
referenced above to decide how the service should be monitored. When a service
author sets a particular setting, say ``discover: region``, it implies that the
system should enforce availability of that service in every region. If there
are regions that lack tasks to serve that service, then PaaSTA should alert.

Example: Checking Each Habitat When ``discover: habitat``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If SmartStack is configured to ``discover: habitat``, PaaSTA configures
Kubernetes to balance tasks to each habitat. But what if it is unable to do that?

.. image:: replication_alert_habitat.svg
:width: 700px

In this case, there are no tasks in habitat F. This is a problem because
``discover: habitat`` implies that any clients in habitat F will not
be able to find the service. It is *down* in habitat F.

To detect and alert on this, PaaSTA uses the ``discover`` setting to decide
which unique locations to look at (e.g. ``habitat``). Paasta iterates over
each unique location (e.g. habitats A-F) and inspects the replication levels
in each location. It finds that there is at least one habitat with too few
instances (habitat F, which has 0 out of 1) and alerts.

The output of the alert or ``paasta status`` looks something like this::

Smartstack:
habitatA - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatB - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatC - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatD - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatE - Healthy - in haproxy with (1/1) total backends UP in this namespace.
habitatF - Critical - in haproxy with (0/1) total backends UP in this namespace.

In this case the service authors have a few actions they can take:

- Increase the total instance count to have more tasks per habitat.
(In this example, each habitat contains a single point of failure!)
- Change the ``discovery`` setting to ``region`` to increase availability
at the cost of latency.
- Investigate *why* tasks can't run in habitat F.
(Lack of resources? Improper configs? Missing service dependencies?)

Example: Checking Each Region When ``discover: region``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If SmartStack is configured to ``discover: region``, PaaSTA configures
Kubernetes to balance tasks to each region. But what if it is unable to launch
all the tasks, but there were tasks running in that region?

.. image:: replication_noalert_region.svg
:width: 700px

The output of the alert or ``paasta status`` looks something like this::

Smartstack:
region1 - Healthy - in haproxy with (3/3) total backends UP in this namespace.
region2 - Warning - in haproxy with (2/3) total backends UP in this namespace.

Assuming a threshold of 50%, an alert would not be sent to the team in this case.

Even if some habitats do not have tasks for this service, ``discover: region``
ensures that clients can be satisfied by tasks in the same region if not by
tasks in the same habitat.


The Relationship Between Nerve "namespaces" and PaaSTA "instances"
------------------------------------------------------------------

12 changes: 6 additions & 6 deletions docs/source/adhoc_instances.rst
@@ -15,17 +15,17 @@ the `yelpsoa configs documentation <yelpsoa_configs.html>`_.
Running an adhoc instance
=========================

Adhoc instances can be run using ``paasta local-run`` like any other instance.
Adhoc instances can be run using ``PaaSTA local-run`` like any other instance.
A sample use case where one needs to ssh onto an adhoc batch machine and run
the adhoc instance ``example_instance`` for the service ``example_service``
would use the command:

``paasta local-run --pull --service example_service --instance example_instance``
``PaaSTA local-run --pull --service example_service --instance example_instance``

The 'interactive' instance
--------------------------

Running ``paasta local-run`` without specifying the ``--instance`` flag
Running ``PaaSTA local-run`` without specifying the ``--instance`` flag
launches an interactive instance of a service running a bash shell. This
interactive instance can be used to run adhoc jobs that aren't run frequently
enough to be added to ``soa_configs``. The default values for the cpu, mem and
@@ -52,7 +52,7 @@ files on the host::
cmd: "python -m batch.adhoc.backfill_batch --dest=/tmp/backfill.csv"

Example "interactive" definition that users will get when they run
``paasta local-run --pull --interactive``. It needs lots of ram and
``PaaSTA local-run --pull --interactive``. It needs lots of ram and
defaults to an ipython repl. Also uses the canary version of the code::

# This is the default config that is run when you don't specify an instance
@@ -67,11 +67,11 @@ Assuming service role from another AWS account

If you need to locally run your instance using a role from an AWS account that differs from your current environment's (e.g. you want to use a role from our production account in one of our dev environments), you will need to specify ``--cluster`` and ``--assume-role-aws-account``.

In the example below, specifying ``--cluster`` in the ``local-run`` command will use instance configurations for ``pnw-prod`` and ``--assume-pod-identity`` will use the configured role from the prod account, no matter your current environment.
In the example below, specifying ``--cluster`` in the ``local-run`` command will use instance configurations for ``pnw-prod`` and ``--assume-Pod-identity`` will use the configured role from the prod account, no matter your current environment.


.. code-block:: sh
paasta local-run --service <service-name> --pull --assume-pod-identity --instance <service-instance> --cluster pnw-prod --interactive
PaaSTA local-run --service <service-name> --pull --assume-Pod-identity --instance <service-instance> --cluster pnw-prod --interactive
Here, the ``--interactive`` flag is used to get the interactive shell.
12 changes: 6 additions & 6 deletions docs/source/autoscaling.rst
@@ -2,7 +2,7 @@
Autoscaling PaaSTA Instances
====================================

PaaSTA allows programmatic control of the number of replicas (pods) a service has.
PaaSTA allows programmatic control of the number of replicas (Pods) a service has.
It uses Kubernetes' Horizontal Pod Autoscaler (HPA) to watch a service's load and scale up or down.

How to use autoscaling
@@ -24,9 +24,9 @@ This behavior may mean that your service is scaled up unnecessarily when you fir
Don't worry - the autoscaler will soon learn what the actual load on your service is, and will scale back down to the appropriate level.

If you use autoscaling it is highly recommended that you make sure your service has a readiness probe.
If your service is registered in Smartstack, each pod automatically gets a readiness probe that checks whether that pod is available in the service mesh.
If your service is registered in Smartstack, each Pod automatically gets a readiness probe that checks whether that Pod is available in the service mesh.
Non-smartstack services may want to configure a ``healthcheck_mode``, and either ``healthcheck_cmd`` or ``healthcheck_uri`` to ensure they have a readiness probe.
The HPA will ignore the load on your pods between when they first start up and when they are ready.
The HPA will ignore the load on your Pods between when they first start up and when they are ready.
This ensures that the HPA doesn't incorrectly scale up due to this warm-up CPU usage.

Autoscaling parameters are stored in an ``autoscaling`` attribute of your instances as a dictionary.
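
For example, a hedged sketch of what this might look like for one instance (the instance name, provider, and setpoint are illustrative assumptions, and key names may differ between PaaSTA versions)::

    main:
      min_instances: 2
      max_instances: 10
      autoscaling:
        metrics_provider: cpu   # which signal the HPA watches
        setpoint: 0.8           # target utilization the autoscaler converges on
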
@@ -66,7 +66,7 @@ The currently available metrics providers are:
Measures the CPU usage of your service's container.

:uwsgi:
With the ``uwsgi`` metrics provider, Paasta will configure your pods to be scraped from your uWSGI master via its `stats server <http://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html>`_.
With the ``uwsgi`` metrics provider, Paasta will configure your Pods to be scraped from your uWSGI master via its `stats server <http://uwsgi-docs.readthedocs.io/en/latest/StatsServer.html>`_.
We currently only support uwsgi stats on port 8889, and Prometheus will attempt to scrape that port.

.. note::
@@ -75,7 +75,7 @@


:gunicorn:
With the ``gunicorn`` metrics provider, Paasta will configure your pods to run an additional container with the `statsd_exporter <https://github.com/prometheus/statsd_exporter>`_ image.
With the ``gunicorn`` metrics provider, Paasta will configure your Pods to run an additional container with the `statsd_exporter <https://github.com/prometheus/statsd_exporter>`_ image.
This sidecar will listen on port 9117 and receive stats from the gunicorn service. The ``statsd_exporter`` will translate the stats into Prometheus format, which Prometheus will scrape.

:active-requests:
@@ -150,7 +150,7 @@ There are a few restrictions on using multiple metrics for scaling your service,
providers, if one of the metrics providers is CPU scaling. You must explicitly opt out of autotuning by setting a
``cpus`` value for this service instance.

If you run ``paasta validate`` for your service, it will check these conditions for you.
If you run ``PaaSTA validate`` for your service, it will check these conditions for you.
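
A hedged sketch of what scaling on two metrics while opting out of autotuning might look like (the key names and values below are assumptions for illustration only)::

    main:
      cpus: 2.0               # explicit value, since CPU autotuning must be disabled
      min_instances: 2
      max_instances: 20
      autoscaling:
        metrics_providers:
          - type: cpu
            setpoint: 0.7
          - type: uwsgi
            setpoint: 0.5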


How to create a custom (bespoke) autoscaling method
6 changes: 3 additions & 3 deletions docs/source/bouncing.rst
@@ -17,7 +17,7 @@ A "Bounce" can happen for one of these reasons:
* ``max_instances``
* ``backoff_seconds``

* An issue of a ``paasta restart`` (or partially from a start/stop)
* An issue of a ``PaaSTA restart`` (or partially from a start/stop)
* A change in system-wide PaaSTA configuration (defaults for volumes, ram, cpu, etc)

By default, PaaSTA will do the safest thing possible and favors service uptime
@@ -32,7 +32,7 @@ predictable bouncing behavior is desired.
Read more in the next section for exact details and differences amongst these
bounce methods.

Note that only in the case of ``paasta mark-for-deployoment --auto-rollback``
Note that only in the case of ``PaaSTA mark-for-deployment --auto-rollback``
will PaaSTA revert code back to previous versions after a failed
bounce. In any other case, PaaSTA will continue to try to move forward forever
until the bounce proceeds, always trying to converge on the desired state.
@@ -147,7 +147,7 @@ See the docs on the `marathon config <yelpsoa_configs.html#marathon-clustername-

Additionally, a service author can configure how the bounce code determines
which instances are healthy by setting ``bounce_health_params``. This
dictionary is passed in as keyword arguments to `get_happy_tasks <generated/paasta_tools.bounce_lib.html#bounce_lib.get_happy_tasks>`_.
dictionary is passed in as keyword arguments to `get_happy_tasks <generated/PaaSTA_tools.bounce_lib.html#bounce_lib.get_happy_tasks>`_.
Valid options are:

* ``min_task_uptime``: Minimum number of seconds that a task must be running
10 changes: 5 additions & 5 deletions docs/source/contributing.rst
@@ -22,7 +22,7 @@ You can run ``make itest`` to execute them.
Example Cluster
^^^^^^^^^^^^^^^^^
There is a docker compose configuration based on our itest containers that you
can use to run the paasta code against a semi-realistic cluster whilst you are
can use to run the PaaSTA code against a semi-realistic cluster whilst you are
developing. More instructions `here <./installation/example_cluster.html>`_

System Package Building / itests
@@ -37,19 +37,19 @@ Making new versions
-------------------
* Make a branch. WRITE TESTS FIRST (TDD)! Add features.

* Submit your branch for review. Include the "paasta" group. Communicate with
* Submit your branch for review. Include the "PaaSTA" group. Communicate with
the team to select a single designated Primary Reviewer.

* After ShipIts, merge your branch to master.

* This version will become live *automatically* if the test suite passes.

* If you *do not want this*, go to Puppet and pin the ``paasta_tools``
* If you *do not want this*, go to Puppet and pin the ``PaaSTA_tools``
package to the current (without your changes) version. The ``mesosstage``
cluster will still pick up your changes (due to that cluster's explicit
hiera override of version to ``latest``) so you can test them there.

* If you do pin a specific version, email paasta@yelp.com to let the rest of the team know.
* If you do pin a specific version, email PaaSTA@yelp.com to let the rest of the team know.

* Edit ``yelp_package/Makefile`` and bump the version in ``RELEASE``.

@@ -68,6 +68,6 @@ it is a little tricky.
* You can load the appropriate rules into your shell. Note that it is sensitive
to the exact path you use to invoke the command getting autocomplete hints:

* ``eval "$(.tox/py27/bin/register-python-argcomplete ./tox/py27/bin/paasta)"``
* ``eval "$(.tox/py27/bin/register-python-argcomplete ./tox/py27/bin/PaaSTA)"``

* There is a simple integration test. See the itest/ folder.