diff --git a/README.md b/README.md
index ea4be9090185..cadd1822b1ec 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-> 🚨 IMPORTANT: When upgrading Sourcegraph, please check [docs/migrate.md](docs/migrate.md) to check if any manual migrations are necessary.
+> 🚨 IMPORTANT: When upgrading Sourcegraph, please check the [upgrading docs](https://docs.sourcegraph.com/admin/updates/kubernetes) to see if any manual migrations are necessary.
 
 > `master` branch tracks development. Use the revision of this repository corresponding to the
 > version of Sourcegraph you wish to deploy. E.g., `git checkout v3.10.4`.
@@ -18,11 +18,11 @@
 container](https://docs.sourcegraph.com/#quickstart-guide) or using
 [Docker Compose](https://docs.sourcegraph.com/admin/install/docker-compose). Migrating to
 Sourcegraph on Kubernetes is easy later.
 
-- [Installing](docs/install.md)
-- [Configuring](docs/configure.md)
-- [Updating](docs/update.md)
-- [Scaling](docs/scale.md)
-- [Troubleshooting](docs/troubleshoot.md)
+- [Installing](https://docs.sourcegraph.com/admin/install/kubernetes)
+- [Configuring](https://docs.sourcegraph.com/admin/install/kubernetes/configure)
+- [Updating](https://docs.sourcegraph.com/admin/updates/kubernetes)
+- [Scaling](https://docs.sourcegraph.com/admin/install/kubernetes/scale)
+- [Troubleshooting](https://docs.sourcegraph.com/admin/install/kubernetes/troubleshoot)
 - [Admin guide](docs/admin-guide.md) - useful guide for Sourcegraph admins
 - [Prometheus metrics](docs/admin-guide.md#prometheus) - list of all Prometheus metrics that can be
   used for application performance monitoring
diff --git a/docs/admin-guide.md b/docs/admin-guide.md
deleted file mode 100644
index 0fd3d2b437df..000000000000
--- a/docs/admin-guide.md
+++ /dev/null
@@ -1,88 +0,0 @@
-# Admin guide
-
-This guide is intended for system administrators and operations engineers who are responsible for
-maintaining a Sourcegraph Kubernetes cluster. Each section covers a topic or tool that may be
-helpful in managing the cluster.
-
-## Debugging
-
-The following commands are useful to gain visibility into cluster status.
-
-| Task | Command |
-| ---- | ------- |
-| List all pods running | `kubectl get pods -o=wide` |
-| Describe pod state, including reasons why a pod is not successfully running | `kubectl describe pod $POD_NAME` |
-| Tail logs | `kubectl logs -f $POD_NAME` |
-| SSH into a running pod container | `kubectl exec -it $POD_NAME -- sh` |
-| Get a PostgreSQL client on the prod database | `kubectl exec -it $(kubectl get pods -l app=pgsql -o jsonpath="{.items[0].metadata.name}") -- psql -U sg` |
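The `$POD_NAME` placeholders above assume you already know which pod you care about. As a convenience, a small sketch like the following (assuming the standard `app` labels the base manifests apply, e.g. `app=sourcegraph-frontend`) discovers pods by label and tails logs across all replicas of a service:

```bash
# Tail the last 100 log lines from every replica of a service.
# The app label values match those used by the manifests in base/.
for pod in $(kubectl get pods -l app=sourcegraph-frontend -o jsonpath='{.items[*].metadata.name}'); do
  echo "=== $pod ==="
  kubectl logs --tail=100 "$pod"
done
```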
-
----
-
-## Prometheus
-
-[Prometheus](https://prometheus.io/) is an open-source application monitoring system and time series database. It is
-commonly used to track key performance metrics over time, such as the following:
-
-- QPS
-- Application requests by URL route name
-- HTTP response latency
-- HTTP error codes
-- Time since last search index update
-
-Follow the [steps to deploy Prometheus](../configure/prometheus/README.md).
-
-After updating the cluster, the running Prometheus pod will be visible in the list printed by
-`kubectl get pods`. Once this is enabled, Prometheus will begin recording performance metrics across
-all services running in Sourcegraph.
-
-## Distributed tracing
-
-Distributed tracing tools are useful when debugging performance issues such as high query latency. Sourcegraph uses the
-[OpenTracing standard](http://opentracing.io/) and can be made to work with any tracing tool that satisfies that
-standard. Currently, two tracing tools are supported by Sourcegraph configuration:
-
-- [Lightstep](../configure/configure.md#configure-lightstep-tracing)
-- [Jaeger](../configure/jaeger/README.md)
-
-## Snapshots
-
-The `sourcegraph-server-gen` command supports creating and restoring snapshots of the database,
-which can be useful for backups and syncing database state from one cluster to another:
-
-- On macOS:
-  ```bash
-  curl -O https://storage.googleapis.com/sourcegraph-assets/sourcegraph-server-gen/darwin_amd64/sourcegraph-server-gen
-  chmod +x ./sourcegraph-server-gen
-  ```
-- On Linux:
-  ```bash
-  curl -O https://storage.googleapis.com/sourcegraph-assets/sourcegraph-server-gen/linux_amd64/sourcegraph-server-gen
-  chmod +x ./sourcegraph-server-gen
-  ```
-
-Run `sourcegraph-server-gen snapshot --help` for more information.
diff --git a/docs/configure.md b/docs/configure.md
index 27ba4d039169..434b7b135ae1 100644
--- a/docs/configure.md
+++ b/docs/configure.md
@@ -1,613 +1 @@
-# Configuring Sourcegraph
-
-Configuring a Sourcegraph Kubernetes cluster is done by applying manifest files and running simple
-`kubectl` commands. You can configure Sourcegraph as flexibly as you need to meet the requirements
-of your deployment environment. We provide simple instructions below for common tasks such as setting up
-TLS, enabling code intelligence, and exposing Sourcegraph to external traffic.
-
-## Fork this repository
-
-We recommend you fork this repository to track your configuration changes in Git.
-This will make upgrades far easier and is a good practice not just for Sourcegraph, but for any Kubernetes application.
-
-1. Create a fork of this repository.
-
-   - The fork can be public **unless** you plan to store secrets in the repository itself.
-   - We recommend not storing secrets in the repository itself, and these instructions document how.
-
-1. Create a release branch to track all of your customizations to Sourcegraph.
-   When you upgrade Sourcegraph, you will merge upstream into this branch.
-
-   ```bash
-   git checkout HEAD -b release
-   ```
-
-   If you followed the installation instructions, `HEAD` should point at the Git tag you've deployed to your running Kubernetes cluster.
-
-1. Commit customizations to your release branch:
-
-   - Commit manual modifications to Kubernetes YAML files.
-   - Commit commands that should be run on every update (e.g. `kubectl apply`) to [./kubectl-apply-all.sh](../kubectl-apply-all.sh).
-   - Commit commands that generally only need to be run once per cluster (e.g.
`kubectl create secret`, `kubectl expose`) to [./create-new-cluster.sh](../create-new-cluster.sh). - -1. When you upgrade, merge the corresponding upstream release tag into your release branch. E.g., `git remote add upstream https://github.com/sourcegraph/deploy-sourcegraph` to add the upstream remote and `git checkout release && git merge v3.15.0` to merge the upstream release tag into your release branch. - -## Dependencies - -Configuration steps in this file depend on [jq](https://stedolan.github.io/jq/), -[yj](https://github.com/sourcegraph/yj) and [jy](https://github.com/sourcegraph/jy). - -If you choose to use [overlays](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/#bases-and-overlays) -you need the [kustomize](https://kustomize.io/) tool installed. - -## Table of contents - -### Common configuration - -- [Configure a storage class](#configure-a-storage-class) -- [Configure network access](#configure-network-access) -- [Update site configuration](#update-site-configuration) -- [Configure TLS/SSL](#configure-tlsssl) -- [Configure repository cloning via SSH](#configure-repository-cloning-via-ssh) -- [Configure language servers](#configure-language-servers) -- [Configure SSDs to boost performance](../configure/ssd/README.md). -- [Increase memory or CPU limits](#increase-memory-or-cpu-limits) - -### Less common configuration - -- [Configure gitserver replica count](#configure-gitserver-replica-count) -- [Configure indexed-search replica count](#configure-indexed-search-replica-count) -- [Assign resource-hungry pods to larger nodes](#assign-resource-hungry-pods-to-larger-nodes) -- [Configure Alertmanager](../configure/prometheus/alertmanager/README.md) -- [Disable or customize Jaeger tracing](../configure/jaeger/README.md) -- [Configure Lightstep tracing](#configure-lightstep-tracing) -- [Configure custom Redis](#configure-custom-redis) -- [Configure custom PostgreSQL](#configure-custom-postgres) -- [Install without RBAC](#install-without-rbac) -- [Use non-default namespace](#use-non-default-namespace) -- [Pulling images locally](#pulling-images-locally) - -### Working with overlays - -- [Overlay basic principles](#overlay-basic-principles) -- [Handling overlays in this repository](#handling-overlays-in-this-repository) -- [namespaced overlay](#namespaced-overlay) -- [non-root overlay](#non-root-overlay) -- [non-privileged overlay](#non-privileged-overlay) - -## Configure network access - -You need to make the main web server accessible over the network to external users. - -There are a few approaches, but using an ingress controller is recommended. - -### Ingress controller (recommended) - -For production environments, we recommend using the [ingress-nginx](https://kubernetes.github.io/ingress-nginx/) [ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/). - -As part of our base configuration we install an ingress for [sourcegraph-frontend](../base/frontend/sourcegraph-frontend.Ingress.yaml). It installs rules for the default ingress, see comments to restrict it to a specific host. - -In addition to the sourcegraph-frontend ingress, you'll need to install the NGINX ingress controller (ingress-nginx). Follow the instructions at https://kubernetes.github.io/ingress-nginx/deploy/ to create the ingress controller. Add the files to [configure/ingress-nginx](../configure/ingress-nginx), including an [install.sh](configure/ingress-nginx/install.sh) file which applies the relevant manifests. 
We include sample generic-cloud manifests as part of this repository, but please follow the official instructions for your cloud provider. - -Add the [configure/ingress-nginx/install.sh](configure/ingress-nginx/install.sh) command to [create-new-cluster.sh](../create-new-cluster.sh) and commit the change: - -```shell -echo ./configure/ingress-nginx/install.sh >> create-new-cluster.sh -``` - -Once the ingress has acquired an external address, you should be able to access Sourcegraph using that. You can check the external address by running the following command and looking for the `LoadBalancer` entry: - -```bash -kubectl -n ingress-nginx get svc -``` - -If you are having trouble accessing Sourcegraph, ensure ingress-nginx IP is accessible above. Otherwise see [Troubleshooting ingress-nginx](https://kubernetes.github.io/ingress-nginx/troubleshooting/). The namespace of the ingress-controller is `ingress-nginx`. - -#### Configuration - -`ingress-nginx` has extensive configuration documented at [NGINX Configuration](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/). We expect most administrators to modify [ingress-nginx annotations](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/) in [sourcegraph-frontend.Ingress.yaml](../base/frontend/sourcegraph-frontend.Ingress.yaml). Some settings are modified globally (such as HSTS). In that case we expect administrators to modify the [ingress-nginx configmap](https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/) in [configure/ingress-nginx/mandatory.yaml](../configure/ingress-nginx/mandatory.yaml). - -### NGINX service - -In cases where ingress controllers cannot be created, creating an explicit NGINX service is a viable -alternative. See the files in the [configure/nginx-svc](../configure/nginx-svc) folder for an -example of how to do this via a NodePort service (any other type of Kubernetes service will also -work): - -1. Modify [configure/nginx-svc/nginx.ConfigMap.yaml](../configure/nginx-svc/nginx.ConfigMap.yaml) to - contain the TLS certificate and key for your domain. - -1. `kubectl apply -f configure/nginx-svc` to create the NGINX service. - -1. Update [create-new-cluster.sh](../create-new-cluster.sh) with the previous command. - - ``` - echo kubectl apply -f configure/nginx-svc >> create-new-cluster.sh - ``` - -### Network rule - -> Note: this setup path does not support TLS. - -Add a network rule that allows ingress traffic to port 30080 (HTTP) on at least one node. - -- [Google Cloud Platform Firewall rules](https://cloud.google.com/compute/docs/vpc/using-firewalls). - - 1. Expose the necessary ports. - - ```bash - gcloud compute --project=$PROJECT firewall-rules create sourcegraph-frontend-http --direction=INGRESS --priority=1000 --network=default --action=ALLOW --rules=tcp:30080 - ``` - - 1. Change the type of the `sourcegraph-frontend` service in [base/frontend/sourcegraph-frontend.Service.yaml](../base/frontend/sourcegraph-frontend.Service.yaml) from `ClusterIP` to `NodePort`: - - ```diff - spec: - ports: - - name: http - port: 30080 - + nodePort: 30080 - - type: ClusterIP - + type: NodePort - ``` - - 1. Directly applying this change to the service [will fail](https://github.com/kubernetes/kubernetes/issues/42282). 
Instead, you must delete the old service and then create the new one (this will result in a few seconds of downtime):
-
-     ```shell
-     kubectl delete svc sourcegraph-frontend
-     kubectl apply -f base/frontend/sourcegraph-frontend.Service.yaml
-     ```
-
-  1. Find a node name.
-
-     ```bash
-     kubectl get pods -l app=sourcegraph-frontend -o=custom-columns=NODE:.spec.nodeName
-     ```
-
-  1. Get the EXTERNAL-IP address (will be ephemeral unless you [make it static](https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address#promote_ephemeral_ip)).
-
-     ```bash
-     kubectl get node $NODE -o wide
-     ```
-
-- [AWS Security Group rules](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_SecurityGroups.html).
-
-Sourcegraph should now be accessible at `$EXTERNAL_ADDR:30080`, where `$EXTERNAL_ADDR` is the address of _any_ node in the cluster.
-
-## Update site configuration
-
-Sourcegraph's application configuration is stored in the PostgreSQL database. To edit this configuration, use the web UI. See [site configuration](https://docs.sourcegraph.com/admin/config/site_config) for more information.
-
-## Configure TLS/SSL
-
-If you intend to make your Sourcegraph instance accessible on the Internet or another untrusted network, you should use TLS so that all traffic is served over HTTPS.
-
-### Ingress controller
-
-If you exposed your Sourcegraph instance via an ingress controller as described in ["Ingress controller (recommended)"](#ingress-controller-recommended):
-
-1. Create a [TLS secret](https://kubernetes.io/docs/concepts/configuration/secret/) that contains your TLS certificate and private key.
-
-   ```bash
-   kubectl create secret tls sourcegraph-tls --key $PATH_TO_KEY --cert $PATH_TO_CERT
-   ```
-
-   Update [create-new-cluster.sh](../create-new-cluster.sh) with the previous command.
-
-   ```
-   echo kubectl create secret tls sourcegraph-tls --key $PATH_TO_KEY --cert $PATH_TO_CERT >> create-new-cluster.sh
-   ```
-
-1. Add the TLS configuration to [base/frontend/sourcegraph-frontend.Ingress.yaml](../base/frontend/sourcegraph-frontend.Ingress.yaml).
-
-   ```yaml
-   # base/frontend/sourcegraph-frontend.Ingress.yaml
-   tls:
-     - hosts:
-         # Replace 'sourcegraph.example.com' with the real domain that you want to use for your Sourcegraph instance.
-         - sourcegraph.example.com
-       secretName: sourcegraph-tls
-   rules:
-     - http:
-         paths:
-           - path: /
-             backend:
-               serviceName: sourcegraph-frontend
-               servicePort: 30080
-       # Replace 'sourcegraph.example.com' with the real domain that you want to use for your Sourcegraph instance.
-       host: sourcegraph.example.com
-   ```
-
-1. Change your `externalURL` in [the site configuration](https://docs.sourcegraph.com/admin/config/site_config) to e.g. `https://sourcegraph.example.com`.
-
-   Apply the above changes to the ingress controller with the following command.
-
-   ```bash
-   kubectl apply -f base/frontend/sourcegraph-frontend.Ingress.yaml
-   ```
-
-**WARNING:** Do NOT commit the actual TLS cert and key files to your fork (unless your fork is
-private **and** you are okay with storing secrets in it).
-
-### NGINX service
-
-If you exposed your Sourcegraph instance via the alternative NGINX service as described in ["NGINX service"](#nginx-service), those instructions already walked you through setting up TLS/SSL.
-
-## Configure repository cloning via SSH
-
-Sourcegraph will clone repositories using SSH credentials if they are mounted at `/home/sourcegraph/.ssh` in the `gitserver` deployment.
-
-1. [Create a secret](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables) that contains the base64 encoded contents of your SSH private key (_make sure it doesn't require a password_) and known_hosts file.
-
-   ```bash
-   kubectl create secret generic gitserver-ssh \
-    --from-file id_rsa=${HOME}/.ssh/id_rsa \
-    --from-file known_hosts=${HOME}/.ssh/known_hosts
-   ```
-
-   Update [create-new-cluster.sh](../create-new-cluster.sh) with the previous command.
-
-   ```bash
-   echo kubectl create secret generic gitserver-ssh \
-    --from-file id_rsa=${HOME}/.ssh/id_rsa \
-    --from-file known_hosts=${HOME}/.ssh/known_hosts >> create-new-cluster.sh
-   ```
-
-2. Mount the [secret as a volume](https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-files-from-a-pod) in [gitserver.StatefulSet.yaml](../base/gitserver/gitserver.StatefulSet.yaml).
-
-   For example:
-
-   ```yaml
-   # base/gitserver/gitserver.StatefulSet.yaml
-   spec:
-     containers:
-       volumeMounts:
-         - mountPath: /home/sourcegraph/.ssh
-           name: ssh
-     volumes:
-       - name: ssh
-         secret:
-           defaultMode: 0644
-           secretName: gitserver-ssh
-   ```
-
-   Convenience script:
-
-   ```bash
-   # This script requires https://github.com/sourcegraph/jy and https://github.com/sourcegraph/yj
-   GS=base/gitserver/gitserver.StatefulSet.yaml
-   cat $GS | yj | jq '.spec.template.spec.containers[].volumeMounts += [{mountPath: "/home/sourcegraph/.ssh", name: "ssh"}]' | jy -o $GS
-   cat $GS | yj | jq '.spec.template.spec.volumes += [{name: "ssh", secret: {defaultMode: 384, secretName:"gitserver-ssh"}}]' | jy -o $GS
-   ```
-
-3. Apply the updated `gitserver` configuration to your cluster.
-
-   ```bash
-   ./kubectl-apply-all.sh
-   ```
-
-**WARNING:** Do NOT commit the actual `id_rsa` and `known_hosts` files to your fork (unless
-your fork is private **and** you are okay with storing secrets in it).
-
-## Configure language servers
-
-Code intelligence is provided through [Sourcegraph extensions](https://docs.sourcegraph.com/extensions). These language extensions communicate with language servers that are deployed inside your Sourcegraph cluster. See the README.md for each language for configuration information:
-
-- Go: [configure/lang/go/README.md](../configure/lang/go/README.md)
-- JavaScript/TypeScript: [configure/lang/typescript/README.md](../configure/lang/typescript/README.md)
-
-## Increase memory or CPU limits
-
-If your instance contains a large number of repositories or monorepos, changing the compute resources allocated to containers can improve performance. See [Kubernetes' official documentation](https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/) for information about compute resources and how to specify them, and see [docs/scale.md](scale.md) for specific advice about what resources to tune.
-
-## Configure gitserver replica count
-
-Increasing the number of `gitserver` replicas can improve performance when your instance contains a large number of repositories. Repository clones are consistently striped across all `gitserver` replicas. Other services need to be aware of how many `gitserver` replicas exist so they can resolve an individual repo.
-
-To change the number of `gitserver` replicas:
-
-1. Update the `replicas` field in [gitserver.StatefulSet.yaml](../base/gitserver/gitserver.StatefulSet.yaml).
-1. Update the `SRC_GIT_SERVERS` environment variable in the frontend service to reflect the number of replicas.
- - For example, if there are 2 gitservers then `SRC_GIT_SERVERS` should have the value `gitserver-0.gitserver:3178 gitserver-1.gitserver:3178`: - - ```yaml - - env: - - name: SRC_GIT_SERVERS - value: gitserver-0.gitserver:3178 gitserver-1.gitserver:3178 - ``` - -1. Recommended: Increase [indexed-search replica count](#configure-indexed-search-replica-count) - -Here is a convenience script that performs all three steps: - -```bash -# This script requires https://github.com/sourcegraph/jy and https://github.com/sourcegraph/yj - -GS=base/gitserver/gitserver.StatefulSet.yaml - -REPLICA_COUNT=2 # number of gitserver replicas - -# Update gitserver replica count -cat $GS | yj | jq ".spec.replicas = $REPLICA_COUNT" | jy -o $GS - -# Compute all gitserver names -GITSERVERS=$(for i in `seq 0 $(($REPLICA_COUNT-1))`; do echo -n "gitserver-$i.gitserver:3178 "; done) - -# Update SRC_GIT_SERVERS environment variable in other services -find . -name "*yaml" -exec sed -i.sedibak -e "s/value: gitserver-0.gitserver:3178.*/value: $GITSERVERS/g" {} + - -IDX_SEARCH=base/indexed-search/indexed-search.StatefulSet.yaml - -# Update indexed-search replica count -cat $IDX_SEARCH | yj | jq ".spec.replicas = $REPLICA_COUNT" | jy -o $IDX_SEARCH - -# Delete sed's backup files -find . -name "*.sedibak" -delete -``` - -Commit the outstanding changes. - -## Configure indexed-search replica count - -Increasing the number of `indexed-search` replicas can improve performance and reliability when your instance contains a large number of repositories. Repository indexes are distributed evenly across all `indexed-search` replicas. - -By default `indexed-search` relies on kubernetes service discovery, so adjusting the number of replicas just requires updating the `replicas` field in [indexed-search.StatefulSet.yaml](../base/indexed-search/indexed-search.StatefulSet.yaml). - -Not Recommended: To use a static list of indexed-search servers you can configure `INDEXED_SEARCH_SERVERS` on `sourcegraph-frontend`. It uses the same format as `SRC_GIT_SERVERS` above. Adjusting replica counts will require the same steps as gitserver. - -## Assign resource-hungry pods to larger nodes - -If you have a heterogeneous cluster where you need to ensure certain more resource-hungry pods are assigned to more powerful nodes (e.g. `indexedSearch`), you can [specify node constraints](https://kubernetes.io/docs/concepts/configuration/assign-pod-node) (such as `nodeSelector`, etc.). - -This is useful if, for example, you have a very large monorepo that performs best when `gitserver` -and `searcher` are on very large nodes, but you want to use smaller nodes for -`sourcegraph-frontend`, `repo-updater`, etc. Node constraints can also be useful to ensure fast -updates by ensuring certain pods are assigned to specific nodes, preventing the need for manual pod -shuffling. - -See [the official documentation](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/) for instructions about applying node constraints. - -## Configure a storage class - -Sourcegraph expects there to be storage class named `sourcegraph` that it uses for all its persistent volume claims. This storage class must be configured before applying the base configuration to your cluster. - -Create `base/sourcegraph.StorageClass.yaml` with the appropriate configuration for your cloud provider and commit the file to your fork. 
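Whichever provider section below you follow, it is worth confirming the class is registered before applying the rest of the base configuration. A minimal sanity check (a sketch; the path is the one named above):

```bash
# Create the storage class and verify Kubernetes has registered it.
kubectl apply -f base/sourcegraph.StorageClass.yaml
kubectl get storageclass sourcegraph
```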
-
-### Google Cloud Platform (GCP)
-
-```yaml
-# base/sourcegraph.StorageClass.yaml
-kind: StorageClass
-apiVersion: storage.k8s.io/v1
-metadata:
-  name: sourcegraph
-  labels:
-    deploy: sourcegraph
-provisioner: kubernetes.io/gce-pd
-parameters:
-  type: pd-ssd # This configures SSDs (recommended).
-```
-
-[Additional documentation](https://kubernetes.io/docs/concepts/storage/storage-classes/#gce-pd).
-
-### Amazon Web Services (AWS)
-
-```yaml
-# base/sourcegraph.StorageClass.yaml
-kind: StorageClass
-apiVersion: storage.k8s.io/v1
-metadata:
-  name: sourcegraph
-  labels:
-    deploy: sourcegraph
-provisioner: kubernetes.io/aws-ebs
-parameters:
-  type: gp2 # This configures SSDs (recommended).
-```
-
-[Additional documentation](https://kubernetes.io/docs/concepts/storage/storage-classes/#aws-ebs).
-
-### Azure
-
-```yaml
-# base/sourcegraph.StorageClass.yaml
-kind: StorageClass
-apiVersion: storage.k8s.io/v1
-metadata:
-  name: sourcegraph
-  labels:
-    deploy: sourcegraph
-provisioner: kubernetes.io/azure-disk
-parameters:
-  storageaccounttype: Premium_LRS # This configures SSDs (recommended). A Premium VM is required.
-```
-
-[Additional documentation](https://kubernetes.io/docs/concepts/storage/storage-classes/#azure-disk).
-
-### Other cloud providers
-
-```yaml
-# base/sourcegraph.StorageClass.yaml
-kind: StorageClass
-apiVersion: storage.k8s.io/v1
-metadata:
-  name: sourcegraph
-  labels:
-    deploy: sourcegraph
-# Read https://kubernetes.io/docs/concepts/storage/storage-classes/ to configure the "provisioner" and "parameters" fields for your cloud provider.
-# SSDs are highly recommended!
-# provisioner:
-# parameters:
-```
-
-### Using a storage class with an alternate name
-
-If you wish to use a different storage class for Sourcegraph, then you need to update all persistent volume claims with the name of the desired storage class. Convenience script:
-
-```bash
-#!/bin/bash
-
-# This script requires https://github.com/sourcegraph/jy and https://github.com/sourcegraph/yj
-STORAGE_CLASS_NAME=
-
-find . -name "*PersistentVolumeClaim.yaml" -exec sh -c "cat {} | yj | jq '.spec.storageClassName = \"$STORAGE_CLASS_NAME\"' | jy -o {}" \;
-
-GS=base/gitserver/gitserver.StatefulSet.yaml
-
-cat $GS | yj | jq --arg STORAGE_CLASS_NAME $STORAGE_CLASS_NAME '.spec.volumeClaimTemplates = (.spec.volumeClaimTemplates | map( . * {spec:{storageClassName: $STORAGE_CLASS_NAME }}))' | jy -o $GS
-```
-
-## Configure Lightstep tracing
-
-Lightstep is a closed-source distributed tracing and performance monitoring tool created by some of the authors of Dapper. Every Sourcegraph deployment supports Lightstep, and it can be configured via the following environment variables (with example values):
-
-```yaml
-env:
-  # https://about.sourcegraph.com/docs/config/site/#lightstepproject-string
-  - name: LIGHTSTEP_PROJECT
-    value: my_project
-
-  # https://about.sourcegraph.com/docs/config/site/#lightstepaccesstoken-string
-  - name: LIGHTSTEP_ACCESS_TOKEN
-    value: abcdefg
-
-  # If false, any logs (https://github.com/opentracing/specification/blob/master/specification.md#log-structured-data)
-  # from spans will be omitted from the spans sent to Lightstep.
-  - name: LIGHTSTEP_INCLUDE_SENSITIVE
-    value: "true"
-```
-
-To enable this, you must first purchase Lightstep and create a project corresponding to the Sourcegraph instance. Then, add the above environment variables to each deployment.
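As a convenience, the same `yj`/`jq` pattern used by the other scripts in this guide can add these variables in bulk. This is only a sketch: the deployment list and the values are illustrative placeholders, so adjust both to the services you actually want traced:

```bash
# This script requires https://github.com/sourcegraph/jy and https://github.com/sourcegraph/yj
# Hypothetical example: add Lightstep env vars to the frontend deployment.
for D in base/frontend/sourcegraph-frontend.Deployment.yaml; do
  cat $D | yj | jq '.spec.template.spec.containers[].env += [
    {name: "LIGHTSTEP_PROJECT", value: "my_project"},
    {name: "LIGHTSTEP_ACCESS_TOKEN", value: "abcdefg"}
  ]' | jy -o $D
done
```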
-
-## Configure custom Redis
-
-Sourcegraph supports specifying a custom Redis server for:
-
-- caching information (specified via the `REDIS_CACHE_ENDPOINT` environment variable)
-- storing information (session data and job queues) (specified via the `REDIS_STORE_ENDPOINT` environment variable)
-
-If you want to specify a custom Redis server, you'll need to specify the corresponding environment variable for each of the following deployments:
-
-- `sourcegraph-frontend`
-- `repo-updater`
-
-## Configure custom PostgreSQL
-
-You can use your own PostgreSQL v9.6+ server with Sourcegraph if you wish. For example, you may prefer this if you already have existing backup infrastructure around your own PostgreSQL server, wish to use Amazon RDS, etc.
-
-Simply edit the relevant PostgreSQL environment variables (e.g. PGHOST, PGPORT, PGUSER, [etc.](http://www.postgresql.org/docs/current/static/libpq-envars.html)) in [base/frontend/sourcegraph-frontend.Deployment.yaml](../base/frontend/sourcegraph-frontend.Deployment.yaml) to point to your existing PostgreSQL instance.
-
-## Install without RBAC
-
-Sourcegraph communicates with the Kubernetes API for service discovery. It also has some janitor DaemonSets that clean up temporary cache data. To do that we need to create RBAC resources.
-
-If using RBAC is not an option, then you will not want to apply the `*.Role.yaml` and `*.RoleBinding.yaml` files.
-
-## Add license key
-
-Sourcegraph's Kubernetes deployment [requires an Enterprise license key](https://about.sourcegraph.com/pricing).
-
-1. Create an account on or sign in to sourcegraph.com, and go to https://sourcegraph.com/subscriptions/new to obtain a license key.
-
-1. Once you have a license key, add it to your [site configuration](https://docs.sourcegraph.com/admin/config/site_config).
-
-## Use non-default namespace
-
-If you're deploying Sourcegraph into a non-default namespace, refer to
-[base/prometheus/README.md#Namespaces](../base/prometheus/README.md#Namespaces) and
-[base/grafana/README.md#Namespaces](../base/grafana/README.md#Namespaces) for further configuration instructions.
-
-## Pulling images locally
-
-In some cases, a site admin may want to pull all Docker images used in the cluster locally. For
-example, if your organization requires use of a private registry, you may need to do this as an
-intermediate step to mirroring them on the private registry. The following script accomplishes this
-for all images under `base/`:
-
-```bash
-for IMAGE in $(grep --include '*.yaml' -FR 'image:' base | awk '{ print $(NF) }'); do docker pull "$IMAGE"; done;
-```
-
-### Overlay basic principles
-
-An overlay specifies customizations for a base directory of Kubernetes manifests. The base has no knowledge of the overlay.
-Overlays can be used, for example, to change the number of replicas, change a namespace, or add a label. Overlays can refer to
-other overlays that eventually refer to the base, forming a directed acyclic graph with the base as the root.
-
-An overlay is defined in a `kustomization.yaml` file (the name of the file is fixed and there can be only one kustomization
-file in one directory). To avoid complications with reference cycles, an overlay can only reference resources inside the
-directory subtree of the directory it resides in (symlinks are not allowed either).
-
-For more details about overlays, please consult the `kustomize` [documentation](https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/).
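For a concrete feel for the mechanics, a minimal overlay can be a single `kustomization.yaml` that points at the base and sets a namespace. This is an illustrative sketch only (the path is hypothetical; the supported overlays shipped with this repository are described below):

```yaml
# overlays/example/kustomization.yaml (hypothetical path)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: sourcegraph
bases:
  - ../../base
```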
-
-Using overlays and applying them can be done in two ways: by using `kubectl` or with the `kustomize` tool.
-
-Starting with client version 1.14, `kubectl` can handle `kustomization.yaml` files directly.
-When using `kubectl` there is no intermediate step that generates actual manifest files. Instead, the combined resources from the
-overlays and the base are sent directly to the cluster. This is done with the `kubectl apply -k` command. The argument to the
-command is a directory containing a `kustomization.yaml` file.
-
-The second way to use overlays is with the `kustomize` tool. This does generate manifest files that are then applied
-in the conventional way using `kubectl apply -f`.
-
-### Handling overlays in this repository
-
-The overlays provided in this repository rely on the `kustomize` tool and the `overlay-generate-cluster.sh` script in the
-root directory of this repository to generate the manifests. There are two reasons why it was set up like this:
-
-- It avoids having to put a `kustomization.yaml` file in the `base` directory and forcing users that don't use overlays
-to deal with it (unfortunately `kubectl apply -f` doesn't work if a `kustomization.yaml` file is in the directory).
-- It generates manifests instead of applying them directly. This provides an opportunity to additionally validate the files,
-and also allows using `kubectl apply -f` with the `--prune` flag turned on (`apply -k` with `--prune` does not work correctly).
-
-To generate the manifests, run `overlay-generate-cluster.sh` with two arguments: the name of the overlay and a
-path to an output directory where the generated manifests will be placed. Example (assuming you are in the root directory of this
-repository):
-
-```shell script
-./overlay-generate-cluster.sh non-root generated-cluster
-```
-
-After executing the script you can apply the generated manifests from the `generated-cluster` directory:
-
-```shell script
-kubectl apply --prune -l deploy=sourcegraph -f generated-cluster --recursive
-```
-
-Available overlays are the subdirectories of `overlays` (only give the name of the subdirectory, not the full path, as an argument).
-
-### Namespaced overlay
-
-This overlay adds a namespace declaration to all the manifests. You can change the namespace by editing
-`overlays/namespaced/kustomization.yaml`.
-
-To use it, execute this from the root directory of this repository:
-
-```shell script
-./overlay-generate-cluster.sh namespaced generated-cluster
-```
-
-### Non-root overlay
-
-The manifests in the `base` directory specify user `root` for all containers. This overlay changes the specification to be
-a non-root user.
-
-If you are starting a fresh installation, use the overlay `non-root-create-cluster`. After creation you can use the overlay
-`non-root`.
-
-If you are already running a Sourcegraph instance using user `root` and want to convert to running with a non-root user,
-you need to apply a migration step that will change the permissions of all persistent volumes so that the volumes can be
-used by the non-root user. This migration is provided as the overlay `migrate-to-nonroot`. After the migration you can use the
-overlay `non-root`.
-
-### Non-privileged overlay
-
-This overlay goes one step further than the `non-root` overlay by also removing cluster roles and cluster role bindings.
-
-If you are starting a fresh installation, use the overlay `non-privileged-create-cluster`. After creation you can use the overlay
- - - - +Moved to https://docs.sourcegraph.com/admin/install/kubernetes/configure diff --git a/docs/grafana/README.md b/docs/grafana/README.md deleted file mode 100644 index 6f2041a2e25c..000000000000 --- a/docs/grafana/README.md +++ /dev/null @@ -1,5 +0,0 @@ -# Recommended Grafana dashboards - -* `frontend.json`: metrics for frontend/API server -* `gitserver.json`: metrics for gitserver service -* `resources.json`: CPU and memory usage across all services diff --git a/docs/grafana/frontend.json b/docs/grafana/frontend.json deleted file mode 100644 index 1a355b260882..000000000000 --- a/docs/grafana/frontend.json +++ /dev/null @@ -1,632 +0,0 @@ -{ - "__inputs": [ - { - "name": "DS_PRODUCTION", - "label": "production", - "description": "", - "type": "datasource", - "pluginId": "prometheus", - "pluginName": "Prometheus" - } - ], - "__requires": [ - { - "type": "panel", - "id": "graph", - "name": "Graph", - "version": "" - }, - { - "type": "grafana", - "id": "grafana", - "name": "Grafana", - "version": "4.0.0" - }, - { - "type": "datasource", - "id": "prometheus", - "name": "Prometheus", - "version": "1.0.0" - } - ], - "id": null, - "title": "dc:frontend", - "tags": [], - "style": "dark", - "timezone": "browser", - "editable": true, - "sharedCrosshair": false, - "hideControls": false, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "templating": { - "list": [] - }, - "annotations": { - "list": [] - }, - "schemaVersion": 13, - "version": 34, - "links": [], - "gnetId": null, - "rows": [ - { - "title": "Dashboard Row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 1, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(label_replace(job:src_http_request_count:rate5m, \"c\", \"${1}00s\", \"code\", \"([0-9]).*\")) by (c)", - "intervalFactor": 2, - "legendFormat": "{{c}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "frontend QPS, HTTP code [rate-5m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 2, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - 
"seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(route:src_http_request_count:rate5m{route=~\"blob|graphql|home|page.def.landing|page.repo.landing|repo|repo-branches|search|settings|sign-in|site-admin|tree\"}) by (route)", - "intervalFactor": 2, - "legendFormat": "{{route}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "frontend QPS, route [rate-5m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": false, - "titleSize": "h6", - "height": "250px", - "repeat": null, - "repeatRowId": null, - "repeatIteration": null, - "collapse": false - }, - { - "title": "Dashboard Row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 3, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "histogram_quantile(0.90, route:src_http_request_duration_seconds_bucket:rate5m{route=~\"blob|graphql|home|page.def.landing|page.repo.landing|repo|repo-branches|search|settings|sign-in|site-admin|tree\"})", - "intervalFactor": 2, - "legendFormat": "{{route}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "frontend p90 req duration [rate-5m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "s", - "logBase": 1, - "max": null, - "min": 0, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 4, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "histogram_quantile(0.75, route:src_http_request_duration_seconds_bucket:rate5m{route=~\"blob|graphql|home|page.def.landing|page.repo.landing|repo|repo-branches|search|settings|sign-in|site-admin|tree\"})", - "intervalFactor": 2, - "legendFormat": "{{route}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "frontend p75 req duration [rate-5m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": 
"cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "s", - "label": "", - "logBase": 1, - "max": null, - "min": 0, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": false, - "titleSize": "h6", - "height": 250, - "repeat": null, - "repeatRowId": null, - "repeatIteration": null, - "collapse": false - }, - { - "title": "Dashboard Row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "id": 5, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 4, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "label_replace( irate(process_cpu_seconds_total{app=\"sourcegraph-frontend\"}[10m]), \"inst\", \"$1\", \"instance\", \"[a-z0-9\\\\-]+\\\\-([a-z0-9]+)\" ) * 100", - "intervalFactor": 2, - "legendFormat": "{{ inst }}", - "refId": "A", - "step": 120 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "CPU (%)", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "id": 6, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 4, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "label_replace( process_resident_memory_bytes{app=\"sourcegraph-frontend\"}, \"inst\", \"$1\", \"instance\", \"[a-z0-9\\\\-]+\\\\-([a-z0-9]+)\" ) / 1024 / 1024 / 1024", - "intervalFactor": 2, - "legendFormat": "{{ inst }}", - "refId": "A", - "step": 120 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Memory, RSS (GB)", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "id": 7, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, - "links": [], - 
"nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 4, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "label_replace( process_virtual_memory_bytes{app=\"sourcegraph-frontend\"}, \"inst\", \"$1\", \"instance\", \"[a-z0-9\\\\-]+\\\\-([a-z0-9]+)\" ) / 1024 / 1024 / 1024", - "intervalFactor": 2, - "legendFormat": "{{ inst }}", - "refId": "A", - "step": 120 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Memory, VSZ (GB)", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": false, - "titleSize": "h6", - "height": 250, - "repeat": null, - "repeatRowId": null, - "repeatIteration": null, - "collapse": false - } - ] -} \ No newline at end of file diff --git a/docs/grafana/gitserver.json b/docs/grafana/gitserver.json deleted file mode 100644 index d5aae752f3ed..000000000000 --- a/docs/grafana/gitserver.json +++ /dev/null @@ -1,648 +0,0 @@ -{ - "__inputs": [ - { - "name": "DS_PRODUCTION", - "label": "production", - "description": "", - "type": "datasource", - "pluginId": "prometheus", - "pluginName": "Prometheus" - } - ], - "__requires": [ - { - "type": "panel", - "id": "graph", - "name": "Graph", - "version": "" - }, - { - "type": "grafana", - "id": "grafana", - "name": "Grafana", - "version": "4.0.0" - }, - { - "type": "datasource", - "id": "prometheus", - "name": "Prometheus", - "version": "1.0.0" - } - ], - "id": null, - "title": "dc:gitserver", - "tags": [ - "overview" - ], - "style": "dark", - "timezone": "utc", - "editable": true, - "sharedCrosshair": false, - "hideControls": false, - "time": { - "from": "now-6h", - "to": "now" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "templating": { - "list": [] - }, - "annotations": { - "list": [] - }, - "refresh": false, - "schemaVersion": 13, - "version": 15, - "links": [], - "gnetId": null, - "rows": [ - { - "title": "Row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 12, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(src_gitserver_exec_duration_seconds_count{status=~\"[0-9]+\"}[10m])) by (job)", - "intervalFactor": 2, - "legendFormat": "{{job}}:{{cmd}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Total QPS [rate-10m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - 
"xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 3, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum(rate(src_gitserver_exec_duration_seconds_count{status=~\"[0-9]+\"}[10m])) by (cmd, job)", - "intervalFactor": 2, - "legendFormat": "{{job}}:{{cmd}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "QPS by operation [rate-10m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": false, - "titleSize": "h6", - "height": "250px", - "repeat": null, - "repeatRowId": null, - "repeatIteration": null, - "collapse": false - }, - { - "title": "New row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 5, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "histogram_quantile(0.9, sum(rate(src_gitserver_exec_duration_seconds_bucket{status=~\"[0-9]+\"}[10m])) by (le, cmd, job))", - "intervalFactor": 2, - "legendFormat": "{{job}}:{{cmd}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "p90 req duration [rate-10m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "s", - "logBase": 1, - "max": null, - "min": 0, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 13, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, 
- "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "histogram_quantile(0.75, sum(rate(src_gitserver_exec_duration_seconds_bucket{status=~\"[0-9]+\"}[10m])) by (le, cmd, job))", - "intervalFactor": 2, - "legendFormat": "{{job}}:{{cmd}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "p75 req duration [rate-10m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "s", - "logBase": 1, - "max": null, - "min": 0, - "show": true - }, - { - "format": "short", - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": false, - "titleSize": "h6", - "height": 283, - "repeat": null, - "repeatRowId": null, - "repeatIteration": null, - "collapse": false - }, - { - "title": "New row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "id": 10, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by (job)(src_gitserver_exec_running)", - "hide": false, - "intervalFactor": 2, - "legendFormat": "{{ job }}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "git execs running", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "id": 11, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "sum by (status, cmd)(rate(src_gitserver_exec_duration_seconds_count{status!=\"\"}[10m]))", - "intervalFactor": 2, - "legendFormat": "{{cmd}} {{status}}", - "refId": "A", - "step": 60 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Exit Status QPS [10m]", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": 
false, - "titleSize": "h6", - "height": "250px", - "repeat": null, - "repeatRowId": null, - "repeatIteration": null, - "collapse": false - }, - { - "title": "New row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "grid": {}, - "id": 9, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 2, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 12, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "min(src_gitserver_disk_space_available) BY (job) / 1000 / 1000 / 1000", - "intervalFactor": 2, - "legendFormat": "{{job}}", - "refId": "A", - "step": 40 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "Free Disk Space", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "cumulative" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "gbytes", - "label": "", - "logBase": 1, - "max": null, - "min": 0, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": false, - "titleSize": "h6", - "height": 287, - "repeat": null, - "repeatRowId": null, - "repeatIteration": null, - "collapse": false - } - ] -} \ No newline at end of file diff --git a/docs/grafana/resources.json b/docs/grafana/resources.json deleted file mode 100644 index 2b7c5a452157..000000000000 --- a/docs/grafana/resources.json +++ /dev/null @@ -1,328 +0,0 @@ -{ - "__inputs": [ - { - "name": "DS_PRODUCTION", - "label": "production", - "description": "", - "type": "datasource", - "pluginId": "prometheus", - "pluginName": "Prometheus" - } - ], - "__requires": [ - { - "type": "panel", - "id": "graph", - "name": "Graph", - "version": "" - }, - { - "type": "grafana", - "id": "grafana", - "name": "Grafana", - "version": "4.0.0" - }, - { - "type": "datasource", - "id": "prometheus", - "name": "Prometheus", - "version": "1.0.0" - } - ], - "id": null, - "title": "dc:resources", - "tags": [], - "style": "dark", - "timezone": "browser", - "editable": true, - "sharedCrosshair": false, - "hideControls": false, - "time": { - "from": "2018-04-04T19:22:55.424Z", - "to": "2018-04-05T19:22:55.424Z" - }, - "timepicker": { - "refresh_intervals": [ - "5s", - "10s", - "30s", - "1m", - "5m", - "15m", - "30m", - "1h", - "2h", - "1d" - ], - "time_options": [ - "5m", - "15m", - "1h", - "6h", - "12h", - "24h", - "2d", - "7d", - "30d" - ] - }, - "templating": { - "list": [] - }, - "annotations": { - "list": [] - }, - "refresh": false, - "schemaVersion": 13, - "version": 14, - "links": [], - "gnetId": null, - "rows": [ - { - "title": "Dashboard Row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "id": 1, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 12, - "stack": false, - "steppedLine": 
false, - "targets": [ - { - "expr": "(max by (job)(irate(process_cpu_seconds_total[5m]))) * 100", - "intervalFactor": 2, - "legendFormat": "{{job}}", - "refId": "A", - "step": 120 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "CPU, max per job (%)", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": false, - "titleSize": "h6", - "height": 266, - "repeat": null, - "repeatRowId": null, - "repeatIteration": null, - "collapse": false - }, - { - "title": "Dashboard Row", - "panels": [ - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "id": 3, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "(max by (job)(process_resident_memory_bytes)) / 1024 / 1024 / 1024", - "intervalFactor": 2, - "legendFormat": "{{job}}", - "refId": "A", - "step": 240 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "RSS Memory, max per job (GB)", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - }, - { - "aliasColors": {}, - "bars": false, - "datasource": "${DS_PRODUCTION}", - "editable": true, - "error": false, - "fill": 1, - "id": 4, - "legend": { - "avg": false, - "current": false, - "max": false, - "min": false, - "show": true, - "total": false, - "values": false - }, - "lines": true, - "linewidth": 1, - "links": [], - "nullPointMode": "connected", - "percentage": false, - "pointradius": 5, - "points": false, - "renderer": "flot", - "seriesOverrides": [], - "span": 6, - "stack": false, - "steppedLine": false, - "targets": [ - { - "expr": "max by (job)(process_virtual_memory_bytes) / 1024 / 1024 / 1024", - "intervalFactor": 2, - "legendFormat": "{{job}}", - "refId": "A", - "step": 240 - } - ], - "thresholds": [], - "timeFrom": null, - "timeShift": null, - "title": "VSZ Memory, max per job (GB)", - "tooltip": { - "msResolution": false, - "shared": true, - "sort": 0, - "value_type": "individual" - }, - "type": "graph", - "xaxis": { - "mode": "time", - "name": null, - "show": true, - "values": [] - }, - "yaxes": [ - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - }, - { - "format": "short", - "label": null, - "logBase": 1, - "max": null, - "min": null, - "show": true - } - ] - } - ], - "showTitle": false, - "titleSize": "h6", - "height": 228, - "repeat": null, - "repeatRowId": 
null,
-      "repeatIteration": null,
-      "collapse": false
-    }
-  ]
-}
\ No newline at end of file
diff --git a/docs/helm.migrate.md b/docs/helm.migrate.md
deleted file mode 100644
index 817d3f1ddc97..000000000000
--- a/docs/helm.migrate.md
+++ /dev/null
@@ -1,75 +0,0 @@
-# Migrating from the legacy Sourcegraph Helm chart (2.10.x and prior)
-
-Two things have changed in 2.11.x that require migration:
-
-- Gitserver is now configured using [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/).
-- We have [a new deployment strategy](#why-is-there-a-new-deployment-strategy).
-
-## Migrating
-
-These steps will uninstall Sourcegraph from your cluster while preserving your data. Then you will be able to deploy Sourcegraph using the new process. If you would like help with this process, please reach out to support@sourcegraph.com.
-
-**Please read through all instructions first before starting the migration so you know what is involved.**
-
-1. Make a backup of the yaml deployed to your cluster.
-
-   ```bash
-   kubectl get all --export -o yaml > backup.yaml
-   ```
-
-2. Set the reclaim policy for your existing persistent volumes to `Retain`.
-
-   ```bash
-   kubectl get pv -o json | jq --raw-output ".items | map(select(.spec.claimRef.name)) | .[] | \"kubectl patch pv -p '{\\\"spec\\\":{\\\"persistentVolumeReclaimPolicy\\\":\\\"Retain\\\"}}' \\(.metadata.name)\"" | bash
-   ```
-
-3. (**Downtime starts here**) Delete the `sourcegraph` release from your cluster.
-
-   ```bash
-   helm del --purge sourcegraph
-   ```
-
-4. Remove `tiller` from your cluster.
-
-   ```bash
-   helm reset
-   ```
-
-5. Update the old persistent volumes so they can be reused by the new deployment.
-
-   ```bash
-   # mark all persistent volumes as claimable by the new deployments
-   kubectl get pv -o json | jq --raw-output ".items | map(select(.spec.claimRef.name)) | .[] | \"kubectl patch pv -p '{\\\"spec\\\":{\\\"claimRef\\\":{\\\"uid\\\":null}}}' \\(.metadata.name)\"" | bash
-
-   # rename the `gitserver` persistent volumes so that the new `gitserver` stateful set can re-use them
-   kubectl get pv -o json | jq --raw-output ".items | map(select(.spec.claimRef.name | contains(\"gitserver-\"))) | .[] | \"kubectl patch pv -p '{\\\"spec\\\":{\\\"claimRef\\\":{\\\"name\\\":\\\"repos-gitserver-\\(.spec.claimRef.name | ltrimstr(\"gitserver-\") | tonumber - 1)\\\"}}}' \\(.metadata.name)\"" | bash
-   ```
-
-6. Proceed with the normal [installation steps](install.md). 🚨 When following the instructions for [configuring a storage class](configure.md#configure-a-storage-class), you need to make sure that the newly configured storage class has the same configuration as the one that you were using in the legacy helm deployment. Steps:
-
-   1. When creating the new storage class, use the same `cluster.storageClass.name` and `cluster.storageClass.zone` fields that were in your old [values.yaml](https://github.com/sourcegraph/deploy-sourcegraph/blob/helm-legacy/values.yaml).
-
-   1. Use the convenience script in ["Using a storage class with an alternate name"](configure.md#using-a-storage-class-with-an-alternate-name) to update all the `storageClassName` references in the PVCs to refer to the old `cluster.storageClass.name` field.
-
-7. The previous step produces a fresh base state, so you will need to reconfigure your cluster by following the relevant steps in [configure.md](configure.md) (e.g. exposing ports, applying your site config, enabling other services like language servers, Prometheus, Alertmanager, Jaeger, etc.).
-
-   **Downtime ends once installation and configuration is complete.**
-
-## Why is there a new deployment strategy?
-
-2.10.x and prior were deployed by configuring `values.yaml` and using `helm` to generate the final yaml to deploy to a cluster.
-
-There were a few downsides with this approach:
-
-- `values.yaml` was a custom configuration format defined by us which implicitly made configuring certain Kubernetes settings special cases. We didn't want this to grow over time into an unmaintainable/unusable mess.
-- If customers wanted to configure things not supported in `values.yaml`, then we would either need to add support or the customer would need to make further modifications to the generated yaml.
-- Writing Go templates inside of yaml was error-prone and hard to maintain. It was too easy to make a silly mistake and generate invalid yaml. Our editors could not help us because Go template logic made the yaml templates not valid yaml.
-- It required using `helm` to generate templates even though some customers don't care to use `helm` to deploy the yaml.
-
-Our new approach is simpler and more flexible.
-
-- We have removed our dependency on `helm`. It is no longer needed to generate templates, and we no longer recommend it as the easiest way to deploy our yaml to a cluster. You are still free to use `helm` to deploy to your cluster if you wish.
-- Our base config is pure yaml which can be deployed directly to a cluster. It is easier for you to use, and also easier for us to maintain.
-- You can configure our base yaml using whatever process works best for you (GitOps, [Kustomize](https://github.com/kubernetes-sigs/kustomize), custom scripts, etc.). We provide [documentation and recipes for common customizations](configure.md).
diff --git a/docs/images/prometheus.png b/docs/images/prometheus.png
deleted file mode 100644
index 609794551a9d..000000000000
Binary files a/docs/images/prometheus.png and /dev/null differ
diff --git a/docs/install.md b/docs/install.md
index 529253ddce29..54c4c6fdb503 100644
--- a/docs/install.md
+++ b/docs/install.md
@@ -1,75 +1 @@
-# Installing Sourcegraph
-
-> **Note:** Sourcegraph sends performance and usage data to Sourcegraph to help us make our product
-> better for you. The data sent does NOT include any source code or file data (including URLs that
-> might implicitly contain this information). You can view traces and disable telemetry in the site
-> admin area on the server.
-
-## Requirements
-
-- [Kubernetes](https://kubernetes.io/) v1.9 or later with an SSD storage class
-  - [Cluster role administrator access](https://kubernetes.io/docs/reference/access-authn-authz/rbac/)
-- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) v1.9.7 or later
-- Access to server infrastructure on which you can create a Kubernetes cluster (see
-  [resource allocation guidelines](scale.md)).
-- [Sourcegraph Enterprise license](./configure.md#add-license-key). You can run through these instructions without one, but you must obtain a license for instances of more than 10 users.
-- A valid domain name for your Sourcegraph instance ([to enable SSL/TLS](https://github.com/sourcegraph/deploy-sourcegraph/blob/master/docs/configure.md#configure-tlsssl)) -- A valid TLS certificate (whether from a trusted certificate authority such as Comodo, RapidSSL, or others, a self-signed certificate that can be distributed and installed across all users' machines, or the ability to use an existing reverse proxy that provides SSL termination for the connection) -- Access tokens or other credentials to [connect to your code hosts of choice](https://docs.sourcegraph.com/admin/external_service) -- [Administrative access to your single sign-on (SSO) provider of choice](https://docs.sourcegraph.com/admin/auth) - -## Steps - -1. [Provision a Kubernetes cluster](k8s.md) on the infrastructure of your choice. -1. Make sure you have configured `kubectl` to [access your cluster](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/). - - - If you are using GCP, you'll need to give your user the ability to create roles in Kubernetes [(see GCP's documentation)](https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control#prerequisites_for_using_role-based_access_control): - - ```bash - kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account) - ``` - -1. Clone this repository and check out the version tag you wish to deploy. - - ```bash - # Go to https://github.com/sourcegraph/deploy-sourcegraph/tags and select the latest version tag - git clone https://github.com/sourcegraph/deploy-sourcegraph && cd deploy-sourcegraph && git checkout ${VERSION} - ``` - -1. Configure the `sourcegraph` storage class for the cluster by reading through ["Configure a storage class"](./configure.md#configure-a-storage-class). - -1. If you want to add a large number of repositories to your instance, you should [configure the number of gitserver replicas](configure.md#configure-gitserver-replica-count) and [the number of indexed-search replicas](configure.md#configure-indexed-search-replica-count) _before_ you continue with the next step. (See ["Tuning replica counts for horizontal scalability"](scale.md#tuning-replica-counts-for-horizontal-scalability) for guidelines.) - -1. Deploy the desired version of Sourcegraph to your cluster: - - ```bash - ./kubectl-apply-all.sh - ``` - -1. Monitor the status of the deployment. - - ```bash - watch kubectl get pods -o wide - ``` - -1. Once the deployment completes, verify Sourcegraph is running by temporarily making the frontend port accessible: - - kubectl 1.9.x: - - ```bash - kubectl port-forward $(kubectl get pod -l app=sourcegraph-frontend -o template --template="{{(index .items 0).metadata.name}}") 3080 - ``` - - kubectl 1.10.0 or later: - - ``` - kubectl port-forward svc/sourcegraph-frontend 3080:30080 - ``` - - Open http://localhost:3080 in your browser and you will see a setup page. Congrats, you have Sourcegraph up and running! - -1. Now [configure your deployment](configure.md). - -### Troubleshooting - -See the [Troubleshooting docs](troubleshoot.md). 
+Moved to https://docs.sourcegraph.com/admin/install/kubernetes diff --git a/docs/k8s.azure.md b/docs/k8s.azure.md index 847fbf3610dd..663a28f4baae 100644 --- a/docs/k8s.azure.md +++ b/docs/k8s.azure.md @@ -1,64 +1 @@ -# Kubernetes on Azure - -Install the [Azure CLI tool](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) and log in: - -``` -az login -``` - -Sourcegraph on Kubernetes requires at least **16 cores** in the **DSv3** family in the Azure location of your choice (e.g. `eastus`), so make sure you have enough available (if not, [request a quota increase](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request)): - -``` -$ az vm list-usage -l eastus -o table -Name CurrentValue Limit --------------------------------- -------------- ------- -... -Standard DSv3 Family vCPUs 0 32 -... -``` - -Ensure that these Azure service providers are enabled: - -``` -az provider register -n Microsoft.Network -az provider register -n Microsoft.Storage -az provider register -n Microsoft.Compute -az provider register -n Microsoft.ContainerService -``` - -Create a resource group: - -``` -az group create --name sourcegraphResourceGroup --location eastus -``` - -Create a cluster: - -``` -az aks create --resource-group sourcegraphResourceGroup --name sourcegraphCluster --node-count 1 --generate-ssh-keys --node-vm-size Standard_D16s_v3 -``` - -Connect to the cluster for future `kubectl` commands: - -``` -az aks get-credentials --resource-group sourcegraphResourceGroup --name sourcegraphCluster -``` - -Follow the [Sourcegraph cluster installation instructions](./#install-sourcegraph-data-center-on-your-cluster) with `storageClass` set to `managed-premium` in `config.json`: - -```diff -- "storageClass": "default" -+ "storageClass": "managed-premium" -``` - -You can see if the pods are ready and check for installation problems through the Kubernetes dashboard: - -``` -az aks browse --resource-group sourcegraphResourceGroup --name sourcegraphCluster -``` - -Set up a load balancer to make the main web server accessible over the network to external users: - -``` -kubectl expose deployment sourcegraph-frontend --type=LoadBalancer --name=sourcegraphloadbalancer --port=80 --target-port=3080 -``` +Moved to https://docs.sourcegraph.com/admin/install/kubernetes/azure diff --git a/docs/k8s.eks.md b/docs/k8s.eks.md index a47511ef2145..f9c7f3ba77ee 100644 --- a/docs/k8s.eks.md +++ b/docs/k8s.eks.md @@ -1,130 +1 @@ -# Kubernetes on Amazon EKS - -[Amazon EKS](https://aws.amazon.com/eks/) is Amazon's managed Kubernetes offering, similar to how Google Cloud offers managed Kubernetes clusters (GKE). - -If your preferred cloud provider is Amazon, we strongly recommend using EKS instead of plain EC2. By using EKS, you will not need to manage your own Kubernetes control plane (complex). Instead, Amazon will provide it for you and you will only be responsible for managing Sourcegraph, which runs on the Kubernetes cluster. - -## Create the Amazon EKS Service Role - -Follow the [EKS Getting Started guide](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html#eks-prereqs) to create the IAM EKS service role: - -1. Open the [**IAM console**](https://console.aws.amazon.com/iam/). -2. Click **Roles** -> **Create role**. -3. Choose **EKS**, accept the defaults and **Next: Permissions**. -4. Click **Next: Review**. -5. Under **Role name**, enter `eksServiceRoleSourcegraph`, then **Create role**. - -## Create the Amazon EKS Cluster VPC - -1. 
Open the [**AWS CloudFormation console**](https://console.aws.amazon.com/cloudformation/). -2. Ensure the region in the top right navigation bar is an EKS-supported region (see [this list](https://docs.aws.amazon.com/general/latest/gr/eks.html)). -3. Click **Create stack**, and select **"with new resources"**. -4. When prompted to specify a template, select "Amazon S3 URL" as your **Template Source** and enter: - ``` - https://amazon-eks.s3-us-west-2.amazonaws.com/cloudformation/2020-04-21/amazon-eks-vpc-sample.yaml - ``` -5. Under **Stack name**, enter `eks-vpc-sourcegraph`. -6. Click **Next** through the following pages until you get the option to **Create stack**. Review the configuration and click **Create stack**. - -For more details on these steps, refer to [Amazon EKS prerequisites: Create your Amazon EKS cluster VPC](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#vpc-create). - -## Create the Amazon EKS Cluster - -1. Open the [**EKS console**](https://console.aws.amazon.com/eks/home#/clusters). -2. Click **Create cluster**. -3. Under **Cluster name**, enter `sourcegraph`. -4. Under **Cluster Service Role**, select `eksServiceRoleSourcegraph`. -5. Under **VPC**, select `eks-vpc-sourcegraph`. -6. Under **Security groups**, select the one prefixed `eks-vpc-sourcegraph-ControlPlaneSecurityGroup-`. (Do NOT select `NodeSecurityGroup`.) -7. Accept all other values as default and click **Create**. -8. Wait for the cluster to finish **CREATING**. This will take around 10 minutes to complete, so grab some ☕. - -For more details on these steps, refer to [Amazon EKS prerequisites: Create your Amazon EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-console.html#eks-create-cluster). - -## Create Kubernetes cluster worker nodes - -1. Open the [**AWS CloudFormation console**](https://console.aws.amazon.com/cloudformation/). -2. Click **Create stack** -3. Select the very last **Specify an Amazon S3 template URL** option and enter - ``` - https://amazon-eks.s3.us-west-2.amazonaws.com/cloudformation/2020-04-21/amazon-eks-nodegroup.yaml - ``` -4. Under **Stack name**, enter `sourcegraph-worker-nodes`. -5. Under **ClusterName**, enter the exact cluster name you used (`sourcegraph`). -6. Under **ClusterControlPlaneSecurityGroup**, scroll down or begin typing and select the option prefixed `eks-vpc-sourcegraph-ControlPlaneSecurityGroup-` (Do NOT select the `NodeSecurityGroup`.) -7. Under **NodeGroupName**, enter `sourcegraph-node-group`. -8. Choose **NodeAutoScalingGroupMinSize** and **NodeAutoScalingGroupMaxSize** and **NodeInstanceType** based on the following chart: - -
- -| Users | Instance type | Min nodes | Max nodes | Cost est. | Attached Storage | Root Storage | -| ------------ | ------------- | --------- | --------- | ---------- | ---------------- | ------------ | -| 10-500 | m5.4xlarge | 3 | 6 | $59-118/day | 500 GB | 100 GB | -| 500-2000 | m5.4xlarge | 6 | 10 | $118-195/day | 500 GB | 100 GB | - - -
-
-> **Note:** You can always come back here later and modify these values to scale up/down the number of worker nodes. To do so, just visit the console page again, select **Actions**, **Create Change Set For Current Stack**, enter the same template URL mentioned above, modify the values, and hit "next" until reviewing final changes, and finally **Execute**.
-
-9. Under **KeyName**, choose a valid key name so that you can SSH into worker nodes if needed in the future.
-10. Under **VpcId**, select `eks-vpc-sourcegraph-VPC`.
-11. Under **Subnets**, search for and select *all* `eks-vpc-sourcegraph` subnets.
-12. Click **Next** through the following pages until you get the option to **Create stack**. Review the configuration and click **Create stack**.
-
-For more details on these steps, refer to [Worker Nodes: Amazon EKS-optimized Linux AMI](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html).
-
-## Install `kubectl` and configure access to the cluster
-
-On your dev machine:
-
-1. Install the `aws` CLI tool: [bundled installer](https://docs.aws.amazon.com/cli/latest/userguide/awscli-install-bundle.html), [other installation methods](https://docs.aws.amazon.com/cli/latest/userguide/installing.html).
-2. Follow [these instructions](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) to create an access key and `aws configure` the CLI to use it.
-3. Install `kubectl` and `aws-iam-authenticator` by following [these steps](https://docs.aws.amazon.com/eks/latest/userguide/configure-kubectl.html).
-4. [Configure `kubectl` to interact with your cluster](https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html):
-   ```
-   aws eks update-kubeconfig --name ${cluster_name}
-   ```
-
-**Important**: If `kubectl` commands prompt you for username/password, be sure that `kubectl version` reports a client version of v1.10+. Older versions of kubectl do not work with the authentication configuration provided by Amazon EKS.
-
-At this point, `kubectl get svc` should show something like:
-
-```
-$ kubectl get svc
-NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
-kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   4m
-```
-
-## Enable worker nodes to join the Kubernetes cluster
-
-Now it is time to enable the worker nodes created by CloudFormation to actually join the Kubernetes cluster:
-
-1. Download, edit, and save [this configuration map file](https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html):
-   ```
-   curl -o aws-auth-cm.yaml https://amazon-eks.s3.us-west-2.amazonaws.com/cloudformation/2020-04-21/aws-auth-cm.yaml
-   ```
-2. Replace `rolearn` in the file (_do not_ modify the file otherwise) with the correct value. To find this value:
-   - Open the [**AWS CloudFormation console**](https://console.aws.amazon.com/cloudformation/).
-   - Locate and select the `sourcegraph-worker-nodes` row.
-   - Click the **Output** tab, and copy the **NodeInstanceRole** value.
-3. Run `kubectl apply -f aws-auth-cm.yaml`.
-4. Watch `kubectl get nodes --watch` until all nodes appear with status `Ready` (this will take a few minutes).
-
-## Create the default storage class
-
-EKS does not have a default Kubernetes storage class out of the box, but one is needed.
-
-Follow [these short steps](https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html) to create it. (Simply copy and paste the suggested file and run all suggested `kubectl` commands. You do not need to modify the file.)
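-
-One quick way to sanity-check the result is a sketch like the following. The class name `gp2` is an assumption based on the linked AWS guide; adjust it if your manifest uses a different name:
-
-```bash
-# The newly created class should be listed and marked "(default)".
-kubectl get storageclass
-# Inspect provisioner and parameters; the name gp2 is assumed from the guide.
-kubectl describe storageclass gp2
-```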
- -## Deploy the Kubernetes Web UI Dashboard (optional) - -See [Tutorial: Deploy the Kubernetes Dashboard](https://docs.aws.amazon.com/eks/latest/userguide/dashboard-tutorial.html). - -## Deploy Sourcegraph! 🎉 - -Your Kubernetes cluster is now all set up and running! - -Luckily, deploying Sourcegraph on your cluster is much easier and quicker than the above steps. :) - -Follow our [installation documentation](install.md) to continue. +Moved to https://docs.sourcegraph.com/admin/install/kubernetes/eks diff --git a/docs/k8s.md b/docs/k8s.md index 4c62b5ab1ab3..00ec6b5f5594 100644 --- a/docs/k8s.md +++ b/docs/k8s.md @@ -1,45 +1 @@ -# Provisioning a Kubernetes cluster - -
- -**Security note:** If you intend to set this up as a production instance, we recommend you create the cluster in a VPC -or other secure network that restricts unauthenticated access from the public Internet. You can later expose the -necessary ports via an -[Internet Gateway](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Internet_Gateway.html) or equivalent -mechanism. Take care to secure your cluster in a manner that meets your organization's security requirements. - -
-
-Follow the instructions linked in the table below to provision a Kubernetes cluster for the
-infrastructure provider of your choice, using the recommended node and disk types in the
-table.
-
-> Note: Sourcegraph can run on any Kubernetes cluster, so if your infrastructure provider is not
-> listed, see the "Other" row. Pull requests to add rows for more infrastructure providers are
-> welcome!
-
-**Compute nodes**
-
-| Provider | Node type | Boot/ephemeral disk size | Reference |
-| --- | --- | --- | --- |
-| Amazon EKS (better than plain EC2) | m5.4xlarge | N/A | [Deploy Sourcegraph on EKS](k8s.eks.md) |
-| AWS EC2 | m5.4xlarge | N/A | Run Kubernetes on EC2 |
-| Google Kubernetes Engine (GKE) | n1-standard-16 | 100 GB (default) | GKE Quickstart |
-| Azure | D16 v3 | 100 GB (SSD preferred) | [Deploy Sourcegraph on Azure](k8s.azure.md) |
-| Other | 16 vCPU, 60 GiB memory per node | 100 GB (SSD preferred) | Kubernetes Service Providers |
-
+Moved to https://docs.sourcegraph.com/admin/install/kubernetes/k8s
diff --git a/docs/migrate.md b/docs/migrate.md
index 55f26386a51c..4b74f51bc2c6 100644
--- a/docs/migrate.md
+++ b/docs/migrate.md
@@ -1,190 +1 @@
-# Migrations
-
-This document records manual migrations that are necessary to apply when upgrading to certain
-Sourcegraph versions. All manual migrations between the version you are upgrading from and the
-version you are upgrading to should be applied (unless otherwise noted).
-
-## 3.16
-
-### Note: The following deployments have had their `strategy` changed from `rolling` to `recreate`:
-
-  - redis-cache
-  - redis-store
-  - pgsql
-  - precise-code-intel-bundle-manager
-  - prometheus
-
-This change was made to avoid two pods writing to the same volume and causing corruption.
-
-To implement these changes, run the following:
-
-```shell script
-kubectl apply -f base/precise-code-intel/bundle-manager.Deployment.yaml
-kubectl apply -f base/redis/redis-cache.Deployment.yaml
-kubectl apply -f base/redis/redis-store.Deployment.yaml
-kubectl apply -f base/prometheus/prometheus.Deployment.yaml
-kubectl apply -f base/pgsql/pgsql.Deployment.yaml
-```
-
-For more information, see [#676](https://github.com/sourcegraph/deploy-sourcegraph/pull/676).
-
-## 3.15
-
-### Note: Prometheus and Grafana resource requirements increase
-
-Resource _requests and limits_ for Grafana and Prometheus are now equal to the following:
-
-- Grafana: 100Mi -> 512Mi
-- Prometheus: 500M -> 3G
-
-This change was made to ensure that even if another Sourcegraph service starts consuming more memory than expected and the Kubernetes node has been over-provisioned, Sourcegraph's monitoring will still have enough memory to run and monitor / send alerts to the site admin. For additional information, see [#638](https://github.com/sourcegraph/deploy-sourcegraph/pull/638).
-
-### (optional) Keep LSIF data through manual migration
-
-If you have previously uploaded LSIF precise code intelligence data and wish to retain it after upgrading, you will need to perform this migration.
-
-**Skipping the migration**
-
-If you choose not to migrate the data, Sourcegraph will use basic code intelligence until you upload LSIF data again.
-
-You may run the following commands to remove the now unused resources:
-
-```shell script
-kubectl delete svc lsif-server
-kubectl delete deployment lsif-server
-kubectl delete pvc lsif-server
-```
-
-**Migrating**
-
-The lsif-server service has been replaced by a trio of services defined in [precise-code-intel](../base/precise-code-intel),
-and the persistent volume claim in which lsif-server stored converted LSIF uploads has been replaced by
-[bundle storage](../base/precise-code-intel/bundle-storage.PersistentVolume.yaml).
-
-Upgrading to 3.15 will create a new empty volume for LSIF data. Without any action, the LSIF data previously uploaded
-to the instance will be lost. To retain old LSIF data, perform the following migration steps. This will cause some
-temporary downtime for precise code intelligence.
-
-**Migration steps**
-
-1. Deploy 3.15. This will create a `bundle-manager` persistent volume claim.
-2. Release the claims to old and new persistent volumes by taking down `lsif-server` and `precise-code-intel-bundle-manager`.
-
-```shell script
-kubectl delete svc lsif-server
-kubectl delete deployment lsif-server
-kubectl delete deployment precise-code-intel-bundle-manager
-```
-
-3. Deploy the `lsif-server-migrator` deployment to transfer the data from the old volume to the new volume.
-
-```shell script
-kubectl apply -f configure/lsif-server-migrator/lsif-server-migrator.Deployment.yaml
-```
-
-4. Watch the output of the `lsif-server-migrator` until the copy completes (`'Copy complete!'`).
-
-```shell script
-kubectl logs lsif-server-migrator
-```
-
-5. Tear down the deployment and re-create the bundle manager deployment.
-
-```shell script
-kubectl delete deployment lsif-server-migrator
-./kubectl-apply-all.sh
-```
-
-6. Remove the old persistent volume claim.
-
-```shell script
-kubectl delete pvc lsif-server
-```
-
-## 3.11
-
-In 3.11 we removed the management console. If you make use of `CRITICAL_CONFIG_FILE` or `SITE_CONFIG_FILE`, please refer to the [migration notes for Sourcegraph 3.11+](https://docs.sourcegraph.com/admin/migration/3_11).
-
-## 3.10
-
-In 3.9 we migrated `indexed-search` to a StatefulSet. However, we didn't migrate the `indexed-search` service to a headless service. You can't mutate a service, so you will need to replace the service before running `kubectl-apply-all.sh`:
-
-``` bash
-# Replace since we can't mutate services
-kubectl replace --force -f base/indexed-search/indexed-search.Service.yaml
-
-# Now apply all so frontend knows how to speak to the new service address
-# for indexed-search
-./kubectl-apply-all.sh
-```
-
-## 3.9
-
-In 3.9 `indexed-search` is migrated from a Kubernetes [Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) to a [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/). By default Kubernetes will assign a new volume to `indexed-search`, leading to it being unavailable while it reindexes. To avoid that, we need to update the [PersistentVolume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/)'s claim to point at the new indexed-search pod (from `indexed-search` to `data-indexed-search-0`). This can be achieved by running the commands in the script below before upgrading. Please read the script closely to understand what it does before following it.
-
-``` bash
-# Set the reclaim policy to Retain so the volume is not deleted when we delete the volume claim.
-kubectl patch pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' $(kubectl get pv -o json | jq -r '.items[] | select(.spec.claimRef.name == "indexed-search").metadata.name')
-
-# Stop indexed search so we can migrate it. This means indexed search will be down!
-kubectl scale deploy/indexed-search --replicas=0
-
-# Remove the existing claim on the volume
-kubectl delete pvc indexed-search
-
-# Move the claim to data-indexed-search-0, which is the name created by the stateful set.
-kubectl patch pv -p '{"spec":{"claimRef":{"name":"data-indexed-search-0","uid":null}}}' $(kubectl get pv -o json | jq -r '.items[] | select(.spec.claimRef.name == "indexed-search").metadata.name')
-
-# Create the stateful set
-kubectl apply -f base/indexed-search/indexed-search.StatefulSet.yaml
-```
-
-## 3.8
-
-If you're deploying Sourcegraph into a non-default namespace, refer to ["Use non-default namespace" in docs/configure.md](configure.md#use-non-default-namespace) for further configuration instructions.
-
-## 3.7.2
-
-Before upgrading or downgrading 3.7, please consult the [v3.7.2 migration guide](https://docs.sourcegraph.com/admin/migration/3_7) to ensure you have enough free disk space.
-
-## 3.0
-
-🚨 If you have not migrated off of helm yet, please refer to [docs/helm.migrate.md](helm.migrate.md) before reading the following notes for migrating to Sourcegraph 3.0.
- -🚨 Please upgrade your Sourcegraph instance to 2.13.x before reading the following notes for migrating to Sourcegraph 3.0. - -### Configuration - -In Sourcegraph 3.0 all site configuration has been moved out of the `config-file.ConfigMap.yaml` and into the PostgreSQL database. We have an automatic migration if you use version 3.2 or before. Please do not upgrade directly from 2.x to 3.3 or higher. - -After running 3.0, you should visit the configuration page (`/site-admin/configuration`) and [the management console](https://docs.sourcegraph.com/admin/management_console) and ensure that your configuration is as expected. In some rare cases, automatic migration may not be able to properly carry over some settings and you may need to reconfigure them. - -### `sourcegraph-frontend` service type - -The type of the `sourcegraph-frontend` service ([base/frontend/sourcegraph-frontend.Service.yaml](../base/frontend/sourcegraph-frontend.Service.yaml)) has changed -from `NodePort` to `ClusterIP`. Directly applying this change [will -fail](https://github.com/kubernetes/kubernetes/issues/42282). Instead, you must delete the old -service and then create the new one (this will result in a few seconds of downtime): - -```shell -kubectl delete svc sourcegraph-frontend -kubectl apply -f base/frontend/sourcegraph-frontend.Service.yaml -``` - -### Language server deployment - -Sourcegraph 3.0 removed lsp-proxy and automatic language server deployment in favor of [Sourcegraph extensions](https://docs.sourcegraph.com/extensions). As a consequence, Sourcegraph 3.0 does not automatically run or manage language servers. If you had code intelligence enabled in 2.x, you will need to follow the instructions for each language extension and deploy them individually. Read the [code intelligence documentation](https://docs.sourcegraph.com/user/code_intelligence). - -### HTTPS / TLS - -Sourcegraph 3.0 removed HTTPS / TLS features from Sourcegraph in favor of relying on [Kubernetes Ingress Resources](https://kubernetes.io/docs/concepts/services-networking/ingress/). As a consequence, Sourcegraph 3.0 does not expose TLS as the NodePort 30433. Instead you need to ensure you have setup and configured either an ingress controller (recommended) or an explicit NGINX service. See [ingress controller documentation](configure.md#ingress-controller-recommended), [NGINX service documentation](configure.md#nginx-service), and [configure TLS/SSL documentation](configure.md#configure-tlsssl). - -If you previously configured `TLS_KEY` and `TLS_CERT` environment variables, you can remove them from [base/frontend/sourcegraph-frontend.Deployment.yaml](../base/frontend/sourcegraph-frontend.Deployment.yaml) - -### Postgres 11.1 - -Sourcegraph 3.0 ships with Postgres 11.1. The upgrade procedure is mostly automatic. Please read [this page](https://docs.sourcegraph.com/admin/postgres) for detailed information. - -## 2.12 - -Beginning in version 2.12.0, Sourcegraph's Kubernetes deployment [requires an Enterprise license key](https://about.sourcegraph.com/pricing). Follow the steps in [docs/configure.md](docs/configure.md#add-a-license-key). +Moved to https://docs.sourcegraph.com/admin/updates/kubernetes diff --git a/docs/scale.md b/docs/scale.md index 211632cbb8f3..4bb7f399a3c4 100644 --- a/docs/scale.md +++ b/docs/scale.md @@ -1,138 +1 @@ -# Scaling - -Sourcegraph can be configured to scale to very large codebases and large numbers of -users. 
If you notice latency for search or code intelligence is higher than desired, changing these -parameters can yield a drastic improvement in performance. - -> For assistance when scaling and tuning Sourcegraph, [contact us](https://about.sourcegraph.com/contact/). We're happy to help! - -## Tuning replica counts for horizontal scalability - -By default, your cluster has a single pod for each of `sourcegraph-frontend`, `searcher`, and `gitserver`. You can -increase the number of replicas of each of these services to handle higher scale. - -We recommend setting the `sourcegraph-frontend`, `searcher`, and `gitserver` replica counts according to the following tables: - -
- -| Users | Number of `sourcegraph-frontend` replicas | -| ---------- | ----------------------------------------- | -| 10-500 | 1 | -| 500-2000 | 2 | -| 2000-4000 | 6 | -| 4000-10000 | 18 | -| 10000+ | 28 | - -_You can change the replica count of `sourcegraph-frontend` by editing [base/frontend/sourcegraph-frontend.Deployment.yaml](../base/frontend/sourcegraph-frontend.Deployment.yaml)._ - -
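-
-For a quick, untracked experiment, you can also bump the replica count imperatively. A minimal sketch (the count here is illustrative; the YAML in your release branch remains the source of truth):
-
-```bash
-# Imperative scale-up for testing only; commit the equivalent edit to
-# base/frontend/sourcegraph-frontend.Deployment.yaml for a durable change.
-kubectl scale deployment/sourcegraph-frontend --replicas=2
-```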
- -| Repositories | Number of `searcher` replicas | -| ------------ | ------------------------------------------------------------------------------ | -| 1-20 | 1 | -| 20-50 | 2 | -| 50-200 | 3-5 | -| 200-1k | 5-10 | -| 1k-5k | 10-15 | -| 5k-25k | 20-40 | -| 25k+ | 40+ ([contact us](https://about.sourcegraph.com/contact/) for scaling advice) | -| Monorepo | 1-25 ([contact us](https://about.sourcegraph.com/contact/) for scaling advice) | - -_You can change the replica count of `searcher` by editing [base/searcher/searcher.Deployment.yaml](../base/searcher/searcher.Deployment.yaml)._ - -
- -| Repositories | Number of `gitserver` replicas | -| ------------ | ----------------------------------------------------------------------------- | -| 1-200 | 1 | -| 200-500 | 2 | -| 500-1000 | 3 | -| 1k-5k | 4-8 | -| 5k-25k | 8-20 | -| 25k+ | 20+ ([contact us](https://about.sourcegraph.com/contact/) for scaling advice) | -| Monorepo | 1 ([contact us](https://about.sourcegraph.com/contact/) for scaling advice) | - -_Read [docs/configure.md](/docs/configure.md#Configure-gitserver-replica-count) to learn about how to change -the replica count of `gitserver`._ - -
-
----
-
-## Improving performance with a large number of repositories
-
-When you're using Sourcegraph with many repositories (100s-10,000s), the most important parameters to tune are:
-
-- `sourcegraph-frontend` CPU/memory resource allocations
-- `searcher` replica count
-- `indexedSearch` CPU/memory resource allocations
-- `gitserver` replica count
-- `symbols` replica count and CPU/memory resource allocations
-- `gitMaxConcurrentClones`, because `git clone` and `git fetch` operations are IO- and CPU-intensive
-- `repoListUpdateInterval` (in minutes), because each interval triggers `git fetch` operations for all repositories
-
-Consult the tables above for the recommended replica counts to use. **Note:** the `gitserver` replica count is specified
-differently from the replica counts for other services; read [docs/configure.md](docs/configure.md#Configure-gitserver-replica-count) to learn how to change
-the replica count of `gitserver`.
-
-Notes:
-
-- If your change requires `gitserver` pods to be restarted and they are scheduled on another node
-  when they restart, they may go offline for 60-90 seconds (and temporarily show a `Multi-Attach`
-  error). This delay is caused by Kubernetes detaching and reattaching the volume. Mitigation
-  steps depend on your cloud provider; [contact us](https://about.sourcegraph.com/contact/) for
-  advice.
-
-- For context on what each service does, see [Sourcegraph Architecture Overview](https://docs.sourcegraph.com/dev/architecture).
-
----
-
-## Improving performance with large monorepos
-
-When you're using Sourcegraph with a large monorepo (or several large monorepos), the most important parameters to tune
-are:
-
-- `sourcegraph-frontend` CPU/memory resource allocations
-- `searcher` CPU/memory resource allocations (allocate enough memory to hold all non-binary files in your repositories)
-- `indexedSearch` CPU/memory resource allocations (for the `zoekt-indexserver` pod, allocate enough memory to hold all non-binary files in your largest repository; for the `zoekt-webserver` pod, allocate enough memory to hold ~2.7x the size of all non-binary files in your repositories)
-- `symbols` CPU/memory resource allocations
-- `gitserver` CPU/memory resource allocations (allocate enough memory to hold your Git packed bare repositories)
-
----
-
-## Configuring faster disk I/O for caches
-
-Many parts of Sourcegraph's infrastructure benefit from using SSDs for caches. This is especially
-important for search performance. By default, disk caches will use the
-Kubernetes [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) and will be the
-same IO speed as the underlying node's disk. Even if the node's default disk is an SSD, however, it
-is likely network-mounted rather than local.
-
-See [configure/ssd/README.md](../configure/ssd/README.md) for instructions about configuring SSDs.
-
----
-
-## Cluster resource allocation guidelines
-
-For production environments, we recommend the following resource allocations for the entire
-Kubernetes cluster, based on the number of users in your organization:
-
- -| Users | vCPUs | Memory | Attached Storage | Root Storage | -| ------------ | ----- | ------ | ---------------- | ------------ | -| 10-500 | 10 | 24 GB | 500 GB | 50 GB | -| 500-2,000 | 16 | 48 GB | 500 GB | 50 GB | -| 2,000-4,000 | 32 | 72 GB | 900 GB | 50 GB | -| 4,000-10,000 | 48 | 96 GB | 900 GB | 50 GB | -| 10,000+ | 64 | 200 GB | 900 GB | 50 GB | - -
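-
-To compare these guidelines against what a cluster actually provides, a sketch like the following may help (`kubectl top` assumes the metrics-server addon is installed, which is not a given on every cluster):
-
-```bash
-# Per-node capacity and currently allocated requests.
-kubectl describe nodes | grep -A 5 "Allocatable:"
-# Live CPU/memory usage per node; requires metrics-server.
-kubectl top nodes
-```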
-
----
-
-## Using heterogeneous node pools with `nodeSelector`
-
-See ["Assign resource-hungry pods to larger nodes" in docs/configure.md](configure.md#Assign-resource-hungry-pods-to-larger-nodes).
+Moved to https://docs.sourcegraph.com/admin/install/kubernetes/scale
diff --git a/docs/troubleshoot.md b/docs/troubleshoot.md
index 76512a042889..79f88a209986 100644
--- a/docs/troubleshoot.md
+++ b/docs/troubleshoot.md
@@ -1,50 +1 @@
-# Troubleshooting
-
-If Sourcegraph does not start up or shows unexpected behavior, there are a variety of ways you can determine the root
-cause of the failure. The most useful commands are:
-
-- `kubectl get pods -o=wide` — lists all pods in your cluster and the corresponding health status of each.
-- `kubectl logs -f $POD_NAME` — tails the logs for the specified pod.
-
-If Sourcegraph is unavailable and the `sourcegraph-frontend-*` pod(s) are not in status `Running`, then view their logs with `kubectl logs -f sourcegraph-frontend-$POD_ID` (filling in `$POD_ID` from the `kubectl get pods` output). Inspect both the log messages printed at startup (at the beginning of the log output) and recent log messages.
-
-Less frequently used commands:
-
-- `kubectl describe pod $POD_NAME` — shows detailed info about the status of a single pod.
-- `kubectl get pvc` — lists all Persistent Volume Claims (PVCs) and the status of each.
-- `kubectl get pv` — lists all Persistent Volumes (PVs) that have been provisioned. In a healthy cluster, there should
-  be a one-to-one mapping between PVs and PVCs.
-- `kubectl get events` — lists all events in the cluster's history.
-- `kubectl delete pod $POD_NAME` — deletes a failing pod so it gets recreated, possibly on a different node.
-- `kubectl drain --force --ignore-daemonsets --delete-local-data $NODE` — removes all pods from a node and marks it as unschedulable to prevent new pods from arriving.
-
-### Common errors
-
-- `Error from server (Forbidden): error when creating "base/frontend/sourcegraph-frontend.Role.yaml": roles.rbac.authorization.k8s.io "sourcegraph-frontend" is forbidden: attempt to grant extra privileges`
-
-  - The account you are using to apply the Kubernetes configuration doesn't have sufficient permissions to create roles.
-  - GCP: `kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $YOUR_EMAIL`
-
-- `kubectl get pv` shows no Persistent Volumes, and/or `kubectl get events` shows a `Failed to provision volume with StorageClass "default"` error.
-
-  Check that a storage class named "default" exists via `kubectl get storageclass`. If one does exist, run `kubectl get storageclass default -o=yaml` and verify that the zone indicated in the output matches the zone of your cluster.
-  Google Cloud Platform users may need to [request an increase in storage quota](https://cloud.google.com/compute/quotas).
-
-- Many pods are stuck in Pending status. Use `kubectl cluster-info dump > dump.txt` to obtain a dump of
-  the logs. One thing to check for is insufficient resources:
-
-  ```
-  "Reason": "FailedScheduling",
-  "Message": "0/3 nodes are available: 1 Insufficient memory, 3 Insufficient cpu.",
-  ```
-
-  This means that your cluster is under-provisioned (i.e. has too few nodes, or not enough CPU and memory).
-  If you're using Google Cloud Platform, note that the default node type is `n1-standard-1`, a machine
-  with only one CPU, and that some components request a 2-CPU node. When creating a cluster, use
-  `--machine-type=n1-standard-16`.
-
-- You can't access Sourcegraph. See [Troubleshooting ingress-nginx](https://kubernetes.github.io/ingress-nginx/troubleshooting/). If you followed our instructions, the namespace of the ingress controller is `ingress-nginx`.
-
-Any other issues? Contact us at [@srcgraph](https://twitter.com/srcgraph)
-or support@sourcegraph.com, or file issues on
-our [public issue tracker](https://github.com/sourcegraph/issues/issues).
+Moved to https://docs.sourcegraph.com/admin/install/kubernetes/troubleshoot
diff --git a/docs/update.md b/docs/update.md
index 8917e6165e08..8eac8964aa04 100644
--- a/docs/update.md
+++ b/docs/update.md
@@ -1,126 +1 @@
-# Updating Sourcegraph
-
-> IMPORTANT: Please check [docs/migrate.md](migrate.md) before upgrading to any particular
-> version of Sourcegraph to check if any manual migrations are necessary.
-
-> 🚨 If you are updating from a 2.10.x or previous deployment, follow the migration steps in [docs/helm.migrate.md](helm.migrate.md) before updating.
-
-A new version of Sourcegraph is released every month (with patch releases in between, released as needed). Check the [Sourcegraph blog](https://about.sourcegraph.com/blog) for release announcements.
-
-## Steps
-
-These steps assume that you followed the [forking instructions in docs/configure.md](configure.md#fork-this-repository).
-
-1. Merge the new version of Sourcegraph into your release branch.
-
-   ```bash
-   cd $DEPLOY_SOURCEGRAPH_FORK
-   git fetch
-   git checkout release
-
-   # Choose which version you want to deploy from https://github.com/sourcegraph/deploy-sourcegraph/releases
-   git merge $VERSION
-   ```
-
-1. Deploy the updated version of Sourcegraph to your Kubernetes cluster:
-
-   ```bash
-   ./kubectl-apply-all.sh
-   ```
-
-1. Monitor the status of the deployment.
-
-   ```bash
-   watch kubectl get pods -o wide
-   ```
-
-## Rollback
-
-You can roll back by resetting your `release` branch to the old state and proceeding with step 2 above.
-
-_If an update includes a database migration, rollback will require some manual DB
-modifications. We plan to eliminate these in the near future, but for now,
-email support@sourcegraph.com if you have concerns before updating to a new release._
-
-## Improving update reliability and latency with node selectors
-
-Some of the services that comprise Sourcegraph require more resources than others, especially if the
-default CPU or memory allocations have been overridden. During an update when many services restart,
-you may observe that the more resource-hungry pods (e.g., `gitserver`, `indexed-search`) fail to
-restart, because no single node has enough available CPU or memory to accommodate them. This may be
-especially true if the cluster is heterogeneous (i.e., not all nodes have the same amount of
-CPU/memory).
-
-If this happens, do the following (a consolidated sketch follows at the end of this section):
-
-- Use `kubectl drain $NODE` to drain a node of existing pods, so it has enough allocation for the larger
-  service.
-- Run `watch kubectl get pods -o wide` and wait until the node has been drained. Run `kubectl get pods` to check that all pods except for the resource-hungry one(s) have been assigned to a node.
-- Run `kubectl uncordon $NODE` to enable the larger pod(s) to be scheduled on the drained node.
-
-Note that the need to run the above steps can be prevented altogether with [node
-selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector), which
-tell Kubernetes to assign certain pods to specific nodes. See the [docs on enabling node
-selectors](scale.md#node-selector) for Sourcegraph on Kubernetes.
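-
-A consolidated sketch of the drain flow described above (the node name is a placeholder; substitute the node you want to free up):
-
-```bash
-NODE=my-node-name   # hypothetical node name
-# Evict existing pods from the node; it also becomes unschedulable.
-kubectl drain --ignore-daemonsets $NODE
-# Wait until the displaced pods have been rescheduled elsewhere.
-watch kubectl get pods -o wide
-# Allow scheduling on the node again so the larger pod(s) can land there.
-kubectl uncordon $NODE
-```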
-
-## High-availability updates
-
-Sourcegraph is designed to be a high-availability (HA) service. Updates require zero downtime and
-employ health checks to test the health of newly updated components before switching live traffic
-over to them. HA-enabling features include the following:
-
-- Replication: nearly all of the critical services within Sourcegraph are replicated. If a single instance of a
-  service fails, that instance is restarted and removed from operation until it comes online again.
-- Updates are applied in a rolling fashion to each service such that a subset of instances are updated first while
-  traffic continues to flow to the old instances. Once the health check determines the set of new instances is
-  healthy, traffic is directed to the new set and the old set is terminated.
-- Each service includes a health check that detects whether the service is in a healthy state. This check is specific to
-  the service. These are used to check the health of new instances after an update and during regular operation to
-  determine if an instance goes down.
-- Database migrations are handled automatically on update when they are necessary.
-
-### Updating blue-green deployments
-
-Some users may opt to run two separate Sourcegraph clusters in a
-[blue-green](https://martinfowler.com/bliki/BlueGreenDeployment.html) deployment. Such a setup makes
-the update step more complex, but it can still be done with the `sourcegraph-server-gen snapshot`
-command:
-
-- **Preconditions:**
-  - Suppose cluster A is currently live, and cluster B is in standby.
-  - Clusters A and B should be running the same version of Sourcegraph.
-  - Ensure `sourcegraph-server-gen` is upgraded to version 3.0.1 (`sourcegraph-server-gen update`).
-- **Snapshot of A:** Configure `kubectl` to access cluster A and then run `sourcegraph-server-gen snapshot create`.
-- **Restore A's snapshot to B:**
-  - Configure `kubectl` to access B.
-  - Spin down `sourcegraph-frontend` replicas to 0. (**Note:** this is very important, because
-    otherwise `sourcegraph-frontend` may apply changes to the database that corrupt the snapshot
-    restoration.)
-
-    ```
-    kubectl scale --replicas=0 deployment/sourcegraph-frontend
-    ```
-
-  - Run `sourcegraph-server-gen snapshot restore` from the same directory where you ran the snapshot creation earlier.
-  - Spin `sourcegraph-frontend` replicas back up to the previous count:
-
-    ```
-    kubectl scale --replicas=$N deployment/sourcegraph-frontend
-    ```
-- **Upgrade cluster B** to the new Sourcegraph version. Perform some quick checks to verify it is
-  functioning.
-- **Switch traffic over to B.** (B is now live.)
-- **Upgrade cluster A** to the new Sourcegraph version.
-- **Switch traffic back to A.** (A is now live again.)
-
-After the update, cluster A will be live, cluster B will be in standby, and both will be running the
-same new version of Sourcegraph. You may lose a few minutes of database updates while A is not live,
-but that is generally acceptable.
-
-To keep the database on B current, you may periodically wish to sync A's database over to B
-(`sourcegraph-server-gen snapshot create` on A, `sourcegraph-server-gen snapshot restore` on B). It
-is important that the versions of A and B are equivalent when this is done.
-
-### Troubleshooting
-
-See the [troubleshooting page](troubleshoot.md).
+Moved to https://docs.sourcegraph.com/admin/install/kubernetes/update