If Sourcegraph does not start up or shows unexpected behavior, there are several ways to determine the root cause of the failure. The most useful commands are:
- kubectl get pods -o=wide
  Lists all pods in your cluster and the corresponding health status of each.
- kubectl logs -f $POD_NAME
  Tails the logs for the specified pod.
If Sourcegraph is unavailable and the sourcegraph-frontend-* pod(s) are not in status Running, then view their logs with kubectl logs -f sourcegraph-frontend-$POD_ID (filling in $POD_ID from the kubectl get pods output). Inspect both the log messages printed at startup (at the beginning of the log output) and recent log messages.
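The check described above can be sketched as a small shell filter. This is a minimal sketch, not an official Sourcegraph tool: the function name not_running_pods is hypothetical, and it assumes the default kubectl get pods column layout, in which the pod name is column 1 and STATUS is column 3.

```shell
# Hypothetical helper: print the names of pods whose STATUS is not "Running".
# Reads `kubectl get pods --no-headers` output on stdin.
# Optional $1 restricts the result to pod names starting with that prefix.
not_running_pods() {
  awk -v prefix="${1:-}" '
    substr($1, 1, length(prefix)) == prefix && $3 != "Running" { print $1 }
  '
}

# Example usage (requires access to the cluster):
#   kubectl get pods --no-headers | not_running_pods sourcegraph-frontend- |
#     while read -r pod; do kubectl logs --tail=50 "$pod"; done
```

Note that pods in a Completed state are also reported, since the filter only distinguishes Running from everything else.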
Less frequently used commands:
- kubectl describe pod $POD_NAME
  Shows detailed info about the status of a single pod.
- kubectl get pvc
  Lists all Persistent Volume Claims (PVCs) and the status of each.
- kubectl get pv
  Lists all Persistent Volumes (PVs) that have been provisioned. In a healthy cluster, there should be a one-to-one mapping between PVs and PVCs.
- kubectl get events
  Lists all events in the cluster's history.
- kubectl delete pod $POD_NAME
  Deletes a failing pod so it gets recreated, possibly on a different node.
- kubectl drain --force --ignore-daemonsets --delete-local-data $NODE
  Removes all pods from a node and marks it as unschedulable to prevent new pods from arriving. On kubectl 1.20 and later, --delete-local-data is deprecated in favor of --delete-emptydir-data.
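As a quick way to check the PV-to-PVC mapping mentioned above, the following sketch lists claims that have not been bound to a volume. The function name unbound_pvcs is hypothetical, and the sketch assumes the default kubectl get pvc column layout, in which the claim name is column 1 and STATUS is column 2.

```shell
# Hypothetical helper: print each Persistent Volume Claim that is not "Bound",
# together with its current status.
# Reads `kubectl get pvc --no-headers` output on stdin.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 " (" $2 ")" }'
}

# Example usage (requires access to the cluster):
#   kubectl get pvc --no-headers | unbound_pvcs
```

An empty result suggests every claim has a matching volume; any name printed is a claim worth inspecting with kubectl describe pvc.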
- Error from server (Forbidden): error when creating "base/frontend/sourcegraph-frontend.Role.yaml": roles.rbac.authorization.k8s.io "sourcegraph-frontend" is forbidden: attempt to grant extra privileges
  - The account you are using to apply the Kubernetes configuration doesn't have sufficient permissions to create roles.
  - GCP: kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $YOUR_EMAIL
- kubectl get pv shows no Persistent Volumes, and/or kubectl get events shows a Failed to provision volume with StorageClass "default" error.
  Check that a storage class named "default" exists via kubectl get storageclass. If one does exist, run kubectl get storageclass default -o=yaml and verify that the zone indicated in the output matches the zone of your cluster. Google Cloud Platform users may need to request an increase in storage quota.
- Many pods are stuck in Pending status. Use kubectl cluster-info dump > dump.txt to obtain a dump of the logs. One thing to check for is insufficient resources:
  "Reason": "FailedScheduling", "Message": "0/3 nodes are available: 1 Insufficient memory, 3 Insufficient cpu.",
  This indicates that no available nodes meet the requirements specified for the corresponding pods. If you're using Google Cloud Platform, note that the default node type is n1-standard-1, a machine with only one CPU, and that some components request a 2-CPU node. When creating a cluster, use --machine-type=n1-standard-16.
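For the Pending pods case above, the dump can be large, so a filter that extracts only the scheduler messages may help. This is a minimal sketch: the function name scheduling_failures is hypothetical, and it assumes that each FailedScheduling reason appears on the same line as its message or on the line immediately before it. Matching is case-insensitive because field capitalization may differ between Kubernetes versions.

```shell
# Hypothetical helper: print the distinct messages attached to
# FailedScheduling events in a `kubectl cluster-info dump` file.
# Reads the dump on stdin.
scheduling_failures() {
  # Keep each FailedScheduling line plus the one after it, then keep only
  # the lines that carry a message, and drop duplicates.
  grep -i -A 1 '"reason": "FailedScheduling"' |
    grep -i '"message"' |
    sort -u
}

# Example usage:
#   kubectl cluster-info dump > dump.txt
#   scheduling_failures < dump.txt
```

Messages that mention Insufficient cpu or Insufficient memory point to the node sizing problem described above.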
Any other issues? Contact us at @srcgraph or mailto:[email protected], or file issues on our public issue tracker.