Skip to content

Latest commit

 

History

History
 
 

bosh-stop-start

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 
 
 
# bosh-stop-start

What?

bosh-stop-start.sh is a bash script for gracefully stopping or starting all the VMs in PCF installations.

Why?

Idle VMs are expensive to keep running in public cloud environments.

Attempts to stop VMs using IaaS primitives (e.g. gcloud compute instances stop vm-01234567-89ab-cdef-0123-456789abcdef) usually end badly. This is because the BOSH director VM always wants to maintain the lifecycle of its deployments (i.e. tiles). Not surprising when you consider that lifecycle management is a core responsibility of BOSH. So it's advisable to politely ask BOSH when you'd like to instruct an installation to go to sleep or wake up.

How?

We recommend you SSH to the Ops Manager, git clone this repo and run the bosh-stop-start.sh script from there.

There are two operational modes, stop and start, which do much as you would expect.

You should expect these operations to each take ~90 mins to complete.

Installing the script

SSH into the Ops Manager VM as the ubuntu user and clone this repo:

cd ${HOME}
git clone https://github.com/pivotal-education/public-tools.git

Stopping an running installation

OPSMAN_URL=https://pcf.myinstance.mydomain.com \
OPSMAN_USER=usually_admin \
OPSMAN_PASSWD=some_long_complex_admin_password \
  ${HOME}/public-tools/bosh-stop-start/bosh-stop-start.sh stop

Starting an stopped installation

OPSMAN_URL=https://pcf.myinstance.mydomain.com \
OPSMAN_USER=usually_admin \
OPSMAN_PASSWD=some_long_complex_admin_password \
  ${HOME}/public-tools/bosh-stop-start/bosh-stop-start.sh start

Technical notes

  • The script uses the OM tool and the JQ parser.

  • Use the Ops Manager as a host for this script. If the script fails to connect to the BOSH director private IP address then the operation will be aborted. Assigning the BOSH director a public IP address is never recommended.

  • An added complication arises in automating these tasks because in order to stop or start the entire installation the script must target each deployment in turn. Individual deployment names are randomised between each installation so we can't reliably predict them. To stop or start an installation we must first ask the system to list all its deployments so that we can successfully target each one.

  • We intentionally use bosh stop --hard to send an installation to sleep. The basic stop command simply tells the BOSH director to stop all deployed jobs (i.e. processes) but, rather oddly, leaves all the VMs running which continue to incur costs. At the IaaS level, the --hard option translates as a request to stop the jobs and delete the VM. Admittedly this seems a little heavy-handed as we really just wanted the VMs stopped, but it's the only option currently available to us.

  • The BOSH director and Ops Manager VMs will remain in a running state after a stop operation is completed. This is intentional. If necessary these VMs can be manually stopped at the IaaS level but, clearly, they must both be running once again before issuing a start command.

  • It has been suggested that you should use the Resource Config panel of each tile to turn the instance count down to (at most) one instance per instance group. If you encounter issues with the stop operation, like VMs failing to be deleted, this suggestion may be worth considering. When doing this, be sure to switch all post-deploy errands OFF to keep the wait times as low as possible.