VRO On‐Call Overview

Gabriel Zurita edited this page Oct 31, 2024 · 9 revisions

Being On-Call

Welcome to your on-call shift! This guide provides an overview and links to essential resources to help you efficiently manage your responsibilities.

On-Call Engineer Core Purpose

  • Proactive Monitoring: Serve as a backstop to automated monitoring, proactively addressing anomalies to improve service quality.
  • Team Shield: Protect the development team from disruptions caused by unplanned work, allowing them to maintain focus and productivity.
  • Rapid Response: Respond immediately to incidents, manage deployments, and communicate effectively with stakeholders.

On-Call Roles and Responsibilities

For more details, refer to the On-Call Responsibilities page.

Primary On-Call Engineer Duties

The on-call engineer's duties are listed below in priority order. This order matters most in the context of Incident Management:

  1. Production Issues:
  2. Blockers:
    • Address any issues that may block team productivity, such as problems with QA environments, CI infrastructure, test failures, or deployment failures.
  3. Unplanned Work:
    • Track support requests that arrive through Slack and other relevant team channels.
  4. Planned Work:
    • Handle routine production tasks during business hours, including non-urgent alerts and software release approvals.
    • Prioritize immediate response to critical incidents over less time-sensitive tasks.
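The priority order above can be sketched as a small triage helper. This is only an illustration, not part of any VRO tooling: the names `DutyPriority`, `WorkItem`, and `triage`, and the sample item titles, are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class DutyPriority(IntEnum):
    """Duty categories from the list above; a lower value is handled first."""
    PRODUCTION_ISSUE = 1
    BLOCKER = 2
    UNPLANNED_WORK = 3
    PLANNED_WORK = 4


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical record for one thing competing for on-call attention."""
    title: str
    priority: DutyPriority


def triage(items):
    """Return items ordered by duty priority.

    sorted() is stable, so items in the same category keep their
    original (arrival) order.
    """
    return sorted(items, key=lambda item: item.priority)


if __name__ == "__main__":
    # Sample items are made up for illustration only.
    queue = [
        WorkItem("Approve software release", DutyPriority.PLANNED_WORK),
        WorkItem("CI pipeline failing", DutyPriority.BLOCKER),
        WorkItem("Support request in Slack", DutyPriority.UNPLANNED_WORK),
        WorkItem("Production alert", DutyPriority.PRODUCTION_ISSUE),
    ]
    for item in triage(queue):
        print(f"{item.priority.name}: {item.title}")
```

Using an integer enum keeps the ordering explicit in one place, so a change to the priority list only needs a change to the enum values.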

Secondary On-Call Engineer

  • Support Role:
    • Assist the primary engineer and take over if they're unavailable.
    • May handle non-urgent tasks and routine production duties, allowing the primary engineer to focus on critical incidents.

Shift Schedule and Handover Procedures

  • Availability: On-call engineers should be available during working hours (9 AM to 5 PM ET) and respond promptly to pages according to their criticality.
  • Timing: The on-call rotation aligns with the sprint schedule; each shift covers one sprint from start to end.
  • Handover: Document ongoing issues, communicate important updates, and ensure a smooth transition to the next engineer.

Quick Reference

Key Contacts and Escalation Paths

Essential On-Call Tools and Resources

Note: Some of the resources below could be consolidated into single documents and simplified.
