Add future reservation support
abbas1902 committed Dec 7, 2024
1 parent ca73935 commit b18d453
Showing 10 changed files with 171 additions and 33 deletions.
@@ -179,6 +179,7 @@ No modules.
| <a name="input_enable_shielded_vm"></a> [enable\_shielded\_vm](#input\_enable\_shielded\_vm) | Enable the Shielded VM configuration. Note: the instance image must support option. | `bool` | `false` | no |
| <a name="input_enable_smt"></a> [enable\_smt](#input\_enable\_smt) | Enables Simultaneous Multi-Threading (SMT) on instance. | `bool` | `false` | no |
| <a name="input_enable_spot_vm"></a> [enable\_spot\_vm](#input\_enable\_spot\_vm) | Enable the partition to use spot VMs (https://cloud.google.com/spot-vms). | `bool` | `false` | no |
| <a name="input_future_reservation"></a> [future\_reservation](#input\_future\_reservation) | If set, will make use of the future reservation for the nodeset. Input can be either the future reservation name or its selfLink in the format 'projects/PROJECT\_ID/zones/ZONE/futureReservations/FUTURE\_RESERVATION\_NAME'.<br/>See https://cloud.google.com/compute/docs/instances/future-reservations-overview | `string` | `""` | no |
| <a name="input_guest_accelerator"></a> [guest\_accelerator](#input\_guest\_accelerator) | List of the type and count of accelerator cards attached to the instance. | <pre>list(object({<br/> type = string,<br/> count = number<br/> }))</pre> | `[]` | no |
| <a name="input_instance_image"></a> [instance\_image](#input\_instance\_image) | Defines the image that will be used in the Slurm node group VM instances.<br/><br/>Expected Fields:<br/>name: The name of the image. Mutually exclusive with family.<br/>family: The image family to use. Mutually exclusive with name.<br/>project: The project where the image is hosted.<br/><br/>For more information on creating custom images that comply with Slurm on GCP<br/>see the "Slurm on GCP Custom Images" section in docs/vm-images.md. | `map(string)` | <pre>{<br/> "family": "slurm-gcp-6-8-hpc-rocky-linux-8",<br/> "project": "schedmd-slurm-public"<br/>}</pre> | no |
| <a name="input_instance_image_custom"></a> [instance\_image\_custom](#input\_instance\_image\_custom) | A flag that designates that the user is aware that they are requesting<br/>to use a custom and potentially incompatible image for this Slurm on<br/>GCP module.<br/><br/>If the field is set to false, only the compatible families and project<br/>names will be accepted. The deployment will fail with any other image<br/>family or name. If set to true, no checks will be done.<br/><br/>See: https://goo.gle/hpc-slurm-images | `bool` | `false` | no |
12 changes: 12 additions & 0 deletions community/modules/compute/schedmd-slurm-gcp-v6-nodeset/main.tf
@@ -95,6 +95,7 @@ locals {
spot = var.enable_spot_vm
termination_action = try(var.spot_instance_config.termination_action, null)
reservation_name = local.reservation_name
future_reservation = local.future_reservation
maintenance_interval = var.maintenance_interval
instance_properties_json = jsonencode(var.instance_properties)

@@ -141,6 +142,17 @@ locals {
reservation_name = local.res_match.whole == null ? "" : "${local.res_prefix}${local.res_short_name}${local.res_suffix}"
}

locals {
fr_match = regex("^(?P<whole>projects/(?P<project>[a-z0-9-]+)/zones/(?P<zone>[a-z0-9-]+)/futureReservations/)?(?P<name>[a-z0-9-]+)?$", var.future_reservation)

fr_name = local.fr_match.name
fr_project = coalesce(local.fr_match.project, var.project_id)
fr_zone = coalesce(local.fr_match.zone, var.zone)

future_reservation = var.future_reservation == "" ? "" : "projects/${local.fr_project}/zones/${local.fr_zone}/futureReservations/${local.fr_name}"
}
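
The normalization in this `locals` block can be tried in isolation. The following is a minimal sketch, not the module itself: it assumes a standalone configuration in which the variable defaults (`my-project`, `us-central1-a`, `my-fr`) are hypothetical placeholders, and it reuses the regex from the change unchanged.

```hcl
# Hypothetical standalone configuration; all default values are placeholders.
variable "project_id" {
  type    = string
  default = "my-project"
}

variable "zone" {
  type    = string
  default = "us-central1-a"
}

variable "future_reservation" {
  type    = string
  default = "my-fr" # short name; a full selfLink goes through the same regex
}

locals {
  # Same pattern as in the change: an optional selfLink prefix, then the name.
  fr_match = regex("^(?P<whole>projects/(?P<project>[a-z0-9-]+)/zones/(?P<zone>[a-z0-9-]+)/futureReservations/)?(?P<name>[a-z0-9-]+)?$", var.future_reservation)

  # Capture groups that did not take part in the match are null,
  # so coalesce falls back to the deployment's project and zone.
  fr_project = coalesce(local.fr_match.project, var.project_id)
  fr_zone    = coalesce(local.fr_match.zone, var.zone)
}

output "future_reservation" {
  value = var.future_reservation == "" ? "" : "projects/${local.fr_project}/zones/${local.fr_zone}/futureReservations/${local.fr_match.name}"
}
```

With the placeholder defaults above, the output should resolve to `projects/my-project/zones/us-central1-a/futureReservations/my-fr`. Passing a selfLink instead keeps the project and zone embedded in it.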


# tflint-ignore: terraform_unused_declarations
data "google_compute_reservation" "reservation" {
count = length(local.reservation_name) > 0 ? 1 : 0
24 changes: 24 additions & 0 deletions community/modules/compute/schedmd-slurm-gcp-v6-nodeset/outputs.tf
@@ -54,4 +54,28 @@ output "nodeset" {
condition = !var.enable_placement || !var.dws_flex.enabled
error_message = "Cannot use DWS Flex with `enable_placement`."
}

precondition {
condition = var.reservation_name == "" || var.future_reservation == ""
error_message = "Cannot use reservations and future reservations in the same nodeset."
}

precondition {
condition = !var.enable_placement || var.future_reservation == ""
error_message = "Cannot use `enable_placement` with future reservations."
}

precondition {
condition = var.future_reservation == "" || length(var.zones) == 0
error_message = <<-EOD
If a future reservation is specified, `var.zones` should be empty.
EOD
}

precondition {
condition = var.future_reservation == "" || local.fr_zone == var.zone
error_message = <<-EOD
The zone of the deployment must match that of the future reservation.
EOD
}
}
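
Taken together, the four new preconditions constrain which other inputs may accompany `future_reservation`. The fragment below is a sketch of one combination that should satisfy all of them. The module label, the relative `source` path, and the project, zone, and reservation names are assumptions for illustration, and the module's other required inputs are left out.

```hcl
# Hypothetical module call; names and values are placeholders.
module "nodeset" {
  source = "./community/modules/compute/schedmd-slurm-gcp-v6-nodeset"

  # Other required inputs of the module are omitted from this fragment.

  zone               = "us-central1-a"
  future_reservation = "projects/my-project/zones/us-central1-a/futureReservations/my-fr"

  reservation_name = ""    # must stay empty when future_reservation is set
  enable_placement = false # placement is rejected with future reservations
  zones            = []    # additional zones must be empty
}
```

The zone embedded in the selfLink matches `zone`, which is what the last precondition checks. A short reservation name would pass that check trivially, because `fr_zone` then falls back to `var.zone`.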
@@ -463,6 +463,21 @@ variable "reservation_name" {
}
}

variable "future_reservation" {
description = <<-EOD
If set, will make use of the future reservation for the nodeset. Input can be either the future reservation name or its selfLink in the format 'projects/PROJECT_ID/zones/ZONE/futureReservations/FUTURE_RESERVATION_NAME'.
See https://cloud.google.com/compute/docs/instances/future-reservations-overview
EOD
type = string
default = ""
nullable = false

validation {
condition = length(regexall("^(projects/([a-z0-9-]+)/zones/([a-z0-9-]+)/futureReservations/([a-z0-9-]+))?$", var.future_reservation)) > 0 || length(regexall("^([a-z0-9-]+)$", var.future_reservation)) > 0
error_message = "Future reservation must be either the future reservation name or its selfLink in the format 'projects/PROJECT_ID/zones/ZONE/futureReservations/FUTURE_RESERVATION_NAME'."
}
}
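
To make the validation rule concrete, here is a tfvars-style sketch of values the two patterns would accept or reject. All reservation, project, and zone names are hypothetical.

```hcl
# Accepted: a bare future reservation name (lowercase letters, digits, hyphens).
future_reservation = "my-fr"

# Accepted: the full selfLink form.
# future_reservation = "projects/my-project/zones/us-central1-a/futureReservations/my-fr"

# Accepted: the empty default, meaning no future reservation is used.
# future_reservation = ""

# Rejected: uppercase letters and underscores match neither pattern.
# future_reservation = "My_FR"

# Rejected: a partial path is neither a bare name nor a complete selfLink.
# future_reservation = "zones/us-central1-a/futureReservations/my-fr"
```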

variable "maintenance_interval" {
description = <<-EOD
Sets the maintenance interval for instances in this nodeset.
@@ -336,7 +336,7 @@ limitations under the License.
| <a name="input_metadata"></a> [metadata](#input\_metadata) | Metadata, provided as a map. | `map(string)` | `{}` | no |
| <a name="input_min_cpu_platform"></a> [min\_cpu\_platform](#input\_min\_cpu\_platform) | Specifies a minimum CPU platform. Applicable values are the friendly names of<br/>CPU platforms, such as Intel Haswell or Intel Skylake. See the complete list:<br/>https://cloud.google.com/compute/docs/instances/specify-min-cpu-platform | `string` | `null` | no |
| <a name="input_network_storage"></a> [network\_storage](#input\_network\_storage) | An array of network attached storage mounts to be configured on all instances. | <pre>list(object({<br/> server_ip = string,<br/> remote_mount = string,<br/> local_mount = string,<br/> fs_type = string,<br/> mount_options = string,<br/> client_install_runner = optional(map(string))<br/> mount_runner = optional(map(string))<br/> }))</pre> | `[]` | no |
| <a name="input_nodeset"></a> [nodeset](#input\_nodeset) | Define nodesets, as a list. | <pre>list(object({<br/> node_count_static = optional(number, 0)<br/> node_count_dynamic_max = optional(number, 1)<br/> node_conf = optional(map(string), {})<br/> nodeset_name = string<br/> additional_disks = optional(list(object({<br/> disk_name = optional(string)<br/> device_name = optional(string)<br/> disk_size_gb = optional(number)<br/> disk_type = optional(string)<br/> disk_labels = optional(map(string), {})<br/> auto_delete = optional(bool, true)<br/> boot = optional(bool, false)<br/> })), [])<br/> bandwidth_tier = optional(string, "platform_default")<br/> can_ip_forward = optional(bool, false)<br/> disable_smt = optional(bool, false)<br/> disk_auto_delete = optional(bool, true)<br/> disk_labels = optional(map(string), {})<br/> disk_size_gb = optional(number)<br/> disk_type = optional(string)<br/> enable_confidential_vm = optional(bool, false)<br/> enable_placement = optional(bool, false)<br/> enable_oslogin = optional(bool, true)<br/> enable_shielded_vm = optional(bool, false)<br/> enable_maintenance_reservation = optional(bool, false)<br/> enable_opportunistic_maintenance = optional(bool, false)<br/> gpu = optional(object({<br/> count = number<br/> type = string<br/> }))<br/> dws_flex = object({<br/> enabled = bool<br/> max_run_duration = number<br/> use_job_duration = bool<br/> })<br/> labels = optional(map(string), {})<br/> machine_type = optional(string)<br/> maintenance_interval = optional(string)<br/> instance_properties_json = string<br/> metadata = optional(map(string), {})<br/> min_cpu_platform = optional(string)<br/> network_tier = optional(string, "STANDARD")<br/> network_storage = optional(list(object({<br/> server_ip = string<br/> remote_mount = string<br/> local_mount = string<br/> fs_type = string<br/> mount_options = string<br/> client_install_runner = optional(map(string))<br/> mount_runner = optional(map(string))<br/> })), [])<br/> on_host_maintenance 
= optional(string)<br/> preemptible = optional(bool, false)<br/> region = optional(string)<br/> service_account = optional(object({<br/> email = optional(string)<br/> scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])<br/> }))<br/> shielded_instance_config = optional(object({<br/> enable_integrity_monitoring = optional(bool, true)<br/> enable_secure_boot = optional(bool, true)<br/> enable_vtpm = optional(bool, true)<br/> }))<br/> source_image_family = optional(string)<br/> source_image_project = optional(string)<br/> source_image = optional(string)<br/> subnetwork_self_link = string<br/> additional_networks = optional(list(object({<br/> network = string<br/> subnetwork = string<br/> subnetwork_project = string<br/> network_ip = string<br/> nic_type = string<br/> stack_type = string<br/> queue_count = number<br/> access_config = list(object({<br/> nat_ip = string<br/> network_tier = string<br/> }))<br/> ipv6_access_config = list(object({<br/> network_tier = string<br/> }))<br/> alias_ip_range = list(object({<br/> ip_cidr_range = string<br/> subnetwork_range_name = string<br/> }))<br/> })))<br/> access_config = optional(list(object({<br/> nat_ip = string<br/> network_tier = string<br/> })))<br/> spot = optional(bool, false)<br/> tags = optional(list(string), [])<br/> termination_action = optional(string)<br/> reservation_name = optional(string)<br/> startup_script = optional(list(object({<br/> filename = string<br/> content = string })), [])<br/><br/> zone_target_shape = string<br/> zone_policy_allow = set(string)<br/> zone_policy_deny = set(string)<br/> }))</pre> | `[]` | no |
| <a name="input_nodeset"></a> [nodeset](#input\_nodeset) | Define nodesets, as a list. | <pre>list(object({<br/> node_count_static = optional(number, 0)<br/> node_count_dynamic_max = optional(number, 1)<br/> node_conf = optional(map(string), {})<br/> nodeset_name = string<br/> additional_disks = optional(list(object({<br/> disk_name = optional(string)<br/> device_name = optional(string)<br/> disk_size_gb = optional(number)<br/> disk_type = optional(string)<br/> disk_labels = optional(map(string), {})<br/> auto_delete = optional(bool, true)<br/> boot = optional(bool, false)<br/> })), [])<br/> bandwidth_tier = optional(string, "platform_default")<br/> can_ip_forward = optional(bool, false)<br/> disable_smt = optional(bool, false)<br/> disk_auto_delete = optional(bool, true)<br/> disk_labels = optional(map(string), {})<br/> disk_size_gb = optional(number)<br/> disk_type = optional(string)<br/> enable_confidential_vm = optional(bool, false)<br/> enable_placement = optional(bool, false)<br/> enable_oslogin = optional(bool, true)<br/> enable_shielded_vm = optional(bool, false)<br/> enable_maintenance_reservation = optional(bool, false)<br/> enable_opportunistic_maintenance = optional(bool, false)<br/> gpu = optional(object({<br/> count = number<br/> type = string<br/> }))<br/> dws_flex = object({<br/> enabled = bool<br/> max_run_duration = number<br/> use_job_duration = bool<br/> })<br/> labels = optional(map(string), {})<br/> machine_type = optional(string)<br/> maintenance_interval = optional(string)<br/> instance_properties_json = string<br/> metadata = optional(map(string), {})<br/> min_cpu_platform = optional(string)<br/> network_tier = optional(string, "STANDARD")<br/> network_storage = optional(list(object({<br/> server_ip = string<br/> remote_mount = string<br/> local_mount = string<br/> fs_type = string<br/> mount_options = string<br/> client_install_runner = optional(map(string))<br/> mount_runner = optional(map(string))<br/> })), [])<br/> on_host_maintenance 
= optional(string)<br/> preemptible = optional(bool, false)<br/> region = optional(string)<br/> service_account = optional(object({<br/> email = optional(string)<br/> scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])<br/> }))<br/> shielded_instance_config = optional(object({<br/> enable_integrity_monitoring = optional(bool, true)<br/> enable_secure_boot = optional(bool, true)<br/> enable_vtpm = optional(bool, true)<br/> }))<br/> source_image_family = optional(string)<br/> source_image_project = optional(string)<br/> source_image = optional(string)<br/> subnetwork_self_link = string<br/> additional_networks = optional(list(object({<br/> network = string<br/> subnetwork = string<br/> subnetwork_project = string<br/> network_ip = string<br/> nic_type = string<br/> stack_type = string<br/> queue_count = number<br/> access_config = list(object({<br/> nat_ip = string<br/> network_tier = string<br/> }))<br/> ipv6_access_config = list(object({<br/> network_tier = string<br/> }))<br/> alias_ip_range = list(object({<br/> ip_cidr_range = string<br/> subnetwork_range_name = string<br/> }))<br/> })))<br/> access_config = optional(list(object({<br/> nat_ip = string<br/> network_tier = string<br/> })))<br/> spot = optional(bool, false)<br/> tags = optional(list(string), [])<br/> termination_action = optional(string)<br/> reservation_name = optional(string)<br/> future_reservation = string<br/> startup_script = optional(list(object({<br/> filename = string<br/> content = string })), [])<br/><br/> zone_target_shape = string<br/> zone_policy_allow = set(string)<br/> zone_policy_deny = set(string)<br/> }))</pre> | `[]` | no |
| <a name="input_nodeset_dyn"></a> [nodeset\_dyn](#input\_nodeset\_dyn) | Defines dynamic nodesets, as a list. | <pre>list(object({<br/> nodeset_name = string<br/> nodeset_feature = string<br/> }))</pre> | `[]` | no |
| <a name="input_nodeset_tpu"></a> [nodeset\_tpu](#input\_nodeset\_tpu) | Define TPU nodesets, as a list. | <pre>list(object({<br/> node_count_static = optional(number, 0)<br/> node_count_dynamic_max = optional(number, 5)<br/> nodeset_name = string<br/> enable_public_ip = optional(bool, false)<br/> node_type = string<br/> accelerator_config = optional(object({<br/> topology = string<br/> version = string<br/> }), {<br/> topology = ""<br/> version = ""<br/> })<br/> tf_version = string<br/> preemptible = optional(bool, false)<br/> preserve_tpu = optional(bool, false)<br/> zone = string<br/> data_disks = optional(list(string), [])<br/> docker_image = optional(string, "")<br/> network_storage = optional(list(object({<br/> server_ip = string<br/> remote_mount = string<br/> local_mount = string<br/> fs_type = string<br/> mount_options = string<br/> client_install_runner = optional(map(string))<br/> mount_runner = optional(map(string))<br/> })), [])<br/> subnetwork = string<br/> service_account = optional(object({<br/> email = optional(string)<br/> scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])<br/> }))<br/> project_id = string<br/> reserved = optional(string, false)<br/> }))</pre> | `[]` | no |
| <a name="input_on_host_maintenance"></a> [on\_host\_maintenance](#input\_on\_host\_maintenance) | Instance availability Policy. | `string` | `"MIGRATE"` | no |