Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MM-61113] Allow multiple customizable subnets #833

Merged
merged 21 commits into from
Oct 22, 2024
Merged

Conversation

fmartingr
Copy link
Contributor

@fmartingr fmartingr commented Oct 16, 2024

Summary

  • deployer config changes:
    • ClusterSubnetID -> ClusterSubnetIDs
    • ElasticsearchSettings.VPCID removed in favor of ClusterVpcId
  • All resources are now matched with the specified VPC in the configuration
  • Elasticsearch now has specific configuration for AZ Awareness if more than one subnet is set
  • Fixed a bug where we wouldn't wait for terraform stdout resulting in incomplete output from commands. 61f26c2

Ticket Link

https://mattermost.atlassian.net/browse/MM-61113

@fmartingr fmartingr marked this pull request as ready for review October 17, 2024 14:56
Copy link
Contributor

@streamer45 streamer45 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! A couple of high-level comments.

  • deployer config changes:
    • ClusterVpcId -> ClusterVPCIDs

I don't see this change in here. I guess you meant ClusterSubnetID to ClusterSubnetIDs.

Also, is the implicit indexing driven by certain constraints or just simpler/quicker to implement? I am wondering whether a more structured setting could be easier/digestible (and potentially less error-prone) on the user side.

@fmartingr
Copy link
Contributor Author

Cool! A couple of high-level comments.

  • deployer config changes:

    • ClusterVpcId -> ClusterVPCIDs

I don't see this change in here. I guess you meant ClusterSubnetID to ClusterSubnetIDs.

Yeah, sorry. Edited.

Also, is the implicit indexing driven by certain constraints or just simpler/quicker to implement? I am wondering whether a more structured setting could be easier/digestible (and potentially less error-prone) on the user side.

What do you mean by this?

@streamer45
Copy link
Contributor

Also, is the implicit indexing driven by certain constraints or just simpler/quicker to implement? I am wondering whether a more structured setting could be easier/digestible (and potentially less error-prone) on the user side.

What do you mean by this?

Having the specific index of ClusterSubnetIDs refer to a resource type feels perhaps a bit overengineered. Why not define an object with N named fields (one per resource type)?

@fmartingr
Copy link
Contributor Author

Having the specific index of ClusterSubnetIDs refer to a resource type feels perhaps a bit overengineered. Why not define an object with N named fields (one per resource type)?

I had something like that before (I iterated quite a bit during this week) but if I don't specify the subnet from a data "aws_subnet" block Terraform considers it part of the environment and tries to delete it on the destroy operation.

@fmartingr fmartingr requested review from streamer45, agnivade and agarciamontoro and removed request for agarciamontoro and agnivade October 18, 2024 17:27
@fmartingr
Copy link
Contributor Author

Refactored the code after a talk with @agarciamontoro.

I wasn't able to reproduce the issue I had that the destroy operation was trying to destroy the subnet if directly specified by a subnet_id parameter.

for scanner.Scan() {
mlog.Info(scanner.Text())
}
go func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of doing this? Scanning stdout is not really blocking anything. And anyways you need to wait until stderr scanning is finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this I was not getting the entire output from terraform, some details about errors where missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's weird. I don't understand how this changes anything though. Probably we can look into this in an isolated scenario. FYI @agarciamontoro @streamer45

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm .. ok I understand the bug. https://pkg.go.dev/os/exec#Cmd.StdoutPipe

Cmd.Wait will close the pipe after seeing the command exit, so most callers need not close the pipe themselves. It is thus incorrect to call Wait before all reads from the pipe have completed.

So using a scanner to read from the pipe is error-prone and probably why it fails a lot of times. The better way is to directly set the StdErr/Stdout fields with a custom io.Writer.

type cmdLogger struct {
}

func (*cmdLogger) Write(in []byte) (int, error) {
	mlog.Info(string(in))
	return len(in), nil
}

and then

cmd.Stdout = &cmdLogger
cmd.Stderr = cmd.Stdout

if err := cmd.Start(); err != nil {
return err
}

if err := cmd.Wait(); err != nil {
return err
}

Having the same writer for stdout and stderr is handled properly in the stdlib:

// If Stdout and Stderr are the same writer, and have a type that can
// be compared with ==, at most one goroutine at a time will call Write.

@fmartingr - can we try with something like this?

"Keycloak": "",
"Metrics": "",
"Proxy": "",
"Redis": []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a list for Redis when it's just a single server? Contrary, why we don't need a list for other resources like app/agent/proxy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use lists and then index by count, but I favoured simplicity in this first PR. if you think that's necessary it's a easy change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went ahead and did the change: 46be0e8

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no no, my point was if Redis is a single server, then we don't need a list there. Unless the list is mandatory somehow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made the list change and if we only need one subnet (because there's only one server) we can just set only one subnet in the setting (or leave it empty to use the default from aws_subnet.selected). Feels simpler if every setting works the same way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, conceptually it just felt odd to have an array when it's just one resource. But not a big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe if we use aws_elasticache_replication_group in the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's all premature optimization :)

deployment/terraform/assets/cluster.tf Outdated Show resolved Hide resolved
deployment/terraform/assets/cluster.tf Outdated Show resolved Hide resolved
deployment/terraform/assets/cluster.tf Outdated Show resolved Hide resolved
deployment/terraform/assets/cluster.tf Outdated Show resolved Hide resolved
deployment/terraform/utils.go Outdated Show resolved Hide resolved
@agnivade
Copy link
Member

@fmartingr - Here: https://github.com/mattermost/mattermost-cloud-monitoring/pull/765/files#diff-af79a285e58611c0f303f2d367ca5d6de36a521a3e8375f3c092f2fda2e0c742R5, I can see aws_elasticache_subnet_group. Is there something different that forces us to use only aws_db_subnet_group?

@fmartingr
Copy link
Contributor Author

fmartingr commented Oct 21, 2024

@fmartingr - Here: https://github.com/mattermost/mattermost-cloud-monitoring/pull/765/files#diff-af79a285e58611c0f303f2d367ca5d6de36a521a3e8375f3c092f2fda2e0c742R5, I can see aws_elasticache_subnet_group. Is there something different that forces us to use only aws_db_subnet_group?

I just followed some documentation. I will change the resource type, but it's really weird that it let me use it.

7199169

Copy link
Member

@agnivade agnivade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

I would like to see #833 (comment) being tracked separately if you don't want to do this in this PR itself.

@fmartingr
Copy link
Contributor Author

Thanks.

I would like to see #833 (comment) being tracked separately if you don't want to do this in this PR itself.

Opened #835 to track. I can use that changes locally myself while I test the RHEL scripts since that will be very prone to errors.

Copy link
Contributor

@streamer45 streamer45 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment on lines 156 to 158
func (c *ClusterSubnetIDs) IsAnySet() bool {
return len(c.App) > 0 || len(c.Job) > 0 || len(c.Proxy) > 0 || len(c.Agent) > 0 || len(c.ElasticSearch) > 0 || len(c.Metrics) > 0 || len(c.Keycloak) > 0 || len(c.Database) > 0 || len(c.Redis) > 0
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that using something like reflect.DeepEqual to avoid every single field and automatically support future ones may be a worthwhile simplification given performance is not really a concern. Not a huge deal though.

Copy link
Member

@agarciamontoro agarciamontoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job, and incredibly thorough! I left a couple of comments below. Thank you so much for all your work here, @fmartingr <3

deployment/config.go Outdated Show resolved Hide resolved
Comment on lines 243 to 252
// convertToTerraformVar converts a list parameter to a json encoded string
func convertToTerraformVar[T any](param T) string {
result, err := json.Marshal(param)
if err != nil {
mlog.Error("failed to convert parameter to terraform var", mlog.Any("param", param), mlog.Err(err))
return ""
}

return string(result)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move this as a method of ClusterSubnetIDs, just like IsAnySet? It seems to be specific to that type, although it has a pretty generic name right now (although the comment says it only applies to lists)

Copy link
Contributor Author

@fmartingr fmartingr Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the code, this converts any parameter since it just marshals into json, but happy to make it just a String() function on the type. I could also update the comment 👓

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but happy to make it just a String() function on the type

That's what I did for TerraformMap, the type of CustomTags. That way you can simply pass it to Sprintf. I don't have a super strong opinion on this, but I find it slightly better, since it isolates its function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/config/deployer.md Outdated Show resolved Hide resolved
docs/config/deployer.md Outdated Show resolved Hide resolved
fmartingr and others added 3 commits October 22, 2024 12:14
Co-authored-by: Alejandro García Montoro <[email protected]>
Co-authored-by: Alejandro García Montoro <[email protected]>
Copy link
Member

@agarciamontoro agarciamontoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Again, amazing work :)

@agarciamontoro agarciamontoro added the 4: Reviews Complete All reviewers have approved the pull request label Oct 22, 2024
@fmartingr fmartingr merged commit 29284a6 into master Oct 22, 2024
1 check passed
@fmartingr fmartingr deleted the feat/vpc-subnets branch October 22, 2024 13:46
fmartingr added a commit that referenced this pull request Oct 22, 2024
* multiple subnet support

* wait for terraform stdout

* revert elasticsearch zone awareness

* fixed line removed on merge conflict

* AWSAvailabilityZone compatibility

* support a single subnet in db groups

* refactor to use explicit subnets

* elasticsearch multiple subnet support

* docs

* review comments

* use list of subnets for all resources

* tags for db-subnet-group

* updated sample files

* aws_db_subnet_group -> aws_elasticcache_subnet_group

* use reflect.DeepEqual to simplify code

* Update docs/config/deployer.md

Co-authored-by: Alejandro García Montoro <[email protected]>

* Update deployment/config.go

Co-authored-by: Alejandro García Montoro <[email protected]>

* docs link to aws docs

* function to struct method

---------

Co-authored-by: Alejandro García Montoro <[email protected]>
fmartingr added a commit that referenced this pull request Oct 22, 2024
* multiple subnet support

* wait for terraform stdout

* revert elasticsearch zone awareness

* fixed line removed on merge conflict

* AWSAvailabilityZone compatibility

* support a single subnet in db groups

* refactor to use explicit subnets

* elasticsearch multiple subnet support

* docs

* review comments

* use list of subnets for all resources

* tags for db-subnet-group

* updated sample files

* aws_db_subnet_group -> aws_elasticcache_subnet_group

* use reflect.DeepEqual to simplify code

* Update docs/config/deployer.md

Co-authored-by: Alejandro García Montoro <[email protected]>

* Update deployment/config.go

Co-authored-by: Alejandro García Montoro <[email protected]>

* docs link to aws docs

* function to struct method

---------

Co-authored-by: Alejandro García Montoro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4: Reviews Complete All reviewers have approved the pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants