investigate logging solutions #40
Comments
Following @azend's good advice, I abandoned playing with quickwit and tried some options for shipping logs straight off the DO droplets into CloudWatch.

After several fruitless hours I gave up trying to get the CloudWatch agent to actually ship logs. Not sure what was wrong, but on-prem usage doesn't seem particularly important to the maintainers. E.g. if you put the config file where they say you should put it (instead of fetching config from Parameter Store), the agent deletes it for you 🙃

So I've got a Fluent Bit config that seems to do the job. Since we don't have automation for the droplets, my rough plan is:

Deploy Fluent Bit:
Drop an
And massage it until it looks right for:
Oh and PR #80 adds an IAM user and policy for sending to CloudWatch.
Looks good to me. I'm not surprised there was some gotcha with the CloudWatch agent, though deleting its own config wouldn't have been my top choice hah. If the credentials worked to sign into STS then I guess we could have created and used a parameter. But Fluent Bit seems like a far more flexible option.

I can't remember how many VMs there are in DO, but Ian suggested that there were a lot. We could create an Ansible playbook if there are a lot, but it probably isn't worth it if there are fewer than 10. I found a role that does pretty much exactly what's mentioned above, but I don't know if we want to use it or rob the important parts for our own role. https://github.com/bimdata/ansible_role_fluentbit/blob/main/tasks/main.yml

While I'm thinking about it, we should put a log retention period on the output config. One year of retention is enough for most security and compliance standards, including SOC2. The exact period doesn't matter, but it keeps log storage from building up when the project goes idle or we forget to change the setting later.
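For reference, here is a minimal Python sketch of applying the one-year retention mentioned above directly to a log group, in case it isn't set through the shipper config. It assumes a client that behaves like `boto3.client("logs")` and its `put_retention_policy` call; the function name `set_log_retention` and the log group name in the usage comment are made up for illustration, and the list of accepted values is taken from the CloudWatch Logs API docs as of writing, so double-check it before relying on it.

```python
# Retention periods (in days) that CloudWatch Logs accepts, per the
# PutRetentionPolicy API documentation at the time of writing.
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def set_log_retention(logs_client, log_group_name, days=365):
    """Apply a retention policy to a single CloudWatch log group.

    logs_client is assumed to behave like boto3.client("logs").
    Returns the number of days that was applied.
    """
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{days} is not a retention period CloudWatch accepts")
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )
    return days


# Usage against AWS (needs boto3 and credentials; the group name is hypothetical):
#   import boto3
#   set_log_retention(boto3.client("logs"), "example-log-group")
```

Passing the client in keeps the function easy to try against a stub before pointing it at a real account.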
Status update: #80 has been applied to prod/stage and merged. Above deploy plan is complete on

That said - they're kinda ugly. Particularly the php-fpm logs, which contain multi-line stack traces that are not grouped together, so they're hard to read. Fluent Bit does have support for multiline grouping, but I didn't have much time to play with it so I haven't got that bit figured out yet. My regex game is not strong 😿

One other thing I noticed, which is probably nbd but worth mentioning: the I didn't notice any issues with gpo.ca or secure.gpo.ca while I was testing it, but if anyone noticed load spikes it was probably that.

I've captured the manual changes in #81 so I can sleep at night. It ain't pretty but it works.
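On the multiline grouping: the idea can be sketched in plain Python to work out the regex before wiring it into the shipper. This is only an illustration, not the Fluent Bit config itself. It assumes each new php-fpm/PHP record starts with a bracketed timestamp like `[21-Jan-2024 10:15:02 UTC]` and that anything else is a continuation of the previous record; the sample lines and paths are invented. Fluent Bit's multiline parsers are built around a similar first-line / continuation-line rule, though its regex flavour differs, so the pattern may need adjusting.

```python
import re

# Assumption: a new log record begins with a bracketed timestamp such as
# "[21-Jan-2024 10:15:02]" or "[21-Jan-2024 10:15:02 UTC]".
RECORD_START = re.compile(
    r"^\[\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}[^\]]*\]"
)


def group_multiline(lines):
    """Merge continuation lines (e.g. stack trace frames) into the record
    that precedes them, returning one string per logical record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            # A timestamped line (or the very first line) opens a new record.
            records.append(line)
        else:
            # Anything else is treated as a continuation of the last record.
            records[-1] += "\n" + line
    return records


# Invented sample input for illustration only.
sample = [
    "[21-Jan-2024 10:15:02 UTC] PHP Fatal error:  Uncaught Exception: boom in /srv/app/index.php:12",
    "Stack trace:",
    "#0 /srv/app/index.php(20): run()",
    "#1 {main}",
    "  thrown in /srv/app/index.php on line 12",
    "[21-Jan-2024 10:15:03 UTC] PHP Warning:  something minor in /srv/app/lib.php on line 4",
]
grouped = group_multiline(sample)
for record in grouped:
    print(record)
    print("---")
```

With the sample above, the five lines of the fatal error end up in one record and the warning in a second one.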
We need to get logs off prod1 / cache1 / data1 and into something searchable. It would be really nice if we didn't have to run an Elasticsearch cluster for this purpose.
Possible options:
Others?