WIP: Proof of concept using Loki to store and retrieve logs #1540
This is my proof of concept for using Loki to store and retrieve logs. It is based on my work on the metal-lb branch, which cleaned up some of the ways we interact with operators.
Changes
Each Kubernetes namespace becomes an "organization" (tenant) in Loki, so logs are automatically kept isolated between deployments. Promtail picks up the logs automatically and forwards them to Loki. Accessing them is simple: I wrote a client that makes the HTTP requests to Loki directly, since importing Loki's own client is impractical and the requests themselves are not complex. I copied some code from Loki, which uses the same license as our project anyway.
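A minimal sketch of what the client's request-building could look like, assuming Loki's standard `query_range` endpoint and its `X-Scope-OrgID` multi-tenancy header (the base URL and function names here are illustrative, not the actual code in this PR):

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// buildLokiQuery constructs an HTTP request against Loki's query_range
// endpoint. The namespace doubles as the Loki tenant ("organization"),
// passed via the X-Scope-OrgID header, which is how Loki keeps each
// deployment's logs isolated.
func buildLokiQuery(base, namespace, logQL string) (*http.Request, error) {
	u, err := url.Parse(base + "/loki/api/v1/query_range")
	if err != nil {
		return nil, err
	}
	q := u.Query()
	q.Set("query", logQL)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Scope the query to this tenant only.
	req.Header.Set("X-Scope-OrgID", namespace)
	return req, nil
}

func main() {
	req, err := buildLokiQuery("http://loki:3100", "my-deployment-ns", `{app="web"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.Path)
	fmt.Println(req.Header.Get("X-Scope-OrgID"))
}
```

Because isolation comes from the tenant header rather than from the query itself, a caller can never read another namespace's logs by crafting a clever LogQL selector.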
Selecting Loki and supporting it as our only solution makes sense for now: it is an open-source project under active development, designed to work with Kubernetes. It would be nice to support other log-retention solutions eventually, but I don't think that is practical yet.
One neat feature I've added is the ability to identify and limit the logs to a specific run of a container (a new run begins each time the container fails and restarts). This is helpful when a container deploys but fails quickly: an end user can request all the logs from a specific run to figure out why it is failing. It's still on the end user to deploy a container that logs helpful information, but this should make things easier.
One thing I haven't tried to account for is that running Loki should be optional. A provider that mostly hosts computational workloads (miners, or whatever) shouldn't be required to run Loki, so we need to figure out what to do in that case. Should we just fall back to querying the kube logs?
Incomplete / Unresolved