This repo documents a sample implementation of CloudShell centralized logging with the Grafana / Loki / Promtail stack.
CloudShell outputs JSON logs that can be ingested into centralized logging tools such as Elastic or Loki. Centralized logging is useful for tracking sandbox events that occur across multiple servers.
Read more about CloudShell centralized logging in the Quali help article.
Grafana provides the dashboard UI, Loki is the backend log aggregator, and the Promtail agent is installed on all Quali components. These components are analogous to the Elastic stack components Kibana / Elasticsearch / Filebeat, respectively.
Standard Loki Architecture:
CloudShell Integration:
- Configure CloudShell JSON logs by following the Quali help article
- Set up the Grafana server (see the Grafana help article)
- Set up the Loki server. This can be done with a local binary install or a Docker installation
- Install the Promtail agent on the Quali components by downloading the Promtail binary and adding it to each server. See the Grafana help article and get the correct binary from the Grafana GitHub release page
- Configure the Loki server config YAML in the same directory as the binary.
  - See the sample Loki config
  - Run the sample bat file to start the Loki exe
  - See the Grafana help for reference
- Configure the Promtail config YAML to scrape the target logs, then run the agent.
  - See the sample Promtail config
  - Run the sample bat file to start the service
- Configure Loki as data source in Grafana. See Grafana help
- Customize dashboards
- Add the CloudShell Loki service to CloudShell, which gives the option to present Loki data from within the sandbox via Loki API calls
Loki and Grafana use LogQL syntax, which can select log streams by label (such as job) and filter on a target JSON field of the logs.
Sample filter to pull "dispro" logs of a target sandbox ID:
{job="qualiserver"} | json | category="DistributedProvisioning.Execution" |= "b83d649f-319b-4545-bf56-39b65b30668a"
Pull sandbox setup logs via a wildcard filter on the file name:
{filename=~".*b83d649f-319b-4545-bf56-39b65b30668a.*Setup.*",job="sandboxlogs"}
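The two filters above can also be generated per sandbox and sent to Loki's HTTP query API instead of being typed into Grafana. The following is a minimal sketch, not the code shipped in this repo: the base URL `http://localhost:3100` and the helper names `dispro_query`, `setup_query`, and `query_range_url` are assumptions made for illustration. It only builds the query strings and the `query_range` request URL.

```python
from urllib.parse import urlencode

# Assumption: Loki is reachable here (3100 is Loki's default HTTP port).
LOKI_URL = "http://localhost:3100"


def dispro_query(sandbox_id: str) -> str:
    """LogQL for Quali Server dispro logs that mention the sandbox ID."""
    return (
        '{job="qualiserver"} | json '
        '| category="DistributedProvisioning.Execution" '
        f'|= "{sandbox_id}"'
    )


def setup_query(sandbox_id: str) -> str:
    """LogQL matching sandbox log files whose name contains the ID and 'Setup'."""
    return f'{{filename=~".*{sandbox_id}.*Setup.*",job="sandboxlogs"}}'


def query_range_url(logql: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint.

    start_ns and end_ns are Unix epoch timestamps in nanoseconds.
    """
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{LOKI_URL}/loki/api/v1/query_range?{urlencode(params)}"
```

Keeping the sandbox ID as a function argument mirrors how a dashboard variable would be substituted into the same queries.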
Pass a top-level sandbox ID variable into dashboard queries:
An alternative to having Promtail monitor log files is to push data directly into Loki. A good candidate for this workflow is to pull sandbox activity events and push them straight into Loki, with no intermediary files. A custom "Loki Server" shell with these commands is included.
Solution Flow:
- At the end of sandbox setup, pull the sandbox events with a sandbox API call
- Store the latest sandbox event ID in the sandbox data, and push the setup events into Loki
- At the end of teardown, read the last cached event ID from the sandbox data and make an API call to gather the remaining events
- Push the post-setup events into Loki
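The flow above can be sketched in a few functions. This is an illustrative sketch under stated assumptions, not the shell's actual code: the event field `id`, the labels `job="sandbox_events"` and `sandbox_id`, the push URL, and all function names are hypothetical. The payload shape (a `streams` list whose `values` are pairs of a nanosecond timestamp string and a log line) follows Loki's `/loki/api/v1/push` JSON format.

```python
import json
import time
import urllib.request

# Assumption: Loki push endpoint on the default port.
LOKI_PUSH_URL = "http://localhost:3100/loki/api/v1/push"


def new_events(events, last_event_id):
    """Keep events newer than the cached ID (assumes IDs increase over time)."""
    return [e for e in events if e["id"] > last_event_id]


def build_push_payload(events, sandbox_id, now_ns=None):
    """Wrap events as one Loki stream; each event becomes a JSON log line."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    # Offset each timestamp by one nanosecond so the original order is kept.
    values = [[str(now_ns + i), json.dumps(e)] for i, e in enumerate(events)]
    labels = {"job": "sandbox_events", "sandbox_id": sandbox_id}  # hypothetical labels
    return {"streams": [{"stream": labels, "values": values}]}


def push_to_loki(payload, url=LOKI_PUSH_URL):
    """POST the payload to Loki and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # Loki replies 204 No Content on success
```

At teardown, `new_events(all_events, cached_id)` would select only the events that were not already pushed at the end of setup.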
sample flow output:
sandbox events in grafana:
log browser labels:
- Import the "Loki Server" shell and create a resource
- Add all relevant attribute data to the Loki Server resource
  - Sandbox API details (for pulling sandbox events)
  - Loki server details (for pushing to Loki)
- The Loki Server resource does NOT have to be in the sandbox, but it does have to be in the same CloudShell domain
- Customize Setup / Teardown to trigger the proper command on the shell. See examples in the orchestration folder
  - Use try / finally to handle setup errors and then trigger Loki
  - sandbox.suppress_exceptions needs to be set to False so that exceptions bubble up and can be handled properly
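The try / finally pattern from the last two points might look roughly like the sketch below. It assumes `sandbox` is the orchestration `Sandbox` object (with `execute_setup()` and `suppress_exceptions`, as referenced above); `push_events` is a hypothetical callable standing in for whatever triggers the Loki Server shell command. See the orchestration folder for the repo's actual scripts.

```python
def run_setup_then_push(sandbox, push_events):
    """Run sandbox setup and always trigger the Loki push afterwards.

    push_events is a hypothetical callable taking the sandbox; it stands in
    for the call that runs the Loki Server shell command.
    """
    # Let setup exceptions propagate instead of being swallowed,
    # so the finally block runs while the error is still raised.
    sandbox.suppress_exceptions = False
    try:
        sandbox.execute_setup()
    finally:
        # Runs whether setup succeeded or failed, so events from a failed
        # setup still reach Loki before the error surfaces.
        push_events(sandbox)
```

Because the push sits in `finally`, a setup failure is still reported to the user after the events have been sent.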
Dashboards can be explored in Grafana, or Loki shell commands can optionally be triggered directly from the sandbox. The use case is to query selected logs and print them to the sandbox console instead of browsing the Grafana UI.
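As a rough illustration of the "print to console" step, the sketch below flattens a Loki query response into sorted text lines. It assumes the standard `query_range` response shape for log queries (`data.result` is a list of streams, each with `values` as pairs of a nanosecond timestamp string and a log line); the function name `format_streams` is hypothetical and this is not the shell's actual implementation.

```python
from datetime import datetime, timezone


def format_streams(response: dict) -> list:
    """Turn a Loki query_range response into time-ordered console lines."""
    entries = []
    for stream in response["data"]["result"]:
        for ts_ns, line in stream["values"]:
            entries.append((int(ts_ns), line))
    entries.sort()  # oldest first, across all returned streams
    return [
        f"{datetime.fromtimestamp(ts // 1_000_000_000, tz=timezone.utc):%Y-%m-%d %H:%M:%S} {line}"
        for ts, line in entries
    ]
```

The returned lines could then be written to the sandbox console one by one by the shell command.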
Sample command pulling Dispro Quali Server logs: