
Error when testing with grafana-agent multi-subordinate functionality. #43

Closed
dashmage opened this issue Jul 26, 2023 · 1 comment · Fixed by #44

dashmage (Contributor) commented Jul 26, 2023

Issue

The hardware-observer charm fails when testing the multi-subordinate functionality of the grafana-agent charm from this branch. Its status becomes blocked with the message "Exporter is unhealthy".

Setup

grafana-agent is related to both zookeeper and hardware-observer over the cos-agent interface.

```shell
juju deploy zookeeper
juju deploy hardware-observer --channel edge
# grafana-agent charm built from the "multi-sub" branch
juju deploy ./grafana-agent.charm

juju relate grafana-agent zookeeper
juju relate hardware-observer zookeeper
juju relate hardware-observer grafana-agent

# COS is set up on another k8s cloud
juju relate grafana-agent ashley/cos.prometheus
```

COS Setup
The microk8s charm is deployed in the same model, and COS is deployed on the resulting Kubernetes cluster. That k8s cloud is added to the same controller using the juju add-k8s command.

```shell
juju deploy microk8s
juju config microk8s addons="dns ingress hostpath-storage metallb:<public-ip-of-machine>-<public-ip-of-machine>"
```

```shell
cat containerd_env
# ---
ulimit -n 65536 || true
ulimit -l 16384 || true

HTTP_PROXY=http://squid.internal:3128
HTTPS_PROXY=http://squid.internal:3128
NO_PROXY=127.0.0.1,localhost,::1,10.130.11.0/24,10.130.12.0/24,10.130.13.0/24,10.152.183.0/24,api.jujucharms.com,api.charmhub.io
https_proxy=http://squid.internal:3128
http_proxy=http://squid.internal:3128
no_proxy=127.0.0.1,localhost,::1,10.130.11.0/24,10.130.12.0/24,10.130.13.0/24,10.152.183.0/24,api.jujucharms.com,api.charmhub.io
# ---
```
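To check which destinations the NO_PROXY list in containerd_env lets containerd reach directly instead of through squid.internal, a minimal sketch like the following can help. It is not part of any charm: parse_env and bypasses_proxy are hypothetical helpers, and the matching rule (IP or CIDR containment, or an exact/suffix hostname match) is a simplified approximation of how Go-based tools such as containerd interpret NO_PROXY.

```python
import ipaddress

# Proxy-related lines from containerd_env (the ulimit line is kept to show it is skipped).
CONTAINERD_ENV = """\
ulimit -n 65536 || true
HTTP_PROXY=http://squid.internal:3128
NO_PROXY=127.0.0.1,localhost,::1,10.130.11.0/24,10.130.12.0/24,10.130.13.0/24,10.152.183.0/24,api.jujucharms.com,api.charmhub.io
"""


def parse_env(text: str) -> dict:
    """Collect KEY=VALUE assignments, skipping blanks, comments and shell commands."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if " " in key:  # a command that happens to contain "=", not an assignment
            continue
        env[key] = value
    return env


def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Simplified NO_PROXY check: IP/CIDR containment or hostname/suffix match."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        addr = None  # host is a name, not a literal address
    for entry in (e.strip() for e in no_proxy.split(",")):
        if not entry:
            continue
        if addr is not None:
            try:
                # strict=False tolerates entries with host bits set, e.g. 10.130.11.5/24
                if addr in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                continue  # entry is a hostname such as api.charmhub.io
        elif host == entry or host.endswith("." + entry):
            return True
    return False


env = parse_env(CONTAINERD_ENV)
print(bypasses_proxy("10.152.183.10", env["NO_PROXY"]))
```

An IPv4 address is never reported as contained in an IPv6 entry such as ::1, because the ipaddress module treats mixed-version containment as false.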

```shell
juju config microk8s containerd_env=@containerd_env

juju ssh microk8s/leader -- microk8s config > ~/.kube/config

# add microk8s cloud
juju add-k8s micro -c ct-maas-ctrl

# Add new model for cos in the newly set up cloud
juju add-model cos micro

juju deploy cos-lite --channel edge --trust
juju offer prometheus:receive-remote-write
```
```
juju status --relations
(...)
zookeeper/3*             active    idle   4        10.1.11.46
  grafana-agent/14*      active    idle            10.1.11.46                                grafana-cloud-config: off, logging-consumer: off, grafana-dashboards-provider: off
  hardware-observer/24*  blocked   idle            10.1.11.46                                Exporter is unhealthy

Relation provider                Requirer                         Interface                Type         Message
grafana-agent:peers              grafana-agent:peers              grafana_agent_replica    peer
hardware-observer:cos-agent      grafana-agent:cos-agent          cos_agent                subordinate
microk8s:cluster                 microk8s:cluster                 microk8s-cluster         peer
prometheus:receive-remote-write  grafana-agent:send-remote-write  prometheus_remote_write  regular
zookeeper:cluster                zookeeper:cluster                cluster                  peer
zookeeper:cos-agent              grafana-agent:cos-agent          cos_agent                subordinate
zookeeper:juju-info              hardware-observer:general-info   juju-info                subordinate
zookeeper:restart                zookeeper:restart                rolling_op               peer
```
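The blocked subordinate can also be picked out programmatically, which is handy when polling for a failure that only appears a few minutes after the relation is added. The sketch below assumes the nested layout that juju status --format=json uses (applications, then units, then subordinates, each carrying a workload-status with current and message), as we understand it; blocked_units is a hypothetical helper and SAMPLE_STATUS is a hand-written sample modelled on the output above, not captured output.

```python
import json

# Hand-written sample mirroring the tabular status above (assumed JSON layout).
SAMPLE_STATUS = json.loads("""
{
  "applications": {
    "zookeeper": {
      "units": {
        "zookeeper/3": {
          "workload-status": {"current": "active"},
          "subordinates": {
            "grafana-agent/14": {"workload-status": {"current": "active"}},
            "hardware-observer/24": {
              "workload-status": {"current": "blocked", "message": "Exporter is unhealthy"}
            }
          }
        }
      }
    },
    "hardware-observer": {"subordinate-to": ["zookeeper"]}
  }
}
""")


def blocked_units(status: dict) -> list:
    """Return (unit, message) pairs for principal or subordinate units that are blocked."""
    found = []
    for app in status.get("applications", {}).values():
        # Subordinate applications list no units of their own; they appear under principals.
        for unit_name, unit in app.get("units", {}).items():
            candidates = [(unit_name, unit), *unit.get("subordinates", {}).items()]
            for name, info in candidates:
                workload = info.get("workload-status", {})
                if workload.get("current") == "blocked":
                    found.append((name, workload.get("message", "")))
    return found


print(blocked_units(SAMPLE_STATUS))
```

Against a live model, the input would come from json.loads on the stdout of juju status --format=json instead of the sample.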

Error

```
unit-hardware-observer-24: 18:28:09 ERROR unit.hardware-observer/24.juju-log cos-agent:41: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/./src/charm.py", line 173, in <module>
    ops.main(HardwareObserverCharm)  # type: ignore
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/main.py", line 454, in __call__
    return main(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/framework.py", line 833, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/venv/ops/framework.py", line 922, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/./src/charm.py", line 148, in _on_cos_agent_relation_joined
    self.exporter.start()
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/src/service.py", line 36, in wrapper
    return_value = func(self, *args, **kwargs)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/src/service.py", line 154, in start
    systemd.service_start(EXPORTER_NAME)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/lib/charms/operator_libs_linux/v1/systemd.py", line 156, in service_start
    return _systemctl("start", service_name)
  File "/var/lib/juju/agents/unit-hardware-observer-24/charm/lib/charms/operator_libs_linux/v1/systemd.py", line 125, in _systemctl
    raise SystemdError(
charms.operator_libs_linux.v1.systemd.SystemdError: Could not start hardware-exporter: systemd output: See "systemctl status hardware-exporter.service" and "journalctl -xeu hardware-exporter.service" for details.

unit-hardware-observer-24: 18:28:10 ERROR juju.worker.uniter.operation hook "cos-agent-relation-joined" (via hook dispatching script: dispatch) failed: exit status 1
```
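The SystemdError only points at systemctl status and journalctl, so the actual cause stays on the machine. A minimal sketch of a hypothetical helper that starts a service and, on failure, attaches the tail of its journal to the raised error (so it would surface in the Juju debug log) looks like this. It shells out to systemctl and journalctl directly and is neither the charm's service.py nor the operator_libs_linux systemd API; the runner parameter exists only so the logic can be exercised without systemd.

```python
import subprocess


def _run(cmd):
    """Run a command without raising on failure, capturing text output."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


def start_with_diagnostics(service, runner=_run, lines=20):
    """Start a systemd service; on failure, raise with the tail of its journal attached."""
    result = runner(["systemctl", "start", service])
    if result.returncode == 0:
        return "started"
    # Fetch the most recent journal lines for the unit without invoking a pager.
    journal = runner(
        ["journalctl", "-u", f"{service}.service", "-n", str(lines), "--no-pager"]
    )
    raise RuntimeError(
        f"Could not start {service} (exit {result.returncode}).\n"
        f"Last {lines} journal lines:\n{journal.stdout}"
    )
```

With a journal like the one posted in the follow-up comment, the raised message would carry the validation errors themselves rather than only a pointer to journalctl.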

Other Notes

  • The error does not appear immediately: after the cos-agent relation between grafana-agent and hardware-observer is added, the status is initially active, and the error appears a few minutes later.
  • The zookeeper metrics are visible on COS Prometheus but the hardware-observer alerts/metrics are not present.
dashmage (Contributor, Author) commented Jul 27, 2023

Here is the output from systemctl status and journalctl for hardware-exporter.service running on the machine.

I found pydantic validation errors in the journalctl output:

```
ubuntu@rozary:~$ journalctl -xeu hardware-exporter.service
Jul 27 10:50:13 rozary python3[2053576]: pydantic.error_wrappers.ValidationError: 2 validation errors for Config
Jul 27 10:50:13 rozary python3[2053576]: redfish_username
Jul 27 10:50:13 rozary python3[2053576]:   none is not an allowed value (type=type_error.none.not_allowed)
Jul 27 10:50:13 rozary python3[2053576]: redfish_password
Jul 27 10:50:13 rozary python3[2053576]:   none is not an allowed value (type=type_error.none.not_allowed)

(...)
```
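These journal lines name exactly which settings failed validation. A small parser can extract them; it is sketched under the assumption that the messages follow the pydantic v1 ValidationError layout seen above (field name on one line, an indented message ending in (type=...) on the next), and failed_fields is a hypothetical helper.

```python
import re

JOURNAL = """\
Jul 27 10:50:13 rozary python3[2053576]: pydantic.error_wrappers.ValidationError: 2 validation errors for Config
Jul 27 10:50:13 rozary python3[2053576]: redfish_username
Jul 27 10:50:13 rozary python3[2053576]:   none is not an allowed value (type=type_error.none.not_allowed)
Jul 27 10:50:13 rozary python3[2053576]: redfish_password
Jul 27 10:50:13 rozary python3[2053576]:   none is not an allowed value (type=type_error.none.not_allowed)
"""

# Strips the "Mon DD HH:MM:SS host process[pid]: " prefix of short-format journal lines.
PREFIX = re.compile(r"^\w{3} +\d+ [\d:]{8} \S+ \S+?\[\d+\]: ?")
ERROR_TYPE = re.compile(r"\(type=([\w.]+)\)")


def failed_fields(journal: str) -> dict:
    """Map each field named in a pydantic v1 ValidationError to its error type."""
    result = {}
    field = None
    for raw in journal.splitlines():
        msg = PREFIX.sub("", raw)
        if msg.startswith("pydantic"):
            continue  # the summary line, e.g. "2 validation errors for Config"
        match = ERROR_TYPE.search(msg)
        if match and field:
            result[field] = match.group(1)
            field = None
        elif msg and not msg.startswith(" "):
            field = msg.strip()  # an unindented line names the failing field
    return result


print(failed_fields(JOURNAL))
```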

dashmage added a commit to dashmage/hardware-observer-operator that referenced this issue Jul 27, 2023
These quotes are added to fix the pydantic error caused while reading
the yaml file. If the quotes are absent for the redfish keys, the
default value of empty string is registered as a None type while using
safe_load and reading the yaml file in prometheus-hardware-exporter.

Fixes canonical#43
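The fix relies on YAML reading a key with no value (redfish_username:) as null, while a quoted empty string (redfish_username: "") stays a string. A minimal sketch of rendering the exporter config so that every string value is always quoted follows. It assumes a flat key/value config; render_exporter_config and the port key are hypothetical and not taken from the charm's actual template.

```python
import json


def render_exporter_config(options: dict) -> str:
    """Render a flat YAML mapping with every value JSON-encoded.

    A JSON string is also a valid double-quoted YAML scalar, so an empty
    credential is written as "" and read back as an empty string, not null.
    Numbers and booleans come out as plain YAML scalars.
    """
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in options.items())


# "port" is a hypothetical key used only to show that non-strings stay unquoted.
print(render_exporter_config({"port": 10000, "redfish_username": "", "redfish_password": ""}))
```

json.dumps is used rather than wrapping the value in quotes by hand because it also escapes quotes and backslashes inside a password, which a plain f'"{value}"' would leave broken.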
jneo8 closed this as completed in #44 Jul 28, 2023
jneo8 pushed a commit that referenced this issue Jul 28, 2023