Complex: At one customer, where other team members had to work with our roles, it was very hard for less Ansible-experienced people to grasp the inner workings, mainly for two reasons. One is related to this file; the other (related) reason, that groups have to be generated from the blueprint config, is harder to simplify, as it is a 'natural' derivation/dependency.
Monolithic: All possible facts are set in one file, even though most of the vars are only needed in some parts (roles).
E.g.: The whole set of *_groups and *_hosts vars is only used in 2 places:
role blueprint: In the dynamic blueprint template (for good reason)
role database: only to configure DB access rules, which hosts need to access which DB (This one is a nice idea, but IMO overkill, and just allowing any cluster nodes access to any HDP DB is easier, and probably fine for most people)
Slow + Verbose: (Especially) during development on these roles, one often wants to quickly test only a subset (e.g. an ambari-server config change), but currently there's no way around always running the included
The same set_variables tasks are run for each (sub)playbook (of install-ambari), even if the facts have already been set. This just multiplies the output and slows down the playbook, which makes a difference during development.
Note: I already solved this (sub)issue (in one custom(er) fork) by having one large install-ambari playbook (without sub-playbooks, but using tags), where set_variables.yml is run only once (but then requires tags: always)
Analysis
The set_variables.yml file is already divided into 3 hosts sections (plays), along which it can naturally be split up:
P1) lines 2-77
name: Create the required Ansible groups
hosts: localhost , connection: local
P2) lines 78-353:
name: Create the Ansible helper variables for the ambari-server
hosts: ambari-server
P2a) lines 78-277: dynamic blueprint vars
P2b) lines 278-353: static blueprint vars
P3) lines 355-end
name: Create the Ansible helper variables for all nodes
hosts: hadoop-cluster
Implementation Ideas
Next I want to go into more detail on ideas for each part, to improve some of the issues listed above.
P1
This part needs to be run for all roles (except maybe 'common'), because it configures the important (often used) ambari-server group.
IDEAs:
Blocks 15-49: The ambari-server group creation can also re-use the same Ansible filter as in P2
Block 25-49: Move the blueprint_static validation to an earlier task, to keep only ambari-server group creation logic
Additional check (I might need): Only run the group creation logic if an ambari-server group does NOT yet exist.
P2
This part contains the most set_fact tasks, so simplifying/condensing it could fix most of the slowness, verbosity and duplication issues of this tasks file mentioned above.
IDEA: (Instead of copy&pasting the same logic over&over) use some simple custom ansible filters (written in a few lines of python).
We might need up to 2 * 2 filters:
factor 2 because of the dynamic/static blueprint logic
factor 2 to handle both var types: *_groups and *_hosts
Though the latter split could be avoided by deriving the *_hosts variables from the *_groups vars (again by another filter)
So we need either 2*2 or 2+1 filters (if my idea here works out)
To illustrate what the filter API would look like, an example for the zk vars:
- name: Initialize the control variables
  set_fact:
    zookeeper_groups: "{{ blueprint_dynamic | get_groups(['ZOOKEEPER_SERVER']) }}"
    zookeeper_hosts: []
    hdf_hosts: "{{ blueprint_dynamic | get_hosts(['NIFI_MASTER', 'STREAMLINE_SERVER', 'REGISTRY_SERVER']) }}"
I see only 1 special case, with 1 extra condition, which can be handled by a separate task (or maybe better, move the extra checks like 'database==embedded' directly into the blueprint).
Advantages:
we can do the logic in ONE set_fact task, nice!
avoid a lot of duplication etc
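A minimal sketch of how the get_groups / get_hosts filters from the example above could look as an Ansible filter plugin. Several things here are assumptions and not taken from the POC: that each blueprint_dynamic entry is a dict with a `host_group` name and a `services` list, that the file lives in a `filter_plugins/` directory, and that get_hosts receives the inventory `groups` mapping as an extra (hypothetical) argument, which differs from the one-argument call shown above. It derives the hosts from the groups, as suggested for the 2+1 variant.

```python
# filter_plugins/blueprint_filters.py (hypothetical file name)


def get_groups(blueprint, components):
    """Return the host_group names whose services include any of the components.

    Assumes every blueprint entry looks like:
    {"host_group": "hdp-master", "services": ["ZOOKEEPER_SERVER", ...]}
    """
    wanted = set(components)
    return [
        entry["host_group"]
        for entry in blueprint
        if wanted & set(entry.get("services", []))
    ]


def get_hosts(blueprint, components, inventory_groups):
    """Return the hosts of all matching host groups, without duplicates.

    inventory_groups is assumed to be Ansible's `groups` magic variable,
    i.e. a mapping of group name -> list of host names.
    """
    hosts = []
    for group_name in get_groups(blueprint, components):
        for host in inventory_groups.get(group_name, []):
            if host not in hosts:
                hosts.append(host)
    return hosts


class FilterModule(object):
    """Entry point Ansible looks for when loading filter plugins."""

    def filters(self):
        return {"get_groups": get_groups, "get_hosts": get_hosts}
```

With this sketch, a task would call the second filter as `blueprint_dynamic | get_hosts(['ZOOKEEPER_SERVER'], groups)`; the static blueprint variant would need its own pair of functions.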
P2+P3 vars
The following vars are used in the roles ansible-config and blueprint (and 1 occurrence in common, post-install):
install_hdp
install_hdf
install_hdpsearch
IDEAs (WIP):
These could be replaced by group_vars (still generated from the blueprint).
At minimum, it should be possible to set them for all nodes from the start (and not first in section P2, and later again in P3).
IMO these vars would ideally be (undefined) role vars, configurable by the user, and only 'auto' set if not pre-configured.
P3 ansible_python_interpreter
Question: I agree it's nice to prefer python3 (if found on the server) over the older v2, but isn't it better to have this part configured through ansible.cfg and let the user control it? That would also allow using the right Ansible feature for this: https://docs.ansible.com/ansible/latest/reference_appendices/interpreter_discovery.html
Other benefit (if we can get rid of this part): "check mode" would no longer fail at this point.
A POC for the most promising part (P2) can be reviewed here: scigility#2
PS: The change is already used successfully at a customer (where I could also remove P2b, as only the 'blueprint_dynamic' method is required)
Next Steps
Unless I learn about unforeseen blockers to the ideas above, I plan to try out the refactorings in the following order: