[Proposal] split up the `set_variables.yml` monolith #169

lhoss · 2019-09-02T16:29:57Z

Motivations

Complex: At one customer, where other team members had to work with our roles, it was very hard for less Ansible-experienced people to grasp the inner workings, mainly for two reasons. One of them is this file. The other (related) reason, that groups have to be generated from the blueprint config, is harder to simplify, as it is a 'natural' derivation/dependency.
Monolithic: All possible facts are set in one file, even though most of the vars are only needed in some parts (roles).
E.g.: The whole set of *_groups and *_hosts vars is only used in 2 places:
- 1. role blueprint: in the dynamic blueprint template (for good reason)
- 2. role database: only to configure DB access rules, i.e. which hosts need to access which DB (this one is a nice idea, but IMO overkill; just allowing any cluster node access to any HDP DB is easier, and probably fine for most people)
Slow + Verbose: Especially during development on these roles, one often wants to quickly test only a subset (e.g. an ambari-server cfg change), but currently there's no way around always running the included set_variables.yml.
The same set_variables tasks are run for each (sub)playbook (of install-ambari), even if the facts have already been set. This just multiplies the output and slows down the playbook (which makes a difference during development).
- Note: I already solved this (sub)issue in 1 custom(er) fork by having 1 large install-ambari playbook (without sub playbooks, but using tags), where set_variables.yml is only run once (but then requires the tag `always`)

Analysis

The set_variables.yml file is already divided into 3 host sections (plays), along which it can naturally be split up:

P1) lines 2-77:
name: Create the required Ansible groups
hosts: localhost, connection: local
P2) lines 78-353:
name: Create the Ansible helper variables for the ambari-server
hosts: ambari-server
- P2a) lines 78-277: dynamic blueprint vars
- P2b) lines 278-353: static blueprint vars
P3) lines 355-end:
name: Create the Ansible helper variables for all nodes
hosts: hadoop-cluster

Implementation Ideas

Next I want to go into more detail on ideas for each part, to address some of the issues listed above.

P1

This part needs to be run for all roles (except maybe 'common'), because it configures the important, often-used ambari-server group.
IDEAs:

Blocks 15-49: The ambari-server group creation can also re-use the same Ansible filter as in P2
Block 25-49: Move the blueprint_static validation to an earlier task, so that only the ambari-server group creation logic remains here
Additional check (I might need): Only run the group creation logic if an ambari-server group does NOT yet exist.

P2

This part contains the most set_fact tasks, so simplifying/condensing it could fix most of the slowness, verbosity and duplication issues of this tasks file mentioned above.

IDEA: Instead of copy&pasting the same logic over and over, use some simple custom Ansible filters (written in a few lines of Python).
We might need up to 2 * 2 filters:

1. factor 2 because of the dynamic/static blueprint logic
2. factor 2 to handle both var types: *_groups and *_hosts
  Though the latter split could be avoided by deriving the *_hosts variables from the *_groups vars (again by another filter)
  So we need either 2*2 or 2+1 filters (if my idea here works out)

To illustrate what the filter API would look like, here is an example for the zk vars:

    # get_groups / get_hosts are the proposed custom filters (they do not exist yet)
    - name: Initialize the control variables
      set_fact:
        zookeeper_groups: "{{ blueprint_dynamic | get_groups(['ZOOKEEPER_SERVER']) }}"
        zookeeper_hosts: []
        hdf_hosts: "{{ blueprint_dynamic | get_hosts(['NIFI_MASTER', 'STREAMLINE_SERVER', 'REGISTRY_SERVER']) }}"

I see only 1 special case, with 1 extra condition, which can be handled by a separate task (or maybe better, by moving the extra checks like 'database==embedded' directly into the blueprint).

Advantages:

we can do the logic in ONE set_fact task, nice!
avoid a lot of duplication etc
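
As a rough illustration of the "2+1" variant for the dynamic blueprint, the filters could be sketched as a small filter plugin like the one below. This is only a sketch under assumptions, not the code from the POC: it assumes each `blueprint_dynamic` entry is a dict with a `host_group` name and a `services` list, the filter name `hosts_from_groups` is hypothetical, and the inventory `groups` mapping is passed in explicitly (which differs slightly from the one-argument `get_hosts` shown above). Unlike the current set_fact tasks, it does not check that a host group actually exists in the inventory and is non-empty.

```python
# filter_plugins/blueprint_filters.py (hypothetical file name)


def get_groups(blueprint, services):
    """Return the host_group names whose services include any of `services`.

    Assumes `blueprint` is a list of dicts, each with a 'host_group' key
    and an optional 'services' list of component names.
    """
    wanted = set(services)
    return [
        entry["host_group"]
        for entry in blueprint
        if wanted & set(entry.get("services", []))
    ]


def hosts_from_groups(group_names, inventory_groups):
    """Derive a de-duplicated, order-preserving host list from group names.

    `inventory_groups` is expected to look like Ansible's `groups` variable:
    a mapping of group name to a list of host names.
    """
    hosts = []
    for name in group_names:
        for host in inventory_groups.get(name, []):
            if host not in hosts:
                hosts.append(host)
    return hosts


class FilterModule(object):
    """Expose the functions above as Jinja2 filters to Ansible."""

    def filters(self):
        return {
            "get_groups": get_groups,
            "hosts_from_groups": hosts_from_groups,
        }
```

With such a plugin in a `filter_plugins/` directory next to the playbooks, a single set_fact task could compute `zookeeper_groups` via `get_groups` and then something like `zookeeper_groups | hosts_from_groups(groups)` for the matching hosts.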

P2+P3 vars

The following vars are used in the roles ansible-config and blueprint (and have 1 occurrence each in common and post-install):
- install_hdp
- install_hdf
- install_hdpsearch
IDEAs (WIP):

These could be replaced by group_vars (still generated from the blueprint).
At minimum it should be possible to set them for all nodes from the start (rather than first in section P2 and later again in P3).
IMO these vars would ideally be (undefined) role vars, configurable by the user, and only set automatically if not pre-configured.
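
The "only set automatically if not pre-configured" behaviour could be sketched as one more small filter function. Everything here is an assumption for illustration: the function name `resolve_install_flag` is hypothetical, the blueprint entries are assumed to be dicts with a `services` list, and the marker service used in the example (`NAMENODE` for install_hdp) is only a plausible choice, not taken from the roles.

```python
def resolve_install_flag(user_value, blueprint, marker_services):
    """Return the user's setting if one was given, else derive it.

    `user_value` is None when the user did not configure the flag.
    In that case the flag becomes True if any blueprint entry lists
    one of `marker_services` among its services.
    """
    if user_value is not None:
        # An explicit user setting always wins over auto-detection.
        return user_value
    markers = set(marker_services)
    return any(markers & set(entry.get("services", [])) for entry in blueprint)


# Example: the user has not configured install_hdp, so it is derived.
blueprint = [
    {"host_group": "hdp-master", "services": ["NAMENODE", "ZOOKEEPER_SERVER"]},
]
install_hdp = resolve_install_flag(None, blueprint, ["NAMENODE"])
```

Registered as a filter, this would presumably be called with the possibly undefined var piped through `default(none)` first, so that an unset flag reaches the function as None.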

P3 `ansible_python_interpreter`

Question: I agree it's nice to prefer python3 (if found on the server) over the older v2, but isn't it better to have this part configured through ansible.cfg and let the user control it? That would also allow using the right Ansible feature for this: https://docs.ansible.com/ansible/latest/reference_appendices/interpreter_discovery.html

Other benefit (if we can get rid of this part): "check mode" would no longer fail already at this point.
ps: Need to consider the related discussion from older (closed) PR: Replace a bash command with "stat" module to check if python exists #90

Next Steps

Unless I learn about some unforeseen blockers to the above ideas, I plan to try out the refactorings in the following order:

1. P2: trying out Ansible filters
2. splitting the 1 large file into 3 files (one per host section) and including them only where needed
3. P1 or P3


lhoss · 2019-09-13T08:23:00Z

The POC for the most promising part (P2) can be reviewed here: scigility#2
ps: The change is already used successfully at a customer (where I could also remove P2b, as only the 'blueprint_dynamic' method is required)

This is one of the best uses of (custom) 'Ansible Filters' I've done myself 👍
Wdyt @alexandruanghel , @agriffaut , @zer0glitch ?!

lhoss mentioned this issue Sep 13, 2019

[WIP] set_variables monolith' refactor: P2 POC (dynamic blueprint vars) scigility/ansible-hortonworks#2

Open
