[Proposal] split up the set_variables.yml monolith #169

Open
lhoss opened this issue Sep 2, 2019 · 1 comment
lhoss commented Sep 2, 2019

Motivations

  • Complex: At one customer, where other team members had to work with our roles, less Ansible-experienced people found the inner workings very hard to grasp, for mainly 2 reasons. One is this file. The other (related) reason is that the groups have to be generated from the blueprint config, which is harder to simplify, as it is a 'natural' derivation/dependency.
  • Monolithic: All possible facts are set in one file, even though most of the vars are only needed in some parts (roles).
    E.g. the whole set of *_groups and *_hosts vars is only used in 2 places:
      1. role blueprint: in the dynamic blueprint template (for good reason)
      2. role database: only to configure DB access rules, i.e. which hosts need to access which DB (a nice idea, but IMO overkill; just allowing any cluster node access to any HDP DB is easier, and probably fine for most people)
  • Slow + Verbose: Especially during development on these roles, one often wants to quickly test only a subset (e.g. an ambari-server cfg change), but currently there's no way around always running the included set_variables.yml.
  • The same set_variables tasks are run for each (sub)playbook (of install-ambari), even if the facts have already been set. This multiplies the output and slows down the playbook (which makes a difference during development).
    • Note: I already solved this (sub)issue in one custom(er) fork by having 1 large install-ambari playbook (without sub-playbooks, but using tags), where set_variables.yml is only run once (but then requires the 'always' tag)

Analysis

The set_variables.yml file is already divided into 3 host sections, along which it can naturally be split up:

  • P1) lines 2-77
    name: Create the required Ansible groups
    hosts: localhost , connection: local
  • P2) lines 78-353:
    name: Create the Ansible helper variables for the ambari-server
    hosts: ambari-server
    • P2a) lines 78-277: dynamic blueprint vars
    • P2b) lines 278-353: static blueprint vars
  • P3) lines 355-end
    name: Create the Ansible helper variables for all nodes
    hosts: hadoop-cluster

Implementation Ideas

Below I go into more detail on ideas for each part that address some of the issues listed above.

P1

This part needs to be run for all roles (except maybe 'common'), because it configures the important, often-used ambari-server group.
IDEAs:

  • Blocks 15-49: The ambari-server group creation can also re-use the same Ansible filter as in P2
  • Block 25-49: Move the blueprint_static validation to an earlier task, to keep only ambari-server group creation logic
  • Additional check (I might need): Only run the group creation logic if an ambari-server group does NOT yet exist.

P2

This part contains the most set_fact tasks, so simplifying/condensing it could fix most of the slowness, verbosity and duplication issues of this tasks file mentioned above.

IDEA: Instead of copy&pasting the same logic over and over, use some simple custom Ansible filters (written in a few lines of Python).
We might need up to 2 * 2 filters:

    1. factor 2 because of the dynamic/static blueprint logic
    2. factor 2 to handle both var types: *_groups and *_hosts
      The latter split could be avoided, though, by deriving the *_hosts variables from the *_groups vars (again with another filter).
      So we need either 2 * 2 or 2 + 1 filters (if my idea here works out).

To illustrate what the filter API would look like, an example for the zk (and hdf) vars:

    # get_groups / get_hosts are the proposed custom filters (they do not exist yet)
    - name: Initialize the control variables
      set_fact:
        zookeeper_groups: "{{ blueprint_dynamic | get_groups(['ZOOKEEPER_SERVER']) }}"
        zookeeper_hosts: []
        hdf_hosts: "{{ blueprint_dynamic | get_hosts(['NIFI_MASTER', 'STREAMLINE_SERVER', 'REGISTRY_SERVER']) }}"

I see only 1 special case, with 1 extra condition, which can be handled by a separate task (or maybe better: move extra checks like 'database==embedded' directly into the blueprint).

Advantages:

  • we can do the logic in ONE set_fact task, nice!
  • avoid a lot of duplication etc
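
For illustration, a minimal sketch of how such a filter plugin could look. It is not the code from the POC, and it rests on assumptions: blueprint_dynamic is taken to be a list of dicts with a `host_group` name and a `services` list, the file name is made up, and `hosts_from_groups` is a hypothetical name for the "+1" filter that derives *_hosts from *_groups.

```python
# filter_plugins/blueprint_filters.py  (hypothetical file name)
# Assumption: each blueprint entry looks like
#   {"host_group": "hdp-master", "services": ["ZOOKEEPER_SERVER", ...]}


def get_groups(blueprint, components):
    """Return the names of host groups running at least one of the components."""
    wanted = set(components)
    return [
        host_group["host_group"]
        for host_group in blueprint
        if wanted & set(host_group.get("services", []))
    ]


def hosts_from_groups(group_names, inventory_groups):
    """Flatten the given inventory groups into a sorted, de-duplicated host list."""
    hosts = set()
    for name in group_names:
        hosts.update(inventory_groups.get(name, []))
    return sorted(hosts)


class FilterModule(object):
    """Entry point that Ansible looks for in a filter plugin file."""

    def filters(self):
        return {
            "get_groups": get_groups,
            "hosts_from_groups": hosts_from_groups,
        }
```

With something like this in place, the hosts var could be derived in the same set_fact task, e.g. by passing Ansible's `groups` magic variable: `zookeeper_hosts: "{{ zookeeper_groups | hosts_from_groups(groups) }}"`.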

P2+P3 vars

The following vars are used in the roles ansible-config and blueprint (and 1 occurrence in common, post-install):

  • install_hdp
  • install_hdf
  • install_hdpsearch

IDEAs (WIP):

  • These could be replaced by group_vars (generated still from the blueprint).
  • At minimum it should be possible to set them for all nodes from the start (and not first in section P2, and later P3)
  • IMO these vars would ideally be (undefined) role vars, configurable by the user, and only set automatically if not pre-configured.
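
The "only set automatically if not pre-configured" idea could be sketched roughly as below. This is an assumption-laden illustration, not existing code: `auto_flag` is a hypothetical helper, `None` stands in for "the user did not configure the var", and the blueprint is again assumed to be a list of dicts with a `services` list.

```python
def auto_flag(user_value, blueprint, marker_components):
    """Keep a user-configured flag; otherwise derive it from the blueprint.

    user_value: True/False if the user set the var, None if left undefined.
    marker_components: components whose presence implies the flag,
    e.g. ["NIFI_MASTER"] for install_hdf (an illustrative choice).
    """
    if user_value is not None:
        return user_value  # explicit configuration always wins
    present = set()
    for host_group in blueprint:
        present.update(host_group.get("services", []))
    return bool(present & set(marker_components))
```

The design choice here is that an explicit `False` is respected even when the blueprint contains the marker component, so the user stays in control.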

P3 ansible_python_interpreter

Question: I agree it's nice to prefer python3 (if found on the server) over the older v2, but isn't it better to have this part configured through ansible.cfg and let the user control it? That would also allow using the right Ansible feature for this: https://docs.ansible.com/ansible/latest/reference_appendices/interpreter_discovery.html

Next Steps

Unless I learn about unforeseen blockers to the above ideas, I plan to try out the refactorings in the following order:

  • P2 trying out ansible filters
  • splitting the 1 large file into 3 files (per host section) and include only where needed
  • P1 or P3

lhoss commented Sep 13, 2019

A POC for the most promising part (P2) can be reviewed here: scigility#2
PS: The change is already used successfully at a customer (where I could also remove P2b, as only the 'blueprint_dynamic' method is required)

This is one of the best uses of (custom) 'Ansible Filters' I've done myself 👍
Wdyt @alexandruanghel , @agriffaut , @zer0glitch ?!
