Sanitize /etc/hosts and hostnames on all our cloud machines #131

Closed
smlambert opened this issue Jan 15, 2018 · 23 comments

@smlambert
Contributor

Some jdk_jdi tests are failing due to a machine config issue, and therefore excluded.

Grabbing comment from:
adoptium/aqa-tests#132

It's a machine configuration issue.
Assign a host name to the machine by following the steps below:
Choose Apple menu -> System Preferences, then click Sharing.
Click Edit, then enter a local hostname.
Add this machine name entry in /etc/hosts file with machine ip address.
Eg: 127.0.0.1 mymachine
Reboot the machine to reflect the changes.
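
To verify the change afterwards, a minimal Python sketch along these lines could help. It is not part of the quoted comment: the function names `resolve` and `describe` are made up for illustration, and it only asks the system resolver the same question that fails with "unknown host" when the entry is missing.

```python
import socket


def resolve(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if it cannot be resolved."""
    try:
        return resolver(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return None


def describe(hostname, resolver=socket.gethostbyname):
    """Build a one-line, human-readable report for a hostname lookup."""
    address = resolve(hostname, resolver)
    if address is None:
        return f"{hostname}: unknown host (check /etc/hosts)"
    return f"{hostname} -> {address}"
```

On the machine itself, `print(describe(socket.gethostname()))` should report an address once the /etc/hosts entry is in place. The `resolver` parameter exists only so the lookup can be swapped out when trying the function away from the machine.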

@karianna karianna added the bug label Jan 15, 2018
@smlambert
Contributor Author

FYI @gdams
I don't think I can make this change, as I won't have privileges to reboot the machine.

@bblondin
Contributor

The test example provided talks about macOS. Is this issue limited to macOS, or do similar tests that run on Linux also rely on the /etc/hosts file to retrieve the hostname?

If we have tests that require the system's hostname to be listed in the /etc/hosts file (as opposed to calling the 'hostname' command), then we will also need to ensure this change is made to all active systems and that the needed changes are applied to the playbooks.

@akolarkunnu

It is not limited to macOS; on every OS it relies on the configuration in the /etc/hosts file. Right now our Linux test machines are configured properly, so this issue (adoptium/aqa-tests#132) is applicable/reproducible only on macOS.

@sxa
Member

sxa commented Jan 17, 2018

FYI I've had comparable issues with the /etc/hosts for the JCK as well. I haven't yet adjusted those playbooks to do the right thing. (Some Ubuntus are coming to us out of the box with the system's hostname against 127.0.1.1 instead of 127.0.0.1, which causes problems.)

@gdams
Member

gdams commented Jan 17, 2018

Perhaps we should remove /etc/hosts from each machine and then use ansible to template it so that it is the same across all of our providers?

@sxa
Member

sxa commented Jan 17, 2018

Maybe, although it could be different depending on whether IPv6 and the like have been configured so that might not be ideal. And there's probably some clever reason why 127.0.1.1 was in there as well as 127.0.0.1

@sxa
Member

sxa commented Jan 17, 2018

We definitely need some sort of consistent strategy going forward. The test machines that were causing adoptium/aqa-systemtest#66 didn't have any entries for the system's hostname in /etc/hosts.

Most Ubuntu installs seem to set the system's hostname against the loopback IP 127.0.0.1, but would it make more sense to have an entry with the real IP in there against the hostname? Maybe ...

Either way we could do with some consistency. Would anyone object going forward to having a strategy of making sure the hostname on the machine is a bit more consistent with what's in jenkins? For that mauve test failure the machine calls itself test-ubuntu-16-04-1 but in jenkins it is test-osuosl-ppc64le-ubuntu-16.04-2 - it's going to make debugging a lot easier if we have them consistent when going through log files.
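
As a rough illustration of what a consistency check could look like, here is a small Python sketch. The helper names are hypothetical, and it assumes that a `.` in the jenkins name may legitimately appear as a `-` in the machine's own hostname, so the two are compared after that substitution.

```python
def normalise(name):
    """Lower-case a machine name and treat '.' and '-' as equivalent."""
    return name.strip().lower().replace(".", "-")


def names_consistent(machine_hostname, jenkins_name):
    """True if the hostname on the machine matches the jenkins node name."""
    return normalise(machine_hostname) == normalise(jenkins_name)


# The pair from the mauve test failure mentioned above does not match.
print(names_consistent("test-ubuntu-16-04-1",
                       "test-osuosl-ppc64le-ubuntu-16.04-2"))
```

A check like this could run over the inventory to list machines whose own hostname has drifted from the name used in jenkins.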

@smlambert With the way this is going perhaps we should change the title of this issue to "Sanitize /etc/hosts and hostnames on all our cloud machines?" although I appreciate that you possibly need a tactical short-term fix until we've thrashed it out.

@smlambert smlambert changed the title An update to /etc/hosts required for jdk_jdi tests Sanitize /etc/hosts and hostnames on all our cloud machines Jan 17, 2018
@bblondin
Contributor

Maybe something like this? Thoughts?

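  # Replace any existing line that mentions this host with "IP FQDN hostname"
  - name: Update /etc/hosts file - IP FQDN hostname
    lineinfile:
      dest: /etc/hosts
      regexp: "^(.*){{ ansible_hostname }}(.*)$"
      line: "{{ ansible_default_ipv4.address }} {{ ansible_fqdn }} {{ ansible_hostname }}"
      state: present
    tags: hosts_file

  # Reset the 127.0.0.1 line so that it maps only to localhost
  # (single quotes so the escaped dots reach the regexp unchanged)
  - name: Update /etc/hosts file - 127.0.0.1
    lineinfile:
      dest: /etc/hosts
      regexp: '^(.*)127\.0\.0\.1(.*)$'
      line: "127.0.0.1   localhost"
      state: present
    tags: hosts_file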

@bblondin
Contributor

I've updated the /etc/hosts file on both of our macs.
@smlambert please test and let us know if this fixes adoptium/aqa-tests#132

@sxa
Member

sxa commented Jan 18, 2018

@bblondin Wouldn't that wipe an entry such as the following (which I think we get by default on some installs):

127.0.0.1 localhost myhostname

and not replace it with the second section because the 127.0.0.1 entry would be removed entirely in the first section? This sort of thing has made me paranoid about doing it automatically - but I do think it's worth thrashing out ;-)

(Edit: Assuming state:present will add it if the regexp doesn't match then it's probably all good!)
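
To make that scenario concrete, here is a simplified Python model of the behaviour being assumed for lineinfile with state: present. It is a sketch, not Ansible's actual implementation: it replaces the last line matching the regexp and otherwise appends the line. The address 192.0.2.10 and the name myhostname.example.com are placeholders.

```python
import re


def lineinfile_present(lines, regexp, line):
    """Simplified model of lineinfile with state=present.

    Replaces the last line that matches regexp. If nothing matches, appends
    line unless an identical line is already in the file.
    """
    pattern = re.compile(regexp)
    result = list(lines)
    matches = [i for i, current in enumerate(result) if pattern.search(current)]
    if matches:
        result[matches[-1]] = line
    elif line not in result:
        result.append(line)
    return result


hosts = ["127.0.0.1 localhost myhostname"]
# First task: the only line mentions the hostname, so it is replaced outright.
hosts = lineinfile_present(hosts, r"^(.*)myhostname(.*)$",
                           "192.0.2.10 myhostname.example.com myhostname")
# Second task: no 127.0.0.1 line is left, so the localhost entry is appended.
hosts = lineinfile_present(hosts, r"^(.*)127.0.0.1(.*)$",
                           "127.0.0.1   localhost")
print("\n".join(hosts))
```

Under this model the file ends up with both the host entry and a fresh localhost entry, which is the "probably all good" outcome described in the edit above.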

@sxa
Member

sxa commented Jan 18, 2018

We will hit an issue where we have disconnects between FQDN and the hostname on the machine - we've had to replace . characters in the FQDN with other characters on the machines used for the JCK for example (Can't find the relevant issue just now - will amend sometime later)

@bblondin
Contributor

@sxa555 Yes, it would 'replace' those entries,
but I don't think 127.0.0.1 localhost *myhostname* is a default.
I think it's simply 127.0.0.1 localhost or 127.0.0.1 localhost localhost

Yes: state:present will add it if the regexp doesn't match

I'd like to know more about this replacing of the period in the FQDN

In some cases (virtual machines) there may not be an FQDN. However, {{ ansible_fqdn }} will return just the hostname in those cases, giving us MyIPAddress MyHostname MyHostname
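
If the repeated name is unwanted, the entry could be built with duplicates dropped. The Python sketch below only illustrates that idea: `hosts_entry` is a hypothetical helper, not something in the playbooks, and the address and names in the example are placeholders.

```python
def hosts_entry(ip, fqdn, hostname):
    """Build an /etc/hosts line, listing each name only once."""
    names = []
    for name in (fqdn, hostname):
        if name and name not in names:
            names.append(name)
    return " ".join([ip] + names)


# With no real FQDN, both values are the bare hostname and it appears once.
print(hosts_entry("192.0.2.10", "myhostname", "myhostname"))
```

Leaving the duplicate in place should be harmless as well, so this is a matter of tidiness rather than correctness.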

@sxa
Member

sxa commented Jan 19, 2018

Regarding the . issue: we have some tests that cannot run properly if the machine's hostname contains . characters. For those machines we've replaced the . with a - on the machine, but the names as stored in jenkins etc. are left as-is with the . characters. I think that, as well as having a discussion on the jenkins tags for machines (#93), we should consider standardising and documenting what the hostnames should be going forward.

Ref the default entries, here's one of the more unusual examples from build-scaleway-x64-ubuntu-16-04-2:

[sxa@sxa ~]$ ssh [email protected] cat /etc/hosts
127.0.1.1       build-scaleway-x64-ubuntu-16-04-2 build-scaleway-x64-ubuntu-16-04-2
127.0.0.1       localhost
::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters

Our joyent ubuntu machine build-joyent-x64-ubuntu-16.04-2 has this (the "random" hex string doesn't match the hostname, FWIW):

[sxa@sxa ~]$ ssh [email protected] cat /etc/hosts
127.0.0.1	localhost 378108a1-c01c-e82d-9b57-d80d22317d7e
::1		localhost ip6-localhost ip6-loopback
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters

I think your proposed rules would sanitize them all quite well though, if that's the way we want to go. (I'm always a touch nervous about having non-default configs for OSs in case we mask errors a customer may see on their systems, but from our perspective it would likely make things work more consistently.)
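
For auditing what each machine currently has before changing anything, a short Python sketch like the following could classify a hosts file. The function names are made up for illustration, the sample text is copied from the two files above, and using the jenkins name as the joyent machine's hostname is an assumption.

```python
import ipaddress


def parse_hosts(text):
    """Return (address, [names]) pairs, skipping comments and blank lines."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2:
            entries.append((fields[0], fields[1:]))
    return entries


def hostname_status(text, hostname):
    """Say whether hostname is missing, on loopback only, or on a real IP."""
    addresses = [addr for addr, names in parse_hosts(text) if hostname in names]
    if not addresses:
        return "missing"
    if all(ipaddress.ip_address(addr).is_loopback for addr in addresses):
        return "loopback only: " + ", ".join(addresses)
    return "non-loopback: " + ", ".join(addresses)


SCALEWAY = """\
127.0.1.1       build-scaleway-x64-ubuntu-16-04-2 build-scaleway-x64-ubuntu-16-04-2
127.0.0.1       localhost
::1             localhost ip6-localhost ip6-loopback
"""

JOYENT = """\
127.0.0.1	localhost 378108a1-c01c-e82d-9b57-d80d22317d7e
::1		localhost ip6-localhost ip6-loopback
"""

print(hostname_status(SCALEWAY, "build-scaleway-x64-ubuntu-16-04-2"))
# The joyent hostname below is assumed to equal the jenkins name.
print(hostname_status(JOYENT, "build-joyent-x64-ubuntu-16.04-2"))
```

Running something like this across all machines would show how many fall into each category before a single rule is applied to everything.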

@bblondin
Contributor

I think the following would be the best of both worlds:

I've added backup: yes, which backs up the file with a timestamp. This way, if there is an issue, the administrator can easily recover the original file.

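  # Replace any existing line that mentions this host with "IP FQDN hostname"
  - name: Update /etc/hosts file - IP FQDN hostname
    lineinfile:
      dest: /etc/hosts
      regexp: "^(.*){{ ansible_hostname }}(.*)$"
      line: "{{ ansible_default_ipv4.address }} {{ ansible_fqdn }} {{ ansible_hostname }}"
      state: present
      backup: yes  # keep a timestamped copy of the original file
    tags: hosts_file

  # Reset the 127.0.0.1 line so that it maps only to localhost
  # (single quotes so the escaped dots reach the regexp unchanged)
  - name: Update /etc/hosts file - 127.0.0.1
    lineinfile:
      dest: /etc/hosts
      regexp: '^(.*)127\.0\.0\.1(.*)$'
      line: "127.0.0.1   localhost"
      state: present
      backup: yes  # keep a timestamped copy of the original file
    tags: hosts_file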

@sxa
Member

sxa commented Jan 19, 2018

I'd be tempted to add localhost.localdomain to the 127.0.0.1 entry, since that seems quite common too.

@bblondin
Contributor

Pull request #136
(includes localhost.localdomain)

bblondin added a commit that referenced this issue Jan 19, 2018
* Update /etc/hosts

full details in: #131 (comment)

* localhost.localdomain added as per SXA
@smlambert
Contributor Author

I reran the tests referenced in openjdk-tests issue 132 just now (on test-macincloud-macos1010-1), but they still fail:

ERROR: transport error 202: gethostbyname: unknown host
ERROR: JDWP Transport dt_socket failed to initialize, TRANSPORT_INIT(510)
JDWP exit error AGENT_ERROR_TRANSPORT_INIT(197): No transports initialized [debugInit.c:750]

You can find the entire set of test results here: https://ci.adoptopenjdk.net/view/work%20in%20progress/job/test_personal/129/testReport/

@smlambert
Contributor Author

Looking at the /etc/hosts file on test-macincloud-macos1010-1, it does not appear to have changed from before the issue was reported.

@bblondin - I believe you updated the 2 build machines; the 2 test macs (test-macincloud-macos1010-1 and test-macincloud-macos1010-2) do not appear to have been updated.

@bblondin
Contributor

@smlambert I updated the wrong macs... (build-macstadium-macos1010-1 and 2)

Updated test-macincloud-macos1010-1 and test-macincloud-macos1010-2

sh-3.2# ping test-macincloud-macos1010-2
ping: cannot resolve test-macincloud-macos1010-2: Unknown host
sh-3.2# vi /etc/hosts
sh-3.2# ping test-macincloud-macos1010-2
PING dxu773 (74.80.250.173): 56 data bytes
64 bytes from 74.80.250.173: icmp_seq=0 ttl=64 time=0.046 ms
64 bytes from 74.80.250.173: icmp_seq=1 ttl=64 time=0.061 ms
64 bytes from 74.80.250.173: icmp_seq=2 ttl=64 time=0.060 ms

@bblondin
Contributor

@smlambert Have you had a chance to rerun the test?

@smlambert
Contributor Author

Yes, and now 89/90 tests that used to fail are passing, thanks.

@smlambert
Contributor Author

Apologies, I closed this issue because the test problem was addressed, but I then remembered that it was broadened to cover all machines, so I will reopen it.

@smlambert smlambert reopened this Jan 29, 2018
@bblondin
Contributor

bblondin commented Jan 29, 2018

Pull request #136 addresses this


6 participants