WiFi Country Code confusion during IIAB installation [IIAB should check for mismatched kernel e.g. if apt updates were applied w/o reboot] #2975

Closed
holta opened this issue Sep 1, 2021 · 28 comments

@holta
Member

holta commented Sep 1, 2021

Some IIAB installs fail, and people are left very confused as to what is happening and why. This seems to involve missing and/or contradictory WiFi Country Codes.

TK Kang and @darkenvy have 2-3 examples here:

Essentially this is a UX bug, and a serious one, as it can leave people extremely frustrated, not understanding what to do next to recover their IIAB install process.

After we begin to understand better (and make precise) which WiFi Country Code scenarios cause this, we can then instrument a cleaner install process that does not leave people hanging, confused and frustrated.

Related:

@holta holta added the bug label Sep 1, 2021
@holta holta added this to the 7.2 milestone Sep 1, 2021
@holta holta pinned this issue Sep 1, 2021
@holta
Member Author

holta commented Sep 1, 2021

Not everyone is running Raspberry Pi OS, but as of today WiFi Country Code is commonly specified in these 4 ways:

It appears we need to sort out how these 4 techniques depend on each other.

@jvonau
Contributor

jvonau commented Sep 1, 2021

One could very easily test the theory that not using raspi-config contributes to the above-mentioned failure: just write the image to the SD card and proceed directly to the curl stage of the install, if you really want to get to the bottom of this issue. I've sorted out the interrelationships; you need to teach yourself what goes where, as all the information has been recorded in the various PRs/issues over time. Now you could explain to me what your understanding is and I'll try to clear the fog for you.

@holta
Member Author

holta commented Sep 1, 2021

Yes I install OS images to microSD card and then immediately use IIAB's 1-line installer at download.iiab.io very frequently, without problems.

Which is why I'd like to start by hearing out others' failures first and foremost.

Understanding precise scenarios that are failing. Understanding precisely what steps are being taken. Steps that are (apparently) a bit outside the mainstream — but yet are repeatedly causing quite severe frustration — even among US installations, where host_country_code: US is coincidentally already set in /etc/iiab/local_vars.yml

Hopefully @darkenvy has time to explain before weekend, or soon!

@holta
Member Author

holta commented Sep 1, 2021

Small points:

  1. If it takes multiple weeks (or even months) to understand this and solve it properly, rather than just a few days, so be it.
  2. Either way, it's important that we do not leave well-intentioned and sincere people confused and upset, as they try to install IIAB.
  3. Certainly I personally do not know whether this will take a few days, or a few months!
  4. So we'll take the time to solve it properly, once the most common underlying scenario(s) are better understood.
  5. If it takes a long time to solve conscientiously, we'll add a brief warning/explanation in the Known Issues section of IIAB 7.2 here: https://github.com/iiab/iiab/wiki/IIAB-7.2-Release-Notes#known-issues

@jvonau
Contributor

jvonau commented Sep 1, 2021

I'm just suggesting trying to duplicate part of darkenvy's environment, based on what I see as being different from your usual test routine. All I picked up on was the stanza in http://sprunge.us/BMaNQl?en that noted the absence of country_code from wpa_supplicant.conf, with the other half of the problem environment being the use of a serial console.

1698 2021-08-31 21:32:55,870 p=7078 u=root n=ansible | TASK [network : New Raspbian requires country code -- check for it] ************
1699 2021-08-31 21:32:56,448 p=7078 u=root n=ansible | fatal: [127.0.0.1]: FAILED! => {"changed": true, "cmd": "grep country /etc/wpa_supplicant/wpa_supplicant.conf", "delta": "0:00:00.013302", "end": "2021-08-31 21:32:56.383521", "msg": "non-zero return code", "rc": 1, "start": "2021-08-31 21:32:56.370219", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
1700 2021-08-31 21:32:56,449 p=7078 u=root n=ansible | ...ignoring

Next are the actions that undo the 'out of the box disablement'; note 'changed':

1701 2021-08-31 21:32:56,491 p=7078 u=root n=ansible | TASK [network : Put country code (US) in /etc/wpa_supplicant/wpa_supplicant.conf if nec] ***
1702 2021-08-31 21:32:57,098 p=7078 u=root n=ansible | changed: [127.0.0.1]
1703 2021-08-31 21:32:57,137 p=7078 u=root n=ansible | TASK [network : Enable the WiFi with rfkill] ***********************************
1704 2021-08-31 21:32:57,733 p=7078 u=root n=ansible | changed: [127.0.0.1]

Normally these entries are not visible in iiab-diagnostics after a successful install, because only the last 100 lines of iiab-install.log are included, so this stanza may be cut off. Might want to bump that limit while this is being investigated. The test subject that was used for the go/no_go fate of 2379 never really reported any useful information on the state of the above stanza, so there was just conjecture with nothing to review. A pastebin link to the iiab-install.log would be helpful here.
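The country-code check in the first task above is a plain grep for "country" in wpa_supplicant.conf. For anyone reproducing that check by hand, here is a minimal shell sketch of a slightly stricter version. The function name wifi_country_from_conf is hypothetical (it is not part of IIAB), and it assumes the setting appears as a country=XX line.

```shell
# Print the WiFi country code set in a wpa_supplicant.conf-style file.
# Returns non-zero when the file is missing or has no "country=XX" line.
# (Hypothetical helper for illustration; not part of IIAB.)
wifi_country_from_conf() {
    conf="${1:-/etc/wpa_supplicant/wpa_supplicant.conf}"
    # Match lines such as "country=US" (optionally indented) and keep the code.
    code=$(sed -n 's/^[[:space:]]*country=\([A-Za-z][A-Za-z]\)[[:space:]]*$/\1/p' "$conf" 2>/dev/null | head -n 1)
    [ -n "$code" ] || return 1
    printf '%s\n' "$code"
}

# Example use (reading the real file usually needs root):
#   wifi_country_from_conf || echo "no WiFi country code set"
```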

@psiie

psiie commented Sep 1, 2021

I've installed IIAB many times in the past without setting a country code as well. I omit the step specifically because it is the default state of raspbian/raspberryos and we should replicate the steps of the uninformed.

Anyways here are various install tests and the results. All of these tests are after the 2973 merge. I was able to install IIAB BEFORE the merge successfully.

RaspberryOS + user: pi + Serial + Unset Lang Locale + Unset WiFi Locale + Size 0 = Failed on "Restart hostapd"
RaspberryOS + user: pi + Serial + Unset Lang Locale + Unset WiFi Locale + Size 1 = Failed on "Restart hostapd"

RaspberryOS + user: pi + Ethernet SSH + Unset Lang Locale + Unset WiFi Locale + Size 0 = Failed on "Restart hostapd"
RaspberryOS + user: pi + Ethernet SSH + Set Lang Locale + Set WiFi Locale + Size 0 = Success!

RaspberryOS + user: pi + Serial + Unset Lang Locale + Set WiFi Locale + Size 0 = Success!
RaspberryOS + user: pi + Serial + Set Lang Locale + Unset WiFi Locale + Size 0 = Failed on "Restart hostapd"

PS: I'm setting the Lang locale and WiFi locale from sudo raspi-config's "Locale" and "WLAN Country" options respectively (options 5.1 and 5.4).

It's strange because there is specifically a task called "Put country code (US) in /etc/wpa_supplicant/wpa_supplicant.conf if nec" and it is successful. Does this task do the same as the "WLAN Country" setting in sudo raspi-config? I don't think it does.

@holta
Member Author

holta commented Sep 1, 2021

@darkenvy I just ran curl d.iiab.io/fast.txt | sudo bash (over Ethernet, as usual) on Raspberry Pi OS Lite here:

CLARIF: I did NOT set a country code (using raspi-config, or in any other way). I simply installed IIAB onto the fresh/clean OS.

What else should I try?

Feel free to log in to my Raspberry Pi 4 (10.8.0.58) in case mine might somehow be different from yours!?

@jvonau
Contributor

jvonau commented Sep 1, 2021

Raspberry Pi OS Lite as above written, apt updated, rebooted:
pi@raspberrypi:~ $ sudo cat /etc/wpa_supplicant/wpa_supplicant.conf

ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

pi@raspberrypi:~ $ iw reg get

global
country 00: DFS-UNSET
(2402 - 2472 @ 40), (N/A, 20), (N/A)
(2457 - 2482 @ 20), (N/A, 20), (N/A), AUTO-BW, PASSIVE-SCAN
(2474 - 2494 @ 20), (N/A, 20), (N/A), NO-OFDM, PASSIVE-SCAN
(5170 - 5250 @ 80), (N/A, 20), (N/A), AUTO-BW, PASSIVE-SCAN
(5250 - 5330 @ 80), (N/A, 20), (0 ms), DFS, AUTO-BW, PASSIVE-SCAN
(5490 - 5730 @ 160), (N/A, 20), (0 ms), DFS, PASSIVE-SCAN
(5735 - 5835 @ 80), (N/A, 20), (N/A), PASSIVE-SCAN
(57240 - 63720 @ 2160), (N/A, 0), (N/A)
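In the output above, "country 00" is the kernel's default world regulatory domain, which is what shows up when no WiFi country has been set. A small sketch for testing this in a script follows. Both function names are hypothetical, and the parsing assumes the "country XX:" line format shown above.

```shell
# Read `iw reg get` output on stdin and print the first regulatory domain code.
# (Hypothetical helpers for illustration; not part of IIAB.)
regdomain_from_iw() {
    sed -n 's/^country \([A-Z0-9][A-Z0-9]\):.*/\1/p' | head -n 1
}

# Succeeds when the regulatory domain on stdin is the unset world domain "00".
regdomain_is_unset() {
    [ "$(regdomain_from_iw)" = "00" ]
}

# Example use:
#   iw reg get | regdomain_is_unset && echo "WiFi country is not set"
```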

pi@raspberrypi:~ $ ps -AH

snip
1 ? 00:00:04 systemd
119 ? 00:00:00 systemd-journal
155 ? 00:00:00 systemd-udevd
320 ? 00:00:00 systemd-timesyn
361 ? 00:00:00 alsactl
369 ? 00:00:00 rsyslogd
377 ? 00:00:00 thd
381 ? 00:00:00 rngd
387 ? 00:00:00 cron
397 ? 00:00:00 dbus-daemon
417 ? 00:00:00 wpa_supplicant
418 ? 00:00:00 avahi-daemon
437 ? 00:00:00 avahi-daemon
438 ? 00:00:00 systemd-logind
489 ? 00:00:00 wpa_supplicant
504 ? 00:00:00 hciattach
518 ? 00:00:00 bluetoothd
568 ? 00:00:00 dhcpcd
3743 ? 00:00:00 dhcpcd-run-hook
573 tty1 00:00:00 login
760 tty1 00:00:00 bash
574 ? 00:00:00 sshd
872 ? 00:00:00 sshd
1070 ? 00:00:00 sshd
1071 pts/0 00:00:00 bash
3742 pts/0 00:00:00 ps
750 ? 00:00:00 systemd
751 ? 00:00:00 (sd-pam)

pi@raspberrypi:~ $ systemctl status wpa_supplicant

● wpa_supplicant.service - WPA supplicant
Loaded: loaded (/lib/systemd/system/wpa_supplicant.service; enabled; vendor p
Active: active (running) since Wed 2021-09-01 23:53:28 BST; 35min ago
Main PID: 417 (wpa_supplicant)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/wpa_supplicant.service
└─417 /sbin/wpa_supplicant -u -s -O /run/wpa_supplicant
Sep 01 23:53:27 raspberrypi systemd[1]: Starting WPA supplicant...
Sep 01 23:53:28 raspberrypi wpa_supplicant[417]: Successfully initialized wpa_su
Sep 01 23:53:28 raspberrypi systemd[1]: Started WPA supplicant.

pi@raspberrypi:~ $ systemctl status dhcpcd --no-pager -l

● dhcpcd.service - dhcpcd on all interfaces
Loaded: loaded (/lib/systemd/system/dhcpcd.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/dhcpcd.service.d
└─wait.conf
Active: active (running) since Wed 2021-09-01 23:53:37 BST; 28min ago
Process: 422 ExecStart=/usr/lib/dhcpcd5/dhcpcd -q -w (code=exited, status=0/SUCCESS)
Main PID: 568 (dhcpcd)
Tasks: 2 (limit: 4915)
CGroup: /system.slice/dhcpcd.service
├─489 wpa_supplicant -B -c/etc/wpa_supplicant/wpa_supplicant.conf -iwlan0 -Dnl80211,wext
└─568 /sbin/dhcpcd -q -w
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: Router Advertisement from fe80::2fc:8dff:fedb:6692
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: adding address fd00:fc:8ddb:6692:e819:1d1a:156c:ad46/64
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: adding route to fd00:fc:8ddb:6692::/64
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: soliciting a DHCPv6 lease
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: fe80::2fc:8dff:fedb:6692 is reachable again
Sep 01 23:53:36 raspberrypi dhcpcd[422]: eth0: leased 192.168.0.12 for 604800 seconds
Sep 01 23:53:36 raspberrypi dhcpcd[422]: eth0: adding route to 192.168.0.0/24
Sep 01 23:53:36 raspberrypi dhcpcd[422]: eth0: adding default route via 192.168.0.1
Sep 01 23:53:37 raspberrypi dhcpcd[422]: forked to background, child pid 568
Sep 01 23:53:37 raspberrypi systemd[1]: Started dhcpcd on all interfaces.
Warning: The unit file, source configuration file or drop-ins of dhcpcd.service changed on disk. Run 'systemctl daemon-reload' to reload units.

pi@raspberrypi:~ $ sudo journalctl -u dhcpcd --no-pager -l

-- Logs begin at Wed 2021-09-01 23:53:23 BST, end at Thu 2021-09-02 00:26:00 BST. --
Sep 01 23:53:27 raspberrypi systemd[1]: Starting dhcpcd on all interfaces...
Sep 01 23:53:27 raspberrypi dhcpcd[422]: dev: loaded udev
Sep 01 23:53:28 raspberrypi dhcpcd[422]: wlan0: starting wpa_supplicant
Sep 01 23:53:28 raspberrypi dhcpcd-run-hooks[467]: wlan0: starting wpa_supplicant
Sep 01 23:53:28 raspberrypi dhcpcd[422]: wlan0: connected to Access Point `'
Sep 01 23:53:28 raspberrypi dhcpcd[422]: dhcpcd_prestartinterface: wlan0: Operation not possible due to RF-kill
Sep 01 23:53:28 raspberrypi dhcpcd[422]: eth0: waiting for carrier
Sep 01 23:53:28 raspberrypi dhcpcd[422]: dhcpcd_prestartinterface: wlan0: Operation not possible due to RF-kill
Sep 01 23:53:28 raspberrypi dhcpcd[422]: wlan0: waiting for carrier
Sep 01 23:53:31 raspberrypi dhcpcd[422]: eth0: carrier acquired
Sep 01 23:53:31 raspberrypi dhcpcd[422]: DUID 00:01:00:01:28:28:12:98:dc:a6:32:20:b8:77
Sep 01 23:53:31 raspberrypi dhcpcd[422]: eth0: IAID 32:20:b8:77
Sep 01 23:53:31 raspberrypi dhcpcd[422]: eth0: adding address fe80::f793:2ce4:5364:9ee6
Sep 01 23:53:31 raspberrypi dhcpcd[422]: eth0: rebinding lease of 192.168.0.12
Sep 01 23:53:31 raspberrypi dhcpcd[422]: eth0: probing address 192.168.0.12/24
Sep 01 23:53:32 raspberrypi dhcpcd[422]: eth0: soliciting an IPv6 router
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: Router Advertisement from fe80::2fc:8dff:fedb:6692
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: adding address fd00:fc:8ddb:6692:e819:1d1a:156c:ad46/64
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: adding route to fd00:fc:8ddb:6692::/64
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: soliciting a DHCPv6 lease
Sep 01 23:53:33 raspberrypi dhcpcd[422]: eth0: fe80::2fc:8dff:fedb:6692 is reachable again
Sep 01 23:53:36 raspberrypi dhcpcd[422]: eth0: leased 192.168.0.12 for 604800 seconds
Sep 01 23:53:36 raspberrypi dhcpcd[422]: eth0: adding route to 192.168.0.0/24
Sep 01 23:53:36 raspberrypi dhcpcd[422]: eth0: adding default route via 192.168.0.1
Sep 01 23:53:37 raspberrypi dhcpcd[422]: forked to background, child pid 568
Sep 01 23:53:37 raspberrypi systemd[1]: Started dhcpcd on all interfaces.

I'll continue post install.

@holta
Member Author

holta commented Sep 2, 2021

@darkenvy call me crazy, but can I ask you to install IIAB using a different ISP — e.g. your cellphone's hotspot — to eliminate any Country Code monkey-business possibly caused by your CenturyLink router?

@jvonau
Contributor

jvonau commented Sep 2, 2021

From: 21Mxm5?en above, noted the trigger for missing country code did fire with hostapd not failing to start as shown below:

1200 =IIAB==========================================================================
1201 COMMAND: /usr/bin/systemctl status hostapd # Downstream Wi-Fi: Is hostapd running?
1202
1203 ● hostapd.service - Hostapd IEEE 802.11 AP, IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator
1204 Loaded: loaded (/etc/systemd/system/hostapd.service; enabled; vendor preset: enabled)
1205 Active: active (running) since Wed 2021-09-01 23:26:27 BST; 1min 0s ago
1206 Process: 27181 ExecStart=/usr/sbin/hostapd -B -P /run/hostapd.pid /etc/hostapd/hostapd.conf (code=exited, status=0/SUCCESS)
1207 Process: 27192 ExecStartPost=/sbin/ip link set ap0 up (code=exited, status=0/SUCCESS)
1208 Main PID: 27189 (hostapd)
1209 Tasks: 1 (limit: 3720)
1210 CGroup: /system.slice/hostapd.service
1211 └─27189 /usr/sbin/hostapd -B -P /run/hostapd.pid /etc/hostapd/hostapd.conf
1212
1213 Sep 01 23:26:27 box systemd[1]: Starting Hostapd IEEE 802.11 AP, IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator...
1214 Sep 01 23:26:27 box hostapd[27181]: Configuration file: /etc/hostapd/hostapd.conf
1215 Sep 01 23:26:27 box hostapd[27181]: nl80211: kernel reports: Match already configured
1216 Sep 01 23:26:27 box hostapd[27181]: nl80211: kernel reports: Match already configured
1217 Sep 01 23:26:27 box hostapd[27181]: ap0: interface state UNINITIALIZED->COUNTRY_UPDATE
1218 Sep 01 23:26:27 box hostapd[27181]: ap0: Could not connect to kernel driver
1219 Sep 01 23:26:27 box hostapd[27181]: Using interface ap0 with hwaddr 02:da:4c:80:16:44 and ssid "unittest"
1220 Sep 01 23:26:27 box hostapd[27181]: ap0: interface state COUNTRY_UPDATE->ENABLED
1221 Sep 01 23:26:27 box hostapd[27181]: ap0: AP-ENABLED
1222 Sep 01 23:26:27 box systemd[1]: Started Hostapd IEEE 802.11 AP, IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator.

The magic question is why there are intermittent failures with what appears to be the same routine. I hope to know more in a bit when my test install finishes. For @darkenvy: when hostapd does fail to start, can you please post the output of systemctl status hostapd before rebooting?

@jvonau
Contributor

jvonau commented Sep 2, 2021

Well that was a bust, I can't reproduce a failure at restarting hostapd http://sprunge.us/YP0cBD

pi@raspberrypi:~ $ systemctl status hostapd --no-pager

● hostapd.service - Hostapd IEEE 802.11 AP, IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator
Loaded: loaded (/etc/systemd/system/hostapd.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-09-02 00:58:46 BST; 9min ago
Process: 1765 ExecStart=/usr/sbin/hostapd -B -P /run/hostapd.pid /etc/hostapd/hostapd.conf (code=exited, status=0/SUCCESS)
Process: 1775 ExecStartPost=/sbin/ip link set ap0 up (code=exited, status=0/SUCCESS)
Main PID: 1772 (hostapd)
Tasks: 1 (limit: 4915)
CGroup: /system.slice/hostapd.service
└─1772 /usr/sbin/hostapd -B -P /run/hostapd.pid /etc/hostapd/hostapd.conf
Sep 02 00:58:46 box hostapd[1765]: Configuration file: /etc/hostapd/hostapd.conf
Sep 02 00:58:46 box hostapd[1765]: nl80211: kernel reports: Match already configured
Sep 02 00:58:46 box hostapd[1765]: nl80211: kernel reports: Match already configured
Sep 02 00:58:46 box hostapd[1765]: nl80211: kernel reports: Match already configured
Sep 02 00:58:46 box hostapd[1765]: ap0: interface state UNINITIALIZED->COUNTRY_UPDATE
Sep 02 00:58:46 box hostapd[1765]: ap0: Could not connect to kernel driver
Sep 02 00:58:46 box hostapd[1765]: Using interface ap0 with hwaddr 02:fb:8c:49:b8:c3 and ssid "unittest"
Sep 02 00:58:46 box hostapd[1765]: ap0: interface state COUNTRY_UPDATE->ENABLED
Sep 02 00:58:46 box hostapd[1765]: ap0: AP-ENABLED
Sep 02 00:58:46 box systemd[1]: Started Hostapd IEEE 802.11 AP, IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator.

@holta
Member Author

holta commented Sep 2, 2021

@darkenvy call me crazy, but can I ask you to install IIAB using a different ISP — e.g. your cellphone's hotspot — to eliminate any Country Code monkey-business possibly caused by your CenturyLink router?

@darkenvy outside of your home, others have not been able to reproduce these IIAB install failures.

Which begs the question:

Do you have an old laptop kicking around, ideally one where Linux AP mode works (for hostapd) — and if so can you reproduce your IIAB install failures on that one too?

(e.g. if you install Debian 11 or any recent version of Ubuntu onto that laptop!)

And if all fails...sell the 🏠 !

@holta
Member Author

holta commented Sep 2, 2021

@darkenvy Another thing you could try is this...

Connect a freshly-installed RaspiOS to your CenturyLink router (via Ethernet) that has never before connected to the Internet using CenturyLink.

Then carefully look for any BEFORE/AFTER changes...in the output of this command:

grep -ri country /etc

(Giving it a few minutes for the DHCP handshake [and any other similar handshakes] to take effect — and also reboot the RPi for good measure — to see if any country settings get changed automatically ?)

Maybe even do the 'apt update' and 'apt dist-upgrade' steps too?!
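The BEFORE/AFTER comparison suggested above could be scripted roughly as follows. This is only a sketch: the function name snapshot_country_settings and the output file names are hypothetical, and the snapshots are kept in the home directory on the assumption that /tmp may be cleared by the reboot.

```shell
# List every line mentioning "country" (case-insensitive) under a directory,
# sorted so that two snapshots can be compared with diff.
# (Hypothetical helper for illustration; not part of IIAB.)
snapshot_country_settings() {
    dir="${1:-/etc}"
    grep -ri country "$dir" 2>/dev/null | sort
}

# Example use (run as root so that all of /etc is readable):
#   snapshot_country_settings /etc > ~/country-before.txt
#   ... connect to the router, wait a few minutes, reboot ...
#   snapshot_country_settings /etc > ~/country-after.txt
#   diff ~/country-before.txt ~/country-after.txt
```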

@holta
Member Author

holta commented Sep 2, 2021

@darkenvy a similar idea is to connect a fresh install of RaspiOS to your phone's Internet hotspot (tethering over USB also works) and then run:

apt install etckeeper

This would allow you to monitor any changes being made within the Raspberry Pi's /etc using version control... BEFORE/AFTER connecting the Raspberry Pi to your CenturyLink router for the very 1st time!?

Instructions are here, if you do choose to use this etckeeper approach:

https://etckeeper.branchable.com

@jvonau
Contributor

jvonau commented Sep 2, 2021

With hostapd using bridging under the covers, the only thing I can think of that would prevent hostapd from starting, when the bridging module is loaded on demand by hostapd, is a mismatch between the running kernel and the on-disk kernel modules, caused by installing a kernel update and not rebooting before starting or continuing the IIAB install. For reference see #482

@jvonau
Contributor

jvonau commented Sep 2, 2021

Isn't MIN_RPI_KERN in iiab-install supposed to catch when a user is not running the latest kernel? It looks a bit outdated and is really the wrong touch point; I think that check should be in the user's face much sooner, maybe in iiab?

@jvonau
Contributor

jvonau commented Sep 2, 2021

@tim-moody
Contributor

All of these tests are after the 2973 merge. I was able to install IIAB BEFORE the merge successfully.

Before merge of PR2973 install succeeded and now sometimes doesn't? Should roll back 2973

Anyways here are various install tests and the results.

The only thing that matters is Set WiFi Locale. How is that done? Btw, what do you mean by serial, terminal over USB?

If you wrote RaspiOS to an SD card, plugged in a keyboard and monitor with no Ethernet, and tried to run iiab-install, I would expect it to fail, as it would have no Internet connection (there would also be no repos or iiab-install). So how is the machine set up before running the install over a 'serial' connection?

@holta
Member Author

holta commented Sep 2, 2021

With hostapd using bridging under the covers, the only thing I can think of that would prevent hostapd from starting, when the bridging module is loaded on demand by hostapd, is a mismatch between the running kernel and the on-disk kernel modules, caused by installing a kernel update and not rebooting before starting or continuing the IIAB install. For reference see #482

It's a great question. @darkenvy please clarify how exactly you are burning RaspiOS Lite to the microSD card.

So we can completely eliminate this possibility.

I'm not saying there are unattended-upgrades on RaspiOS (?) but still we need to be careful — so if @darkenvy can guarantee 100% that he is not (e.g. even accidentally) installing apt upgrades — and then failing to reboot into the new kernel — that will allow us all to focus on what's really happening here.

@jvonau
Contributor

jvonau commented Sep 2, 2021

Testing a no-reboot situation: run apt update and apt upgrade to fool IIAB's apt checking, then run the curl installer without rebooting first. As booted:

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 5.10.17-v7l+ #1414 SMP Fri Apr 30 13:20:47 BST 2021 armv7l GNU/Linux

Post apt upgrade, noting that the kernel package was updated:

pi@raspberrypi:~ $ apt list raspberrypi-kernel
Listing... Done
raspberrypi-kernel/testing,now 1:1.20210831-1 armhf [installed]

Guess what... We have a reproducer http://sprunge.us/8RFl83

pi@raspberrypi:~ $ sudo systemctl status hostapd
● hostapd.service - Hostapd IEEE 802.11 AP, IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator
Loaded: loaded (/etc/systemd/system/hostapd.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2021-09-02 14:54:15 BST; 2min 23s ago
Process: 2934 ExecStart=/usr/sbin/hostapd -B -P /run/hostapd.pid /etc/hostapd/hostapd.conf (code=exited, status=1/FAILURE)
Sep 02 14:54:15 box hostapd[2934]: nl80211: Failed to add the bridge interface br0: Package not installed
Sep 02 14:54:15 box hostapd[2934]: nl80211: deinit ifname=ap0 disabled_11b_rates=0
Sep 02 14:54:15 box hostapd[2934]: nl80211 driver initialization failed.
Sep 02 14:54:15 box hostapd[2934]: ap0: interface state UNINITIALIZED->DISABLED
Sep 02 14:54:15 box hostapd[2934]: ap0: AP-DISABLED
Sep 02 14:54:15 box hostapd[2934]: ap0: CTRL-EVENT-TERMINATING
Sep 02 14:54:15 box hostapd[2934]: hostapd_free_hapd_data: Interface ap0 wasn't started
Sep 02 14:54:15 box systemd[1]: hostapd.service: Control process exited, code=exited, status=1/FAILURE
Sep 02 14:54:15 box systemd[1]: hostapd.service: Failed with result 'exit-code'.
Sep 02 14:54:15 box systemd[1]: Failed to start Hostapd IEEE 802.11 AP, IEEE 802.1X/WPA/WPA2/EAP/RADIUS Authenticator.

@holta
Member Author

holta commented Sep 2, 2021

This is really great news, regardless of whether that's what is afflicting @darkenvy.

We can hopefully now identify a way to test automatically for the failure to reboot after applying apt upgrades.
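One possible automated test is sketched below. It assumes the kernel package upgrade replaces the running kernel's module directory under /lib/modules, which would explain why hostapd could not load the bridge module; that mechanism is an assumption and has not been confirmed in this thread. The function name kernel_modules_present is hypothetical.

```shell
# Succeeds when a module directory exists for the given kernel release.
# Arguments: kernel release (default: `uname -r`),
#            modules root   (default: /lib/modules).
# (Hypothetical helper for illustration; not necessarily what IIAB does.)
kernel_modules_present() {
    release="${1:-$(uname -r)}"
    modules_root="${2:-/lib/modules}"
    [ -d "$modules_root/$release" ]
}

# Example use near the top of an installer:
#   if ! kernel_modules_present; then
#       echo "No modules found for running kernel $(uname -r): please reboot first." >&2
#       exit 1
#   fi
```

Checking the directory rather than comparing version strings keeps the test independent of how the distribution names its kernel packages.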

@jvonau
Contributor

jvonau commented Sep 2, 2021

That was the whole point of MIN_RPI_KERN the first time this cropped up, just was not maintained or improved. There is an old closed PR to work from, back over to those who can commit code, or teach users.

@psiie

psiie commented Sep 2, 2021

Yea confirmed. As long as you reboot after apt update & apt upgrade, no issue will be hit.
Setting the locales (either option) in raspi-config forced a reboot. So it was the reboot itself.

It's humorous because I started doing sudo apt-get update -y && sudo apt-get upgrade -y and then the one-liner, specifically to save time and to see if I could install IIAB without ever running sudo iiab. By doing so, I found an edge case XD.

@holta
Member Author

holta commented Sep 2, 2021

EUREKA!

Thanks @darkenvy and everyone for getting to the bottom of this at last.

I'll work on an automated mitigation strategy, even if Raspberry Pi OS unfortunately does not support /var/run/reboot-required and /var/run/reboot-required.pkgs as Ubuntu and Debian apparently do.
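Where those two files are supported, the check itself is short. A minimal sketch, with hypothetical function names and the paths mentioned above as defaults:

```shell
# Succeeds when the reboot-required flag file exists.
# (Hypothetical helpers for illustration; not part of IIAB.)
reboot_required() {
    flag="${1:-/var/run/reboot-required}"
    [ -f "$flag" ]
}

# Print the packages that requested the reboot, if the list file exists.
reboot_required_packages() {
    list="${1:-/var/run/reboot-required.pkgs}"
    if [ -f "$list" ]; then
        cat "$list"
    fi
}

# Example use:
#   if reboot_required; then
#       echo "Reboot needed before installing. Requested by:" >&2
#       reboot_required_packages >&2
#   fi
```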

PS The subject line (title) of this ticket probably needs to be changed, to reflect the true nature of the underlying problem.

@holta holta changed the title from "WiFi Country Code confusion during IIAB installation" to "WiFi Country Code confusion during IIAB installation [IIAB should check for mismatched kernel e.g. if apt updates were applied w/o reboot]" Sep 2, 2021
@holta
Member Author

holta commented Sep 2, 2021

@jvonau
Contributor

jvonau commented Sep 3, 2021

The released apt kernel packages can be tracked at https://github.com/raspberrypi/firmware/commits/stable. Each commit has an extra/uname_string* file that is used to create the banner for uname, which contains a shorthand notation of the revision level: the #xxxx value that is displayed. iiab/iiab-factory#189
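As a rough illustration, the #xxxx value can be pulled out of the uname banner with sed. Both function names are hypothetical, and the comparison assumes that build numbers only ever increase on the stable branch, which this thread does not confirm.

```shell
# Extract the numeric build revision (the "#xxxx" value) from a uname string.
# (Hypothetical helpers for illustration; not part of IIAB.)
kernel_build_number() {
    printf '%s\n' "$1" | sed -n 's/.*#\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds when the build number in the uname string ($1) is at least the
# minimum ($2). Fails if no build number can be found.
kernel_build_at_least() {
    build=$(kernel_build_number "$1")
    [ -n "$build" ] && [ "$build" -ge "$2" ]
}

# Example, using the banner quoted earlier in this thread:
#   kernel_build_number 'Linux raspberrypi 5.10.17-v7l+ #1414 SMP Fri Apr 30 13:20:47 BST 2021 armv7l GNU/Linux'
#   prints 1414
# On a live system:
#   kernel_build_at_least "$(uname -v)" 1414 || echo "kernel older than build 1414"
```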

@holta
Member Author

holta commented Sep 3, 2021

@darkenvy @jvonau please review PR iiab/iiab-factory#188 which has been overhauled to be much more reliable, a lot simpler, and a lot faster too:

@holta
Member Author

holta commented Sep 3, 2021

PR iiab/iiab-factory#188 was further cleaned up and merged.

Thanks everyone for making IIAB installs increasingly very friendly to newcomers — as we set out to do pretty much exactly 3 years ago — and are now very close to delivering:
