OCNE 1.6 - trouble with Calico readiness #481

Closed
hussam-qasem opened this issue May 1, 2023 · 8 comments

@hussam-qasem
Contributor

I've submitted a PR to enable OCNE 1.6. However, I'm having trouble with Calico readiness. Any clues would be greatly appreciated.

[Screenshot: Screen Shot 2023-05-01 at 5 17 36 PM]

/var/log/messages
May  1 14:58:38 master1 NetworkManager[9147]: <info>  [1682953118.8515] manager: (calico_tmp_B): new Veth device (/org/freedesktop/NetworkManager/Devices/641)
May  1 14:58:38 master1 NetworkManager[9147]: <info>  [1682953118.8529] manager: (calico_tmp_A): new Veth device (/org/freedesktop/NetworkManager/Devices/642)
May  1 14:58:38 master1 systemd-udevd[71677]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May  1 14:58:38 master1 systemd-udevd[71677]: Could not generate persistent MAC address for calico_tmp_B: No such file or directory
May  1 14:58:38 master1 systemd-udevd[71678]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
May  1 14:58:38 master1 systemd-udevd[71678]: Could not generate persistent MAC address for calico_tmp_A: No such file or directory
k -n calico-system logs calico-node-*
2023-05-01 14:22:30.722 [INFO][20520] felix/ipsets.go 965: Current state of IP sets family="inet" output="Name: cali40this-host\nType: hash:ip\nRevision: 4\nHeader: family inet hashsize 1024 maxelem 1048576\nSize in memory: 496\nReferences: 0\nNumber of entries: 5\nMembers:\n127.0.0.1\n10.0.2.15\n192.168.56.111\n127.0.0.0\n10.244.200.192\n"
2023-05-01 14:22:30.722 [PANIC][20520] felix/ipsets.go 352: Failed to update IP sets after multiple retries. family="inet"
panic: (*logrus.Entry) 0xc0008e2e00

goroutine 153 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc00017aaf0, 0x0, {0xc0005d05a0, 0x30})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:260 +0x47e
github.com/sirupsen/logrus.(*Entry).Log(0xc00017aaf0, 0x0, {0xc000597b58?, 0x5?, 0x0?})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:304 +0x4f
github.com/sirupsen/logrus.(*Entry).Panic(...)
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:342
github.com/projectcalico/calico/felix/ipsets.(*IPSets).ApplyUpdates(0xc0003fadc0)
	/go/src/github.com/projectcalico/calico/felix/ipsets/ipsets.go:352 +0x75d
github.com/projectcalico/calico/felix/dataplane/linux.(*InternalDataplane).apply.func1({0x34b3c90?, 0xc0003fadc0?})
	/go/src/github.com/projectcalico/calico/felix/dataplane/linux/int_dataplane.go:1995 +0x3d
created by github.com/projectcalico/calico/felix/dataplane/linux.(*InternalDataplane).apply
	/go/src/github.com/projectcalico/calico/felix/dataplane/linux/int_dataplane.go:1994 +0x125f
2023-05-01 14:22:30.791 [INFO][20591] felix/daemon.go 378: Successfully loaded configuration. GOMAXPROCS=1 builddate="2023-03-06T11:01:12+0000" config=&config.Config{UseInternalDataplaneDriver:true, DataplaneDriver:"calico-iptables-plugin", DataplaneWatchdogTimeout:90000000000, WireguardEnabled:false, WireguardEnabledV6:false, WireguardListeningPort:51820, WireguardListeningPortV6:51821, WireguardRoutingRulePriority:99, WireguardInterfaceName:"wireguard.cali", WireguardInterfaceNameV6:"wg-v6.cali", WireguardMTU:0, WireguardMTUV6:0, WireguardHostEncryptionEnabled:false, WireguardPersistentKeepAlive:0, BPFEnabled:false, BPFDisableUnprivileged:true, BPFLogLevel:"off", BPFDataIfacePattern:(*regexp.Regexp)(0xc0008dac80), BPFL3IfacePattern:(*regexp.Regexp)(nil), BPFConnectTimeLoadBalancingEnabled:true, BPFExternalServiceMode:"tunnel", BPFKubeProxyIptablesCleanupEnabled:true, BPFKubeProxyMinSyncPeriod:1000000000, BPFKubeProxyEndpointSlicesEnabled:true, BPFExtToServiceConnmark:0, BPFPSNATPorts:numorstring.Port{MinPort:0x4e20, MaxPort:0x752f, PortName:""}, BPFMapSizeNATFrontend:65536, BPFMapSizeNATBackend:262144, BPFMapSizeNATAffinity:65536, BPFMapSizeRoute:262144, BPFMapSizeConntrack:512000, BPFMapSizeIPSets:1048576, BPFMapSizeIfState:1000, BPFHostConntrackBypass:true, BPFEnforceRPF:"Strict", BPFPolicyDebugEnabled:true, DebugBPFCgroupV2:"", DebugBPFMapRepinEnabled:false, DatastoreType:"kubernetes", FelixHostname:"worker1.vagrant.vm", EtcdAddr:"127.0.0.1:2379", EtcdScheme:"http", EtcdKeyFile:"", EtcdCertFile:"", EtcdCaFile:"", EtcdEndpoints:[]string(nil), TyphaAddr:"", TyphaK8sServiceName:"calico-typha", TyphaK8sNamespace:"calico-system", TyphaReadTimeout:30000000000, TyphaWriteTimeout:10000000000, TyphaKeyFile:"/node-certs/tls.key", TyphaCertFile:"/node-certs/tls.crt", TyphaCAFile:"/etc/pki/tls/certs/tigera-ca-bundle.crt", TyphaCN:"typha-server", TyphaURISAN:"", Ipv6Support:false, BpfIpv6Support:false, IptablesBackend:"auto", RouteRefreshInterval:90000000000, 
InterfaceRefreshInterval:90000000000, DeviceRouteSourceAddress:net.IP(nil), DeviceRouteSourceAddressIPv6:net.IP(nil), DeviceRouteProtocol:3, RemoveExternalRoutes:true, IptablesRefreshInterval:90000000000, IptablesPostWriteCheckIntervalSecs:1000000000, IptablesLockFilePath:"/run/xtables.lock", IptablesLockTimeoutSecs:0, IptablesLockProbeIntervalMillis:50000000, FeatureDetectOverride:map[string]string(nil), FeatureGates:map[string]string(nil), IpsetsRefreshInterval:10000000000, MaxIpsetSize:1048576, XDPRefreshInterval:90000000000, PolicySyncPathPrefix:"", NetlinkTimeoutSecs:10000000000, MetadataAddr:"", MetadataPort:8775, OpenstackRegion:"", InterfacePrefix:"cali", InterfaceExclude:[]*regexp.Regexp{(*regexp.Regexp)(0xc0008dadc0)}, ChainInsertMode:"insert", DefaultEndpointToHostAction:"ACCEPT", IptablesFilterAllowAction:"ACCEPT", IptablesMangleAllowAction:"ACCEPT", LogPrefix:"calico-packet", LogFilePath:"", LogSeverityFile:"", LogSeverityScreen:"INFO", LogSeveritySys:"", LogDebugFilenameRegex:(*regexp.Regexp)(nil), VXLANEnabled:(*bool)(nil), VXLANPort:4789, VXLANVNI:4096, VXLANMTU:0, VXLANMTUV6:0, IPv4VXLANTunnelAddr:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xa, 0xf4, 0xc8, 0xc0}, IPv6VXLANTunnelAddr:net.IP(nil), VXLANTunnelMACAddr:"", VXLANTunnelMACAddrV6:"", IpInIpEnabled:(*bool)(nil), IpInIpMtu:0, IpInIpTunnelAddr:net.IP(nil), FloatingIPs:"Disabled", AllowVXLANPacketsFromWorkloads:false, AllowIPIPPacketsFromWorkloads:false, AWSSrcDstCheck:"DoNothing", ServiceLoopPrevention:"Drop", WorkloadSourceSpoofing:"Disabled", ReportingIntervalSecs:0, ReportingTTLSecs:90000000000, EndpointReportingEnabled:false, EndpointReportingDelaySecs:1000000000, IptablesMarkMask:0xffff0000, DisableConntrackInvalidCheck:false, HealthEnabled:true, HealthPort:9099, HealthHost:"localhost", HealthTimeoutOverrides:map[string]time.Duration(nil), PrometheusMetricsEnabled:false, PrometheusMetricsHost:"", PrometheusMetricsPort:9091, PrometheusGoMetricsEnabled:true, 
PrometheusProcessMetricsEnabled:true, PrometheusWireGuardMetricsEnabled:true, FailsafeInboundHostPorts:[]config.ProtoPort{config.ProtoPort{Net:"", Protocol:"tcp", Port:0x16}, config.ProtoPort{Net:"", Protocol:"udp", Port:0x44}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0xb3}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x94b}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x94c}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x1561}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x192b}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x1a0a}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x1a0b}}, FailsafeOutboundHostPorts:[]config.ProtoPort{config.ProtoPort{Net:"", Protocol:"udp", Port:0x35}, config.ProtoPort{Net:"", Protocol:"udp", Port:0x43}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0xb3}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x94b}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x94c}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x1561}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x192b}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x1a0a}, config.ProtoPort{Net:"", Protocol:"tcp", Port:0x1a0b}}, KubeNodePortRanges:[]numorstring.Port{numorstring.Port{MinPort:0x7530, MaxPort:0x7fff, PortName:""}}, NATPortRange:numorstring.Port{MinPort:0x0, MaxPort:0x0, PortName:""}, NATOutgoingAddress:net.IP(nil), UsageReportingEnabled:true, UsageReportingInitialDelaySecs:300000000000, UsageReportingIntervalSecs:86400000000000, ClusterGUID:"99e7c0c9d4774e1ab828f89985519c4d", ClusterType:"k8s,operator,kubeadm,kdd,typha", CalicoVersion:"v3.25.0", ExternalNodesCIDRList:[]string(nil), DebugMemoryProfilePath:"", DebugCPUProfilePath:"/tmp/felix-cpu-<timestamp>.pprof", DebugDisableLogDropping:false, DebugSimulateCalcGraphHangAfter:0, DebugSimulateDataplaneHangAfter:0, DebugPanicAfter:0, DebugSimulateDataRace:false, RouteSource:"CalicoIPAM", RouteTableRange:idalloc.IndexRange{Min:0, Max:0}, RouteTableRanges:[]idalloc.IndexRange(nil), RouteSyncDisabled:false, 
IptablesNATOutgoingInterfaceFilter:"", SidecarAccelerationEnabled:false, XDPEnabled:true, GenericXDPEnabled:false, Variant:"Calico", MTUIfacePattern:(*regexp.Regexp)(0xc0008db040), Encapsulation:config.Encapsulation{IPIPEnabled:false, VXLANEnabled:true, VXLANEnabledV6:false}, internalOverrides:map[string]string{}, sourceToRawConfig:map[config.Source]map[string]string{0x1:map[string]string{"CalicoVersion":"v3.25.0", "ClusterGUID":"99e7c0c9d4774e1ab828f89985519c4d", "ClusterType":"k8s,operator,kubeadm,kdd,typha", "FloatingIPs":"Disabled", "HealthPort":"9099", "LogSeverityScreen":"Info", "ReportingIntervalSecs":"0"}, 0x2:map[string]string{"IPv4VXLANTunnelAddr":"10.244.200.192"}, 0x3:map[string]string{"LogFilePath":"None", "LogSeverityFile":"None", "LogSeveritySys":"None", "MetadataAddr":"None"}, 0x4:map[string]string{"datastoretype":"kubernetes", "defaultendpointtohostaction":"ACCEPT", "felixhostname":"worker1.vagrant.vm", "healthenabled":"true", "healthport":"9099", "ipv6support":"false", "typhacafile":"/etc/pki/tls/certs/tigera-ca-bundle.crt", "typhacertfile":"/node-certs/tls.crt", "typhacn":"typha-server", "typhak8snamespace":"calico-system", "typhak8sservicename":"calico-typha", "typhakeyfile":"/node-certs/tls.key"}}, rawValues:map[string]string{"CalicoVersion":"v3.25.0", "ClusterGUID":"99e7c0c9d4774e1ab828f89985519c4d", "ClusterType":"k8s,operator,kubeadm,kdd,typha", "DatastoreType":"kubernetes", "DefaultEndpointToHostAction":"ACCEPT", "FelixHostname":"worker1.vagrant.vm", "FloatingIPs":"Disabled", "HealthEnabled":"true", "HealthPort":"9099", "IPv4VXLANTunnelAddr":"10.244.200.192", "Ipv6Support":"false", "LogFilePath":"None", "LogSeverityFile":"None", "LogSeverityScreen":"Info", "LogSeveritySys":"None", "MetadataAddr":"None", "ReportingIntervalSecs":"0", "TyphaCAFile":"/etc/pki/tls/certs/tigera-ca-bundle.crt", "TyphaCN":"typha-server", "TyphaCertFile":"/node-certs/tls.crt", "TyphaK8sNamespace":"calico-system", "TyphaK8sServiceName":"calico-typha", 
"TyphaKeyFile":"/node-certs/tls.key"}, Err:error(nil), loadClientConfigFromEnvironment:(func() (*apiconfig.CalicoAPIConfig, error))(0x14562e0), useNodeResourceUpdates:false} gitcommit="d86c70b2d883cdc9cc08a385bfeba2b0e7b18de8" version="d86c70b2d883"
2023-05-01 14:22:30.793 [INFO][20591] felix/bootstrap.go 209: Wireguard is not enabled - ensure no wireguard config iface="wireguard.cali" ipVersion=0x4 nodeName="worker1.vagrant.vm"
2023-05-01 14:22:30.797 [INFO][20591] felix/bootstrap.go 624: Wireguard public key not set in datastore ipVersion=0x4 nodeName="worker1.vagrant.vm"
2023-05-01 14:22:30.797 [INFO][20591] felix/bootstrap.go 209: Wireguard is not enabled - ensure no wireguard config iface="wg-v6.cali" ipVersion=0x6 nodeName="worker1.vagrant.vm"
2023-05-01 14:22:30.800 [INFO][20591] felix/bootstrap.go 624: Wireguard public key not set in datastore ipVersion=0x6 nodeName="worker1.vagrant.vm"
2023-05-01 14:22:30.800 [INFO][20591] felix/driver.go 72: Using internal (linux) dataplane driver.
...
2023-05-01 14:59:44.662 [WARNING][24389] felix/ipsets.go 340: Failed to update IP sets. Marking dataplane for resync. error=exit status 1 family="inet"
2023-05-01 14:59:44.732 [WARNING][24389] felix/ipsets.go 712: Failed to complete ipset restore, IP sets may be out-of-sync. closeErr=<nil> commitErr=<nil> family="inet" flushErr=<nil> input="create cali40all-ipam-pools hash:net family inet maxelem 1048576\ncreate cali4t28 hash:net family inet maxelem 1048576\nadd cali4t28 10.244.0.0/16\nswap cali40all-ipam-pools cali4t28\ndestroy cali4t28\ncreate cali40masq-ipam-pools hash:net family inet maxelem 1048576\ncreate cali4t29 hash:net family inet maxelem 1048576\nadd cali4t29 10.244.0.0/16\nswap cali40masq-ipam-pools cali4t29\ndestroy cali4t29\ncreate cali4t30 hash:ip family inet maxelem 1048576\nadd cali4t30 10.0.2.15\nadd cali4t30 192.168.56.111\nadd cali4t30 10.244.200.192\nadd cali4t30 127.0.0.0\nadd cali4t30 127.0.0.1\nswap cali40this-host cali4t30\ndestroy cali4t30\ncreate cali40all-vxlan-net hash:net family inet maxelem 1048576\ncreate cali4t31 hash:net family inet maxelem 1048576\nadd cali4t31 192.168.56.101/32\nadd cali4t31 192.168.56.112/32\nswap cali40all-vxlan-net cali4t31\ndestroy cali4t31\nCOMMIT\n" processErr=exit status 1 stderr="ipset v7.1: Error in line 1: Kernel error received: set type not supported\n" stdout="" writeErr=<nil>
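
The `stderr` in the last log line shows the kernel rejecting the first `create ... hash:net` command with "set type not supported", which usually points to a missing ipset set-type kernel module rather than to the Calico configuration itself. A minimal sketch for checking this on the affected node follows. The function names `ipset_type_to_module` and `check_ipset_modules` are hypothetical helpers, and the mapping assumes the usual Linux naming in which `hash:net` is implemented by the `ip_set_hash_net` module.

```shell
# Map an ipset set type (as written in "create <name> <type> ...") to the
# kernel module that implements it, e.g. hash:net -> ip_set_hash_net.
# Assumption: module names follow the ip_set_<type> convention, with ':'
# replaced by '_' and ',' removed (hash:ip,port -> ip_set_hash_ipport).
ipset_type_to_module() {
  printf 'ip_set_%s\n' "$(printf '%s' "$1" | tr ':' '_' | tr -d ',')"
}

# For each set type given, report whether modinfo can find the matching
# module for the running kernel. A "NOT found" line suggests the package
# that ships the module is not installed for this kernel.
check_ipset_modules() {
  for set_type in "$@"; do
    module="$(ipset_type_to_module "$set_type")"
    if modinfo "$module" >/dev/null 2>&1; then
      echo "$module: available for $(uname -r)"
    else
      echo "$module: NOT found for $(uname -r)"
    fi
  done
}
```

On a node, `check_ipset_modules hash:net hash:ip` would cover the two set types that appear in the failed restore input above.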

I also attempted to install the calico networking module but with similar results:

installation:
  cni:
    type: Calico
  # Configures Calico networking.
  calicoNetwork:
    bgp: Disabled
    # Note: The ipPools section cannot be modified post-install.
    ipPools:
    - cidr: 10.244.0.0/16
      encapsulation: VXLAN
    # IPV4 for now
    nodeAddressAutodetectionV4:
      interface: eth1
      # natOutgoing: Enabled
      # nodeSelector: all()
  registry: 10.0.2.2:5000
  imagePath: olcne
@thtanaka
Member

thtanaka commented May 2, 2023

Did you have the firewall running or not?

@hussam-qasem
Contributor Author

Did you have the firewall running or not?

I disabled firewalld.service ...

@thtanaka
Member

thtanaka commented May 2, 2023

I guess the logs you provided above are for calico-node-g5tt6. Why is it only happening on one node? Could you perhaps share the output of:

  1. kubectl get po -A -o wide
  2. kubectl get nodes -o wide
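
When the pod listing is long, it can help to filter it down to the pods that are not fully ready. The sketch below is one way to do that; `not_ready_pods` is a hypothetical helper name, and it assumes the default column layout of `kubectl get po -A` (NAMESPACE, NAME, READY, ...), where READY has the form `ready/total`.

```shell
# Read `kubectl get po -A` output on stdin and print namespace/name for every
# pod whose READY column shows fewer ready containers than total containers.
# The header row (line 1) is skipped.
not_ready_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")
    if (ready[1] != ready[2]) print $1 "/" $2
  }'
}

# Usage:
#   kubectl get po -A -o wide | not_ready_pods
```

Pods that have run to completion also show `0/1`, so they appear in the output as well.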

@hussam-qasem
Contributor Author

hussam-qasem commented May 3, 2023

Thanks @thtanaka for your reply. Please find the requested output below:
[Screenshot: Screen Shot 2023-05-03 at 6 56 17 AM]

[Screenshot: Screen Shot 2023-05-03 at 6 53 02 AM]

I'm curious, were you unable to replicate the problem? If time permits, please try:

git clone https://github.com/oracle/vagrant-projects
cd vagrant-projects/OCNE
VERBOSE=true vagrant up

(for the screenshots, I also set NB_WORKERS=1)

(some time later, both pods are not ready)
[Screenshot: Screen Shot 2023-05-03 at 12 02 15 PM]

@jromers
Member

jromers commented May 3, 2023

Does the issue also happen when you try the UEK6 kernel (you are using UEK7 at the moment)?

@hussam-qasem
Contributor Author

Hi @jromers

Does the issue also happen when you try the UEK6 kernel (you are using UEK7 at the moment)?

Using config.vm.box_version 8.6.359 (UEKR6), it seems to work!!

[Screenshot: Screen Shot 2023-05-03 at 3 18 01 PM]

How do I make it work on UEKR7?
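
When comparing box versions like this, it can be useful to confirm which UEK release a node is actually running. The sketch below classifies a kernel release string; `uek_release` is a hypothetical helper, and the patterns assume that UEK R6 kernels are versioned `5.4.17-...uek` and UEK R7 kernels `5.15.0-...uek`.

```shell
# Classify a kernel release string (default: the running kernel) as
# UEKR6, UEKR7, some other UEK, or not UEK at all.
# Assumption: UEK R6 is based on 5.4.17 and UEK R7 on 5.15.0.
uek_release() {
  release="${1:-$(uname -r)}"
  case "$release" in
    5.4.17-*uek*) echo "UEKR6" ;;
    5.15.0-*uek*) echo "UEKR7" ;;
    *uek*)        echo "UEK (unrecognized release)" ;;
    *)            echo "not UEK" ;;
  esac
}

# Usage (on a node):
#   uek_release
```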

@AmedeeBulle
Member

@hussam-qasem

I think the culprit is here:

if ! [[ ${POD_NETWORK} == "calico" || ${DEPLOY_CALICO} == 1 ]]; then
  msg "Installing kernel-uek-modules"
  echo_do sudo dnf install -y kernel-uek-modules-$(uname -r)
fi

Even when using calico, you still need to masquerade here:

echo_do sudo firewall-cmd --add-masquerade --permanent

I did a quick test and:

  • I can reproduce the issue with the pristine code
  • It seems to work when I comment out the test mentioned above (that is: always install the kernel modules)

(I don't use K8s these days, so I haven't done thorough testing)
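
The workaround tested above (always installing the kernel modules, regardless of the pod network) can be sketched as a small standalone function. This is an illustration only, not the change that was eventually submitted: `install_uek_modules` and the `DRY_RUN` variable are hypothetical names, while the package name pattern `kernel-uek-modules-$(uname -r)` comes from the snippet quoted above.

```shell
# Install the kernel-uek-modules package matching a kernel release
# (default: the running kernel), with no check on POD_NETWORK or
# DEPLOY_CALICO. With DRY_RUN=1 the command is printed instead of executed.
install_uek_modules() {
  kernel_release="${1:-$(uname -r)}"
  pkg="kernel-uek-modules-${kernel_release}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "sudo dnf install -y ${pkg}"
  else
    sudo dnf install -y "${pkg}"
  fi
}

# Usage:
#   DRY_RUN=1 install_uek_modules   # show what would be installed
#   install_uek_modules             # install for the running kernel
```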

@hussam-qasem
Contributor Author

hussam-qasem commented May 3, 2023

Thank you @AmedeeBulle. That was exactly my problem. Thank you @jromers for the tip.
I have submitted a new PR #482 #483 to re-install kernel-uek-modules for UEKR7.
