Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: panic due to goroutine setting returned error #748

Merged
merged 6 commits into from
Feb 19, 2025

Conversation

mzz2017
Copy link
Contributor

@mzz2017 mzz2017 commented Feb 17, 2025

Background

返回值显式给出变量名时,例如 (err error),如果函数内有 go routine 设置 err 的值,则有概率会导致 racing 风险,导致 errors.Is / errors.Unwrap 出错。

复现方法:

image

多运行几次即可复现。

其中的 sniffing.IsSniffingError 定义如下:

var (
	Error            = fmt.Errorf("sniffing error")
)

func IsSniffingError(err error) bool {
	return errors.Is(err, Error)
}

猜测:存在编译器优化使得类似 std::move,函数内外的 err 是同一个地址。

修复:不要在 go routine 里直接设置 err 的值。

Checklist

Full Changelogs

  • [Implement ...]

Issue Reference

Closes #685
Closes #693
Closes #735
Closes #745

Test Result

jschwinger233
jschwinger233 previously approved these changes Feb 17, 2025
@@ -68,7 +68,7 @@ func ResolveIp46(ctx context.Context, dialer netproxy.Dialer, dns netip.AddrPort
}()
var e error
addrs6, e = ResolveNetip(ctx6, dialer, dns, host, dnsmessage.TypeAAAA, network)
if err != nil && !errors.Is(e, context.Canceled) {
if e != nil && !errors.Is(e, context.Canceled) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

之前的逻辑,当 err != nil 的时候,e 可能是 nil,此时 errors.Is(nil, ...) 就恐慌了。这种手滑写错变量,还有 == nil 写反成 != nil 之类的都很容易被忽视 😹

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的!我用的 ide 自动补全成的 err

Copy link
Contributor Author

@mzz2017 mzz2017 Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jschwinger233 不过 errors.Is(nil, ...) 不会恐慌,这个是检查的时候顺手修的

@oluceps
Copy link

oluceps commented Feb 17, 2025

目前没有复现panic,但是发现在我的所有机器上使用这条更改 97f41ec...278d6da 的dae,网络环境都会从原先的没有dns污染变为有dns污染,具体表现为应用这条更改后curl和chrome内解析 zh.wikipedia.org 会解析到facebook地址,然后net::ERR_CERT_COMMON_NAME_INVALID(因为CN是*.secure.latest.facebook.com),google之类的站点会直接ERR_CONNECTION_CLOSED。匪夷所思但是试验多次前后唯一的变量就只有这条更改。

测试前后的rev是 97f41ec (也就是现在的main HEAD,正常)和 278d6da(异常)

我的dae配置 https://github.com/oluceps/nixos-config/blob/7df068e4e147f9c639653aa4f5966c0dc20c8988/repack/dae.nix#L21 节点全部是本地socks5
系统 Linux 6.13.2, NixOS, 25.05 (Warbler), 25.05.20250210.58edd1e

没开dnsmasq没开systemd-resolved,网络使用systemd-networkd管理,dnsproxy listen 53

@mzz2017
Copy link
Contributor Author

mzz2017 commented Feb 18, 2025

@oluceps 换回去就好了吗

@oluceps
Copy link

oluceps commented Feb 18, 2025

@oluceps 换回去就好了吗

是的,试验多次 换回去就一切正常了

@oluceps
Copy link

oluceps commented Feb 18, 2025

再次试验百分百复现,dae revision 278d6da
我的dae dns配置:


dns {
    ipversion_prefer: 4
    upstream {
        me: 'udp://127.0.0.1' # dnsproxy upstream
        pol: 'udp://127.0.0.1'
    }
    routing {
        request {
            qname(ext:"geosite:category-ads-all") -> reject
            fallback: me
        }
        response {
            upstream(pol) -> accept
            !qname(geosite:cn) && ip(geoip:private) -> pol
            fallback: accept
        }
    }
}

请求时dae的debug日志,从curl开始请求到请求结束:

Feb 18 13:17:44 kaambl dae[102898]: level=info msg="192.168.1.187:43852 <-> 31.13.94.49:443" dialer=hk-hy dscp=0 ip="31.13.94.49:443" mac="38:d5:7a:e2:19:7b" network=tcp4 outbound=all pid=0 pname=curl policy=min_avg10 sniffed=zh.wikipedia.org

curl的详细信息:

> curl https://zh.wikipedia.org -v
* Host zh.wikipedia.org:443 was resolved.
* IPv6: 2001::1
* IPv4: 31.13.94.10
*   Trying [2001::1]:443...
* Immediate connect fail for 2001::1: Network is unreachable
*   Trying 31.13.94.10:443...
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_CHACHA20_POLY1305_SHA256 / x25519 / RSASSA-PSS
* ALPN: server accepted h2
* Server certificate:
*  subject: C=US; ST=California; L=Menlo Park; O=Meta Platforms, Inc.; CN=*.secure.latest.facebook.com
*  start date: Nov 27 00:00:00 2024 GMT
*  expire date: Feb 25 23:59:59 2025 GMT
*  subjectAltName does not match hostname zh.wikipedia.org
* SSL: no alternative certificate subject name matches target hostname 'zh.wikipedia.org'
* closing connection #0
curl: (60) SSL: no alternative certificate subject name matches target hostname 'zh.wikipedia.org'
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the webpage mentioned above.

dnsproxy配置

bootstrap:
- 8.8.8.8
- 119.29.29.29
- tcp://223.6.6.6:53
listen-addrs:
- '::'
listen-ports:
- 53
upstream:
- quic://unfiltered.adguard-dns.com
- quic://dns.alidns.com
- h3://dns.alidns.com/dns-query
- tls://dot.pub
upstream-mode: parallel

换回 97f41ec 一切正常

@mzz2017
Copy link
Contributor Author

mzz2017 commented Feb 18, 2025

@oluceps 已更新,烦请看看是否修复 dns 污染问题

@oluceps
Copy link

oluceps commented Feb 18, 2025

@oluceps 已更新,烦请看看是否修复 dns 污染问题

更新到 6035c92 正常了 没有出现上述问题

Copy link
Contributor

@dae-prow dae-prow bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧪 Since the PR has been fully tested, please consider merging it.

@@ -140,6 +140,25 @@ func ResolveNS(ctx context.Context, d netproxy.Dialer, dns netip.AddrPort, host
return records, nil
}

func ResolveSOA(ctx context.Context, d netproxy.Dialer, dns netip.AddrPort, host string, network string) (records []string, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个函数。。好像没有被用到?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

是的忘了删了,先留着吧

@mzz2017 mzz2017 merged commit e7c2497 into main Feb 19, 2025
29 of 31 checks passed
@mzz2017 mzz2017 deleted the mzz/fix_panic_due_to_goroutine_set_returned_error branch February 19, 2025 09:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants