Single-threaded time synchronization performance regression #1778

Open
SoftExpert opened this issue Dec 20, 2024 · 12 comments
Labels
bug Something isn't working

Comments

@SoftExpert

Hello,

I built ntpd-rs from source and installed it locally. With the release of version 1.4.0, I expected the number of threads to drop to 1 (I'm using it as an NTP client).
However, there are still as many threads as there are cores:

❯ lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          48 bits physical, 48 bits virtual
  Byte Order:             Little Endian
CPU(s):                   16
  On-line CPU(s) list:    0-15
Vendor ID:                AuthenticAMD
  Model name:             AMD Ryzen 7 8700G w/ Radeon 780M Graphics
    CPU family:           25
    Model:                117
    Thread(s) per core:   2
    Core(s) per socket:   8
    Socket(s):            1
    Stepping:             2


❯ pstree -p `pidof ntp-daemon`
ntp-daemon(927)─┬─{ntp-daemon}(929)
                ├─{ntp-daemon}(930)
                ├─{ntp-daemon}(931)
                ├─{ntp-daemon}(932)
                ├─{ntp-daemon}(933)
                ├─{ntp-daemon}(934)
                ├─{ntp-daemon}(935)
                ├─{ntp-daemon}(936)
                ├─{ntp-daemon}(937)
                ├─{ntp-daemon}(938)
                ├─{ntp-daemon}(939)
                ├─{ntp-daemon}(940)
                ├─{ntp-daemon}(941)
                ├─{ntp-daemon}(942)
                ├─{ntp-daemon}(943)
                ├─{ntp-daemon}(944)
                └─{ntp-daemon}(945)

My config file looks like this:

❯ cat ntp.toml 
[observability]
# You can configure ntpd-rs with different output levels of logging information
# Basic values for this are `trace`, `debug`, `info`, `warn` and `error`.
log-level = "info"
## Using the observe socket you can retrieve statistical information about the
## daemon while it is running. You can use the `ntp-ctl` or prometheus based
## `ntp-metrics-exporter` binaries for some default options to read from the
## observe socket.
observation-path = "/var/run/ntpd-rs/observe"

## The sources section allows configuring sources, you may configure multiple of
## these blocks to add more sources to your configuration.
## Our default configuration spawns a pool of sources (by default this attempts
## to discover 4 distinct sources).
[[source]]
mode = "pool"
address = "ntpd-rs.pool.ntp.org"
count = 4

[[source]]
mode = "pool"
address = "ntp.time.nl"
count = 2


## If you have an NTS server, you can configure a source that connects using NTS
## by adding a configuration such as the one below
[[source]]
mode = "nts"
# NTS service from NETNOD: https://www.netnod.se/nts/network-time-security
address = "nts.netnod.se"

## A source in server mode will only create a single source in contrast to the
## multiple sources of a pool. This is the recommended source mode if you only
## have an IP address for your source.
#[[source]]
#mode = "server"
#address = "ntpd-rs.pool.ntp.org"

## If you want to provide time to other machines, the configuration below
## enables serving time on port 123 of all network interfaces.
#[[server]]
#listen = "[::]:123"

## Below are configured various thresholds beyond which ntpd-rs will not
## change the system clock. CHANGE THESE TO MATCH YOUR SECURITY NEEDS!
[synchronization]
# The maximum step size (in seconds) of a single step during normal operation
single-step-panic-threshold = 1800
# On startup a larger jump may occur, this sets limits for that initial jump
startup-step-panic-threshold = { forward="inf", backward = 86400 }
# If, during the lifetime of the ntp-daemon the combined time of time jumps
# exceeds this value, then the NTP daemon will stop, this is disabled by default
#accumulated-threshold = 1800
#minimum-agreeing-sources = 3

What am I missing?
Is something wrong or missing in my configuration, or has the change not yet been integrated into the released version?

@SoftExpert
Author

Well, I got the answer: the commit was made after the v1.4.0 release.
I built from the latest git and now have a single thread:

❯ pstree -p `pidof ntp-daemon`
ntp-daemon(937)───{ntp-daemon}(938)

Sorry for the noise!
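
For anyone who wants to check the thread count without pstree, here is a minimal sketch in Rust. It assumes a Linux system with /proc mounted; the function name `thread_count` is made up for illustration and is not part of ntpd-rs. Note that /proc/<pid>/task also lists the main thread, so the result is one higher than the number of `{ntp-daemon}` entries that pstree prints.

```rust
use std::fs;
use std::io;

/// Count the threads of a process by listing /proc/<pid>/task (Linux only).
/// Every entry in that directory is one thread, including the main thread.
/// `thread_count` is a hypothetical helper, not an ntpd-rs function.
fn thread_count(pid: u32) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(format!("/proc/{pid}/task"))? {
        entry?; // propagate I/O errors instead of silently skipping them
        count += 1;
    }
    Ok(count)
}
```

Calling `thread_count` with the PID reported by `pidof ntp-daemon` should give the same picture as the pstree output above.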

@SoftExpert
Author

Hmm, I now notice that ntp-ctl status shows that synchronization reaches Stratum: 3 at best.
I wonder if that has something to do with the single-thread commit ...

@rnijveld
Member

Hmm, that would be an interesting change in behavior if that were the case. I would not expect much difference from the switch to a single thread. I'm guessing you previously reached stratum 2? I believe our stratum should be set to one higher than that of the source for which we have the lowest uncertainty (i.e. the source with the most influence on our clock). It could very well be that the daemon determined that one of the sources from pool.ntp.org appears more reliable to us than either time.nl or Netnod (assuming the configuration you've shown above; that pool has mostly stratum 2 and 3 servers). That is especially plausible if those two sources are a little further away from you.
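
To make the rule described above concrete, here is a rough sketch in Rust. It is not the ntpd-rs implementation: the `Source` struct, its fields, and `reported_stratum` are hypothetical names chosen for illustration, and real source selection weighs more than a single uncertainty number.

```rust
/// Hypothetical, simplified view of a time source; not an ntpd-rs type.
#[derive(Debug, Clone, PartialEq)]
struct Source {
    /// Stratum advertised by the source.
    stratum: u8,
    /// Our estimated uncertainty for this source, in seconds (assumed unit).
    uncertainty: f64,
}

/// Report a stratum one higher than that of the source with the lowest
/// uncertainty. Returns None when no sources are available.
fn reported_stratum(sources: &[Source]) -> Option<u8> {
    sources
        .iter()
        .min_by(|a, b| a.uncertainty.total_cmp(&b.uncertainty))
        .map(|best| best.stratum.saturating_add(1))
}
```

Under this rule, a stratum 2 pool server measured with lower uncertainty than a stratum 1 server would lead to a reported stratum of 3, which matches the kind of behaviour discussed in this thread.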

@SoftExpert
Author

I confirm: with v1.4.0 and the previous release (1.3.0) I was constantly at stratum 2, in less than 2 minutes from a cold boot.
With the current git version (which includes the single-thread commit), it oscillates between stratum 4 and stratum 3, occasionally reaching stratum 2, but not for long.
The config file remained exactly as posted (I made sure of that after each install).
It feels like it is somehow less able to hold time precision at the stratum 2 level.

@rnijveld changed the title from "Tokio still multi-threaded for a client in v1.4.0" to "Single-threaded time synchronization performance regression" on Dec 20, 2024
@rnijveld added the "bug" (Something isn't working) label on Dec 20, 2024
@SoftExpert
Author

If it helps, I can try different configurations and collect logs. I would need some guidance, though.

@rnijveld
Member

I don't think we really have enough logging to get to the bottom of this easily. We will need to do a little debugging to see what is going on, but unfortunately it will probably take a while before either David or I have time for that. Before releasing this, we were planning on doing some synchronization performance testing ourselves as well. For now, I would recommend going back to 1.4.0.

@SoftExpert
Author

If you need feedback and some light testing, count me in!

@davidv1992
Member

I won't be able to put much time into this yet, but to get you started on some useful tests: could you do two runs with debug logging, each for about 24 hours? Do one with a self-built version from the 1.4.0 commit and one with the git version where you are seeing the problems. In both runs, use fixed NTP pool servers (replacing the pool source with 4 pool servers manually) to reduce the potential impact of the pool handing out different servers.

Should the indicated behaviour persist, it would be really helpful if you could send me the logs of both runs, along with ntp-ctl status output at a few points during each test.

Note that the algorithm we use is not set up to prioritize stratum, so the effect you are seeing isn't necessarily unexpected, although it would be strange if, with identical sets of servers, behaviour changed just because of the tokio single-threading commit. In our synchronization implementation, we prioritize sources based on their quality as we measure it and as they advertise it to us (through fields other than stratum). Hence, running with a stratum larger than 2 isn't necessarily an indication that synchronization quality is worse, just that the daemon judged a different server to be better, and that server happened to have a higher stratum.
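
One possible way to collect the ntp-ctl status output at a few points during a 24-hour run is a small helper that captures a command's output together with a timestamp. This is only a sketch using the Rust standard library; the `snapshot` function and its header format are assumptions made for illustration, not an existing ntpd-rs tool.

```rust
use std::io;
use std::process::Command;
use std::time::{SystemTime, UNIX_EPOCH};

/// Run a command and return its stdout, preceded by a header line that
/// carries a Unix timestamp. `snapshot` is a hypothetical helper.
fn snapshot(program: &str, args: &[&str]) -> io::Result<String> {
    let output = Command::new(program).args(args).output()?;
    // Fall back to 0 if the system clock is set before the Unix epoch.
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    Ok(format!(
        "=== {secs} {program} {} ===\n{}",
        args.join(" "),
        String::from_utf8_lossy(&output.stdout)
    ))
}
```

Calling `snapshot("ntp-ctl", &["status"])` every hour or so and appending the result to a file would give timestamped status records to send along with the debug logs.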

@SoftExpert
Author

Sure; could you please provide the exact contents of the config file I should use?

@davidv1992
Member

Sure, but I need a bit of information from your side first. Could you provide me with the output of nslookup ntpd-rs.pool.ntp.org?

@SoftExpert
Author

I used dnslookup, if that's OK:

dnslookup master
Server: 127.0.0.153:53

dnslookup result (elapsed 55.740515ms):
;; opcode: QUERY, status: NOERROR, id: 40084
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ntpd-rs.pool.ntp.org.  IN       A

;; ANSWER SECTION:
ntpd-rs.pool.ntp.org.   130     IN      A       162.159.200.1
ntpd-rs.pool.ntp.org.   130     IN      A       82.67.126.242
ntpd-rs.pool.ntp.org.   130     IN      A       185.123.84.51
ntpd-rs.pool.ntp.org.   130     IN      A       162.159.200.123

@davidv1992
Member

The following should be a good configuration for the test:

[observability]
# You can configure ntpd-rs with different output levels of logging information
# Basic values for this are `trace`, `debug`, `info`, `warn` and `error`.
log-level = "debug"
## Using the observe socket you can retrieve statistical information about the
## daemon while it is running. You can use the `ntp-ctl` or prometheus based
## `ntp-metrics-exporter` binaries for some default options to read from the
## observe socket.
observation-path = "/var/run/ntpd-rs/observe"

## The sources section allows configuring sources, you may configure multiple of
## these blocks to add more sources to your configuration.
## Our default configuration spawns a pool of sources (by default this attempts
## to discover 4 distinct sources).
[[source]]
mode = "server"
address = "162.159.200.1"

[[source]]
mode = "server"
address = "82.67.126.242"

[[source]]
mode = "server"
address = "185.123.84.51"

[[source]]
mode = "server"
address = "162.159.200.123"

[[source]]
mode = "server"
address = "94.198.159.10"

[[source]]
mode = "server"
address = "94.198.159.14"

## If you have an NTS server, you can configure a source that connects using NTS
## by adding a configuration such as the one below
[[source]]
mode = "nts"
# NTS service from NETNOD: https://www.netnod.se/nts/network-time-security
address = "nts.netnod.se"

## A source in server mode will only create a single source in contrast to the
## multiple sources of a pool. This is the recommended source mode if you only
## have an IP address for your source.
#[[source]]
#mode = "server"
#address = "ntpd-rs.pool.ntp.org"

## If you want to provide time to other machines, the configuration below
## enables serving time on port 123 of all network interfaces.
#[[server]]
#listen = "[::]:123"

## Below are configured various thresholds beyond which ntpd-rs will not
## change the system clock. CHANGE THESE TO MATCH YOUR SECURITY NEEDS!
[synchronization]
# The maximum step size (in seconds) of a single step during normal operation
single-step-panic-threshold = 1800
# On startup a larger jump may occur, this sets limits for that initial jump
startup-step-panic-threshold = { forward="inf", backward = 86400 }
# If, during the lifetime of the ntp-daemon the combined time of time jumps
# exceeds this value, then the NTP daemon will stop, this is disabled by default
#accumulated-threshold = 1800
#minimum-agreeing-sources = 3
