Activate Error Driven Snowflake #3265

Closed
yacovm wants to merge 1 commit

Conversation

yacovm
Contributor

@yacovm yacovm commented Aug 2, 2024

Why this should be merged and how this was tested

This commit enables error driven snowflake.

I deployed two nodes with the same spec (8 CPU, 32GB RAM) on the Fuji network and collected metrics for 12 hours.

The average time to finalize a block on the latest version of Avalanche is as follows:

Screenshot 2024-08-09 at 15 25 46

In contrast, with this commit, the time to finalize a block is cut by ~35%:

Screenshot 2024-08-09 at 15 26 35

When 5% of the stake is unreachable, the current snowflake finalization time increases:

Screenshot 2024-08-09 at 21 33 33

The finalization time of the error driven snowflake increases more, but it still appears to be slightly faster than the current snowflake:

Screenshot 2024-08-09 at 21 32 48

The block finalization times were collected via the metric avalanche_snowman_blks_accepted_sum, which measures the duration from the time a block is first seen and enters consensus to the time it is finalized.

How this works

In the classical snowflake protocol, a node issues a sequence of polls, and every poll yields a certain confidence score which is based on the number of nodes that responded and the content of the response.

If a poll collects enough responses to raise the confidence score above a certain threshold, the poll is considered a success. A block is finalized once enough consecutive polls succeed.
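The classical rule can be sketched as a single counter of consecutive successful polls. This is a simplified illustration, not the avalanchego implementation: the type name, field names, and the values used in `main` are assumptions chosen for the example, and the threshold is treated as inclusive.

```go
package main

import "fmt"

// classicSnowflake tracks finalization with a single confidence threshold.
// All names here are hypothetical and exist only for this sketch.
type classicSnowflake struct {
	alphaConfidence int // votes a poll needs to count as successful
	beta            int // consecutive successful polls needed to finalize
	streak          int // current run of successful polls
	finalized       bool
}

// recordPoll records how many votes the preferred block received in one poll.
func (s *classicSnowflake) recordPoll(votes int) {
	if s.finalized {
		return
	}
	if votes < s.alphaConfidence {
		s.streak = 0 // an unsuccessful poll breaks the run
		return
	}
	s.streak++
	s.finalized = s.streak >= s.beta
}

func main() {
	// Illustrative parameters, not network defaults.
	s := &classicSnowflake{alphaConfidence: 15, beta: 3}
	for i, votes := range []int{16, 15, 12, 17, 15, 20} {
		s.recordPoll(votes)
		fmt.Printf("poll %d: votes=%d streak=%d finalized=%v\n", i+1, votes, s.streak, s.finalized)
	}
}
```

In this run the third poll falls below the threshold and resets the streak, so the block is finalized only after the sixth poll.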

In the error driven snowflake, which is described in Section 4.1 of the Frosty paper, a poll can succeed to varying degrees: the higher the confidence score the poll reaches, the more successful it is considered. In contrast to the classical snowflake protocol, the number of successful polls required to finalize a block now depends on the confidence score of those polls: consecutive polls with a higher confidence score finalize a block in fewer polls, and vice versa.
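One way to picture the error driven rule is as a list of (confidence threshold, required consecutive polls) pairs, each with its own streak counter. The sketch below is an assumption-laden illustration of that idea, not the code in this PR: the names `terminationCondition` and `errorDrivenSnowflake` and the example thresholds are hypothetical.

```go
package main

import "fmt"

// terminationCondition pairs a confidence threshold with the number of
// consecutive polls that must reach it. Hypothetical name for this sketch.
type terminationCondition struct {
	alphaConfidence int
	beta            int
}

// errorDrivenSnowflake keeps one streak counter per termination condition.
type errorDrivenSnowflake struct {
	conditions []terminationCondition
	streaks    []int
	finalized  bool
}

func newErrorDrivenSnowflake(conditions []terminationCondition) *errorDrivenSnowflake {
	return &errorDrivenSnowflake{
		conditions: conditions,
		streaks:    make([]int, len(conditions)),
	}
}

// recordPoll advances every condition the poll satisfies and resets the rest.
// The block is finalized as soon as any condition completes its streak.
func (s *errorDrivenSnowflake) recordPoll(votes int) {
	if s.finalized {
		return
	}
	for i, c := range s.conditions {
		if votes < c.alphaConfidence {
			s.streaks[i] = 0
			continue
		}
		s.streaks[i]++
		if s.streaks[i] >= c.beta {
			s.finalized = true
		}
	}
}

func main() {
	// Illustrative values: a higher threshold needs fewer consecutive polls.
	s := newErrorDrivenSnowflake([]terminationCondition{
		{alphaConfidence: 15, beta: 4},
		{alphaConfidence: 18, beta: 2},
	})
	for i, votes := range []int{16, 18, 19} {
		s.recordPoll(votes)
		fmt.Printf("poll %d: votes=%d streaks=%v finalized=%v\n", i+1, votes, s.streaks, s.finalized)
	}
}
```

With these example values, two consecutive high-confidence polls finalize the block after three polls in total, whereas the lower threshold alone would have needed four.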

Each poll consists of sending queries to a number of nodes and collecting their responses. Since nodes may be offline, slow or malicious, they might not respond in a timely manner. To keep polls efficient, a poll is terminated early once it has reached the required level of confidence, or once enough nodes have timed out that the required confidence level evidently cannot be reached by waiting for further responses.
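The early termination idea can be sketched as follows, under simplifying assumptions: only the votes for a single preferred choice are tracked, timeouts count as received responses that carry no vote, and a poll with several thresholds keeps waiting only while some unmet threshold is still reachable. The function name and its parameters are hypothetical and do not reproduce the logic changed in this PR.

```go
package main

import "fmt"

// canTerminate reports whether a poll may stop waiting for responses.
//   k          - number of nodes queried
//   received   - responses (or timeouts) accounted for so far
//   votes      - votes for the preferred choice so far
//   thresholds - confidence thresholds the poll could satisfy
// It returns false only while some threshold is not yet met but could
// still be met if every outstanding node voted for the preferred choice.
func canTerminate(k, received, votes int, thresholds []int) bool {
	outstanding := k - received
	if outstanding <= 0 {
		return true // nothing left to wait for
	}
	for _, t := range thresholds {
		reached := votes >= t
		reachable := votes+outstanding >= t
		if !reached && reachable {
			return false
		}
	}
	return true
}

func main() {
	// Single threshold, as in the classical protocol (illustrative values).
	fmt.Println(canTerminate(20, 15, 15, []int{15})) // threshold reached
	fmt.Println(canTerminate(20, 10, 4, []int{15}))  // threshold unreachable
	fmt.Println(canTerminate(20, 10, 8, []int{15}))  // still undecided

	// Several thresholds: waiting may still lift the poll to a higher one.
	fmt.Println(canTerminate(20, 16, 15, []int{15, 18}))
	fmt.Println(canTerminate(20, 18, 15, []int{15, 18}))
}
```

The last two calls show why a single-threshold rule is not enough here: with 15 votes the lower threshold is already met, but stopping immediately would give up the chance of reaching the higher one while it is still attainable.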

The current snowflake code already supports the error driven variant. However, the logic that terminates polls early currently supports only the classical snowflake with a single confidence score.
In addition, the configuration cannot express the thresholds for the error driven snowflake, as it contains only a single confidence setting.

This commit introduces new configuration flags to the avalanche node that express the various confidence criteria of the error driven snowflake, and changes the early termination logic to accommodate the error driven snowflake.

@yacovm yacovm requested a review from StephenButtolph as a code owner August 2, 2024 22:06
@yacovm yacovm marked this pull request as draft August 2, 2024 22:07
@yacovm yacovm force-pushed the errDrivenSF branch 17 times, most recently from 91bd651 to 3eece44 Compare August 8, 2024 17:25
@yacovm yacovm changed the title from "[WIP] Activate Error Driven Snowflake" to "Activate Error Driven Snowflake" Aug 9, 2024
This commit enables error driven snowflake and reduces the average poll time by 35% on the Fuji network.

Signed-off-by: Yacov Manevich <[email protected]>
@yacovm yacovm marked this pull request as ready for review August 9, 2024 16:50
@yacovm yacovm closed this Aug 11, 2024