perf: Add workflow template informer to server #13672

jakkubu · 2024-09-27T13:46:15Z

Motivation

Improve the performance of creating workflows with a complex templateRef structure.

During template validation, the k8s API is called for each templateRef. For complex workflows with many refs this creates a huge overhead. Let's cache these templates.

Connected to issue #7418

This is a follow-up to PR #13633.

Modifications

Added an informer to the server and used it in workflow validation.
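To make the idea concrete, here is a minimal, stdlib-only sketch of why a local store removes the per-templateRef API round trip. It is not the code from this PR: `templateGetter`, `apiGetter`, `cachedGetter` and `countAPICalls` are hypothetical names, the "API" is a plain map with a call counter, and the pre-filled map stands in for whatever an informer would keep in sync.

```go
package main

import (
	"errors"
	"fmt"
)

// template is a stand-in for a WorkflowTemplate; only the name matters here.
type template struct{ Name string }

// templateGetter is a hypothetical lookup interface, not the actual Argo API.
type templateGetter interface {
	Get(name string) (*template, error)
}

// apiGetter simulates fetching from the Kubernetes API and counts round trips.
type apiGetter struct {
	remote map[string]*template
	calls  int
}

func (g *apiGetter) Get(name string) (*template, error) {
	g.calls++
	t, ok := g.remote[name]
	if !ok {
		return nil, errors.New("template not found: " + name)
	}
	return t, nil
}

// cachedGetter serves from a local store (what an informer would keep in
// sync) and falls back to the API only on a miss.
type cachedGetter struct {
	store    map[string]*template
	fallback templateGetter
}

func (g *cachedGetter) Get(name string) (*template, error) {
	if t, ok := g.store[name]; ok {
		return t, nil
	}
	return g.fallback.Get(name)
}

// resolveRefs looks up every referenced template, as validation would.
func resolveRefs(g templateGetter, refs []string) error {
	for _, r := range refs {
		if _, err := g.Get(r); err != nil {
			return err
		}
	}
	return nil
}

// countAPICalls resolves refs once directly and once through the cache and
// returns the number of simulated API calls made on each path.
func countAPICalls(refs []string) (direct, cached int, err error) {
	remote := make(map[string]*template, len(refs))
	for _, r := range refs {
		remote[r] = &template{Name: r}
	}

	d := &apiGetter{remote: remote}
	if err = resolveRefs(d, refs); err != nil {
		return 0, 0, err
	}

	fb := &apiGetter{remote: remote}
	c := &cachedGetter{store: remote, fallback: fb}
	if err = resolveRefs(c, refs); err != nil {
		return 0, 0, err
	}
	return d.calls, fb.calls, nil
}

func main() {
	refs := []string{"echo-1", "echo-2", "echo-1", "echo-2"}
	direct, cached, err := countAPICalls(refs)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("API calls without cache: %d, with cache: %d\n", direct, cached)
}
```

In this toy setup the direct path makes one simulated API call per reference, while the cached path makes none as long as the store already holds every referenced template.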

Verification

I ran tests similar to the ones in the first PR. The results are awesome; benchmarking results and details are in a separate comment.

pkg/apiclient/argo-kube-client.go

jakkubu · 2024-10-10T07:36:06Z

Benchmarking multiple-ref template creation

Setup

Branches:

Main (commit 5244064)
The above commit plus the changes from PR perf: Add template validation caching #13633, rebased onto it (commit 9df7abf)

Using fresh kind cluster v1.28.9

Argo server started with server --auth-mode=server --auth-mode=client --kube-api-burst=200 --kube-api-qps=200

Benchmark workflow templates are placed in test/benchmarks/*.yaml.

Before each test, the following procedure was followed:

Delete all workflows
Wait for all workflow pods to be removed
Restart the controller and server

Benchmarking tool: hey. By default it sends 200 requests in parallel using 50 workers. Those values can be modified using:

-n: number of requests
-c: number of workers

Typical call is described in test/benchmarks/README.md.

Results

Requests	Workers	Template	No cache ART [s]	Manual cache ART [s]	Informer ART [s]
200	50	20-echos	deadline exceeded	9.1370	0.0833
50	2	20-echos	4.3682	0.3974	0.0119
16	8	20-echos	18.0247	1.0127	0.0290
50	1	20-echos	2.3204	0.2095	0.0273
200	50	echo-1	11.7437	4.3005	0.0362

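*ART - Average Response Time. The no-cache value in the 50-request, 2-worker row comes from a run with -n 10 (see the Caching OFF hey output in the appendix).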
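The relative gain per row can be worked out directly from the table. The sketch below does that arithmetic; the `row` type and `speedup` function are hypothetical helpers, and the numbers are copied from the table above (the 200/50 20-echos row is left out because the no-cache run hit the client timeout).

```go
package main

import "fmt"

// row holds one line of the results table; times are average response
// times in seconds.
type row struct {
	requests, workers int
	template          string
	noCache, informer float64
}

// speedup returns how many times faster the informer run was than the
// uncached run.
func speedup(noCache, informer float64) float64 {
	return noCache / informer
}

func main() {
	// Values copied from the results table.
	rows := []row{
		{50, 2, "20-echos", 4.3682, 0.0119},
		{16, 8, "20-echos", 18.0247, 0.0290},
		{50, 1, "20-echos", 2.3204, 0.0273},
		{200, 50, "echo-1", 11.7437, 0.0362},
	}
	for _, r := range rows {
		fmt.Printf("%3d requests / %2d workers / %-8s: %.0fx faster\n",
			r.requests, r.workers, r.template, speedup(r.noCache, r.informer))
	}
}
```

On these figures the informer is roughly 85x to 620x faster than no caching, depending on the concurrency level.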

Appendix

Manual Caching hey output

hey \
    -n 200 -c 50 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	38.5292 secs
  Slowest:	10.0132 secs
  Fastest:	0.2812 secs
  Average:	9.1370 secs
  Requests/sec:	5.1909


Response time histogram:
  0.281 [1]	|
  1.254 [0]	|
  2.228 [2]	|■
  3.201 [3]	|■
  4.174 [3]	|■
  5.147 [1]	|
  6.120 [0]	|
  7.094 [6]	|■■
  8.067 [8]	|■■
  9.040 [26]	|■■■■■■■
  10.013 [150]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 7.8264 secs
  25% in 9.0510 secs
  50% in 9.9969 secs
  75% in 9.9999 secs
  90% in 10.0010 secs
  95% in 10.0017 secs
  99% in 10.0090 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0086 secs, 0.2812 secs, 10.0132 secs
  DNS-lookup:	0.0008 secs, 0.0003 secs, 0.0028 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0009 secs
  resp wait:	9.1282 secs, 0.2608 secs, 10.0097 secs
  resp read:	0.0001 secs, 0.0001 secs, 0.0006 secs

Status code distribution:
  [200]	200 responses

hey \
    -n 50 -c 1 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	10.4736 secs
  Slowest:	2.8200 secs
  Fastest:	0.0116 secs
  Average:	0.2095 secs
  Requests/sec:	4.7739


Response time histogram:
  0.012 [1]	|■
  0.292 [43]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.573 [2]	|■■
  0.854 [2]	|■■
  1.135 [1]	|■
  1.416 [0]	|
  1.697 [0]	|
  1.977 [0]	|
  2.258 [0]	|
  2.539 [0]	|
  2.820 [1]	|■


Latency distribution:
  10% in 0.0215 secs
  25% in 0.0398 secs
  50% in 0.0904 secs
  75% in 0.2085 secs
  90% in 0.4544 secs
  95% in 0.9240 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0150 secs, 0.0116 secs, 2.8200 secs
  DNS-lookup:	0.0004 secs, 0.0003 secs, 0.0022 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0003 secs
  resp wait:	0.1941 secs, 0.0084 secs, 2.7740 secs
  resp read:	0.0003 secs, 0.0001 secs, 0.0037 secs

Status code distribution:
  [200]	50 responses

hey \
    -n 16 -c 8 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	2.0559 secs
  Slowest:	1.9955 secs
  Fastest:	0.0570 secs
  Average:	1.0127 secs
  Requests/sec:	7.7826


Response time histogram:
  0.057 [1]	|■■■■■
  0.251 [7]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.445 [0]	|
  0.639 [0]	|
  0.832 [0]	|
  1.026 [0]	|
  1.220 [0]	|
  1.414 [0]	|
  1.608 [0]	|
  1.802 [0]	|
  1.996 [8]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 0.0593 secs
  25% in 0.0601 secs
  50% in 1.9162 secs
  75% in 1.9707 secs
  90% in 1.9955 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0140 secs, 0.0570 secs, 1.9955 secs
  DNS-lookup:	0.0017 secs, 0.0001 secs, 0.0032 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0003 secs
  resp wait:	0.9971 secs, 0.0330 secs, 1.9879 secs
  resp read:	0.0004 secs, 0.0000 secs, 0.0013 secs

Status code distribution:
  [200]	16 responses

hey \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "echo-1"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	18.5190 secs
  Slowest:	5.0074 secs
  Fastest:	0.0267 secs
  Average:	4.3005 secs
  Requests/sec:	10.7997

  Total data:	200600 bytes
  Size/request:	1003 bytes

Response time histogram:
  0.027 [1]	|
  0.525 [2]	|■
  1.023 [0]	|
  1.521 [8]	|■■
  2.019 [9]	|■■
  2.517 [7]	|■■
  3.015 [9]	|■■
  3.513 [10]	|■■■
  4.011 [3]	|■
  4.509 [2]	|■
  5.007 [149]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 2.0682 secs
  25% in 4.3698 secs
  50% in 4.9992 secs
  75% in 5.0003 secs
  90% in 5.0012 secs
  95% in 5.0018 secs
  99% in 5.0058 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0076 secs, 0.0267 secs, 5.0074 secs
  DNS-lookup:	0.0013 secs, 0.0002 secs, 0.0070 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0007 secs
  resp wait:	4.2928 secs, 0.0135 secs, 5.0043 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0003 secs

Status code distribution:
  [200]	200 responses

hey \
    -n 50 -c 2 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	9.9623 secs
  Slowest:	2.1692 secs
  Fastest:	0.0177 secs
  Average:	0.3974 secs
  Requests/sec:	5.0189


Response time histogram:
  0.018 [1]	|■■
  0.233 [21]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.448 [19]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.663 [0]	|
  0.878 [4]	|■■■■■■■■
  1.093 [1]	|■■
  1.309 [0]	|
  1.524 [1]	|■■
  1.739 [0]	|
  1.954 [2]	|■■■■
  2.169 [1]	|■■


Latency distribution:
  10% in 0.0415 secs
  25% in 0.0680 secs
  50% in 0.3625 secs
  75% in 0.4159 secs
  90% in 1.0482 secs
  95% in 1.8283 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0151 secs, 0.0177 secs, 2.1692 secs
  DNS-lookup:	0.0005 secs, 0.0001 secs, 0.0015 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0003 secs
  resp wait:	0.3813 secs, 0.0131 secs, 2.1538 secs
  resp read:	0.0009 secs, 0.0001 secs, 0.0175 secs

Status code distribution:
  [200]	50 responses

Caching OFF hey output

hey \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Error distribution:
  [1]	Post "https://localhost:2746/api/v1/workflows/argo-test": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

hey \
    -n 50 -c 1 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	116.0186 secs
  Slowest:	3.0743 secs
  Fastest:	0.8805 secs
  Average:	2.3204 secs
  Requests/sec:	0.4310


Response time histogram:
  0.881 [1]	|■
  1.100 [0]	|
  1.319 [1]	|■
  1.539 [0]	|
  1.758 [0]	|
  1.977 [0]	|
  2.197 [0]	|
  2.416 [46]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  2.636 [0]	|
  2.855 [1]	|■
  3.074 [1]	|■


Latency distribution:
  10% in 2.3432 secs
  25% in 2.3488 secs
  50% in 2.3497 secs
  75% in 2.3521 secs
  90% in 2.3539 secs
  95% in 2.8334 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0046 secs, 0.8805 secs, 3.0743 secs
  DNS-lookup:	0.0005 secs, 0.0003 secs, 0.0014 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0001 secs
  resp wait:	2.3155 secs, 0.8690 secs, 3.0706 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0007 secs

Status code distribution:
  [200]	50 responses

hey \
    -n 10 -c 2 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	22.0162 secs
  Slowest:	5.2764 secs
  Fastest:	3.1682 secs
  Average:	4.3682 secs
  Requests/sec:	0.4542


Response time histogram:
  3.168 [1]	|■■■■■■■■■■■■■
  3.379 [1]	|■■■■■■■■■■■■■
  3.590 [0]	|
  3.801 [0]	|
  4.011 [0]	|
  4.222 [2]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■
  4.433 [1]	|■■■■■■■■■■■■■
  4.644 [1]	|■■■■■■■■■■■■■
  4.855 [1]	|■■■■■■■■■■■■■
  5.066 [0]	|
  5.276 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 3.2170 secs
  25% in 4.1037 secs
  50% in 4.4375 secs
  75% in 5.2165 secs
  90% in 5.2764 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0101 secs, 3.1682 secs, 5.2764 secs
  DNS-lookup:	0.0010 secs, 0.0004 secs, 0.0025 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0001 secs
  resp wait:	4.3578 secs, 3.1597 secs, 5.2671 secs
  resp read:	0.0002 secs, 0.0001 secs, 0.0006 secs

Status code distribution:
  [200]	10 responses

hey \
    -n 16 -c 8 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	36.8523 secs
  Slowest:	19.6692 secs
  Fastest:	16.5928 secs
  Average:	18.0247 secs
  Requests/sec:	0.4342


Response time histogram:
  16.593 [1]	|■■■■■■■■■■
  16.900 [0]	|
  17.208 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  17.516 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  17.823 [0]	|
  18.131 [0]	|
  18.439 [0]	|
  18.746 [4]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  19.054 [2]	|■■■■■■■■■■■■■■■■■■■■
  19.362 [0]	|
  19.669 [2]	|■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 17.1318 secs
  25% in 17.2294 secs
  50% in 18.4704 secs
  75% in 18.7662 secs
  90% in 19.6692 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0123 secs, 16.5928 secs, 19.6692 secs
  DNS-lookup:	0.0013 secs, 0.0003 secs, 0.0021 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0002 secs
  resp wait:	18.0119 secs, 16.5720 secs, 19.6658 secs
  resp read:	0.0002 secs, 0.0001 secs, 0.0008 secs

Status code distribution:
  [200]	16 responses

hey \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test"
            },
            "spec": {
                "workflowTemplateRef": {"name": "echo-1"},
                "arguments": {}
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	48.5225 secs
  Slowest:	12.7513 secs
  Fastest:	6.5247 secs
  Average:	11.7437 secs
  Requests/sec:	4.1218

  Total data:	200600 bytes
  Size/request:	1003 bytes

Response time histogram:
  6.525 [1]	|
  7.147 [5]	|■■
  7.770 [2]	|■
  8.393 [1]	|
  9.015 [0]	|
  9.638 [7]	|■■
  10.261 [9]	|■■■
  10.883 [13]	|■■■■
  11.506 [12]	|■■■■
  12.129 [33]	|■■■■■■■■■■■
  12.751 [117]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 9.9773 secs
  25% in 12.0355 secs
  50% in 12.4015 secs
  75% in 12.4993 secs
  90% in 12.5020 secs
  95% in 12.5169 secs
  99% in 12.6785 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0083 secs, 6.5247 secs, 12.7513 secs
  DNS-lookup:	0.0008 secs, 0.0002 secs, 0.0030 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0013 secs
  resp wait:	11.7352 secs, 6.5076 secs, 12.7478 secs
  resp read:	0.0001 secs, 0.0001 secs, 0.0020 secs

Status code distribution:
  [200]	200 responses

Informer hey output

hey \
    -n 50 -c 1 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	1.3652 secs
  Slowest:	0.0959 secs
  Fastest:	0.0093 secs
  Average:	0.0273 secs
  Requests/sec:	36.6243


Response time histogram:
  0.009 [1]	|■■
  0.018 [11]	|■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.027 [17]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.035 [8]	|■■■■■■■■■■■■■■■■■■■
  0.044 [10]	|■■■■■■■■■■■■■■■■■■■■■■■■
  0.053 [2]	|■■■■■
  0.061 [0]	|
  0.070 [0]	|
  0.079 [0]	|
  0.087 [0]	|
  0.096 [1]	|■■


Latency distribution:
  10% in 0.0111 secs
  25% in 0.0185 secs
  50% in 0.0249 secs
  75% in 0.0377 secs
  90% in 0.0408 secs
  95% in 0.0524 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0096 secs, 0.0093 secs, 0.0959 secs
  DNS-lookup:	0.0005 secs, 0.0003 secs, 0.0013 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0001 secs
  resp wait:	0.0174 secs, 0.0060 secs, 0.0581 secs
  resp read:	0.0002 secs, 0.0001 secs, 0.0036 secs

Status code distribution:
  [200]	50 responses

hey \
    -n 16 -c 8 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	0.0608 secs
  Slowest:	0.0422 secs
  Fastest:	0.0159 secs
  Average:	0.0290 secs
  Requests/sec:	263.1996


Response time histogram:
  0.016 [1]	|■■■■■■■■
  0.019 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■
  0.021 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■
  0.024 [1]	|■■■■■■■■
  0.026 [0]	|
  0.029 [0]	|
  0.032 [0]	|
  0.034 [0]	|
  0.037 [0]	|
  0.040 [5]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.042 [3]	|■■■■■■■■■■■■■■■■■■■■■■■■


Latency distribution:
  10% in 0.0163 secs
  25% in 0.0193 secs
  50% in 0.0389 secs
  75% in 0.0393 secs
  90% in 0.0422 secs
  0% in 0.0000 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0091 secs, 0.0159 secs, 0.0422 secs
  DNS-lookup:	0.0012 secs, 0.0002 secs, 0.0020 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0005 secs
  resp wait:	0.0190 secs, 0.0098 secs, 0.0280 secs
  resp read:	0.0002 secs, 0.0000 secs, 0.0010 secs

Status code distribution:
  [200]	16 responses

hey \
    -n 50 -c 2 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	0.3063 secs
  Slowest:	0.0279 secs
  Fastest:	0.0083 secs
  Average:	0.0119 secs
  Requests/sec:	163.2477


Response time histogram:
  0.008 [1]	|■■
  0.010 [21]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.012 [17]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.014 [3]	|■■■■■■
  0.016 [2]	|■■■■
  0.018 [1]	|■■
  0.020 [3]	|■■■■■■
  0.022 [0]	|
  0.024 [0]	|
  0.026 [0]	|
  0.028 [2]	|■■■■


Latency distribution:
  10% in 0.0089 secs
  25% in 0.0096 secs
  50% in 0.0104 secs
  75% in 0.0122 secs
  90% in 0.0185 secs
  95% in 0.0279 secs
  0% in 0.0000 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0043 secs, 0.0083 secs, 0.0279 secs
  DNS-lookup:	0.0004 secs, 0.0000 secs, 0.0021 secs
  req write:	0.0000 secs, 0.0000 secs, 0.0001 secs
  resp wait:	0.0075 secs, 0.0054 secs, 0.0174 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0002 secs

Status code distribution:
  [200]	50 responses

hey \
    -n 200 -c 50 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "20-echos"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	0.3854 secs
  Slowest:	0.1707 secs
  Fastest:	0.0137 secs
  Average:	0.0833 secs
  Requests/sec:	518.9769


Response time histogram:
  0.014 [1]	|■
  0.029 [1]	|■
  0.045 [13]	|■■■■■■■■■■
  0.061 [21]	|■■■■■■■■■■■■■■■■
  0.076 [49]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.092 [54]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.108 [27]	|■■■■■■■■■■■■■■■■■■■■
  0.124 [14]	|■■■■■■■■■■
  0.139 [13]	|■■■■■■■■■■
  0.155 [6]	|■■■■
  0.171 [1]	|■


Latency distribution:
  10% in 0.0480 secs
  25% in 0.0686 secs
  50% in 0.0810 secs
  75% in 0.0956 secs
  90% in 0.1353 secs
  95% in 0.1381 secs
  99% in 0.1471 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0235 secs, 0.0137 secs, 0.1707 secs
  DNS-lookup:	0.0009 secs, 0.0000 secs, 0.0037 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0010 secs
  resp wait:	0.0592 secs, 0.0056 secs, 0.1481 secs
  resp read:	0.0005 secs, 0.0000 secs, 0.0089 secs

Status code distribution:
  [200]	200 responses

hey \
    -n 200 -c 50 \
    -m POST \
    -disable-keepalive \
    -T "application/json" \
    -d '{
        "serverDryRun": false,
        "workflow": {
            "metadata": {
                "generateName": "curl-echo-test-",
                "namespace": "argo-test",
                "labels": {
                    "workflows.argoproj.io/benchmark": "true"
                }
            },
            "spec": {
                "workflowTemplateRef": {"name": "echo-1"},
                "arguments": {},
                "podMetadata": {
                    "labels": {
                        "workflows.argoproj.io/benchmark": "true"
                    }
                }
            }
        }
        }' \
    https://localhost:2746/api/v1/workflows/argo-test

Summary:
  Total:	0.1556 secs
  Slowest:	0.0568 secs
  Fastest:	0.0171 secs
  Average:	0.0362 secs
  Requests/sec:	1285.0704

  Total data:	230400 bytes
  Size/request:	1152 bytes

Response time histogram:
  0.017 [1]	|■
  0.021 [2]	|■■
  0.025 [7]	|■■■■■■
  0.029 [29]	|■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.033 [44]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.037 [32]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.041 [33]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.045 [22]	|■■■■■■■■■■■■■■■■■■■■
  0.049 [13]	|■■■■■■■■■■■■
  0.053 [9]	|■■■■■■■■
  0.057 [8]	|■■■■■■■


Latency distribution:
  10% in 0.0277 secs
  25% in 0.0299 secs
  50% in 0.0356 secs
  75% in 0.0418 secs
  90% in 0.0480 secs
  95% in 0.0524 secs
  99% in 0.0566 secs

Details (average, fastest, slowest):
  DNS+dialup:	0.0184 secs, 0.0171 secs, 0.0568 secs
  DNS-lookup:	0.0008 secs, 0.0000 secs, 0.0024 secs
  req write:	0.0001 secs, 0.0000 secs, 0.0011 secs
  resp wait:	0.0175 secs, 0.0063 secs, 0.0279 secs
  resp read:	0.0001 secs, 0.0000 secs, 0.0006 secs

Status code distribution:
  [200]	200 responses

Joibel · 2024-10-14T14:41:51Z

pkg/apiclient/argo-kube-client.go

+}
+
+func (a *argoKubeClient) startStores(restConfig *restclient.Config, namespace string) error {
+	if a.opts.UseCaching {


UseCaching appears to be always false

This was the intention: not to introduce a breaking change. At the same time, my team is using argoKubeClient in code and we would like to enable caching there. The code that depends on this is tested; it's basically server code.

I'm not sure why you consider the caching version a breaking change? What does it break?

This PR is marked as a performance improvement, but doesn't improve the performance of the product, only of your usage of it as a go-client? Why wouldn't everyone want this enabled? It uses more memory...

The problem I'm facing is that there is little testing happening in pkg/apiclient.
I could expose this option in the CLI to run e2e tests, to make it more testable. However, I don't think this option makes sense in the CLI. An informer would simply make startup time longer; only in very specific conditions could it make a difference. Even in such a case you could simply connect to a server that has caching enabled by default, instead of using a k8s connection.

This PR is marked as a performance improvement, but doesn't improve the performance of the product, only of your usage of it as a go-client? Why wouldn't everyone want this enabled? It uses more memory..

This is enabled by default for the Argo server, and all tests for the Argo server use this improvement. This part disables it for argocli user-facing commands when you are using a kubectl connection. So if you use argocli to submit a workflow, you won't need to wait for the informer to synchronise all templates.

@Joibel can you re-review this PR? I answered your comments; if this is not enough, please let me know.

Joibel · 2024-10-15T13:48:06Z

pkg/apiclient/argo-kube-client.go

@@ -37,14 +37,34 @@ var (
 	NoArgoServerErr               = fmt.Errorf("this is impossible if you are not using the Argo Server, see %s", help.CLI())
 )

+type ArgoKubeOpts struct {


This struct is never used as initialised in this code, nor are there any tests for UseCaching = true.

I believe this might be "for the future" but please could it not be included in this PR and saved for a future one until it's tested and used.

It's for programmatic users of argo-kube-client. As there are no tests for such usage, it doesn't make sense to enable it by default, since it would then also be enabled in the CLI, which doesn't make sense (we don't want an informer in the CLI).
The second use case for this structure is the SDK, where the right setting depends on the use case: for a one-time submit it doesn't make sense, for a long-running process it does. However, I'm not sure which segment we are targeting, so I'm assuming that it's better to keep the default as is.
The server doesn't use this code for its startup and has caching enabled, as there are obvious benefits there.
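The opt-in design argued for above can be sketched as an options struct whose zero value keeps the old behaviour. This is only an illustration: the field name `UseCaching` appears in the diff, but `clientOpts`, `lookupMode` and `modeFor` are hypothetical and say nothing about how the real client is wired.

```go
package main

import "fmt"

// clientOpts illustrates an options struct with an opt-in caching flag.
// Only the field name UseCaching is taken from the diff; the rest is assumed.
type clientOpts struct {
	UseCaching bool
}

// lookupMode names the two template lookup strategies discussed in the thread.
type lookupMode int

const (
	directAPI     lookupMode = iota // no startup sync, one API call per lookup
	informerStore                   // waits for an initial sync, then serves from memory
)

func (m lookupMode) String() string {
	if m == informerStore {
		return "informer-backed store"
	}
	return "direct API lookups"
}

// modeFor picks the lookup strategy. The zero value of clientOpts selects
// direct lookups, so existing callers keep their current behaviour.
func modeFor(opts clientOpts) lookupMode {
	if opts.UseCaching {
		return informerStore
	}
	return directAPI
}

func main() {
	fmt.Println("one-off CLI submit:  ", modeFor(clientOpts{}))
	fmt.Println("long-running process:", modeFor(clientOpts{UseCaching: true}))
}
```

Keeping the flag off by default means a one-off submit does not pay for the informer's initial sync, while a long-running process can opt in explicitly.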

test/benchmarks/README.md

server/workflowtemplate/informer.go

server/workflowtemplate/wf_client_store.go

tooptoop4 · 2024-10-29T19:42:32Z

does #13763 affect this?

jakkubu · 2024-10-31T18:24:09Z

does #13763 affect this?

I don't think so. This PR adds an informer to the server, whilst the root issue (as identified by @Joibel in his comment) is in the controller logic.

jakkubu · 2024-11-04T17:00:55Z

New benchmarking implementation results.

How to run:

make BenchmarkArgoServer

Results:

original (): 890ms (890185243 ns/op)
    benchmark time increased to 20s to get more executions; with the default time there were only 2
after adding the informer: 9ms (9067016 ns/op)
    using the default benchmark time, as with 20s over 3k workflows were created, which might skew the result

This is in line with the manual benchmarks, which show a ~100x improvement with the informer.

make BenchmarkArgoServer
GIT_COMMIT=6dfb464b60ed7149132fc188af6a6168b7ea46f1 GIT_BRANCH=add-server-informer GIT_TAG=untagged GIT_TREE_STATE=dirty RELEASE_TAG=false DEV_BRANCH=true VERSION=latest
KUBECTX=k3d-k3s-default DOCKER_DESKTOP=false K3D=true DOCKER_PUSH=false TARGET_PLATFORM=linux/arm64
RUN_MODE=local PROFILE=minimal AUTH_MODE=hybrid SECURE=false  STATIC_FILES=false ALWAYS_OFFLOAD_NODE_STATUS=false UPPERIO_DB_DEBUG=0 LOG_LEVEL=debug NAMESPACED=true
go test --tags api,cli,cron,executor,examples,corefunctional,functional,plugins ./test/e2e -run='BenchmarkArgoServer' -benchmem -bench 'BenchmarkArgoServer'  .
WARN[0000] Non-transient error: <nil>                   
WARN[0000] Non-transient error: <nil>                   
Creating workflow template multiple-ref-echo-1
Creating workflow template multiple-ref-echo-2
Creating workflow template multiple-ref-main
goos: linux
goarch: arm64
pkg: github.com/argoproj/argo-workflows/v3/test/e2e
BenchmarkArgoServer/Submit_workflow_with_multiple_refs-12                    118           9067016 ns/op          137881 B/op        243 allocs/op
--- BENCH: BenchmarkArgoServer/Submit_workflow_with_multiple_refs-12
    printer.go:116: POST /api/v1/workflows/argo HTTP/1.1
        Host: localhost:2746
        Authorization: Bearer [REDACTED]
        
        {
                                                "workflow": {
                                                        "metadata": {
                                                                "generateName": "create-wf-from-template-benchmark-",
                                                                "labels": {
                                                                        "workflows.argoproj.io/benchmark": "true",
        ... [output truncated]
PASS
ok      github.com/argoproj/argo-workflows/v3/test/e2e  6.547s

During template validation k8s API is called for each templateRef. For complex workflows with many refs it creates huge overhead. Let's use informer for getting templates and use old mechanism as fallback Signed-off-by: Jakub Buczak <[email protected]>


jakkubu changed the title ~~Add workflow template informer to server~~ perf: Add workflow template informer to server Sep 27, 2024

blkperl added the area/server label Oct 2, 2024

jakkubu force-pushed the add-server-informer branch 3 times, most recently from 9dac1ce to 3d90e33 Compare October 9, 2024 08:07

agilgur5 mentioned this pull request Oct 9, 2024

perf: Add template validation caching #13633

Closed

jakkubu force-pushed the add-server-informer branch from fbf13ba to 7221f93 Compare October 10, 2024 07:24

jakkubu force-pushed the add-server-informer branch 5 times, most recently from 653023b to f1f89a9 Compare October 11, 2024 10:07

jakkubu marked this pull request as ready for review October 11, 2024 10:50

jakkubu force-pushed the add-server-informer branch from f1f89a9 to 2659d1a Compare October 14, 2024 13:32

Joibel requested changes Oct 15, 2024

jakkubu force-pushed the add-server-informer branch from 2659d1a to 21892f1 Compare October 17, 2024 11:12

jakkubu force-pushed the add-server-informer branch from 12b9b94 to 6dfb464 Compare October 31, 2024 18:06

jakkubu force-pushed the add-server-informer branch from f728e9b to 1fa3f69 Compare November 4, 2024 16:30

jakkubu force-pushed the add-server-informer branch from 1fa3f69 to 49bd1f4 Compare November 4, 2024 17:02

jakkubu requested review from Joibel and agilgur5 November 6, 2024 05:56

jakkubu added 4 commits November 7, 2024 15:18

perf: Add workflow template informer to workflow template server

d40eada

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add workflow template informer to cron workflow server

d8fd2e5

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add cluster workflow template informer

b3dca2e

Signed-off-by: Jakub Buczak <[email protected]>

jakkubu added 10 commits November 7, 2024 15:18

perf: Add cluster workflow template informer to workflow template server

8bbf84b

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add cluster workflow template informer to cron workflow server

36ff0f1

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add cluster workflow template informer to cluster workflow temp…

54310f3

…late server Signed-off-by: Jakub Buczak <[email protected]>

perf: Add (Custer)WorkflowTemplateStore implementation using wfClient

b591ee3

Signed-off-by: Jakub Buczak <[email protected]>

perf: Use template store for all viable get requests

997257d

Remove Lister() method (as informer don't support full k8s list options) Signed-off-by: Jakub Buczak <[email protected]>

perf: Add benchmarks workflows + instructions

312fdc4

Signed-off-by: Jakub Buczak <[email protected]>

perf: Add kube-client-opts for enabling caching

f528d25

fix not starting clusterWftmpl Informer in server add more descriptive client store naming Signed-off-by: Jakub Buczak <[email protected]>

perf: Remove default template store implementation

063cdfc

Pass created client stores in tests Signed-off-by: Jakub Buczak <[email protected]>

fix: remove leftover comments and blank lines

d9f5678

Signed-off-by: Jakub Buczak <[email protected]>

perf: automatic benchmarks for submitting multiple-ref workflows

7e947ff

Enable single benchmark run Signed-off-by: Jakub Buczak <[email protected]>

jakkubu force-pushed the add-server-informer branch from 49bd1f4 to 7e947ff Compare November 7, 2024 14:19

Joibel self-assigned this Nov 8, 2024
