
Commit

Euro 2024: Using Fastly Compute: trying to add some padding after schemas
pmartin committed Nov 8, 2024
1 parent 7605ad3 commit 07eefaa
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions _posts/2024-11-07-compute-at-edge-personalize-static-pages.md
@@ -47,6 +47,8 @@ flowchart LR

<center><ins><strong>Schema 1: one BFF between frontend apps and backend APIs.</strong></ins></center>

<br>

With this architecture, the BFF receives a lot of requests, and each of these causes up to dozens of requests to other services. This is greatly amplified by the fact that all BFF responses are personalized, which means we cannot cache entire responses.

As most of our APIs are running in a Kubernetes cluster, you might say *“use auto-scaling!”*, and you’d be kind of right. We are using auto-scaling. But reactive auto-scaling, based on traffic or load metrics, is not fast enough to handle the huge traffic spikes that big football matches can cause. For this, we have developed a [pre-scaling mechanism](/2022/02/03/prescaling.html), and even [open-sourced it](/2022/09/01/kubernetes-prescaling-we-open-source-our-solution.html).
@@ -106,6 +108,7 @@ flowchart LR

<center><ins><strong>Schema 2: doing work in front of the BFF.</strong></ins></center>

<br>

Our goal was to implement a lightweight personalization layer in front of our backend BFF application. The backend application would then only need to return non-personalized layouts -- and those would be cacheable.
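
To make this more concrete, here is a minimal sketch *(not our actual code)* of what such an edge personalization layer could look like in TypeScript with Fastly’s `@fastly/js-compute` SDK. The backend name `bff_backend`, the `uid` cookie and the way personalization is applied are illustrative assumptions:

```typescript
/// <reference types="@fastly/js-compute" />

// Hypothetical sketch only: "bff_backend", the "uid" cookie and the merge
// logic below are assumptions for illustration.
addEventListener("fetch", (event) => event.respondWith(handleRequest(event)));

async function handleRequest(event: FetchEvent): Promise<Response> {
  const req = event.request;

  // Fetch the non-personalized layout from the BFF: this response contains
  // no user-specific data, so it can be cached and shared across users.
  const layoutResp = await fetch(req.url, { backend: "bff_backend" });
  const layout = await layoutResp.json();

  // Apply lightweight personalization at the edge, based on the user's cookie.
  const userId = readCookie(req.headers.get("cookie") ?? "", "uid");
  const personalized = { ...layout, user: { id: userId } };

  return new Response(JSON.stringify(personalized), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}

function readCookie(header: string, name: string): string | null {
  const match = header.match(new RegExp(`(?:^|;\\s*)${name}=([^;]*)`));
  return match ? decodeURIComponent(match[1]) : null;
}
```

The key point is that the request sent upstream carries no user-specific data, so the layout response can be cached and reused across many users.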

@@ -140,6 +143,7 @@ flowchart LR

<center><ins><strong>Schema 3: chaining a VCL and a compute service.</strong></ins></center>

<br>

Before actually starting to implement this, we talked with our contacts at Fastly to confirm this approach made sense to them and that their systems would be able to handle the load and traffic spikes we were expecting. They validated the concept, and noted we should shard our data over several KVStores, as each KVStore is limited to 1000 writes/second and 5000 reads/second -- good catch!
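
To illustrate that sharding, here is a small TypeScript sketch *(the shard count, the `users_shard_*` naming scheme and the hash function are assumptions)* that deterministically maps a key to one of several KV Stores:

```typescript
/// <reference types="@fastly/js-compute" />
import { KVStore } from "fastly:kv-store";

// Assumed naming scheme: "users_shard_0" … "users_shard_3" linked to the service.
const SHARD_COUNT = 4;

// FNV-1a hash, so a given key always lands on the same shard.
function shardIndex(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % SHARD_COUNT;
}

export async function lookupUser(userId: string): Promise<string | null> {
  const store = new KVStore(`users_shard_${shardIndex(userId)}`);
  const entry = await store.get(userId);
  return entry ? await entry.text() : null;
}
```

As long as keys hash evenly, each individual store stays well below the per-store write and read limits.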

@@ -185,6 +189,7 @@ flowchart LR

<center><ins><strong>Schema 4: pushing data asynchronously to Fastly’s KVStores.</strong></ins></center>

<br>

With this mechanism, data in the KVStores is updated after 1 or 2 seconds *(we could speed things up a little by not using batching when reading from DynamoDB Streams)*, which is fine for this use case.
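
For reference, the push side could look roughly like the following Lambda handler, which consumes DynamoDB Stream records and writes each updated item to a KV Store through Fastly’s REST API -- a hedged sketch, where the environment variables, the `userId` attribute and the single-store setup are assumptions:

```typescript
// Sketch of the asynchronous push path: a Lambda triggered by DynamoDB Streams
// that writes updated items to a Fastly KV Store over Fastly's REST API.
// FASTLY_KV_STORE_ID / FASTLY_API_TOKEN and the "userId" attribute are assumptions.
import type { DynamoDBStreamEvent } from "aws-lambda";

const STORE_ID = process.env.FASTLY_KV_STORE_ID!;
const API_TOKEN = process.env.FASTLY_API_TOKEN!;

export async function handler(event: DynamoDBStreamEvent): Promise<void> {
  for (const record of event.Records) {
    const image = record.dynamodb?.NewImage;
    const userId = image?.userId?.S;
    if (!image || !userId) continue;

    // In practice the target store would be chosen with the same sharding
    // scheme used on the read side.
    const url = `https://api.fastly.com/resources/stores/kv/${STORE_ID}/keys/${encodeURIComponent(userId)}`;
    const resp = await fetch(url, {
      method: "PUT",
      headers: { "Fastly-Key": API_TOKEN, "content-type": "application/json" },
      body: JSON.stringify(image),
    });
    if (!resp.ok) {
      throw new Error(`KV Store write failed for ${userId}: ${resp.status}`);
    }
  }
}
```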

@@ -222,6 +227,7 @@ flowchart LR

<center><ins><strong>Schema 5: storing non-personalized layouts on Amazon S3.</strong></ins></center>

<br>

Of course, doing this requires a bit more development work. We had to set up a background cronjob to generate static layouts and store them on S3. But, keeping in mind our *“users must be able to start a stream”* goal, we estimated the potential gain in resiliency was worth it. Also, we already had a process to generate static layouts and push them to S3, so it wasn’t *that much* additional work.
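
As a rough idea of what that job does *(bucket name, key scheme and the `renderLayout()` helper are made up for illustration)*, a sketch in TypeScript could look like this:

```typescript
// Sketch of the background job: render non-personalized layouts and push them
// to S3 so the edge can fall back to them. Bucket/key names are assumptions.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const BUCKET = process.env.LAYOUTS_BUCKET ?? "static-layouts";

async function renderLayout(page: string): Promise<unknown> {
  // Placeholder: the real job would call the BFF without any user context.
  const resp = await fetch(`https://bff.internal.example/${page}`);
  return resp.json();
}

export async function publishLayout(page: string): Promise<void> {
  const layout = await renderLayout(page);
  await s3.send(new PutObjectCommand({
    Bucket: BUCKET,
    Key: `layouts/${page}.json`,
    Body: JSON.stringify(layout),
    ContentType: "application/json",
  }));
}
```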

@@ -243,6 +249,7 @@ During these load-tests as well as during real events later, we monitored a few

<center><ins><strong>Schema 6: monitoring Fastly’s Compute during a load-test.</strong></ins></center>

<br>

On our backend application’s side, we also checked that the number of calls per second was going down while it was going up on Compute@Edge. In practice, it went down to 0 for the `/live/` route, and remained stable or even went up for other routes, as there were more users browsing the catalog.

