Bug: Garbage collector always overwrites cached file even if content unchanged #684

porg commented May 8, 2023

Negative Consequences

  • This totally breaks the cache policy “cache with validation”.
  • Why? → Overwriting changes the file’s Last-Modified, and in consequence the ETag changes too, because W3TC’s setup calculates the ETag from MTime and Size (see: Bug in detail). As a result, revisiting users unnecessarily re-download the full page (HTTP 200) instead of using their local cache (HTTP 304 Not Modified).

Proposed Fix

  1. The garbage collector visits URL-X, loads the output into memory and hashes it.
  2. The garbage collector compares that hash to the hash noted down for the cached copy of URL-X.
  3. If the hashes are identical, the content is considered unchanged and the garbage collector skips to the next URL.
  4. If the hashes differ, it overwrites the cached file for URL-X and notes down the new hash.
    • Where and how exactly the hash is stored is up to your developer expertise, of course! Ideas from me as a layman:
    • Either in the file metadata (as part of the filename or in an xattr). This adds almost no extra load, as the file is accessed anyway.
    • Or separate from the cache files (possibly utilizing a RAM-based caching pool). I don’t know which is more efficient. Am tech-savvy but no dev.

Note: W3TC “Cache Preload” does exactly that already. Maybe you do not need to develop anything new, but just need to ensure that the Garbage Collector uses the same functions. (A rough sketch of the compare-before-overwrite logic follows below.)
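To make the proposed flow concrete, here is a minimal sketch in Python of the compare-before-overwrite loop. W3TC itself is PHP, so this is purely illustrative: the `fetch` callable, the function names and the sidecar `.hash` file are my own hypothetical choices for steps 1–4 and for the “where to note down the hash” question.

```python
import hashlib
import os

def content_hash(html: str) -> str:
    """Hash only the page body, ignoring anything appended after </html>
    (W3TC's debug/stats comment lives there, see "Notes on the Hash")."""
    end = html.rfind("</html>")
    body = html[: end + len("</html>")] if end != -1 else html
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def refresh_cached_page(url: str, cache_file: str, fetch) -> bool:
    """Re-generate a cached page only if its content actually changed.

    `fetch` is a hypothetical callable returning the pure HTML the CMS
    outputs for `url`. Returns True if the cache file was rewritten.
    """
    fresh_html = fetch(url)                      # step 1: load output, then hash it
    fresh_hash = content_hash(fresh_html)

    # Hypothetical sidecar file holding the hash of the last stored content;
    # an xattr on the cache file would serve the same purpose.
    hash_file = cache_file + ".hash"
    old_hash = None
    if os.path.exists(hash_file):
        with open(hash_file, "r", encoding="utf-8") as fh:
            old_hash = fh.read().strip()

    if old_hash == fresh_hash:                   # steps 2 + 3: identical -> skip
        # Do NOT touch the cache file, so its MTime (and therefore
        # Last-Modified and the ETag) stay stable and clients keep getting 304s.
        return False

    # Step 4: content changed -> overwrite and note down the new hash.
    with open(cache_file, "w", encoding="utf-8") as fh:
        fh.write(fresh_html)
    with open(hash_file, "w", encoding="utf-8") as fh:
        fh.write(fresh_hash)
    return True
```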

Bug in detail

/wp-content/cache/page_enhanced/.htaccess has the directive:

FileETag MTime Size

Let’s think about what this means for totally unchanged content fetched by the Garbage Collector:

  • Size between cached and current remains unchanged.
  • But because the Garbage Collector stubbornly always overwrites, the MTime of the file changes.
  • So the Last-Modified HTTP header changes.
  • And in consequence the ETag HTTP header changes too, because the FileETag directive in W3TC’s setup means that the ETag gets calculated from a combination of MTime and Size (the sketch after this list mimics that calculation).
  • ❗️ So although NOTHING changed, the cached content gets overwritten.
  • ❗️ All revisiting users who have the page in their local cache unnecessarily re-download the very same page again.
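To see why the overwrite alone is enough to break validation, the sketch below imitates an MTime+Size based ETag. The exact format Apache uses for `FileETag MTime Size` is an internal detail, so the hex formatting here is only an assumption for illustration; the point is that rewriting byte-for-byte identical content still yields a new ETag because the MTime moved.

```python
import os
import time

def mtime_size_etag(path: str) -> str:
    """Imitate an 'FileETag MTime Size'-style validator (format is illustrative only)."""
    st = os.stat(path)
    return f'"{int(st.st_mtime):x}-{st.st_size:x}"'

html = "<html><body>unchanged content</body></html>"

with open("page.html", "w") as fh:      # initial cache write
    fh.write(html)
etag_before = mtime_size_etag("page.html")

time.sleep(1)                           # let the clock advance
with open("page.html", "w") as fh:      # garbage collector "refresh" with identical bytes
    fh.write(html)
etag_after = mtime_size_etag("page.html")

# Size is identical, but the new MTime changes the ETag (and Last-Modified),
# so every conditional request now gets a full 200 instead of a 304.
print(etag_before, etag_after, etag_before == etag_after)
```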

Notes on the Hash

  • As W3TC puts <!-- Debug/stats --> comments in the last lines of HTML files, these must never be included when calculating the hash, as this ever-changing noise would of course always result in a different hash. But my proposal never hashes full files anyway, only the pure HTML as output by the live CMS.
  • In step 1 the pure output from the CMS is in memory and gets hashed.
  • In step 4 the hash of that pure HTML content (debug lines not included!) is noted down.
  • In step 2 the hash that was noted down for the cached content (i.e. the hash of the unaltered CMS output) is used in the comparison.
  • So this should be conflict-free. (The sketch after this list demonstrates the stability of such a hash.)
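A tiny demo of why stripping the trailer makes the hash stable. The exact wording of the debug comment below is made up; only its position after the closing `</html>` tag matters here.

```python
import hashlib

def strip_debug_trailer(html: str) -> str:
    """Drop everything after </html>, where the debug/stats comment gets appended."""
    end = html.rfind("</html>")
    return html[: end + len("</html>")] if end != -1 else html

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

page = "<html><body>Hello</body></html>"
# Two renderings of the identical page, differing only in the appended debug noise
# (the comment text is invented for this example).
render_1 = page + "\n<!-- Served in 0.012 seconds -->"
render_2 = page + "\n<!-- Served in 0.034 seconds -->"

print(sha256(render_1) == sha256(render_2))                                            # False: noise breaks the comparison
print(sha256(strip_debug_trailer(render_1)) == sha256(strip_debug_trailer(render_2)))  # True: stable hash
```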

My Website Environment

Small personal portfolio website

  • ~ 100 pages, predicting max +5 per year
  • ~ 10 blogposts, predicting +30 per year
  • Main domain caching only HTML + all plugin/theme assets (CSS/JS/fonts/…)
  • Media Library on subdomain, could get CDN one day

Shared Hosting which includes

  • OPcode — on/off ; timeout: 30 secs, 5 min, 1 hour, 4 hours
  • Caching pool 16 MB: Memcached OR Redis (both as UNIX socket)
  • Varnish — on/off ; timeout: 1, 3, 5, 15, 30, 60 mins
  • Hardware: 100% SSD hosting, high-end HP ProLiant servers

W3TC fundamentals

  • Page Cache: Disk: Enhanced
  • Opcode Cache: Zend Opcache — speedup in the backend is on average 2x, sometimes peaking at 3x, a great efficiency improvement!
  • Object Cache: Memcached
  • Database Cache: OFF — because I guess the 16 MB are already utilized enough by the Object Cache
  • Varnish: OFF during debugging, to not complicate things. But when I tested it, it again was a great extra speedup (reducing latency from ca. 80–100 ms to 20 ms).

My Website Usability and Performance Goals

  1. Load is not a concern for now. Not a motivation for caching.

  2. Want my users to browse fast. Main motivation for caching.

  3. Don’t want the first visitor of a new or freshly purged page to have to wait for page cache generation.

  • Hence: Page Cache → Cache Preload:

    • a) ☑︎ Automatically prime the page cache

    • b) ☑︎ Preload the post cache upon publish events

  4. Want my users to always have a chance to get the most recent version of a page/post.
  • a) Settings from 3 are enough to provide this for first-time visitors.

  • b) But by themselves NOT enough for re-visiting users!

    • Hence: Browser Cache → HTML/XML → Cache-Policy:

    • b1) If “cache with max-age” is used and the content changes before the Max-Age expires, then revisiting users will still have the page marked as “fresh” in their local cache, hence will not re-validate, and will see the outdated local version instead of the renewed content in the CMS.

    • b2) If “cache with validation” is used, then revisiting users get:

      • CON: It takes a tiny bit longer than trusting only Max-Age. The browser sends a conditional request, the server sees that Last-Modified and ETag still match, responds with HTTP 304 Not Modified with minimal latency, and the browser shows the locally cached page. Only if the content really changed does the server re-send it.

      • PRO: They reliably always get the most recent content with only minimal overhead: one round trip of HTTP metadata exchange; no content needs to be transferred if it is unchanged.

      • BUG: That benefit is destroyed by the bug described above. (The sketch after this list shows the revalidation round trip that the bug breaks.)
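For completeness, this is roughly the revalidation round trip described in b2), sketched with the Python requests library. The URL is a placeholder; on a correctly validating cache the second request should come back as 304 with an empty body, while the garbage-collector bug turns it into a full 200 even though the content is identical.

```python
import requests

url = "https://example.com/some-cached-page/"   # placeholder URL

first = requests.get(url)
etag = first.headers.get("ETag")
last_modified = first.headers.get("Last-Modified")
print(first.status_code, len(first.content))    # 200 + full page on the first visit

# What a returning browser does under "cache with validation":
headers = {}
if etag:
    headers["If-None-Match"] = etag
if last_modified:
    headers["If-Modified-Since"] = last_modified
second = requests.get(url, headers=headers)

# Expected with an unchanged page: 304 Not Modified, empty body, one metadata round trip.
# After the garbage collector's needless overwrite the validators no longer match,
# so this comes back as a full 200 re-download instead.
print(second.status_code, len(second.content))
```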
