Negative Consequences
This totally breaks the cache policy “cache with validation”.
Why? → Overwriting changes the Last-Modified, and in consequence the ETag changes too, because W3TC calculates the ETag from Last-Modified and Size (see: Bug in detail). This causes revisiting users to unnecessarily re-download the full page (HTTP 200) instead of using their local cache (HTTP 304 Not Modified).
Proposed Fix
1. Garbage Collector visits URL-X, loads the output into memory, and hashes it.
2. It compares the hash of URL-X to the hash recorded for the cached URL-X.
3. If the hashes are identical, the content is considered unchanged, and it skips to the next URL.
4. If the hashes differ, it overwrites the cached URL-X and records the new hash.
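A minimal sketch of these four steps in PHP. Everything here is illustrative: the function name, the sidecar `.hash` file, and fetching via `file_get_contents()` are my assumptions, not W3TC’s actual internals.

```php
<?php
// Sketch of the proposed compare-before-overwrite refresh (hypothetical names).

function refresh_cached_page(string $url, string $cacheFile): void
{
    // Step 1: fetch the pure CMS output and hash it in memory.
    $freshHtml = file_get_contents($url);
    $freshHash = hash('sha256', $freshHtml);

    // Step 2: look up the hash recorded for the cached copy
    // (stored here in a hypothetical sidecar file).
    $hashFile   = $cacheFile . '.hash';
    $cachedHash = is_file($hashFile) ? trim(file_get_contents($hashFile)) : null;

    // Step 3: identical content -> leave the file (and thus its MTime,
    // Last-Modified, and ETag) untouched.
    if ($freshHash === $cachedHash) {
        return;
    }

    // Step 4: content really changed -> overwrite and record the new hash.
    file_put_contents($cacheFile, $freshHtml);
    file_put_contents($hashFile, $freshHash);
}
```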
Where and how exactly is up to your developer expertise, of course! Ideas from me as a layman:
Either in the file metadata (as part of the filename or in an xattr). This means almost no extra load, as the file is accessed anyway (see the sketch after this list).
Or separate from the cache files (possibly utilizing a RAM-based caching pool too). I don’t know which is more efficient; I’m tech-savvy, but no dev.
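For the xattr variant, a hypothetical sketch. It assumes the PECL xattr extension and a filesystem with user extended attributes enabled; the attribute name and path are my inventions, not W3TC conventions.

```php
<?php
// Stand-ins for the cached file and the hash computed in step 1.
$cacheFile     = '/wp-content/cache/page_enhanced/example.com/about/_index.html';
$pureCmsOutput = '<html>…</html>';
$freshHash     = hash('sha256', $pureCmsOutput);

if (function_exists('xattr_set')) {
    // Writing an extended attribute changes the file's ctime but NOT its
    // MTime, so Last-Modified and the ETag stay stable.
    xattr_set($cacheFile, 'w3tc_content_hash', $freshHash);

    // Later, read it back for the comparison in step 2.
    $storedHash = xattr_get($cacheFile, 'w3tc_content_hash');
}
```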
Note: W3TC’s "Cache Preload" does exactly that already. Maybe you do not need to develop anything new, but just need to ensure that the Garbage Collector uses the same functions.
Bug in detail
`/wp-content/cache/page_enhanced/.htaccess` has the directive:
`FileETag MTime Size`
Let’s think about what this means for totally unchanged content fetched by the Garbage Collector:
Size between cached and current remains unchanged.
But because the Garbage Collector stubbornly always overwrites, the MTime of the file changes (verified in the snippet after this list).
So the Last-Modified HTTP header changes.
And in consequence its ETag HTTP header changes too, because the FileETag directive in W3TC’s setup means that the ETag gets calculated from a combination of MTime and Size.
❗️ So although NOTHING changed, the cached content gets overwritten.
❗️ All revisiting users who have the page in their local cache unnecessarily re-download the very same page again.
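The MTime effect is easy to verify with a self-contained PHP snippet (the temp path is just for illustration):

```php
<?php
// Rewriting a file with byte-identical content still bumps its MTime,
// which is exactly what breaks Last-Modified/ETag validation.
$f = '/tmp/w3tc-mtime-demo.html';

file_put_contents($f, '<html>same content</html>');
clearstatcache();
$before = filemtime($f);

sleep(1); // filesystem MTime resolution is typically 1 second

file_put_contents($f, '<html>same content</html>'); // identical bytes
clearstatcache();
$after = filemtime($f);

var_dump($before !== $after); // bool(true): MTime changed, content did not
```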
Notes on the Hash
As W3TC puts `<!-- Debug/stats -->` in the last lines of HTML files, these should never be included when calculating the hash, as this ever-changing noise would of course always result in a different hash. But my proposal never hashes full files anyway, only the pure HTML as output by the live CMS.
In step 1, the pure output from the CMS is in memory and gets hashed.
In step 4, the hash of that pure HTML content (debug lines not included!) is recorded.
In step 2, that recorded hash, i.e. the hash of the cached content as it came from the CMS, unaltered, is used in the comparison.
So this should be conflict-free.
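So the proposal never needs to hash a cached file at all. But if an implementation ever did, a hypothetical helper like this (my own illustration, assuming the debug block sits after the closing `</html>` tag) would keep the hash stable:

```php
<?php
// Hash only the page markup up to and including </html>,
// ignoring anything appended after it (e.g. W3TC's debug/stats comment).
function content_hash(string $html): string
{
    $end  = strrpos($html, '</html>');
    $pure = ($end !== false) ? substr($html, 0, $end + strlen('</html>')) : $html;
    return hash('sha256', $pure);
}

// Same hash although the second version carries trailing debug noise:
var_dump(content_hash("<html>page</html>")
      === content_hash("<html>page</html>\n<!-- Debug/stats: served in 3 ms -->")); // bool(true)
```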
My Website Environment
Small personal portfolio website
~ 100 pages, predicting max +5 per year
~ 10 blogposts, predicting +30 per year
Main domain caching only HTML + all plugin/theme assets (CSS/JS/fonts/…)
Shared Hosting, which includes:
Hardware: 100% SSD-Hosting, high-end ProLiant servers by HP
W3TC fundamentals
Page Cache: Disk: Enhanced
Opcode Cache: Zend OPcache. The speedup in the backend is on average 2x, sometimes peaking at 3x. A great efficiency improvement!
Object Cache: Memcached
Database Cache: OFF, because I guess the 16 MB are already utilized enough by the Object Cache.
Varnish: OFF during debugging, to not complicate things. But when I tested it, it was again a great extra speedup (reducing latency from ca. 80–100 ms to ca. 20 ms).
My Website Usability and Performance Goals
1. Load is not a concern for now. Not a motivation for caching.
2. Want my users to browse fast. The main motivation for caching.
3. Don’t want the first visitor of a new or freshly purged page to have to wait for page cache generation.
Hence: Page Cache → Cache Preload:
a) ☑︎ Automatically prime the page cache
b) ☑︎ Preload the post cache upon publish events
4. Want my users to always have a chance to get the most recent version of a page/post.
a) The settings from 3 are enough to provide this for first-time visitors.
b) But by themselves they are NOT enough for re-visiting users!
Hence: Browser Cache → HTML/XML → Cache-Policy:
b1) With “cache with max-age”: if a content change occurs before Max-Age expires, revisiting users still have the page as “fresh” in their local cache, so they will not re-validate and will show the outdated local version instead of the renewed content in the CMS.
b2) With “cache with validation”, revisiting users get:
CON: Takes a tiny bit longer than trusting only Max-Age: the browser sends a request, the server sees with minimal latency that Last-Modified and ETag still suffice, responds with HTTP 304 Not Modified, and the browser shows the locally cached page. Only if the content really changed is it re-sent (illustrated below).
PRO: They reliably always get the most recent content with only minimal overhead: one round trip of HTTP metadata; no content needs to be exchanged if nothing changed.
BUG: That benefit is destroyed by the bug described above.
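For illustration, the b2 round trip on the wire looks roughly like this (URL and validator values are hypothetical):

```http
GET /portfolio/ HTTP/1.1
Host: example.com
If-Modified-Since: Tue, 01 Aug 2023 10:00:00 GMT
If-None-Match: "1b6-3e1cb03b"

HTTP/1.1 304 Not Modified
ETag: "1b6-3e1cb03b"
```

Only headers cross the wire; the page body is not re-sent.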