Caching oembed with nginx

Posting a link on Mastodon can have a big impact on our WordPress websites. Unlike other social media platforms, Mastodon isn’t one thing: it’s many independent servers (or instances) on the Fediverse, and there are other platforms on the Fediverse too. Posting a link on Mastodon prompts a whole bunch of them to come and visit our blog. Donncha Ó Caoimh investigated this recently and wrote about it on his blog. Of particular interest is that after hitting the blog post, each server also hits the wp-json REST endpoint for rendering oembed content, something that usually isn’t cached.

Having a page cache helps our website deal with the extra visits to our blog posts and other pages, by serving up a cached copy of the generated HTML. Any good caching plugin will do this well, but WP Super Cache does it very well and with minimal fuss. Recently, though, I switched from WP Super Cache to the nginx fastcgi cache, because I needed a little extra flexibility over what I cache and for how long. That flexibility also allows better handling of the oembed endpoint.

A standard configuration for WordPress on nginx fastcgi cache excludes any requests for wp-json, and any requests with query string arguments. That excludes the oembed endpoint on two counts, wp-json and query arguments, as in this example:

https://example.com/wp-json/oembed/1.0/embed?url=https%3A%2F%2Fexample.com%2Fmy-favourite-pony%2F

Here’s what the nginx config usually looks like for skipping the cache:

# start with assumption of caching
set $no_cache 0;

# don't use the cache for requests with query strings
if ($query_string != "") {
	set $no_cache 1;
}

# Don't cache URIs containing the following segments
if ($request_uri ~ "(?:^/wp-admin/|^/wp-json/|^/xmlrpc\.php|index\.php|^/wp-.*\.php|/feed/|^/sitemap\.x[ms]l)") {
	set $no_cache 1;
}

# Don't use the cache for logged in users, recent commenters, shoppers
if ($http_cookie ~ "comment_author|wordpress_[a-f0-9]+|wp-postpass|wordpress_logged_in|edd_|_discount") {
	set $no_cache 1;
}

# Don't serve cached files or cache new content if $no_cache is set
fastcgi_cache_bypass $no_cache;
fastcgi_no_cache $no_cache;

We need to change that a little, so that some wp-json requests, and some requests with query arguments, can be cached. An easy way to do that is with nginx maps. A map defines the value of one variable based on another, and it accepts regular expressions, which makes maps very powerful. NB: maps need to be defined outside the server blocks in our nginx config (in the http context), but don’t worry: nginx only evaluates a map when its variable is used, so unused maps have no impact on performance.

Here’s how we can tell nginx that some wp-json paths are OK to cache, and some paths can be cached with query arguments.

# any query strings will bust the cache, with path-based exceptions
map $request_uri $no_cache_args {
    default                              $args;
    ~^/glotpress/projects                "";
    ~^/wp-json/oembed/1\.0/embed         "";
}

# override URLs that bust the cache, for some path-based exceptions
map $request_uri $no_cache_url {
    default                              1;
    ~^/wp-json/oembed/1\.0/embed         0;
    ~^/wp-json/wp/v2/(?:posts|pages)/    0;
}
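To make the placement concrete, here is a minimal skeleton of where the maps sit relative to a server block. It is only a sketch: the cache path, the zone name WORDPRESS, the cache key and the server name are assumptions for illustration, not values from a real config.

```nginx
http {
    # assumed cache location, zone name and key; adjust to suit
    fastcgi_cache_path /var/run/nginx-cache levels=1:2 keys_zone=WORDPRESS:100m inactive=7d;
    fastcgi_cache_key "$scheme$request_method$host$request_uri";

    # maps live here in the http context, alongside (not inside) the server blocks
    map $request_uri $no_cache_args {
        default                          $args;
        ~^/wp-json/oembed/1\.0/embed     "";
    }

    map $request_uri $no_cache_url {
        default                          1;
        ~^/wp-json/oembed/1\.0/embed     0;
    }

    server {
        server_name example.com;
        # the set/if rules that read $no_cache_args and $no_cache_url go here
    }
}
```

Because the cache key in this sketch includes $request_uri, each distinct oembed URL (query string and all) gets its own cache entry.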

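Now we can modify the no-cache configuration from before, using these two new values. Instead of testing for any query string, we test our new $no_cache_args value, which is empty for the excepted paths. And when we detect a URI on the exclusion list, such as one beginning with /wp-json/, we set $no_cache to $no_cache_url, which is 1 unless the path is one of our exceptions. Note that this assignment replaces whatever the query string test set, so a path that $no_cache_url marks as cacheable is cached even when it carries query arguments.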

# start with assumption of caching
set $no_cache 0;

# don't use the cache for requests with query strings, with exceptions
if ($no_cache_args != "") {
	set $no_cache 1;
}

# Don't cache URIs containing the following segments
if ($request_uri ~ "(?:^/wp-admin/|^/wp-json/|^/xmlrpc\.php|index\.php|^/wp-.*\.php|/feed/|^/sitemap\.x[ms]l)") {
	set $no_cache $no_cache_url;
}

# Don't use the cache for logged in users, recent commenters, shoppers
if ($http_cookie ~ "comment_author|wordpress_[a-f0-9]+|wp-postpass|wordpress_logged_in|edd_|_discount") {
	set $no_cache 1;
}
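The $no_cache flag still has to be handed to the cache, with the same fastcgi_cache_bypass and fastcgi_no_cache directives as before. A sketch of the PHP location follows, with one extra response header that makes it easy to check whether the exceptions work. The socket path and the zone name WORDPRESS are assumptions, and the X-Cache-Status header name is just a common convention.

```nginx
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php-fpm.sock;    # assumed socket path

    fastcgi_cache WORDPRESS;                    # assumed zone name
    fastcgi_cache_bypass $no_cache;             # don't answer from the cache
    fastcgi_no_cache $no_cache;                 # don't store the response

    # report what the cache did with this request: HIT, MISS, BYPASS, EXPIRED...
    add_header X-Cache-Status $upstream_cache_status;
}
```

With something like this in place, requesting the same oembed URL twice and looking at the response headers should show a MISS followed by a HIT, while a wp-admin URL should report BYPASS.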

That’s the nginx configuration dealt with. Now oembed requests will be cached for as long as nginx caches regular pages. But I’m not so sure about caching oembed content long term: I typically cache pages and posts for seven days, and I couldn’t work out a tidy way to ask the Nginx Helper plugin to purge an oembed cache when its related post was updated. So let’s expire oembed caches fairly quickly. The majority of hits from the Fediverse happen in the first minute after posting a link, with some stragglers coming in over the next few minutes, so let’s set the expiry on oembeds to 10 minutes. We can do that by sending an X-Accel-Expires header, which gives nginx a cache lifetime in seconds, with the oembed content. Here’s a little PHP snippet for that.

/**
 * set short cache expiry for wp-json oembed requests
 */
add_filter('oembed_request_post_id', function(int $post_id) : int {
	$custom_expires = 10 * MINUTE_IN_SECONDS;
	header("X-Accel-Expires: $custom_expires");

	return $post_id;
});
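On the nginx side, the X-Accel-Expires header takes priority over the lifetime set with fastcgi_cache_valid, as long as it isn’t listed in fastcgi_ignore_headers. The fragment below sketches how the two settings might sit together. The seven day lifetime matches what was described above, but the list of ignored headers is an assumption, one common choice rather than a requirement.

```nginx
# default lifetime for cached responses (seven days, as for pages and posts)
fastcgi_cache_valid 200 301 302 7d;

# assumption: ignore WordPress's own caching headers so the default above applies;
# X-Accel-Expires must NOT appear in this list, or the 10 minute expiry is ignored
fastcgi_ignore_headers Cache-Control Expires;
```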

And with that, the job is done. When the hordes of Fediverse servers descend on our website after we post a link on Mastodon, both the page content and the oembed will be served from the cache. Any latecomers after 10 minutes will still get the page from the cache, and the first of them will repopulate the oembed cache for another 10 minutes.