TL;DR: The fix is to make your CDN treat “application/activity+json” in the Accept header differently from everything else. In other words, you need to Vary on Accept, but it’s best to limit the variation to “application/activity+json” if you can.

With the release of the ActivityPub 1.0.0 plugin for WordPress, I hope we’ll see a surge in the number of WordPress sites that can be followed from your favorite ActivityPub-based systems, like Mastodon and others. However, if you are hosting your WordPress site on Cloudflare (and likely other CDNs) and you have activated full page caching, you are going to have a difficult time integrating your blog with the greater Fediverse. This is because when an ActivityPub user on a service like Mastodon searches for your profile, that search lands on your WordPress author page looking for additional information in JSON format. If someone has recently visited your author page in a browser, there is a chance Mastodon will get the cached HTML back instead, resulting in a broken search. The reverse can happen too: if a Mastodon user has recently performed a search and someone later lands on your author page in a browser, they will see JSON instead of the expected page.

The cause is that Cloudflare doesn’t differentiate between a request looking for HTML and one looking for JSON; the Accept header is not factored into how Cloudflare caches the page. Instead, it sees only the author page URL, determines that it is the same request, and returns whatever it has cached. The good news is that, with some effort, we can trick Cloudflare into considering what type of content the client is looking for while still allowing full page caching. Luckily, the ActivityPub plugin has a nice undocumented feature to help work around this situation.
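To make the failure mode concrete, here is a minimal sketch of a page cache keyed by URL alone next to one keyed by URL plus a small variant. This is a toy model: makeCache, renderAuthorPage and the /author/example/ path are illustrative names, not Cloudflare APIs or anything from my real site.

```javascript
// Toy model of a page cache. Names here are illustrative, not Cloudflare APIs.
function makeCache(keyFn) {
  const store = new Map();
  return {
    // Return the cached body, or render and store it on a miss.
    get(request, render) {
      const key = keyFn(request);
      if (!store.has(key)) {
        store.set(key, render(request));
      }
      return store.get(key);
    },
  };
}

// Stand-in for WordPress: JSON for ActivityPub clients, HTML otherwise.
function renderAuthorPage(request) {
  return request.accept.includes("application/activity+json")
    ? '{"type":"Person"}'
    : "<html>author page</html>";
}

const browser = { url: "/author/example/", accept: "text/html" };
const mastodon = { url: "/author/example/", accept: "application/activity+json" };

// Keyed by URL only: whoever arrives first decides what everyone gets.
const urlOnly = makeCache((r) => r.url);
const first = urlOnly.get(browser, renderAuthorPage);
const second = urlOnly.get(mastodon, renderAuthorPage);

// Keyed by URL plus a two-valued variant: each client type gets its own entry.
const variantOf = (r) =>
  r.accept.includes("application/activity+json") ? "json" : "default";
const withVariant = makeCache((r) => variantOf(r) + r.url);
const htmlBody = withVariant.get(browser, renderAuthorPage);
const jsonBody = withVariant.get(mastodon, renderAuthorPage);

console.log(second === first);      // true: Mastodon was served the cached HTML
console.log(htmlBody !== jsonBody); // true: separate entries per variant
```

Keeping the variant to two values, rather than keying on the raw Accept header, means a client cannot create an unbounded number of cache entries for one page.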

To fix this while keeping page caching, you will need to use a Cloudflare Worker to adjust the request when the Accept header contains “application/activity+json”. I assume you already have page caching in place and that no other plugin on your site interferes with page caching, such as Batcache, WP Super Cache and others. For my site I use Cloudflare’s APO for WordPress and nothing else.

First, ensure that your “Caching Level” setting is set to Standard. Next, get set up for working with Cloudflare Workers by following the official guide at https://developers.cloudflare.com/workers/. Then create a new project, again using their documentation, and replace the contents of index.js with:

export default {
  async fetch(req) {
    const acceptHeader = req.headers.get('accept');
    const url = new URL(req.url);

    // ActivityPub clients ask for application/activity+json. Adding the
    // query parameter gives those requests their own URL, and therefore
    // their own cache entry, separate from the HTML page.
    if (acceptHeader?.includes("application/activity+json")) {
      url.searchParams.append("activitypub", "true");
    }

    return fetch(url.toString(), {
      cf: {
        // Always cache this fetch regardless of content type
        // for a max of 5 minutes before revalidating the resource
        cacheTtl: 300,
        cacheEverything: true,
      },
    });
  }
}
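
If you want to sanity-check the rewrite logic before deploying, the URL handling can be pulled into a plain function and run locally in Node. This is only a sketch: rewriteForActivityPub is a hypothetical helper name and example.com stands in for your own site.

```javascript
// Hypothetical helper mirroring the Worker's URL rewrite so it can be run in Node.
function rewriteForActivityPub(requestUrl, acceptHeader) {
  const url = new URL(requestUrl);
  // A missing Accept header is treated the same as a browser request.
  if (acceptHeader && acceptHeader.includes("application/activity+json")) {
    url.searchParams.append("activitypub", "true");
  }
  return url.toString();
}

const page = "https://example.com/author/example/";
console.log(rewriteForActivityPub(page, "text/html"));
// https://example.com/author/example/
console.log(rewriteForActivityPub(page, "application/activity+json"));
// https://example.com/author/example/?activitypub=true
console.log(rewriteForActivityPub(page, null));
// https://example.com/author/example/
```

Because the two kinds of request now end up at different URLs, a cache that only looks at the URL can store them separately.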

You can now publish this using wrangler publish. You can adjust the cacheTtl to something longer or shorter to suit your needs.

The last step is to associate the Worker with the /author route of your WordPress site. For my setup I created a worker route of “*dustinrue/author*” and that was it. My site will now cache and return the correct content based on whether or not the Accept header contains “application/activity+json”.

Remember that Cloudflare Workers do cost money, though I suspect a lot of small sites will easily fit into the free tier.

Update – I have also posted an alternative method that preserves the ability to have full page caching enabled. Please find it at WordPress ActivityPub and Cloudflare.

In a previous post I discussed how to deal with the fact that the ActivityPub plugin for WordPress must return author pages in a different format depending on the value of the Accept header. A browser hitting an author page is looking for HTML, while Mastodon expects JSON instead. If you use any kind of caching system, be it a CDN, a special plugin or a combination of the two, you may run into an issue where the wrong content is cached for each Accept header type. You might see this in your site health report with:

Your author URL does not return valid JSON for application/activity+json. Please check if your hosting supports alternate Accept headers.

In this post I will discuss a method for dealing with this while not totally losing the ability to cache the response. This is useful for busy sites or as a way to help mitigate some forms of DoS attack. The example I provide here is meant for Nginx with php-fpm but you can apply this same sort of thinking anywhere else where you have enough control over the configuration to make it work.

Assuming you followed the previous post and have created an exception for your author URL in your CDN, it is up to your server to render author pages each time a request is made. This is a waste of resources and doesn’t provide an ideal experience for end users. To enable caching on this endpoint, we will leverage Nginx’s built-in caching capability while setting the cache key based on the Accept header.

To start, let’s set up basic Nginx caching. At the top of your configuration file, outside of the server{} block (advanced users can adjust as desired), add the following:

fastcgi_cache_path /etc/nginx/cache levels=1:2 keys_zone=wordpress:100m inactive=10m max_size=100m;

You can adjust the path if you want, but in essence we are defining a cache stored at /etc/nginx/cache with a zone name of wordpress. We are limiting it to 100MB and telling Nginx to delete any entry that hasn’t been accessed in 10 minutes. /etc/nginx/cache must exist and must be owned by the user that runs Nginx. If you have multiple servers, know that this cache is unlikely to be shared, so each server will have its own cache.

Next, add a map to define which Accept headers we want to Vary on:

map $http_accept $vary_key {
  default "default";
  "~application/activity\+json" "json";
}

This block creates a new variable called $vary_key that we can use later. Notice that we only create a different cache entry when application/activity+json is included in the Accept header.
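
If the map syntax is unfamiliar, the same decision can be sketched in a few lines of JavaScript. This is only a rough equivalent for reasoning about which requests land in which bucket; varyKey is a hypothetical function name and the sample Accept values are examples, not captured traffic.

```javascript
// Rough JavaScript equivalent of the Nginx map; not how Nginx evaluates it internally.
function varyKey(acceptHeader) {
  // A pattern starting with "~" in an Nginx map is a case-sensitive regex match.
  return /application\/activity\+json/.test(acceptHeader || "") ? "json" : "default";
}

console.log(varyKey("text/html,application/xhtml+xml"));                   // default
console.log(varyKey("application/activity+json"));                         // json
console.log(varyKey("application/activity+json, application/ld+json"));    // json
console.log(varyKey(undefined));                                           // default
```

Every other Accept value collapses into "default", so there are only ever two cache variants per URL.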

Now inside the server{} block for your site, let’s add a nice header we can use to ensure our caching is working properly. Adding add_header X-Nginx-Cache $upstream_cache_status; to this section will cause Nginx to output a header we can see to know the cache status. It will be BYPASS, MISS or HIT in response headers.

Next, inside the location block that is handling PHP requests, add the following config options:

# cache key
fastcgi_cache_key "$vary_key$host$request_method$request_uri";

# matches keys_zone in fastcgi_cache_path
fastcgi_cache wordpress;

# don't store, or serve from cache, requests flagged by $no_cache
fastcgi_cache_bypass $no_cache;
fastcgi_no_cache $no_cache;

# defines the default cache time
fastcgi_cache_valid any 10m;

# misc additional settings
fastcgi_cache_use_stale updating error timeout invalid_header http_500;
fastcgi_cache_lock on;
fastcgi_cache_lock_timeout 10s;
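
To see what the cache key ends up looking like, here is a small sketch that concatenates the same four values in the same order as the fastcgi_cache_key line. The cacheKey function and the example.com values are made up for illustration.

```javascript
// Illustrative only: builds the same string as
// fastcgi_cache_key "$vary_key$host$request_method$request_uri".
function cacheKey({ varyKey, host, method, uri }) {
  return `${varyKey}${host}${method}${uri}`;
}

const base = { host: "example.com", method: "GET", uri: "/author/example/" };

console.log(cacheKey({ ...base, varyKey: "default" }));
// defaultexample.comGET/author/example/
console.log(cacheKey({ ...base, varyKey: "json" }));
// jsonexample.comGET/author/example/
console.log(cacheKey({ ...base, varyKey: "json", method: "HEAD" }));
// jsonexample.comHEAD/author/example/
```

Note that the request method is part of the key, so a HEAD request (what curl -I sends) and a GET request for the same page should produce different keys.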

The next settings depend on what you want to do. If you are using a CDN and are only exposing author pages, then you can use the following settings:

# Cache nothing by default
set $no_cache 1;

# Only cache author pages
if ($request_uri ~* "/author/") {
  set $no_cache 0;
}

If instead, you want to cache everything your CDN might miss then you can use this (this is what I use):

# Cache everything by default
set $no_cache 0;

# Don't cache logged in users or commenters
if ( $http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_" ) {
  set $no_cache 1;
}

# Don't cache the following URLs
if ($request_uri ~* "/(wp-admin/|wp-login.php)") {
  set $no_cache 1;
}
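
The rules above can be hard to read as a pile of if blocks, so here is a sketch of the same decision as a single JavaScript function. The regular expressions are copied from the config with the same case-insensitive matching; the noCache name and the sample cookies are assumptions for illustration.

```javascript
// Sketch of the "cache everything by default" rules; helper name is hypothetical.
const LOGGED_IN_COOKIE = /comment_author_|wordpress_(?!test_cookie)|wp-postpass_/i;
const UNCACHED_PATH = /\/(wp-admin\/|wp-login.php)/i;

// Returns 1 when the request should skip the cache, 0 otherwise,
// mirroring the values assigned to $no_cache.
function noCache(cookieHeader, requestUri) {
  if (LOGGED_IN_COOKIE.test(cookieHeader || "")) return 1;
  if (UNCACHED_PATH.test(requestUri)) return 1;
  return 0;
}

console.log(noCache("", "/author/example/"));                   // 0
console.log(noCache("wordpress_logged_in_abc=1", "/"));         // 1
console.log(noCache("wordpress_test_cookie=WP+Cookie", "/"));   // 0
console.log(noCache("", "/wp-admin/index.php"));                // 1
```

The negative lookahead is what lets the harmless wordpress_test_cookie through while still catching the logged-in cookies.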

If done correctly, hitting an author page will produce different results depending on the Accept header being used. To verify, take an author page and load it up in a browser. You should get a proper HTML page. Copy the URL out and, using curl, send the following:

curl -I https://dustinrue.com/author/ruedu/ -H "Accept: application/activity+json"

x-nginx-cache: MISS

Your Nginx cache status may already be HIT if someone recently searched for you. It should be HIT if you send the request again.

Debugging

It is important that you debug and properly resolve this endpoint. Failing to do so will result in failed searches of your author/user from ActivityPub clients. To be clear, the following must return different content:

curl -i https://dustinrue.com/author/ruedu
curl -i https://dustinrue.com/author/ruedu -H "Accept: application/activity+json"

Adjust these URLs to match your author pages and ensure the first curl returns HTML content while the second one returns JSON content. While writing this post I noticed that the plugin still reports a content type of text/html when it should say application/activity+json. Despite this inconsistency, clients will use the returned content.
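
Since the content type header can't be trusted here, one way to automate the comparison is to inspect the body itself. The following is a sketch for Node 18+ (which has a global fetch). It assumes the JSON response is an ActivityPub document with a top-level "@context" field; looksLikeActivityJson and checkAuthorUrl are hypothetical names.

```javascript
// Hypothetical check: does a response body look like an ActivityPub JSON document?
function looksLikeActivityJson(body) {
  try {
    const doc = JSON.parse(body);
    // Assumption: the actor document carries a top-level "@context".
    return doc !== null && typeof doc === "object" && "@context" in doc;
  } catch {
    return false; // not JSON at all, most likely HTML
  }
}

// Fetch the same URL twice with different Accept headers (Node 18+ has global fetch).
async function checkAuthorUrl(url) {
  const html = await fetch(url, { headers: { Accept: "text/html" } });
  const json = await fetch(url, { headers: { Accept: "application/activity+json" } });
  return {
    htmlIsJson: looksLikeActivityJson(await html.text()), // want false
    jsonIsJson: looksLikeActivityJson(await json.text()), // want true
  };
}
```

Calling checkAuthorUrl with your own author URL and logging the result should show htmlIsJson as false and jsonIsJson as true once caching is keyed correctly.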

If the curl calls are returning different content, next pay attention to the x-nginx-cache header to ensure that it is actually caching. You can add another utility header to your Nginx config to assist with this:

add_header x-accept $vary_key always;

This add_header will output what value the map landed on so you can ensure things are being picked up properly.

Conclusion

I hope this is enough to help guide you in improving your WordPress + ActivityPub experience.

Here is a quick tip on a rather specific situation I found myself in, though I believe it could come up for a lot of people using WordPress who are trying to integrate with ActivityPub networks. If you are:

  • Running WordPress
  • Using a page caching solution like Cloudflare APO or a manually configured one
  • Running an ActivityPub plugin and/or webfinger

Then you will likely run into an issue with your site not being reliably discoverable when searched for. Using Matthias Pfefferle’s ActivityPub, Webfinger and Nodeinfo plugins to get your WordPress site exposed as an ActivityPub server will add a few routes to your site. One of these routes is the WordPress author page, which exists at /author/<author username>. However, this path returns HTML when hit with a browser. ActivityPub instances, on the other hand, will be looking for a different content type called application/activity+json. Unfortunately, many caching layers will not Vary on Accept, which you need in order to return different data depending on what type of content the requester is looking for.

To resolve this on my site, which uses Cloudflare for CDN, I added a page rule that disallows caching for my author page. This works because I am the only author on the site. A full “proper” solution would be to set a Vary on the Accept header for that path, which Cloudflare does not support.

You may want to be very specific about which headers you Vary on, on which paths, and which values you accept for them. Allowing a wide or unlimited range of values lets people easily bust the cache at the CDN, sending requests through to your origin servers.

In a previous post I quickly mentioned that this site now has pfefferle’s ActivityPub plugin installed. This plugin implements enough of the ActivityPub and associated protocols to allow a WordPress site to look and behave a bit like a user on an ActivityPub compatible platform including Mastodon and more. By installing the plugin, you can search for an author of a site and then follow them so you can see a stream of their content whenever they post it. From there you can comment on the post and interact with it from within your favorite Fediverse platform.

In this post I quickly describe how to get started with the plugin. To start, install the plugin (https://wordpress.org/plugins/activitypub/) using your usual method for installing plugins. For me, that means adding it to a composer.json file; for you it might mean simply searching for the plugin in your WordPress admin -> plugins screen. Once installed, activate the plugin. That’s it! Your site is now ready to be followed by anyone within the Fediverse network.

The plugin implements the bare minimum required, it seems, so it can be a bit confusing when there are no immediately obvious visual changes to anything. Most of what the plugin does is in the background, inserting routes that are necessary to make the webfinger and ActivityPub protocols work on your site. Don’t worry though, the plugin is working!

To get the search string people need to use to follow your blog posts or pages visit your user profile page. You should see something similar to this near the bottom of the page:

These profile identifiers can be pasted into the search bar of an instance and from there you can follow the author. Simply take the @[email protected] portion that you see and paste it into the search bar of your Mastodon instance. For simplicity, I put this into my Fosstodon profile https://fosstodon.org/@[email protected]. This allows people to easily see my profile.
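
For the curious, here is a sketch of what I understand a remote instance does with such an identifier: it splits it into user and domain and queries the WebFinger endpoint (RFC 7033) on that domain. The webfingerUrl helper and the @alice@example.com handle are made up for illustration, and this post did not verify the lookup flow any particular server uses.

```javascript
// Hypothetical helper: turn "@user@domain" into the WebFinger lookup URL (RFC 7033).
function webfingerUrl(handle) {
  const match = /^@?([^@\s]+)@([^@\s]+)$/.exec(handle.trim());
  if (!match) throw new Error(`not a user@domain handle: ${handle}`);
  const [, user, domain] = match;
  const resource = encodeURIComponent(`acct:${user}@${domain}`);
  return `https://${domain}/.well-known/webfinger?resource=${resource}`;
}

console.log(webfingerUrl("@alice@example.com"));
// https://example.com/.well-known/webfinger?resource=acct%3Aalice%40example.com
```

Opening the resulting URL for your own identifier in a browser is a quick way to confirm the webfinger route is responding.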

This final step was not immediately clear to me, but once I found this in my profile it was super easy to follow myself. Your followers will appear in Users -> Followers. Anyone that replies to a post on the Fediverse will be added as a comment on your site. On my setup, using Akismet, incoming comments were put into spam for some reason.