TL;DR: The fix is to ensure your CDN distinguishes requests with “application/activity+json” in the Accept header from everything else. In other words, you need to Vary on Accept, but it’s best to limit the variance to “application/activity+json” if you can.

With the release of version 1.0.0 of the ActivityPub plugin for WordPress I hope we’ll see a surge in the number of WordPress sites that can be followed using your favorite ActivityPub based systems like Mastodon and others. However, if you are hosting your WordPress site on Cloudflare (and likely other CDNs) and you have activated full page caching, you are going to have a difficult time integrating your blog with the greater Fediverse. This is because when an ActivityPub user on a service like Mastodon searches for your profile, that search lands on your WordPress author page looking for additional information in JSON format. If someone has visited your author page recently in a browser, there is a chance Mastodon will get HTML back instead, resulting in a broken search. The reverse can happen too: if a Mastodon user has recently performed a search and someone later lands on your author page in a browser, they will see JSON instead of the expected HTML page.

The cause is that Cloudflare doesn’t differentiate between a request looking for HTML and one looking for JSON; this information is not factored into how Cloudflare caches the page. Instead, it only sees the author page URL, determines that it is the same request, and returns whatever it has. The good news is that, with some effort, we can trick Cloudflare into considering what type of content the client is looking for while still allowing for full page caching. Luckily the ActivityPub plugin has a nice undocumented feature to help work around this situation.

To fix this while keeping page caching you will need to use a Cloudflare Worker to adjust the request if the Accept header contains “application/activity+json”. I assume you already have page caching in place and that you do not have some other plugin on your site that would interfere with it, like Batcache or WP Super Cache. For my site I use Cloudflare’s APO for WordPress and nothing else.

First, ensure that your “Caching Level” configuration is set to Standard. Next, get set up for working with Cloudflare Workers; you can follow the official guide at https://developers.cloudflare.com/workers/. Then create a new project, again using their documentation, and replace the index.js file contents with:

export default {
  async fetch(req) {
    const acceptHeader = req.headers.get('accept');
    const url = new URL(req.url);

    // The ActivityPub plugin returns JSON for requests that include the
    // activitypub query string argument, which also gives these requests a
    // distinct URL that Cloudflare caches separately from the HTML version
    if (acceptHeader?.indexOf("application/activity+json") > -1) {
      url.searchParams.append("activitypub", "true");
    }

    return fetch(url.toString(), {
      cf: {
        // Always cache this fetch regardless of content type
        // for a max of 5 minutes before revalidating the resource
        cacheTtl: 300,
        cacheEverything: true,
      },
    });
  }
}

You can now publish this using wrangler publish. You can adjust the cacheTtl to something longer or shorter to suit your needs.

The last step is to associate the worker with the /author route of your WordPress site. For my setup I created a worker route of “*dustinrue/author*” and that was it. My site will now cache and return the correct content based on whether or not the Accept header contains “application/activity+json”.
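To verify the worker is doing its job, request your author page twice, once as a browser would and once with the ActivityPub Accept header, and confirm you get HTML for the first and JSON for the second (the URL below is for my site, substitute your own author page):

curl -s https://dustinrue.com/author/ruedu/ | head -c 200
curl -s https://dustinrue.com/author/ruedu/ -H "Accept: application/activity+json" | head -c 200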

Remember that Cloudflare Workers do cost money, though I suspect a lot of small sites will easily fit into the free tier.

When you create a k3s cluster using colima, it will default to using Docker for the runtime. This means that any Docker image you build or pull locally will be available to k3s. This greatly simplifies testing locally built images referenced by Helm charts or Kustomize (or whatever you are using).

This is not a feature unique to colima, but rather a feature of k3s if it is told to use the Docker runtime. You can read more at https://docs.k3s.io/advanced#using-docker-as-the-container-runtime.
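As a quick sketch of the workflow this enables (the colima flag and image names here are illustrative, check your version’s help output):

colima start --kubernetes                 # k3s cluster backed by colima's Docker runtime
docker build -t my-app:dev .              # image lands in the same Docker daemon k3s uses
kubectl run my-app --image=my-app:dev --image-pull-policy=Never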

If you have an older Kubernetes cluster with older Helm based software installs and you aren’t paying attention, it can be easy to leave some resources in a state where they are impossible to update or remove. This is because of APIs that have been deprecated and then removed. While facing this issue today, I found that this Helm plugin exists which can help resolve the problem – https://github.com/helm/helm-mapkubeapis
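Going by the plugin’s README, usage looks roughly like this (the release and namespace names below are placeholders):

helm plugin install https://github.com/helm/helm-mapkubeapis
helm mapkubeapis my-release --namespace my-namespace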

Pushed some updates to my proxmox packer project at https://github.com/dustinrue/proxmox-packer. The updates include changes to support Packer 1.9.2 and the most recent Proxmox plugin. The Makefile has been updated to run the necessary init command to get the Proxmox plugin installed as well.

More importantly, the changes fix an issue that prevented the provisioner from running, leading to broken or missing cloud-init support in the resulting templates.

In this post I’m going to more or less drop some notes about how I went about debugging a WireGuard VPN issue I was having. I have a WireGuard based VPN running on Rocky Linux 9, which is basically a default minimal installation with WireGuard added. The system ships with firewalld and nftables, which will be important later. My WireGuard installation relies on firewalld and nftables to work properly, and I have a number of PostUp and PostDown commands that run to insert rules so that VPN clients are NAT’d properly. As an older Linux user, I am very comfortable with iptables but significantly less so with firewalld and especially nftables.

My adventure began after a system update that prevented data from passing through the connection. Due to how WireGuard works, it appeared the connection was made but no data would flow. After confirming my IP address had not changed recently and that I was indeed connecting to the system, I still couldn’t get traffic to pass through the VPN.

The first thing I set out to do was verify connectivity with the service. Starting with tcpdump -i any port 51820 I was surprised that I wasn’t seeing any traffic to the service. I was surprised because there were no rules present in iptables to suggest the port was blocked, and yet I would also get messages stating the port was administratively closed. To confirm this I wanted to see if I could get WireGuard to log what it was doing. As it turns out, despite living in the kernel as a module, there is a way to make it output logs. The following will turn the module’s debug mode on:

echo 'module wireguard +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
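The debug messages go to the kernel log, so you can follow them with something like the commands below; writing -p instead of +p turns debug mode back off:

sudo dmesg -wT | grep wireguard
echo 'module wireguard -p' | sudo tee /sys/kernel/debug/dynamic_debug/control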

Unfortunately I cannot share these logs, but they will provide information about peers coming and going and efforts to maintain the connection. After enabling debug I found…no entries. Very unusual. My next step was to remove firewalld, and by extension nftables. Once removed, there were no firewall rules on the system at all and it was wide open. At this point I was able to see debug messages from WireGuard showing that new peers were connecting. As expected, the VPN still didn’t work because the required NAT rules were missing, but this did finally confirm that the problem was with firewalld.

It then hit me that firewalld works differently than I was thinking: it has a concept of services, and those services won’t appear in an iptables listing. I had a feeling I knew what I had to do. I reinstalled firewalld, enabled and started it, and looked at the services it was set to allow:

firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18
  sources:
  services: cockpit dhcpv6-client ssh
  ports:
  protocols:
  forward: yes
  masquerade: yes
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

Immediately it became obvious that wireguard was not included in the list of services. Simply running firewall-cmd --zone=public --add-service=wireguard --permanent added the service to my public zone and from there my VPN started working again.
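Note that --permanent only writes the saved configuration, so if the rule doesn’t seem to take effect immediately you may also need to reload firewalld (or repeat the command without --permanent):

firewall-cmd --reload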

I have provided my thoughts in other places (Mastodon) about how I dislike firewalld and nftables. I find them much more tedious than iptables and haven’t ever really taken the time to learn them. As distributions change and mature over time, a lot of the default settings I took for granted, like a system’s networking being otherwise wide open after an install, are no longer true. At this point I am being forced to learn these tools, which is actually a good thing.

Long title but a relatively quick TIL. In a Helm chart, when specifying a storageClass in your templates, it is important that if the user does not set a storageClass you do not output storageClassName in your template for a persistent volume claim. That is to say that this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi

Is not equivalent to this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

When installed the first time, both will result in Kubernetes selecting your default storage class. When this happens, your resource is actually updated to reflect the selected storageClass in the API. When you apply the same file again during an upgrade, an empty storageClassName will be seen as a change from the default storage class to “”, which is not what you want. Instead, you should not output storageClassName at all if the user did not specify one.
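In the chart template this usually means wrapping the field in a conditional. A minimal sketch, assuming a hypothetical .Values.persistence.storageClass value:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  {{- if .Values.persistence.storageClass }}
  storageClassName: {{ .Values.persistence.storageClass | quote }}
  {{- end }}
  resources:
    requests:
      storage: 5Gi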

TIL that you can tell systemd to run any ExecStartPre and ExecStopPost scripts as root instead of the user the service is supposed to run under.

At the same time we can touch on how to create an override for a service. In my case, I wanted to override how Redis is started on a system to ensure transparent hugepages were set correctly, per their documentation. Creating an override for a service is super simple:

systemctl edit <service>

In my case this means:

systemctl edit redis

This is because the name of the service I want to edit is redis. From here, I am presented with my favorite editor where I can input the following:

[Service]
PermissionsStartOnly=true
ExecStartPre=/usr/local/sbin/hugepage.sh

Here I am defining two things. First, PermissionsStartOnly=true tells systemd that the permissions (the user the service runs as) should only apply to the ExecStart command and not the others, so the pre and post commands run as root. Next, I am specifying an ExecStartPre that calls a script. This script simply runs:

/bin/echo never > /sys/kernel/mm/transparent_hugepage/enabled
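Put together, a minimal version of /usr/local/sbin/hugepage.sh might look like this (the shebang and comments are my additions; remember to make the script executable):

#!/bin/sh
# Runs as root before Redis starts (thanks to PermissionsStartOnly) and
# disables transparent hugepages, per the Redis documentation
/bin/echo never > /sys/kernel/mm/transparent_hugepage/enabled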

Once complete, save out the file and restart your service. Your changes will now take effect.

If you are running a newer release of systemd (231+) then you can use the following format as well:

ExecStartPre=+/usr/local/sbin/hugepage.sh
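The + prefix tells systemd to run that one command with full privileges regardless of the User= setting, so on newer systems the override can drop PermissionsStartOnly entirely and be reduced to:

[Service]
ExecStartPre=+/usr/local/sbin/hugepage.sh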

One thing I dislike in WordPress is that it makes numerous external HTTP requests while in the admin. This happens even if you have disabled any auto update systems in wp-config.php and can cause small pauses while loading admin pages as you wait for the requests to finish. Since I manage my site through a Gitlab based CI/CD workflow, auto updates don’t make a lot of sense for me and I would prefer to not have WordPress core or themes phoning home and slowing down the admin experience.

There is an existing option for blocking HTTP requests in WordPress and it is presented as a pair of defines you can use to block all requests and then allow some. These defines are WP_HTTP_BLOCK_EXTERNAL and WP_ACCESSIBLE_HOSTS, which are described in more depth at https://developer.wordpress.org/reference/classes/wp_http/block_request/. This is a great way to block requests and generally the way to do something like this: block everything and then allow what you want. However, for my situation there is a much smaller set of domains I want to block while allowing everything else. In other words, I want to do the opposite of what these defines do for you. This is because there are a number of external services I do want to interact with, like Cloudflare and Mastodon.

What I came up with is an mu-plugin that reverses the logic of the defines above. It is an almost 1:1 copy/paste of the core code that is used to block requests. I then define a list of domains I wish to block. The code is very simple:

<?php
function block_urls( $preempt, $parsed_args, $uri ) {

    if ( ! defined( 'WP_BLOCKED_HOSTS' ) ) {
      return false;
    }

    $check = parse_url( $uri );
    if ( ! $check ) {
      return false;
    }

    static $blocked_hosts = null;
    static $wildcard_regex   = array();
    if ( null === $blocked_hosts ) {
        $blocked_hosts = preg_split( '|,\s*|', WP_BLOCKED_HOSTS );
        if ( false !== strpos( WP_BLOCKED_HOSTS, '*' ) ) {
          $wildcard_regex = array();
          foreach ( $blocked_hosts as $host ) {
            $wildcard_regex[] = str_replace( '\*', '.+', preg_quote( $host, '/' ) );
          }
          $wildcard_regex = '/^(' . implode( '|', $wildcard_regex ) . ')$/i';
        }
    }

    if ( ! empty( $wildcard_regex ) ) {
      $results = preg_match( $wildcard_regex, $check['host'] );
      if ($results > 0) {
        error_log(sprintf("Blocking %s://%s%s", $check['scheme'], $check['host'], $check['path']));
      } else {
        error_log(sprintf("Allowing %s://%s%s", $check['scheme'], $check['host'], $check['path']));
      }

      return $results > 0;
    } else {
      $results = in_array( $check['host'], $blocked_hosts, true ); // Inverse logic, if it's in the array, then block it.

      if ($results) {
        error_log(sprintf("Blocking %s://%s%s", $check['scheme'], $check['host'], $check['path']));
      } else {
        error_log(sprintf("Allowing %s://%s%s", $check['scheme'], $check['host'], $check['path']));
      }
      return $results;
    }
}

add_filter('pre_http_request', 'block_urls', 10, 3);

With this code saved in your mu-plugins directory as blocked-urls.php, you can then add a define like this (for example in wp-config.php) to block those URLs from being loaded by WordPress:

define( 'WP_BLOCKED_HOSTS', 'api.wordpress.org,themeisle.com,*.themeisle.com' );

When WordPress attempts to load URLs from these domains, they will be blocked. You’ll also notice that this plugin logs every HTTP request that passes through WordPress core’s HTTP API. Using this information, you can block additional domains if you need to.
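If you want to watch these log lines while clicking around the admin, and assuming WP_DEBUG and WP_DEBUG_LOG are enabled so error_log output goes to wp-content/debug.log, something as simple as this works:

tail -f wp-content/debug.log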

Update – I have also posted an alternative method that preserves the ability to have full page caching enabled. Please find it at WordPress ActivityPub and Cloudflare.

In a previous post I discussed how to deal with the fact that the ActivityPub plugin for WordPress must return author pages in a different format depending on the value of the Accept header. A browser hitting an author page is going to be looking for HTML to be returned, while Mastodon will expect JSON instead. If you use any kind of caching system, be it a CDN, a caching plugin, or a combination of the two, then you may run into an issue where the wrong content is cached for each Accept header type. You might see this in your site health report as:

Your author URL does not return valid JSON for application/activity+json. Please check if your hosting supports alternate Accept headers.

In this post I will discuss a method for dealing with this while not totally losing the ability to cache the response. This is useful for busy sites or as a way to help mitigate some forms of DoS attack. The example I provide here is meant for Nginx with php-fpm but you can apply this same sort of thinking anywhere else where you have enough control over the configuration to make it work.

Assuming you followed the previous post and have created an exception for your author URL in your CDN, it is now on your server to render author pages each time a request is made. This is a waste of resources and doesn’t provide an ideal experience for end users. To enable caching on this endpoint, we will leverage Nginx’s built-in caching capability while setting the cache key based on the Accept header.

To start, let’s set up basic Nginx caching. At the top of your configuration file, outside of the server{} block (advanced users can adjust as desired), add the following:

fastcgi_cache_path /etc/nginx/cache levels=1:2 keys_zone=wordpress:100m inactive=10m max_size=100m;

You can adjust the path if you want, but in essence we are defining a cache at /etc/nginx/cache named wordpress, keeping at most 100MB of data, and deleting anything that hasn’t been accessed in 10 minutes. /etc/nginx/cache must exist and must be owned by the same user that runs Nginx. If you have multiple servers, know that this cache is not shared between them, so each server will have its own cache.

Next, add a map to define what Accept headers we want to vary on:

map $http_accept $vary_key {
  default "default";
  "~application/activity\+json" "json";
}

This block creates a new variable we can use later called $vary_key. Notice that we only create a different cache entry when application/activity+json is included in the Accept header.

Now, inside the server{} block for your site, let’s add a header we can use to confirm our caching is working properly. Adding add_header X-Nginx-Cache $upstream_cache_status; to this section will cause Nginx to output a header showing the cache status. It will be BYPASS, MISS or HIT in the response headers.

Next, inside the location block that is handling PHP requests, add the following config options:

# cache key
fastcgi_cache_key "$vary_key$host$request_method$request_uri";

# matches keys_zone in fastcgi_cache_path
fastcgi_cache wordpress;

# don't serve from or save to the cache when $no_cache is set (rules defined below)
fastcgi_cache_bypass $no_cache;
fastcgi_no_cache $no_cache;

# defines the default cache time
fastcgi_cache_valid any 10m;

# misc additional settings
fastcgi_cache_use_stale updating error timeout invalid_header http_500;
fastcgi_cache_lock on;
fastcgi_cache_lock_timeout 10s;

The next settings depend on what you want to do. If you are using a CDN and are only exposing author pages then you can use the following settings:

# Cache nothing by default
set $no_cache 1;

# Only cache author pages
if ($request_uri ~* "/author/") {
  set $no_cache 0;
}

If instead, you want to cache everything your CDN might miss then you can use this (this is what I use):

# Cache everything by default
set $no_cache 0;

# Don't cache logged in users or commenters
if ( $http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_" ) {
  set $no_cache 1;
}

# Don't cache the following URLs
if ($request_uri ~* "/(wp-admin/|wp-login.php)") {
  set $no_cache 1;
}

If done correctly, hitting an author page will return different content depending on the Accept header being used. To verify, take an author page and load it up in a browser. You should get a proper HTML page. Copy the URL out and, using curl, send the following:

curl -I https://dustinrue.com/author/ruedu/ -H "Accept: application/activity+json"

x-nginx-cache: MISS

Your Nginx cache status may already be HIT if someone recently searched for you. It should be HIT if you send the request again.

Debugging

It is important that you debug and properly resolve issues with this endpoint. Failing to do so will result in failed searches for your author/user from ActivityPub clients. To be clear, the following two requests must return different content:

curl -I https://dustinrue.com/author/ruedu
curl -I https://dustinrue.com/author/ruedu -H "Accept: application/activity+json"

Adjust these URLs for your author page and ensure the first curl returns HTML content while the second one returns JSON content. While writing this post I noticed that the plugin still outputs a Content-Type of text/html when it should say application/activity+json. Despite this inconsistency, clients will use the returned content.

If the curl calls are returning different content, next pay attention to the x-nginx-cache header to ensure that it is actually caching. You can add another utility header to your Nginx config to assist with this:

add_header x-accept $vary_key always;

This add_header will output which value the map landed on so you can ensure things are being picked up properly.

Conclusion

I hope this is enough to help guide you in improving your WordPress + ActivityPub experience.

After more than ten years running under the blog.dustinrue.com subdomain I have moved my site to sit right on dustinrue.com instead. I am making this change simply because this really is the only site I have and there is no need to have it on a subdomain. Redirects have been set up to ensure no links are broken.

In addition to the URL change, I have also removed ads and, in fact, any tracking on the site at all. This means the cookie consent banner is also gone (a personal pet peeve of mine). For what it is worth, the ads and analytics were handled by the Google Site Kit plugin, an excellent plugin put out by my employer.