TL;DR: The fix for this is to ensure your CDN differentiates requests for “application/activity+json” in the Accept header from everything else. In other words, you need to Vary on Accept, but it’s best to limit the variance to “application/activity+json” if you can.

With the release of the ActivityPub 1.0.0 plugin for WordPress I hope we’ll see a surge in the number of WordPress sites that can be followed using your favorite ActivityPub based systems like Mastodon and others. However, if you are hosting your WordPress site on Cloudflare (and likely other CDNs) and you have activated full page caching, you are going to have a difficult time integrating your blog with the greater Fediverse. This is because when an ActivityPub user on a service like Mastodon searches for your profile, that search lands on your WordPress author page looking for additional information in JSON format. If someone has visited your author page recently in a browser, there is a chance Mastodon will get HTML back instead, resulting in a broken search. The reverse can happen too: if a Mastodon user has recently performed a search and someone later lands on your author page in a browser, they will see raw JSON instead of the page they expected.

The cause is that Cloudflare doesn’t differentiate between a request looking for HTML and one looking for JSON; this information is not factored into how Cloudflare caches the page. It only sees the author page URL, decides it is the same request either way, and returns whatever it has. The good news is that, with some effort, we can get Cloudflare to consider what type of content the client is looking for while still allowing full page caching. Luckily the ActivityPub plugin has a nice undocumented feature to help work around this situation.

To fix this while keeping page caching you will need to use a Cloudflare Worker to adjust the request if the Accept header contains “application/activity+json”. I assume you already have page caching in place and that you do not have some other plugin on your site that would interfere with it, like Batcache or WP Super Cache. For my site I use Cloudflare’s APO for WordPress and nothing else.

First, ensure that your “Caching Level” configuration is set to standard. Next, get set up for working with Cloudflare Workers; you can follow the official guide at https://developers.cloudflare.com/workers/. Then create a new project, again using their documentation, and replace the index.js file contents with:

export default {
  async fetch(req) {
    const acceptHeader = req.headers.get('accept');
    const url = new URL(req.url);

    // If the client is asking for ActivityPub JSON, add the ActivityPub
    // plugin's query parameter. This also changes the cache key so HTML
    // and JSON responses are cached separately.
    if (acceptHeader?.indexOf("application/activity+json") > -1) {
      url.searchParams.append("activitypub", "true");
    }

    return fetch(url.toString(), {
      cf: {
        // Always cache this fetch regardless of content type
        // for a max of 5 minutes before revalidating the resource
        cacheTtl: 300,
        cacheEverything: true,
      },
    });
  }
}

You can now publish this using wrangler publish. You can adjust the cacheTtl to something longer or shorter to suit your needs.

The last step is to associate the worker with the /author route of your WordPress site. For my setup I created a worker route of “*dustinrue/author*” and that was it. My site will now cache and return the correct content based on whether or not the Accept header contains “application/activity+json”.
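To verify the behavior you can request an author page twice with different Accept headers (the URL below is a placeholder); the first request should come back as HTML and the second as ActivityPub JSON:

# Browser-style request, expect HTML
curl -s -H "Accept: text/html" https://example.com/author/someuser/ | head -c 200

# ActivityPub-style request, expect application/activity+json
curl -s -H "Accept: application/activity+json" https://example.com/author/someuser/ | head -c 200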

Remember that Cloudflare Workers do cost money, though I suspect a lot of small sites will easily fit into the free tier.

When you create a k3s cluster using colima it will default to using Docker for the runtime. This means that any Docker image you build or pull locally will be available to k3s, which greatly simplifies testing locally built images referenced by Helm charts or Kustomize (or whatever you are using).

This is not a feature unique to colima, but rather a feature of k3s if it is told to use the Docker runtime. You can read more at https://docs.k3s.io/advanced#using-docker-as-the-container-runtime.
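A minimal sketch of the workflow (the image and pod names are hypothetical): build with Docker, then reference the image directly without pushing it anywhere, making sure the pull policy doesn’t force a remote pull.

# Build the image locally; k3s configured for the Docker runtime sees it immediately
docker build -t myapp:dev .

# Run it directly, no registry involved
kubectl run myapp --image=myapp:dev --image-pull-policy=Never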

If you have an older Kubernetes cluster with older Helm based software installs and you aren’t paying attention, it can be easy to leave some resources in a state where they are impossible to update or remove. This is because of APIs that have been deprecated and then removed. While facing this issue today, I found that this Helm plugin exists which can help resolve the problem – https://github.com/helm/helm-mapkubeapis
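A quick sketch of how the plugin is used (the release and namespace names are placeholders; check the project README for the exact flags):

helm plugin install https://github.com/helm/helm-mapkubeapis

# Preview the changes, then rewrite the release's stored manifests to supported API versions
helm mapkubeapis my-release --namespace my-namespace --dry-run
helm mapkubeapis my-release --namespace my-namespace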

Pushed some updates to my Proxmox Packer project at https://github.com/dustinrue/proxmox-packer. The updates include changes to support Packer 1.9.2 and the most recent Proxmox plugin. The Makefile has been updated to run the necessary init command to get the Proxmox plugin installed as well.

More importantly, the changes fix an issue that prevented the provisioner from running, leading to broken or missing cloud-init support in the resulting templates.
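For reference, the init step the Makefile now runs boils down to Packer’s plugin installation workflow; a rough sketch, run from the project directory (this assumes the templates declare the Proxmox plugin in a required_plugins block):

# Install the plugins declared by the templates, then build as usual
packer init .
packer build .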

In this post I’m going to more or less drop some notes about how I went about debugging a WireGuard VPN issue I was having. I have a WireGuard based VPN running on Rocky Linux 9, which is basically a default minimal installation with WireGuard installed. The system ships with firewalld and nftables, which will be important later. My WireGuard installation relies on firewalld and nftables to work properly, and I have a number of PostUp and PostDown commands that insert rules so that VPN clients are NAT’d properly. As an older Linux user, I am very comfortable with iptables but significantly less so with firewalld and especially nftables.

My adventure began after a system update that prevented data from passing through the connection. Due to how WireGuard works, it appears the connection is made but no data flows. After confirming that my IP address had not changed recently and that I was indeed reaching the system at all, I still couldn’t get traffic to pass through the VPN.

The first thing I set out to do was verify connectivity with the service. Starting with tcpdump -i any port 51820, I was surprised that I wasn’t seeing any traffic to the service. I was surprised because there were no rules present in iptables to suggest the port was blocked, yet I was also getting messages stating the port was administratively closed. In an effort to confirm this I wanted to see if I could get WireGuard to log what it was doing. As it turns out, despite running in the kernel as a module, there is a way to make WireGuard output logs. The following will turn on the module’s debug mode:

echo 'module wireguard +p' | sudo tee /sys/kernel/debug/dynamic_debug/control
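Once enabled, the debug messages show up in the kernel log; you can follow them there and turn the extra logging back off when you are done:

# Follow kernel messages from the wireguard module
sudo dmesg -wT | grep -i wireguard

# Disable the debug output again
echo 'module wireguard -p' | sudo tee /sys/kernel/debug/dynamic_debug/control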

Unfortunately I cannot share these logs, but they will provide information about peers coming and going and efforts to maintain the connection. After enabling debug I found…no entries. Very unusual. My next step was to remove firewalld, and by extension nftables. Once removed, there were no firewall rules on the system at all and it was wide open. At this point I was able to see debug messages from WireGuard showing that new peers were connecting. As expected, the VPN still didn’t work because the required NAT rules were missing, but this did finally confirm that the problem was with firewalld.

It then hit me that firewalld works differently than I was thinking: it has a concept of services, and these services won’t appear in an iptables listing. I had a feeling I knew what I had to do. I reinstalled firewalld, enabled and started it, and looked at the services it was set to allow:

firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18
  sources:
  services: cockpit dhcpv6-client ssh
  ports:
  protocols:
  forward: yes
  masquerade: yes
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

Immediately it became obvious that the wireguard service was not included in the list. Simply running firewall-cmd --zone=public --add-service=wireguard --permanent added the service to my public zone and from there my VPN started working again.
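One thing worth noting: --permanent on its own only updates the saved configuration, so depending on your setup you may also need to apply the change to the running firewall, either by repeating the command without --permanent or by reloading:

sudo firewall-cmd --zone=public --add-service=wireguard --permanent
sudo firewall-cmd --reload

# Confirm the service is now allowed
sudo firewall-cmd --zone=public --list-services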

I have provided my thoughts in other places (Mastodon) about how I dislike firewalld and nftables. I find them much more tedious than iptables and haven’t ever really taken the time to learn them. As distributions change and mature over time, a lot of the default settings I took for granted, like a system’s networking being otherwise wide open after an install, are no longer true. At this point I am being forced to learn these tools, which is actually a good thing.

Long title but a relatively quick TIL. In a Helm chart, when specifying a storageClass in your templates, it is important that, if the user does not set a storageClass, you do not output the storageClassName field in your persistent volume template at all. That is to say that this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi

Is not equivalent to this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

When installed the first time, both will result in Kubernetes selecting your default storage class. When this happens, your resource is actually updated in the API to reflect the selected storage class. When you apply the same file again during an upgrade, an empty storageClassName will be seen as a change from the default storage class to “”, which is not what you want. Instead, you should not send storageClassName at all if the user did not specify one.
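One way to handle this in a chart template is to emit the field only when the user actually set a value (the .Values.storageClass name here is hypothetical; adapt it to your chart’s values layout):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  {{- if .Values.storageClass }}
  storageClassName: {{ .Values.storageClass | quote }}
  {{- end }}
  resources:
    requests:
      storage: 5Gi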

TIL that you can tell systemd to run any ExecStartPre and ExecStopPost scripts as root instead of the user the service is supposed to run under.

At the same time we can touch on how to create an override for a service. In my case, I wanted to override how Redis is started on a system to ensure transparent hugepages were set correctly, per their documentation. Creating an override for a service is super simple:

systemctl edit <service>

In my case this means:

systemctl edit redis

This works since I know the service I want to edit is named redis. From here, I am presented with my favorite editor where I can input the following:

[Service]
PermissionsStartOnly=true
ExecStartPre=/usr/local/sbin/hugepage.sh

Here I am defining two things. First, I want the service’s user and group permissions to apply only to the ExecStart command, not the other Exec commands. Second, I am specifying an ExecStartPre that calls a script. This script simply runs:

/bin/echo never > /sys/kernel/mm/transparent_hugepage/enabled

Once complete, save the file and restart your service. Your changes will now take effect.
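For example, after saving the override a restart picks up the change and you can confirm the setting took:

sudo systemctl restart redis

# The active value is shown in brackets, e.g. "always madvise [never]"
cat /sys/kernel/mm/transparent_hugepage/enabled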

If you are running a newer release of systemd (231+) then you can use the following format as well:

ExecStartPre=+/usr/local/sbin/hugepage.sh

Yesterday I was reminded that when a URL is shared on Mastodon, every instance that has a user following you will make a request to your site at least once in an effort to get some additional embed information. If your site is WordPress based, like this one, then you will likely see two requests. The first request will be for the URL that was added to the post, while the second follows the oEmbed information WordPress exposes in order to get some additional metadata. Since Mastodon is a federated system, every Mastodon server or instance needs to gather this data in order for it to be displayed to its users.

If you are a user with a lot of followers then posting a link to your blog or site will likely result in a mini DDoS as hundreds of Mastodon instances request this information from your server. If you have not taken precautions this can potentially take down your site as it is overloaded with requests! Years ago this would have been referred to as being “slashdotted” (links on https://slashdot.org) or “fireballed” (links on https://daringfireball.net).

Fortunately you can deal with this situation very effectively on your own or by working with your hosting provider. In this post, I am going to describe how I handle it using Cloudflare, the CDN provider I have chosen to put my site behind. I am not going into full detail on how to implement every option, and I am not selling Cloudflare or associated with them beyond being a customer. What I share here will be applicable to any CDN, or will at least serve as inspiration for how to handle it in your configuration.

As I said previously, this site is using WordPress and is behind Cloudflare. To make it easy on myself I have also purchased their Automatic Platform Optimization for WordPress feature. I got into this option initially because I wanted to understand it better but have since kept it because it works well. The biggest feature of APO for WordPress is that it enables full page caching for your site, which is a must if you want the best possible experience for users globally. Using APO is absolutely not necessary; you can use Nginx micro caching or any other caching solution instead. The key is to have full page caching so that repeated requests to your site do not incur actual processing time by WordPress.

APO will, out of the box, cache full pages of your site, but what it will not protect is the metadata URL used to provide additional information for embeds. To prevent Mastodon servers from crushing your site with embed metadata requests, there is one additional endpoint you need to force to be cached. Here is how I forced Cloudflare to cache the correct URL for me.

Log into Cloudflare and click on the domain for your site. Find the caching section of the menu and click on Cache Rules. Add a new rule and define what is shown in the screenshot below.

Screenshot showing a Cloudflare configuration screen for caching an oEmbed request from Mastodon, or any system that would make one. Add a name for your rule and set the Field to “URI Path” with the operator “contains” and the value /wp-json/oembed.

From here, tell Cloudflare what to do with this match:

Screenshot showing another Cloudflare configuration screen. Here you should set the Cache status to "Eligible for cache" and "Override origin" set to 2 hours. 2 hours is the minimum option on a free plan

Note that 2 hours is the lowest cache time I can specify on an otherwise free Cloudflare plan so that is what I set it to. With these options filled out you can click save and you are done. Anything looking for this URL will now either get a cached copy of the response or will cause the content to be cached for future requests.

Of course, you don’t need to use Cloudflare to make this work. Savvy users can also translate these rules into Nginx or Apache configuration to perform the same trick. The goal is to ensure your WordPress site is better able to handle the load when you have shared a link on Mastodon, and there are many ways to get there. Using Cloudflare is one option that has worked well for me. I encourage everyone that hosts a blog, either self-hosted or through some managed provider, to ensure that page caching is enabled and that the oEmbed URL for WordPress is cached.
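As a rough sketch of the Nginx flavor of this (the paths and backend address are assumptions; proxy_cache_path belongs in the http block while the location goes in your server block):

proxy_cache_path /var/cache/nginx/oembed levels=1:2 keys_zone=oembed:10m inactive=2h;

location /wp-json/oembed/ {
    proxy_cache       oembed;
    proxy_cache_valid 200 2h;
    proxy_ignore_headers Cache-Control Expires;
    # Assumes WordPress is reachable on a local backend; use fastcgi_cache
    # instead if Nginx talks to PHP-FPM directly.
    proxy_pass        http://127.0.0.1:8080;
}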

If you run a private instance of Mastodon it can feel mighty lonely sometimes. This is due to an inherent design characteristic of Mastodon and federated services in general: how does one instance get information from another instance? Typically, if you have a user on your instance who follows someone else, then that information will be added to your instance, along with any hashtags they use and so on. If you run a small server, then obviously there are far fewer ways for information to flow to your instance.

Solving this problem is a matter of ensuring more data flows into your instance so that it can see more content and, most importantly as of version 4.0 of Mastodon, additional hashtags. The easiest way of doing this (aside from running a large instance) is to use relays.

Relays, in essence, take in feeds from a number of instances and pass them on to other instances attached to the relay. Finding a relay to add to your instance is as easy as going to https://relaylist.com and picking one or more to add. Relay servers usually support both Pleroma and Mastodon, but remember that the URL you add for each is different. Be sure to add the right one.

Screenshot showing relay URLs

You can subscribe to a relay by visiting /admin/relays in the admin dashboard of your instance and clicking the “Add New Relay” button. Simply pick a relay server from the list and add it using the correct URL; for Mastodon, the URL will end with /inbox in almost every case. After a short while you will begin to see your Federated timeline populated with posts from other instances. Your instance will now see a lot of new content, including hashtags that you can follow.

Keep in mind that bringing in this content does have a cost associated with it. A Mastodon instance will store everything it sees locally, including post content and media like images and video. It is a good idea to set retention limits on your system so that you are not storing everything ever seen forever. On my system I set my retention limits to 14 days for media and content cache and 7 days for user archives. You will find your instance’s content retention policy at /admin/settings/content_retention.
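The retention settings take care of this automatically, but for reference the same cleanup can be run by hand with tootctl; a sketch, run from the Mastodon directory (or exec’d into a running container):

RAILS_ENV=production bin/tootctl media remove --days 14
RAILS_ENV=production bin/tootctl preview_cards remove --days 14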

In addition to using more disk space (and bandwidth transferring all that media) you will also incur more processing time on your instance. The amount of space and processing you need depends heavily on which, and how many, relay services you add to your instance. For my instance I struck a balance between getting enough data flowing so that hashtags were interesting but not so much that I increased my costs unnecessarily. You can track this information in both your admin dashboard at /admin/dashboard (scroll to the bottom) and your chosen object storage provider (you did set one up, right?).

If you are running a private instance and feel a bit left out I hope this helps you get the activity you are looking for. Also remember to follow a lot of people and boost content you like instead of just liking it. This will lead to more followers for yourself, more interactions and a more interesting timeline!

Since about mid December 2022 I have been running my own private instance of Mastodon. I thought I would detail how I did it and what it has cost me so far.

When I first learned about Mastodon I was excited to understand it better, particularly how it is hosted and scaled. I decided right away that the best way to better my understanding was to host it myself, and to do so on my favorite platform, Kubernetes. I started by creating my own helm chart (https://github.com/dustinrue/mastodon-helm-chart) and installed the core software in my home lab, which runs k3s. The chart I created is based on the official helm chart (https://github.com/mastodon/chart). I created my own because I, again, wanted to learn about the moving pieces of a Mastodon installation, but also because I was unhappy with the official chart integrating Redis and PostgreSQL as dependencies. In addition, it doesn’t break out the Sidekiq processes in a way that makes sense…but more on that later.

Before we get too deep into what I did, we should first discuss some of the major components of a Mastodon instance or server. Mastodon is a collection of services working together to form a full solution, which includes:

  • A web service which provides the user interface but is also the sort of API server for all things Mastodon. In a full production setup it is important that this be highly available.
  • A streaming service which feeds data to the web frontend as it arrives and is processed. This is important but doesn’t seem to be critical. In other words, you can survive a bit of downtime here; you’ll just have a less than great experience.
  • A number of Sidekiq queues. There are numerous Sidekiq queues which are the heart of how data moves in a Mastodon instance. These queues, as of this writing, include a scheduler, ingress, mailer, push, pull and default. Each queue has a specific purpose and each queue is again not absolutely critical to the availability of your Mastodon instance. This means that you can easily take down each queue temporarily to deal with some issue. While a queue is down know that nothing that queue is responsible for will be processed. The special scheduler queue, if not running, will likely prevent most other queues from doing anything at all.
  • Redis is the glue that keeps data flowing between processes. It is also a critical piece to keep running, though losing the data within it, while not ideal, is ok. Keeping it running is critical because all of the other Mastodon processes expect it to be available and will fail to start without it. In a full production setup I recommend ensuring it is running in a highly available fashion.
  • PostgreSQL is the last required piece of software when running Mastodon. Like Redis, it is what I would consider to be critical to your setup. If running a full production setup you will want to cluster it to maintain availability first, with performance a secondary consideration.
  • You need some system for dealing with email. Mastodon needs to send email for account confirmation and some administrative or moderation work. For my system I am using Send In Blue (https://www.sendinblue.com) which has a free tier.

Mastodon also supports other, optional services which you can read about at https://docs.joinmastodon.org/admin/optional/.

As you can likely see, running Mastodon is not simple, yet it isn’t overwhelming either. I believe running Mastodon can be done inexpensively, especially a private instance, but to run it in production correctly there is definitely a base cost you need to consider so that you can remove as many failure points as possible. In addition, there are many other pieces you will likely want if running a large installation, like a way to monitor metrics, keep track of Sidekiq queue depth and processing times, and more.

Having spent some time on Mastodon during the great Twitter migration, I witnessed the struggles of a number of instance admins as their instances strained to meet the demands of new users and of users who had created accounts before but were suddenly active. I saw a few notable patterns emerge that contributed to their scaling woes, including:

  • Not using a CDN or object storage system initially
  • Not installing pgbouncer in front of PostgreSQL
  • Not installing Sidekiq into separate processes running each queue

There are some really excellent guides and references on how to scale Mastodon (https://hazelweakly.me/blog/scaling-mastodon/ to name but one) but many of the recommendations will require you to do, or have done, one if not all of the above mentioned steps. Each of these items is disruptive enough that you probably do not want to be handling them while in a panic trying to get your instance running again. If you are running, or plan to run, a public instance where you allow anyone to sign up then I highly recommend getting at least those three items out of the way from day one. Doing so will help ensure that scaling up from there is much, much easier, as most items then become adding additional servers to run more Sidekiq processes or tuning parameters.

When I created my helm chart, I took these lessons and applied them as conscious decisions in the design of the chart. Though not at all necessary for a small or single user instance, my chart breaks out all of the current Sidekiq queues into separate processes. This layout ensures the hard work of separating the processes out is done and the rest is a matter of scaling and tuning.
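Outside of a chart, the same idea looks like running one Sidekiq process per queue rather than a single process handling everything; a rough sketch of the commands (concurrency values are arbitrary examples):

bundle exec sidekiq -c 25 -q default
bundle exec sidekiq -c 25 -q ingress
bundle exec sidekiq -c 25 -q push -q pull
bundle exec sidekiq -c 5 -q mailer
# Only ever run a single scheduler process
bundle exec sidekiq -c 5 -q scheduler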

As of this writing, my helm chart also installs a weekly cronjob to clean up media files and, optionally, a cronjob for backing up the database to some shared storage in your Kubernetes cluster. Though it is ultimately incomplete, I feel the helm chart is a good start.

As for actually running Mastodon for myself, I created a subdomain for my instance to live at. I then installed Mastodon, using my helm chart, into my k3s cluster. Ignoring the cost of my ISP and the computers I already have, running Mastodon is quite inexpensive. My home lab provides everything I need to make Mastodon work, including persistent storage using TrueNAS. For media storage, I created a Cloudflare R2 bucket and a URL for public access. Mastodon is configured to send media content to R2, which is then served from the CDN URL. This keeps all of the heavy storage separate from the rest of the system. My last bill for R2 was just $0.06, which was for the approximately 20GB of content I have stored there. I do expect my next bill to be more because the average amount of data stored in R2 will be higher.

Since my installation is just a private one, I installed PostgreSQL and Redis as single instances within my k3s cluster. Both are extremely basic Bitnami based installs using their available helm charts. PostgreSQL is backed by persistent storage provided by TrueNAS. For email, my k3s cluster runs an installation of Postfix. Postfix is configured to send email through Send In Blue, and services that I run in my cluster are configured to talk to Postfix. This allows me to have a single mail relay whose configuration I need to maintain.

Ingress is provided by Cloudflare and cloudflared tunnels. A tunnel is configured on a separate VM I have running and then configured on the Cloudflare side to route traffic for the correct hostname to the Kube cluster.
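Whether the routing lives in the Cloudflare dashboard or in a local config file, the gist is an ingress rule mapping the public hostname to the in-cluster service; a minimal locally managed config.yml sketch (the tunnel ID, hostname and address are placeholders):

tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json

ingress:
  - hostname: social.example.com
    service: http://10.0.0.10:80
  - service: http_status:404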

All said, this setup has proven reliable for me since mid December. In a future post I’ll discuss how I got my private instance to feel a bit more included in the Fediverse by adding relays. Please leave a comment if you feel I missed something or got something wrong.