Pushed some updates to my Proxmox Packer project at https://github.com/dustinrue/proxmox-packer. The updates add support for Packer 1.9.2 and the most recent Proxmox plugin. The Makefile has also been updated to run the init command needed to install the Proxmox plugin.

More importantly, the changes fix an issue that prevented the provisioner from running, which led to broken or missing cloud-init support in the resulting templates.

In this post I’m going to more or less drop some notes about how I went about debugging a WireGuard VPN issue I was having. I have a WireGuard based VPN running on Rocky Linux 9, which is basically a default minimal installation with WireGuard installed. The system ships with firewalld and nftables, which will be important later. My WireGuard installation relies on firewalld and nftables to work properly, and I have a number of PostUp and PostDown commands that insert rules so that VPN clients are NAT’d properly. As an older Linux user, I am very comfortable with iptables but significantly less so with firewalld and especially nftables.

My adventure began after a system update, after which no data would pass through the connection. Due to how WireGuard works, the connection appears to be made even though no data flows. Even after confirming that my IP address had not changed recently and that I was indeed reaching the system at all, I still couldn’t get traffic to pass through the VPN.

The first thing I set out to do was verify connectivity with the service. Starting with tcpdump -i any port 51820 I was surprised that I wasn’t seeing any traffic to the service. I was surprised by this because there were no rules present in iptables to suggest the port was blocked and yet I would also get messages stating the port was administratively closed. In an effort to confirm this I wanted to see if I could get WireGuard to log what it was doing. As it turns out, despite being in the kernel as a module, there is a way to make it output logs. The following will set the module’s debug mode to on:

echo 'module wireguard +p' | sudo tee /sys/kernel/debug/dynamic_debug/control

Unfortunately I cannot share these logs, but they provide information about peers coming and going and efforts to maintain the connection. After enabling debug I found…no entries. Very unusual. My next step was to remove firewalld and, by extension, nftables. Once removed, there were no firewall rules on the system at all and it was wide open. At this point I was able to see debug messages from WireGuard showing that new peers were connecting. As expected, the VPN still didn’t work because the required NAT rules were missing, but this did finally confirm that the problem was with firewalld.
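A rough sketch of toggling that debug switch, since it is easy to forget to turn it back off. The wg_debug function and the DYNDBG_CONTROL variable are names made up for this example; the real pieces are the control file path from the command above and the +p and -p flags, where -p disables the messages again. It has to run as root, and the messages end up in the kernel log.

```shell
# Kernel dynamic debug control file. The variable exists only so the path
# can be overridden when trying the function out without root.
DYNDBG_CONTROL="${DYNDBG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}"

# wg_debug on|off: enable or disable WireGuard's debug messages.
wg_debug() {
  case "$1" in
    on)  flag='+p' ;;
    off) flag='-p' ;;
    *)   echo "usage: wg_debug on|off" >&2; return 2 ;;
  esac
  # Must be run as root; the kernel picks the change up immediately.
  echo "module wireguard $flag" > "$DYNDBG_CONTROL"
}

# Typical use (as root), watching the kernel log while a client connects:
#   wg_debug on
#   dmesg --follow | grep -i wireguard
#   wg_debug off
```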

At this point it hit me that firewalld works differently than I was thinking and it has a concept of services. These services won’t appear in an iptables listing. At this point I had a feeling I knew what I had to do. I reinstalled firewalld, enabled and started it and looked at the services it was set to allow:

firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18
  sources:
  services: cockpit dhcpv6-client ssh
  ports:
  protocols:
  forward: yes
  masquerade: yes
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

Immediately it became obvious that wireguard was not included in the list. Simply running firewall-cmd --zone=public --add-service=wireguard --permanent added the service to my public zone and from here my VPN started working again.
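One detail worth keeping in mind: --permanent only changes the saved configuration, so the running firewall does not pick the service up until firewalld is reloaded. The sketch below wraps the check and the fix together. The has_service and ensure_service helpers are hypothetical names for this example; the firewall-cmd options are the same ones used above, plus --reload.

```shell
# has_service NAME: reads `firewall-cmd --list-all` output on stdin and
# succeeds if NAME appears on the "services:" line.
has_service() {
  awk -v svc="$1" '
    $1 == "services:" { for (i = 2; i <= NF; i++) if ($i == svc) found = 1 }
    END { exit !found }
  '
}

# ensure_service ZONE NAME: add NAME to ZONE only if it is missing.
ensure_service() {
  zone="$1"; svc="$2"
  if ! firewall-cmd --zone="$zone" --list-all | has_service "$svc"; then
    # --permanent writes the saved configuration; --reload applies it.
    sudo firewall-cmd --zone="$zone" --add-service="$svc" --permanent
    sudo firewall-cmd --reload
  fi
}

# Usage, matching the zone and service from this post:
#   ensure_service public wireguard
```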

I have provided my thoughts in other places (Mastodon) about how I dislike firewalld and nftables. I find them much more tedious than iptables and haven’t ever really taken the time to learn them. As distributions change and mature over time, a lot of the default settings I took for granted, like a system’s networking being otherwise wide open after an install, are no longer true. At this point I am being forced to learn these tools, which is actually a good thing.

Long title but a relatively quick TIL. In a Helm chart, when specifying a storageClass in your templates, it is important that if the user does not set a storageClass, you do not output storageClassName in your template for a persistent volume claim. That is to say that this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi

Is not equivalent to this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

When installed the first time, both will result in Kubernetes selecting your default storage class. When this happens, your resource is actually updated to reflect the selected storageClass in the API. When you apply the same file again during an upgrade, an empty storageClassName will be seen as a change from the default storage to “”, which is not what you want. Instead, you should not send storageClassName at all if the user did not specify one.
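In a chart template, that usually means wrapping the field in a conditional. Here is a minimal sketch, written as a shell snippet that drops a template file into a chart directory. The values key persistence.storageClass and the file name templates/pvc.yaml are assumptions for illustration, not something taken from a particular chart.

```shell
# Write a hypothetical chart template. The quoted 'EOF' keeps the shell
# from touching the {{ ... }} template syntax.
mkdir -p templates
cat > templates/pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  {{- if .Values.persistence.storageClass }}
  storageClassName: {{ .Values.persistence.storageClass | quote }}
  {{- end }}
  resources:
    requests:
      storage: 5Gi
EOF
```

Inside a real chart, rendering with helm template . should leave storageClassName out entirely, while adding --set persistence.storageClass=fast (where "fast" is a made-up class name) should include it.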

TIL that you can tell systemd to run any ExecStartPre and ExecStopPost scripts as root instead of the user the service is supposed to run under.

At the same time we can touch on how to create an override for a service. In my case, I wanted to override how Redis is started on a system to ensure hugepage was set correctly, per their documentation. Creating an override for a service is super simple:

systemctl edit <service>

In my case this means:

systemctl edit redis

I know that the name of the service I want to edit is redis. From here, I am presented with my favorite editor where I can input the following:

[Service]
PermissionsStartOnly=true
ExecStartPre=/usr/local/sbin/hugepage.sh

Here I am defining two things. First, I want the service’s user and permission settings to apply only to the ExecStart command, not the others, so the pre and post commands run as root. Next, I am specifying an ExecStartPre that calls a script. This script contains just:

/bin/echo never > /sys/kernel/mm/transparent_hugepage/enabled

Once complete, save the file and restart your service. Your changes will now take effect.

If you are running a newer release of systemd (231+) then you can use the following format as well:

ExecStartPre=+/usr/local/sbin/hugepage.sh
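For reference, a slightly more defensive sketch of what /usr/local/sbin/hugepage.sh could look like. The disable_thp function and the THP_ENABLED variable are additions for this example, as is the choice to warn and carry on (rather than fail) when the file cannot be written, so that a missing setting never blocks Redis from starting.

```shell
#!/bin/sh
# Sketch of hugepage.sh: disable transparent hugepages before Redis starts.
# THP_ENABLED is overridable only to make the function easy to try out.
THP_ENABLED="${THP_ENABLED:-/sys/kernel/mm/transparent_hugepage/enabled}"

disable_thp() {
  # Warn and carry on if the file is missing or we are not root, so a
  # failed ExecStartPre does not keep the service from starting.
  if [ ! -w "$THP_ENABLED" ]; then
    echo "hugepage.sh: $THP_ENABLED is missing or not writable, skipping" >&2
    return 0
  fi
  echo never > "$THP_ENABLED"
}

disable_thp
```

Remember to make the script executable, otherwise systemd will refuse to run it.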

Yesterday I was reminded that when a URL is shared on Mastodon, every instance that has a user following you will make a request to your site at least once in an effort to get some additional embed information. If your site is WordPress based, like this one, then you will likely see two requests. The first request fetches the URL that was added to the post, while the second follows any embed information WordPress is exposing in order to get some additional metadata. Since Mastodon is a federated system, every Mastodon server or instance will need to gather this data in order for it to be displayed to its users.

If you are a user that has a lot of followers then posting a link to your blog or site will likely result in a mini DDoS as hundreds of Mastodon instances request this information from your server. If you have not taken precautions this can potentially take down your site as it is overloaded with requests! Years ago this would have been referred to as being “slashdotted” (links on https://slashdot.org) or “fireballed” (links on https://daringfireball.net).

Fortunately you can very effectively deal with this situation on your own or by working with your hosting provider. In this post, I am going to describe how I handle the situation using Cloudflare, which is the CDN provider I have chosen to put my site behind. I am not going into full detail on how to implement all options and I am not selling Cloudflare or associated with them beyond being a customer. What I share here will be applicable to any CDN or will at least serve as inspiration for how to handle it in your configuration.

As I said previously, this site is using WordPress and is behind Cloudflare. To make it easy on myself I have also purchased their Automatic Platform Optimization for WordPress feature. I got into this option initially because I wanted to understand it better but have since kept it because it works well. The biggest feature of APO for WordPress is that it enables full page caching for your site. This is a must if you want to get the best possible experience for users globally. Using APO is absolutely not necessary; you can simply use Nginx micro caching or any other caching solution instead. The key here is to have full page caching so that repeated requests to your site do not incur actual processing time by WordPress.

APO will, out of the box, cache full pages of your site but what it will not protect is the meta data URL used to provide additional information for embeds. To prevent Mastodon servers from crushing your site with embed meta data requests, there is one additional endpoint you need to force to be cached. Here is how I forced Cloudflare to cache the correct URL for me.

Log in to Cloudflare and click on the domain for your site. Find the caching section of the menu and click on Cache Rules. Add a new rule and define what is shown in the screenshot.

Screenshot showing a cloudflare configuration screen for caching an oembed request from Mastodon, or any system that would do this. Add a name for your rule, set the Field to URI Path contains the path /wp-json/oembed

From here, tell Cloudflare what to do with this match

Screenshot showing another Cloudflare configuration screen. Here you should set the Cache status to "Eligible for cache" and "Override origin" set to 2 hours. 2 hours is the minimum option on a free plan

Note that 2 hours is the lowest cache time I can specify on an otherwise free Cloudflare plan so that is what I set it to. With these options filled out you can click save and you are done. Anything looking for this URL will now either get a cached copy of the response or will cause the content to be cached for future requests.
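To confirm the rule is doing its job, you can request the oembed endpoint yourself and look at the cf-cache-status response header that Cloudflare adds. Below is a small sketch. The cache_status helper and the example.com URLs are placeholders for this example, while /wp-json/oembed/1.0/embed is the standard WordPress oembed route.

```shell
# cache_status: reads HTTP response headers on stdin and prints the value
# of Cloudflare's cf-cache-status header (HIT, MISS, EXPIRED, ...).
cache_status() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "cf-cache-status" { print $2 }'
}

# Hypothetical post URL; replace both hostnames with your own site.
#   POST_URL="https://example.com/some-post/"
#   curl -s -D - -o /dev/null --get \
#     --data-urlencode "url=$POST_URL" \
#     "https://example.com/wp-json/oembed/1.0/embed" | cache_status
```

Running the request twice is the useful part: the first response may well be a MISS, and a later one should come back as HIT if the rule matched.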

Of course, you don’t need to use Cloudflare to make this work. Savvy users can also translate these rules to Nginx or Apache configuration to perform the same trick. The goal is to ensure your WordPress site is better able to handle the load when you have shared a link on Mastodon, and there are many options. Using Cloudflare is one option that has worked well for me. I encourage everyone that hosts a blog, either self-hosted or through some managed provider, to ensure that page caching is enabled and that the oembed URL for WordPress is cached.

If you run a private instance of Mastodon it can feel mighty lonely sometimes. This is due to an inherent design characteristic of Mastodon and federated services in general…how does one instance get information from another instance? Typically, if you have a user on your instance that follows someone else then that information will be added to your instance, along with any hashtags they use and so on. If you run a small server, then obviously there are far fewer ways for information to flow to your instance.

Solving this problem is a matter of ensuring more data is flowing into your instance so that it can then see more content, and most importantly as of version 4.0 of Mastodon, additional hashtags. The easiest way of doing this (aside from running a large instance) is to use relays.

Relays, in essence, take in feeds from a number of instances and pass them to other instances attached to the relay. Finding a relay to add to your instance is as easy as going to https://relaylist.com and picking one or more to add to your instance. Relay servers usually support both Pleroma and Mastodon, but remember that the URL you add is different for each. Be sure to add the right URL.

Screenshot showing relay URLs

You can subscribe to a relay by visiting /admin/relays in the admin dashboard of your instance and clicking the “Add New Relay” button. Simply pick a relay server from the list and add it using the correct URL. For Mastodon, the URL will end with /inbox in almost every case. After a short while you will begin to see your Federated timeline be populated with posts from other instances. Your instance will now see a lot of new content including hashtags that you can follow.

Keep in mind that bringing in this content does have a cost associated with it. Every instance of Mastodon stores everything it sees locally, including post content and media like images and video. It is a good idea to set retention limits on your system so that you are not storing everything ever seen forever. On my system I set my retention limits to 14 days for media and content cache and 7 days for user archives. You will find your instance’s content retention policy at /admin/settings/content_retention.

In addition to using more disk space (and bandwidth transferring all that media) you will also incur more processing time on your instance. The amount of space and processing you need depends heavily on which or how many relay services you add to your instance. For my instance I struck a balance between getting enough data flowing so that hashtags were interesting but not so much that I increased my costs unnecessarily. You can track this information in both your admin dashboard at /admin/dashboard (scroll to the bottom) as well as your chosen object storage provider (you did set one up right?).

If you are running a private instance and feel a bit left out I hope this helps you get the activity you are looking for. Also remember to follow a lot of people and boost content you like instead of just liking it. This will lead to more followers for yourself, more interactions and a more interesting timeline!

Since about mid December 2022 I have been running my own private instance of Mastodon. I thought I would detail how I did it and what it has cost me so far.

When I first learned about Mastodon I was excited to get to understand it better, particularly how it is hosted and scaled. For Mastodon, I decided right away that the best way to better my understanding was to host it myself and to do so on my favorite platform, Kubernetes. I started by creating my helm chart (https://github.com/dustinrue/mastodon-helm-chart) and installed the core software in my home lab which consists of k3s. The chart I created is based on the official helm chart (https://github.com/mastodon/chart). I created my own because I, again, wanted to learn about the moving pieces of a Mastodon installation but also because I was unhappy with the official chart integrating Redis and PostgreSQL as dependencies. In addition, it doesn’t break out the Sidekiq processes in a way that makes sense…but more on that later.

Before we can get too deep into what I did, we should probably first discuss some of the major components of a Mastodon instance or server. Mastodon is a collection of services working together to form a full solution which includes:

  • A web service which provides the user interface but is also the sort of API server for all things Mastodon. In a full production setup it is important that this be highly available.
  • A streaming service which feeds data to the web frontend as it arrives and is processed. This is also important but doesn’t seem to be critical. In other words, you can survive a bit of downtime here, you’ll just have a less than great experience.
  • A number of Sidekiq queues. There are numerous Sidekiq queues which are the heart of how data moves in a Mastodon instance. These queues, as of this writing, include a scheduler, ingress, mailer, push, pull and default. Each queue has a specific purpose and each queue is again not absolutely critical to the availability of your Mastodon instance. This means that you can easily take down each queue temporarily to deal with some issue. While a queue is down know that nothing that queue is responsible for will be processed. The special scheduler queue, if not running, will likely prevent most other queues from doing anything at all.
  • Redis is a glue that keeps data flowing between processes. It is also a critical piece to keep running though losing data within it, while not ideal, is ok. Keeping it running is critical because all of the other Mastodon processes expect it to be available and will fail to start without it. In a full production setup I recommend ensuring it is running in a highly available fashion.
  • PostgreSQL is the last required piece of software when running Mastodon. Like Redis, it is what I would consider to be critical to your setup. If running a full production setup you will want to cluster it to maintain availability first, with performance a secondary consideration.
  • You need some system for dealing with email. Mastodon needs to send email for account confirmation and some administrative or moderation work. For my system I am using Send In Blue (https://www.sendinblue.com) which has a free tier.

Mastodon also supports other, optional services which you can read about at https://docs.joinmastodon.org/admin/optional/.

As you can likely see, running Mastodon is not simple yet it isn’t overwhelming either. I believe running Mastodon can be done inexpensively, especially a private instance, but to run it in production correctly, there is definitely a base cost you need to consider so that you can remove as many failure points as possible. In addition, there are many other pieces you will likely want if running a large installation, like monitoring metrics and keeping track of Sidekiq queue depth and processing times.

Having spent some time on Mastodon during the great Twitter migration I witnessed the struggles of a number of instance admins as their instances strained to meet the demands of new users and users who had created accounts before but were suddenly active. I saw a few notable patterns emerge that contributed to their scaling woes including:

  • Not using a CDN or object storage system initially
  • Not installing pgbouncer in front of PostgreSQL
  • Not splitting Sidekiq into separate processes, one per queue

There are some really excellent guides and references on how to scale Mastodon (https://hazelweakly.me/blog/scaling-mastodon/ to name but one) but many of the recommendations will require you to do or have done one, if not all, of the above mentioned steps. Each of these items is disruptive enough that you probably do not want to be handling it while in a panic, trying to get your instance running again. If you are running or plan to run a public instance where you allow anyone to sign up then I highly recommend getting at least those three items out of the way from day one. Doing so will help ensure that scaling up from there is much, much easier, as most remaining work then becomes adding servers to run more Sidekiq processes or tuning parameters.

When I created my helm chart, I took these lessons and applied them as conscious decisions in the design of the chart. Though not at all necessary for a small or single user instance, my chart breaks out all of the current Sidekiq queues into separate processes. This layout ensures the hard work of separating the processes out is done and the rest is a matter of scaling and tuning.

As of this writing, my helm chart also installs a weekly cronjob to clean up media files and, optionally, a cronjob for backing up the database to some shared storage in your Kubernetes cluster. Though it is ultimately incomplete, I feel the helm chart is a good start.

As for actually running Mastodon for myself I created a subdomain for my instance to live at. I then installed Mastodon, using my helm chart, into my k3s cluster. Ignoring the cost of my ISP and the computers I have, running Mastodon is quite minimal. My home lab provides everything I need to make Mastodon work including persistent storage using TrueNAS. For media storage, I created a Cloudflare R2 bucket and URL for public access. Mastodon is configured to send media content to R2 which is then served from the CDN URL. This keeps all of the heavy storage separate from the rest of the system. My last bill for R2 was just $0.06 which was for the approximately 20GB of content I have stored there. I do expect my next bill to be more because the average amount of data stored in R2 will be higher.

Since my installation is just a private one, I installed PostgreSQL and Redis as single instances within my k3s cluster. Both are extremely basic Bitnami based installations, deployed using their available helm charts. PostgreSQL is backed by persistent storage provided by TrueNAS. For email, my k3s cluster runs an installation of Postfix. Postfix is configured to send email through Send In Blue and services that I run in my cluster are configured to talk to Postfix. This gives me a single mail relay whose configuration I need to maintain.

Ingress is provided by Cloudflare and cloudflared tunnels. A tunnel runs on a different VM I have, and the Cloudflare side is configured to route traffic to the Kube cluster with the correct hostname included.

All said, this setup has proven reliable for me since mid December. In a future post I’ll discuss how I got my private instance to feel a bit more included in the Fediverse by adding relays. Please leave a comment if you feel I missed something or got something wrong.

Quick tip on a rather specific situation I found myself in though I believe it could come up for a lot of people using WordPress trying to integrate with ActivityPub networks. If you are:

  • Running WordPress
  • Using a page caching solution like Cloudflare APO or manually configured
  • Running an ActivityPub plugin and/or webfinger

Then you will likely run into an issue with your site not being reliably discoverable when searched for. Using Matthias Pfefferle’s ActivityPub, Webfinger and Nodeinfo plugins to get your WordPress site exposed as an ActivityPub server will add a few routes to your site. One of the routes is the author pages of WordPress which exist at /author/<author username>. However, this path will return HTML when hit with a browser. ActivityPub instances, on the other hand, will be looking for a different content type called application/activity+json. Unfortunately, many caching layers will not provide a Vary on Accept, which you need in order to return different data depending on what type of content the requester is looking for.

To resolve this on my site, which uses Cloudflare for CDN, I added a page rule that disallows caching for my author page. This works because I am the only author on the site. A full “proper” solution would be to set a Vary on the Accept header for that path, which Cloudflare does not support.

You may want to be very specific about which Vary headers are used, on which paths, and which values you actually accept for them. Allowing a wide or unlimited range of values makes it easy for people to bypass the cache at the CDN and send requests straight to your origin servers.
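A quick way to see whether a cache is getting in the way of the author route is to request it with different Accept headers and compare the content type that comes back. This is only a sketch: both helper functions and the example URL are hypothetical, and it relies on curl’s %{content_type} write-out variable.

```shell
# content_type_for URL ACCEPT: print the Content-Type the server returns
# when URL is requested with the given Accept header.
content_type_for() {
  curl -s -o /dev/null -H "Accept: $2" -w '%{content_type}\n' "$1"
}

# is_activity_json TYPE: succeeds if TYPE is application/activity+json,
# with or without parameters such as "; charset=utf-8".
is_activity_json() {
  case "$1" in
    application/activity+json|"application/activity+json;"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage with a made-up author URL:
#   AUTHOR_URL="https://example.com/author/someone"
#   is_activity_json "$(content_type_for "$AUTHOR_URL" 'application/activity+json')" \
#     && echo "ActivityPub response OK" \
#     || echo "got something else (cached HTML?)"
```

If the ActivityPub request keeps coming back as text/html, the caching layer is most likely serving the browser version to everyone.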

One of the requirements when setting up a Mastodon instance is that you are able to send outgoing email. If you are running a personal instance you can easily get away with running something like Mailhog, which will simply capture all emails being sent and present them to you in a nice web interface. While setting up my personal Mastodon instance I decided to set up a real smarthost/relay for my k3s cluster. I did this using Postfix configured to route mail through a smarthost. Search the web for details on how to do this; there are a lot of how-tos out there explaining the process if you are not familiar with it.

In the past I would have used my Gmail account as my SMTP relay. Earlier in 2022, Gmail removed the ability to do this so I needed to find a replacement. I ended up settling on https://www.sendinblue.com because they offer a free tier that allows for 300 emails per day. Since everything I do in my cluster is personal there is no way I’ll ever hit that limit. Even if I were to hit the limit I don’t mind if the email messages simply stop working until the next day. I found setup to be easy. I simply created an account (giving them a bit of information), visited the SMTP & API page, clicked SMTP and copied my credentials into Postfix.

SendInBlue SMTP/API interface

I am not affiliated with SendInBlue in any way, just sharing something I found that allows you to quickly set up an SMTP relay for free.