One of the challenges or points of friction for me using Proxmox in my home lab has been integrating Ansible with it more cleanly. The issue is I have traditionally maintained my inventory file manually which is a bit of a hassle. Part of the issue is that Proxmox doesn’t really expose a lot of metadata about the VMs you have running to things like tagging don’t actually exist. Despite that I set out to get a basic, dynamically generated inventory system that will work against my Proxmox installation to make the process at least a bit smoother.

For some time, Ansible has supported the idea of dynamic inventory. This type of inventory will query a backend to build out an inventory that is compliant with Ansible. Proxmox, having an API, has a dynamic inventory plugin available from the community. In this post I will showcase how I got started with a basic Proxmox dynamic inventory.

When I set out I had a few requirements. First, I really don’t have a naming convention of my VMs that makes any sense in DNS. Some systems have a fully qualified domain but most do not. The ones that do have fully qualified domain name wouldn’t actually be available over ssh on the IP resolved for that domain. To get around this, I wanted to be able to map the host name in Proxmox to its internal IP address. By default, the dynamic inventory plugin will set ansible_host to the name of the VM. For this I had to provide a compose entry to set the ansible_host which you’ll see below. This feature is made possible because I always install the qemu guest agent.

The second requirement is that ssh connection info was dynamic as well because I use a number of different operating systems. Since all of my systems use cloud-init I am able to set the ssh username to the ciuser value thus ensuring I always know what the ssh user is regardless of the operating system used.

Here is my dynamic inventory file:

plugin: community.general.proxmox
validate_certs: false
want_facts: true
compose:
  ansible_host: proxmox_agent_interfaces[1]["ip-addresses"][0].split('/')[0]
  ansible_user: proxmox_ciuser

I placed this information into inventory/inventory.proxmox.yaml. Most of the entries are self-explanatory but I will go through what the compose section is doing.

The first item in the compose section is setting the ansible_host. When the inventory plugin gathers information from Proxmox it will gather the assigned IP addresses as determined using the Qemu Guest Agent. In all cases that I could see, the first IP address will be localhost and the second one will always be the primary interface in the system. With information known, I was able to create the jinja2 template to grab the correct IP address and strip the netmask off of it.

The next line is setting the ansible_user by just copying the proxmox_ciuser value. With these two variables set, Ansible will use that username when connecting to the host at its internal IP address. Since the systems were brought up using cloud-init, my ssh key is already present on all of the machines and the connection works without much fuss.

To support this configuration, here is my ansible.cfg:

[defaults]
inventory = ./inventory
fact_caching_connection = .cache
retry_files_enabled = False
host_key_checking = False
forks = 5
fact_caching = jsonfile

[inventory]
cache = True
cache_plugin = jsonfile

[ssh_connection]
pipelining = True
ssh_args = -F ssh_config

This configuration is setting a few options for me related to how to find the inventory, where to cache inventory information and where to cache facts about remote machines. Caching this info greatly speeds up your Ansible runs and I recommend it. The ssh_args value allows me to specify some additional ssh connection info.

In addition to the above configuration files, there are environment variables that are set on my system. These variables define where to find the Proxmox API, what user to connect with and the password. The environment variables are defined on the dynamic inventory plugin page but here is what my variables look like:

PROXMOX_PASSWORD=[redacted]
PROXMOX_URL=https://[redacted]:8006/
PROXMOX_INVALID_CERT=True
PROXMOX_USERNAME=root@pam
PROXMOX_USER=root@pam

The user/username value is duplicated because some other tools rely on PROXMOX_USERNAME instead of PROXMOX_USER.

And that’s it! With this configured I am able to target all of my running hosts by targeting “proxmox_all_running”. For example, ansible proxmox_all_running -m ping will ping all running machines across my Proxmox cluster.

TLDR; The fix for this is to ensure you are forcing your CDN to properly handle “application/activity+json” in the Accept header vs anything else. In other words, you need to Vary on Accept, but it’s best to limit it to “application/activity+json” if you can.

With the release of ActivityPub 1.0.0 plugin for WordPress I hope we’ll see a surge in the number of WordPress sites that can be followed using your favorite ActivityPub based systems like Mastodon and others. However, if you are hosting your WordPress site on Cloudflare (and likely other CDNs) and you have activated full page caching you are going to have a difficult time integrating your blog with the greater Fediverse. This is because when an ActivityPub user on a service like Mastodon performs a search for your profile, that search will land on your WordPress author page looking for additional information in JSON format. If someone has visited your author page recently in a browser then there is the chance Mastodon will get HTML back instead resulting in a broken search. The reverse of this situation can happen too. If a Mastodon user has recently performed a search and later someone lands on your author page, they will see JSON instead of the expected results.

The cause of this is because Cloudflare doesn’t differentiate between a request looking for HTML or one looking for JSON, this information is not factored into how Cloudflare caches the page. Instead, it only sees the author page URL and determines that it is the same request and returns whatever it has. The good news is, with some effort, we can trick Cloudflare into considering what type of content the client is looking for while still allowing for full page caching. Luckily the ActivityPub has a nice undocumented feature to help work around this situation.

To fix this while keeping page caching you will need to use a Cloudflare worker to adjust the request if the Accept header contains “application/activity+json”. I assume you already have page caching in place and you do not have some other plugin on your site that would interfere with page caching, like batcache, WP SuperCache and more. For my site I use Cloudflare’s APO for WordPress and nothing else.

First, you will want to ensure that your “Caching Level” configuration is set to standard. Next, you will need to get setup for working with Cloudflare Workers. You can follow the official guide at https://developers.cloudflare.com/workers/. Next, create a new project, again using their documentation. Next, replace the index.js file contents with:

export default {
  async fetch(req) {
    const acceptHeader = req.headers.get('accept');
    const url = new URL(req.url);

    if (acceptHeader?.indexOf("application/activity+json") > -1) {
      url.searchParams.append("activitypub", "true");
    }

    return fetch(url.toString(), {
      cf: {
        // Always cache this fetch regardless of content type
        // for a max of 5 minutes before revalidating the resource
        cacheTtl: 300,
        cacheEverything: true,
      },
    });
  }
}

You can now publish this using wrangler publish. You can adjust the cacheTtl to something longer or shorter to suite your needs.

Last step is to associate the worker with the /author route of your WordPress site. For my setup I created a worker route of “*dustinrue/author*” and that was it. My site will now cache and return the correct content based on whether or not the Accept header contains “application/activity+json”.

Remember that Cloudflare Workers do cost money though I suspect a lot of small sites will easily fit into the free tier.

When you create a k3s cluster using colima it will default to using Docker for the runtime. This means that any Docker image you build or pull will be available to k3s. This greatly simplifies testing locally built images being referenced by Helm charts or Kustomize (or whatever you are using).

This is not a feature unique to colima, but rather a feature of k3s if it is told to use the Docker runtime. You can read more at https://docs.k3s.io/advanced#using-docker-as-the-container-runtime.

If you have an older Kubernetes cluster with older Helm based software installs and you aren’t paying attention, it can be easy to leave some resources in a state that are impossible to update or remove. This is because of APIs that have been deprecated and then removed. While facing this issue today, I found this Helm plugin exists which can help resolve the problem – https://github.com/helm/helm-mapkubeapis

Pushed some updates to my proxmox packer project at https://github.com/dustinrue/proxmox-packer. The updates include effort to support Packer 1.9.2 and the most recent Proxmox plugin. The makefile has been updated to run the necessary init command to get the Proxmox plugin installed as well.

More importantly, the changes fix an issue that prevented the provisioner from running leading to broken or missing cloud-init support in the resulting templates.

In this post I’m going to more or less drop some notes about how I went about debugging a WireGuard VPN issue I was having. I have a WireGuard based VPN running on Rocky Linux 9 which is basically a default minimal installation with WireGuard installed. The system ships with firewalld and nftables, which will be important later. firewalld and nftables are used on my installation of WireGuard to work properly and I have a number of PostUp and PostDown commands that are run to insert rules so that VPN clients are NAT’d properly. As an older Linux user, I am very comfortable with iptables but significantly less so with firewalld and especially nftables.

My adventure began after a system update that prevented data from passing through the connection. Due to how WireGuard works, it appears the connection is made but no data would flow. After confirming my IP address had not changed recently and I was indeed connecting to the system at all I still couldn’t get traffic to pass through VPN.

The first thing I set out to do was verify connectivity with the service. Starting with tcpdump -i any port 51820 I was surprised that I wasn’t seeing any traffic to the service. I was surprised by this because there were no rules present in iptables to suggest the port was blocked and yet I would also get messages stating the port was administratively closed. In an effort to confirm this I wanted to see if I could get WireGuard to log what it was doing. As it turns out, despite being in the kernel as a module, there is a way to make it output logs. The following will set the module’s debug mode to on:

echo 'module wireguard +p' | sudo tee /sys/kernel/debug/dynamic_debug/control

Unfortunately I cannot share these logs but it will provide information about peers coming and going and efforts to maintain the connection. After enabling debug I found…no entries. Very unusual. My next step was to remove firewalld, and be extension nftables. Once removed, there were no firewall rules on the system at all and it was wide open. At this point I was able to see debug messages from WireGuard showing that new peers were connecting. As expected, the VPN still didn’t work because the required NAT rules were missing but this did finally confirm that the problem was with firewalld.

At this point it hit me that firewalld works differently than I was thinking and it has a concept of services. These services won’t appear in an iptables listing. At this point I had a feeling I knew what I had to do. I reinstalled firewalld, enabled and started it and looked at the services it was set to allow:

firewall-cmd --list-all
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens18
  sources:
  services: cockpit dhcpv6-client ssh
  ports:
  protocols:
  forward: yes
  masquerade: yes
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

Immediately it became obvious that wireguard was not included in the list. Simply running firewall-cmd --zone=public --add-service=wireguard --permanent added the service to my public zone and from here my VPN started working again.

I have provided my thoughts in other places (Mastodon) about how I dislike firewalld and nftables. I find them much more tedious than iptables and haven’t ever really taken the time to learn them. As distributions change and mature over time, a lot of the default settings I took for granted, like a system’s networking being otherwise wide open after an install, are no longer true. At this point I am being forced to learn these tools, which is actually a good thing.

Long title but a relatively quick TIL. In a helm chart, when specifying a storageClass in your templates, it is important that if the user does not set a storageClass that you do not output storageClass in your template for a persistent volume. That is say that this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi

Is not equivalent to this:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

When installed the first time, both will result in Kubernetes selecting your default storage class. When this happens, your resource is actually updated to reflect the selected storageClass in the API. When you apply the same file again during an upgrade, an empty storageClassName will be seen as a change from the default storage to “”, which is not what you want. Instead, you should not send storageClassName at all if the user did not specify one.

TIL that you can tell systemd to run any ExecStartPre and ExecStopPost scripts as root instead of the user the service is supposed to run under.

At the same time we can touch on how to create an override for a service. In my case, I wanted to override how Redis is started on a system to ensure hugepage was set correctly, per their documentation. Creating an override for a service is super simple:

systemctl edit <service>

In my case this means:

systemctl edit redis

Since I know that the name of the service I want to edit is redis. From here, I am presented with my favorite editor where I can input the following:

[Service]
PermissionsStartOnly=true
ExecStartPre=/usr/local/sbin/hugepage.sh

Here I am defining two things. First, I want the permissions to only apply to the ExecStart command, not the others. Next, I am specifying an ExecStartPre that calls a script. This script is simply outputting:

/bin/echo never > /sys/kernel/mm/transparent_hugepage/enabled

Once complete, save out the file and restart your service. Your changes will now take affect.

If you are running a newer release of systemd (231+) then you can use the following format as well:

ExecStartPre=+/usr/local/sbin/hugepage.sh

Yesterday I was reminded that when a URL is shared on Mastodon, every instance that has a user following you, that server will make a request to your site at least once in an effort to get some additional embed information. If your site is WordPress based, like this one, then you will likely see two requests. The first request to your site will request the URL that was added to the post while the second one follows any embed information WordPress is exposing in order to get some additional meta data. Since Mastodon is a federated system, every Mastodon server or instance will need to gather this data in order for it to be displayed to its users.

If you are a user that has a lot of followers then posting a link to your blog or site will likely result in a mini DDoS has hundreds of Mastodon instances request this information from your server. If you have not taken precautions this can potentially take down your site as it is overloaded with requests! Years ago this would have been referred to as being “slash dotted” (links on https://slashdot.org) or “fireballed” (links on https://daringfireball.net).

Fortunately you can very effectively deal with this situation on your own or by working with your hosting provider. In this post, I am going to describe how I handle the situation using Cloudflare, which is the CDN provider I have chosen to put my site behind. I am not going into full detail on how to implement all options and I am not selling Cloudflare or associated with them beyond being a customer. What I share here will be applicable to any CDN or will at least serve as inspiration for how to handle it in your configuration.

As I said previously, this site is using WordPress and is behind Cloudflare. To make it easy on myself I have also purchased their Automatic Platform Optimization for WordPress feature. I got into this option initially because I wanted to understand it better but have since kept it because it works well. The biggest feature of APO for WordPress is that it enables full page caching for your site. This is a must if you want to get the best possible experience for users globally. Using APO is absolutely not necessary, you can simply use Nginx micro caching instead or any other caching solution, but the key here is to have full page caching so that repeated requests to your site do not incur actual processing time by WordPress.

APO will, out of the box, cache full pages of your site but what it will not protect is the meta data URL used to provide additional information for embeds. To prevent Mastodon servers from crushing your site with embed meta data requests, there is one additional endpoint you need to force to be cached. Here is how I forced Cloudflare to cache the correct URL for me.

Login into Cloudflare and click on the domain for your site. Find the caching section of the menu and click on Cache Rules. Add a new rule and define what is shown in the screenshot

Screenshot showing a cloudflare configuration screen for caching an oembed request from Mastodon, or any system that would do this. Add a name for your rule, set the Field to URI Path contains the path /wp-json/oembed

From here, tell Cloudflare what to do with this match

Screenshot showing another Cloudflare configuration screen. Here you should set the Cache status to "Eligible for cache" and "Override origin" set to 2 hours. 2 hours is the minimum option on a free plan

Note that 2 hours is the lowest cache time I can specify on an otherwise free Cloudflare plan so that is what I set it to. With these options filled out you can click save and you are done. Anything looking for this URL will now either get a cached copy of the response or will cause the content to be cached for future requests.

Of course, you don’t need to use Cloudflare to make this work. Savvy users can also translate these URLs to Nginx or Apache configuration to perform the trick. The goal is to ensure your WordPress site is better able to handle when you have shared a link to Mastodon and there are many options. Using Cloudflare is one option that has worked well for me. I encourage everyone that hosts a blog, either self-hosted or through some managed provider, to ensure that page caching and the oembed URL for WordPress is cached.

If you run a private instance of Mastodon it can feel mighty lonely sometimes. This is due to an inherent design characteristic of Mastodon and federated services in general…how does one instance get information from another instance? Typically, if you have a user on your instance that follows someone else then that information will be added to your instance, along with any hashtags they use and so on. If you run a small server, then obvious there are far fewer ways for information to flow to your instance.

Solving this problem is a matter of ensuring more data is flowing into your instance so that it can then see more content, and most importantly as of version 4.0 of Mastodon, additional hashtags. The easiest way of doing this (aside from running a large instance) is to use relays.

Relays, in essence, take in feeds from a number of instances and passes them to other instances attached to the relay. Finding a relay to add to your instance is as easy as going to https://relaylist.com and picking one or more to add to your instance. Relay servers usually support both Pleroma as well as Mastodon but remember the information you add to each is different. Be sure to add the right URL.

Screenshot showing relay URLs

You can subscribe to a relay by visiting /admin/relays in the admin dashboard of your instance and clicking the “Add New Relay” button. Simply pick a relay server from the list and add it using the correct URL. For Mastodon, the URL will end with /inbox in almost every case. After a short while you will begin to see your Federated timeline be populated with posts from other instances. Your instance will now see a lot of new content including hashtags that you can follow.

Keep in mind that bringing in this cost does have a cost associated with it. All instances of Mastodon will store everything it sees locally, including post content and media like images and video. It is a good idea to set retention limits on your system so that you are not storing everything ever seen forever. On my system I set my retention limits to 14 days for media and content cache and 7 days for user archives. You will find your instance’s content retention policy at /admin/settings/content_retention.

In addition to using more disk space (and bandwidth transferring all that media) you will also incur more processing time on your instance. The amount of space and processing you need depends heavily on which or how many relay services you add to your instance. For my instance I struck a balance between getting enough data flowing so that hashtags were interesting but not so much that I increased my costs unnecessarily. You can track this information in both your admin dashboard at /admin/dashboard (scroll to the bottom) as well as your chosen object storage provider (you did set one up right?).

If you are running a private instance and feel a bit left out I hope this helps you get the activity you are looking for. Also remember to follow a lot of people and boost content you like instead of just liking it. This will lead to more followers for yourself, more interactions and a more interesting timeline!