Tag: mastodon

WordPress ActivityPub and Cloudflare

September 16, 2023 by Dustin Rue·6 Comments

TLDR; The fix for this is to ensure you are forcing your CDN to properly handle “application/activity+json” in the Accept header vs anything else. In other words, you need to Vary on Accept, but it’s best to limit it to “application/activity+json” if you can.

With the release of version 1.0.0 of the ActivityPub plugin for WordPress, I hope we’ll see a surge in the number of WordPress sites that can be followed from your favorite ActivityPub-based systems, like Mastodon and others. However, if you host your WordPress site behind Cloudflare (and likely other CDNs) and you have enabled full page caching, you are going to have a difficult time integrating your blog with the greater Fediverse. When an ActivityPub user on a service like Mastodon searches for your profile, that search lands on your WordPress author page looking for additional information in JSON format. If someone has visited your author page recently in a browser, there is a chance Mastodon will get the cached HTML back instead, resulting in a broken search. The reverse can happen too: if a Mastodon user has recently performed a search and someone later lands on your author page, they will see JSON instead of the expected HTML page.

This happens because Cloudflare doesn’t differentiate between a request looking for HTML and one looking for JSON; the Accept header is not factored into how Cloudflare caches the page. Cloudflare only sees the author page URL, decides it is the same request and returns whatever it has cached. The good news is that, with some effort, we can trick Cloudflare into considering what type of content the client is looking for while still allowing full page caching. Luckily, the ActivityPub plugin has a nice undocumented feature to help work around this situation: it also responds to an activitypub query parameter, which the worker below relies on.
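To make the failure mode concrete, here is a toy JavaScript model of a cache that keys on the URL alone, sitting in front of an origin that returns JSON or HTML depending on the Accept header. It is an illustration, not Cloudflare’s actual implementation; fakeOrigin, makeUrlOnlyCache and the example.com URL are hypothetical.

```javascript
// Hypothetical origin: JSON for ActivityPub clients, HTML for everyone else.
function fakeOrigin(url, accept) {
  return accept.includes("application/activity+json")
    ? '{"type":"Person"}'
    : "<html>author page</html>";
}

// A naive cache that keys on the URL alone and ignores the Accept header.
function makeUrlOnlyCache(fetchOrigin) {
  const store = new Map();
  return (url, accept) => {
    if (!store.has(url)) {
      // Whoever asks first decides which representation gets cached.
      store.set(url, fetchOrigin(url, accept));
    }
    return store.get(url);
  };
}

const urlOnlyCache = makeUrlOnlyCache(fakeOrigin);
const authorUrl = "https://example.com/author/alice/";
const browserBody = urlOnlyCache(authorUrl, "text/html");
const mastodonBody = urlOnlyCache(authorUrl, "application/activity+json");
console.log(mastodonBody); // <html>author page</html>  (Mastodon gets HTML)
```

Swap the order of the two calls and the browser receives JSON instead, which is the reverse case described above.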

To fix this while keeping page caching, you will need to use a Cloudflare Worker to adjust the request when the Accept header contains “application/activity+json”. I assume you already have page caching in place and that you do not have another plugin on your site that would interfere with page caching, such as Batcache or WP Super Cache. For my site I use Cloudflare’s APO for WordPress and nothing else.

First, make sure your “Caching Level” configuration is set to Standard, so that the query string is part of the cache key. Next, get set up for working with Cloudflare Workers by following the official guide at https://developers.cloudflare.com/workers/, then create a new project, again using their documentation. Finally, replace the contents of index.js with:

export default {
  async fetch(req) {
    const acceptHeader = req.headers.get('accept');
    const url = new URL(req.url);

    if (acceptHeader?.includes("application/activity+json")) {
      url.searchParams.append("activitypub", "true");
    }

    return fetch(url.toString(), {
      cf: {
        // Always cache this fetch regardless of content type
        // for a max of 5 minutes before revalidating the resource
        cacheTtl: 300,
        cacheEverything: true,
      },
    });
  }
}

You can now publish this using wrangler publish (newer versions of Wrangler name this command wrangler deploy). You can adjust cacheTtl to something longer or shorter to suit your needs.
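The worker’s decision logic can be pulled out into a pure function and exercised locally with Node.js 18+, which provides the global URL class. This sketch mirrors the worker above; rewriteForActivityPub is a hypothetical name. Because the Standard caching level keeps the query string in the cache key, the two outputs correspond to two separate cache entries at Cloudflare.

```javascript
// Same check as the worker: append activitypub=true when the client asks for
// ActivityPub JSON, leave the URL untouched otherwise (including a missing header).
function rewriteForActivityPub(requestUrl, acceptHeader) {
  const url = new URL(requestUrl);
  if (acceptHeader?.includes("application/activity+json")) {
    url.searchParams.append("activitypub", "true");
  }
  return url.toString();
}

const authorPage = "https://example.com/author/alice/";
console.log(rewriteForActivityPub(authorPage, "text/html,application/xhtml+xml"));
// https://example.com/author/alice/
console.log(rewriteForActivityPub(authorPage, "application/activity+json"));
// https://example.com/author/alice/?activitypub=true
```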

The last step is to associate the worker with the /author route of your WordPress site. For my setup I created a worker route of “*dustinrue/author*” and that was it. My site now caches and returns the correct content based on whether or not the Accept header contains “application/activity+json”.

Remember that Cloudflare Workers can cost money, though I suspect many small sites will easily fit into the free tier.

Simplistic method for blocking http requests in WordPress

March 19, 2023 by Dustin Rue·3 Comments

One thing I dislike about WordPress is that it makes numerous external HTTP requests while you are in the admin. This happens even if you have disabled the auto update systems in wp-config.php, and it can cause small pauses when loading admin pages while you wait for the requests to finish. Since I manage my site through a GitLab-based CI/CD workflow, auto updates don’t make a lot of sense for me, and I would prefer not to have WordPress core or themes phoning home and slowing down the admin experience.

WordPress already has an option for blocking HTTP requests, presented as a pair of defines you can use to block all requests and then allow some. These defines are WP_HTTP_BLOCK_EXTERNAL and WP_ACCESSIBLE_HOSTS, which are described in more depth at https://developer.wordpress.org/reference/classes/wp_http/block_request/. This is a great way to block requests and generally the right approach: block everything, then allow what you want. However, in my situation there is a much smaller set of domains I want to block, and I want to allow everything else. In other words, I want the opposite of what these defines can do for you, because there are a number of external services I do want to interact with, like Cloudflare and Mastodon.

What I came up with is an mu-plugin that reverses the logic of the defines above. It is an almost 1:1 copy/paste of the code WordPress uses to block requests, paired with a define listing the domains I wish to block. The code is very simple:

<?php
/**
 * Block outgoing HTTP requests to the hosts listed in WP_BLOCKED_HOSTS.
 * This is the inverse of WP_HTTP_BLOCK_EXTERNAL / WP_ACCESSIBLE_HOSTS:
 * listed hosts are blocked and everything else is allowed.
 */
function block_urls( $preempt, $parsed_args, $uri ) {
    // Respect a short-circuit from another filter, and do nothing without a list.
    if ( false !== $preempt || ! defined( 'WP_BLOCKED_HOSTS' ) ) {
        return $preempt;
    }

    $check = wp_parse_url( $uri );
    if ( ! $check || empty( $check['host'] ) ) {
        return $preempt;
    }

    static $blocked_hosts  = null;
    static $wildcard_regex = null;
    if ( null === $blocked_hosts ) {
        $blocked_hosts = preg_split( '|,\s*|', WP_BLOCKED_HOSTS );
        if ( false !== strpos( WP_BLOCKED_HOSTS, '*' ) ) {
            $patterns = array();
            foreach ( $blocked_hosts as $host ) {
                // Escape the host, then turn the escaped "*" back into a wildcard.
                $patterns[] = str_replace( '\*', '.+', preg_quote( $host, '/' ) );
            }
            $wildcard_regex = '/^(' . implode( '|', $patterns ) . ')$/i';
        }
    }

    if ( null !== $wildcard_regex ) {
        $blocked = preg_match( $wildcard_regex, $check['host'] ) > 0;
    } else {
        // Inverse logic: if the host is in the list, block it.
        $blocked = in_array( $check['host'], $blocked_hosts, true );
    }

    $target = sprintf( '%s://%s%s', $check['scheme'] ?? '', $check['host'], $check['path'] ?? '' );
    error_log( ( $blocked ? 'Blocking ' : 'Allowing ' ) . $target );

    // pre_http_request expects false (continue) or a response/WP_Error (short-circuit).
    return $blocked
        ? new WP_Error( 'http_request_blocked', sprintf( 'Blocked request to %s', $target ) )
        : $preempt;
}

add_filter( 'pre_http_request', 'block_urls', 10, 3 );

With this code saved in your mu-plugins directory as blocked-urls.php, you can then add a define like this (typically in wp-config.php) to block those hosts from being contacted by WordPress:

define( 'WP_BLOCKED_HOSTS', 'api.wordpress.org,themeisle.com,*.themeisle.com' );

When WordPress attempts to load URLs from these domains, the requests will be blocked. You’ll also notice that this plugin logs every HTTP request that passes through WordPress core’s HTTP API (wp_remote_get() and related functions). Using this information, you can block additional domains if you need to.
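To check which hosts a given WP_BLOCKED_HOSTS value would catch without touching WordPress, the matching logic can be sketched in JavaScript. This is a port for illustration only, not WordPress code: makeHostBlocker and escapeRegex are hypothetical names, and escapeRegex only approximates PHP’s preg_quote. As in the PHP, lists containing a wildcard are matched case-insensitively, while plain lists use an exact comparison.

```javascript
// Escape regex metacharacters (an approximation of PHP's preg_quote).
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical port of the mu-plugin's matching logic: returns host => boolean.
function makeHostBlocker(blockedHostsDefine) {
  const hosts = blockedHostsDefine.split(/,\s*/);
  if (!blockedHostsDefine.includes("*")) {
    // No wildcards: exact, case-sensitive comparison like in_array(..., true).
    const exact = new Set(hosts);
    return (host) => exact.has(host);
  }
  // Escape each host, then turn the escaped "*" back into ".+".
  const alternatives = hosts.map((h) => escapeRegex(h).replace(/\\\*/g, ".+"));
  const pattern = new RegExp(`^(${alternatives.join("|")})$`, "i");
  return (host) => pattern.test(host);
}

const isBlocked = makeHostBlocker("api.wordpress.org,themeisle.com,*.themeisle.com");
for (const host of ["api.wordpress.org", "cdn.themeisle.com", "mastodon.social"]) {
  console.log(host, isBlocked(host) ? "blocked" : "allowed");
}
```

Note that "*.themeisle.com" alone does not match the bare themeisle.com, which is why the example define lists both.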

Caching WordPress author pages when using ActivityPub

March 12, 2023 by Dustin Rue·2 Comments

Update – I have also posted an alternative method that preserves the ability to have full page caching enabled. Please find it at WordPress ActivityPub and Cloudflare.

In a previous post I discussed how to deal with the fact that the ActivityPub plugin for WordPress must return author pages in a different format depending on the value of the Accept header. A browser hitting an author page is going to be looking for HTML to be returned, while Mastodon will expect JSON instead. If you use any kind of caching system, be it a CDN, a special plugin or a combination of the two, then you may run into an issue where the wrong content is being cached for each Accept header type. You might see this in your site health report with:

Your author URL does not return valid JSON for application/activity+json. Please check if your hosting supports alternate Accept headers.

In this post I will discuss a method for dealing with this while not totally losing the ability to cache the response. This is useful for busy sites or as a way to help mitigate some forms of DoS attack. The example I provide here is meant for Nginx with php-fpm but you can apply this same sort of thinking anywhere else where you have enough control over the configuration to make it work.

Assuming you followed the previous post and have created an exception for your author URL in your CDN then it is on your server to render author pages each time a request is made. This is a waste of resources and doesn’t provide an ideal experience for end users. To enable caching on this endpoint, we will leverage Nginx’s built in caching capability while setting the cache key based on the Accept header.

To start, let’s set up basic Nginx caching. At the top of your configuration file, outside of the server{} block (advanced users can adjust as desired), add the following:

fastcgi_cache_path /etc/nginx/cache levels=1:2 keys_zone=wordpress:100m inactive=10m max_size=100m;

You can adjust the path if you want, but in essence we are defining a cache stored at /etc/nginx/cache with a keys zone named wordpress. We limit the cache to 100MB on disk and remove any entry that hasn’t been accessed in the last 10 minutes. /etc/nginx/cache must exist and must be owned by the same user that runs Nginx. If you have multiple servers, know that this cache is unlikely to be shared, so each server will have its own cache.
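If you ever want to find a specific entry on disk, it helps to know that with levels=1:2 Nginx names each cache file after the MD5 hash of its cache key, using the last character of the hash as the first directory level and the two characters before it as the second. The Node.js sketch below computes that path; nginxCachePath and cacheFilePath are hypothetical helpers, and the example key simply follows the fastcgi_cache_key format used in the location block.

```javascript
const crypto = require("node:crypto");

// Build the on-disk path from an MD5 hex digest, taking directory names from
// the end of the hash (levels=1:2 -> last char, then the two chars before it).
function cacheFilePath(cachePath, md5Hex, levels = [1, 2]) {
  const dirs = [];
  let end = md5Hex.length;
  for (const width of levels) {
    dirs.push(md5Hex.slice(end - width, end));
    end -= width;
  }
  return [cachePath, ...dirs, md5Hex].join("/");
}

// Hash a cache key the way Nginx does (MD5) and locate its file.
function nginxCachePath(cachePath, cacheKey, levels = [1, 2]) {
  const md5Hex = crypto.createHash("md5").update(cacheKey).digest("hex");
  return cacheFilePath(cachePath, md5Hex, levels);
}

// Example key in the "$vary_key$host$request_method$request_uri" format.
const exampleKey = "jsonexample.comGET/author/alice/";
console.log(nginxCachePath("/etc/nginx/cache", exampleKey));
```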

Next, add a map (also outside the server{} block) that defines which Accept headers we want to vary on:

map $http_accept $vary_key {
  default "default";
  "~application/activity\+json" "json";
}

This block creates a new variable called $vary_key that we can use later. Notice that we only create a separate cache entry when application/activity+json is included in the Accept header; every other value falls back to default.

Now inside the server{} block for your site, let’s add a nice header we can use to ensure our caching is working properly. Adding add_header X-Nginx-Cache $upstream_cache_status; to this section will cause Nginx to output a header we can see to know the cache status. It will be BYPASS, MISS or HIT in response headers.

Next, inside the location block that is handling PHP requests, add the following config options:

# cache key
fastcgi_cache_key "$vary_key$host$request_method$request_uri";

# matches keys_zone in fastcgi_cache_path
fastcgi_cache wordpress;

# skip the cache (don't store, don't serve) when $no_cache is 1
fastcgi_no_cache $no_cache;
fastcgi_cache_bypass $no_cache;

#defines the default cache time
fastcgi_cache_valid any 10m;

# misc additional settings
fastcgi_cache_use_stale updating error timeout invalid_header http_500;
fastcgi_cache_lock on;
fastcgi_cache_lock_timeout 10s;
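The map and the cache key together decide which requests share a cache entry, and that logic can be modeled in a few lines of JavaScript. This is an illustrative sketch, not Nginx code: varyKey and fastcgiCacheKey are hypothetical names, and the regex mirrors the case-sensitive ~ match in the map.

```javascript
// Equivalent of the map: only application/activity+json gets its own variant;
// a missing Accept header falls through to "default" like the map's default.
function varyKey(accept) {
  return /application\/activity\+json/.test(accept ?? "") ? "json" : "default";
}

// Mirrors fastcgi_cache_key "$vary_key$host$request_method$request_uri".
function fastcgiCacheKey({ accept, host, method, uri }) {
  return `${varyKey(accept)}${host}${method}${uri}`;
}

const requestParts = { host: "example.com", method: "GET", uri: "/author/alice/" };
console.log(fastcgiCacheKey({ ...requestParts, accept: "text/html" }));
// defaultexample.comGET/author/alice/
console.log(fastcgiCacheKey({ ...requestParts, accept: "application/activity+json" }));
// jsonexample.comGET/author/alice/
```

Keeping the variants to a fixed, small set is the design choice that matters here: browsers sending slightly different Accept strings still share the "default" entry.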

The next settings depend on what you want to do. If your CDN already caches everything except your author pages, you can have Nginx cache only the author pages with the following settings:

# Cache nothing by default
set $no_cache 1;

# Only cache author pages
if ($request_uri ~* "/author/") {
  set $no_cache 0;
}

If instead, you want to cache everything your CDN might miss then you can use this (this is what I use):

# Cache everything by default
set $no_cache 0;

# Don't cache logged in users or commenters
if ( $http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_" ) {
  set $no_cache 1;
}

# Don't cache the following URLs
if ($request_uri ~* "/(wp-admin/|wp-login.php)") {
  set $no_cache 1;
}
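Before reloading Nginx, the second ("cache everything") rule set can be sanity-checked with a small JavaScript model. noCache is a hypothetical name and this is not Nginx code; the i flag stands in for the case-insensitive ~* operator, and the negative lookahead lets the wordpress_test_cookie set by the login page through while any other wordpress_ cookie skips the cache.

```javascript
// Returns 1 when the request should skip the cache, 0 otherwise,
// following the same two checks as the if blocks above.
function noCache(cookieHeader, requestUri) {
  let skip = 0;
  // Logged-in users, commenters and password-protected post viewers.
  if (/comment_author_|wordpress_(?!test_cookie)|wp-postpass_/i.test(cookieHeader ?? "")) {
    skip = 1;
  }
  // Admin and login pages.
  if (/\/(wp-admin\/|wp-login.php)/i.test(requestUri)) {
    skip = 1;
  }
  return skip;
}

console.log(noCache("", "/author/alice/"));                      // 0
console.log(noCache("wordpress_logged_in_abc=1", "/"));          // 1
console.log(noCache("wordpress_test_cookie=WP Cookie check", "/")); // 0
console.log(noCache("", "/wp-login.php"));                       // 1
```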

If done correctly then hitting an author page will result in different results depending on the Accept header being used. To verify, take an author page and load it up in a browser. You should get a proper HTML page. Copy the URL out and, using curl, send the following:

curl -I https://dustinrue.com/author/ruedu/ -H "Accept: application/activity+json"

x-nginx-cache: MISS

Your Nginx cache status may already be HIT if someone recently searched for you. It should be HIT if you send the request again.

Debugging

It is important that you debug and properly resolve this endpoint. Failing to do so will result in failed searches of your author/user from ActivityPub clients. To be clear, the following must return different content:

curl -s https://dustinrue.com/author/ruedu/
curl -s https://dustinrue.com/author/ruedu/ -H "Accept: application/activity+json"

Adjust these URLs for your author page and ensure the first curl returns HTML content while the second one returns JSON content (unlike curl -I, which only fetches headers, these commands print the response body). While writing this post I noticed that the plugin still reports a content type of text/html when it should say application/activity+json. Despite this inconsistency, clients will use the returned content.
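If you prefer a script to eyeballing curl output, the same check can be sketched with Node.js 18+ and its built-in fetch. classifyBody and checkAuthorPage are hypothetical helpers, and the HTML/JSON detection is a rough heuristic; it does not validate that the JSON is a proper ActivityPub actor.

```javascript
// Roughly classify a response body as "json", "html" or "other".
function classifyBody(body) {
  try {
    const parsed = JSON.parse(body);
    if (parsed !== null && typeof parsed === "object") return "json";
  } catch {
    // Not JSON; fall through to the HTML check.
  }
  return /<html[\s>]/i.test(body) ? "html" : "other";
}

// Fetch the author page twice, once per Accept type, and report what came back.
async function checkAuthorPage(authorPageUrl) {
  const plain = await (await fetch(authorPageUrl)).text();
  const activityPub = await (
    await fetch(authorPageUrl, { headers: { Accept: "application/activity+json" } })
  ).text();
  return { browser: classifyBody(plain), activitypub: classifyBody(activityPub) };
}

// Usage (makes network requests):
// checkAuthorPage("https://example.com/author/alice/").then(console.log);
// A correctly cached setup should report { browser: 'html', activitypub: 'json' }.
```

Run it twice in a row so the second run exercises the cached copies as well.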

If the curl calls are returning different content, next pay attention to the x-nginx-cache header to ensure that it is actually caching. You can add another utility header to your Nginx config to assist with this:

add_header x-accept $vary_key always;

This add_header will output what value the map landed on so you can ensure things are being picked up properly.

Conclusion

I hope this is enough to help guide you in improving your WordPress + ActivityPub experience.

Moving on from Twitter

February 12, 2023 by Dustin Rue·0 Comments

A quick note about having all but completely left Twitter. I’m posting it here because I can’t actually tell you on Twitter where I went; their policies will not allow me to link to my new profile.

About mid-October of 2022 I decided that Twitter was no longer a place I wanted to use as my primary micro-blogging platform. There are many reasons for this, but it mostly comes down to the behavior of the CEO, his treatment of the employees who work there and the way he is handling the platform. When Elon Musk bought Twitter I was hopeful that he would do a good job with it, but that hope was quickly squashed by news of unreasonable demands on employees, certain people being allowed back on the platform and more. It is within his right to do with the platform as he pleases, and it remains my right to choose not to be a part of it. I don’t fault anyone who chooses to remain on the platform, but for me it was time to move on.

Instead, I have found a new home on the Mastodon network. You can read about it in this excellently written article at https://tidbits.com/2023/01/27/mastodon-a-new-hope-for-social-networking/ which goes into detail about what it is, how it works and even how to use it. My profile on Mastodon, hosted on my own private instance of the software, is at https://mastodon.dustinrue.com/@dustinrue. After getting past the initial confusion of how the platform works I have settled in and found that I enjoy the platform more than I have Twitter. I appreciate the federated nature, the fact I can block entire instances, being able to follow hashtags, being able to self-host and more.

Mastodon is still a relatively young platform that has recently seen explosive growth as people disillusioned with Twitter look for alternatives. I highly recommend giving it a try to see if it is the right fit for you!

Avoiding stampeding Mastodons

February 9, 2023 by Dustin Rue·0 Comments

Yesterday I was reminded that when a URL is shared on Mastodon, every instance with a user following you will make a request to your site at least once to fetch some additional embed information. If your site is WordPress based, like this one, then you will likely see two requests per instance. The first requests the URL that was added to the post, while the second follows the embed information WordPress exposes in order to get some additional metadata. Since Mastodon is a federated system, every Mastodon server or instance needs to gather this data itself in order to display it to its users.

If you have a lot of followers, then posting a link to your blog or site will likely result in a mini DDoS as hundreds of Mastodon instances request this information from your server. If you have not taken precautions, this can potentially take down your site as it is overloaded with requests! Years ago this would have been referred to as being “slashdotted” (linked from https://slashdot.org) or “fireballed” (linked from https://daringfireball.net).

Fortunately you can very effectively deal with this situation on your own or by working with your hosting provider. In this post, I am going to describe how I handle the situation using Cloudflare, which is the CDN provider I have chosen to put my site behind. I am not going into full detail on how to implement all options and I am not selling Cloudflare or associated with them beyond being a customer. What I share here will be applicable to any CDN or will at least serve as inspiration for how to handle it in your configuration.

As I said previously, this site is using WordPress and is behind Cloudflare. To make it easy on myself I have also purchased their Automatic Platform Optimization for WordPress feature. I got into this option initially because I wanted to understand it better but have since kept it because it works well. The biggest feature of APO for WordPress is that it enables full page caching for your site. This is a must if you want to get the best possible experience for users globally. Using APO is absolutely not necessary, you can simply use Nginx micro caching instead or any other caching solution, but the key here is to have full page caching so that repeated requests to your site do not incur actual processing time by WordPress.

APO will, out of the box, cache full pages of your site, but it will not protect the metadata URL used to provide additional information for embeds. To prevent Mastodon servers from crushing your site with embed metadata requests, there is one additional endpoint you need to force to be cached. Here is how I forced Cloudflare to cache the correct URL for me.

Log in to Cloudflare and click on the domain for your site. Find the Caching section of the menu and click on Cache Rules. Add a new rule and define it as shown in the screenshot:

[Screenshot: Cloudflare cache rule for oEmbed requests from Mastodon (or any system that makes them). Give the rule a name and set the Field to URI Path, contains, /wp-json/oembed.]

From here, tell Cloudflare what to do with this match

[Screenshot: Cloudflare cache rule settings. Set Cache status to "Eligible for cache" and "Override origin" to 2 hours, the minimum option on a free plan.]

Note that 2 hours is the lowest cache time I can specify on an otherwise free Cloudflare plan so that is what I set it to. With these options filled out you can click save and you are done. Anything looking for this URL will now either get a cached copy of the response or will cause the content to be cached for future requests.
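To see why one cache rule makes such a difference, here is a toy model of a TTL cache in front of an origin that counts how often it is hit. It simplifies heavily: a single cache, requests arriving one after another and a hypothetical 500 instances. A real CDN keeps separate caches per data center, so the origin would see somewhat more requests. makeTtlCache and countingOrigin are hypothetical names; the oEmbed URL follows WordPress’s /wp-json/oembed/1.0/embed?url=... format.

```javascript
// Minimal TTL cache: serve a stored response until it expires, otherwise
// go to the origin once and remember the answer.
function makeTtlCache(ttlMs, fetchOrigin, now = () => Date.now()) {
  const store = new Map(); // url -> { body, expires }
  return (url) => {
    const entry = store.get(url);
    if (entry && entry.expires > now()) return entry.body; // HIT
    const body = fetchOrigin(url); // MISS
    store.set(url, { body, expires: now() + ttlMs });
    return body;
  };
}

let originHits = 0;
const countingOrigin = (url) => {
  originHits += 1;
  return `response for ${url}`;
};
const cached = makeTtlCache(2 * 60 * 60 * 1000, countingOrigin); // 2 hours

// Each hypothetical instance fetches the post and its oEmbed URL.
for (let i = 0; i < 500; i++) {
  cached("https://example.com/my-post/");
  cached("https://example.com/wp-json/oembed/1.0/embed?url=https%3A%2F%2Fexample.com%2Fmy-post%2F");
}
console.log(originHits); // 2, instead of 1000 without a cache
```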

Of course, you don’t need to use Cloudflare to make this work. Savvy users can translate these rules into Nginx or Apache configuration to perform the same trick. The goal is to ensure your WordPress site can cope when you share a link on Mastodon, and there are many ways to do that; Cloudflare is one that has worked well for me. I encourage everyone who hosts a blog, whether self-hosted or through a managed provider, to make sure both full page caching and caching of the WordPress oEmbed URL are in place.

Adding relays to your Mastodon instance

January 7, 2023 by Dustin Rue·0 Comments

If you run a private instance of Mastodon it can feel mighty lonely sometimes. This is due to an inherent design characteristic of Mastodon and federated services in general: how does one instance get information from another instance? Typically, if a user on your instance follows someone else, then that person’s posts will be added to your instance, along with any hashtags they use and so on. If you run a small server, there are obviously far fewer ways for information to flow to your instance.

Solving this problem is a matter of ensuring more data is flowing into your instance so that it can then see more content, and most importantly as of version 4.0 of Mastodon, additional hashtags. The easiest way of doing this (aside from running a large instance) is to use relays.

Relays, in essence, take in feeds from a number of instances and pass them on to the other instances attached to the relay. Finding a relay is as easy as going to https://relaylist.com and picking one or more to add to your instance. Relay servers usually support both Pleroma and Mastodon, but the URL you add is different for each, so be sure to add the right one.

You can subscribe to a relay by visiting /admin/relays in the admin dashboard of your instance and clicking the “Add New Relay” button. Simply pick a relay server from the list and add it using the correct URL. For Mastodon, the URL will end with /inbox in almost every case. After a short while you will begin to see your Federated timeline be populated with posts from other instances. Your instance will now see a lot of new content including hashtags that you can follow.
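Since Mastodon relay URLs almost always end in /inbox, a small helper can normalize a relay’s base URL before you paste it into /admin/relays. This is a hypothetical convenience sketch (mastodonRelayInbox is not part of Mastodon) that assumes the relay follows the usual /inbox convention; the exact URL the relay operator publishes always takes precedence.

```javascript
// Append /inbox to a relay URL unless it is already there.
function mastodonRelayInbox(relayUrl) {
  const url = new URL(relayUrl);
  if (!url.pathname.endsWith("/inbox")) {
    // Drop trailing slashes first so we don't produce "//inbox".
    url.pathname = url.pathname.replace(/\/+$/, "") + "/inbox";
  }
  return url.toString();
}

console.log(mastodonRelayInbox("https://relay.example.com/"));
// https://relay.example.com/inbox
```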

Keep in mind that bringing in all this content does have a cost. Every Mastodon instance stores everything it sees locally, including post content and media like images and video. It is a good idea to set retention limits on your system so that you are not storing everything you have ever seen forever. On my system I set my retention limits to 14 days for media and content cache and 7 days for user archives. You will find your instance’s content retention policy at /admin/settings/content_retention.

In addition to using more disk space (and bandwidth transferring all that media), you will also incur more processing time on your instance. The amount of space and processing you need depends heavily on which and how many relay services you add to your instance. For my instance I struck a balance between getting enough data flowing so that hashtags were interesting, but not so much that I increased my costs unnecessarily. You can track this information both in your admin dashboard at /admin/dashboard (scroll to the bottom) and with your chosen object storage provider (you did set one up, right?).

If you are running a private instance and feel a bit left out I hope this helps you get the activity you are looking for. Also remember to follow a lot of people and boost content you like instead of just liking it. This will lead to more followers for yourself, more interactions and a more interesting timeline!

My experience running a private Mastodon instance

January 7, 2023 by Dustin Rue·0 Comments

Since about mid December 2022 I have been running my own private instance of Mastodon. I thought I would detail how I did it and what it has cost me so far.

When I first learned about Mastodon I was excited to get to understand it better, particularly how it is hosted and scaled. For Mastodon, I decided right away that the best way to better my understanding was to host it myself and to do so on my favorite platform, Kubernetes. I started by creating my helm chart (https://github.com/dustinrue/mastodon-helm-chart) and installed the core software in my home lab which consists of k3s. The chart I created is based on the official helm chart (https://github.com/mastodon/chart). I created my own because I, again, wanted to learn about the moving pieces of a Mastodon installation but also because I was unhappy with the official chart integrating Redis and PostgreSQL as dependencies. In addition, it doesn’t break out the Sidekiq processes in a way that makes sense…but more on that later.

Before we get too deep into what I did, we should first discuss some of the major components of a Mastodon instance or server. Mastodon is a collection of services working together to form a full solution, which includes:

A web service which provides the user interface but is also the sort of API server for all things Mastodon. In a full production setup it is important that this be highly available.
A streaming service which feeds data to the web frontend as it arrives and is processed. This is also important but doesn’t seem to be critical. In other words, you can survive a bit of downtime here; you’ll just have a less than great experience.
A number of Sidekiq queues. There are numerous Sidekiq queues which are the heart of how data moves in a Mastodon instance. These queues, as of this writing, include a scheduler, ingress, mailer, push, pull and default. Each queue has a specific purpose and each queue is again not absolutely critical to the availability of your Mastodon instance. This means that you can easily take down each queue temporarily to deal with some issue. While a queue is down know that nothing that queue is responsible for will be processed. The special scheduler queue, if not running, will likely prevent most other queues from doing anything at all.
Redis is the glue that keeps data flowing between processes. It is also a critical piece to keep running, though losing data within it, while not ideal, is OK. Keeping it running is critical because all of the other Mastodon processes expect it to be available and will fail to start without it. In a full production setup I recommend ensuring it is running in a highly available fashion.
PostgreSQL is the last required piece of software when running Mastodon. Like Redis, it is what I would consider to be critical to your setup. If running a full production setup, you will want to cluster it to maintain availability first, with performance a secondary consideration.
You need some system for dealing with email. Mastodon needs to send email for account confirmation and some administrative or moderation work. For my system I am using Send In Blue (https://www.sendinblue.com) which has a free tier.

Mastodon also supports other, optional services which you can read about at https://docs.joinmastodon.org/admin/optional/.

As you can likely see, running Mastodon is not simple, yet it isn’t overwhelming either. I believe Mastodon can be run inexpensively, especially as a private instance, but to run it correctly in production there is definitely a base cost you need to consider so that you can remove as many points of failure as possible. In addition, there are many other pieces you will likely want if running a large installation, like metrics monitoring, tracking Sidekiq queue depth and processing times, and more.

Having spent some time on Mastodon during the great Twitter migration, I witnessed some of the struggles of a number of instance admins as their instances struggled to meet the demands of new users and of users who had created accounts before but were suddenly active. I saw a few notable patterns emerge that contributed to their scaling woes, including:

Not using a CDN or object storage system initially
Not installing pgbouncer in front of PostgreSQL
Not installing Sidekiq into separate processes running each queue

There are some really excellent guides and references on how to scale Mastodon (https://hazelweakly.me/blog/scaling-mastodon/ to name but one), but many of the recommendations require you to do, or to have already done, one if not all of the above mentioned steps. Each of these items is disruptive enough that you probably do not want to be handling it in a panic while trying to get your instance running again. If you are running or plan to run a public instance where you allow anyone to sign up, then I highly recommend getting at least those three items out of the way from day one. Doing so will make scaling up from there much, much easier, as most remaining work will then be adding servers to run more Sidekiq processes or tuning parameters.

When I created my helm chart, I took these lessons and applied them as conscious decisions in the design of the chart. Though not at all necessary for a small or single user instance, my chart breaks out all of the current Sidekiq queues into separate processes. This layout ensures the hard work of separating the processes out is done and the rest is a matter of scaling and tuning.
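Breaking Sidekiq out per queue boils down to running one process per queue using Sidekiq’s -q (queue) and -c (concurrency) options. The sketch below generates that layout; sidekiqProcesses, the default concurrency of 5 and the replica counts are hypothetical values for illustration, not settings taken from the chart. Mastodon’s scaling documentation warns that only one process should run the scheduler queue, so the sketch pins it to a single replica.

```javascript
// Queues named in the post (as of its writing).
const QUEUES = ["scheduler", "ingress", "mailer", "push", "pull", "default"];

// One Sidekiq process definition per queue; replicas can be scaled per queue,
// except scheduler, which must only ever run once.
function sidekiqProcesses(concurrency = 5, replicas = {}) {
  return QUEUES.map((queue) => ({
    queue,
    replicas: queue === "scheduler" ? 1 : (replicas[queue] ?? 1),
    command: `bundle exec sidekiq -q ${queue} -c ${concurrency}`,
  }));
}

for (const p of sidekiqProcesses(5, { pull: 3 })) {
  console.log(`${p.replicas}x ${p.command}`);
}
```

Scaling a busy queue later is then just a matter of raising its replica count, which is the point of separating the processes up front.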

As of this writing, my helm chart also installs a weekly cronjob to clean up media files and, optionally, a cronjob for backing up the database to some shared storage in your Kubernetes cluster. Though it is ultimately incomplete, I feel the helm chart is a good start.

As for actually running Mastodon for myself, I created a subdomain for my instance to live at. I then installed Mastodon, using my helm chart, into my k3s cluster. Ignoring the cost of my ISP and the computers I already have, running Mastodon costs me very little. My home lab provides everything I need to make Mastodon work, including persistent storage using TrueNAS. For media storage, I created a Cloudflare R2 bucket and URL for public access. Mastodon is configured to send media content to R2, which is then served from the CDN URL. This keeps all of the heavy storage separate from the rest of the system. My last bill for R2 was just $0.06, which was for the approximately 20GB of content I have stored there. I do expect my next bill to be more because the average amount of data stored in R2 will be higher.

Since my installation is just a private one, I installed PostgreSQL and Redis as single instances within my k3s cluster. Both are extremely basic Bitnami-based installs using their available helm charts. PostgreSQL is backed by persistent storage provided by TrueNAS. For email, my k3s cluster runs an installation of Postfix. Postfix is configured to send email through Send In Blue, and the services I run in my cluster are configured to talk to Postfix. This gives me a single mail relay whose configuration I need to maintain.

Ingress is provided by Cloudflare and cloudflared tunnels. A tunnel runs on a different VM I have, and Cloudflare is then configured to route traffic to the Kubernetes cluster with the correct hostname included.

All said, this setup has proven reliable for me since mid December. In a future post I’ll discuss how I got my private instance to feel a bit more included in the Fediverse by adding relays. Please leave a comment if you feel I missed something or got something wrong.

WordPress, Fediverse and Caching

December 17, 2022 by Dustin Rue·3 Comments

Quick tip on a rather specific situation I found myself in though I believe it could come up for a lot of people using WordPress trying to integrate with ActivityPub networks. If you are:

Running WordPress
Using a page caching solution like Cloudflare APO or manually configured
Running an ActivityPub plugin and/or webfinger

Then you will likely run into an issue with your site not being reliably discoverable when searched for. Using Matthias Pfefferle’s ActivityPub, Webfinger and Nodeinfo plugins to get your WordPress site exposed as an ActivityPub server will add a few routes to your site. One of the routes is the author pages of WordPress which exist at /author/<author username>. However, this path when hit with a browser will return HTML. ActivityPub instances on the other hand will be looking for a different content type called application/activity+json. Unfortunately, many caching layers will not provide a Vary on Accept which you will need in order to return different data depending on what type of content the requester is looking for.

To resolve this on my site, which uses Cloudflare for CDN, I added a page rule that disallows caching for my author page. This works because I am the only author on the site. A full “proper” solution would be to set a Vary on the Accept header for that path, which Cloudflare does not support.

If you do go the Vary route, be very specific about which headers you vary on, on which paths, and which header values you actually accept. Allowing a wide or unlimited range of values makes it easy for anyone to bust the cache at the CDN and send requests straight to your origin servers.
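One way to keep Vary on Accept safe is to collapse the header into a small, fixed set of buckets before it becomes part of the cache key. This Python sketch assumes just two buckets (ActivityPub JSON and everything else); whatever arbitrary Accept value a client sends, each URL ends up with at most two cached variants.

```python
"""Sketch: vary on Accept, but only across a small fixed set of buckets."""


def accept_bucket(accept: str) -> str:
    for part in accept.split(","):
        if part.split(";", 1)[0].strip().lower() == "application/activity+json":
            return "activity"
    return "html"  # every other Accept value shares one cached variant


def cache_key(path: str, accept: str) -> tuple[str, str]:
    return (path, accept_bucket(accept))


if __name__ == "__main__":
    # A thousand junk Accept headers still map to a single cache entry.
    keys = {cache_key("/author/dustin/", f"text/x-random-{n}") for n in range(1000)}
    print(len(keys))  # prints 1
```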

SMTP Smarthost

November 26, 2022 by Dustin Rue

One of the requirements when setting up a Mastodon instance is the ability to send outgoing email. If you are running a personal instance, you can easily get away with something like MailHog, which simply captures every email being sent and presents them to you in a nice web interface. While setting up my personal Mastodon instance, I decided to set up a real smarthost/relay for my k3s cluster instead. I did this using Postfix configured to route mail through a smarthost. Search the web for details; there are plenty of how-tos explaining the process if you are not familiar with it.
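With a relay like this in place, applications in the cluster only need to know one host and port. Here is a minimal Python sketch using the standard library's smtplib and email modules. It assumes a hypothetical Service name for the Postfix relay, and it also assumes Postfix is configured to accept unauthenticated mail from the cluster network and then authenticate to the smarthost itself.

```python
"""Sketch: send mail through an in-cluster Postfix relay.

RELAY_HOST is hypothetical; Postfix is assumed to accept mail from the
cluster network without a login and to hold the smarthost credentials.
"""
import smtplib
from email.message import EmailMessage

RELAY_HOST = "postfix.mail.svc.cluster.local"  # hypothetical Service name
RELAY_PORT = 25


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(msg: EmailMessage, host: str = RELAY_HOST, port: int = RELAY_PORT) -> None:
    # No login() call: the relay, not the app, talks to the smarthost.
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    message = build_message("mastodon@example.com", "me@example.com", "Test", "Hello from the cluster")
    send_via_relay(message)
```

To try the same code against MailHog instead, pass MailHog's host and its SMTP port (1025 by default).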

In the past I would have used my Gmail account as my SMTP relay, but earlier in 2022 Gmail removed the ability to do this, so I needed to find a replacement. I ended up settling on https://www.sendinblue.com because its free tier allows 300 emails per day. Since everything I do in my cluster is personal, there is no way I'll ever hit that limit, and even if I did, I don't mind if email simply stops working until the next day. Setup was easy: I created an account (giving them a bit of information), visited the SMTP & API page, clicked SMTP, and copied my credentials into Postfix.

I am not affiliated with Sendinblue in any way; I'm just sharing something I found that lets you quickly set up an SMTP relay for free.

Getting to know Mastodon

November 25, 2022 by Dustin Rue

In this post I'm going to go over some of the things I have picked up about Mastodon over the past few weeks. Mastodon is better described by the source itself, but in very basic terms it is a distributed, or "federated", Twitter-like system made up of any number of hosts communicating with each other using a standard set of protocols. Mastodon users live on separate installations called instances, each with its own core topic or target audience. For example, I am currently on https://fosstodon.org, which is focused on bringing together people interested in Free and Open Source Software. Being on an instance does not mean you can only interact with people on that instance or that you can't go off topic. Every instance (unless blocked by the instance admin, more on that later) is connected to other instances by way of users following each other. If I follow someone on a different instance, the instance I am on becomes aware of it and begins to draw in posts from that instance. In a way, the more people follow each other across instances, the stronger the bond and the more data flows across all of them.
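A toy Python model of the follow relationship described above: the set of remote instances whose posts flow into your instance is driven entirely by whom its local users follow. Handles and domains are hypothetical, and the model ignores the actual ActivityPub delivery step, in which the remote server pushes posts to your instance's inbox.

```python
"""Toy model: which remote instances feed posts into a local instance.

Handles and domains are hypothetical.
"""


def domain_of(handle: str) -> str:
    # "@user@example.social" -> "example.social"
    return handle.rsplit("@", 1)[1].lower()


def federated_domains(local_domain: str, follows: dict[str, set[str]]) -> set[str]:
    # follows maps each local user to the handles they follow.
    domains: set[str] = set()
    for followed in follows.values():
        for handle in followed:
            domain = domain_of(handle)
            if domain != local_domain:
                domains.add(domain)
    return domains


if __name__ == "__main__":
    follows = {
        "alice": {"@bob@mastodon.example", "@carol@fosstodon.org"},
        "dave": {"@erin@another.example"},
    }
    print(sorted(federated_domains("fosstodon.org", follows)))
```

Each new cross-instance follow can only grow this set, which is the "stronger bond" effect in miniature.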

Larger instances with a lot of users will have the most diverse feeds available to you. So there is a certain advantage to being part of a large community, as the pool of people you can interact with is naturally larger. Each instance offers multiple feeds: your home feed of people you have chosen to follow, the local feed of everyone on your instance, and the federated feed containing everything the server has discovered through its users. This lets you change your social media experience on the fly. In addition to the multiple feeds, you have powerful self-moderation tools, including the ones you would expect, like muting or outright blocking users, but you can also block entire instances with a couple of clicks. Just not into people posting Waifu stuff? Block the instances dedicated to that material. Even if that content does not violate any acceptable use policy, you can quickly block entire types of content from your feed.
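The three feeds and the instance block can be expressed as simple filters over the posts an instance knows about. This is a toy Python model with hypothetical data, not Mastodon's implementation.

```python
"""Toy model of the home, local and federated feeds plus a domain block."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str  # "@user@domain"
    text: str


def domain_of(handle: str) -> str:
    return handle.rsplit("@", 1)[1].lower()


def visible(post: Post, blocked_domains: set[str]) -> bool:
    # A user-level domain block hides everything from that instance.
    return domain_of(post.author) not in blocked_domains


def home_feed(posts: list[Post], following: set[str], blocked: set[str]) -> list[Post]:
    return [p for p in posts if p.author in following and visible(p, blocked)]


def local_feed(posts: list[Post], local_domain: str, blocked: set[str]) -> list[Post]:
    return [p for p in posts if domain_of(p.author) == local_domain and visible(p, blocked)]


def federated_feed(posts: list[Post], blocked: set[str]) -> list[Post]:
    # Everything the server has discovered through its users.
    return [p for p in posts if visible(p, blocked)]


if __name__ == "__main__":
    posts = [
        Post("@a@fosstodon.org", "local post"),
        Post("@b@mastodon.example", "remote post"),
        Post("@c@blocked.example", "unwanted post"),
    ]
    print([p.text for p in federated_feed(posts, {"blocked.example"})])
```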

The most recent release of Mastodon lets you follow hashtags, so you can easily follow topics you like, as long as users tag their posts properly. This is an extra layer of awesomeness in the Mastodon system. Which brings me to one of my last points about using Mastodon: there is no "algorithm" pushing content towards you. Gone are posts that rise to the top because someone paid money to put them there. No more toxic, hot-take garbage from bots filling your feed and drowning out what you're really looking for.
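"No algorithm" boils down to a simple rule: include posts from accounts or hashtags you follow and sort them by time, nothing else. A toy Python sketch with hypothetical data; Mastodon's real home feed also deals with boosts, replies and filters, which are ignored here.

```python
"""Toy chronological feed: followed accounts plus followed hashtags, no ranking."""
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    author: str
    created_at: datetime
    tags: frozenset[str] = frozenset()


def home_feed(posts: list[Post], following: set[str], followed_tags: set[str]) -> list[Post]:
    wanted = {t.lower().lstrip("#") for t in followed_tags}
    matches = [
        p for p in posts
        if (p.author in following) or (wanted & {t.lower() for t in p.tags})
    ]
    # Newest first; no engagement score, no paid placement.
    return sorted(matches, key=lambda p: p.created_at, reverse=True)


if __name__ == "__main__":
    posts = [
        Post("@a@x.example", datetime(2022, 11, 1), frozenset({"linux"})),
        Post("@b@y.example", datetime(2022, 11, 2)),
        Post("@c@z.example", datetime(2022, 11, 3), frozenset({"Kubernetes"})),
    ]
    for post in home_feed(posts, {"@b@y.example"}, {"#kubernetes"}):
        print(post.author, post.created_at.date())
```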

Owners of Mastodon instances are encouraged to commit to the Mastodon Server Covenant, an agreement to certain terms intended to give users confidence in the Mastodon network and the instances they join. An instance that commits to the covenant agrees to, among other things, moderate content, back up the service and give users at least three months' warning before shutting down. Committing to the covenant is optional, but in exchange the instance is listed as having committed to it, giving it much better visibility to potential new users than instances that haven't.

The more technical, behind-the-scenes side of Mastodon is just as interesting. The best-known and most visible part of Mastodon is ActivityPub. ActivityPub is not unique to Mastodon; it can be implemented by any number of systems, and indeed a number of federated services already use it. ActivityPub is the part that pushes content around between users and other instances. There is even a WordPress plugin, which this site uses, that allows people to subscribe to a feed of my blog posts. The other services using ActivityPub are beyond the scope of this post.
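To give a feel for what ActivityPub actually moves around, here is a Python sketch that builds a Create activity wrapping a Note, using property names from the W3C ActivityStreams vocabulary. The URLs are hypothetical, and the "/followers" collection path is an assumed convention. A real server would POST this to followers' inboxes as application/activity+json along with an HTTP Signature, which is omitted here.

```python
"""Sketch: an ActivityStreams Create activity for a public post.

All URLs are hypothetical; delivery and HTTP Signatures are omitted.
"""
import json
from datetime import datetime, timezone

PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def create_note(actor: str, note_id: str, html: str, published: datetime) -> dict:
    note = {
        "id": note_id,
        "type": "Note",
        "attributedTo": actor,
        "content": html,
        "published": published.isoformat().replace("+00:00", "Z"),
        "to": [PUBLIC],
        "cc": [f"{actor}/followers"],  # assumed followers collection URL
    }
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"{note_id}/activity",
        "type": "Create",
        "actor": actor,
        "to": note["to"],
        "cc": note["cc"],
        "object": note,
    }


if __name__ == "__main__":
    activity = create_note(
        "https://social.example/users/alice",
        "https://social.example/users/alice/statuses/1",
        "<p>Hello, Fediverse!</p>",
        datetime(2022, 11, 25, 12, 0, tzinfo=timezone.utc),
    )
    print(json.dumps(activity, indent=2))
```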

Unlike Twitter, Mastodon can be self-hosted: the code is available at https://github.com/mastodon/mastodon, and the directions are at https://docs.joinmastodon.org/admin/prerequisites/. While I have no interest in hosting my own instance long term for a variety of reasons, I did want to understand how the software works as a whole and what it takes to host and scale it. As you can imagine, hosting a large instance is a real technical challenge, on top of the challenges of cost and moderation.

In an effort to learn how to run Mastodon, I created my own Helm chart, available at https://github.com/dustinrue/mastodon-helm-chart. My Helm chart is functional but really only for a specific setup and is not currently ready for wide use. However, it taught me a lot about the moving parts that make up a full Mastodon instance. Designed to be highly scalable, Mastodon requires just two external services (ignoring file storage): Redis and PostgreSQL. Redis is used for caching various pieces of data, while PostgreSQL stores data that must persist, including user accounts, their settings, posts and so on. You also need to provide file storage, either local storage or object storage like S3.

Mastodon itself ships with everything else you need: Sidekiq with various queues, a WebSocket-based streaming application and a web app. The web app is what users see when they access your instance, the streaming portion feeds data to your browser as it comes in, and Sidekiq handles a bunch of background tasks, including shipping your posts to subscribed instances, ingesting updates from other instances and getting images and videos ready (and pushing them into S3 if you are using it). All of the Sidekiq queues can be broken out into individual processes (which is absolutely necessary for large installations) so that more work can be done in parallel. It is very clear that the designers of Mastodon have considered each aspect of hosting a large installation and taken care to ensure it is possible. In fact, there is a sort of "brain dump" of what it takes to scale Mastodon from one of the project's primary stakeholders at https://gist.github.com/Gargron/aa9341a49dc91d5a721019d9e0c9fd11.
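Breaking Sidekiq queues out into separate processes is mostly a matter of which -q flags each process gets. This Python sketch generates those command lines and checks the layout. The queue names (default, push, ingress, mailers, pull, scheduler) reflect my understanding of Mastodon's scaling documentation and may change between releases; the process layout and concurrency values are hypothetical. The one hard rule encoded here is that the scheduler queue should be handled by a single process.

```python
"""Sketch: split Mastodon's Sidekiq queues across separate processes.

Queue names are assumed from Mastodon's scaling docs; the layout and
concurrency numbers are hypothetical.
"""

QUEUES = ("default", "push", "ingress", "mailers", "pull", "scheduler")

# Each entry becomes one Sidekiq process (for example, one Deployment).
LAYOUT = [
    {"queues": ["default"], "concurrency": 25},
    {"queues": ["push", "ingress"], "concurrency": 25},
    {"queues": ["pull", "mailers"], "concurrency": 10},
    {"queues": ["scheduler"], "concurrency": 5},  # keep this at one replica
]


def validate(layout: list[dict]) -> None:
    covered = [queue for proc in layout for queue in proc["queues"]]
    missing = set(QUEUES) - set(covered)
    if missing:
        raise ValueError(f"queues with no worker: {sorted(missing)}")
    if covered.count("scheduler") != 1:
        raise ValueError("exactly one process should handle the scheduler queue")


def sidekiq_command(proc: dict) -> str:
    # -c sets Sidekiq's thread count; each -q adds a queue to process.
    args = ["bundle", "exec", "sidekiq", "-c", str(proc["concurrency"])]
    for queue in proc["queues"]:
        args += ["-q", queue]
    return " ".join(args)


if __name__ == "__main__":
    validate(LAYOUT)
    for proc in LAYOUT:
        print(sidekiq_command(proc))
```

As far as I know, each process also needs a database connection pool (Mastodon's DB_POOL setting) at least as large as its concurrency, which is worth keeping in mind when raising -c.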

In my short time with Mastodon, it has become one of the top three most exciting pieces of technology I have interacted with in my career, the other two being Linux and Kubernetes. The fact that I can run Mastodon on the other two makes it even better. Mastodon, and the protocols that power it, are the real “web 3.0” because it gives power back to the users. Come to think of it, Mastodon is like web 1.0 where its users hold the power and are in control. It was because of the Twitter shakeup that I discovered Mastodon and I hope that the current social media climate continues to bring greater awareness to the Mastodon network and it is able to continue growing. You can find me on fosstodon.org as @[email protected].