Takahē: An efficient ActivityPub Server for small installs with multiple domains

simonw

Here are three specifically interesting things about Takahē:

1. The "multiple domains" feature. I'm running my own Mastodon instance right now purely so I can have my simonwillison.net domain as my identifier there (and protect myself from losing my identifier if the server I am using shuts down). This feels pretty wasteful! I'd much rather be able to point my domain at a Takahē instance shared with some of my friends, each with their own domains for it.

2. It's a Django app that's taking full advantage of the async features that have been added in the most recent releases of that framework. Async is a perfect match for ActivityPub due to the need to send thousands of outbound HTTP requests when publishing a message. And Takahē creator Andrew Godwin is the perfect person to build this because he's been driving the integration of async into Django for the past four years: https://www.aeracode.org/2018/06/04/django-async-roadmap/

3. The way it handles task queueing is super interesting. I've not fully got my head around it yet but it's the part of the codebase called Stator and it's modeled on things like the Kubernetes reconciliation loop - Andrew wrote a bit more about that here: https://www.aeracode.org/2022/11/14/takahe-new-server/ - Stator code is here: https://github.com/jointakahe/takahe/blob/main/stator/runner...
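To make the reconciliation-loop idea a little more concrete, here is a minimal sketch in plain Python. It is not Stator's actual code or API: the `Post` record, the state names, and the handler functions are all hypothetical. The only idea taken from the comment above is that a runner repeatedly looks at records that are not yet in their final state and tries to move each one forward, retrying later if it can't.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    """Hypothetical record whose progress is tracked in a state field."""
    content: str
    state: str = "new"
    attempts: int = 0


def handle_new(post: Post) -> Optional[str]:
    # Pretend the fan-out work was queued; move on to delivery.
    return "fanned_out"


def handle_fanned_out(post: Post) -> Optional[str]:
    # Simulate a flaky remote server: succeed only on the second attempt.
    post.attempts += 1
    return "delivered" if post.attempts >= 2 else None


# Each non-terminal state maps to a handler that returns the next state,
# or None to mean "no progress yet, try again on a later pass".
HANDLERS = {"new": handle_new, "fanned_out": handle_fanned_out}
TERMINAL = {"delivered"}


def reconcile_once(posts) -> int:
    """One pass: try to advance every record that is not in a terminal state."""
    progressed = 0
    for post in posts:
        if post.state in TERMINAL:
            continue
        next_state = HANDLERS[post.state](post)
        if next_state is not None:
            post.state = next_state
            progressed += 1
    return progressed


def run_until_settled(posts, max_passes: int = 10) -> Optional[int]:
    """Repeat passes until everything is terminal; return the number of passes used."""
    for pass_number in range(1, max_passes + 1):
        reconcile_once(posts)
        if all(p.state in TERMINAL for p in posts):
            return pass_number
    return None
```

Because the handlers only look at the record's current state, a crashed or restarted runner can pick up where it left off, which is the appeal of this style over a fire-and-forget task queue.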

samsquire

This is really interesting. Thanks for writing this comment and sharing.

Async is good for lots of IO work and managing independent tasks with low coupling.
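As a rough illustration of why async suits this kind of IO-heavy fan-out, here is a small sketch using only `asyncio`. The `deliver` function is a stand-in that sleeps instead of making a real HTTP request, and the inbox URLs and the concurrency limit are made up for the example.

```python
import asyncio


async def deliver(inbox_url: str, activity: dict) -> tuple[str, bool]:
    """Stand-in for an HTTP POST to a remote inbox (no real network I/O)."""
    await asyncio.sleep(0.01)  # simulates waiting on the network
    # Pretend any URL ending in "/down" is an unreachable server.
    return inbox_url, not inbox_url.endswith("/down")


async def fan_out(inboxes: list[str], activity: dict, limit: int = 100) -> dict[str, bool]:
    """Deliver one activity to many inboxes, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> tuple[str, bool]:
        async with semaphore:
            return await deliver(url, activity)

    results = await asyncio.gather(*(bounded(url) for url in inboxes))
    return dict(results)
```

While one delivery is waiting on its (simulated) network round trip, the event loop runs the others, so a single process can keep many requests in flight without a thread per request.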

I am interested in task scheduling and asynchronous code, and also in programming language development, parallelism, simultaneity without parallelism, and cooperative and preemptive scheduling.

As an experiment inspired by Protothreads (a C library for implementing cooperative multitasking with a switch statement) I recently implemented async/await in Java as a giant switch statement and a while loop.

Providing that each coroutine only runs once, the amount of memory used shall not grow. The goal is to be stackless.
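The switch-statement-plus-loop approach can be sketched roughly like this. This is Python rather than the Java mentioned above, and it is not the code from the linked repository: the `Counter` class and the `run` function are hypothetical names for illustration. Each coroutine keeps its resume point and local variables in fields, so nothing lives on a call stack between steps.

```python
class Counter:
    """A hand-written stackless coroutine: all state lives in fields."""

    def __init__(self, name, limit, log):
        self.name, self.limit, self.log = name, limit, log
        self.state = 0      # which "case" to resume at
        self.i = 0          # a local variable hoisted into a field
        self.done = False

    def resume(self):
        # Plays the role of the switch statement: jump to the saved resume point,
        # do one step of work, then return control to the scheduler.
        if self.state == 0:
            self.i = 0
            self.state = 1
        elif self.state == 1:
            if self.i < self.limit:
                self.log.append(f"{self.name}{self.i}")
                self.i += 1
            else:
                self.done = True


def run(coroutines):
    """The while loop: round-robin over coroutines until all have finished."""
    while any(not c.done for c in coroutines):
        for c in coroutines:
            if not c.done:
                c.resume()
```

Running `Counter("a", 2, log)` and `Counter("b", 1, log)` together interleaves their output, since each `resume` call does one step and then yields back to the loop.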

I began writing a programming language that looks similar to JavaScript but targets an imaginary interpreter that is multithreaded. I hope to think of how to represent async await so that the high level language can target the interpreter. I need to think of the code I need to codegen to implement async/await.

I played around with C++ coroutines, but someone told me that the approach I used is not C++20.

Code is at https://GitHub.com/samsquire/multiversion-concurrency-contro...

The reconciliation loop idea sounds interesting.

Klonoar

You can use your domain on Mastodon via WebFinger, you don’t need to necessarily self host it.

simonw

That's what I'm doing right now - my personal site is https://simonwillison.net/ and serves WebFinger - my Mastodon instance is https://fedi.simonwillison.net/ which is hosted by https://masto.host/ (just because I don't want to sysadmin it).

The problem is that you can't have multiple domains point to a single Mastodon instance. I'd like to share my single instance with friends who can bring their own domain name.

Basically the problem is that current Mastodon only supports single settings for the LOCAL_DOMAIN and WEB_DOMAIN.

More details on how mine works here: https://til.simonwillison.net/mastodon/custom-domain-mastodo...
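For readers who have not seen WebFinger before, here is a rough sketch of the kind of JSON document a `/.well-known/webfinger` endpoint returns. It is not the configuration from the linked write-up: the `example.com` and `social.example.com` hosts, the `alice` account, and the `/users/...` and `/@...` URL paths are assumptions chosen to resemble a Mastodon-style setup where the short domain is the handle and the instance lives on a subdomain.

```python
import json
from typing import Optional

# Hypothetical mapping: handle on the short domain -> (instance host, local username)
ACCOUNTS = {
    "acct:alice@example.com": ("social.example.com", "alice"),
}


def webfinger_response(resource: str) -> Optional[dict]:
    """Build a JRD document for a known acct: resource, or None (i.e. HTTP 404)."""
    target = ACCOUNTS.get(resource)
    if target is None:
        return None
    host, user = target
    return {
        # The handle people type in stays on the short domain...
        "subject": resource,
        # ...while the links point at the instance that actually hosts the account.
        "aliases": [f"https://{host}/@{user}", f"https://{host}/users/{user}"],
        "links": [
            {
                "rel": "http://webfinger.net/rel/profile-page",
                "type": "text/html",
                "href": f"https://{host}/@{user}",
            },
            {
                "rel": "self",
                "type": "application/activity+json",
                "href": f"https://{host}/users/{user}",
            },
        ],
    }
```

A remote server looking up `@alice@example.com` would request `https://example.com/.well-known/webfinger?resource=acct:alice@example.com` and follow the `self` link to find the actor. The single-domain limitation discussed above is about the instance side, not this lookup step.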

TaylorAlexander

There is an open GitHub issue to add that functionality! If anyone is able to help, your work would be appreciated!

https://github.com/mastodon/mastodon/issues/2668

Klonoar

Ahh, okay, this makes more sense now - appreciate the breakdown!

mwcampbell

I know of an organization that just advertised their new Mastodon instance as being at social.[domain].com. Is it too late now for them to start using WebFinger and advertise Mastodon handles at simply [domain].com?

singpolyma3

They would have to change all their addresses, but the advertise account change thing might be enough depending on need

phphphphp

Perhaps naive but is it possible to create some sort of Mastodon proxy that exists independent of any specific instance? Rather than run your own instance or point to a shared instance, a proxy could be a fairly simple system that uses DNS records (?) to route requests to the appropriate instance -- much like email.

simonw

Unfortunately that doesn't quite work with out-of-the-box Mastodon.

I'm running a bit of a proxy at https://simonwillison.net/.well-known/webfinger?resource=acc... but it still needs to point to my own dedicated instance, just because Mastodon can't have multiple domains pointed at a single instance of the software yet.

I'm using this pattern (also shared by Andrew, before he started to spin up Takahē) https://aeracode.org/2022/11/01/fediverse-custom-domains/

gkmcd

Found Andrew Godwin on mastodon: @andrew@aeracode.org

pickpuck

Ahh this is so exciting to see so much happening in this space all of a sudden! My quest to get a personal instance running has been a long slog for me personally.

I had been working on an ActivityPub server in Node.js/TypeScript for a while before the Twitter migration. It's got most of the features I'd want in a small server but it's basically bring-your-own-client at the moment.

https://github.com/michaelcpuckett/activitypub-core

Finding all the resources to build a complete server that can interact with other instances isn't easy, so maybe this can help someone. The spec is well worded, but the checklist is confusing, the test server is down, Mastodon has its own rules, etc. Plus you have to have at least a cursory knowledge of JSON-LD/RDF.

samwillis

Your project looks super interesting.

I had the idea of running a single-user server on Cloudflare Workers and using D1 (their SQLite-based db). A lightweight JS/TS implementation would be perfect. Looks like you have Postgres planned; it would probably be possible to expand from that to SQLite.

mariusor

I work on an ActivityPub server in Go that supports an sqlite backend. Check my bio for details.

youngtaff

Yeh, that’s what I’d like too (or using one of the other edge compute services)

pickpuck

Cool idea! I'll start looking into that.

anthropodie

>Ahh this is so exciting to see so much happening in this space all of a sudden!

It's like Elon unknowingly funded this space!

hunterb123

Never underestimate HN's pettiness and spite!

Now it's mainstream to work on a cool technology that's been around for a while!

Oh and everyone can act like they weren't bad mouthing the tech and saying it wasn't going to work before.

Visit any Mastodon thread here before Elon's Twitter and it's nothing but negativity.

ruined

well, there's more than one person on this website.

also, a lot of these projects have been on a slow simmer for a long time, and are only just now starting to become complete and interesting.

edit: though it does seem to be true that takahe's initial commit was nov 5 :) and personally i don't consider it complete and interesting yet

mariusor

Is the server ActivityPub Client to Server compliant?

pickpuck

I think this refers to handling JSON ActivityStreams objects at the `/outbox` endpoint for a logged-in user, and then broadcasting those out appropriately. If so, then yes that's the only API that's used. It also handles the uploadMedia endpoint and a few other details that are included in the spec.

I have tried unsuccessfully so far to set up an OAuth provider server along with it, so that you could log in on your phone, etc.
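The outbox behaviour described above can be sketched as follows. This is not code from activitypub-core; `prepare_outbox_activity` is a hypothetical helper, and it skips things a real server must do (authentication, assigning `id`s, validation, delivery). It only shows the wrapping rule from the client-to-server part of the ActivityPub spec: a bare object posted to the outbox gets wrapped in a `Create` activity, with its addressing copied over.

```python
ACTIVITY_TYPES = {"Create", "Update", "Delete", "Follow", "Like", "Announce", "Undo"}
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def prepare_outbox_activity(submitted: dict, actor_id: str) -> dict:
    """Return the activity a server would store for something POSTed to an outbox."""
    if submitted.get("type") in ACTIVITY_TYPES:
        # Already an activity: pass it through (copied so the input is not mutated).
        activity = dict(submitted)
    else:
        # A bare object such as a Note: wrap it in a Create activity.
        obj = dict(submitted, attributedTo=actor_id)
        activity = {
            "@context": "https://www.w3.org/ns/activitystreams",
            "type": "Create",
            "object": obj,
        }
        # Addressing on the object is copied onto the wrapping activity.
        for key in ("to", "bto", "cc", "bcc", "audience"):
            if key in obj:
                activity[key] = obj[key]
    activity["actor"] = actor_id
    return activity
```

For example, posting `{"type": "Note", "content": "hi", "to": [PUBLIC]}` as a hypothetical actor `https://example.com/users/alice` yields a `Create` whose `object` is the note and whose `to` is the same public collection.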

mariusor

That is great news, one of the few implementations that does it. :D Do you have a demo server set up anywhere? Mine (based on my own activitypub code) is at https://federated.id :D

solarkraft

> Features on the long-term roadmap:
> “Since you were gone” optional algorithmic timeline

That's exciting! The fediverse is severely lacking algorithmic curation, presumably due to the belief that it's inherently evil (I'd strongly disagree; it's merely the algorithm not being user-controllable that's bad).

dewey

Fully agree, the algorithmic timeline (sprinkling in some likes and comments from other people that might be interesting to you) is one of Twitter's best features even if many people (who mostly use third party clients for that reason) would not agree.

Ciantic

Do you people know that "# Explore" section on the right of Mastodon already lists posts which are gaining traction? It also lists news which are trending.

It says:

"These posts from this and other servers in the decentralized network are gaining traction on this server right now."

I don't know what the logic is, but on big servers it's listing a lot of content.

dewey

That's very different from the Twitter timeline though, which shows you "good" content from people you already follow that happened since the last time you used Twitter. So if you refresh a bunch of times you'll always see more interesting tweets / likes / comments.

charcircuit

I don't think those results are personalized which makes them worthless compared to twitter's suggestions.

cxr

The word "algorithm" has suffered wild semantic drift at the hands of journalists. Let's see if we can start to fix that now by making sure that on HN and in adjacent communities of all places we use the appropriate words for the thing we are talking about.

We are talking about heuristics here, not algorithms.

dewey

I would say that ship has sailed, just like "crypto" doesn't mean cryptography in the previous sense anymore.

charcircuit

Algorithm is the correct word. Why don't you think it is?

Tepix

Looks interesting.

Why does it need a Postgresql server? For just a handful of users, isn't sqlite the leaner, yet sufficient choice?

How does it compare to GoToSocial, which requires 50-100MB of RAM? They are also in alpha stage and i like their approach of keeping the web UI separate.

andrewgodwin

Author here - it's just to reduce support surface area. I know I'll need PostgreSQL's full text indexing and GIN indexes for hashtags/search eventually, and I probably also want to use some of the upsert and other specialised queries, and it's easier to just target one DB I know is very capable.

For reference, when I say "small to medium", in my head that means "up to about 1,000 people right now".

actionfromafar

That sounds like a very low number; I would never have guessed. Is Mastodon so heavy?

olliej

People were getting priced out of hosting an instance with "only" 10-20k users, and the instance hosting services quote <= 4k users with the 4k end being >$US100/month. With the "low end" 1-200 user instances having 4 cores, 5 TB of monthly bandwidth, etc.

The general sense I have got is that mastodon - the default software at least - is extremely resource heavy for relatively low user counts. My assumption/hope was that the bulk of this is that the server software hasn't ever really been under sufficient pressure to improve, and takahē seems to indicate that there's at least some room for improvement on the server side (i.e. performance problems aren't entirely protocol/architecture problems)

UncleEntity

I was poking into this a bit yesterday.

Is there any advantage to using a traditional db as opposed to a graph db since JSON-LD is just a text representation of graph nodes?

I was thinking the easiest path would be to have the server deal with all the ActivityPub stuff and expose something like a GraphQL interface for a bring your own client implementation. Of all the stuff they shoehorned GraphQL into this seems like a valid fit, like they were made for each other.

Anyhoo, just my random thoughts…

manfre

For better or worse, many servers are targeting Mastodon API compatibility to be able to leverage the existing clients. Adding GraphQL increases surface area without solving the bigger issue of creating the clients.

nielsole

if they don't have PSQL specific queries, it might be a trivial change: https://github.com/jointakahe/takahe/blob/main/takahe/settin...

simonw

I tried swapping that for SQLite and successfully ran the test suite about a week ago, but I've not tried that again against the large number of more recent changes.
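For context, this is roughly what that kind of swap looks like in a generic Django settings module. It is a sketch only: it does not reproduce Takahē's actual settings file, and the database file name is a made-up placeholder.

```python
# Generic Django database setting pointing at SQLite instead of PostgreSQL.
# "django.db.backends.sqlite3" is Django's built-in SQLite backend;
# the file name below is a hypothetical placeholder.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "takahe.sqlite3",
    }
}
```

Whether the rest of an application keeps working after a change like this depends on whether it uses any PostgreSQL-only features, which is the open question in this sub-thread.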

rch

I wonder if Postlite would work.

Actually that looks more like an interactive client.... https://news.ycombinator.com/item?id=30875837

scrollaway

SQLite is magical and incredibly lean, but it is not leaner than Postgres if you need real database features. You end up reimplementing a lot of features in code that belong in the db.

simonw

What kind of features are you talking about here?

This doesn't match my experience from the last few years. SQLite in WAL mode is extremely capable.

The only thing I really miss from PostgreSQL is that PostgreSQL has more built-in functions for things like date handling - but SQLite custom functions are very easy to register when you need them.
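A small sketch of both points, using only Python's standard `sqlite3` module: switching a database file to WAL mode, and registering a custom SQL function for a bit of date handling. The `days_between` function and the `open_db` helper are names invented for this example.

```python
import os
import sqlite3
import tempfile
from datetime import date


def days_between(start: str, end: str) -> int:
    """Whole days from one ISO date string (YYYY-MM-DD) to another."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL mode is a property of the database file, so it needs an on-disk file.
    conn.execute("PRAGMA journal_mode=WAL")
    # Make the Python function callable from SQL as days_between(a, b).
    conn.create_function("days_between", 2, days_between)
    return conn


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = open_db(db_path)
    row = conn.execute("SELECT days_between('2022-11-05', '2022-11-16')").fetchone()
    print(row[0])
    conn.close()
```

The custom function only exists on connections that register it, so it has to be set up each time the application opens the database.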

scrollaway

Constraints and validation for example. Efficient json store. Etc

mg

Nice to see a Python/Django implementation of ActivityPub. Having a nice, lean implementation of ActivityPub that I can customize to my liking is the only thing that keeps me from using the Fediverse more regularly. So I am watching the space closely.

What I find a bit unfortunate about Takahe is the coupling with Docker.

An even leaner ActivityPub implementation seems to be MicroBlogPub. I have not yet managed to set it up though.

Anybody interested in collaborating on a MicroBlogPub install script that turns a fresh Ubuntu installation (or container) into a running MicroBlogPub instance?

simonw

It's not coupled with Docker. Docker is purely one suggested way of running it - it's a classic Django app so running it directly on Ubuntu should work the same as any other Django application.

mg

Great!

When I saw "Prerequisites: Something that can run Docker/OCI images" in the documentation, my interpretation was that containers are needed. It also says "You need to run at least two copies of the Docker image". Maybe you want to change the wording a bit.

I would also collaborate on writing a setup script for Takahe then!

I really like to write a setup script instead of following manual installation guides. So for every software I try, my first step is to write a script that turns a fresh Debian installation into a running instance. (MicroBlogPub needs Python 3.10 which is not in Debian stable, so I would use Ubuntu)

mg

Hmm.. does not look good for the non-Docker setup. The developer replied with "I am deliberately avoiding offering a non-Docker install path" and closed the issue:

https://github.com/jointakahe/takahe/issues/44

Creating a non-Docker fork would then probably be an uphill battle.

rodgerd

> What I find a bit unfortunate about Takahe is the coupling with Docker.

While I don't love it, it's very understandable for a single-dev application. Anything else involves blizzards of questions and bugs filed against people using their distro version of Django vs their downloaded version of Django and the many versions of distros and the many conventions for Python environments and...

Exhausting.

Robotbeat

Are there any ActivityPub benchmarks to compare various implementations of Mastodon-compatible instances? Ie written in different languages, etc.

For instance, Go seems to be around an order of magnitude faster than Ruby, and I think I've seen a Golang implementation of ActivityPub somewhere. https://programming-language-benchmarks.vercel.app/go-vs-rub...

infotogivenm

BSD3, yay - even though I’m worthless at python, love to see an implementation that is not AGPL

anecdotal1

Surprised it hasn't been attacked over this yet as there's so much needless hand wringing about anything non-AGPL being a threat to the anti-capitalism views of the Fediverse

datalopers

Django does many things but “efficient” is not one of them.

Nasreddin_Hodja

> Django does many things but “efficient” is not one of them.

It depends on how you code. I wrote a user instance in Django and I'm happy with its performance.

simonw

Mastodon currently needs 2GB of RAM. Takahē can run in a lot less than that.

Robotbeat

Can I run it on a Raspberry Pi 3 with 1GB of RAM?

actionfromafar

Let's find out.

scrollaway

You’re wrong. Just because it’s not the absolute peak of efficiency, written in C with asm routines to talk to the db, doesn’t mean it’s not efficient.

datalopers

https://www.techempower.com/benchmarks/#section=data-r21&tes...

Django ranks #137 out of #142 across numerous web frameworks and languages. It’s literally one of the least performant options that exist.

scrollaway

Performance and efficiency aren’t the same thing. Django does a lot of things other frameworks ranked here don’t do.

Such framework rankings are also utterly irrelevant when you want something widely used enough to easily find contributors and integrations. That restricts you quite a bit more than “any so called framework that just handles http”.

Did you even look at the top performers on that page? This is number 2: https://github.com/Xudong-Huang/may_minihttp

college_physics

Mastodon is Ruby on Rails, which is just slightly more "efficient" than Django according to those benchmarks

college_physics

+1 for adopting django. with django's roots in journalism it somehow feels a natural building block of federated information exchange

nandalism

I see honk hasn't been mentioned on this thread. It's also an activitypub server which is very lightweight (golang) and easy to set up your own server. https://humungus.tedunangst.com/r/honk

mdaniel

https://humungus.tedunangst.com/r/honk/v/tip/f/web.go reads like it has either been sent through an obfuscator or is in the church of templeOS

danielheath

It's unfortunate, because Honk appears to be well designed otherwise, but I found it difficult enough to grok the idiosyncratic naming conventions that I gave up.

Terretta

> butwhatabout(mdaniel)

I see the sibling comment about obfuscation, but not sure I follow either of you. Is this code not clear?

To me the code reads with humor and creativity, while every bit as self-evident as a Gary Larson FarSide cartoon on second glance. I mean, what else is nomoroboto going to do than what it does?

I've never seen this tone in the wild before, but got a kick out of it, might even find it refreshing maintaining it.

mdaniel

squints I can't tell if this is trolling or not

Anyway, you're right, all code should be written in haiku form, to maximize creativity and succinctness, plus keeping methods short! True elite coders ensure variable names are always a prime number of characters

robga

Brief discussion of Takahe in the TalkPython podcast here https://youtu.be/LhBfMoR3bvI?t=2369

totetsu

Takahē are interesting birds too. There is a related bird, the pūkeko, which was also blown to NZ but at a different time. The pūkeko has relatives in Australia and South America also. The takahē was thought to be extinct at one point, because of predation by introduced pests and introduced deer eating the grasses they rely on for food. Now there is a population of about 400?