Takahē: An efficient ActivityPub Server for small installs with multiple domains
116 comments·November 24, 2022
This is really interesting. Thanks for writing this comment and sharing.
Async is good for lots of IO work and managing independent tasks with low coupling.
I am interested in task scheduling and asynchronous code: programming language development, parallelism, concurrency without parallelism, and cooperative and preemptive scheduling.
As an experiment inspired by Protothreads (a C library for implementing cooperative multitasking with a switch statement) I recently implemented async/await in Java as a giant switch statement and a while loop.
Provided that each coroutine only runs once, the amount of memory used does not grow. The goal is to be stackless.
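The switch-and-loop trick described above can be sketched like this. This is not the commenter's Java code, just a minimal Python illustration of the Protothreads idea: each task saves a resume label instead of a stack, and a plain loop plays scheduler.

```python
class ProtoTask:
    """Stackless cooperative task: resumes from a saved state label,
    playing the role of Protothreads' switch statement."""
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.state = 0          # resume point, i.e. the "switch" label

    def step(self):
        """Run one slice; return False when the task is finished."""
        if self.state == 0:
            self.log.append(f"{self.name}: part 1")
            self.state = 1      # "await": record where to resume...
            return True         # ...and hand control back to the scheduler
        elif self.state == 1:
            self.log.append(f"{self.name}: part 2")
            self.state = 2
            return True
        return False            # done; no per-task stack was ever kept

log = []
tasks = [ProtoTask("a", log), ProtoTask("b", log)]
while tasks:                    # round-robin cooperative scheduler
    tasks = [t for t in tasks if t.step()]
```

Because each task holds only a small integer of state, memory stays flat no matter how many steps the tasks interleave.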
I played around with C++ coroutines, but someone told me that the approach I used is not the C++20 way.
The reconciliation loop idea sounds interesting.
You can use your domain on Mastodon via WebFinger; you don't necessarily need to self-host it.
That's what I'm doing right now - my personal site is https://simonwillison.net/ and serves WebFinger - my Mastodon instance is https://fedi.simonwillison.net/ which is hosted by https://masto.host/ (just because I don't want to sysadmin it).
The problem is that you can't have multiple domains point to a single Mastodon instance. I'd like to share my single instance with friends who can bring their own domain name.
Basically the problem is that current Mastodon only supports a single setting each for LOCAL_DOMAIN and WEB_DOMAIN.
More details on how mine works here: https://til.simonwillison.net/mastodon/custom-domain-mastodo...
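For anyone curious what that hookup involves: the personal domain just has to answer `/.well-known/webfinger` with a JSON document pointing at the account on the real instance. A minimal sketch, with illustrative handles and URLs rather than the actual configuration:

```python
# Minimal WebFinger responder sketch (handles and URLs are illustrative,
# not the real setup described in the comment above).
import json

def webfinger(resource):
    """Answer GET /.well-known/webfinger?resource=acct:user@example.com."""
    if resource != "acct:user@example.com":
        return None  # unknown account -> the view would return a 404
    return json.dumps({
        "subject": "acct:user@example.com",
        "links": [{
            "rel": "self",
            "type": "application/activity+json",
            # Point at the account on the instance that actually hosts it:
            "href": "https://fedi.example.com/users/user",
        }],
    })
```

The result is that `@user@example.com` resolves even though the Mastodon software itself runs on a different domain.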
There is an open GitHub issue to add that functionality! If anyone is able to help your work would be appreciated!
Ahh, okay, this makes more sense now - appreciate the breakdown!
I know of an organization that just advertised their new Mastodon instance as being at social.[domain].com. Is it too late now for them to start using WebFinger and advertise Mastodon handles at simply [domain].com?
They would have to change all their addresses, but advertising the account move might be enough, depending on their needs.
Perhaps naive but is it possible to create some sort of Mastodon proxy that exists independent of any specific instance? Rather than run your own instance or point to a shared instance, a proxy could be a fairly simple system that uses DNS records (?) to route requests to the appropriate instance -- much like email.
Unfortunately that doesn't quite work with out-of-the-box Mastodon.
I'm running a bit of a proxy at https://simonwillison.net/.well-known/webfinger?resource=acc... but it still needs to point to my own dedicated instance, just because Mastodon can't have multiple domains pointed at a single instance of the software yet.
I'm using this pattern (also shared by Andrew, before he started to spin up Takahē) https://aeracode.org/2022/11/01/fediverse-custom-domains/
Found Andrew Godwin on mastodon: @firstname.lastname@example.org
Your project looks super interesting.
I had the idea of running a single user server on CloudFlare Workers and using D2 (their SQLite based db). A light weight JS/TS implementation would be perfect. Looks like you have Postgres planned, it would probably be possible to expand from that to SQLite.
I work on an ActivityPub server in Go that supports an sqlite backend. Check my bio for details.
Yeh, that’s what I’d like too (or using one of the other edge compute services)
Cool idea! I'll start looking into that.
>Ahh this is so exciting to see so much happening in this space all of a sudden!
It's like Elon unknowingly funded this space!
Never underestimate HN's pettiness and spite!
Now it's mainstream to work on a cool technology that's been around for a while!
Oh and everyone can act like they weren't bad mouthing the tech and saying it wasn't going to work before.
Visit any Mastodon thread here from before Elon's Twitter acquisition and it's nothing but negativity.
well, there's more than one person on this website.
also, a lot of these projects have been on a slow simmer for a long time, and are only just now starting to become complete and interesting.
edit: though it does seem to be true that takahe's initial commit was nov 5 :) and personally i don't consider it complete and interesting yet
Is the server ActivityPub Client to Server compliant?
I think this refers to handling JSON ActivityStreams objects at the `/outbox` endpoint for a logged-in user, and then broadcasting those out appropriately. If so, then yes that's the only API that's used. It also handles the uploadMedia endpoint and a few other details that are included in the spec.
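For reference, the client-to-server flow boils down to POSTing an ActivityStreams object to the actor's outbox. A hedged sketch of the payload shape (actor names and URLs are made up for the example):

```python
# Illustrative shape of a client-to-server post to an actor's /outbox
# (the actor and URLs here are invented for the example).
import json

create_note = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://example.com/users/alice",
    "to": ["https://www.w3.org/ns/activitystreams#Public"],
    "object": {
        "type": "Note",
        "content": "Hello, fediverse!",
    },
}

body = json.dumps(create_note)
# A client would POST this to /users/alice/outbox with
# Content-Type: application/ld+json; profile="https://www.w3.org/ns/activitystreams"
```

The server then assigns IDs, stores the Note, and fans the activity out to followers' inboxes.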
I have tried unsuccessfully so far to set up an OAuth provider server along with it, so that you could log in on your phone, etc.
Fully agree: the algorithmic timeline (sprinkling in some likes and comments from other people that might be interesting to you) is one of Twitter's best features, even if many people (who mostly use third-party clients for that reason) would not agree.
Do people know that the "# Explore" section on the right in Mastodon already lists posts that are gaining traction? It also lists trending news.
"These posts from this and other servers in the decentralized network are gaining traction on this server right now."
I don't know what the logic is, but on big servers it's listing a lot of content.
That's very different from the Twitter timeline though, which shows you "good" content from people you already follow that happened since the last time you used Twitter. So if you refresh a bunch of times you'll always see more interesting tweets / likes / comments.
I don't think those results are personalized, which makes them worthless compared to Twitter's suggestions.
The word "algorithm" has suffered wild semantic drift at the hands of journalists. Let's see if we can start to fix that now by making sure that on HN and in adjacent communities of all places we use the appropriate words for the thing we are talking about.
We are talking about heuristics here, not algorithms.
I would say that ship has sailed, just like "crypto" doesn't mean cryptography in the previous sense anymore.
Algorithm is the correct word. Why don't you think it is?
Author here - it's just to reduce support surface area. I know I'll need PostgreSQL's full text indexing and GIN indexes for hashtags/search eventually, and I probably also want to use some of the upsert and other specialised queries, and it's easier to just target one DB I know is very capable.
For reference, when I say "small to medium", in my head that means "up to about 1,000 people right now".
That sounds like a very low number; I would never have guessed it. Is Mastodon really that heavy?
People were getting priced out of hosting an instance with "only" 10-20k users, and the instance hosting services quote at most 4k users, with the 4k end being >US$100/month and the "low end" 1-200 user instances having 4 cores, 5 TB of monthly bandwidth, etc.
The general sense I have is that Mastodon - the default software, at least - is extremely resource-heavy for relatively low user counts. My assumption/hope is that this is largely because the server software has never really been under sufficient pressure to improve, and Takahē seems to indicate that there's at least some room for improvement on the server side (i.e. the performance problems aren't entirely protocol/architecture problems).
I was poking into this a bit yesterday.
Is there any advantage to using a traditional DB as opposed to a graph DB, since JSON-LD is just a text representation of graph nodes?
I was thinking the easiest path would be to have the server deal with all the ActivityPub stuff and expose something like a GraphQL interface for a bring-your-own-client implementation. Of all the stuff GraphQL has been shoehorned into, this seems like a valid fit - like they were made for each other.
Anyhoo, just my random thoughts…
For better or worse, many servers are targeting Mastodon API compatibility to be able to leverage the existing clients. Adding GraphQL increases surface area without solving the bigger issue of creating the clients.
If they don't have Postgres-specific queries, it might be a trivial change: https://github.com/jointakahe/takahe/blob/main/takahe/settin...
I tried swapping that for SQLite and successfully ran the test suite about a week ago, but I've not tried that again against the large number of more recent changes.
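For anyone who wants to try the same swap, it's just the standard Django DATABASES setting - something along these lines (the path is illustrative, and Takahē's actual settings module may organize this differently):

```python
# settings.py fragment: pointing Django at SQLite instead of PostgreSQL.
# The file path is illustrative; adjust to your deployment.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "/data/takahe.sqlite3",
    }
}
```

Everything going through the ORM then works unchanged; only raw Postgres-specific SQL (GIN indexes, full-text search) would need attention.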
I wonder if Postlite would work.
Actually that looks more like an interactive client.... https://news.ycombinator.com/item?id=30875837
SQLite is magical and incredibly lean, but it is not leaner than Postgres if you need real database features. You end up reimplementing a lot of features in code that belong in the db.
What kind of features are you talking about here?
This doesn't match my experience from the last few years. SQLite in WAL mode is extremely capable.
The only thing I really miss from PostgreSQL is that PostgreSQL has more built-in functions for things like date handling - but SQLite custom functions are very easy to register when you need them.
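Registering such a function takes a couple of lines with Python's standard `sqlite3` driver. A small sketch filling a date-handling gap (the function name `iso_year` is my own, not a built-in):

```python
# Registering a custom SQL function with Python's stdlib sqlite3 driver,
# to cover a gap like PostgreSQL's richer date handling.
import sqlite3
from datetime import datetime

def iso_year(ts):
    """Extract the year from an ISO-8601 timestamp string."""
    return datetime.fromisoformat(ts).year

conn = sqlite3.connect(":memory:")
conn.create_function("iso_year", 1, iso_year)  # name, arg count, callable

year, = conn.execute(
    "SELECT iso_year('2022-11-24T12:00:00+00:00')"
).fetchone()
```

Once registered, the function is usable anywhere in SQL on that connection, including inside WHERE clauses and indexes-on-expressions.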
Constraints and validation for example. Efficient json store. Etc
It's not coupled with Docker. Docker is purely one suggested way of running it - it's a classic Django app so running it directly on Ubuntu should work the same as any other Django application.
When I saw "Prerequisites: Something that can run Docker/OCI images" in the documentation, my interpretation was that containers are needed. It also says "You need to run at least two copies of the Docker image". Maybe you want to change the wording a bit.
I would also collaborate on writing a setup script for Takahe then!
I really like to write a setup script instead of following manual installation guides. So for every software I try, my first step is to write a script that turns a fresh Debian installation into a running instance. (MicroBlogPub needs Python 3.10 which is not in Debian stable, so I would use Ubuntu)
> What I find a bit unfortunate about Takahe is the coupling with Docker.
While I don't love it, it's very understandable for a single-dev application. Anything else involves blizzards of questions and bugs filed against people using their disto version of Django vs their downloaded version of Django and the many versions of distros and the many conventions for Python environments and...
> Django does many things but “efficient” is not one of them.
It depends on how you code. I wrote a single-user instance in Django and I'm happy with its performance.
Mastodon currently needs 2GB of RAM. Takahē can run in a lot less than that.
You’re wrong. Just because it’s not the absolute peak of efficiency, written in C with asm routines to talk to the db, doesn’t mean it’s not efficient.
Django ranks #137 out of 142 across numerous web frameworks and languages. It's literally one of the least performant options that exist.
Performance and efficiency aren’t the same thing. Django does a lot of things other frameworks ranked here don’t do.
Such framework rankings are also utterly irrelevant when you want something widely used enough to easily find contributors and integrations. That restricts you quite a bit more than “any so called framework that just handles http”.
Did you even look at the top performers on that page? This is number 2: https://github.com/Xudong-Huang/may_minihttp
Mastodon is Ruby on Rails, which is just slightly more "efficient" than Django according to those benchmarks.
https://humungus.tedunangst.com/r/honk/v/tip/f/web.go reads like it has either been sent through an obfuscator or was written in the Church of TempleOS.
Yes, I wrote the same thing here: http://blog.deckc.hair/2022-11-19-ted-unangsts-golang-obfusc...
It's unfortunate, because Honk appears to be well designed otherwise, but I found it difficult enough to grok the idiosyncratic naming conventions that I gave up.
I see the sibling comment about obfuscation, but not sure I follow either of you. Is this code not clear?
To me the code reads with humor and creativity, while being every bit as self-evident as a Gary Larson Far Side cartoon on second glance. I mean, what else is nomoroboto going to do than what it does?
I've never seen this tone in the wild before, but got a kick out of it, might even find it refreshing maintaining it.
*squints* I can't tell if this is trolling or not
Anyway, you're right, all code should be written in haiku form, to maximize creativity and succinctness, plus keeping methods short! True elite coders ensure variable names are always a prime number of characters