You don't need a CRDT to build a collaborative experience

Daily Digest email

Get the top HN stories in your inbox every day.

jitl

I agree broadly with the article’s position but I think locks are more harmful than helpful. When I was a Quip user (2018) it was super frustrating to get locked out of a paragraph because someone’s cursor idled there. Instead just allow LWW overwrites. If users have contention and your sync & presence is fast, they’ll figure it out pretty quick, and at most lose 1-2 keystrokes, or one drag gesture, or one color pick.

Notion is “collaborative” and we don’t use a CRDT for text, it’s all last-write-wins decided by the server. However our LWW texts are individually small - one block/paragraph in size - and adding/moving/removing blocks is intention-preserving if not perfectly convergent.

As the article says, the downside for LWW is that “offline” / async collaboration isn’t so great. That’s why we’re working on switching to CRDT for our texts. If you’re interested in bringing CRDTs to a product with a lot of users, consider joining Notion’s Docs team - https://boards.greenhouse.io/notion/jobs/5602426003 / @jitl on Twitter / jake@makenotion.com

zknill

So Last-Write-Wins (LWW) basically _is_ a CRDT, but not in the sense that anyone really expects, because they aren't that useful or intention preserving. Especially if the two writes happen in very quick succession / concurrently.

LWW becomes useful if you can:

a) help humans to see who is doing what on a doc

b) reduce the size of the change that is LWW

As you've said:

> However our LWW texts are individually small - one block/paragraph in size

This is really important, because by reducing the size of the individual element that any single user is updating you can also reduce the chance of conflicts. With a small amount of contextual feedback (like the notion icon next to the block) a lot of the problem cases are just avoided.

Clearly locking and updating the entire document would be terrible, but if you can do it on a small enough scope that others can change other elements, it can work really well.

(If you've worked on the notion things you're describing then I'm sure you know this better than I do, but just spelling it out really clearly.)

jitl

You can have a LWW CRDT, but not every LWW is a CRDT. LWW CRDTs generally pick a winner based on causal order which is convergent, the C in CRDT, because every peer receiving the same ops in any order would pick the same winner.

Picking a winner based on wall clock time order (as suggested in the article, and implemented by Notion) is not convergent; if peers used that algorithm to apply ops I they would not converge to the same state. Instead you need an authority (your server) to essentially pick an arbitrary order of ops and clients just have to deal with it.

A practical example is to consider three users (A, B, C) working on a document.

1. A, B, C start online working together.

2. C goes offline

3. A, B stay online, and make 100s of changes.

4. C makes a few changes while offline (not sent to other peers yet)

5. C comes back online and sends their changes to (the server / other peers).

In wall-clock LWW, C's changes will overwrite A & B's changes, even though A & B have done a lot more work.

In a causal ordering CRDT implementing LWW, C's changes will "lose" to A & B's changes, since they were actually made "earlier" in causal ordering.

> Clearly locking and updating the entire document would be terrible, but if you can do it on a small enough scope that others can change other elements, it can work really well.

I'm sure good UX is possible with locks, but I haven't used one yet for document editing. I'm still traumatized from Quip which did per-paragraph (per block?) locking and was really annoying. Eventually they added an unlock button next to the block so anyone could "steal" the lock at will due to all the user complaints, but at that point, why put it in at all?

maclockard

I understand what you are saying here in terms of the difference between using wall-clock or causal ordering to determine who 'wins' for LWW. However, both of these strategies seem convergent to me? In any case, all clients will agree on whose changes win.

1. With wall-clock decided by clients, A + B changes will win since C's wall-time is earlier (yes, C could lie, but still would converge).

2. With wall-clock decided by server C will win and everyone will agree.

3. With causal ordering, everyone will agree that A + B won.

2 is not a CRDT since it requires a central server, but I think 1 would still count? Or stated another way: I'm not sure the _convergence_ is what determines if these strategies are CRDTs or not, but rather whether or not this decision making is _distributed_ or not.

asib

> ...based on causal order which is convergent, the C in CRDT...

Doesn't the C stand for conflict-free? I suppose both are kind of getting at the same idea though.

undefined

[deleted]

ccorcos

I agree!

If people clobber each others updates by typing at the same time in the same input, at least they can understand what happened.

That’s much better than not being able to do something because someone left their cursor on something and walked away…

Summerbud

I feel the same in the paragraph. But in some situation it seems rational

Figma still limits you from editing the component if someone is on it. And Figjam did that too. In my mind, this is a good practice in the realm of collaborative design. Because it will be very messy if a single component obeys the last-write-win rule

undefined

[deleted]

undefined

[deleted]

btown

Are you considering having a CRDT for each text block individually, or moving to a CRDT for the entire data model for a document? Really curious about the design approach here, especially insofar as there's now an external API that the data models need to service!

jitl

It’s Complicated (tm), a hybrid of the two. Text content across multiple CRDT enabled blocks may be connected in a single logical CRDT, but the segment visible in any given block is stored in that block object, which maintains our permission system invariants. We’ll still support moving blocks to different pages, so one page may have a few different CRDTs visually interleaved.

All the edge cases are very interesting, like what happens if I use backspace to join a block of CRDT B into block of CRDT A.

API consumers are unaffected. Likely we’ll support API updates of text as best-effort diff-match-patch of the text on our side applied as CRDT ops.

npunt

always great hearing your perspective on the notion text engine jake, thanks for taking the time to explain everything!

lewisjoe

> everyone’s gonna say “but hey, google docs uses operational transform not CRDTs”.. OK yes, but you are not google.

Well, google docs works not because they somehow figured out how to converge OT edits with as much precision as CRDTs do, but simply because they have a central server which orders edits anyway and don't need true leader-less convergence.

In fact, I agree not many things don't need a CRDT. CRDTs help with mathematical rigidity of convergence when you want true peer-2-peer collaboration which works without any central authority.

However, most apps anyway work on top of a central authority (example SaaS apps) so there is no real reason to accomodate all the compexity that comes with CRDT. They might get far with a simpler OT based model + central server based ordering.

For example even Figma doesn't call its model a 100% pure CRDT. It's a partial, simpler CRDT implemented with an assumption that there's going to be a server that understands ordering.

It's the same with Google Docs. They don't need a CRDT because it's a cloud app after all, which means OT is more convenient with major heavylifting (ordering and conflict handlings) outsourced to the server.

josephg

Yeah. Text based OT is pretty simple to implement, too. It’s about 200 lines of code, plus a simple network protocol. It’s fast and relatively straightforward to code up, and unlike text CRDTs it’s pretty fast by default.

I use it as my standard test when trying out a new programming language. It was unexpectedly ugly in go because of go’s lack of enums & unions, and that’s one of the big reasons I never got in to programming Go.

jay-aye-see-key

Operational transforms are one of those interesting technologies I’ve wanted to learn by implementing but haven’t made the time yet. I also didn’t realise it could be implemented in that little code.

Can you recommend any learning resources for implementing an OT? (Ideally typescript, python, or rust)

dugmartin

This single file shows the entire set of OT transformations (retain, insert, delete):

https://github.com/Operational-Transformation/ot.js/blob/mas...

and this is a good post outlining the basics of OT, from the creator of CodeMirror:

https://marijnhaverbeke.nl/blog/collaborative-editing-cm.htm...

jahfer

I haven’t explored this space in a while, but I have a couple of examples that could be helpful. A Clojure library of mine [0] has a decent README with some background reading on how to use operational transform.

I also reimplemented it in a surprisingly tiny amount of OCaml [1] which was a fun way to learn that language :)

[0]: https://github.com/jahfer/othello [1]: https://github.com/jahfer/othello-ocaml

Reinmar

> Yeah. Text based OT is pretty simple to implement, too. It’s about 200 lines of code, plus a simple network protocol.

Just for clarification, while OT for plain-text (linear model) might be simple to implement, OT for typical rich-text editors (like Google Docs) that need to rely on a tree-structured data model is a whole different story: https://ckeditor.com/blog/lessons-learned-from-creating-a-ri....

lewisjoe

Well, interestingly josephg (parent commentor) was part of the team that made the original gdocs editor. And I guess he worked on the Google Wave OT implementation.

tkone

nah, that’s not true at all. have a look at ‘rich-text’[1] which allows for transforms on metadata in a separate stream from the main content. it’s the same basic algo used for OT on plain text.

(i was the cto at a startup which used this to create a multi-user text editor with rich text support in 2015ish)

1: https://github.com/ottypes/rich-text

MontagFTB

One of the main points of this article is to "just use locks", which glosses over a lot of technical complications about locking elements within a document. How long is the lock held? Can it be stolen from a user who has gone offline, or is still online but off to lunch, and we _really_ need to make this change before the presentation in an hour? What if the user comes back online and they have changes, but the lock was stolen - how are those changes reconciled to the document?

I am generally in favor of simpler is better, and if there is a way to build a collaborative experience without using CRDTs, then go for it. However, sometimes the cure can be worse than the disease, and solutions like locking may introduce more technical complexity than originally thought.

zknill

You're absolutely right, locking is a hard problem. Especially when you get into the edge cases of clients not disconnecting cleanly.

If you've ever used one of those awful early-2000s Microsoft word collaboration systems where you have to check-out the doc, and remember to check it back in, and no one can use it until you've checked it back in... it's awful!

I'm not directly in this team, but one of the teams at my company have been working on this problem. They call it "Spaces", and one of the features solves this component locking problem.

https://ably.com/examples/component-locking

chii

By exploring that locking and unlocking mechanism, you will find that the logical conclusion in the end, when enough complexity and edge cases get covered/fixed as bugs, that it turns into a crude form of "CRDT" (where it's not actually consistent, but merges within reason for 99% of use cases).

It might as well have been CRDT from the get go.

undefined

[deleted]

dsmmcken

> How long is the lock held?

For however long the user has a browser focus state on the element seems like a reasonable answer, and submit changes as they are made. However, I don't know how you resolve conflicts of two users simultaneously attempting to acquire a lock.

filleokus

The server must keep track of the locks, and it can only know about the lock being released if the client tells it. E.g by sending a message that the field is not focused any more. The tricky thing is in the "edge" cases, or really the non-perfect cases, which there are plenty of (as I think GP described).

The server can decide that the client is offline if the server misses expected heartbeat messages from the client. But how often will those be sent and how long grace period will we allow? If it's too short then it will be unreliable on shaky 4G connections, if it's too long then it will be annoying in the other direction.

And that's not considering the "social" problems with locks. I've worked on replacing a system that was lock-based with CRDTs where the lunch scenario from MontagFTB actually was a common occurrence.

In an "ideal" scenario your lock acquisition problem is not hard. Client's just show the UI optimistically and whoever the server decide was first gets the lock. The loosing client throws any state the user created in the short time-frame. Over reliable and fast connections for granular locks, this works fine. But in the real world that's just one of the issues with a lock based approach…

jitl

Having used an app with locks like this (Quip) I can tell you it really sucks to have a lock lease >10s after the last edit. The “lock” should be a UI level concept anyways - clients should broadcast “hey I’m taking field X”; if you have two users submit a write+lock request concurrently you’ll need to pick a winner anyways.

Arelius

> However, I don't know how you resolve conflicts of two users simultaneously attempting to acquire a lock.

It turns out you just have to pick one... This all depends on a source of truth, and when you are there it's easy to pick one, say based on whichever arrived at the network interface first.

antidnan

I don't think you need a pure CRDT either but I think locking and presence is a bit of an oversimplification.

LWW is a good place to start, and updating the smallest piece of information possible is the right idea in general but there is a lot more nuance to handling complex applications like a spreadsheet (I'm working on one) and whiteboard apps.

Things like reparenting or grouping shapes [1], or updating elements that aren't at the lowest scale like deleting a row or column in a spreadsheet make locking challenging to implement. Do you lock the entire row while I'm moving it? Do you lock the entire group of shapes?

With the exception of text editing, the popular libraries like Yjs don't just give you a perfect CRDT out of the box. You still have to construct your data model in a way that enables small scale updates [2], and CRDT libraries and literature are the best source of thinking for these problems that I've found.

[1] https://www.figma.com/blog/how-figmas-multiplayer-technology...

[2] https://mattweidner.com/2022/02/10/collaborative-data-design...

charles_f

That's true for collaborative experience. Crdts are a mechanism to handle eventual consistency (that's even the preface of the paper). If you assume that said collaborative experience is always online, you don't need them, and "using locks" as you described is probably enough.

If you want a mechanism to handle that eventual consistency, it's probably better to reuse their principles rather than reinventing something that will eventually ressemble Crdts.

You mentioned "offline first", I think it's probably a good place to pluck that ib https://www.inkandswitch.com/local-first/

saqadri

You may not need CRDT per-se, but building a collaborative experience is so difficult. I worked on collaborative systems for a bit, and also have read a bit about how Figma and Notion do it (this is a good read: https://www.figma.com/blog/how-figmas-multiplayer-technology...) -- it's still super hard to get right.

This talk by Karri about Linear's "sync engine" is also a good watch: https://www.youtube.com/watch?v=Wo2m3jaJixU.

iamwil

> Ever-growing state: for CRDTs to work well they need to keep a record of both what exists, and what has been deleted (so that the deletes aren’t accidentally added back in later). This means that CRDT state will continually expand.

I guess a couple things:

It depends on the CRDT. Some CRDTs grow with the number of replicas and others with the number of events.

State-based CRDTs don't need to keep history and don't need causal ordering of messages, but internal bookkeeping grows with the number of replicas. And for large states (like sets and maps), it can be prohibitive to send the state all over the wire for an idempotent merge.

That's why in practice, people implement Op-based CRDTs, which makes the trade: in order to send small ops over the wire, we now need causal ordering of messages. To make sure we can sync with replicas long offline, we keep as much history so that they can catch up.

There are other variations, such as delta-state based CRDTs that send diffs, and merkle CRDTs, which use merkle data structures to calculate diffs and detect concurrency, which have different growth characteristics.

---

As for a growing state: Is this actually a concern for devs that aren't using CRDTs for collaborative text? I can see that being an issue with the amount of changes that can happen.

But outside of that, lots of data don't grow that fast. We all regularly use Git and it keeps a history of everything. Our disks are huge, and having an immutable record is great for lots of things (providing you can access it).

> Opaque state: ...you’re generally left with an opaque blob of binary encoded data.

Most CRDT libraries take a document-orientated angle. It assumes that you can contain the entire "unit of work", like a document, inside of a CRDT. However, if your data is more relational, it doesn't quite fit. And while there's immutable data in a CRDT, I do wish it was more accessible and queryable. In addition, being a binary blob, it's not exactly composable. I think CRDT libraries should be composable with each other.

earthboundkid

I've seen locks used at the CMSes of large news organizations. It's fine, but they all need a mechanism to kick out an editor who has an idle tab left open. For my own small scale CMS, I just wrapped Google Docs and let them handle all the syncing headaches.

chromatin

We took a super simple (IMO) approach to collaborative editing in my current project:

Each block of text has a version number which must be incremented by one by the client at the time of submission. The database provides conflict prevention by uniqueness constraint which bubbles up to the API code. The frontend is informed of conflict, so that the user can be notified and let the human being perform conflict resolution.

Because most concurrent users are working on different blocks, this works great.

zknill

How do you handle getting the changes that one client makes onto the other clients? Are you pushing it from the server to the clients with websockets, or waiting for the clients to ask for new info, or waiting for the conflict to happen when someone else tries to make a change, or something else?

I'm thinking a lot about keeping server and client data in sync while working on our hopefully-soon-to-be-released LiveSync product[1]

[1] https://ably.com/livesync

chromatin

When the client gets HTTP 409 Conflict, it asks for the current version and presents to the user for manual resolution

namelosw

That's not gonna work for real-world projects. Real-world apps often have larger edits than locking individual cells/cards e.g. Move columns or replace large chunks of spreadsheets in Google Sheets, or Ctrl-A to select all and then drag to move.

Also, if you consider latency, locking does not work well because client B might do operations before he/she even acknowledges the lock from client A because of latency.

lmm

> You can’t inspect your model represented by the CRDT without using the CRDT library to decode the blob, and you can’t just store the underlying model state because the CRDT needs its change history also. You’re left with an opaque blob of data in your database. You can’t join on it, you can’t search it, you can’t do much without building extra features around that state blob.

So use the CRDT library when building your indices? Or better yet use a CRDT-aware datastore. This doesn't seem like a real problem.

> Locking for safety

Please don't. You're inevitably going to have lost locks, lost updates, or most likely both.

zknill

> So use the CRDT library when building your indices?

Yeah, sure, you can build a secondary index over the data. But you're still having to decode the blob and index it. There's no version where you can index the data without using the library to expose the underlying model state (like you could if you weren't using a CRDT).

On locking, yes, it's hard. But it's not the same kind of locking that you'd expect in other parts of a system. You're locking the UI, not the actual data, so it's a tiny bit more forgiving. In general the locks aren't trying to force consistency, instead they are trying to prompt the humans to reduce the chance of conflict happening in the first place. Ofc, you still have to care about the locking, unlocking, disconnection problems, etc.

Here's a decent example/demo you can play around with in multiple windows:

https://examples.ably.dev/component-locking?space=W0V-t5oY6A...

https://ably.com/spaces

lmm

> Yeah, sure, you can build a secondary index over the data. But you're still having to decode the blob and index it. There's no version where you can index the data without using the library to expose the underlying model state (like you could if you weren't using a CRDT).

How's that different from any other datastructure? You're never indexing the raw underlying bytes and you wouldn't want to. Like, sure, if you're using a datastore where indexing is built in then you need to make sure that your datastore understands the datastructures you're using, but that's always a problem that you have. Most of the CRDT infrastructure I've seen has been designed around having a datastore that understands the CRDT and building on top of that.

> On locking, yes, it's hard. But it's not the same kind of locking that you'd expect in other parts of a system. You're locking the UI, not the actual data, so it's a tiny bit more forgiving. In general the locks aren't trying to force consistency, instead they are trying to prompt the humans to reduce the chance of conflict happening in the first place.

I don't think that actually improves matters, it just means more edge cases? Either you have a locking system that's 100% consistent and people can lock each other out and deadlock, or you don't and you end up with lost updates that will be even more infuriating because the user checked whether anyone else was editing and it looked like they weren't.

spion

For offline first apps, or for applications where very high degree of control for the content is needed (e.g. legal docs) and realtime collaboration isn't that valuable, there is also the option to use 3-way merge instead.

The benefit is that you can even allow the user to resolve conflicts in a satisfactory way.

Another benefit is that the document doesn't even have to be derived from the original, it could go through exports and re-imports and it will still be possible to run a 3-way merge as long as a common base version is declared. This can be especially covnenient for systems that involve e.g. MS Word.

Daily Digest email

Get the top HN stories in your inbox every day.