The Secret Behind Collaborative Editing in Real Time

How Google Docs, Figma and your favourite collaborative apps keep everyone on the same page — explained with analogies, examples and a lot of re-drawing.

distributed-systems · collaboration · CRDT · OT

Two people are typing into the same document. Their keystrokes fly across the internet at different speeds, arrive in a jumbled order, sometimes get lost. And yet — somehow — both screens end up showing exactly the same text. This is the quiet miracle behind every Google Doc, every Figma file, every Notion page you've ever opened with a colleague.

There isn't one way to do this. There are three main families of algorithms, each with its own personality.

This post walks through all three — with examples, real-life analogies, and a look at which companies picked which, and why.

Contents

0 · The Problem
1 · Operational Transformation
2 · CRDTs
3 · CRDTs with Same-Position Edits
4 · Differential Sync
5 · Head-to-Head
6 · TL;DR

Chapter 0

Why Real-Time Collaboration Is Hard

Imagine two people, Bunty and Babli, both editing the sentence "Hello!" at the same time. Bunty wants to insert a comma after "Hello". Babli wants to insert " there" in the same spot.

If we naively apply both edits in the order they arrive, everything goes wrong. Bunty's edit uses position 5 (between "o" and "!"), and so does Babli's. Once one of them is applied, the other's position 5 no longer means what its author intended. Worse, each person applies their own edit first and the other's second, so the two screens end up showing different text.
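Here is that failure in a few lines of Python, as a minimal sketch. The `insert` helper is a hypothetical stand-in for an editor applying a raw INSERT, and each person applies their own edit first, then the other's exactly as received.

```python
def insert(doc: str, pos: int, text: str) -> str:
    """Insert `text` so that it starts at 0-based index `pos`."""
    return doc[:pos] + text + doc[pos:]

start = "Hello!"
bunty = (5, ",")       # INSERT "," at position 5
babli = (5, " there")  # INSERT " there" at position 5

# Each screen applies its own edit first, then the other edit verbatim.
bunty_screen = insert(insert(start, *bunty), *babli)
babli_screen = insert(insert(start, *babli), *bunty)

print(bunty_screen)  # Hello there,!
print(babli_screen)  # Hello, there!
```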

Every approach in this post is essentially a different answer to this exact problem. Let's look at them one by one.

Chapter 1

Operational Transformation (OT)

"Keep the edit, just translate it."

Operational Transformation was invented in 1989. The core idea is simple: when an edit arrives out of order, rewrite it to account for what's already happened.

Real-Life Analogy

Think of two chefs working from the same recipe, shouting instructions to a third person writing it down. Chef A yells "add salt after step 3!" At the same moment, Chef B yells "insert a new step 3: preheat oven!" The scribe can't blindly apply both — after Chef B's edit, the old "step 3" is now step 4. So the scribe mentally transforms Chef A's instruction: "add salt after step 4" (because a step got inserted before it). Both chefs get what they wanted. That's OT in one sentence.

The worked example

Let's look at one example. Bunty and Babli are editing the same Google Doc. Every edit is synced to one central server, which is the source of truth for the document.

Starting document: "Hello!" (0-based positions: H=0, e=1, l=2, l=3, o=4, !=5)

Bunty sends: INSERT "," at position 5. Babli sends: INSERT " there" at position 5. They arrive at the server at roughly the same time.

The server accepts Bunty's edit first. The document becomes:

After Bunty's edit: "Hello,!" (the comma now sits at position 5, and "!" has moved to position 6)

Now Babli's edit arrives. It still says INSERT " there" at position 5, but position 5 is now where Bunty's comma sits. If we apply it literally, the server (and Bunty, who applies the edits in the same order) ends up with "Hello there,!", while Babli, who applied her own insert first and Bunty's comma second, sees "Hello, there!". The screens diverge. This is the nightmare.

OT's answer: transform Babli's edit. The server says: "I just inserted 1 character at position 5, so Babli's insert-at-5 should become insert-at-6." The transformed operation is applied, and the result is:

After Babli's transformed edit: "Hello, there!"

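Babli's browser does the mirror-image math: it transforms Bunty's incoming comma against her own insert, keeps the comma at position 5 because the server ordered Bunty's edit first, and also lands on "Hello, there!". Both screens converge with no locking, just smart arithmetic over the operations themselves.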
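A minimal Python sketch of that arithmetic for two concurrent inserts. The `Insert` type, `transform` function and `op_goes_first` flag are illustrative names, not Google's implementation; the flag encodes the server's decision that Bunty's edit came first.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int    # 0-based index where `text` should start
    text: str

def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]

def transform(op: Insert, other: Insert, op_goes_first: bool) -> Insert:
    """Rewrite `op` so it keeps its meaning after `other` has been applied.
    `op_goes_first` breaks the tie when both inserts target the same position."""
    if other.pos < op.pos or (other.pos == op.pos and not op_goes_first):
        return Insert(op.pos + len(other.text), op.text)
    return op

start = "Hello!"
bunty = Insert(5, ",")
babli = Insert(5, " there")

# Server (and Bunty's screen): Bunty's edit was accepted first, so Babli's shifts.
server = apply(apply(start, bunty), transform(babli, bunty, op_goes_first=False))
# Babli's screen: her edit is already applied; Bunty's comma keeps position 5
# because the server ordered it first.
babli_screen = apply(apply(start, babli), transform(bunty, babli, op_goes_first=True))

print(server)        # Hello, there!
print(babli_screen)  # Hello, there!
```

The tiebreak is the subtle part: when both inserts target the same position, both sides must agree on which one wins. If each side shifted the other's edit (or neither did), the screens would diverge again.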

The Key Insight

OT is essentially an algebra of edits. For every pair of operations (A, B) that could happen concurrently, there's a transformation function that rewrites A so it still makes sense after B was applied — and vice versa. If you write this function correctly for every operation pair, consistency is guaranteed.
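To make "a transformation function for every pair" concrete, here is a hedged Python sketch for just two operation types, single-character insert and delete. The `Ins`/`Del` types and the site-ID tiebreak are assumptions for illustration. It brute-forces every concurrent pair on a small document and checks the convergence rule, usually called TP1 in the OT literature: applying A then transform(B, A) must give the same text as applying B then transform(A, B).

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import Optional, Union

@dataclass(frozen=True)
class Ins:
    pos: int
    char: str
    site: int   # hypothetical author ID, used only to break ties

@dataclass(frozen=True)
class Del:
    pos: int

Op = Union[Ins, Del]

def apply(doc: str, op: Optional[Op]) -> str:
    if op is None:                       # transform can turn an op into a no-op
        return doc
    if isinstance(op, Ins):
        return doc[:op.pos] + op.char + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]

def transform(a: Op, b: Op) -> Optional[Op]:
    """Rewrite `a` so it can run after the concurrent op `b`: one rule per pair."""
    if isinstance(a, Ins) and isinstance(b, Ins):      # insert vs insert
        if b.pos < a.pos or (b.pos == a.pos and b.site < a.site):
            return replace(a, pos=a.pos + 1)
        return a
    if isinstance(a, Ins):                             # insert vs delete
        return replace(a, pos=a.pos - 1) if b.pos < a.pos else a
    if isinstance(b, Ins):                             # delete vs insert
        return replace(a, pos=a.pos + 1) if b.pos <= a.pos else a
    if b.pos == a.pos:                                 # delete vs delete
        return None                                    # that char is already gone
    return replace(a, pos=a.pos - 1) if b.pos < a.pos else a

def ops_for(doc: str, site: int, char: str) -> list:
    """Every single-character insert and delete one site could make on `doc`."""
    return ([Ins(p, char, site) for p in range(len(doc) + 1)] +
            [Del(p) for p in range(len(doc))])

doc = "abc"
failures = 0
for a, b in product(ops_for(doc, 1, "X"), ops_for(doc, 2, "Y")):
    a_then_b = apply(apply(doc, a), transform(b, a))
    b_then_a = apply(apply(doc, b), transform(a, b))
    failures += a_then_b != b_then_a
print(failures)   # 0: every pair converges
```

Two operation types already need four rules. Every new operation type (formatting, tables, images) adds a row and a column to that table.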

Why it's tricky

That "if you write this function correctly for every operation pair" is doing a lot of heavy lifting. For a simple text editor with just insert and delete, it's manageable. Add formatting (bold, italic, color), then tables, then images with captions, and the number of operation pairs explodes. Getting OT right at scale is very hard — Google Docs reportedly took years to stabilise its OT engine.

OT usually needs a central server that decides which edit happened first. Without a server, OT becomes much harder to implement.
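A minimal sketch of that server role, assuming insert-only edits and clients that tag each edit with how many server operations they had seen (`base_revision`, a hypothetical field). The server keeps one authoritative history and transforms each late edit past everything it has applied since; ties go to whichever edit the server applied first. The client-side half of the protocol is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insert:
    pos: int    # 0-based index where `text` should start
    text: str

def transform(op: Insert, applied: Insert) -> Insert:
    """Shift `op` past an insert the server already applied.
    Ties go to the op applied first: that is the server 'deciding'."""
    if applied.pos <= op.pos:
        return Insert(op.pos + len(applied.text), op.text)
    return op

class Server:
    def __init__(self, doc: str):
        self.doc = doc
        self.history: list = []   # ops in the one order everyone agrees on

    def receive(self, op: Insert, base_revision: int) -> Insert:
        """`base_revision`: how many server ops the client had seen when it made `op`."""
        for applied in self.history[base_revision:]:
            op = transform(op, applied)
        self.doc = self.doc[:op.pos] + op.text + self.doc[op.pos:]
        self.history.append(op)
        return op   # a real server would now broadcast this to the other clients

server = Server("Hello!")
server.receive(Insert(5, ","), base_revision=0)                   # Bunty arrives first
rewritten = server.receive(Insert(5, " there"), base_revision=0)  # Babli, same base
print(rewritten.pos, server.doc)   # 6 Hello, there!
```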

Who uses OT

Google Docs

text + rich formatting, server-mediated

Google Sheets

cell-level OT on a spreadsheet grid

Microsoft Office Online

Word / PowerPoint real-time co-editing

Visual Studio Live Share

collaborative code editing in VS Code

OT's Strengths

Compact messages — an edit is just an operation + position. Documents stay small — nothing is retained after deletion. Mature ecosystem — decades of production use, especially in Google's tooling.

Chapter 2

CRDTs — Conflict-Free Replicated Data Types

"Make edits work in any order, and conflicts disappear."

CRDTs take a completely different philosophy. Instead of transforming edits to match their new context, they design edits so that the order of arrival stops mattering in the first place. Apply them in any order and you get the same result. Mathematicians call this property commutativity.

For text, CRDTs achieve this with two tricks:

Unique, infinitely subdividable position IDs for every character. Need space between positions 4 and 5? Use 4.3, or 4.27, or 4.271 — there's always room.
Tombstones for deleted text. Characters are never actually removed — they're marked invisible. This way, other edits can still find their positions correctly.
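The first trick can be sketched with exact fractions, which stand in for the decimals above (so 9/2 is 4.5). Picking the midpoint between two neighbours is one simple strategy and an assumption here; the `between` helper is hypothetical.

```python
from fractions import Fraction

def between(left: Fraction, right: Fraction) -> Fraction:
    """Return a position ID strictly between two neighbouring IDs (midpoint rule)."""
    return (left + right) / 2

ids = [Fraction(4), Fraction(5)]
# Keep squeezing new characters in right after position 4: there is always room.
for _ in range(5):
    ids.insert(1, between(ids[0], ids[1]))

print([str(i) for i in ids])
```

The denominators double with every squeeze, so IDs grow longer when many characters land in the same gap. That growth is one practical drawback of plain fractional positions.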

Real-Life Analogy

Think of organizing photos in a shared album. You and your friend are both adding photos from a trip. Instead of numbering them 1, 2, 3 (which changes every time someone adds a photo), each photo gets a timestamp like "10:30 AM" or "10:30:15 AM". If you delete a photo, it doesn't disappear completely — it just gets a "hidden" tag. When you both sync your albums, all photos are sorted by time and the hidden ones don't show up. You both see the same album, regardless of who added or deleted photos first.

A full walk-through: the Delhi example

The document says "I love Delhi". Each character has a position:

I=1, ·=2, l=3, o=4, v=5, e=6, ·=7, D=8, e=9, l=10, h=11, i=12 (· marks a space)

Bunty, working offline on his commute, wants to turn it into "I love South Delhi". He inserts "South " before "Delhi" — between positions 7 and 8. His device assigns each character a unique position in that gap:

Bunty's insert (between 7 and 8): S=7.1, o=7.2, u=7.3, t=7.4, h=7.5, ·=7.6

Meanwhile Babli, on a different continent, is adding "food" to the end to get "I love Delhi food":

Babli's insert (after 12): ·=12.1, f=12.2, o=12.3, o=12.4, d=12.5

Now their devices reconnect and swap edits. Every character from both sets gets merged into one master list, sorted by position ID:

Merged and sorted (what everyone sees): "I love South Delhi food"

No server had to decide who was right. No operations were transformed. The position IDs simply sort themselves into the right order, and both screens converge.
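The whole Delhi merge fits in a few lines of Python. This sketch assumes each replica is a plain dict from position ID to character, uses `Fraction` so 7.1 stays exactly 7.1, and invents the `typed`, `merge` and `render` helpers for illustration.

```python
from fractions import Fraction as F

def typed(text: str, first: F, step: F) -> dict:
    """Assign consecutive position IDs first, first + step, ... to each character."""
    return {first + i * step: ch for i, ch in enumerate(text)}

original = typed("I love Delhi", F(1), F(1))   # positions 1 .. 12
bunty = typed("South ", F(71, 10), F(1, 10))   # 7.1 .. 7.6
babli = typed(" food", F(121, 10), F(1, 10))   # 12.1 .. 12.5

def merge(*replicas: dict) -> dict:
    """Union of all characters; IDs are unique, so nothing ever conflicts."""
    combined = {}
    for replica in replicas:
        combined.update(replica)
    return combined

def render(doc: dict) -> str:
    return "".join(doc[pos] for pos in sorted(doc))

print(render(merge(original, bunty, babli)))   # I love South Delhi food
print(render(merge(babli, original, bunty)))   # same text, different merge order
```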

Deletion: enter the tombstone

Later, Babli decides "food" sounds weird and deletes it. In a CRDT, the characters aren't erased — they're just marked as tombstoned:

Tombstoned (still in memory, hidden from the user): the characters of "food" stay in the list, each flagged as deleted.

Why keep them? Because Bunty's device might not have received the delete yet. If he makes an edit that says "insert 'x' after position 12.3" (one of the 'o's in food), the CRDT needs to know that position 12.3 exists so it can sort the new character correctly. Erase the tombstones and you lose the coordinate system.
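Here is why the tombstones matter, as a small Python sketch. It assumes the same dict-of-positions model as the Delhi merge, treats Babli's delete as covering all of " food" (12.1 to 12.5), and uses a hypothetical `insert_after` helper that needs its anchor position to still exist.

```python
from fractions import Fraction as F

FOOD = [F(121 + i, 10) for i in range(5)]   # Babli's " food": 12.1 .. 12.5

def build() -> dict:
    """The merged "I love South Delhi food" document: position ID -> character."""
    doc = {F(i + 1): ch for i, ch in enumerate("I love Delhi")}
    doc.update({F(71 + i, 10): ch for i, ch in enumerate("South ")})
    doc.update(zip(FOOD, " food"))
    return doc

def insert_after(doc: dict, pos: F, char: str) -> None:
    """Place `char` just after `pos`; only possible if `pos` is still in the list."""
    if pos not in doc:
        raise KeyError(f"position {pos} no longer exists")
    following = min((p for p in doc if p > pos), default=pos + 1)
    doc[(pos + following) / 2] = char

def render(doc: dict, tombstones=frozenset()) -> str:
    return "".join(doc[p] for p in sorted(doc) if p not in tombstones)

# CRDT-style delete: the characters stay, flagged as tombstones.
kept, tombstones = build(), set(FOOD)
print(render(kept, tombstones))        # I love South Delhi
insert_after(kept, F(123, 10), "x")    # Bunty's late edit: after 12.3 (an 'o')
print(render(kept, tombstones))        # I love South Delhix

# Naive delete: erase the characters outright.
erased = build()
for p in FOOD:
    del erased[p]
try:
    insert_after(erased, F(123, 10), "x")
except KeyError:
    print("12.3 is gone, so the late edit has nowhere to go")
```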

Chapter 3

What If Both Users Edit the Same Spot?

This is where CRDTs really shine. Let's replay the Delhi example, but now both Bunty and Babli want to insert something between positions 7 and 8 at the exact same time.

Bunty wants to insert "South " → "I love South Delhi"
Babli wants to insert "Old " → "I love Old Delhi"

Both devices independently pick position numbers between 7 and 8 — and there's a real chance they'll pick overlapping numbers. Without smart design, CRDTs would create jumbled text like SoOlduth Delhi (mixing "South" and "Old" incorrectly). Real CRDTs avoid this with three layers of defence.

Defence 1 — User IDs as tiebreakers

Every position ID carries not just a number but also a unique user ID. If Bunty's user ID is "A" and Babli's is "R", Bunty's 7.3 is actually (7.3, "A") and Babli's 7.3 is (7.3, "R"). When sorting, ties break alphabetically by user ID.

This gives a deterministic order — every device sorts the characters identically. But by itself, the text would still be jumbled; we've just made sure everyone sees the same jumble.
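The jumble is easy to reproduce. This hedged sketch keys every character by (position, user ID), gives the original text the placeholder user ID "base", and reuses "A" for Bunty and "R" for Babli; both replicas picked the same numbers (7.1, 7.2 and so on) for their inserts. The helper names are hypothetical.

```python
from fractions import Fraction as F

def typed(text: str, first: F, step: F, user: str) -> dict:
    """Key each character by (position ID, user ID)."""
    return {(first + i * step, user): ch for i, ch in enumerate(text)}

def render(doc: dict) -> str:
    # Tuples sort by position first; equal positions fall back to the user ID.
    return "".join(doc[key] for key in sorted(doc))

original = typed("I love Delhi", F(1), F(1), "base")
bunty = typed("South ", F(71, 10), F(1, 10), "A")   # 7.1 .. 7.6
babli = typed("Old ", F(71, 10), F(1, 10), "R")     # 7.1 .. 7.4: same numbers

on_bunty = render({**original, **bunty, **babli})
on_babli = render({**babli, **original, **bunty})
print(on_bunty)              # I love SOoludt h Delhi
print(on_bunty == on_babli)  # True
```

Everyone agrees, which is convergence, but nobody gets readable text.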

Defence 2 — Each character remembers its predecessor

Modern CRDTs don't just assign a number — they link each character to the one it was inserted right after. This means Bunty's "South " forms a connected chain:

Bunty's chain (attached after position 7): S → o → u → t → h → ·

Babli's chain (attached after position 7): O → l → d → ·

The algorithm now sees: "two full chains both attach after position 7." It doesn't mix them together — it picks one to go first (using the user ID tiebreaker) and places the other after it. The result:

"I love South Old Delhi"

Every device computes this identically. No mixing. No jumbled text.
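A compact sketch of this idea, loosely modelled on the RGA (Replicated Growable Array) family of sequence CRDTs. Assumptions: IDs are (counter, user) pairs, counters for new typing start above anything already in the document, and siblings hanging off the same predecessor are ordered newest-counter-first with the user ID as tiebreaker. All names are hypothetical.

```python
from collections import defaultdict

ROOT = (0, "")   # stands for "the start of the document"

def typed(text: str, after: tuple, first_counter: int, user: str) -> dict:
    """Characters typed in a row: each one's predecessor is the one typed before it."""
    chars, pred = {}, after
    for i, ch in enumerate(text):
        cid = (first_counter + i, user)
        chars[cid] = (pred, ch)
        pred = cid
    return chars

def render(chars: dict) -> str:
    """chars maps ID -> (predecessor ID, character); returns the visible text."""
    children = defaultdict(list)
    for cid, (pred, _) in chars.items():
        children[pred].append(cid)
    out, stack = [], [ROOT]
    while stack:
        node = stack.pop()
        if node != ROOT:
            out.append(chars[node][1])
        # Siblings sharing a predecessor: newest counter first, then user ID
        # alphabetically. Pushing them in reverse makes the first one pop next,
        # so its whole chain is emitted before the next sibling starts.
        kids = sorted(children[node], key=lambda c: (-c[0], c[1]))
        stack.extend(reversed(kids))
    return "".join(out)

doc = typed("I love Delhi", ROOT, 1, "base")   # IDs (1, "base") .. (12, "base")
space = (7, "base")                            # the space before "Delhi"
bunty = typed("South ", space, 13, "A")
babli = typed("Old ", space, 13, "R")

print(render({**doc, **bunty, **babli}))   # I love South Old Delhi
```

Because each chain's later characters hang off its own earlier characters, a chain can only be placed as one block, which is what rules out "SoOlduth".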

Defence 3 — Globally unique IDs

The most robust implementations (like Yjs and Automerge) never use decimal numbers at all. Instead, every character gets an ID that combines the author's user ID with a logical clock (a per-user counter), such as (Bunty, 1), (Bunty, 2), (Babli, 1).

Collisions become impossible: no two users share a user ID, and no user ever reuses a counter value. Every device compares these IDs using the same rules, so everyone gets the same result.
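A generic sketch of such IDs in Python. Yjs and Automerge each have their own concrete formats; `CharId` and `Replica` here are hypothetical and only show why a (clock, user) pair cannot collide and compares the same way everywhere.

```python
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class CharId:
    clock: int   # the author's logical clock when the character was typed
    user: str    # hypothetical unique ID for the author/device

class Replica:
    def __init__(self, user: str):
        self.user = user
        self.clock = 0

    def next_id(self) -> CharId:
        self.clock += 1                        # a user never reuses a counter value
        return CharId(self.clock, self.user)   # and no two users share a `user`

bunty, babli = Replica("A"), Replica("R")
ids = [bunty.next_id() for _ in "South "] + [babli.next_id() for _ in "Old "]

print(len(ids), len(set(ids)))   # 10 10: no collisions, even with equal clocks
print(sorted(ids)[:2])           # same order on every device
```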

The Honest Truth

CRDTs guarantee convergence (everyone sees the same thing) but not intent (the result makes semantic sense). "I love South Old Delhi" is consistent but not what either person wanted. This is why Figma, Notion and others show you cursors of other users — so humans, not the algorithm, can avoid colliding in the same spot.

Who uses CRDTs

Figma

custom tree-based CRDT-inspired multiplayer engine

Apple Notes

sync across iCloud devices

Redis

CRDT data types for distributed databases

CRDT's Strengths & Costs

Strengths. Works without a central server — great for peer-to-peer and offline-first apps. Merging is deterministic and commutative. No pairwise transformation functions to write for every operation.

Costs. Every edit carries metadata (IDs, predecessors), so messages are bigger. Tombstones accumulate over time, meaning the document never fully shrinks. CPU cost of sorting many IDs can add up for very large documents.

Chapter 4

Differential Sync

"Forget the operations. Just send me the diff."

Differential Sync — often called Diff-Sync — is the practical choice. It doesn't track keystrokes. It doesn't assign unique IDs. It just asks one question, over and over, every couple of seconds: "What's different between your copy and mine? Let's exchange the diff."

Real-Life Analogy

Think of a group project report stored in a shared folder. You download a copy to work on at home. Your classmate downloads a copy to work in the library. Every few minutes, you both check the shared folder, compare your version to the latest one, and upload a note saying "I added section 3" or "I fixed the introduction." The folder updates and you both download any changes the other person made. After a few rounds, everyone has the same final report — without ever needing to call each other or wait for someone to finish.

The three-document dance

Three versions of the document are in play for each client:

My current version

What I'm actively editing right now, keystroke by keystroke.

My shadow copy

What I last knew the shared/server version looked like.

The server's version

The official shared document, kept on the server.

The sync loop runs every couple of seconds, on repeat, forever:

Current → diff against → Shadow → send the diff to → Server
Server → sends a diff back → Current, and Shadow is updated to match

Worked example

The shared document is "I love Delhi". Both Bunty and Babli's shadow copies match.

Bunty edits his current version to "I love South Delhi". Babli edits her current version to "I love Delhi food".

When the sync loop fires:

Bunty's device computes

diff(current, shadow) = Insert "South " after "love "

Babli's device computes

diff(current, shadow) = Insert " food" after "Delhi"

Both diffs reach the server. Since the changes are in different regions, both patches apply cleanly:

"I love South Delhi food"

The server then sends each client a diff describing the other person's change. Bunty applies Babli's "insert ' food'" and updates his shadow. Babli applies Bunty's "insert 'South '" and updates hers. Both screens are now in sync.
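The server-side merge in this worked example can be reproduced with Python's standard difflib. This is a simplified stand-in, not the google-diff-match-patch library: `make_patch` and `apply_patch` are hypothetical helpers, and each change is located by an exact match on a few characters of surrounding context (`CONTEXT`, an assumed value) rather than by real fuzzy matching.

```python
from difflib import SequenceMatcher

CONTEXT = 4  # assumed: characters of surrounding text stored with each change

def make_patch(old: str, new: str) -> list:
    """Describe old -> new as hunks: (context before, removed, added, context after)."""
    hunks = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            hunks.append((old[max(0, i1 - CONTEXT):i1], old[i1:i2],
                          new[j1:j2], old[i2:i2 + CONTEXT]))
    return hunks

def apply_patch(text: str, hunks: list) -> tuple:
    """Best-effort apply: locate each hunk by its context, drop it if the context is gone."""
    applied = []
    for before, removed, added, after in hunks:
        at = text.find(before + removed + after)
        if at == -1:
            applied.append(False)
            continue
        start = at + len(before)
        text = text[:start] + added + text[start + len(removed):]
        applied.append(True)
    return text, applied

shadow = "I love Delhi"                        # what both clients last saw
bunty_diff = make_patch(shadow, "I love South Delhi")
babli_diff = make_patch(shadow, "I love Delhi food")

server, ok_bunty = apply_patch(shadow, bunty_diff)
server, ok_babli = apply_patch(server, babli_diff)
print(server, ok_bunty, ok_babli)   # I love South Delhi food [True] [True]
```

Because each hunk carries its own context, Babli's " food" still lands after "Delhi" even though Bunty's insert shifted every position after "love ".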

What happens with real conflicts?

Suppose both users edit the same word simultaneously: Bunty changes "Delhi" to "New Delhi", Babli changes "Delhi" to "Old Delhi". Their diffs arrive and the server uses fuzzy patching, which places each change by the text around it rather than by an exact position (similar in spirit to how patch applies a hunk whose line numbers have shifted). Depending on how the patches overlap, one of several things happens:

Both patches apply with minor shifting → result becomes "I love New Old Delhi"
One patch fails to apply cleanly and is dropped → one user's edit is lost
A patch applies to the wrong nearby text → jumbled output, which the loop will spread to every copy rather than repair

Here's the clever part: the sync loop keeps running. Even if a cycle leaves the copies out of step, the next cycle notices the mismatch (the server's version no longer matches what that client last saw) and sends corrective diffs. Within a few seconds every copy settles on the same text, though not necessarily the text either user intended.
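Here is the whole loop as one small simulation, again with simplified exact-context patching standing in for real fuzzy patching. The `Client`/`Server` classes and `run` helper are hypothetical, there is no network or versioning, and syncs run one at a time (Bunty, then Babli) for two rounds.

```python
from difflib import SequenceMatcher

CONTEXT = 4  # assumed: characters of surrounding text stored with each change

def make_patch(old: str, new: str) -> list:
    """Hunks of (context before, removed, added, context after) turning old into new."""
    ops = SequenceMatcher(None, old, new, autojunk=False).get_opcodes()
    return [(old[max(0, i1 - CONTEXT):i1], old[i1:i2], new[j1:j2], old[i2:i2 + CONTEXT])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def apply_patch(text: str, hunks: list) -> str:
    """Best-effort: place each hunk by exact context; silently drop it if not found."""
    for before, removed, added, after in hunks:
        at = text.find(before + removed + after)
        if at != -1:
            start = at + len(before)
            text = text[:start] + added + text[start + len(removed):]
    return text

class Server:
    def __init__(self, text: str):
        self.text = text
        self.shadows = {}   # the server's copy of each client's shadow

class Client:
    def __init__(self, name: str, server: Server):
        self.name, self.server = name, server
        self.text = self.shadow = server.text
        server.shadows[name] = server.text

    def sync(self) -> None:
        srv = self.server
        # Client: diff current against shadow, send it, advance the shadow.
        edits = make_patch(self.shadow, self.text)
        self.shadow = self.text
        # Server: patch its shadow of this client, then (best-effort) its own copy.
        srv.shadows[self.name] = apply_patch(srv.shadows[self.name], edits)
        srv.text = apply_patch(srv.text, edits)
        # Server: diff its copy against that shadow and send the result back.
        reply = make_patch(srv.shadows[self.name], srv.text)
        srv.shadows[self.name] = srv.text
        # Client: apply the reply to both its shadow and its current text.
        self.shadow = apply_patch(self.shadow, reply)
        self.text = apply_patch(self.text, reply)

def run(bunty_edit: str, babli_edit: str, rounds: int = 2) -> tuple:
    server = Server("I love Delhi")
    bunty, babli = Client("bunty", server), Client("babli", server)
    bunty.text, babli.text = bunty_edit, babli_edit
    for _ in range(rounds):
        bunty.sync()
        babli.sync()
    return server.text, bunty.text, babli.text

print(run("I love South Delhi", "I love Delhi food"))  # all three: I love South Delhi food
print(run("I love New Delhi", "I love Old Delhi"))     # all three: I love New Delhi
```

With the Delhi edits, Babli receives "South " on her first sync and Bunty picks up " food" one round later. With the "New"/"Old" clash, Babli's patch no longer finds its context on the server and is dropped; the server's reply then overwrites her copy, so all three copies settle on "I love New Delhi" and her edit is lost, which is the second outcome in the list above.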

The Philosophy

OT and CRDTs guarantee that once every edit has been delivered, all copies are identical and no edit has been silently dropped. Diff-Sync is humbler: it promises only eventual consistency through repeated convergence, and an edit can occasionally be lost along the way. For most apps (where edits usually land in different regions and conflicts are rare), this trade is worth it. The algorithm is dramatically simpler, uses standard diff libraries, and requires no special data structures.

Who uses Diff-Sync

MobWrite (Google)

the original Diff-Sync implementation

Dropbox (early)

file sync uses diff-based deltas

Git workflows

the mental model of patch-and-merge

Diff-Sync's Strengths & Costs

Strengths. Simple to implement. Works for any kind of content. Documents stay compact (no tombstones). Survives lost messages gracefully — the next cycle catches up.

Costs. Not real-time in the strictest sense — edits appear in sync-cycle chunks. Individual edits can be dropped in true conflicts (fuzzy patches are best-effort). Usually needs a central server.

Chapter 5

Head-to-Head

Each approach made a different trade. Seeing them side by side makes the choices obvious:

Dimension	OT	CRDT	Diff-Sync
Core idea	Rewrite incoming edits to match local context	Design edits so order of arrival doesn't matter	Ignore edits; exchange document diffs periodically
Needs a central server?	Yes (usually)	No — works peer-to-peer	Yes (usually)
Offline-friendly?	Hard — operation order matters	Excellent — sync whenever you reconnect	Decent — but conflicts may lose edits
Message size	Small (just the operation)	Larger (IDs, predecessor refs)	Variable (diff size depends on edit)
Document memory	Only live content	Live content + tombstones forever	Only live content + shadow copy
Implementation difficulty	Hard — transform fn for every op pair	Moderate — but libraries exist (Yjs, Automerge)	Easy — use any diff library (google-diff-match-patch)
Real-time feel	Excellent — every keystroke syncs	Excellent — every keystroke syncs	Chunks — syncs every ~2 seconds
Conflict guarantees	Strong convergence	Strong convergence (mathematically proven)	Eventual convergence, may lose edits

TL;DR

The Three Approaches in 30 Seconds

OT — transform the edits When an edit arrives out of order, rewrite it so it still makes sense. Used by Google Docs, Google Sheets, Office 365. Central-server friendly, keystroke-realtime, but hard to extend.

CRDT — make order irrelevant Give every character a unique, sortable ID. Tombstone deletions instead of erasing. Any merge order produces the same result. Used by Figma, Apple Notes, Linear, Yjs-based apps. Offline-first and peer-to-peer, at the cost of bigger payloads.

Diff-Sync — just send the diff Every few seconds, send "here's what's different between my copy and yours." Fuzzy patch on the server. Simple to build, good enough for most wikis and note apps. Doesn't feel keystroke-realtime.

And if you only remember one sentence: "Google Docs rewrites your edits, Figma makes the order not matter, and Diff-Sync just keeps asking what changed."