Skip to content(if available)orjump to list(if available)

Learn TLA+

Learn TLA+


·July 1, 2022


My mind works best with examples, I was about to ask here when I stumbled upon it on the TLA site [0].

The example starts out with a simple piece of code that exposes a bug, then the bug gets fixed and the following question is asked:

> Does the issue go away because we’ve solved it, or because we’ve made it rarer? Without being able to explore the actual consequences of the designs, we can’t guarantee we’ve solved anything

> The purpose of TLA+, then, is to programmatically explore these design issues. We want to give the tooling a system and a requirements and it can tell us whether or not we can break the requirement. If it can, then we know to change our design. If it can’t, we can be more confident that we’re correct.



I noticed a bug in the example, and thought that was the bug TLA+ would be used to solve. Apparently, that's a real bug, and the rest of the doc does not address it. So I proposed a fix for it [1].



God, that's embarrassing. I just merged your fix.


Take it easy; it's a proof that you're only human :-)

This bugfix brings up 2 good points.

1. Using TLA+ is no silver bullet to writing bullet-proof code. Someone translating from a proven TLA+ spec to code (C, Java, etc.) can easily introduce a typo/bug. I wish there was a translator that'd convert your TLA+ to code in a language of your choice.

2. Say what people may want about centralization (Git vs. Github, etc), this successful micro-collaboration was enabled by this centralization. Someone posts a link to a book/article, someone else posts a deep-link to some example code, yet another person finds bug in the said code, hunts down the Github link, uses Github's in-place edit feature (no (`git clone` + fix + `git push`)) and submits a merge-request, the author merges the fix (and smacks head :-). All this was kicked off by a conversation on HN, a center/hub for conversations.


For me this is also a point about writing code that can easily be mentally translated to human language.

It might seem simple enough, but this bug shows that

    if (Current < Withdrawal): Current -= Withdrawal
requires a few mental steps in working memory that can easily get missed


    isValidTransaction = Current - Withdrawal >= 0
    if (isValidTransaction): NewBalance = Current - Withdrawal
reads like English, and is hard to get wrong in my view.

In the first case, the reader is expected to do mental math (albeit easy) in order to verify correctness. Whereas in the second, they're only expected to read a logical statement and confirm that the logic checks out.


Increment this counter by one (by upvoting this comment) if you thought I was exaggerating, but hadn't immediately noticed the same bug as the one OP reported in my top example :)


Huh you’re right! The code and the TLA+ model are different:

> if (from.balance <= amount) # guard

> if acct[from] >= amnt then


Wow, good eye. Presumably this particular bug doesn’t require much other than unit tests.


Does TLA+ system prove the model or does it verify the model by essentially brute-forceing it? AlLoy seem to do the brute force verification using SAT solvers.


Yes it is essentially brute-forcing it.

Note that TLA+ is the language that the model is written in. The program validating is called TLC, it's a model checker.


tlc essentially uses brute-forcing, apalache uses SMT solvers.

tlc still has some advantages over apalache in some conditions, but actually I've never even used apalache with my models (I'm still in the practice phase, though).


I wonder if it's possible to model how fucked up the Golang concurrency model is with TLA+?

As a TLA non-SME I couldn't say, but would definitely find it useful and probably impressive. Maybe Brad Fitzpatrick could team up with Aphyr and use it to make golang v2 a bit less crap.

Note: I write this with love, as someone who's written hundreds of thousands of lines of Go, and am now turned off and afraid of the behavior at the sketchy edge boundaries. I've now shifted to learning Rust, albeit slowly and finding it saddeningly challenging compared to what I can pump out with the Go.

For reference, see "Data Race Patterns in Go", posted 21 days ago:


I've written about this before!

That said, I think the Spin syntax is slightly closer to native Go, making it easier to write specs. At least that's what people who know both Spin and Go tell me.


Not used Go myself, but since Go concurrency model is based on CSP, surely many CSP-based model checkers should help?


I just started reading the book a couple of days ago. Sigh... :-)

One thing that I wish websites did is to make it easy to report simple typos and grammatical errors without having to go through GitHub. For example, on "losting" should be "losing". It would be great to simply and quickly report them without having to context switch and go through the overhead of opening an issue or creating a PR. (In any case, I don't even have a GitHub account.)


It's also fine to send me an email (h@mymainwebsite) or a twitter DM or whatever, I just presented the issue so people know they can send feedback and that I'll accept it.


> I just started reading the book a couple of days ago. Sigh... :-)

I wasn't even going to click the link, because I thought this was for the old online book. Because of your comment, I see that Learn TLA+ has been updated, and Hillel says of the Practical TLA+ book I bought a couple of weeks ago: "Don't bother."

I haven't gotten that far yet, but I have modeled the wolf, cabbage, and goat problem, and helped the poor waiter in xkcd 287. Eventually I hope to apply it to a distributed database and our crazy Jira deployment. I'm not sure which will be harder.



I feel like you dropped this.


The ISO standard for Z says it must be pronounced "Zed", even if you're American, and as revenge I now pronounce Event-B as "Event-Bed"


Genuine question: how does this compare versus something like modelling in primitive python? what benefits are there, or is it just a case of age and that this is language / system agnostic?

I'm just finishing cosmic python[0], which talks about making a primitive model of your system first, that you can run business logic tests on, then all the other code depends upon that (domain driven design / onion layers, etc). To me it seems like this is the same thing. The only aspect that stuck out was the "two transfers at the same time" which to me, seems like it would depend on how you're implementing the model, rather than the model itself?

For instance, the example in [1] could also be done in primitive python, (arguably easier to read in my opinion but I'm not used to TLA syntax :)

    class Person:
        balance: int

    def wire(a: Person, b: Person, amount: int):
        if a.balance >= amount:
            a.balance -= amount
            b.balance += amount

        "a_amount, b_amount, transfer, a_remaining, b_remaining",
            (10, 10, 10, 0, 20),  # matching amount
            (6, 10, 10, 6, 10),  # has less
            (0, 10, 10, 0, 10),  # has nothing
            (10, 0, 10, 0, 10),  # to empty account
    def test_wire(a_amount, b_amount, transfer, a_remaining, b_remaining):
        a = Person(balance=a_amount)
        b = Person(balance=b_amount)
        wire(a, b, transfer)
        assert a.balance == a_remaining
        assert b.balance == b_remaining
The author did mention "... could be done in python" as well, so I doubt it's a case of not knowing about python, but perhaps my question is "why TLA over python?"

[0] [1]


TLA+'s model checker will verify (NB: this has scaling limits, but that isn't a problem for many practical circumstances) that two concurrent wires (when the system is properly specified) won't stomp on each other. Conversely, it will detect when they could stomp on each other.

So take that definition for the wire function and let two processes run it simultaneously:

  alice.balance = 100
  spawn(wire(alice, bob, 100))
  spawn(wire(alice, bob, 100))
Only one of these should succeed, but since the function isn't guaranteed to be atomic we can get to this state:

  if a.balance >= amount: -- true, so continues into the condition body
    a.balance -= amount -- process paused here
    b.balance += amount

  if a.balance >= amount: -- true, so continues into the condition body
    a.balance -= amount -- alice's balance now 0
    b.balance += amount -- bob gets 100
When p1 resumes after p2 finishes, Alice will end up with a balance of -100 and Bob will get 200. Assuming overdrafts aren't allowed (per the spec they aren't) then the system has reached an invalid state.

When testing concurrent programs it is a non-trivial thing to force these states, so bugs like this are hard to detect because they may only occur occasionally. And because they occur rarely, even when detected the actual source of the bug can be hard to reliably discern. TLA+'s model checker will check each possible interleaving of the two processes (in this case) and will discover the invalid execution path I showed above. Then you can go back to the spec and address the issue and rerun the checker, if your invariants hold (no sequence of actions lead to overdraft, in this case) then you can have confidence that, at least with regard to the properties you've specified, the system is correct.


There are two parts to the answer: the expressivity of the language, and the available tools.

TLA+ is not a programming language, and so cannot "run," but it is more expressive than any programming language could ever be, and can describe any discrete system at any arbitrary level of detail. This is particularly useful when the described system has some or a lot of nondeterminism, as is the case in distributed and concurrent systems. It is very easy in TLA+ to say "one of these things could happen at any time", or "the value of a variable can only grow monotonically over the system's lifetime", and it is easy to describe assertions of arbitrary complexity about a system, such as "every order that's received more than once will eventually be flagged."

As for tooling, while you cannot efficiently "run" TLA+ specifications, you can check your assertions in two ways: either with the proof assistant (although that requires a lot of effort), or with a model checker, that is places more limitations on what it is that you can check but is completely automatic.


TLA+ is for writing specifications of models, and programming languages are for implementing the models. Sometimes there is very little difference, and sometimes there is more. It's subtle in simpler examples like yours but becomes a little more obvious if you watch Leslie Lamport's video course:

The magic comes into play when you do things like using solvers (built into the TLA+ toolkit) to check invariants. This wouldn't be possible do with a code implementation alone, except for if you were to write property tests using a property testing framework or scenario generator using knowledge of all states the domain could be in. Your TLA+ specification is a mathematical formula of the states it can be in (even over time) while the Python implementation is more like a state machine that takes one input and results in an output.

Of course, in your example, you do something akin to property testing with pytest where you enumerate known states, I'm just using this to help describe the spec/implementation difference. In more complex models, testing property invariants may require enumerating more examples than you think to write tests for, and the important cases may not be what you expect. And as your property testing parameters get more complex in the search space, what your Python code ends up being is both an implementation (the runtime code) and a de-coupled model of the states it can be in (the test code) when conceptually it is one model. And in TLA+, you can specify it as a model and use the invariant checker as both a way to test your assumptions and a way to explore important states or state transitions.


If it takes a programmer in a specific (unused) programming language to write a specification, why not just write it in a working programming language?

As we can see from the comment above, we need to write a compiler and tests to check the specification anyway.


Python (and most languages) and testing likely won't detect the concurrency bug in the above program, unless you get lucky. TLA+ and model checking will.

You use TLA+ to examine a spec or design without the extraneous details and do V&V on it. Then you go back to your code and you can be more confident that what you are making will work as intended. It is complementary, not a full alternative. You'll still want tests for your code.


TLA+ lets you express algorthims that solve NP complete problems concisely. In particular, you can naturally say stuff like "this emits an arbitrary member of the set that satisfies this set of equations".

Of course, you can't execute such programs efficiently. The hope is that a model checker can efficiently prove they cannot hit an assert failure (halt) or find a counterexample.

This is impossible in theory, but often tractable in practice.


You write a compiler and tests to check the Python implementation. If you were to write it in TLA+ you could use a built in solver and specify invariants to check, which are just formulas, without requiring all the boilerplate that a programming language would require in order to be able to simulate examples from the state space.

I wouldn't say it "takes" programmers doing it this way. It's just something that can be in your toolbox if it's useful to you. Sometimes there are benefits to being able to iterate quickly on simpler models before diving into code implementations. Sometimes it wouldn't be warranted. Sometimes you can find a multi-step difficult-to-replicate bug in a specification that is modeling what would actually be multiple complex components in a system that might be in totally different programming languages, and testing them all to completion wouldn't be feasible, but testing a model of them might.


> If it takes a programmer in a specific (unused) programming language to write a specification

TLA+ isn't a programming language, it's a specification language.

Programming languages are much too concrete to evaluate designs; that's why specification languages (like TLA+) exist.

You "test" a TLA+ design by running a model checker (usually TLC) that verifies the design meets the spec (i.e. a bunch of invariants which are the TLA+-equivalent of unit tests).


Why do articles about OOP articles always use employee/payment/inventory management systems as examples… just feels like I’m right back at work.


This seems to have been copied straight from the Spock Framework[1] (which I've been using for comprehensive tests for several years), but worse.

Here's what it looks like in Spock:

    class Person {
        int balance 

        def wire(Person other, int amount) {
            if (balance >= amount) {
                balance -= amount
                other.balance += amount
    class PersonSpec extends Specification {
        def 'transferring #amount amount'() {
            given: 'two people'
            def pa = new Person(balance = a), pb = new Person(balance = b)

            when: 'an amount is transferred from a to b'
            pa.wire(pb, amount)

            then: 'the transfer succeeds if possible, or nothing happens otherwise'
            pa.balance == expectedA && pb.balance == expectedB

            amount | a  | b  || expectedA | expectedB
            0      | 0  | 0  || 0         | 0
            10     | 20 | 0  || 10        | 10
            10     | 10 | 10 || 0         | 20
            10     | 9  | 10 || 9         | 10



Spock is not a system for formal (ie. exhaustive) verification.


I did not say it is. I just said the Python code the parent commenter posted was just like Spock, but for Python.

Also: tests like that can be provably comprehensive in the cases they handle (which can be generated, you don't need to manually list them all), but that was not my point.


So can it spit out C or something? Or something to actually perform the algorithm? Or are you expected to manually translate back and forth between your model specification and your code?


It does not spit out C. TLA+ is aimed at the design/specification level, not the implementation level. You would have to take the information learned from the model checker or proof system (or even just the act of constructing the model can reveal problems) and change your program to address any discovered issues.


The spec might not even contain the level of detail necessary for that to be possible. That possibility is what makes modeling easier than implementing the spec.

Through a process of manual refinement, you can derive an implementation from the spec and check every step with the checker, but that's is more work than implementing the formal spec manually and is an most likely an overkill in practice.


In practice it's fuzzy but TLA+ is meant for checking your thinking not your code as per se


Several people I know with a formal methods and distributed systems background aren't that impressed with TLA+. I'm not exactly sure why or what else they prefer (Isabel? Coq?). Anyone with a formal methods background care to comment?


The vast majority of people using languages like Coq, Isabelle, or Lean, are researchers, and those tools are designed for research (such as defining and exploring new logical systems). TLA+, on the other hand, is designed for practitioners, i.e. people who build systems for a living. That is why more papers are being written about Coq and Isabelle, but more bugs in more real-world systems are being found with TLA+. So it depends on what your job is.


Yes the people I am referring to are in the business of designing and proving new distributed algorithms and proving/disproving the correctness of existing existing well-known algorithms for which only handwavy proofs have been provided. In particular it goes beyond just model checking.


Right. TLA+ has a proof assistant, just as Coq and Isabelle do, and it has been used to good effect. But because there is also a model-checker capable of checking (a subset of) TLA+ (actually, two model checkers now), practitioners greatly prefer using that over a proof assistant. The reason is that if your goal isn't to publish a paper but to deploy a system, what you're optimising for is bugs found per hour of effort, and a model-checker has a higher ROI in that regard than deductive proofs.


People that can write proofs in Coq and Isabelle might prefer that over the TLC approach of exhaustively checking all the possible states allowed by a TLA+ spec. But writing proofs in Coq/Isabelle/Lean/HOL is a more sophisticated skill and requires even more training on the multiple theories available in these provers. The brute force approach of TLA+/TLC is more dependable as part of an engineering process. Some specs can be really hard to prove, but easy to exhaustively check.


I'd be interested in hearing their opinions! Do you know more about their contexts? Like if they're in academia, industry, what kinds of things they're doing, etc.

(I don't think it's necessarily the case that they prefer Isabelle or Coq; it really depends on what they're trying to do. I'd be especially fascinated if they prefer, say, SPIN to TLA+, which is a much closer tool.)


It's not like TLA+ is really pushing research frontiers, it's just a great language using solid established algorithms that works really well for modeling distributed & concurrent systems. Maybe they're unhappy with the proof system part of the language? That's still a research project in progress.


TLA+ is used for model checking, right? Do Isabel and Coq do model checking, or are they optimized for model checking? I haven't tracked the area for years. I thought popular model checking systems were like SPIN and nuSVM. Didn't know the industry has shifted to using a proof system like Coq.


FWIW, I have a PhD in formal methods and spent a good chunk of that PhD proving stuff about distributed systems in Isabelle. I'm vaguely familiar with Coq. At work I've been largely using TLA+ the last few months to analyze designs of distributed protocols.

What I'll say is a simplification, but I'd describe TLA+ as a one-trick pony, that does that one trick very well. Roughly, you have to structure the model of your algorithm as a transition system, described using a predicate on "these are the legal initial states", and another predicate over two states that says how you can move from one state to the next. To describe these predicates, you get a fairly simple but expressive language at your disposal (a TLA cheat sheet probably fits a single page). This format lends itself well to describing high-level ideas of distributed protocols or concurrent algorithms (roughly what you'd write as pseudocode for those protocols/algorithms). You can then use that same predicate language, plus some additional operators to talk about time (e.g., you can say things like "never" or "eventually"), to specify what your system should do. Finally, you get an automated analysis tool (actually two tools these days, TLC and Apalache), that checks whether your model satisfies the specification. The beauty is that the checking is push-button - after you've finished writing the model and the spec, your work is largely complete. The analysis will have severe limitations; your model can't be too big, and it has to be finite (e.g., if you're writing a model of a distributed protocol, you have to limit the analysis to settings with a handful of nodes), but in practice even this limited analysis weeds out most of the bugs.

Isabelle and Coq are theorem provers (there's also a bunch of other theorem provers). That means, you have to define whatever you're modeling as a mathematical object, and then you go off and prove stuff about it. They're both extremely flexible - you can model a transition system just like TLA does, but you can also talk about any kind of mathematics (algebra, statistics, financial math, whatever). You can also encode programming language semantics, as well as program logics, that allow you both model actual C or OCaml or whatever code, and to specify properties about programs (as mathematical theorems) and prove them in a more comfortable way using program logics. The analysis (which corresponds to proving a theorem) is generally not limited (e.g., you prove a distributed protocol correct for any number of nodes).

The snag is that in Isabelle or Coq the proof is done in a largely manual way (you have to sort of type in the argument by hand, often in excruciating detail). If you want to verify a program with N lines of code, you'll typically end up writing 5-20 times N lines of proof to convince the theorem prover of the correctness of the said program. But to do this, you'll generally have a large library of already proved theorems (6-7 years ago I counted around 200k theorems available for Isabelle), and you will generally have a "metaprogramming language" at your disposal to write your own analysis (i.e., proof) procedures.

For academics, especially programming languages and formal methods people, their work is often writing new programming logics, or new proving procedures, or developing whole new theories. Theorem provers lend themselves better for that kind of work, as well as pushing boundaries, such as doing code-level analysis. But for engineering, TLA can be the 80/20 solution in many cases.


Great answer thanks. Yes this makes a lot of sense. My understanding was that TLA+ also supported theorem proving but it sounds like it is more used as a model checker in practice.


TLA+ has very different aims.

A better criticism would probably be to note that TLA+ is even within the more similar world of (say) model checking, it is a LISP to (say) Spin's (or similar) C




Ah, the HN's favourite language Three Letter Acronym +

I ctrl+f'd the page and the expansion of that acronym, Temporal Logic of Actions is nowhere to be found.


Leslie Lamport doesn't say what TLA+ standards for on his personal webpage, and I felt I had to pay him his respects

(Real answer: it's something most of us tell you if you ask, but "Temporal logic of actions" makes it sound a lot more intimidating to learn than it actually is)


The Glossary has it! I was also wondering how long it can go without ever resolving the meaning of each letter.


I wish sites like these were keyboard operable. You have a link down the bottom of the page, if I scroll to the bottom of the page (with space or down key or page down), it would be nice that it gets to the bottom, and then goes to the next page on the next keydown event.

We implemented this in medical software for reading patient documents at one of my last jobs, and it's a feature I wish more sites that were focused on reading material had (docs, book replacements, etc.)

Edit: just want to point out this is in no way a critique of this site specifically, so maybe off topic.


Really liked the paper book, looking forward to the new version of the website!

Nice to see some new posts on TLA every once in a while. A few days ago someone posted this series of very real-world examples: It's also quite good.


This looks like it could be really useful. I tend to write out my specs in pseudocode before coding anyway, so being able to easily translate that human gobbledygook into a form that can be checked for correctness, seems like it could help the creative process of coming to a solution. Just try out ideas and see if they help solve it. Keep honing them down until you get it.


Thanks for your work, Hilel! I've been using TLA extensively in my job the last few months (I work at a blockchain company), and it's been a good run - we found a bunch of issues in several designs and even implemented code (some fairly critical). My secret hope is to get some co-workers to start using TLA themselves (so I can go off to do code-level verification instead hehe), I've organized a couple of internal tutorials but no takers so far - hopefully this free learning resource will help advance that goal :)

On a related topic, does anyone know of a comparison to Alloy 6? I've been meaning to take a day or two to look at it (I tried Alloy out a while ago while I was still in my PhD, but have forgotten most of it), I'm curious to see how it stacks up.