NilAway: Practical nil panic detection for Go

Daily Digest email

Get the top HN stories in your inbox every day.

anonacct37

I do like the approach of static code analysis.

I found it a little funny that their big "win" for the nilness checker was some code logging nil panics thousands of time a day. Literally an example where their checker wasn't needed because it was being logged at runtime.

It's a good idea but they need some examples where their product beats running "grep panic".

lazaroclapp

Actually, if we were running into cases where we aren't logging a panic which is actually happening in production, then the first thing to note is that we need to improve our observability. The issue might or might not be recoverable, but it should be logged. If nothing else, it should show up as a service crash somewhere within those logs, which is also something that service owners monitor and get alerts on.

The advantage of NilAway is not just detecting nil panic crashes after the fact (as you note, we should always be able to detect those eventually, once they happen!), but detecting them early enough that they don't make it to users. If the tool had been online when that panic was first introduced, it would have been fixed before ever showing up in the logs (Presumably, at least! The tool is not currently blocking, and developers can mistake a real warning for a false positive, which also exist due to a number of reasons both fundamental and just related to features still being added)

But, on the big picture, this is the same general argument as: "Why do you want a statically typed language if a dynamically typed one will also inform you of the mismatch at runtime and crash?" "Well, because you want to know about the issue before it crashes."

Beyond not making it all the way to prod, there is also a big benefit of detecting issues early on the development lifecycle, simply in terms of the effort required to address them: 'while typing the code' beats 'while compiling and testing locally' beats 'at code review time' beats 'during the deployment flow or in staging' beats 'after the fact, from logs/alerts in production', which itself beats 'after the fact, from user complains after a major outage'. NilAway currently works on the code review stage for most internal users, but it is also fast enough to run during local builds (currently that requires all pre-existing warnings in the code to either be resolved or marked for suppression, though, which is why this mode is less common).

anonacct37

That makes sense. I hope my first comment came across as I intended, which wasn't as criticism of the product but a suggestion about how to talk about the value of the product.

lazaroclapp

No worries, that's how I understood it too :) Just adding a bit more context on why we feel the approach does beat "grep panic" for people checking out this thread.

carbocation

Just tried this out on some of my own code and it nails the warts that I had flagged as TODOs (and a few more...). The tool gives helpful info about the source of the nil, too. This is great.

Too

Building a type checker on global inference is the kind of thing that sounds romantic in academia - "no type definitions and yet get type checking!" - but ends up being a nightmare to use in practice.

Nilability of return values should be part of functions public interface. It shouldn't come as a surprise under certain circumstances of using the code. The problem of global inference is that it targets both the producer and the consumer of the interface at the same time, without a mediating interface definition deciding who is correct. If a producer starts returning nil and a consumer five levels downstream the call-stack happens to be using it, both the producer and caller is called out, even if that was documented public api before, just never executed. Or vice versa.

For anyone who had the great pleasure of deciphering error messages from C++ templates, you know what I'm talking about.

I understand the compromises they had to take due to language constraints and I'm sure this will be plenty useful anyway. Just sad to see that a language, often called modern and safe, having these idiosyncrasies and need such workarounds.

mrkeen

> Building a type checker on global inference is the kind of thing that sounds romantic in academia - "no type definitions and yet get type checking!" - but ends up being a nightmare to use in practice.

Hi! I use global type inference and I love it.

ludiludi

I got a nil pointer deref panic trying to use this tool:

$ nilaway ./...

panic: runtime error: invalid memory address or nil pointer dereference [recovered]

panic: runtime error: invalid memory address or nil pointer dereference

[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x100c16a58]

technics256

Is there any movement in the language spec to address this in the future with Nil types or something?

baby

I really love Golang and how it focused on making the job of the reader easy. But with today’s modern programming language the existence of null pointer dereference bugs doesn’t really make sense anymore. I don’t think I would recommend anyone to start a project in Golang today.

Maybe we’ll get a Golang 3 with sum types…

aatd86

If you squint really hard, the work on generics is a step toward the future.

If you don't squint, then I don't think so.

mcronce

With generics, can you not make a NonNil<T> struct in Go, where the contents of the struct are only a *T that has been checked at construction time to not be nil, and doesn't expose its inner pointer mutably to the public? I would think that would get the job done, but I also haven't really done much Go since prior to generics being introduced

Otherwise, since pointers are frequently used to represent optional parameters, generics + sum types would get the job done; for that use case, it's one of two steps to solve the problem. I don't foresee Go adding sum types, though.

suremarc

Every type in Go has a zero value. The zero value for pointers is nil. So you can't do it with regular pointers, because users can always create an instance of the zero value.

Mawr

You can make such a type and it works well in practice.

First we define the type, hiding the pointer/non-existent value:

    type Optional[Value any] struct {
        value  Value
        exists bool
    }

Then we expose it through a method:

    func (o Optional[Value]) Get() (Value, bool) {
        return o.value, o.exists
    }

Accessing the value then has to look like this:

    if value, ok := optional.Get(); ok {
        // value is valid
    }
    // value is invalid

This forces us to handle the nil/optional code path.

Here's a full implementation I wrote a while back: https://gist.github.com/MawrBF2/0a60da26f66b82ee87b98b03336e...

tw061023

Even if this would be possible, it won't be idiomatic.

tw061023

Without intention to offend. It's Golang, the language that famously ignored over 30 years of progress in language development for the sake of simplicity.

What answer do you expect?

dang

Hey, can you please not do programming language flamewar (or any flamewar) on HN? We're trying for something else here: https://news.ycombinator.com/newsguidelines.html.

Partly this is out of memory of the good/bad old newsgroup days where this kind of thing somehow worked ok, until it didn't, but it definitely doesn't work on the sort of forum that HN is. We'd like a better outcome than scorched earth for this place.

iot_devs

Wonderful job.

I am toying around with a similar project, with the same goal, and it is DIFFICULT.

I'll definitely get to learn from their implementation.

aatd86

Very interesting work. I wonder what were the difficulties encountered. Aliasing? Variable reassignment wrt short declaration shadowing?

Hopefully with time, when exploring union types and perhaps a limited form of generalized subtyping (currently it's only interface types) we'll be able to deal with nil for good.

Nil is useful, as long as correctly reined in.

mrkeen

> Nil is useful, as long as correctly reined in.

A good way to rein in behaviour is with types. If you need Nil in your domain, great! Give it type 'Nil'.

aatd86

Yes that's part of it. It will probably require a nil type which is currently untyped nil when found in interfaces.

The untyped nil type is just not a first-class citizen nowadays.

But with type sets, we could probably have ways to track nillables at the type system level through type assertions.

And where nillables are required such as map values it would be feasible to create some from non nillables then ( interface{T | nil})

But that's way ahead still.

undefined

[deleted]

candiddevmike

It's really easy to check a field of a pointer struct without first checking the struct is non nil. Would be interesting if go vet or test checked this somehow.

luxurytent

Plenty of Go commentary in this thread but can I just say I'm glad to have learned about nilness? Suffered through a few nil pointer dereferences after deploying and having this analyser enabled in gopls (off by default for me at least) is a nice change.

Tested via vim and looks good!

earthboundkid

I tried it but got too many false positives to be useful.

tptacek

I tried it and got a lot of false positives, but there wasn't so much output that I couldn't quickly pick out the interesting cases. This is very cool.

lazaroclapp

We'd be interested in the general characteristics of the most common ones you are seeing. If you have a chance to file a couple issues (and haven't done so yet): https://github.com/uber-go/nilaway/issues

We definitely have gotten some useful reports there already since the blog post!

We are aware of a number of sources of false positives and actively trying to drive them down (prioritizing the patterns that are common in our codebase, but very much interested in making the tool useful to others too!).

Some sources of false positives are fundamental (any non-trivial type system will forbid some programs which are otherwise safe in ways that can't be proven statically), others need complex in-development features for the tool to understand (e.g. contracts, such as "foo(...) returns nil iff its third argument is nil"), and some are just a matter of adding a library model or similar small change and we just haven't run into it ourselves.

earthboundkid

In one case, it couldn’t tell that a slice couldn’t go out of bounds because I was iterating through it backwards instead of forwards. In another case, I had a helper method on a type to deal with initializing a named map type, but it couldn’t see that and thought the map was going to explode from being nil. Those are two false positives I remember off the top of my head. I can look it up again later.

mrkeen

Does a false positive mean:

- You're confident that a flagged value is actually non-Nil?

- A value was Nil but you prefer it that way?

earthboundkid

https://news.ycombinator.com/item?id=38327236

pluto_modadic

cool... what does this mean the best linter / correctness checking is at the moment?

I have some code that eventually core dumps and honestly I don't know what I'm doing wrong, and neither do any golang tools I've tried :(

maaaaaybe there's something that'll check that your code never closes a channel or always blocks after a specific order of events happens...

mseepgood

I don't think a pure Go program can core dump, unless you use Cgo (wrongly) or unsafe. It can only panic.

yencabulator

Races between goroutines can corrupt memory. E.g. manipulate a map from two goroutines and you can wreck its internal state.

mutatio

Can this actually manifest? Even without the -race flag I think maps are a special case which will panic with a concurrent mutation error if access isn't synchronized.

adonovan

There are ways a Go program can fatal: by running out of heap, or stack, by corrupting variables by racing writes, by deadlocking, by misuse of reflect or unsafe, and so on.

undefined

[deleted]

DominoTree

I’ve seen it happen before because the stdlib actually directly just makes POSIX syscalls for a lot of things by default instead of using the native Go implementations and so you’re implicitly reliant on C code

adtac

I'm not sure if that was the best example to showcase NilAway. I understand there's a lot of context omitted to focus on NilAway's impact, but why is foo returning a channel to bar if bar is just going to block on it anyway? Why not just return a *U? If foo's function signature was func foo() (*U, error) {}, this wouldn't be a problem to begin with.

undefined

[deleted]

thiht

Not the point.

hardwaresofton

The code:

https://github.com/uber-go/nilaway

Daily Digest email

Get the top HN stories in your inbox every day.