Function Dispatch Tables in C (2019)

blog.alicegoldfuss.com

Daily Digest email

Get the top HN stories in your inbox every day.

asveikau

There is a vague reference to performance and runtime efficiency, as if function pointers will magically help. The syscalls table is said to be function pointers for performance reasons?? I feel the author does not understand this. The syscall ABI needs it because it's not a direct call, it's a generic mechanism to index into a list of functions whose implementations are completely abstract to the caller...

I guarantee you for these short examples in this article, using function pointers will slow everything down. The function pointer approach destroys locality, is bad for speculative execution, increases code size... A good optimizer would detect this and replace the indirect call with inlining. It's rare that you really need dynamic dispatch. There is a high performance cost for it.

The times when you need dynamic dispatch are those which you can't make a direct call, the callee is not known in advance. So things like generic callback mechanisms.

boffinAudio

Whats the performance cost exactly?

asveikau

Modern CPUs with pipelining, branch prediction, speculative execution, caching do best with fewer jumps, small code size, predictable jump targets, sequential access. A tight loop of "jump to the address i just loaded from this 64 bit quantity" throws a total wrench in the middle of it and will have those mechanisms stall.

Imagine a sorting algorithm with a jump into a callback to compare the elements. Then compare with a simple integer compare (1-2 instructions) inlined. In the former, you are jumping all over town, to jump targets loaded from RAM (not predictable). In the latter, one very small integer compare instruction, inline with the larger algorithm.

See also: qsort in c vs std::sort in c++

boffinAudio

I think you are constructing a straw man argument. The jump-table approach is just as likely to be a candidate for the optimization you claim will get in the way, as the code using if/switch statements. In fact, the compiler can see that simple call through the jump table and optimize it even better - maybe even by inlining.

The question is, how are you sure this isn't happening? The answer is, you're not - unless you examine the code your compiler is emitting, such claims are specious.

unwind

Uh, this is a few years old, and perhaps the author was learning as they went. Still, I think this passage is a bit too much:

Quick refresher: a pointer is a location in memory aka a memory address. That memory address can contain anything: an integer, a float, the middle of a string. It can also store the name of a function, also known as its label. A function’s name is its address in memory.

A pointer is not a location in memory. A pointer is a type of value, that represents a location in memory. It certainly cannot hold the name of a function, since names are not locations (also, in C there are no names at run-time typically).

It can hold the location of a function, which is represented symbolically in C by its name. These names are not called labels in C, that is a common term in assembly though.

actionfromafar

If you come at it from an assembly perspective, the explanation makes a lot of sense IMHO. But it could use a qualifier that the "name" is just a number and not the actual name as in the source code.

xboxnolifes

> Having 100 cases means we could potentially check 100 cases before selecting one. case statements are O(n), where n is the number of case/switch statements possible.

I was under the impression that most (all?) C compilers would optimize this use-case into a jump-table, turning this case statement usage into an O(1) operation. Or is this a C++ feature that I'm thinking of?

JonChesterfield

Quite a lot of effort goes into compiling switch statements. Jump tables and binary search are likely, as is a mixture of the two based on distribution of cases.

I haven't seen computed goto but wouldn't be surprised if some compilers have that as a third option.

Gibbon1

If you want fun see what a C compiler does to your switch statement with -Os

My opinion is the big advantage of dispatch tables in C is decoupling modules from each other.

Tip: Taking things further: variadic function pointers.

dchftcs

For readability, this is reasonable. There may be a cost that it weakens some static analyzers, but I don't program in C enough to be sure.

For performance, this is not necessary because the compiler can optimize long switch-case flow into dispatch tables: https://godbolt.org/z/7WxEfc6YM

elpescado

> For readability, this is reasonable.

With four branches, maybe. With more branches, having array of tens of function pointers makes hard to tell which index maps to which function call. OTOH, that can somehow be mitigated (at least in C99) with designated initialisers.

nynx

Definitely can be useful, but keep in mind that performance will usually be worse for function tables.

boffinAudio

Why?

And, if this is the case, why then do compilers turn switch statements into function tables?

flohofwoe

A jump table made of function pointers has more runtime overhead than a switch-case jump table because the latter directly jumps into machine code snippets within the same function, and those snippets don't have the function prologues/epilogues. And function pointers are also often an "optimization barrier" where the compiler can't inline to get rid of the epilogue and prologue.

vore

Yup, though I do think in some cases if the indexes into the function table are known a sufficiently smart compiler can inline it anyway, even if the linkage isn't static: https://godbolt.org/z/6cY7zxT9W

vore

While compilers do turn switch statements into jump tables, in this case storing function pointers and calling them you add the additional call overhead of saving registers, setting up the stack, etc in the function prolog and epilog.

boffinAudio

That call overhead is there in then code using "if-else" logic to determine which functions to call, also, though.

So I still don't see your claim as being accurate.

FrozenVoid

I prefer the flexibility of computed gotos a GCC extension that is likely the fastest dispatch method outside of compile-time. With computed gotos, you can have arbitrary dispatch complexity and structure, interleave code(e.g./code1/ ;skip_label: ;label1: /code2/; goto *return_loc;) and use assembler-like tricks to reduce fast paths(e.g. switch off code segment execution or alter control flow on the fly)

dmitrygr

> case statements are O(n), where n is the number of case/switch statements possible.

What now? Not even close. Any modern or even semi-modern (2005-vintage) compiler will compile large switches into a jump table (O(1)) and medium ones into a tree of branches, giving O(log n)

belgesel

I was expecting some form of benchmark to see if claims about better performance is true.

flohofwoe

Expecting jump tables to have higher performance than the alternatives sounds definitely iffy, this also reads as if the author doesn't know that switch-case statements also just use a jump table under the hood if the case-blocks are continuous.

Regarding performance, if the CPU branch predictor works well the jump indirection overhead might disappear completely, but that's still not as good as if the compiler can inline the destination function, and jump tables usually prevent that.

(a switch-case dispatcher might actually be better than a traditional function pointer jump table, because the switch-case eliminates some function entry/exit "ceremony", also see "computed goto")

vardump

Function pointer jumps definitely break most chances for a compiler to optimize a function call. But then, so does a switch statement, most of the time. (Not to mention switch statements are often transformed into jump tables anyways.)

Humans tend to do a lot better jump tables in assembler, because of better choices about register usage etc. can be made, less (or no) need to spill to stack. One of the few remaining compiler weaknesses.

hermitdev

I'm also dubious on the claim the switch statement is O(n). It might be in a pathological worst case, but you can pretty much bet the compiler is going to transform it into a jump table or other optimized execution (maybe a computed jump). Especially when the cases are contiguous like this...

I agree that a benchmark is warranted, or at least a comparison of the generated assembly (at different optimization levels).

nynx

Also, O(n) doesn’t mean much when the CPU can execute hundreds of checks in a few tens of cycles.

dzaima

Not for jumps - modern CPUs still (usually) have limit of one taken branch per cycle, or 1 to 2 untaken branches. And usually branch prediction will be the limiting factor anyway, for which a single branch is gonna be faster than many.

Patient0

This is also how virtual methods are implemented by C++ compilers.

undefined

[deleted]

jacknews

As others say I'm fairly sure switch/case creates a jump table under the hood, and it also checks bounds.

Personally, I don't think a static dispatch table like this is a good example. It's really more useful where the dispatch table might be dynamic; for example if a plugin could add more maths functions.

flohofwoe

> ...and it also checks bounds.

If you want to get rid of the switch-case bounds check, add a default case with __builtin_unreachable() (or on MSVC: __assume(0)), this hints the optimizer to not create a bounds check before the jump table access. At your own risk of course ;)

kibibu

There sure are lots of "fairly sure", "I feel" and "I think" comments in this thread.

I'm inclined to accept that Alice Goldfuss knows what theyre talking about, partly because their twitter is full of good tech stuff, and partly because I too have encountered situations where switching from a big branchy case statement to function pointers improved performance significantly.

jacknews

That's because there's no right or wrong answer about this, and people on here are not just a bunch of juniors who should automatically accept as an authority anyone who blogs.

If switch/case actually does compare each and every item, as the article claims, then obviously a dispatch table is a solid choice to improve performance.

But that is just not likely the case in this example (and it does depend on a number of factors), and so it's then largely a matter of taste. This technique definitely has it's place, but this example is not the best, um, case, IMHO.

kevin_thibedeau

A compiler can generate a jump table. Whether it will is another matter. For a small number of cases or values with large numeric gaps you will most likely get a chain of conditionals.

flohofwoe

IME the popular compilers (clang, gcc, msvc) are really aggressive about generating a jump table though, e.g. if you have continuous ranges with gaps between them, it will create a jump table for each range, not simply fall back to a dumb sequence of tests.

Dave3of5

Addendum, if choice is directly or indirectly decided by user input that code as is allows for array out of bounds.

Daily Digest email

Get the top HN stories in your inbox every day.