Skip to content(if available)orjump to list(if available)

Incrementing Vectors (2019)

Incrementing Vectors (2019)

18 comments

·September 21, 2022

cocoto

Serious question, why would you write the increment function differently than this simple one below?

void vector8_inc(std::vector<uint8_t> &v) { for (uint8_t &x : v) { ++x; } }

No need to deal with indices or pointers, no obfuscated code, no additional variables, only one clearly stated side effect (we don't need to update a local variable), reads like simple English. I personally don't like C++ (I prefer C) but why would you write C like code in C++?

Gupie

You wouldn't. The article itself says:

"What about the range-based for loop introduced in C++11?

Good news! It works fine. It is syntactic sugar for almost exactly the iterator-based version above, with the end pointer captured once before the loop, and so it performs equivalently."

nkurz

Seems like a fair question. At a glance, it produces the same code as Travis's preferred solution: https://godbolt.org/z/7oPhxEs5E. Anyone with more taste in C++ code able to answer? I'm more comfortable with C than C++, but this would be my preferred answer (if somehow knew in advance that it was going to produce good code).

BeeOnRope

It's mentioned in the article (look for "What about the range-based for loop"): it works fine because it uses iterators under the covers which as local objects avoid the specific problem which occurs here.

This article isn't really meant to highlight something problematic about vectors in particular, but with C and C++ in general: that we actually rely on aliasing-based optimization in many cases and this optimization is fragile.

The basic operation here is simply indexing into an array stored in a struct which is quite common, even though you of course should prefer the range-based for this minimal example.

amag

You wouldn't if all you want is to just increment all elements in a std::vector but for a more complex loop you might want the index...

noobermin

Sheesh, the ability to alias char to anything has some serious consequences on optimization, huh.

It's unfortunate, because what is likely an obscure case (where the user of the function call intends to alias things in the very container they are using) is a minority use case but it ruins the situation for most other code.

im3w1l

There are multiple possible lessons to take from this. On the one hand you could say that "aliasing is the issue, clearly things should not be just allowed to alias each other willy-nilly". Or on the other hand you could come away thinking "I always mistrusted optimizers! They are just too brittle and unpredictable! Placating them is a never ending treadmill. I'll manually vectorize key parts"

Twisol

Both. Both is good.

684577567345

I could not reproduce the authors results on clang 8.0 or gcc 8.1 at O3. The uint8_t version performed roughly 6 times faster than the uint32_t version. The post was published in 2019 so perhaps something has changed since then.

https://quick-bench.com/q/93vPCntI82FHbwAMv1ODUd_FjPc

TinkersW

Your vectors are local to the function, so not the same thing, trivial for compiler to know they aren't aliased.

yarrrrrrrrrrrr

https://quick-bench.com/q/x9kqob0PB5FsWQ65tFgpOlpaWOA

Repro. As the other commenter said, the internal data pointers need to be of unknown provenance (e.g. non-local).

dmitrygr

And that is why C has the "restrict" keyword, which you should use in precisely such cases

Joker_vD

Of course, if you accidentally pass aliased pointers as restricted pointer arguments to a function, "no diagnostic is required".

noobermin

Which is a shame because restrict is not in C++

nahstra

If it's not in the standard, but it's in every C++ compiler, is it really not in C++? Semantics

fartsucker69

Wonder how a manually written vectorized 8 bit version would compare (with some padding inserted at the end of the array to avoid extra checks). Work size/bandwidth in a small test like this shouldn't matter, prefetching/write combining should be essentially perfect and bandwidth won't show up until it's a bottleneck/we test throughput.

Besides the lacking optimization due to aliasing, could there also be some microcode iffyness be going on? Can an x86/64 cpu just increment an 1 byte value positioned at a random memory location?

MauranKilom

As the article notes, range-based for loops side-step this problem.

But the general problem (writing to a char* of unknown provenance might force the compiler to reload any other pointer reads) remains, of course.

null

[deleted]