slabity
matklad
`patchelf` solves a different problem: it fixes up an already built binary. Here, I am building the binary myself, so I’d rather make that just work without any extra build steps.
slabity
Ah, my bad. I was confused because I never run into linker issues when I build my rust (or any other type of) binaries on a NixOS system.
In fact, I am running the `evdev` example and I don't get any linker errors at all even when I change the linker to LLVM. I am using a nightly version of rust though.
rubicks
Strongly concur. Patchelf is indispensable for beating sense into third-party closed-source shared objects. The `--set-soname` option is particularly useful in correcting all kinds of ineptitude.
kazinator
I'm pretty sure NixOS isn't the only one doing this hack. The Yocto build system does something similar. It contains its own build-time binary glibc, and patches its tools to point to its own internal library installation. Or something like that. In effect, Yocto has its own build-time distro, which has to run from any filesystem location.
lloeki
> I'm pretty sure NixOS isn't the only one doing this hack
When developing ArchMac I had to do godawful hacks to bog-standard libs because whatever build system decided to hardcode a lib path (or forcefully strip one when it should be hardcoded, I've had to handle both) that I had to manipulate through various means including install_name_tool which is not that different from patchelf†.
This kind of issue was not macOS specific, it just turns out the various ways things were built happened to gracefully "work" on most Linux distros by sheer luck but they could have been equally broken.
† Not really a surprise when thinking about it, the concept of Nix derivations is not that different from the concept of Darwin bundles/frameworks (in terms of being a self-contained dependency package) so it's only natural similar issues, and thus approaches and tools to tackle them, emerged.
PaulDavisThe1st
macOS has also been doing this for years, decades even, as part of the job of install_name_tool
astrange
Though you can override paths to libraries with environment variables, as long as the overriding one has the same full install name.
That’s better since editing a binary will break its codesigning.
higherhalf
Speaking of significantly faster linkers, I personally use mold[1], including when writing in Rust.
saghm
I'm super interested in trying out mold, but I was a bit taken aback seeing that it's AGPL licensed; I have no idea what the implications of that license for a linker would be. Would using it to link a final binary require sharing the source of the binary when it's distributed? What about if it's used to link statically into a library, and then that library is linked into another binary?
rkangel
You're running the tool not embedding or linking against its source code. An equivalent example might be if Libreoffice was under the GPL licence[1] - that licence wouldn't have any implications for a spreadsheet you created using it.
[1] It's not - it's MPL
moonchild
No more than would using a GPL-licensed linker, such as gnu ld or gold.
FL33TW00D
Really looking forward to macOS support in mold!
fanf2
Because I am stubborn, when I am linking with a library installed in a nonstandard place, I usually try to get the configure script to do the right thing, even though it is not always easy. But, just in case I lose the battle, I keep in mind the existence of chrpath, a little utility for changing the rpath in an ELF binary. Because you use it on the final build artefact, there is no way for autoconf or libtool to screw it up.
kaba0
As mentioned by another commenter, the NixOS project does have a similar little program called patchelf, which may be a bit better known?
wyldfire
> That something is rpath (also known as RUNPATH)
Pretty sure that runpath and rpath are distinct and have slightly different behavior. Can't fault you much for making the mistake, though; the two weren't given names that make them easy to distinguish.
nemetroid
RPATH and RUNPATH are indeed different. However, I believe the -rpath flag is used to set either (--enable/disable-new-dtags determines which one is used), so the confusion is understandable.
skainswoo
What is the distinction between them? I always thought they were the same thing.
stabbles
Depends on your libc.
Search order:
glibc: rpath > LD_LIBRARY_PATH > runpath > ld.so.cache > default paths.
musl: LD_LIBRARY_PATH > rpath=runpath > default paths.
Search path inheritance:
glibc: rpaths are inherited: When exe depends on libx depends on liby, then liby first considers its own rpaths, then libx's rpaths, then exe's rpaths. HOWEVER if liby specifies runpath, it will not consider rpaths from parents.
musl: rpaths and runpaths are the same and always inherited.
I verified the glibc/musl sources when writing https://github.com/haampie/libtree
wyldfire
The difference IIRC is the order they're used by the dynamic loader. One is prior to searching LD_LIBRARY_PATH, the other after.
nemetroid
There's also a difference for transitive dependencies. From the Linux manpage for ld.so, about RUNPATH:
> Such directories are searched only to find those objects required by DT_NEEDED (direct dependencies) entries and do not apply to those objects' children, which must themselves have their own DT_RUNPATH entries. This is unlike DT_RPATH, which is applied to searches for all children in the dependency tree.
setheron
I actually wrote something recently related to RPATH in Nix that you might find interesting
https://fzakaria.com/2022/03/14/shrinkwrap-taming-dynamic-sh...
colordrops
I haven't laughed out loud to a technical post in a long while, and this comment got me:
"As this is NixOS, we are not going to barbarically install it globally"
pvtmert
interestingly macOS has a really nice solution called @rpath and install_name_tool
basically you build your binary and set a rpath, let's say it is /usr/local/lib in your machine
mach-o binary stores:
@rpath = /usr/local/lib
libxyz = @rpath/libxyz.dylib.1
when I want to install those to /opt/something, the only thing I need to do is install_name_tool -add_rpath /opt/something
This will add search directories to binary itself. There are some DYLD_* environment variables too but I'm not sure about them... (Some are SIP protected by the way)
PS: It may invalidate signed binaries. Again, not tested such use cases.
xenadu02
That's not exactly right.
A library itself decides if it is relocatable or fixed. If it is fixed the MH_DYLIB records its install name as /path/to/binary (generally by setting DYLIB_INSTALL_NAME_BASE so xcodebuild will merge that with the library name automatically). The binary must be at that path. However this can (and often is) a symlink just like other systems use where /usr/lib/somelib.dylib -> /usr/lib/somelib.1.3.dylib so that minor version updates can be made without rebuilding programs.
If a library wants to be relocatable it specifies an install name of @rpath/binary.
At runtime dyld creates a "run path list". Every time it encounters a load command with an @rpath name it tries substituting paths from the run path list until it finds the library. The main binary along with any dependencies can add entries to the run path list. These can be absolute paths or relative paths anchored from @executable_path or @loader_path. The former being the main binary and the latter being the path to the binary itself (eg if the main app loads a plugin the plugin can reference dependencies relative to the main app or itself as needed).
You can push your own paths in the mix with DYLD_LIBRARY_PATH (searched first) or DYLD_FALLBACK_LIBRARY_PATH (searched last). Check "man dyld" and "man ld" if you want more details.
None of the above requires modifying binaries and so doesn't invalidate code signatures. If you want to use install_name_tool on binaries you build pass "-headerpad_max_install_names" to ld so it will pad out the load commands which makes it easier to edit them.
There have been a bunch of security vulnerabilities around the Windows strategy of auto-loading any dependency from the same directory as the binary so YMMV.
kazinator
That's a pretty poor solution compared to just finding libraries relative to the root of the program's installation.
whateveracct
Yep - Windows of all things handles this the best.
You can try to do this with rpath of $ORIGIN. But then you'll probably still run into libc issues -.-
Not hard to see why a lot of games focus on Windows and let wine/proton handle Linux.
account42
> You can try to do this with rpath of $ORIGIN. But then you'll probably still run into libc issues -.-
Linux (g)libc is sort of equivalent to the Win32 API on Windows so you are not expected to ship your own version just like you don't ship your own ntdll, user32, etc. Since glibc has good backwards compatibility the only libc issue you will run into is having to compile against the oldest version you want to support.
kazinator
And the irony? It was someone on Unix who came up with the idea that when foo.c contains #include "bar.h", the same directory where foo.c resides will be searched for that header first, by default, before other places.
The idea of doing that for the DLL search in Windows could have been inspired by C.
ncmncm
So you are saying this is why Windows users universally complain about "DLL Hell", and Linux users don't? Or is this how MS finally fixed Windows DLL Hell? (Presuming Windows DLL Hell is, indeed, fixed; I wouldn't know.)
viraptor
That's a minimal issue for games. You ship all dependencies with a start script that points LD_LIBRARY_PATH where you want and everything's fine. Or maybe even ship a flatpak. Issues for games come from other places.
PaulDavisThe1st
install_name_tool -change libfoo @executable_path/../libs/libfoo
does precisely that. @executable_path is evaluated at runtime.
kazinator
I don't see anything like that in the solution I was replying to.
gigatexal
I've been admiring NixOS from afar, but given the bending users have to do in this particular use case, shouldn't NixOS present libraries transparently and seamlessly in these shell environments the author creates, so that "standard" tools like ld/ldd can find them? Shouldn't it be on the shoulders of NixOS to make this work so that its users don't need patched tools or hacks?
rkangel
NixOS changes some standard conventions of Linux filesystem layout and where tools can expect to find things. These are for good reasons and are due to the core of what NixOS is trying to achieve. For building most things those changes are relatively abstracted away (see ld wrapper script for details) and you don't have to know about them. Fiddling around with something low level like a linker is I think a forgivable situation where the abstraction leaks - it is a process that is inextricably linked to where the system puts things.
My issue is the last sentence:
So… .. turns out there’s more than one lld on NixOS. There’s pkgs.lld, the thing I have been using in the post. And then there’s pkgs.llvmPackages.bintools package, which also contains lld. And that version is actually wrapped into an rpath-setting shell script, the same way ld is.
That means that there isn't a problem. NixOS has fixed this, the system works. Except that you have to magically know which package you should be using. This is the sort of problem that I run into with Nix - it's hard to know the correct incantation.
gigatexal
That's fair. Makes my point/post obsolete.
cryptonector
Yes, you need to set the RPATH correctly when building ELF executables and shared objects. You would only not know this if you were only ever building things to install into /usr with dependencies on things in /usr.
> Curious observation: dynamic linking on NixOS is not entirely dynamic. Because executables expect to find shared libraries in specific locations marked with hashes of the libraries themselves, it’s not possible to just upgrade .so on disk for all the binaries to pick it up.
Dynamic linking is not just about being able to upgrade without re-linking. Dynamic linking is not even primarily about that, not anymore, if it ever was.
Dynamic linking is more than anything about semantics that no one has bothered to add to static linking!
Static linking for C is stuck in the 1970s.
Dynamic linking for C makes C more like C+ -- a different language.
Specifically:
- with static linking symbol conflicts are a serious problem
- with dynamic linking symbol conflicts need not be a problem because with direct binding (Illumos) or versioned symbols (GNU), you get to resolve the bindings correctly at build-time and have them resolve correctly at run-time
- at build time you get to list just direct dependencies, and the linker does the rest -- compare to static linking, where you have to list all dependencies only in the final link-edit and then you must flatten the dependency list into some order, and then if there are conflicts, you lose.
For all those who keep harping on how static linking is better than dynamic linking, what I would suggest is that what must be done to make static linking not suck is to enrich .a files with the kinds of metadata that ELF adds to shared objects so we can get the same "list only direct dependencies" semantics when static linking as when dynamic linking. And I would note that libtool does this, just... very poorly.
What I would do to fix static linking:
- have `ld` add a .o to every .a that includes the `-L`/`-R`/`-l` arguments given when constructing the .a (normally one does not do this when linking statically!)
- have `ld` look in every .a found when doing a final link-edit to recursively find its dependencies, and, most importantly,
- provide the same direct binding / versioned symbol semantics as in dynamic linking so that external symbols in dependents are resolved to the correct dependencies in the same way as in dynamic linking.
Notionally this is quite simple. But adding this to the various ld implementations would probably be rather a lot of work. Still, if people insist on static linking, this work should be done.
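The "flatten the dependency list into some order" pain point above is easy to reproduce with plain .a archives today (a sketch assuming gcc and ar; the library names are made up for illustration):

```shell
rm -rf /tmp/ardemo && mkdir -p /tmp/ardemo && cd /tmp/ardemo
cat > dep.c <<'EOF'
int dep(void) { return 1; }
EOF
cat > mid.c <<'EOF'
int dep(void);
int mid(void) { return dep(); }
EOF
cat > main.c <<'EOF'
int mid(void);
int main(void) { return mid() == 1 ? 0 : 1; }
EOF
gcc -c dep.c mid.c main.c
ar rcs libdep.a dep.o
ar rcs libmid.a mid.o

# Works: the final link-edit spells out the whole tree, dependents first.
gcc main.o -L. -lmid -ldep -o ok

# Fails with "undefined reference to `dep'": the linker scans archives left to
# right and has already passed libdep.a by the time libmid.a needs it.
gcc main.o -L. -ldep -lmid -o broken || echo "link order matters"
```

A shared libmid.so would carry its own DT_NEEDED entry for libdep, so only the direct dependency would need to be named; that is the metadata the comment proposes adding to .a files.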
rkangel
This is only semi-related, but would you mind explaining what libtool actually does and what problem it solves? I've never been able to fully grok it.
account42
libtool is a build tool made primarily to make your life hell if you do anything that the libtool authors did not plan for. Like who thought it would be a good idea to silently drop unknown linker-driver flags?
cryptonector
It writes metadata into .la files, which are text-based adjuncts to .a files. It's meant to give you a common and portable interface to static and dynamic linking.
But libtool is written in POSIX shell, it's not part of the linker, and it is a bit of a disaster.
jcelerier
> For all those who keep harping on how static linking is better than dynamic linking, what I would suggest is that what must be done to make static linking not suck is to enrich .a files with the kinds of metadata that ELF adds to shared objects so we can get the same "list only direct dependencies" semantics when static linking as when dynamic linking.
Or just use cmake which does that automatically
cryptonector
It can't. The problem is that .a files do not record their direct dependencies, unlike ELF objects, and instead depend on the final link-edit having the full tree of dependencies provided, but flattened into a list. That flattening loses critical information needed to correctly resolve conflicting symbols.
jcelerier
It absolutely does. When you use cmake, if you link against the target foo which is a static library and itself was marked as linking to bar, then your final executable will have -lfoo -lbar. Of course this information isn't stored in the .a files, but in cmake's FooConfig.cmake files - who cares as long as it works ? The vast majority of libraries now have those even if they don't use cmake as build system as they are fairly easy to generate.
stabbles
Linux usually (by convention) provides 1 file and 2 symlinks per lib: liba.so -> liba.so.x -> liba.so.j.k.l.
The first one is to make the linker (ld) happy: -la will look for liba.so. The linker puts the SONAME (liba.so.x) in DT_NEEDED.
The second symlink's filename corresponds to the SONAME, so that the runtime linker (ld.so) can locate the library by SONAME in rpaths.
The third one is the actual library, which can be updated while keeping the same soname & same abi.
Now, it would be great if the linker had an option to not only copy the SONAME into DT_NEEDED, but also register the path in which the library was located as an rpath.
Cause the situation on Linux is absurd! You pass some flags -L and -l to the compiler/linker, the linker links something and nobody knows what. Then when you run your executable it has to locate this something again, and you can only pray that your libc and binutils/llvm agree on search paths & order. In most cases this does not work, and you must manually pass -Wl,-rpath,/some/path to add a search path. Nobody guarantees that what the linker links is what the runtime linker uses.
Of course there are many edge cases:
- linking during make without relinking during make install will make your executables register rpaths to build directories instead of install dirs
- sometimes you link to a stub lib that should not be used at runtime
But still, some guarantee that what you build with is what you run with would be a major user experience improvement for linux.
jcranmer
> Cause the situation on Linux is absurd! You pass some flags -L and -l to the compiler/linker, the linker links something and nobody knows what. Then when you run your executable it has to locate this something again, and you can only pray that your libc and binutils/llvm agree on search paths & order. In most cases this does not work, and you must manually pass -Wl,-rpath,/some/path to add a search path. Nobody guarantees that what the linker links is what the runtime linker uses.
The issue you're missing is that the build directory is usually not the location of the final binary objects. The actual absolute path of libfoo.so may well be in /builds/runner/foo-package-4df78af0/build/prefix/lib/libfoo.so, which is unlikely to exist on anyone other than the CI's machine (and even on the CI machine itself for too much longer). The actual location will usually be /usr/lib64/libfoo.so, but the library that is linked against may well not be there at the time of linking (particularly in the case where a package is building both a library and an executable that depends on said library in the same package).
What you really want is for the relative path to the library to stored in the executable. Unless what you want is to actually use the globally-installed library and not one you're building at the same time. There's no single solution that fits every use case!
stabbles
> The actual location will usually be /usr/lib64/libfoo.so
Usually indeed, for distros that pretty much support one single version of every library. But this is no longer true for Nix, Spack, Gentoo Prefix and Guix; all these package managers/distros have in common that there should be no default search paths where all libraries are dumped.
How about `--copy-link-path-as-rpath` and `--copy-link-path-as-rpath-ignore=/build/dir`, so that ld continues to copy the soname to dt_needed, and registers rpath of non-build dirs. Then Nix, Spack, ... can simply use these flags in their linker wrapper.
throwaway09223
I think most people designing build systems (I'm one of them) would prefer to explicitly set the paths at each point rather than have gcc ferry values between inputs behind the scenes.
My build system will have already resolved all these paths. It's very easy to interpolate these paths into the command to call the compiler.
SAI_Peregrinus
Of course you can't guarantee that the library will get installed to the directory you think it'll get installed to, since there's no unified installer system for all Linux distributions. So even a relative path doesn't necessarily work. The best you can do is hope that the Freedesktop filesystem hierarchy is being followed, but that forces installing software for all users at once instead of per user, despite Linux supposedly being a multiuser OS.
stabbles
In Nix and friends, they don't do relocation at all. So you know where the libs are on everybody's filesystem.
throwaway09223
> Cause the situation on Linux is absurd!
Your complaints are reasonable, but gcc already does this. Here, I'll show you how:
> You pass some flags -L and -l to the compiler/linker, the linker links something and nobody knows what.
I agree, it would be really nice to be able to specific exact shared object paths instead of using -L and -l. Build systems typically already know the full paths to all the objects and the abstraction here is often unhelpful.
This could be remedied fairly easily by allowing (for example) -l to take an absolute path to an object rather than searching -L paths. But gcc already does this - you can just put the shared object on the command line directly like so:
Change this: gcc -lfoo bar.c
Into this: gcc bar.c /path/to/foo.so
The effect is the same, but more explicit. The foo.so.x object will be linked and added to the SONAME list.
> Nobody guarantees that what the linker links is what the runtime linker uses.
This part, however, is by design. We explicitly do NOT want what the linker links to be what the runtime uses. This is how we update shared objects between minor versions to fix bugs without reinstalling every binary on the entire system!
Letting libfoo.so.1 link to libfoo.so.1.x is a huge feature. Locking in an explicit minor version would defeat the entire purpose of dynamic linking.
stabbles
> Letting libfoo.so.1 link to libfoo.so.1.x is a huge feature. Locking in an explicit minor version would defeat the entire purpose of dynamic linking.
My suggestion is to continue copying the SONAME into DT_NEEDED, and record the dir the lib was found in as an rpath.
I did not say it should fix the lib by filename.
throwaway09223
Well, you complained about ambiguity in the search process during build, and referencing by path is how to fix that.
Regarding runtime, we absolutely do not want to implicitly embed build paths. As others have said because it's unlikely we will put them in the same place, and it's unlikely we will build and run on the same systems.
The runtime library management system is going to have a structure for where to place libraries. It may be as simple as tossing them in /usr/lib, or it may be something where we have different paths for each application. We can't do this if the compiler implicitly dictates universal linker paths.
One aspect I think you may also be missing is that shared objects themselves have DT_NEEDED and rpaths. You would quickly run into very confusing conflicts between binaries built on different systems, or with different build environments.
It's hard to see a problem here, since adding an rpath is very easy. You appear to be asking for implicit, hidden behavior in the compiler which doesn't fit the vast majority of use cases.
kazinator
rpath is braindamaged; no program should be using that, and a distro build should almost never be inserting such a thing into executables.
I suspect that NixOS is playing with this in order to have a relocatable install: so that is to say, so that user can install NixOS in some subdirectory of a system running some existing distro. Any subdirectory, yet so that programs can find their libraries.
If I were in this predicament, rather than perpetrating hacks to patch the rpaths in binaries, I'd fix the dynamic linker to have a better way of locating shared libraries. The linker would determine the path from which the executable is being run, calculate the sysroot location dynamically, then look for libraries in that tree. E.g. /path/to/usr/bin/program would look under /path/to/usr/lib and related places.
A possibly nice hack would be to extend the meaning of the rpath variable. Give it a syntax, like say that if it starts with @, then the rest of it denotes a relative sysroot path fragment.
E.g. the program that gets installed as /path/to/usr/bin/program would be built with an rpath of "@/usr/bin". So then the dynamic linker sees the @ and does a sysroot calculation. First it strips off the basename to get just the directory part "/path/to/usr/bin". Then it sees, hey, the suffix of "/path/to/usr/bin" matches the "/usr/bin" in the rpath. The suffix is stripped to produce "/path/to" and that path is then used as the root for the library searching. Instead of searching literally in /lib or /usr/lib or whatnot, the "/path/to" part is prefixed to every search place to look in /path/to/lib and /path/to/usr/lib.
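The suffix-stripping calculation described here is just string arithmetic; a sketch in plain shell (the "@" rpath syntax is the commenter's hypothetical proposal, not a real ld.so feature, and `sysroot_for` is a made-up name):

```shell
# Given an executable path and a "@"-prefixed rpath fragment, compute the
# sysroot by stripping the fragment off the end of the executable's directory.
sysroot_for() {
  exe=$1; tag=$2                 # e.g. /path/to/usr/bin/program, @/usr/bin
  dir=${exe%/*}                  # directory part: /path/to/usr/bin
  frag=${tag#@}                  # fragment without the @: /usr/bin
  echo "${dir%"$frag"}"          # strip matching suffix -> /path/to
}

sysroot_for /path/to/usr/bin/program @/usr/bin   # prints /path/to
```

The loader would then prefix that result to every search place, looking in /path/to/lib, /path/to/usr/lib, and so on.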
Patching binaries is very poor; it changes their cryptographic hash like SHA-256. You want your distro to be installing bit-exact stuff from the packages, and treating it as immutable.
clhodapp
As a rule, NixOS doesn't patch binaries, it causes RPATH to get baked in at build time. Also, as a rule, it doesn't have some sort of materialized FHS-like subtree for each package, it manages a complex filesystem tree where the (bit-exact) stuff from the packages is stored immutably by the hash of the full build description.
Binary patching only comes in when they're trying to get closed source binaries to run on NixOS and is fully managed by the packaging process to happen the same way on every system. These days, though, the approach of using filesystem namespacing to give packages a custom FHS-like view of the world seems to be growing more common instead of the patching.
cryptonector
IMO RPATH is fine, but $ORIGIN-relative RPATH values are best.
That said, it's generally very difficult to build deploy-time relocatable code in Unix-land. The problem is that there's nothing like $ORIGIN for finding static assets, and all the autoconf tooling just makes it so easy to make all paths in object code by absolute paths that include the install $prefix/$bindir/$libdir/$sharedir/$statedir/$etcdir, etc.
Not that one cannot write deploy-time relocatable code -- I've done it plenty. But that it requires so much foreknowledge, intent, and know-how, that it just doesn't get done.
matklad
> I suspect that NixOS is playing with this in order to have a relocatable install
Kinda the opposite: NixOS doesn’t really have a sysroot.
eptcyka
NixOS is built around hashing the outputs of its builds, so you can verify that a build produces the expected output given the same inputs. The reason NixOS patches binaries is so that they can actually find the shared objects they expect when those are not stored in /usr/lib. Since every build output gets its own unique path, this allows a binary to link against two slightly different versions of the same library, which in practice is never an issue one needs to resolve. However, a more practical issue this solves is that you can have a single sysroot with two binaries that need two slightly different versions of the same library.
nwmcsween
Alternative is to store the hash in the filename which avoids this.
geraldcombs
> A possibly nice hack would be to extend the meaning of the rpath variable. Give it a syntax, like say that if it starts with @, then the rest of it denotes a relative sysroot path fragment.
This sounds vaguely similar to dyld's @executable_path variable on macOS.
account42
This is already supported as $ORIGIN under Linux (well at least with glibc).
mangix
Or avoid all of this with static libraries :).
Rucadi
This is the only sane way...
And for people saying "But you can't get security updates":
I would rather have a dynamically-linked binary that includes all the dependencies in it, where you can upgrade the dependencies with a tool that operates on the binary, than the madness of shared libraries in system paths. (Well, you kinda get that with appimage and similar.)
How does an article on NixOS talk about the `rpath` issue without also mentioning the `patchelf` utility that NixOS developers created to solve this issue? It's a small tool that lets you modify ELF executables and binaries. It's also the recommended way for NixOS users to modify binaries to work properly.
https://github.com/NixOS/patchelf