Get the top HN stories in your inbox every day.
AndyKelley
conaclos
> Case in point: the new self-hosted compiler is 1.5x faster than the C++ implementation and uses 3x less peak RAM.
I am not sure it is comparable: the self-hosted compiler uses a different architecture [1].
Laremere
This is true, but I don't think your point is entirely fair either. Zig's design lends itself to writing this style of code. In contrast, the usual way(s) C++ is used encourage code that makes these types of optimizations hard or impossible. In practical terms, if you want to write fast software, it matters a lot whether the ergonomics of the language allow you to use the desired architecture.
(Also, you're replying to the primary author of Zig and presenter of that talk, if you weren't aware.)
kaba0
While ergonomics matter, I don’t believe the question is settled at all. For very high performance applications, C++ will remain the answer for a long time to come.
chris2860
Hi Andrew, sorry for the unrelated question but where can we find the financial reports of the Zig Software Foundation for 2022? The Finances spreadsheet hasn't been updated for the past half a year:
https://docs.google.com/spreadsheets/d/14_ljFHGFXY5NhBhlfjgk...
killingtime74
Isn’t it quite common to do financial reports only every quarter, every 6 months, or yearly?
llimllib
> having better debug tooling
What debug tooling are you referring to? It seems like people just use lldb to debug zig programs?
I'm honestly interested, and searching doesn't turn up good tools as far as I can tell.
AndyKelley
Here are some features of Zig that make debugging much easier than C++:
* safety checks (equivalent to UBSAN), when triggered, print stack traces that include source code lines and point to the relevant line/column
* one of the safety checks is using the wrong field of an untagged union. This one is so underappreciated. Wrong union field access costs so much time to debug in C++ but it is a breeze in zig: you get a crash explaining the problem with a stack trace immediately, not a corrupt value that causes problems down the road. untagged unions are a core part of the strategy that makes zig fast & have a small memory footprint.
* valgrind client request integration[1]. valgrind "just works" better with Zig than C/C++ because zig communicates more information about memory regions that are "undefined".
* error return tracing [2]
* segfaults print stack traces
* std.debug has features to collect stack traces in a compact manner and then dump them at a relevant time [3]
* std.heap.GeneralPurposeAllocator detects leaks and prevents memory corruption from Use-After-Free/Double-Free, making debugging easier
[1]: https://valgrind.org/docs/manual/manual-core-adv.html#manual...
[2]: https://ziglang.org/documentation/master/#Error-Return-Trace...
[3]: https://github.com/ziglang/zig/blob/4a98385b0aa3808ab05a1ebf...
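To make the untagged-union point concrete, here is a rough C++ sketch of what such a check amounts to. It is not how Zig implements it; `CheckedValue` and its members are hypothetical names invented for this illustration. The sketch keeps a hidden tag next to the union and refuses to read a field that was not the last one written, which is the failure a plain C++ union would silently let through as undefined behavior.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical illustration: a union that remembers which field is active
// and reports a wrong-field read immediately instead of returning garbage.
class CheckedValue {
public:
    static CheckedValue from_int(int v) {
        CheckedValue c;
        c.active_ = Field::Int;
        c.payload_.i = v;
        return c;
    }
    static CheckedValue from_float(float v) {
        CheckedValue c;
        c.active_ = Field::Float;
        c.payload_.f = v;
        return c;
    }

    // Each accessor first verifies that the requested field is the active one.
    int as_int() const {
        require(Field::Int, "int");
        return payload_.i;
    }
    float as_float() const {
        require(Field::Float, "float");
        return payload_.f;
    }

private:
    enum class Field { Int, Float };
    union Payload {
        int i;
        float f;
    };

    void require(Field wanted, const char* name) const {
        if (active_ != wanted) {
            throw std::logic_error(
                std::string("access of inactive union field: ") + name);
        }
    }

    Field active_ = Field::Int;  // hidden tag; an assumption of this sketch
    Payload payload_{};          // zero-initializes the first member
};
```

With this in place, `CheckedValue::from_int(42).as_float()` throws at the point of the mistake rather than handing back a reinterpreted bit pattern. In this sketch the tag is always stored, so it costs memory in every build.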
llimllib
Thanks!
dilap
> * Zig's std lib data structures are incredibly useful, especially compared to the C++ STL.
I think elaborating on this could make for a very interesting article. :-)
(I have a guess already at one item, which is MultiArrayList, something that kind of blew my mind when I first learned about it, and is in my opinion a very impressive testament to both the power and ease of use of Zig's approach to compile-time computation.)
sekao
I didn't even know about it until today, very impressive. Generic SoA in less than 500 lines of userland code...
https://github.com/ziglang/zig/blob/master/lib/std/multi_arr...
jcelerier
SoA in C++ isn't that long, here's my version (which for sure does not have the utility functions defined here though, but is useful enough for me): https://github.com/celtera/ahsohtoa/blob/main/include/ahsoht...
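For readers unfamiliar with the term, a hand-written struct-of-arrays (SoA) layout looks roughly like the C++ sketch below. This is only an illustration of the idea: `Monster`, `MonstersSoA` and `total_health` are made-up names, and the per-field vectors are written out by hand here, whereas a generic SoA container derives them from the struct type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type, used only for illustration.
struct Monster {
    std::uint64_t id;
    float health;
    bool alive;  // the struct is typically padded after this field
};

// Struct-of-arrays: one vector per field instead of one vector of structs.
// A loop that reads only `health` then touches only the `health` storage.
struct MonstersSoA {
    std::vector<std::uint64_t> id;
    std::vector<float> health;
    std::vector<std::uint8_t> alive;  // uint8_t avoids the vector<bool> proxy

    void push_back(const Monster& m) {
        id.push_back(m.id);
        health.push_back(m.health);
        alive.push_back(m.alive ? 1 : 0);
    }

    std::size_t size() const { return id.size(); }

    // Reassemble a full record from the separate field arrays.
    Monster get(std::size_t i) const {
        return Monster{id[i], health[i], alive[i] != 0};
    }
};

// Sums one field without loading the other fields into cache.
float total_health(const MonstersSoA& ms) {
    float sum = 0.0f;
    for (float h : ms.health) sum += h;
    return sum;
}

// Bytes stored per element in the SoA layout: no padding between fields.
constexpr std::size_t soa_bytes_per_element =
    sizeof(std::uint64_t) + sizeof(float) + sizeof(std::uint8_t);
```

Comparing `soa_bytes_per_element` with `sizeof(Monster)` on your own platform shows how much per-element padding the array-of-structs layout carries.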
labrador
> Zig produces smaller, faster binaries than C++
This seems fairly significant to me and makes me wonder how this is possible with all the effort that has gone into optimizing C++
rk06
Because C++ can't make breaking changes and must live with legacy baggage.
Zig, meanwhile, can use modern techniques without such issues.
labrador
This is wrong because you can strip out C++ code you don't need. Green code doesn't need to worry about legacy baggage
kaba0
A smaller binary will not necessarily be faster - specialization can bloat code size, but having a specific version for a given subtype can be much faster. It’s a tradeoff.
kristoff_it
right after that sentence there's a link to a talk that explains one way this is true
labrador
We can hardly be expected to watch a 46 minute video on "A Practical Guide to Applying Data-Oriented Design" to find our answer. It seems unpromising on the surface.
kuon
Congratulations. This is an important milestone.
childintime
For Andrew it must feel like he's now on the home stretch towards 1.0, finally!
ksec
In a previous presentation I think he mentions they are still aiming at something like 2025.
lewurm
Thanks for the context!
What's the difference between the stage2 and stage3 binary? Does stage1 produce different binaries for the same input compared to stage2/stage3?
amaranth
Ideally there should be no difference and building stage 3 is basically a sanity check to ensure the compiler is working correctly.
AndyKelley
Not quite - what you said is true for a hypothetical "stage4" however there is a distinct difference between stage2 and stage3. While they are built from the same source code, and therefore have the same logic, they are lowered by different backends, meaning they will have potentially drastically different performance characteristics depending on the differences between the stage1 and stage2 backend, respectively.
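The distinction can be modelled with a toy C++ sketch. Everything here is hypothetical (`Binary`, `compile`, `build_stage`), and it assumes a fully deterministic compiler. A binary is described by two things: the source it implements and the compiler logic that generated its machine code.

```cpp
#include <string>

// Toy model of a compiler binary.
struct Binary {
    std::string logic;       // which source code the binary implements
    std::string lowered_by;  // logic of the compiler whose backend emitted it
};

// Compiling `source` with `compiler` yields a binary that behaves like
// `source`, but whose machine code was produced by `compiler`'s backend.
Binary compile(const Binary& compiler, const std::string& source) {
    return Binary{source, compiler.logic};
}

bool same(const Binary& a, const Binary& b) {
    return a.logic == b.logic && a.lowered_by == b.lowered_by;
}

// Builds stage `n` (n >= 1): stage1 comes from the C++ source via the system
// toolchain; every later stage compiles the Zig source with the prior stage.
Binary build_stage(int n) {
    const std::string cpp_source = "zig compiler, C++ source";
    const std::string zig_source = "zig compiler, Zig source";
    Binary current =
        compile(Binary{"system C/C++ toolchain", "n/a"}, cpp_source);
    for (int stage = 2; stage <= n; ++stage) {
        current = compile(current, zig_source);
    }
    return current;
}
```

In this model `build_stage(2)` and `build_stage(3)` have the same `logic` but a different `lowered_by`, so they are not the same binary, while `build_stage(3)` and `build_stage(4)` are identical. That matches the description above: the sanity-check property only starts to hold from stage3 onward.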
christophilus
> Memory usage is improved by a factor of about 3x. For Zig, building itself went from using 9.1 GiB to 2.7 GiB.
Pretty impressive.
dralley
Andrew did a nice talk on the kinds of modifications that were necessary to get those improvements https://vimeo.com/649009599
dom96
Not bad. For comparison Nim builds itself using just 668MiB.
AndyKelley
Nice work!
I suspect the difference mainly comes down to the fact that Zig compiles everything into a single compilation unit. As you can see, this has tradeoffs. It produces better code (single-compilation-unit is what "LTO" approximates) but it requires more memory and more code needs to be rebuilt on changes.
mhd
How many processes does this involve, i.e. `make -jX` for which X?
tomcam
That is stunning
rnhmjoj
I never understood why "self-hosting" is considered a good thing. Sure, the compiler developers can write in their favorite language and I guess it means the language is stable enough to be used in a complex project; however requiring an (older) compiler to build the compiler seems a significant complication for software distributions.
You either have to set up an ever-increasing chain of compilers, as the complexity of the language and the features required to build the compiler grow, or rely on pre-existing binaries. Either way, it seems like a nightmare compared to keeping it in plain old C and building with any compiler of your choice. Just imagine if every project did this: building Firefox would require an existing Firefox installation.
hardwaregeek
I believe Andy noted that it's way easier to find contributors for the self hosted compiler. If someone is passionate about the language, they want to write the language, not C++. And specifically for languages like Rust and Zig, the explicit orthodoxy is that the language is a better successor to C++/C. If you're spending your time telling people that they shouldn't write C++/C and should instead write your language, you probably should put your money where your mouth is and switch the compiler to your own language. Note that this is significantly less common with high level languages. Ruby, Python, even Swift are not bootstrapped. That's because none of these are claiming to take over the systems software niche and therefore don't need to prove anything by bootstrapping.
v3ss0n
PyPy is written in pure python and transpiled to C++
ferdowsi
Pypy is mostly written in a restricted dialect of Python (rpython), not pure python. They're substantively different.
pjmlp
Better educate yourself on Swift: not only does Apple clearly assert in their documentation that Swift is their replacement for C, C++ and Objective-C, they have also started planning efforts to bootstrap it.
Ruby and Python are scripting languages, and as such are seldom bootstrapped.
AndyKelley
There's a third option that you missed which is what Zig does. It has a "bootstrap" compiler, written in C, which is kept in sync with the self-hosted compiler. Features are implemented twice, once in the bootstrap compiler, once in the self-hosted compiler. So the build process involves a fixed set of steps - stage1, stage2, stage3 - and the chain never grows more than this. No pre-existing binaries are required.
jessermeyer
> No pre-existing binaries are required.
Except for of course a c compiler.
nyberg
Which is something mescc and friends solve. Zig doesn't need to solve the full bootstrap from nothing chain, just the entry.
pjmlp
Currently, alternatively they could cross-compile.
erichocean
Which is…self-hosted.
nicoburns
> Either way, it seems like a nightmare compared to keeping it in plain old C and building with any compiler of your choice. Just Imagine if any project did this, like, building Firefox now requires an existing Firefox installation.
But this is exactly the same for C! Building a C compiler requires an existing C compiler. C compilers are admittedly more widely available at the moment, but that probably won't be the case forever. Languages like zig aim to supplant C. How are they supposed to do that when they're directly dependent on C?
WalterBright
Because a large motivation of developing a new language is wanting to program in that new language. Being forced to keep programming in C can become very frustrating.
tialaramex
Although if you're Jonathan Blow you can stream yourself writing C++ and ranting about how terrible C++ is, while implementing your new Jai language.
[I watched a few hours of Jon doing this, but annoyingly I can't find a recording of Jon actually finding and fixing the subtle Heisenbug he realises during those hours must be in his Jai code - he resolves to investigate "later" and I can't find "later". I think this would be insightful to watch because this is exactly the sort of bug Jon says he doesn't have much trouble with in his C++ and so doesn't need to prevent in Jai... so if it's five minutes to fix that shows Jon is correct, if it's hours of difficult searching or eventually was resolved by just unrelated changes to the code not so much]
WalterBright
It was quite a pleasure for me when the D compiler was converted 100% into D. I immediately set about refactoring it (a process that continues today).
One consequence is that pointer bugs in the compiler have become extremely rare.
easeout
If Zig is built on a C compiler, and (I assume) the C compiler self-hosts, then if Zig starts to self-host, it doesn't _introduce_ the dependency on a chain of previously built binaries, it just makes it a step more severe while removing a dependency.
Would you prefer if Zig not only didn't self-host, but didn't have any self-hosting dependencies? For your ideal compiler, would a full build from scratch require making a platform-specific assembler from machine code, then a few stages of successively higher-level languages?
pjmlp
That would be a possibility, another one is to cross-compile.
nindalf
You don’t know if the product you’re building is any good unless you’re using it yourself.
hypertele-Xii
Also known as dogfooding.
pjmlp
Once upon a time C was bootstrapped as well.
A great outcome of bootstrapped compilers is that most contributors can use the one language they know instead of two, and, better still, it kills the myth that C is the only game in town for writing compilers.
chrisseaton
> it kills the myth that C is the only game in town for writing compilers
A compiler is a pretty pure, high-level operation. Makes no sense to use a low-level language for it.
pjmlp
Agreed, but that is how things go: bad teachers who propagate the myth of UNIX being the genesis, language runtimes that happen to use C more out of convenience than anything else, and around goes the myth.
Hence why stuff like Maxine VM, Jikes and now Graal are so relevant.
0x0203
I understand that C is the same way and that there are benefits for language developers in using their own language, so I can't complain too loudly about languages that do this, but I can attest to the "nightmare" that this creates. I have personally ported java to an operating system that didn't previously have a java compiler. A colleague recently ported rust. Neither had usable cross-compilers. Both require a compiler of their own language to build their own compilers. It's a _giant_ pain in the neck.
I wish that all language devs would provide a way of compiling the build tools with something that nearly everybody already has, like C or C++. Having language support baked into a commonly used compiler like gcc is also nice. I think there are good efforts to get a rust compiler into gcc, and I believe it already works for Go and D and a few others. Such things make adoption outside the standard Windows/Linux world much, much easier.
defen
I believe the long-term goal is to implement a C backend for the compiler (compile Zig code to C). Then compile the Zig compiler (which is written in Zig) to C, so you can compile that on your target system which only has a C compiler. Then you can use that to rebuild the native Zig compiler on that system.
LegionMammal978
Since February, the mrustc project [0] has been capable of bootstrapping a relatively-recent Rust compiler from C++. (It currently takes another 8 Rust-to-Rust steps to reach the latest stable version.) I've personally tested it on my x86_64-linux-gnu machine, but I don't know how tough it would be to get it to support new targets.
easeout
Zig in particular stands out for having strong support for cross-compiling. I wonder how much simpler that makes adding a new platform to a self-hosted language, and what challenges persist?
0x0203
I actually wouldn't mind porting Zig to our OS; right now it seems it would be pretty easy, and looks like they're going to make the bootstrapping process pretty straight forward in the future as well. I just wish other languages would do the same... Unfortunately for me, we don't have any users that need Zig right now and all my other side-projects are higher priority. I'll probably give it a try eventually though.
int_19h
In case of Java, wouldn't you only need to bootstrap JVM? Bytecode for the initial compiler can be compiled on another platform and then brought over.
endgame
Very exciting. Are there plans to maintain the bootstrap path over the longer term? It always makes me sad when I hear about compilers which must bootstrap from magic binary blobs.
Schroedingersat
I believe the goal is to replace the C++ with a C bootstrap that is initially auto generated from the zig code by zig but manually cleaned up and maintained to match
pabs3
Starting from auto-generated code isn't considered a proper bootstrap process by the Bootstrappable Builds project. You need to be able to start without any binaries or auto-generated code from the project itself. Usually that would mean starting with a basic implementation in another language, but starting with an older version of the same project that was written in another language and then going through several older versions of different milestones is often easier than reimplementing the language from scratch in another language.
Kamq
> Starting from auto-generated code isn't considered a proper bootstrap process by the Bootstrappable Builds project.
Is there a reason we should weigh the Bootstrappable Builds project's opinion highly here?
There's nothing on their benefits page that is subverted by an auto-generated compiler that's hand maintained.
Actually, there's no mention of auto-generated code in their best-practices section. Do you have a link to where they even say this?
pjmlp
I am quite sure bootstrapping as a concept precedes that website by several decades.
Schroedingersat
Please read comments before responding
edflsafoiewq
Why?
hiccuphippo
Probably because they'll start adding features to the Zig version that might not be trivially portable to the C++ version due to architectural differences.
Gibbon1
I think because Zig compilers can compile C but not C++.
anonymoushn
I hope so, but it seems like even if they didn't maintain it you could build today's stage1 using LLVM 14 and then work from the resulting binary instead of a binary you downloaded from online.
hiccuphippo
They have a project for maintaining this: https://github.com/ziglang/zig-bootstrap
boomer918
C++ stage1 compiler is a magic binary blob :)
movq
It's possible to bootstrap GCC starting from only a 357-byte binary seed: https://github.com/fosslinux/live-bootstrap
solarkraft
This work is impressive, but ... that's still a binary blob, and isn't the system it has to be run on kind of one as well?
fezfight
New programming languages are so much fun. I think it maybe scratches the same itch as home renovation.
It also makes me nostalgic for the first time I compiled a program (on the c64).
overflyer
Oh boi, oh boi, oh boi, oh boi, this is a historic moment. I think at the current rate I will start focusing my energy on learning and using Zig when version 0.12.0 is out :)
Awesome work guys. I am so looking forward to this language!
anonymoushn
You can use it today!
badpun
Is there any IDE support? Any debugger?
flohofwoe
VSCode works well. There's a handful of plugins which integrate the Zig language server, and any gdb/lldb frontend plugin works for debugging (not sure about the msvc debugger that's used on Windows in the MS C/C++ extension).
badpun
I found this IntelliJ IDEA plugin, the list of features looks pretty good already: https://plugins.jetbrains.com/plugin/10560-zig
woggy
I use VSCode with the C/C++ extension on Windows, the debugger works great.
hiccuphippo
There's a Language Server: https://github.com/zigtools/zls
anonymoushn
There's a language server. It seems to work pretty well.
_trackno5
Any debugger that supports DWARF should work just fine.
I'd also imagine it's the same on Windows. Anything that can manage a PDB file should work.
NeutralForest
Congrats to the team, that's a big milestone!
SLWW
Congratulations Zig! That's a pretty monumental moment.
Can't wait to see what comes down the pipeline for such a fantastic language!
selfhosted
Congrats to the zig team! This is a big milestone and they must have put an enormous amount of work into making it happen.
At the same time, the reason this milestone seems so significant is because the language is already quite complex. It seems likely to get even more complex over time. Some metrics of complexity that jump out are the amount of compiler source code, compiler compilation time and required resources (cpu/memory). There are other languages that have simpler and more efficient bootstrap compilers.
Ideally, a self hosting compiler contains only the minimum amount of information that is strictly necessary to compile itself. It is possible to write one for a simple, imperative, javascript style language in at most a few thousand lines of code in pretty much any reasonable turing complete language (something like lisp with barely any syntax can be done even more succinctly). Because the simple language is so small, the implementation can also be written in a high level language that targets that same high level language, i.e. the bootstrap compiler could be a transpiler to javascript written in javascript. There is little point in compiling to native assembly during the bootstrap process because the language is so small that a good javascript implementation should be capable of compiling the bootstrap source in under 200ms on a decent laptop. Once the compiler can compile itself, then one can add more features to it, such as a more robust type system, a native backend or automatic memory management without gc. A positive feedback loop emerges because these features are implemented as optimizations to the compiler itself. It gets faster as more features are added to it but it is always fast because it was fast from the beginning. With good benchmarking in place, it should never get slower at compiling itself.
Such a compiler/language would be very minimal by design but a more batteries included language can be built on top of it, much like how an os kernel is extended by user space programs.
A historical example of this kind of self-hosting compiler is Forth. It is easy to bootstrap (see e.g. JonesForth). Derived programs are written by extending the compiler with a new vocabulary specific to the problem at hand. Development is often done in a REPL environment for fast feedback. REPL sessions can be saved as source files once the program works as expected. Forth syntax is unfortunately inscrutable to most and the stack based design is not great for every problem so I wouldn't recommend actually using it, but it contains important ideas that can be adopted into more modern languages.
What the zig team has done strikes me as very, very difficult, so I tip my cap for the effort it must have required. In the long run though, it feels almost inevitable that a simpler language with the more desirable high level properties described above will eat its lunch. As painful as it would be, I believe the best thing the zig team could do to ensure the language's long term survival would be to completely rewrite the language from scratch (zag?) using the knowledge that they've gained during their initial bootstrap process to distill zig to its essence.
dang
We changed the URL from https://github.com/ziglang/zig/issues/89.
BratishkaErik
Originally it was meant to be https://github.com/ziglang/zig/issues/89#issuecomment-122118..., but somehow the last part was stripped. But that doesn't matter, thank you so much! The new URL is definitely better.
To clarify what this means, since things can get confusing with subtle differences in wordings:
* stage1 is when zig is built from C++ code[1] (the "bootstrap" compiler) using the system C/C++ compiler toolchain.
* stage2 is when zig is built from Zig code[2] using stage1.
* stage3 is when zig is rebuilt from the same Zig code[2] using stage2.
Before today, zig would give you stage1 by default, and you could opt in to stage2 using `-fno-stage1`. After today, zig gives you stage3 by default, and you can opt in to stage1 using `-fstage1`.
In all three cases, LLVM is being used. Although Zig has started to fully self-host by providing backends that have no dependency on LLVM, none of these fully self-hosted backends are complete.
Why bother self-hosting? Because:
* Zig produces smaller, faster binaries than C++ that use less memory[4]. Case in point: the new self-hosted compiler is 1.5x faster than the C++ implementation and uses 3x less peak RAM.
* Development velocity in Zig is much faster than C++, and debugging is a breeze in comparison due to Zig being much safer than C++ and having better debug tooling. Similarly, comptime features let us add more assertions and debug checks that are not possible in C++.
* Zig compiles much faster than C++.
* Zig cross compiles better than C++ making it easier to create builds of the compiler for every target.
* The self-hosted compiler does not need a softfloat library dependency to support f16 or f128 operations.
* Zig's std lib data structures are incredibly useful, especially compared to the C++ STL.
There are still many known bugs; not every project will be able to upgrade immediately. See the full upgrade guide [3] for help deciding when and how to upgrade.
[1]: https://github.com/ziglang/zig/tree/master/src/stage1
[2]: https://github.com/ziglang/zig/tree/master/src
[3]: https://github.com/ziglang/zig/wiki/Self-Hosted-Compiler-Upg...
[4]: https://media.handmade-seattle.com/practical-data-oriented-d...