
lhorie

Interesting read, especially as someone poking w/ writing a parser in zig for fun :)

One area of improvement worth mentioning is that currently zig errors cannot be tagged with extra information, which tends to be important for surfacing things like line numbers in parsing errors. There is a proposal[0] to improve this. It's not the end of the world for me anyway (as a workaround, I'm toying with returning error tokens, similar to what treesitter does).

On a different note, there's another cool zig thing I found recently that is mildly related to parsing: a common task in parsers is mapping a parsed string to some pre-specified token type (e.g. the string `"if"` maps to an enum value `.if` or some such, so you can later pattern match tokens efficiently). The normal way to avoid an O(n) linear search over the keyword space is to use a hashmap (naively, one would use a runtime std.StringHashMap in zig). But I found an article from Andrew[1] about a comptime hashmap where a perfect hashing strategy is computed at comptime, since we already know the search space ahead of time! Really neat stuff.

[0] https://github.com/ziglang/zig/issues/2647

[1] https://andrewkelley.me/post/string-matching-comptime-perfec...
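To make the keyword-to-enum idea above concrete, here's a rough sketch in Python (names are mine, and Python's dict is an ordinary runtime hash table rather than a comptime-computed perfect hash - the point is just precomputing the table once over a keyword set known ahead of time):

```python
from enum import Enum, auto

class TokenKind(Enum):
    IF = auto()
    ELSE = auto()
    WHILE = auto()
    IDENT = auto()

# Built once, "ahead of time" -- the moral equivalent of the comptime
# map: the keyword set is fixed, so the table never changes at runtime.
KEYWORDS = {
    "if": TokenKind.IF,
    "else": TokenKind.ELSE,
    "while": TokenKind.WHILE,
}

def classify(word: str) -> TokenKind:
    # O(1) lookup instead of an O(n) linear scan over the keyword list.
    return KEYWORDS.get(word, TokenKind.IDENT)
```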

kristoff_it

The comptime switch idea has been expanded into a full fledged implementation in the standard library!

std.ComptimeStringMap

https://github.com/ziglang/zig/blob/master/lib/std/comptime_...

lhorie

Oh, very cool, I somehow missed that! Thanks!

ifreund

The zig standard library also has a ComptimeStringMap type for this use case which is used by the self hosted tokenizer for example.

https://github.com/ziglang/zig/blob/master/lib/std/comptime_...

kristoff_it

Beat you by 3 full minutes :P

anderspitman

I'd be curious to hear your thoughts on Zig so far. I have a lot of respect for your design taste based on mithril.js, particularly when it comes to tradeoffs between functionality and simplicity.

lhorie

I'll get the bads out of the way first: there are areas where the language isn't quite there yet (e.g. the error thing I mentioned). I also ran into an issue where you can't do async recursive trampolines yet (think implementing client-side http redirect handling in terms of a recursive call).

The io_mode global switch plus colorblind async combo is something I'm a bit wary of since it's a fairly novel/unproven approach and there are meaningful distinctions between the modes (e.g. whether naked recursion is a compile error).

Another big thing (for me, as a web person) is the lack of https in the stdlib, which realistically means you'd have to set up cross compilation of C source for curl or bearssl or whatever. There's a project called iguana-tls making strides in this area though.

With all that said, there's a pretty high ratio of "huh, that's a cool approach" moments. There are neat data structures that take advantage of comptime elegantly. There's a thing called copy elision to avoid returning pointers. The noreturn expression type lets you write expressive control flows, such as early-returning an optional none (called null in zig lingo) from the middle of a switch expression. Catch and its cousin orelse feel like golang error tuples done right. Treeshaking granularity is insanely good ("methods" are treeshaken by default; so are enums' string representations, etc). The lead dev has a strong YAGNI philosophy wrt language features, which is something I really value.

Overall there's a lot of decisions that gel with my preferences for what an ideal language should do (as well as what it should avoid)

e12e

> Another big thing (for me, as a web person) is lack of https in stdlib, which means realistically that you'd have to setup cross compilation of C source for curl or bearssl or whatever.

But that should be very easy in zig, as zig can compile C?

https://andrewkelley.me/post/zig-cc-powerful-drop-in-replace...

anderspitman

Awesome, thanks!

slimsag

Super interesting, thanks for sharing! I'd be curious to learn more about how you workaround the lack of extra info in error types in practice? Are you just returning e.g. a struct with additional info?

lhorie

Yes, e.g. `const Token = struct { kind: Kind, otherStuff: ... }`, where `Kind` is an enum where one of the values is `Kind.error`. Then, since switch is exhaustive, you can just pattern match on `kind` as you iterate over tokens to handle the error case at whatever syntactic context is most appropriate.

The nice side-effect about this approach is that rather than following the standard error flow of bailing early and unwinding stack, you can keep parsing and collecting errors as you go.
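A minimal sketch of this error-token pattern, in Python for illustration (all names hypothetical):

```python
from dataclasses import dataclass
from enum import Enum, auto

class Kind(Enum):
    NUMBER = auto()
    PLUS = auto()
    ERROR = auto()   # the "error token" variant

@dataclass
class Token:
    kind: Kind
    text: str
    line: int        # the extra info a plain error value couldn't carry

def tokenize(src):
    """Emit error tokens and keep going, instead of bailing on the first error."""
    tokens = []
    for lineno, line in enumerate(src.splitlines(), start=1):
        for word in line.split():
            if word.isdigit():
                tokens.append(Token(Kind.NUMBER, word, lineno))
            elif word == "+":
                tokens.append(Token(Kind.PLUS, word, lineno))
            else:
                tokens.append(Token(Kind.ERROR, word, lineno))
    return tokens
```

A later pass can then exhaustively match on `kind`, collecting every ERROR with its line number while the rest of the parse proceeds.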

eperdew

There are some features exclusive to errors though (errdefer, stack traces, implicit error unions). Did you find yourself missing any of these by doing it this way? I'm partially asking because I was just making this decision the other day, and I went with errors for now.

breck

I absolutely love parser combinators, and years ago I did a huge rewrite of my compiler compiler so that it'd have the easiest possible parser combinators: simple string concatenation. grammarA + grammarB = grammarC. You can play with that here (https://jtree.treenotation.org/designer/). You still need to make a line change or two (good design after all, requires some integration), but it just works. Haven't seen anything beat it and not sure if that's possible. (Note: I do apologize for the crappy UX, and it's high on the priority list to give the app a refresh)
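The core trick here - grammars composing by plain string concatenation - can be sketched in a few lines of Python (a toy line-oriented rule format, not jtree's actual grammar syntax):

```python
# Toy rule format: each line is '<type> <literal>'.
grammar_a = "bool true\nbool false\n"
grammar_b = "num 0\nnum 1\n"

def parse_rules(grammar: str) -> dict:
    """Turn a line-oriented grammar string into a literal -> type table."""
    rules = {}
    for line in grammar.splitlines():
        if line.strip():
            kind, literal = line.split()
            rules[literal] = kind
    return rules

# Because a grammar is just a block of lines, composition really is
# plain string concatenation: grammarA + grammarB = grammarC.
grammar_c = grammar_a + grammar_b
```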

adsweedler

Awesome app. Do you plan on using it for anything in particular? Or are you just creating it as a passion project. It's totally cool.

***

Learning about https://treenotation.org/ (linking this for other people, not for you, Breck :P), and I like what I see. My first impression was: "tree notation is like lisp, but with python indenting".

But then, looking at it more, I see it's much more like YAML or CSV or whatever. But then I read:

> We no longer need to store our data in error prone CSV, XML, or JSON. Tree Notation gives us a simpler, more powerful encoding for data with lots of new advanced features

And I didn't understand! Tree notation seems equivalent to these. Like at a certain level, it's all just data. Now, the major benefit is that you're supposed to think differently about what you're doing when using tree notation. Would love to hear your opinion about this conjecture.

***

Not getting into the beautiful and amazing work that's been done WITH tree notation (yet), that's a whole other conversation!

breck

> Do you plan on using it for anything in particular?

It's hard to believe, but I use that thing everyday, haha. To me every problem is a nail solvable with my DSL hammer.

> are you just creating it as a passion project.

I want other people to use it, but it took a while for me to be truly confident that it will work and the underlying math is sound (that 2-D languages are better than 1-D langs). So just in the past week I formed a corporation (https://publicdomaincompany.com/), and we're growing the team and are actually going to try to make a good user experience and help people use these technologies to solve their problems.

> Would love to hear your opinion about this conjecture.

I always think of code now in 2 and 3 dimensions. To me after many years this is just second nature but it's not an obvious thing and not sure I've ever met anyone else that does this. I gave an early talk about it in 2017 (https://www.youtube.com/watch?v=ldVtDlbOUMA) at ForwardJS (there should be an actual recording out there on the web somewhere but I can't find it).

Traditionally all programming languages are 1-D, read by a single read head that moves linearly from start to finish (with some backtracking, but always just along 1 axis in order), building up an AST and then going from there. There's no reason it has to be that way, it's just the way things developed with our 1-D register machines.

Human beings do not process language this way at all. Not even close. If you have a physical newspaper near you, pick it up and pay attention to the way your eyes parse things. You'll likely notice a random access pattern, with your eyes moving constantly across both the x and y axes, and you parsing the semantics from things like position, font size, layout, etc. To me it's so obvious that this is the way computer languages should work. There shouldn't be lots of physical transformations of the code, using cryptic syntax characters like ( and " and [ and { as hacks to provide instructions to the parser. Code should be written in accordance with a strict grammar, yes, but it should also be written in a way pleasing to the human eye and the way human brains work.

We should make human languages that machines can understand instead of making machine languages that humans can understand.

Anyway, I'm rambling on and on, but this is the bigger idea than simply the tree notation implementation, which is really just a subset of a whole new world of possibilities in 2d and 3d languages. (here's another recent one showcasing the new possibilities: https://www.youtube.com/watch?t=145&v=vn2aJA5ANUc&feature=yo... — when I write Tree Code I see spreadsheets and vice versa)

Also, when I play with legos now I see code, and when I write code I see legos. That's a hard thing to communicate, and an early tool I wrote called Ohayo shows it off a bit, but I need to write a single function that takes a tree program and spits out a lego vis (something like LDraw), and maybe that will help explain the idea https://github.com/treenotation/research/issues/33

Jakobeha

I see a lot of articles posted on HN about Zig. What's so special about Zig, and how does it carve out its own niche alongside languages like Go and Rust?

lhorie

As far as I know, zig is the only language that is able to output binaries on par in size w/ C (a hello world in zig is about the same size as a C one, whereas most other languages can only manage at a minimum an order of magnitude bigger binaries, sometimes several orders of magnitude). Zig interops cleanly w/ C ABI and C toolchains, can also cross compile AND can cross compile C proper. You can even drop all the way down to asm (standard C technically doesn't support this).

I often refer to zig as a "very sharp knife": it's "cool" for new languages to have more safeguards to protect you from yourself, but Zig feels a bit like it goes in opposite direction in the sense that it exposes the underlying plumbing more than most languages. For example, Go and Rust memory allocations and memory layout are fairly opaque; in zig you can control it idiomatically with obsessive precision.

But unlike C, zig offers a host of safety features, like integer overflow checks, compiler-checked optionals and exhaustive switch, as well as a well behaved compile-time system, and a bunch of syntactical sugar ("method" syntax, if/while/block/return expressions, etc).

And unlike C++, zig is a very small language.

pcwalton

It's possible to make very small Rust binaries: https://github.com/johnthagen/min-sized-rust 8kB for that Rust example, compared to 16kB for my optimized C hello world compiled with GCC. (Not that I think it makes a difference to anything.)

Furthermore, how are Rust memory allocations and layout "fairly opaque"?

lhorie

For example, it's usually not obvious when memory is being freed. I recall a thread about esbuild where the author said they got better performance from Go because garbage collection happens on a separate thread, whereas Rust freed memory on the main thread, and it was not clear how to change that or whether it was even possible.

In zig, you can use an arena allocator, you can free things piecemeal while using an arena, you can free the arena on program exit or in a thread, you can enforce a stack allocator, have multiple allocators, etc and because of the orthogonality of the Allocator interface, most of this knowledge is something you can probably pick up within the first 2 hours of zig (assuming you know C).

Re: edit about rust binary size. When I say zig is on par w/ C in terms of binary size, I mean I get the small binary with the basic `zig build-exe hello.zig -O ReleaseFast --strip` command, without any attempts at optimization. Do you know if this 8kb number includes the UPX optimization? If so, I'm calling shenanigans :)

AndyKelley

for zig it's 1.8 kB on x86_64-linux with minimal fiddling. i386-windows comes out to 2.5kB.

    $ cat hello.zig
    const std = @import("std");
    pub fn main() void {
        std.io.getStdOut().writeAll("Hello, World!\n") catch {};
    }
    $ zig build-exe hello.zig --strip -OReleaseSmall
    $ zig build-exe hello.zig --strip -OReleaseSmall -target i386-windows
    $ ldd hello
     not a dynamic executable
    $ strip hello
    $ ./hello
    Hello, World!
    $ wine hello.exe
    Hello, World!
    $ ls -hl hello hello.exe
    -rwxr-xr-x 1 andy users 1.8K Mar 10 21:15 hello
    -rwxr-xr-x 1 andy users 2.5K Mar 10 21:15 hello.exe

No min-sized-zig github repository needed.

steveklabnik

I work in embedded these days, commented on this topic on Reddit a few days ago: https://www.reddit.com/r/rust/comments/m0irjk/opinions_on_ru...

anderspitman

Not parent, but from my perspective implicit RAII might fall into this category, though I suppose that's specific to deallocation. Zig also requires you to pass an allocator to any function that needs dynamic allocation, which, while it could be considered a bit extreme, is certainly less "opaque".

FlashBlaze

Would very much like to read your thoughts on V[0]

[0] https://vlang.io/

lhorie

Honestly, I was put off by the early marketing hullabaloo, so I haven't looked closely. What's most problematic in my mind is that from what I've seen, I don't feel that the V lead is as forthcoming about issues and limitations as other project leads are. I'd much rather talk to a project lead that will give it to me straight if their stuff doesn't work.

From a technical perspective, my understanding is that V is similar to Nim, in the sense that it compiles to C source code. That's fine and all, but IMHO zig has a huge leg up in this area because it does first class cross compilation of C source code to binary form. To my knowledge, no other tool does this (other than maybe cosmopolitan, if you ignore everything outside posix...)

If I were to use V, I'd probably end up using `zig cc` as the `cc` for it

MaxBarraclough

How does V handle reference cycles?

> Most objects (~90-100%) are freed by V's autofree engine: the compiler inserts necessary free calls automatically during compilation. Remaining small percentage of objects is freed via reference counting.

> The developer doesn't need to change anything in their code. "It just works", like in Python, Go, or Java, except there's no heavy GC tracing everything or expensive RC for each object.

So how does V handle reference cycles? A partial solution based on static-analysis (autofree) isn't going to catch all reference cycles, by definition, and automatic reference-counting won't catch reference-cycles either.

I have to agree with lhorie's comment: I get the impression I'm only getting half the story.

The Nim folks also often seem to skim over the question of reference cycles.
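Python is a concrete illustration of why this question matters: it uses reference counting but still ships a separate cycle collector, precisely because RC alone can never reclaim a cycle:

```python
import gc

class Node:
    def __init__(self):
        self.other = None

gc.collect()               # start with no pending garbage
a, b = Node(), Node()
a.other, b.other = b, a    # each node keeps the other's refcount > 0
del a, b                   # refcounts never reach zero...
collected = gc.collect()   # ...so only the cycle detector frees them
```

Any scheme that is "autofree plus reference counting" needs an answer for objects like these, which is exactly the half of the story that seems to be missing.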

abnercoimbre

If you can spare 60-90 minutes, I hosted [0] a conversation with the creator of Zig, as well as Odin, plus a former co-worker who's a compiler engineer.

They're all vibing in the same space, trying to fill a niche we believe is missing.

[0] https://media.handmade-seattle.com/the-race-to-replace-c-and...

kristoff_it

That's a great window into programming languages that exist in the systems programming space but that are not Rust, C or C++.

tiddles

Thank you for doing this!

I've listened to this and a few other episodes with Andrew and Ginger Bill, and they're very interesting, I highly recommend them.

anderspitman

I've written a decent amount of Rust and Go. The reason Zig is on my watch list is because I see it as a potentially better (ie faster) Go, or C+=1. Go handles most of my needs, but the way I write it tends to involve a decent amount of manual resource management (mostly mutexes), so I don't see the manual memory management in Zig as a big downside.

Rust is a great language, and I love it, but it's hard for me. Writing in Rust reminds me a lot of test-driven development. It gives me such a great feeling of safety and control over my process, but at the end of the day it can be very tedious. The real killer is when I'm trying to prototype. If I don't already know what my interfaces are going to look like, Rust really slows me down. Compile times don't help here either.

If I were implementing a well-known protocol and had a general idea of how to architect it, or just really needed it to be rock solid, I would strongly consider Rust first. I've been working on a lot of protocol design the last couple years so it's been more prototype-heavy.

Note that most of my Rust experience involved pre-async/await asynchronous networking code. I'm sure it would be a better experience for me now. I should also note that programming in Rust yields some very magical moments, such as parallelizing loops by changing a single line. It's a special language, and not going anywhere. I hope to find reasons to use it again in the future.

EDIT: Oh, another place Zig may have a big advantage over Go is binary sizes, particularly for things like WebAssembly.

volta83

> The real killer is when I'm trying to prototype. If I don't already know what my interfaces are going to look like, Rust really slows me down.

This requires practice, but given the choice, competitive programming teams picking Rust do quite well in competitions (the ICFP contest has been won by teams using Rust 3 years in a row, and in each of the last 3 years, two of the top 3 teams used Rust): https://www.reddit.com/r/rust/comments/ctzmo2/icfp_2019_prog...

There are teaching materials for competitive programming with Rust, you might want to check those. I think they are a good way to learn how to "prototype" in Rust.

It's different enough from other languages that this is a skill that must be actively learned :/

anderspitman

I have no doubt it can be learned, and I've tried to get over the hump multiple times, but still don't feel as comfortable with Rust as I do with every other language I've worked in, even those I've only spent a few hours in.

It's all about tradeoffs, and at a certain point you have to question whether the academic assurances offered by Rust are worth the complexity, given the problem at hand. As I said, it's obviously worth it for some problems, but I don't think it is for all problems. Which is great! I think the world would be boring if there was one language to rule them all.

pron

Zig, like C, C++, Ada, and Rust -- but very much unlike Go -- is a low-level programming language, i.e. one that gives you virtually full control over implementation details such as function dispatch and memory management. I would say that Zig is the first (more-or-less) known low-level language that is revolutionary in its design. Rust adopts the C++ (and also, arguably, Ada) philosophy of a low-level language that attempts to appear high-level, and ends up being very complex. I would say that C++, Ada, and Rust are all easily among the top five most complex programming languages in software history. On the other hand, you have C, which is "simple" in the sense that it has few features, but is extremely unsafe.

Zig offers a completely new beast, and I'd say a vision for how low-level programming can be done. It is a very simple language -- a C/C++/Rust programmer can probably fully learn it in a day or two -- and yet it is about as expressive as C++/Rust, and much safer than C++. It does that with a single general partial evaluation feature called "comptime" that replaces, rather than augments, generics, traits, and macros, maintaining their capabilities while being arguably simpler than all of them.

Like Rust, Zig places a very high emphasis on correctness, but its approach is different. While safe Rust eliminates all undefined behaviour, safe Zig eliminates many/most, leaving others up to detection via simple testing. What Zig lacks in sound guarantees, it makes up for in being easy to understand, analyse and test.

Zig also has an exceptionally good tooling story around incremental compilation and cross compilation.

In short, I would say Zig offers a completely novel approach to low-level programming that would appeal to those who value a simple language. While Zig and Rust do target a similar niche, I believe they'd attract programmers with very different aesthetic preferences.

logicchains

For ultra-low-latency programming, you generally want to avoid dynamic memory allocation on the hot path. Instead it's common to use a slab allocator, from memory on the stack or that was dynamically allocated on startup, to continuously reuse the same memory. E.g. when a web request comes in, allocate all temporary per-request state from a single bump allocator, then after the request has finished just "free" everything by setting the "allocateFromHereNext" pointer back to the start of the allocator's buffer. This is not only the fastest means of allocation/deallocation (just bump a pointer), but also means that new requests can reuse the same memory addresses (the allocator's internal buffer), achieving better cache locality as the memory's already in cache.

Zig is ideal for this as all stdlib functions that allocate take an allocator as a parameter (so there's no hidden allocation), and the stdlib provides a bunch of allocators. This is even better than C++, where some functions like std::stable_sort unavoidably allocate memory. Rust doesn't have much support for custom allocators in the stdlib although it's being worked on, but it's more difficult to implement in Rust as the compiler must be able to ensure that the allocator (or at least its buffer) outlives all the things being allocated from it.
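A toy model of the bump/arena allocator described above, in Python (in Zig, std.heap.FixedBufferAllocator plays roughly this role; this class is purely illustrative):

```python
class BumpAllocator:
    """Allocate by bumping an offset into a fixed buffer; 'free' every
    allocation at once, in O(1), by resetting the offset."""

    def __init__(self, size: int):
        self.buf = bytearray(size)   # grabbed once, up front
        self.offset = 0

    def alloc(self, n: int) -> memoryview:
        if self.offset + n > len(self.buf):
            raise MemoryError("arena exhausted")
        view = memoryview(self.buf)[self.offset:self.offset + n]
        self.offset += n             # allocation is just a pointer bump
        return view

    def reset(self):
        # End of request: all per-request state freed in one shot, and
        # the next request reuses the same (cache-warm) addresses.
        self.offset = 0

arena = BumpAllocator(1024)
a = arena.alloc(16)   # per-request scratch space
b = arena.alloc(32)
arena.reset()         # request done
```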

nmfisher

I'd be interested in "Why Zig over Rust?".

Having dabbled with Rust, it seems to be the perfect solution/replacement for C/C++. I'm not clear why I should bother with an alternative.

socialdemocrat

I have spent some time with Rust and written some non-trivial Zig code (a simple assembler). My simple takeaway is that Rust is simply too complicated.

I spend some time on other technologies on occasion. I could get into Zig fairly quickly and do some useful stuff with it.

With Rust it felt like I had to really decide on some significant time investment to be able to get anything done. It really reminds me a lot of Haskell. Those kinds of pure, elegant languages are awesome once you grok them, but they are never really going to be used outside a small niche because they are too hard to learn for an average developer with limited time on their hands.

Rust and Haskell are more like the languages I looked for when I was younger, when I had utopian visions of THE best programming language. I have long lost any belief in that. I do think strong type systems are helpful, but I also think they tend to get overhyped and overrated.

I am with Rob Pike, one of the Go designers, on this. Writing correct code is a lot about understanding and being able to reason about that code. A simpler language makes it easier to reason about code and understand it. That makes it more likely that you write correct code, or are able to maintain and fix it.

I do believe this complexity barrier begins at different places for different people. But for me I think Rust is too complex. Knowing myself I would get back to 3-4 weeks old code and wonder what the hell I wrote.

If you get into that situation you are likely to make mistakes. This is what I believe developers often forget when chasing down the BEST tool: that ultimately it is your brain that is supposed to solve problems, not the tool. If a tool solves some problems but reduces your brain's ability to do its job, then the tool isn't really an aid.

To be honest I am not actually certain Zig has hit the sweet spot either. Go is quite good. You can pick up quite old Go code and still read it with relative ease.

Zig is definitely not as easy as Go to read. I feel like I have to wait until a 1.0 release before passing judgment. Some of the issues I felt I had with Zig is down to lack of documentation and rough edges in the standard library.

dm3

I really enjoyed your comment, probably because I've been thinking the same for a long time now. I've had to learn and write a lot of C++ in the past year, and the only way to stay sane is to use as small a subset of the beast as possible. In many ways Rust is a better C++ - no undefined behaviour, no lvalues/rvalues/xvalues/..., no complicated move semantics, better tooling, etc. Rust feels like a nice subset of C++ with additional safety features on top.

However, once people reach for C, C++, Rust or Zig - performance is on the line. To push the maximum out of the program we need to do as much compile-time programming as possible. C++ template meta-programming language is Turing-complete but extremely hard to learn and effectively program with. Rust compile-time programming is evolving - two sorts of macros and a Zig-comptime-like `const fn` feature. What I like about Zig's approach is summarized well by pron[0].

[0]: https://news.ycombinator.com/item?id=22112382

linkdd

> it seems to be the perfect solution/replacement for C/C++

Many languages tried for 50 years to replace C without success.

While Rust brings many improvements for low-level development on classical architecture, simplifying hard problems (especially memory safety), there is still a HUGE number of platforms (embedded ones mainly?) where C and even Assembly are still relevant.

Believing that any language will bury C/C++ is ignoring what C/C++ are.

cageface

It looks like Zig requires manual memory management, right? That seems like a pretty big downside compared to GC or the Rust approach.

diragon

True. That is however countered by a vastly simpler language. Language support for deferred cleanups[0] helps a bit. Also, the GeneralPurposeAllocator[1] contains memory safety debugging features that make analyzing such bugs a bit nicer.

IMHO it's a significant net gain over something like Rust because of the simplicity for most of the problems out there. I got 99 problems but needing absolute memory safety ain't one.

[0] https://ziglang.org/documentation/master/#defer

[1] https://ziglang.org/learn/samples/#memory-leak-detection
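For readers coming from other languages: Zig's `defer` runs a cleanup expression when the enclosing scope exits, in reverse order of registration. Python's closest everyday analogue (an analogy only, not a translation) is `contextlib.ExitStack`:

```python
from contextlib import ExitStack

log = []

def open_resource(name):
    log.append(f"open {name}")
    return name

def close_resource(name):
    log.append(f"close {name}")

with ExitStack() as stack:
    a = open_resource("a")
    stack.callback(close_resource, a)   # like `defer close(a);`
    b = open_resource("b")
    stack.callback(close_resource, b)   # deferred cleanups run LIFO
# On scope exit: close b, then close a
```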

linkdd

That's because the Zig approach is "no hidden control flow".

Everything in Zig is explicit; this is not a downside, it's a feature of the language.

Choose the tool that best fits your needs.

efiecho

The ubiquitous SJW bullshit in the Rust community made me run for the hills.

http://www.paulgraham.com/say.html

dunefox

Like what?

enriquto

> Rust, it seems to be the perfect solution/replacement for C/C++

If you are the sort of person who writes "C/C++" you'll have some trouble understanding the purpose of Zig... maybe start by grokking why the expression "C/C++" is nonsensical?

flohofwoe

Argh I'm so tired of this "technicality". Yes, technically C and C++ are different languages, and C is not a subset of C++ (which is a pretty big problem btw). But C++ has been derived from C, then forked its C subset into an incompatible dialect, while not fixing any of C's problems (which would've been an actually good reason to not call it "C/C++"). I think it's fair to call C++ "C/C++", because it will always be a confusing mishmash of C and the features that C++ added on top.

Sorry about the rant.

thraway123412

Look bro I like Haskell so Haskell is a perfect replacement for Scheme/Java.

fulafel

Zig is much easier to learn and simpler and has much of the safety when compared to the Rust-C spectrum. It also seems to be fun to write for lots of people. The metaprogramming system is also first rate and easy to understand.

diragon

It's pretty dreamy for us who wanted Rust to become "better C" but instead it became "better C++". Granted, that's what it was trying to become from the get-go, I just got confused and hopeful.

kristoff_it

Another interesting parser combinator written in Zig is Mecha.

https://github.com/Hejsil/mecha

slimsag

(author here) Mecha looks really nice!

One difference that makes Mecha's look really nice/simple is that they are built at compile time, whereas the ones I outline in this blog post are built at runtime (with the intention of later building parsers at runtime _based on other parsers' output_).

Or, at least that's what I gleaned from a quick skim of Mecha. Very excited to play around with it soon!

kristoff_it

You got it right! I think it's nice to be able to contrast the two implementations :)

ncmncm

Zig is more capable than I thought.

It looks like there are definite plans to make it more so, and more interesting.

kristoff_it

The "more ergonomic C" crowd is missing some important nuance.

andrepd

Yep. It can be "C but good" or it can be "a language with all these fancy features". It can't be both at the same time.

dralley

Zig is pretty close to being both at once, simply because they've found clever ways of combining existing features to avoid needing entirely separate features to accomplish the same goals.

e.g.

They avoid the macro system by leaning hard on compile-time evaluation, leveraging dead code elimination to replace preprocessor/macro-driven conditional compilation.

They get generics in a similar way, by letting compile-time functions construct types.

Varargs are implemented with anonymous structs, and modules are also secretly just structs, which makes interop with C much simpler.

It's actually deeply impressive how far they've been able to go with a language that actually has quite a small number of separate constructs, and it doesn't feel hobbled by doing so.

socialdemocrat

This is all about creating orthogonal features. In any system where you are able to create a small number of features which combine in very flexible ways you can achieve a minimalist language together with immense expressiveness.

Zig is in that category. It takes some powerful ideas and combines them in ways that give Zig a lot of power despite a small feature set.

dnautics

The features are coming out more slowly now, and it feels like things are asymptotically converging toward what the language will be at 1.0.

olah_1

I wish someone would teach low level programming from scratch with Zig

AnIdiotOnTheNet

Can you clarify what "low level" means to you?

olah_1

Code that manages and manipulates memory, network stuff, GUI. Basically anything more intensive than client-side JavaScript

mtzet

While having nice libraries is always great, I'd also like to plug the simple zig std library functions.

I was very pleasantly surprised by how easy it was to start parsing simple stuff using std.mem.tokenize() and std.mem.split(). Somehow, watching Andrew program some Advent of Code using these very simple functions made it 'click' for me how parsers are just ordinary programs.
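The same "parsers are just ordinary programs" point, sketched in Python with plain `split` calls standing in for std.mem.split/tokenize (the input format here is invented for illustration):

```python
def parse_points(input_text: str):
    """Parse Advent-of-Code-style lines like '3,4 -> 7,9' into
    ((x1, y1), (x2, y2)) coordinate pairs."""
    pairs = []
    for line in input_text.strip().splitlines():
        start, end = line.split(" -> ")            # split on the arrow
        x1, y1 = (int(n) for n in start.split(","))
        x2, y2 = (int(n) for n in end.split(","))
        pairs.append(((x1, y1), (x2, y2)))
    return pairs
```

No grammar formalism, no parser framework - just ordinary string functions and ordinary control flow.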

amir734jj

The only parser combinator experience I have is with FParsec in F# and Haskell's parsec. I am not sure a non-(purely-)functional programming language is suitable for parser combinators. Yes, it works, but why?

slimsag

Performance. I'm looking to try implementing a regexp-like engine via a runtime parser generator built using parser combinators and _hoping_ to get performance near what optimized regex engines get.

Seems like an interesting way to get nice, performant template parsing without having the performance overhead that higher level languages typically bring.

This is all speculative, though - I haven't actually done it yet, so maybe I'll find it's incredibly slow and a bad idea :)

eperdew

I'm not sure if this is done in practice, but in theory, if you compile a regex to a DFA, you can then minimize the DFA. I don't know if you can do anything similar with parser combinators, but if not, that could be a potential performance gap to watch out for.

slimsag

I imagine this works for regex engines like RE2 that don't support backtracking, but would not be possible if supporting more advanced regex features like backtracking?

I definitely have more to learn about automata and FSMs :)

rbonvall

Python has pyparsing: https://pypi.org/project/pyparsing/

Maybe it's because non-FP languages typically don't support operator overloading, so you'd end up with verbose grammars like:

    new Literal("true").or(new Literal("false"))
and then you'll probably use a parser generator like ANTLR.
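For what it's worth, the verbosity is mostly about operator overloading, and even a tiny hand-rolled combinator class gets the terse form back (a toy sketch, not pyparsing's actual API):

```python
class Parser:
    def __or__(self, other):
        # `a | b` builds an alternation combinator.
        return Alt(self, other)

class Literal(Parser):
    def __init__(self, text):
        self.text = text

    def parse(self, s):
        # On success return the unconsumed remainder; on failure, None.
        return s[len(self.text):] if s.startswith(self.text) else None

class Alt(Parser):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def parse(self, s):
        result = self.left.parse(s)
        return result if result is not None else self.right.parse(s)

# Operator overloading recovers the terse grammar style:
boolean = Literal("true") | Literal("false")
```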

drhagen

Python is no Scala when it comes to operator overloading, but it has enough to make parser combinators work cleanly. funcparserlib [1] and Parsita [2] are two such libraries.

For example, this does what you probably expect in Parsita:

    real = reg(r'[+-]?\d+\.\d+(e[+-]?\d+)?') | 'nan' | 'inf' > float
(Full disclosure: I wrote Parsita.)

[1]: https://github.com/vlasovskikh/funcparserlib

[2]: https://github.com/drhagen/parsita

Koshkin

So... why are they awesome, again? Is this some kind of functional approach to parsing? (I know that functions compose very well.)

vanderZwan

I was already slightly familiar with parser combinators but still really enjoyed this article to see how one might implement them in Zig, which I've looked at but not written anything in myself.

For another introduction to parser combinators, I can recommend the Parser Combinators From Scratch series[0] on the Low-Level JavaScript YouTube channel. In general he has a lot of really nice videos that are great introductions to low-level programming concepts, especially for people who mainly know JavaScript.

[0] https://www.youtube.com/playlist?list=PLP29wDx6QmW5yfO1LAgO8...

z92

Checked Zig. Doesn't have a memory manager. It's "free all at program termination" or manually manage memory like C.

socialdemocrat

It is much better structured than C. Everything in the standard library that allocates memory takes an allocator as its first argument.

And you got cleanup with defer statements. Together this makes the memory management story very different from C.

anderspitman

It does have some nice improvements over C though, such as defer-style deallocation, which can help with many potential bugs.

linkdd

One of the best thing about parser combinators is that I was writing them before knowing what it was.

My first programming language being C[1], all you had was functions and composition; it's the natural pattern for writing parsers.

[1] - Well, in truth it was GML from Game Maker 6, but I quickly went to learn C :p


Zig, parser combinators, and why they're awesome - Hacker News