mccanne
preferjq
"Cobbled-together" jq as it often appears in the wild will often compare badly with crafted solutions because the writer's goal is usually GSD (getting stuff done), not writing pretty code.
People with the time and inclination to slow down and think a little more about how the tools work will produce cleaner solutions.
In your example to convert
{"name":"foo","vals":[1,2,3]}
to {"name":"foo","val":1}
{"name":"foo","val":2}
{"name":"foo","val":3}
All you need is this jq filter {name:.name, val:.vals[]}
To me this is much better than the proposed zq or jq solution you're using as a basis
for comparison. You could almost use the shorter .vals = .vals[]
if the name in the output didn't change. These filters take advantage of how jq's [] operator converts a single result into separate results. For people new to jq this behavior is often confusing unless they've seen things like Cartesian products.
.[] - https://stedolan.github.io/jq/manual/#Array/ObjectValueItera...
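Assuming jq is installed, the filter above can be checked directly; the `.vals[]` generator fans the object out into one record per element:

```shell
echo '{"name":"foo","vals":[1,2,3]}' | jq -c '{name: .name, val: .vals[]}'
# {"name":"foo","val":1}
# {"name":"foo","val":2}
# {"name":"foo","val":3}
```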
MarkMarine
Counterpoint: I reach for jq probably twice a year. It's a slog every time, but way, way less work than diving into the terse syntax and understanding the inner workings of jq. A good abstraction is the border of my understanding; a leaky abstraction means I have to have mastery of the internals to be successful. jq is a leaky abstraction.
hyperpallium2
can also use name instead of name:.name
I think jq is very elegant - genius even - but whenever I use it, I have to look up the docs for syntax. But I guess that's true for any infrequently used tool.
chris37879
This exactly. I think JQ's problem in this regard is further compounded because its query language just doesn't feel like anything else most people have used, I've certainly never come across anything quite like it, anyway.
1vuio0pswjnm7
Thank you for your work on tcpdump, (original) bpf and the pcap library. I benefit from those projects every day.
ZSON looks way better than JSON. I pray that the Zed project becomes more popular.
mccanne
Wow, thanks.
Coincidentally, after hearing of a friend's woes dealing with massive amounts of CSV coming from a BPF-instrumented kernel, I played around a bit with integrating Zed and BPF. Just an experimental toy (and the repo is already out of date)...
https://github.com/brimdata/zbpf
The nice thing about Zed here is any value can be a group-by key so it's easy, for example, to use kernel stacks (an array of strings) in a grouping aggregate.
(p.s. for the record, the only thing I have to do with the modern linux BPF system is the tiny vestige of origin story it shares with the original work I did in the BSD kernel around 1990)
rienko
Ever since my team started using Splunk (circa 2012), we clamored for a more open version we could tinker with that wouldn't cost an arm and a leg to ingest multiple terabytes of daily data.
Positioning as an open-source Splunk would be an interesting play. Going through your docs, the union() function looks like it returns a set, akin to Splunk's values(); is there an equivalent to list()?
Elastic is great in its lane, but it requires more resources and carries monolithic weight, which left a sour taste after our internal testing. Offering a minimal ElasticSearch-compatible API would open up your target audience; are there any plans to do it in a short-term horizon (< 1 year)?
mccanne
That's a cool idea. We've had many collaborators using Zed lakes for search at smallish scale and we are still building the breadth of features needed for a serious search platform, but I think we have a nice architecture that holds the promise to blend the best of both worlds of warehouses and search.
As for list() and values() functions, Zed has native arrays and sets so there's no need for a "multi-value" concept as in splunk. If you want to turn a set into an array, a cast will do the trick, e.g.,
echo '1 2 2 3 3' | zq 'u:=union(this) | cast(u,<[int64]>) ' -
[1,2,3]
(Note that <[int64]> is a type value that represents array of int64.)
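For comparison, a rough jq analogue of the same dedupe-into-array step (a sketch: jq has no native set type, so slurping the stream and applying `unique` stands in for `union` plus the cast):

```shell
echo '1 2 2 3 3' | jq -sc 'unique'
# [1,2,3]
```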
gauravphoenix
There is Dassana[1] if someone wants to try out a JSON-native, index-free, schema-less solution built on top of ClickHouse.
Show HN post (FAQ)[2]
disclaimer- I'm founder/CEO of Dassana.
noborus
I wrote about how to solve this with SQL. https://noborus.github.io/blog/jqsql/
weinzierl
jq is incredibly powerful and I'm using it more and more. Even better, there is a whole ecosystem of tools that are similar or work in conjunction with jq:
* jq (a great JSON-wrangling tool)
* jc (convert various tools’ output into JSON)
* jo (create JSON objects)
* yq (like jq, but for YAML)
* fq (like jq, but for binary)
* htmlq (like jq, but for HTML)
List shamelessly stolen from Julia Evans[1]. For live links see her page.
Just a few days ago I needed to quickly extract all JWT token expiration dates from a network capture. This is what I came up with:
fq 'grep("Authorization: Bearer.*" ) | print' server.pcap | grep -o 'ey.*$' | sort | uniq | \
jq -R '[split(".") | select(length > 0) | .[0],.[1] | gsub("-";"+") | gsub("_";"/") | @base64d | fromjson]' | \
jq '.[1]' | jq '.exp' | xargs -n1 -I! date '+%Y-%m-%d %H:%M:%S' -d @!
It's not a beauty but I find the fact that you can do it in one line, with proper parsing and no regex trickery, remarkable.
[1] https://jvns.ca/blog/2022/04/12/a-list-of-new-ish--command-l...
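A stripped-down sketch of just the JWT-decoding step (assuming jq is available; the token below is fabricated, with payload {"exp":1700000000}, and happens to contain no base64url-specific characters, so the gsub translations from the full pipeline aren't needed here):

```shell
# Take the middle segment of the token, base64-decode it, parse it as
# JSON, and pull out the expiration timestamp.
tok='eyJhbGciOiJIUzI1NiJ9.eyJleHAiOjE3MDAwMDAwMDB9.sig'
echo "\"$tok\"" | jq -r 'split(".")[1] | @base64d | fromjson | .exp'
# 1700000000
```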
kitd
Also highly recommended is gron [0], to make json easily searchable
spudlyo
Most of the time I can get what I need with gron and traditional UNIX tools, without needing to reach for jq, and without having to re-learn its somewhat arcane syntax.
zikduruqe
I came here looking for a gron recommendation. I use this very often.
toxik
Is your example not easier to write and read as a 10-something line Python script? I never understood the appeal of jq etc because of this very reason.
hoherd
I would definitely add dasel to that list. It's become my de facto serialized data converter, and I regularly use it to convert between csv, toml, yaml, json, and xml using jq-ish syntaxes.
chriswarbo
The yq tool also provides 'xq', which works on XML :)
stormbrew
tbh my biggest problem with all these tools is that I really don't want to have to learn one for each of the json-y formats I have to use every day. If jq supported toml and yaml natively I'd be much much much happier to learn its kind of obtuse syntax.
samcal
Have you checked out https://www.nushell.sh/? It seems like exactly what you're describing. Although I know of people who are happily using it as their main shell, I only really use it when I need to read and manipulate data in files.
wwader
Hi, lover of jq and author of fq here! Just wanted to mention that fq is jq so you can do things like `fq 'grep("Authorization: Bearer.*") | split("\n") | ...' file.pcap`.
Also, I'm prototyping some kind of HTTP decoding support that will make things like selecting on headers and automatic/manual body decoding possible.
chris37879
I need someone to make a `Q` wrapper that amalgamates all of them. And if that's already taken by a common utility, I vote we name it deLancie, instead.
msluyter
Whenever jq comes up I feel obligated to mention 'gron'[1]. If all you're doing is trying to grep some deeply nested field, it's way easier with gron, IMHO.
RulerOf
Gron and jq are complementary tools IMO. I frequently use gron to trim down large json files such that I can determine what my ultimate jq query is going to look like.
radicality
For a moment I thought that this is `glom`, which is also a tool I can recommend if you need to be doing any json processing in python (comes with a cli too). It does have a relatively steep learning curve for the advanced features, but does allow you to do interesting things like concisely write recursive parsers in the mini-dsl Glom provides.
zimpenfish
Used it only this morning to find out if/where the JSON for a tweet mentioned the verification status of the poster and/or retweetee[1]. Quick and easy to dump it through `gron | grep verif` to find out the paths.
[1] "the person who was retweeted" in lieu of a better word.
psacawa
Since no one seems to know about it, jq is described in great detail on the github wiki page [0]. That flattens the learning curve a lot. It's not as arcane as it seems.
The touted claim that it is fundamentally stateless is not true. jq is also stateful in the sense that it has variables. If you want, you can write regular procedural code this way. Some examples [1]
The real problem of jq is that it is currently lacking a maintainer to assess a number of PRs that have accumulated since 2018.
[0] https://github.com/stedolan/jq/wiki/jq-Language-Description
[1] https://github.com/fadado/JBOL/blob/master/fadado.github.io/...
Beltalowda
> It's not as arcane as it seems.
The issue with jq is that I use it maybe once a month, or even less. The syntax is "arcane enough" that I keep forgetting how to use it because I use it so sporadically.
In comparison awk – which I also don't use that often – has a much easier syntax that I can mostly remember.
Not entirely convinced by the zq syntax either though; it also seems "arcane enough" that I would keep forgetting it.
hiram112
Bingo.
There are at least a dozen tools and languages and syntaxes that I've used sporadically over the years - awk, sed, bash, Mongo, perl, etc. I don't use them often enough to remember exactly how they work, and so I always have to spend a few hours reviewing manuals or old code repos or an O'Reilly book.
But if I do end up using it for a few days in a row, it starts to make sense, and I improve each time I use it.
But not with jq.
It just does not make sense to my brain, no matter how many times I've had to use it. Every single time I need to use it, it requires finding some Stack Exchange or blog and just copying and pasting. Even after seeing the solution, rarely do I then really understand why or how it works. Nor can I often take that knowledge and apply it to similar problems.
About the only other syntax or language that gives me such problems is Elastic Search DSL.
silon42
Same for me... every time I have to look up the basics... and I love awk, perl and xpath/xslt.
laurent123456
I wonder if someone tried to use plain JS as a filtering language? It would be more verbose but it would be easy to remember. For example:
[1,2,3] | js "out = 0; for (const n of this) out += n"
That would print "6". `out` would be a special variable you write to in order to print the result, and `this` would be the input.
rane
Not quite that, but ramda-cli[1] which I've created solves this problem, at least for me, by offering the familiar set of functions from Ramda, and you can create pipelines with those to do operations on your data.
mechanicalpulse
I've used trentm's json (formerly known as jsontool) package from npm as my default tool for command-line manipulation of JSON for many years now. It provides CLI arguments for passing JavaScript code for filtering and executing on input. I have resisted investing the time into becoming fluent in jq because I've found that many of the common use cases I have are readily handled by jsontool.
https://www.npmjs.com/package/json
Edit: added more information
Beltalowda
A few of the tools listed here seem to work like that, or roughly similar: https://ilya-sher.org/2018/04/10/list-of-json-tools-for-comm...
I didn't check any of them out though.
lgas
My hope was to one day add JS eval support to https://github.com/SuperpowersCorp/refactorio but as you can tell by the timestamps I haven't found any time to work on it in the last 4 years.
anitil
That's a really interesting suggestion, similar to how AWK uses $0, $1 etc.
ts0000
Interesting, for me it's the exact opposite.
I've tried a couple of times to get into awk, but still find the syntax arcane.
Beltalowda
I don't know; I wouldn't presume to tell you what you do or don't find arcane, but once I understood the somewhat unusual flow of awk ("for every line, check if the line matches this condition, and if it does run this block of code") I found it's quite easy to work with. It's "arcane" in the sense that it has an implicit loop and that it's a specialized language for a very limited class of problems, but I found that for this limited class of problem it's surprisingly effective.
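The pattern-action flow described above can be sketched in a couple of lines (assuming a POSIX awk):

```shell
# For every input line: if field 2 exceeds 1, run the block (print field 1).
# The per-line loop is implicit; only the condition and action are written.
printf 'a 1\nb 2\nc 3\n' | awk '$2 > 1 { print $1 }'
# b
# c
```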
taude
Same issue. However, I do successfully rely on using ctrl-r a lot to search prior invoked commands. And have a few core aliases that I've cobbled together....
rgoodwintx
Here because.... I didn't know of ctrl-R. What a life changer (although I had an alias for "hg" to "history | grep" :) )
zeroimpl
I use both awk and jq infrequently enough that I tend to struggle with anything non-trivial. I think zq would be the same.
> Not entirely convinced by the zq syntax either though; it also seems "arcane enough" that I would keep forgetting it.
I think this is the main thing. I’d prefer a streamlined CLI tool where you passed in some JS code and it’d just run it on the input (with the same slurp/raw args as jq). Could just be npm with underscore.js autoimported.
ar_lan
This is ironic - I use `awk` so infrequently, I have no idea how to use it without reading its man page or using Google. But I use `jq` often and find it simple.
j1elo
Sadly, very few authors seem to acknowledge or even know that GitHub wiki pages are not indexed by search engines. If it weren't for third-party sites like github-wiki-see.page (which could stop working at any time), their contents would be undiscoverable by the very people they are intended for...
oblio
What? That's crazy! Does Github block indexing?
bckygldstn
There's more details on https://github-wiki-see.page/ and https://github.com/github/feedback/discussions/4992#discussi...
> we have also introduced an x-robots-tag: none in the http response header of Wiki pages
> Abusive behavior in Wikis had a negative impact on our search engine ranking
> GitHub is currently permitting a select criteria of GitHub Wikis to be indexed
beembeem
I don't see anything here about wiki specifically but maybe one of the rules hits wiki pages?
jdnier
Here's a podcast interview with the creator of jq about what he's been working on at Jane Street: https://signalsandthreads.com/memory-management/
klysm
I didn't realize jq was missing a maintainer, it's one of my most used CLI tools.
ethanwillis
It really is a fundamental problem: many of these important projects aren't maintained simply because the maintainers can't beat the economics of rich freeloaders who have no real short-term incentive to compensate them.
avgcorrection
> can't beat the economics
This makes it sound like this is some antagonistic relationship where the OSS maintainer loses. But the idealistic scenario that you are alluding to[1] is about a developer who develops free OSS in their free time. And then, yes, very few end up paying or donating anything. But how is a predictable chain of events a loss? What is the “economics” of it?
[1] Some OSS developers do it as their day job.
lenkite
Pity he quit before Github opened up sponsorships.
jahewson
It’s not though, because in this case the (ex-)maintainer works at a Wall St firm.
skybrian
In this case it doesn't seem too critical? It means jq remains stable, which is probably what should happen once a tool like this gets a lot of users.
adamgordonbell
I found it hard to approach at first, but I think it was just the lack of material that worked through simple examples step by step.
I ended up writing my own guide to it, that in my unbiased opinion makes it easier to get the point where in-depth examples and language descriptions are easier to understand.
Edit: Oh, wow, it's even mentioned in this article. Maybe I should read before commenting.
sfink
I discovered jq after I wrote my own (extremely limited) version of it. I need it quite often, and yet I've never managed to get up the activation energy to learn enough for it to be useful. I need to have some notion of the computation model before anything is going to make sense to me. I hate learning things in completely disparate pieces that I need to memorize in hopes that someday it will just click together and I'll derive the underlying principles.
Your guide was great for this. It stepped me through enough of the bare basics in a way that the underlying model was obvious. It didn't get me nearly far enough for many of the tasks that I need jq for, but it got me started and that's all I really needed. Everything additional that I need to learn becomes obvious in retrospect—"of course there's an operator for this, there kind of has to be!".
Thank you!
dilap
From that page:
> The jq documentation is written in a style that hides a lot of important detail because the hope is that the language feels intuitive.
Yeah, not so much boys! Also, that disclaimer should really be at the top of the manual, with a link to the wiki, rather than vice-versa, as it is now.
The wiki is like secret information -- "oh, hey, here's the page that actually tells you how it works!"
eatonphil
If jq is getting too slow for you (that's never happened for me), it really seems like it's time to put your data in a database like sqlite or duckdb at least.
Incidentally there are many tools that help you do this like dsq [0] (which I develop), q [1], textql [2], etc.
[0] https://github.com/multiprocessio/dsq
jeffbee
I don’t agree. There is a great deal of room for improvement in jq performance. I profiled one invocation and it spent the majority of its time asserting that the stack depth was lower than some amount, which is crazy. I rebuilt it with NDEBUG defined and it was seriously ten times faster, but it’s not safe to run it that way because it has asserts with side effects, which is also crazy.
Rewriting all or parts of it in C++ would make it dramatically faster. I would start by ripping out the asserts and using a different strtod which they spend an awful lot of time in.
eatonphil
Fair point! I don't mean to say jq performance can't or shouldn't be improved.
Just that jq does two things: 1) ingest and 2) query.
If you're doing a bunch of exploration on a single dataset in one period of time or if the dataset is large enough and you're selecting subsets of it, you can ingest the data into a database (and optionally toggle indexes).
Then you can query as many times as you want and not worry about ingest again until your data changes.
All three of the tools I listed have variations of this sort of caching of data built in. For dsq and q with caching turned on, repeat queries against files with the same hashsum only do queries against data already in SQLite, no ingestion.
jeffbee
I have a large GeoJSON dataset I analyze to answer local government questions. It is of course loaded into a database for common questions but I also find myself doing ad hoc queries that aren’t suited to the database structure, and that’s where I find myself waiting for jq. Also I use jq as the ETL for that database.
algesten
I don't get it. "Instead of learning jq DSL, learn zq DSL".
To me they look similarly complicated, and the examples stress certain aggregation operations that are harder to do in jq (due to it being stateless).
loeg
> "Instead of learning jq DSL, learn zq DSL"
I think you got it — that’s exactly the idea. They claim (reasonably?) that it’s a more intuitive DSL; and it supports state. They also make some performance claims towards the end of the article.
jerrysievert
> They also make some performance claims towards the end of the article.
essentially a marginal speed increase on JSON, they think, but a much bigger one (5x-100x, they claim) if you switch to their native format ZNG.
if I'm switching formats completely, I'm not sure why I care about jq vs zq in json performance ...
loeg
Marginally faster is better than marginally slower, at least. I agree the JSON use is probably more compelling than their ZNG thing.
p5a0u9l
Yes, but fortunately, your efforts will pay dividends when parsing all the 'z*' boutique formats that it supports, zson, zst, zng, the list goes on. /s
mattnibs
Not sure if this came across in the article, but all the "boutique" z* formats are representations of the same Zed data model https://zed.brimdata.io/docs/formats/zed/
enriquto
> "Instead of learning jq DSL, learn zq DSL".
A saner approach is to gron the damn json and just use regular unix tools on the data.
knome
These guys must really hate functional programming.
I can see where jq might confuse someone new to it, but their replacement is irregular, stateful, still difficult, and I don't even see variable binding or anything.
jq requires you to understand that `hello|world` will run world for each hello, passing the world out values to either the next piped expression, the wrapping value-collecting list, or printing them to stdout.
it's a bit unintuitive if you come in thinking of them as regular pipelines, but it's a constant in the language that once learned always applies.
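A minimal illustration of that constant (assuming jq is installed): each value produced on the left of `|` flows through the right-hand filter independently.

```shell
# .vals[] yields three values; ". * 2" runs once for each of them.
echo '{"vals":[1,2,3]}' | jq '.vals[] | . * 2'
# 2
# 4
# 6
```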
this zed thing has what appears to be a series of workarounds for its own awkwardness, where they kept tacking on new forms to try to bandaid those that came before.
additionally, since they made attribute selectors barewords where jq would require a preceding reference to a variable or the current value (.), I'm not sure where they'll go for variables should they add them.
johnday
No kidding!
This part in particular jumped out at me:
> To work around this statelessness, you can wrap a sequence of independent values into an array, iterate over the array, then wrap that result back up into another array so you can pass the entire sequence as a single value downstream to the “next filter”.
This is literally just describing a map. A technique so generally applicable and useful that it's made its way into every modern imperative/procedural programming language I can think of. The idea that this person fails to recognise such a common multiparadigmatic programming idiom doesn't fill me with confidence about the design of zq.
aarchi
In fact, jq already has `map`, which would replace the article's pattern of `[.[]|add]` with `map(add)`. It is defined as such:
def map(f): [.[] | f];
Many built-in functions in jq are implemented in jq, in terms of a small set of core primitives. The implementations can be inspected in builtin.jq.
https://github.com/stedolan/jq/blob/master/src/builtin.jq#L3
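The two spellings are interchangeable, which is easy to confirm (assuming jq is installed):

```shell
# Both filters sum each inner array; the outputs are identical.
echo '[[1,2],[3,4]]' | jq -c '[.[] | add], map(add)'
# [3,7]
# [3,7]
```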
thaliaarchi
I find the stateless streaming paradigm in jq very pleasing.
Results can be emitted iteratively using generators, which are implemented as tail-recursive streams [0]. Combined with the `input` built-in filter, which yields the next item in the input stream, jq can handle real-time I/O and function as a more general-purpose programming language.
I built an interpreter for the Whitespace programming language in jq using these concepts and it's easily one of the most complex jq programs out there.
[0]: https://stedolan.github.io/jq/manual/#Generatorsanditerators
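A small sketch of the `input`/`inputs` behavior described above (assuming jq is installed): with `-n`, jq starts with no input and pulls values from the stream on demand.

```shell
# "inputs" drains the remaining input stream; collect and sum it.
printf '1\n2\n3\n' | jq -n '[inputs] | add'
# 6
```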
qmacro
Wow, I've been on the lookout for larger jq programs from which to learn. I'm going to enjoy learning from wsjq, thank you!
mattnibs
Variables exist in zq, "this" is a reserved word: echo {x:1} | zq 'x := x+1' -
thayne
I think their main complaint is that you can't iteratively operate on a stream as a whole without first converting it to an array, which besides sometimes requiring awkward syntax, can require a lot of memory for large datasets.
micimize
Their syntax comparison under "So you like chocolate or vanilla?" is disingenuous. You can do variable assignment and array expansion in jq:
expand_vals_into_independent_records='
.name as $name | .vals[] | { name: $name, val: . }
'
echo '{"name":"foo","vals":[1,2,3]} {"name":"bar","vals":[4,5]}' |
jq "$expand_vals_into_independent_records"
Also, generally, not a fan of the tone of this article.
lilyball
Your `.name as $name` was my immediate attempt too, but it turns out you can go even simpler with
jq '{name, val: .vals[]}'
diehunde
Pardon my ignorance, but why would I spend time learning something like jq or zq when it only takes me a couple of minutes to develop a script using some high-level language? I've had to process complex JSON files in the past, and a simple Python script gets the job done, and the syntax is much more familiar and easier to memorize. Is there a use case I'm missing?
meowface
If you're doing a lot of JSON munging every day and have good mastery of something like jq or zq, you can probably get things done faster.
Like you, I almost always just write Python scripts for such tasks because it's a lot easier for me to reason through it and debug it, but it's definitely slower-going than what I might do if I were very adept in a terse language like jq. I don't do this too often, so it makes little difference to me, but if someone is doing this multiple times a day, every day, it'll add up. As you say, it takes a few minutes; with jq, it could be a few seconds.
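For a concrete feel of the trade-off, here is the vals-expansion example from earlier in the thread as a small Python script (a sketch assuming python3 is on the PATH; the jq equivalent is a single short filter):

```shell
# Expand {"name":...,"vals":[...]} into one JSON record per value.
echo '{"name":"foo","vals":[1,2,3]}' | python3 -c '
import json, sys
obj = json.load(sys.stdin)
for v in obj["vals"]:
    print(json.dumps({"name": obj["name"], "val": v}))
'
# {"name": "foo", "val": 1}
# {"name": "foo", "val": 2}
# {"name": "foo", "val": 3}
```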
meepmorp
The same thing could be said for grep, or really any other utility whose functionality can be reproduced in a programming language.
eru
Indeed! Jq is basically something like grep for JSON.
It might actually make sense to embed jq functionality into your favourite language (as a library or so), as it is quite a nice and well-chosen set of functionality.
preferjq
I would love to see jq libraries become as common as regex libraries so I could use jq directly in whatever stack or environment I'm working on.
folkrav
Honestly, I've only really used `jq` to quickly parse JSON structures in interactive sessions, e.g.
curl -s http://foo.bar | jq .some.nested.value
For anything more complicated I would indeed go for writing a proper script.
eru
Don't tell anyone, but jq is secretly a pretty well thought out functional programming language.
johnthuss
There is certainly a learning curve with jq that can put people off. The attraction is that the end result is a very small amount of code that does only one thing: parse a JSON file, rather than invoking an external script that might send many HTTP requests or launch a missile.
As the complexity of the input JSON or of your processing grows, it does make sense to leave jq behind for a higher-level language.
eru
I agree with most of what you say.
I disagree with 'leaving for a higher level language'. Jq is an extremely high level language.
What it is _not_ is a general purpose language.
aftbit
This is how I felt about regular expression when I was first learning them. Now I feel that they're one of the most powerful text-processing tools that I know. I also felt similarly about SQL at the very beginning. IMO if you find yourself doing a _lot_ of JSON processing, learning at least basic jq gives you superpowers.
ris
1. The High Level Language of your choice may not be the flavour liked by other members of your team. Ruby? Ew, please use Python... unnecessary discussion ensues.
2. Your High Level Language of choice would probably require a non-trivial container image, which requires extra decisions to be made about sourcing; something you'd rather not think about if this is just, e.g., a step in a CD pipeline. jq is tiny and a very simple addition to an existing image. It's even present by default in GitHub Actions' `ubuntu-latest`.
3. Your High Level Language of choice may require dependencies to do the same job. How are those dependencies going to be defined, pinned, and who's going to be responsible for bumping them?
I used to 100% agree with you, but these days I understand why so much stuff ends up being bash and jq.
orthecreedence
You can spend a few days getting to know jq or you can happily live with your 100+ purpose-built scripts. I know which one I prefer.
I don't even process complex JSON...it's usually pretty basic. But being able to quickly select parts out of streams of JSON data on the CLI is incredibly useful to me, and learning even just the basics of jq has paid for itself a hundred times over by now.
Granted, a lot of my job right now is data forensics stuff, so I breathe this kind of thing. You might never need jq.
brushfoot
The name of its corporate progenitor may leave a bad taste in some mouths, but I highly recommend PowerShell for this sort of thing. It's cross platform, MIT licensed, and comes with excellent JSON parsing and querying capabilities. Reading, parsing, and querying JSON to return all red cars:
Get-Content cars.json | ConvertFrom-Json | ? { $_.color -eq 'red' }
The beauty of this is that the query syntax applies not just to JSON but to every type of collection, so you don't have to learn a specific syntax for JSON and another for another data type. You can use Get-Process on Linux to get running processes and filter them in the same way. The same for files, HTML tags, etc. I think nushell is doing something similar, though I haven't tried it yet.
I prefer this approach to another domain-specific language, as interesting as jq's and zq's are.
ptx
PowerShell "sends basic telemetry data to Microsoft [...] about the host running PowerShell, and information about how PowerShell is used" [1].
And since it relies on .NET, that also requires its own separate opt-out for its telemetry. There might be other components, now or in the future, that also send data to Microsoft by default and would have to be separately discovered and disabled.
[1] https://docs.microsoft.com/en-us/powershell/module/microsoft...
sandyarmstrong
> And since it relies on .NET, that also requires its own separate opt-out for its telemetry.
Building a program with .NET does NOT cause that program to send telemetry to Microsoft.
You're thinking of the .NET SDK itself. Using PowerShell does not trigger any use of the .NET SDK.
Disclaimer: I work for Microsoft.
ptx
Ah, yes, my mistake. Although PowerShell sends its own telemetry, the additional telemetry from the .NET platform is only sent when you use the dotnet command [1] and, as a special case, not when you very carefully invoke it only "in the following format: dotnet [path-to-app].dll" and never e.g. "dotnet help".
However, presumably PowerShell requires at least the .NET Runtime if not the .NET SDK, doesn't it? The docs [2] suggest running "dotnet --list-runtimes" to "see which versions of the .NET runtime are currently installed", so it sounds like the Runtime also includes the dotnet command. Does running the recommended "dotnet --list-runtimes" command send telemetry, like most of the commands? Or are you saying that the Runtime, unlike the SDK, doesn't include telemetry at all?
[1] https://docs.microsoft.com/en-us/dotnet/core/tools/telemetry
[2] https://docs.microsoft.com/en-us/dotnet/core/install/how-to-...
brushfoot
To me a telemetry opt-out is a small price to pay for what PowerShell brings to the table, but to each their own.
> There might be other components, now or in the future, that also send data to Microsoft
Of course. Do your due diligence on whatever you install. No tool should be exempt from that.
mschuster91
> Do your due diligence on whatever you install. No tool should be exempt from that.
That's a ridiculous take. 99% of users don't understand what all that technobabble in a typical EULA means, they will just go for the option they are nudged to (which is why first the courts and now enforcement agencies are stepping up their game against that practice [1]).
The way that the GDPR expects stuff to be handled is by getting explicit user consent, the consent must be a reasonably free choice (i.e. deals like "give me your personal data and the app is free, otherwise pay" are banned), and there must not be any exchange of GDPR-protected data without that consent unless technically required to perform the service the user demands. Clearly, a telemetry opt-out is completely against the spirit of the GDPR and I seriously hope for Microsoft to get flattened by the courts for the bullshit they have been pulling for way too long now.
What I would actually expect of Microsoft is to follow the Apple way: have one single central place, ideally at setup and later in the System Preferences, where tracking, analytics and other optional crap can be disabled system-wide.
[1] https://www.hiddemann.de/allgemein/lg-rostock-bejaht-unterla...
vips7L
> The beauty of this is that the query syntax applies not just to JSON but to every type of collection,
This is the best part of pwsh. Everything is standardized, you're not guessing at the idioms of each command, and you're working with objects instead of parsing strings!
My second favorite part is having access to the entire C# standard library.
bblb
PowerShell is "Python interactive done right". It's too bad it has a bad rap in the open source community and may never get the traction it really deserves. Sure it has its downsides (what tech doesn't?), but PowerShell has solved so many of the issues and annoyances of the shells we've been used to that it still comes out the winner.
I've been using it since day one from 2006, every single day. It has come a long way and the current PS7 is the best shell experience there is. Hands down no contest.
Snover's passionate early presentation about the PS pipeline is a pretty cool tech video. https://www.youtube.com/watch?v=325kY2Umgw8
spiralx
> PowerShell is "Python interactive done right".
Actually PowerShell is "Perl interactive done right" if you read what the designers say about their influences - the automatic variable $_ is straight from Perl and the array creation syntax @(a, b, c) is also a Perl-ism from @arr = (a, b, c). Which is funny as I dislike Perl intensely but really like PowerShell :)
To be fair there's not much Perl in PS, it's as much influenced by KSH, Awk, cmd.exe and VBScript as Perl. Thankfully "influenced by" isn't "a melange of", because a combination of all of those sounds like an abomination lol, and PS is wonderful in being about as consistent and simple as a proper shell can get.
klysm
I want to learn powershell, but I have an internal ick bias because I've been using bash for so many years. The tab behavior is the exact opposite of what I expect and it short circuits my brain every single time I press it. Having structured data in the pipes seems very useful and powerful though so I should probably just bite the bullet.
ilyash
.. or you can try Next Generation Shell (author here):
fetch("cars.json").filter({"color": "red"})
# or
echo(fetch("cars.json").filter({"color": "red"}))
tl
PowerShell's object pipes are more inspectable than any of Bourne's text-based descendants. But the tool itself occupies a niche between "write shell, dealing with the esoterica of grep/sed/awk/jq/etc." and "write Python, with constructs that handle complexity better than pipes".
Looking at the popularity of VSCode, I don't think Microsoft hatred blocks its adoption.
ComputerGuru
> Looking at the popularity of VSCode, I don't think Microsoft hatred blocks its adoption.
Inapt comparison. The people using VS Code are more likely to be migrating from proprietary tools like PyCharm or Sublime Text, bloated offerings like NetBeans, or roughly equivalent offerings like Atom.
The people that would use PowerShell would be migrating from the likes of Zsh, Bash, Fish, and other “hard core free” software.
mdaniel
I conceptually like pwsh, but even as your example shows, I don't have the RSI budget left to spend on typing that extremely verbose expression every day
jq and its unix-y friends allow me to trade off expressiveness against having to memorize arcane invocations
brushfoot
I hear that, I use and like *nix too. PowerShell aliases help a lot. It comes with some predefined, like `gc` for `Get-Content`. The above example could be rewritten:
gc cars.json | ConvertFrom-Json | ? color -eq 'red'
`ConvertFrom-Json` doesn't have a default alias, but you can define one in your PowerShell profile. I do that for commands I find myself using frequently. Say we pick convjson: gc cars.json | convjson | ? color -eq 'red'
That's more like what my typical pipelines look like. The nice thing about aliases is you can always switch back to the verbose names when clarity is more important than brevity, like in long-term scripts.
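(For comparison, a rough jq spelling of the same filter, assuming cars.json holds a top-level array of objects with a "color" field — sample data made up here:)

```shell
# Select the red cars, jq-style; the sample input stands in for cars.json.
echo '[{"color":"red","model":"a"},{"color":"blue","model":"b"}]' \
  | jq '.[] | select(.color == "red")'
# emits only the red car object
```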
Edit: Seems I've been using too many braces and dollar signs all these years. Thanks to majkinetor for the tip.
majkinetor
You don't need $_ for immediate properties which looks much cleaner:
gc cars.json | convjson | ? color -eq 'red'
Arnavion
jq can not only process JSON input but also emit JSON output. So on that note, has ConvertTo-Json stopped mangling your JSON yet? https://news.ycombinator.com/item?id=25500632
AcerbicZero
I'm pretty new to jq (maybe 2 years of exposure), but from my perspective, on some level jq does to JSON what PowerShell does to everything Windows, except PowerShell gives me the Get-Member cmdlet, so when I don't know what is even in my object, I can explore.
Sometimes jq -r '.[]' works, but it's all just trial and error. I use plenty of jq in my scripts, but I can never seem to visualize how jq looks at the data. I just have to toss variations of '.[whateveriwant].whatever[.want.]' at it until something works. I suppose the root of my complaint is that jq does not do a good job of teaching you to use jq. It either works or gives you nothing, and while I've learned to work around that, I'll try anything that claims to be even 1% better than jq.
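(One partial substitute for Get-Member that can help with the trial and error: jq's own introspection builtins — `type`, `keys`, and `paths` — shown here on a made-up document:)

```shell
# Poke at an unfamiliar document instead of guessing at filters.
echo '{"name":"foo","vals":[1,2,3]}' | jq 'type'     # "object"
echo '{"name":"foo","vals":[1,2,3]}' | jq -c 'keys'  # ["name","vals"]
# List the path to every scalar leaf, Get-Member-ish:
echo '{"name":"foo","vals":[1,2,3]}' | jq -c '[paths(scalars)]'
# [["name"],["vals",0],["vals",1],["vals",2]]
```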
anitil
I use jless to manually find what I'm looking for and then use the result as a starting point. Unfortunately I don't know how to get that query into the paste buffer yet, so there's a manual step in the middle.
abledon
There is also "JP" https://github.com/jmespath/jp
which follows the JMESPath standard
mdaniel
My heartburn with jmespath is that it lacks pipelines, only projections, so doing _crazy_ stuff to the input structure is damn near impossible
NateEag
I suspect the JMESPath people would argue that if you want to do major transformations to the input, you should write a proper program, and that a CLI query tool should focus on, well, querying.
I'm personally trying to move away from jq and towards jp, because
- there's a standard defining it, not just an implementation, decreasing the odds of being stuck with an unmaintained tool
- there are libraries supporting the syntax for most of the major programming languages
- JMESPath's relative simplicity compared to jq is a good thing, IMO - Turing-completeness is a two-edged sword
- JMESPath is the AWS CLI query language, which is a convenient bonus
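(On the Turing-completeness point: jq allows recursive user-defined filters, which is exactly the power JMESPath deliberately leaves out. A toy sketch:)

```shell
# jq supports def-ing recursive filters -- full Turing-completeness.
# JMESPath has no equivalent; it stays a pure query/projection language.
jq -n 'def fib: if . < 2 then . else ((. - 1) | fib) + ((. - 2) | fib) end; 10 | fib'
# 55
```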
mdaniel
> JMESPath is the AWS CLI query language, which is a convenient bonus
And in ansible, too, FWIW, but yes it's my hand-to-hand combat with the language in both of those circumstances that has formed my opinion about it
Regrettably, "kubectl get -o jsonpath" is _almost_ the same, but just different enough to trip me up :-(
remram
From a computer science point of view, what kind of transformations are impossible to express in jmespath but are possible in jq?
mdaniel
I dunno how to speak to your "computer science" part, but pragmatically anything that requires a "backreference", because unlike with JSONPath (and, of course, jq) there are no "root object" references
$ printf '{"a": {"b":"c", "d":["d0","d1"]}}' | jq -r '[ .a as $a | $a.d[] | {x: ., y: $a.b}]'
[
{
"x": "d0",
"y": "c"
},
{
"x": "d1",
"y": "c"
}
]
and I realize this isn't as pure CS-y as you were asking, but this syntax is hell on quoting:
$ printf '["a","b"]' | jp -u 'join(`"\n"`, @)'
# vs
$ printf '["a","b"]' | jq -r 'join("\n")'
Hi, all. Author here. Thanks for all the great feedback.
I've learned a lot from your comments and pointers.
The Zed project is broader than "a jq alternative", and my bad for trying out this initial positioning. I do know there are a lot of people out there who find jq really confusing, but it's clear that if you become an expert, my arguments don't hold water.
We've had great feedback from many of our users who are really productive with the blend of search, analytics, and data discovery in the Zed language, and who find manipulating eclectic data in the ZNG format to be really easy.
Anyway, we'll write more about these other aspects of the Zed project in the coming weeks and months, and in the meantime, if you find any of this intriguing and want to kick the tires, feel free to hop on our slack with questions/feedback or file GitHub issues if you have ideas for improvements or find bugs.
Thanks a million!
https://github.com/brimdata/zed https://www.brimdata.io/join-slack/