Conventions for Command Line Options

exmadscientist

Just, whatever you do, please please PLEASE PLEASE support `--help`. I don't mean `-h` or `-help`, I mean `--help`.

There's only one thing the long flag can possibly mean: give me the help text. I understand that your program might prefer the `-help` style, but many do not. And do you know how I figure out which style your program likes? That's right, I use `--help`. I have to use `--help` rather than just `-help` because of GNU userspace tools, among others. It seems unlikely they're going to suddenly clean up their act this decade, so I have to default to starting with `--help`.

So it's very frustrating when the response to `program --help` is "Argument --help not understood, try -help for help." This is then often followed by me saying indecent things about the stupid program.

arp242

Even better: support both -h and --help.

Also, don't pipe the help output to stderr if the user requested help. "prog --help | less" not working is annoying.
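
A minimal sketch of that advice in Python, assuming a hypothetical program called `prog` with a made-up usage string: help that was requested goes to stdout with status 0, while usage printed because of a mistake goes to stderr with a non-zero status. The help check also runs before any real work starts.

```python
import sys

USAGE = "usage: prog [-h | --help] [FILE...]"  # hypothetical usage text


def run(argv, stdout=sys.stdout, stderr=sys.stderr):
    """Return an exit status; write help to stdout only when it was asked for."""
    if "-h" in argv or "--help" in argv:
        # Requested help is ordinary output, so `prog --help | less` works.
        print(USAGE, file=stdout)
        return 0
    unknown = [a for a in argv if a.startswith("-") and a != "-"]
    if unknown:
        # A mistake: report it on stderr and signal failure.
        print(f"prog: unknown option {unknown[0]}", file=stderr)
        print(USAGE, file=stderr)
        return 2
    # ... the real work would start here, after option handling ...
    return 0
```

Python's argparse behaves the same way by default: `-h`/`--help` print to stdout and exit with status 0, while parse errors go to stderr with status 2.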

misnome

Or worse - you pass --help and the script runs and starts doing stuff

klysm

And is then not robust against being stopped via ctrl c

vasilakisfil

also --version (and -v if possible, although that can be mapped to --verbose)

Annatar

Traditional UNIX®️ options for help are -? and -h. --long-options are a horrid GNU-ism and shunned by clean UNIX®️ compliant programs, because such programs come with detailed, professionally written manual pages which contain extensive SYNOPSIS and EXAMPLES sections.

Implementing --long-options makes the poor users type much more, hurting ergonomy and the users' long-term productivity and efficiency.

m463

I think long human-readable options are very nice when invoking a command in a script.

A (hypothetical) script can invoke rsync like this:

  rsync -Oogx

but long options make things more readable:

  rsync \
    --omit-dir-times \
    --owner \
    --group \
    --one-file-system

It also helps when you're wondering if '-d' means debug or directory.

(although yes, it can allow options to be added to a program ad nauseam)

jaen

-? - what are you smoking? :)

That is a glob in most shells, try putting a file named "-a" in your current directory and see what happens...

The proper way to write it would be -\?

Annatar

It goes without saying that it has to be escaped in most shells, but that is the traditional option for help on UNIX®️. You would be well advised to educate yourself on the history of UNIX®️ before coming up with "what are you smoking?"

hibbelig

I see nothing wrong with extensive SYNOPSIS and EXAMPLES sections for programs using --long-options.

The fish shell extracts completion information from man pages, so that the user does not have to type so much and they get useful output, pulled from the manpage, no less.

chriswarbo

I write a lot of commandline utilities and scripts for automating tasks. I find environment variables much simpler to provide and accept key/value pairs than using arguments. Env vars don't depend on order; if present, they always have a value (even if it's the empty string); they're also inherited by subprocesses, which is usually a bonus too (especially compared to passing values to sub-commands via arguments, e.g. when optional arguments may or may not have been given).

Using arguments for key/value pairs allows invalid states, where the key is given but not the value. It can also cause subsequent, semantically distinct, options to get swallowed up in place of the missing value. They also force us to depend on the order of arguments (i.e. since values must follow immediately after their keys), despite it being otherwise irrelevant (at least, if we've split on '--'). I also see no point in concatenating options: it introduces redundant choices, which imposes a slight mental burden that we're better off without.

The only advice I wholeheartedly encourage from this article is (a) use libraries rather than hand-rolling (although, given my choices above, argv is usually sufficient!) and (b) allow a '--' argument for disambiguating flag arguments from arbitrary string arguments (which might otherwise parse as flags).
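
As a small illustration of point (b), here is how a library handles `--`, using Python's argparse. The program name and options are hypothetical.

```python
import argparse


def parse(argv):
    parser = argparse.ArgumentParser(prog="rmfiles")  # hypothetical program
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("files", nargs="*")
    return parser.parse_args(argv)


# Everything after "--" is treated as a positional argument, even if it
# looks like a flag.
args = parse(["-v", "--", "-v", "--help"])
print(args.verbose, args.files)
```

Without the `--`, the second `-v` would be parsed as the flag again and `--help` would print the help text and exit.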

TeMPOraL

Some tangential musings:

I can't help but see parallels here between cmdline arguments vs. environment variables for programs, and keyword arguments vs. dynamic binding for functions in a program (particularly in a Lisp one).

That is, $ program --foo=bar vs. $ FOO=bar program

seems analogous to:

  (function :foo "bar")
  ;; vs.
  (let ((*foo* "bar"))
    (function))

When writing code (particularly Lisp), keyword arguments are preferred to dynamic binding because the function signature is then explicitly listing arguments it uses, and dynamic binding (the in-process phenomenon analogous to inheriting environment from a parent process) is seen as dangerous, and a source of external state that may be difficult to trace/spot in the code.

I suppose the latter argument applies to env vars as well - you can accidentally pass values differing from expectation because you didn't know a parent process changed them. The former doesn't, because processes don't have a "signature" specifying their options, at least not in a machine-readable form. Which is somewhat surprising - pretty much all software written over the past decades follows the pattern of accepting argc, argv[] (and env[]), and yet the format of arguments is entirely hidden inside the actual program. I wonder why there isn't a way to specify accepted arguments as e.g. metadata of the executable file?

aidenn0

Lisp will at least warn you if you bind a variable with earmuffs that isn't declared special though. Biggest downside to environment variables is the lack of warning if you misspell something.

chriswarbo

Interesting that you bring up Lisp and dynamic scope, since I've previously combined env vars with Racket's "parameters" (AKA dynamic bindings): https://lobste.rs/s/gnniei/setenv_fiasco_2015#c_owz2pp

TeMPOraL

Yeah, I just can't stop thinking about envs / dynamic binding when someone brings up the other.

Speaking of using both direct and dynamic arguments, a common pattern in (Common) Lisp is using dynamically-bound variables (so-called "special" variables) to provide default values. Since default values in (Common) Lisp can be given as any expression, which gets evaluated at function-call time, I can write this:

  (defun function (&key (some-keyword-arg *some-special*))
    ...)

And then, if I call this function with no arguments, i.e. (function), some-keyword-arg will pull its value from the current dynamic value of *some-special*. In process land, this would be equivalent to having the command line parser use values in environment variables as defaults for arguments that were not provided.
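
The process-land equivalent might look roughly like this in Python. This is only a sketch: the program name `fetch`, the variable `FETCH_TIMEOUT` and the fallback of 30 are all made up for illustration.

```python
import argparse
import os


def parse(argv, environ=os.environ):
    parser = argparse.ArgumentParser(prog="fetch")  # hypothetical program
    parser.add_argument(
        "--timeout",
        type=int,
        # The environment supplies the default; an explicit flag overrides it.
        # argparse applies `type` to string defaults, so "5" becomes 5.
        default=environ.get("FETCH_TIMEOUT", 30),
    )
    return parser.parse_args(argv)


print(parse([], {}).timeout)
print(parse([], {"FETCH_TIMEOUT": "5"}).timeout)
print(parse(["--timeout", "9"], {"FETCH_TIMEOUT": "5"}).timeout)
```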

jpitz

In my head there's been a hierarchy for a long time.

When I build command line utilities and I think about the way that they'll be used, I tend to use configuration files for things that will change very slowly over time, and environment variables as a way to override default behaviors, and command line arguments to specify things that will often vary from invocation to invocation. In fact, most of the time, I use the environment variables either for development/testing features that I don't really intend to expose to most users, or for credentials that don't get persisted.

It's never occurred to me to use environment variables as a primary way to configure an application. I'll have to noodle on that for a while. My gut says that it's enough of a deviation from the Unix convention that I probably won't use that.
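
One way to sketch that hierarchy in Python is with `collections.ChainMap`, which looks a key up in each mapping in turn. The setting names and values below are invented, and the `cli` mapping is assumed to contain only options the user actually passed.

```python
from collections import ChainMap


def resolve(cli, env, config_file, defaults):
    """Layer the sources so that earlier mappings win over later ones."""
    return ChainMap(cli, env, config_file, defaults)


settings = resolve(
    cli={"color": "never"},                             # varies per invocation
    env={"color": "auto", "token": "s3cret"},           # overrides, credentials
    config_file={"color": "always", "pager": "less"},   # slow-changing choices
    defaults={"color": "auto", "pager": "more", "token": None},
)
print(settings["color"], settings["pager"], settings["token"])
```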

Izkata

> My gut says that it's enough of a deviation from the Unix convention that I probably won't use that.

It's not completely unheard of. Checking the man pages for a few I remember being affected:

* `ls` looks at LS_COLORS

* `grep` looks at GREP_OPTIONS, GREP_COLOR, GREP_COLORS, as well as the not-grep-specific LC_ALL, LC_COLLATE, LANG, etc, and POSIXLY_CORRECT

* `find` also looks at LC_%, LANG, POSIXLY_CORRECT as well, plus TZ and PATH for some of its other options

Looks like the program-specific ones LS_% and GREP_% can be overridden by options directly on the command, but the others don't have such options.

(Makefile wildcards because asterisk keeps doing italics)

jpitz

I feel like I didn't make "primary" clear enough - it would be like using an environment variable for the path for ls. That's gonna be a big no from me, dog.

Annatar

These are all GNU/Linuxisms and are nowhere to be found in a real UNIX®️ like illumos, Solaris, HP-UX or IRIX64.

This is also a good example of why having GNU/Linux as one's first OS instead of a real UNIX®️ is so toxic.

arp242

There are a few reasons I don't like environment variables for this: the first is that a random env variable can influence the operation of a program. Do "export foo=bar" and then maybe 30 minutes later you unexpectedly pass "foo". It's also hard to inspect; the output of "env" tends to be somewhat long, and you don't know which the program will pick up on. Flags are much clearer and more explicit.

There's also the issue that typo'ing "foo" as "fooo" will silently fail. Okay, that's a simple example, but some tools have "PROGRAMNAME_LONGDESCRIPTION". Being in all-caps also makes it hard to spot typos for me. You generally want your env vars to be long, to ensure they're unique.

boring_twenties

> It's also hard to inspect; the output of "env" tends to be somewhat long, and you don't know which the program will pick up on. Flags are much clearer and more explicit.

I don't think I disagree with your post, but this particular point seems to be trivially solved by prefixing the variable name with the program name?

IOW: `env | grep '^GIT_'` to see what you're passing to `git`.
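
A program that owns a prefix can also use it to catch the silent-typo problem raised earlier in the thread. A rough sketch, with a hypothetical `MYTOOL_` prefix and made-up variable names:

```python
def unknown_vars(environ, prefix, known):
    """Return prefixed variable names that the program does not recognise."""
    return sorted(
        name for name in environ
        if name.startswith(prefix) and name not in known
    )


KNOWN = {"MYTOOL_COLOR", "MYTOOL_CACHE_DIR"}  # hypothetical variables
env = {"MYTOOL_COLOR": "auto", "MYTOOL_CACHEDIR": "/tmp/x", "HOME": "/home/me"}

for name in unknown_vars(env, "MYTOOL_", KNOWN):
    print(f"warning: unrecognised environment variable {name}")
```

In a real program `env` would be `os.environ`; the dict here just keeps the example self-contained.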

arp242

Yeah, fair enough; that's what I do as well.

AnonC

I’m curious if your preference for environment variables is only for writing programs or for using programs as well. From a user’s perspective, would you prefer that common commands use environment variables to get options? For example, would you prefer a “find” command that uses environment variables instead of command line options?

chriswarbo

> For example, would you prefer a “find” command that uses environment variables instead of command line options?

I use 'find' so much that my muscle memory would hurt if it changed, but it's actually a really interesting example. Its use of arguments to build a domain-specific language is a cool hack, but pretty horrendous; e.g.

    find . -type f -a -not \( -name \*.htm -o -name \*.html \)

We can compare this to tools like 'jq', which use a DSL but keep it inside a single string argument:

    key=foo val=bar jq -n '{(env.key): env.val}'

Note that 'jq' also accepts key/value pairs via commandline arguments, but that requires triplets of arguments, e.g.

    jq -n --arg key foo --arg val bar '{(env.key): env.val}'

pklo

The second example didn't work for jq-1.6 ---I had to write it as

  jq -n --arg key foo --arg val bar '{($key): $val}'

syshum

That would depend on the target. The grandparent talks about automation, so the target for their utilities is most likely the system, not a user.

When writing automation routines, env vars are often better, from my experience.

For tools that will be run manually by a user or admin, CLI options are better.

pfranz

I think there's a place for both, but env vars can make things really annoying to troubleshoot. Already, I often print all env vars before running automated commands and it can be a mess to dig through. Culling down environment variables when spawning a subprocess is difficult. Bad flags can error immediately if you made a typo. I've often misspelled an envvar and it's hard to tell it did nothing (I think I saw a recent bug where trailing white space like "FOO " was the source of a years-long bug). `FOO=BAR cmd` is also weird for history (although that's mostly a tooling issue).

vbernat

It seems equivalent to long options requiring a value. Also, if you mistype an environment variable, you won't get warned about it.

ucarion

Another nice thing about env vars is that they don't appear in the process table, unlike argv. For many applications, that makes them a suitable place to inject secrets, and makes them convenient to inject via wrapper tools.

For instance, the fact that the aws(1) cli tool and the AWS SDKs by default take IAM creds from AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, etc. means that you can write programs like aws-vault (https://github.com/99designs/aws-vault) which can acquire a temporary AWS session and then execute another command with the creds injected in the standard AWS_... env vars. For instance:

    aws-vault exec prod-read -- aws s3 ls
    aws-vault exec stage-admin -- ./my-script-that-uses-the-aws-sdk-internally

Also, passing arguments as env vars is part of the "12-factor app" design (https://12factor.net/config). That page has some good guidance.

fanf2

The environment does appear in the process table: see `ps e` on Debian at https://manpages.debian.org/buster/procps/ps.1.en.html or ps -e on BSD at https://www.freebsd.org/cgi/man.cgi?query=ps&apropos=0&sekti... or pargs -e on Solaris at https://www.unix.com/man-page/opensolaris/1/pargs

yjftsjthsd-h

It's protected by at least uid, but `/proc/$PID/environ` does exist, and it might be exposed other ways, too.

thayne

That's actually a bad thing for non-secure arguments, and environment variables aren't much better for secure arguments.

klhugo

Quick comment, not an expert, but environ vars keep their state after the program is called. From a functional programming perspective, or just for my own sanity, wouldn’t it be more interesting to keep state to a minimum?

chriswarbo

You're right that mutable state should be kept to a minimum, but immutable state is fine. There usually isn't much need to mutate env vars.

Some thoughts/remarks:

- If we don't change a variable/value then it's immutable. Some languages let us enforce this (e.g. 'const'), which is nice, but we shouldn't worry too much if our language doesn't.

- We're violating 'single source of truth' if we have some parts of our code reading the env var, and some reading from a normal language variable. This also applies to arguments, though.

- Reading from an env var is an I/O effect, which we should minimise.

- We can solve the last 2 problems by reading each env var once, up-front, then passing the result around as a normal value (i.e. functional core, imperative shell)

- Env vars are easy to shadow, e.g. if our script uses variables FOO and BAR, we can use a different value for BAR within a sub-command, e.g. in bash:

    BAR=abc someCommand

This will inherit 'FOO' but use a different value of 'BAR'. This isn't mutable state, it's a nested context, more like:

    let foo  = 123
        bar  = "hello"
        baz  = otherCommand
        quux = let bar = "abc"
                in someCommand

As TeMPOraL notes, env vars are more like dynamically scoped variables, whilst most language variables are lexically scoped.
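
A rough Python rendering of the last few points: read each variable once at the edge, pass an immutable value around afterwards, and build a nested environment for a sub-command instead of mutating the parent's. The names `Config`, `load_config` and `child_env` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    foo: str
    bar: str


def load_config(environ):
    """Read each variable exactly once; the rest of the code takes a Config."""
    return Config(foo=environ.get("FOO", ""), bar=environ.get("BAR", ""))


def child_env(environ, **overrides):
    """Return a new environment for a sub-command; the input is not modified."""
    return {**environ, **overrides}


parent = {"FOO": "123", "BAR": "hello"}
nested = child_env(parent, BAR="abc")   # like `BAR=abc someCommand`
print(load_config(parent), load_config(nested))
```

The nested mapping could then be handed to a child process, for example with `subprocess.run(cmd, env=child_env(os.environ, BAR="abc"))`.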

rabidrat

That's only if you `export` the vars. You can do `FOO=1 BAR=2 cmd` and FOO and BAR will only have those values for the process and children. This is isomorphic to `cmd --foo=1 --bar=2`.

treebog

Absolutely, this is one of the reasons that environment variables are a nightmare to work with

ucarion

Most cli parsers don't fully support the author's suggested model because it means you can't parse argv without knowing the flags in advance.

For example, the author suggests that a cli parser should be able to understand these two as the same thing:

    program -abco output.txt
    program -abcooutput.txt

That's only doable if by the time you're parsing argv, you already know `-a`, `-b`, and `-c` don't take a value, and `-o` does take a value.

But this is a pain. All it gets you is the ability to save one space character, in exchange for a much more complex argv-parsing process. The `-abco output.txt` form can be parsed without such additional context, and is already a pretty nice user interface.

For those of us who aren't working on ls(1), there's no shame in having a less-sexy but easier-to-build-and-debug cli interface.
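
To make the dependency on the flag grammar concrete, here is a short sketch using Python's standard getopt module. The option string `"abco:"` is the up-front knowledge in question: the colon declares that `-o` takes a value while `-a`, `-b` and `-c` do not.

```python
import getopt

SHORT_OPTS = "abco:"  # a, b, c are booleans; the colon marks o as taking a value

separate, _ = getopt.getopt(["-abco", "output.txt"], SHORT_OPTS)
attached, _ = getopt.getopt(["-abcooutput.txt"], SHORT_OPTS)

print(separate)
print(attached)
# Both print [('-a', ''), ('-b', ''), ('-c', ''), ('-o', 'output.txt')]
```

If the colon is dropped from the option string, the attached form is read as a run of single-letter flags and fails on the first unknown letter.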

saurik

But every single command line parser I have ever used (and I have used many over the past 25 years in many programming languages) does in fact know before parsing that a b and c don't take a value and o does: accepting the grammar for the flags and then parsing them in this way is like the only job of the parser?

TeMPOraL

I suppose the only reason why you wouldn't want to have a centralized description of the command line grammar is if you're using flags which alter the set of acceptable flags. Like e.g. if you put all git subcommands into one single executable - the combinations and meanings of commandline flags would become absurdly complicated.

tom_

This is usually handled by adding support for that sort of syntax to the parsing library. The result is typically not complicated: you have a parser for the global options, then, if using subcommands, in effect a map of subcommand name to parser for options for that command.
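
For example, Python's argparse exposes this structure through `add_subparsers`. The tool name `vcs` and its options below are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="vcs")  # hypothetical tool
    parser.add_argument("--verbose", action="store_true")  # global option

    # In effect a map from subcommand name to that subcommand's own parser.
    commands = parser.add_subparsers(dest="command", required=True)

    commit = commands.add_parser("commit")
    commit.add_argument("-m", "--message", required=True)

    log = commands.add_parser("log")
    log.add_argument("-n", "--max-count", type=int, default=10)
    return parser


args = build_parser().parse_args(["--verbose", "commit", "-m", "fix typo"])
print(args.command, args.verbose, args.message)
```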

rabidrat

Some programs may allow for plugins which can have their own options. But you may want to provide an option for where to load the plugins from.

goto11

Why would someone ever want to use the second syntax? (Genuine question, not rhetorical!)

It seems using = for optional option arguments is only allowed with long form. Why is that? Wouldn't it be nicer to be able to write "program -c=blue" rather than "program -cblue"?

mehrdadn

The second syntax is useful when you want to ask a program to pass an argument to another program. Having to pass 2 arguments is a lot more annoying since you'd often then have to prefix each one.

barrkel

Very opinionated, and IMO without enough justification.

The assertions around short options with arguments - conjoining with other short options, for example - are actively harmful to legibility in scripts, since there's no lexical distinction between the argument and extra short options. I don't recommend using that syntax when alternatives are available and I deliberately don't support it when implementing ad-hoc argument parsing (typically when commands are executed as parsing proceeds).

Spivak

Counterexamples where this is good for legibility.

    tar -xf archive.tar.gz

    tar -czf archive.tar.gz dir/

curryhoward

I'm guessing the comment was talking about examples like this:

    program -abcdefg.txt

Just from reading this, you can't tell where the flags end and the filename begins unless you have all the flags and their arities memorized.

DangitBobby

Worse than that, it adds an unstable convention to an otherwise stable interface. What if there used to be a, b flags and then they add a c flag? Who knows what your command does now. It's just bad all around.

Spivak

That’s always true even before you talk about option combining.

    python scr.py -a -v

Interpreting this command correctly requires you to know that python passes everything after the first positional argument to the script.

    ansible play.yml -v

Where ansible itself consumes the -v argument.

It gets even worse with subcommands.

    sudo -k command -a subcommand -v

Options parsing is nice because the arity of options is always 0 or 1.

cellularmitosis

> tar -czf dir/

You forgot the output filename.

The thing I like about tar is that you only need to learn two options: 'c' and 'x':

    tar c somedir | gzip > somedir.tar.gz
    cat somedir.tar.gz | gunzip | tar x

Spivak

Thank you! Fixed!

arp242

> short options with arguments - conjoining with other short options, for example - are actively harmful to legibility in scripts

Agreed, which is why you shouldn't do that :-) They're a lot easier to type though. I consider short and long arguments as solving different problems: the short are short so it's easy for interactive usage (especially for common options), and the long ones are for scripting (or discoverability, especially with things like zsh's completion).

So I usually use "curl -s" on the commandline, and "curl --silent" when writing scripts.

m463

I really do like argparse.

It will cleanly do just about anything you need done, including nice stuff like long/short options, default values, required options, types like type=int, help for every option, and even complicated stuff like subcommands.

And the namespace stuff is clever, so you can reference arg.debug instead of arg['debug']
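
A short sketch of those features together, for a made-up image-resizing tool (the option names are only for illustration):

```python
import argparse


def parse(argv):
    parser = argparse.ArgumentParser(description="Resize an image (hypothetical tool).")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="print debug output")
    parser.add_argument("-w", "--width", type=int, default=80,
                        help="target width (default: %(default)s)")
    parser.add_argument("-o", "--output", required=True,
                        help="where to write the result")
    return parser.parse_args(argv)


args = parse(["-o", "out.png", "--width", "120"])
# Values are attributes on the returned Namespace, already converted by `type`.
print(args.debug, args.width, args.output)
```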

onei

I always found argparse did argument parsing well enough but it felt clunky when you need something more complicated like lots of subcommands. I find myself using it exclusively when I'm trying to avoid dependencies outside the Python standard library.

My choice of argument parsing in Python is Click [1]. It has next to no rough edges and it's a breath of fresh air compared to argparse. I recently recommended it to a colleague who fell in love with it with minimal persuasion from me. I recommend it highly.

[1] https://click.palletsprojects.com/en/7.x/

cb321

I feel like argh and plac preceded/inspired Click.

Also, it's not Python but in Nim there is https://github.com/c-blake/cligen which also does spellcheck/typo suggestions, and allows --kebab-case --camelCase or --snake_case for long options, among other bells & whistles.

onei

Spell check is something I'd love to see in Click. As complicated as git can be, I always liked the spell check it has for subcommands.

As for the different cases, I personally avoid using camel or snake case purely because I don't need to reach for the shift key. Maybe some people like it, but I find it jarring.
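
For what it's worth, a basic version of that kind of suggestion can be sketched with Python's standard difflib. This is not how git or cligen implement it; the subcommand list and the cutoff are arbitrary choices for the example.

```python
import difflib

COMMANDS = ["status", "commit", "checkout", "branch"]  # hypothetical subcommands


def suggest(word, candidates=COMMANDS):
    """Return the closest known names for a mistyped subcommand, best first."""
    return difflib.get_close_matches(word, candidates, n=3, cutoff=0.6)


print(suggest("comit"))
print(suggest("xyz"))
```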

m463

That is a fascinating project. I will look at it.

What I wonder is -- there are some very nice third-party libraries for python, and I wonder why they don't make it into the standard library. It would be nice if there were ways of "a tide that raises all ships"

onei

I don't disagree, but I think there's a fair amount of friction when maintaining standard libraries rather than third party ones just because of the implied stability. Your versioning is tied to python itself so it's probably kind of dull having to work with an API that you can't improve simply by bumping the version of the library.

juped

Try Typer (https://typer.tiangolo.com/) sometime, which is built on Click.

onei

At a glance it looks a bit too simple. I didn't look too far into it, but it seems to be missing short options, prompting for options and using environment variables for options. As it's built on click, I'd guess you can call into click to do those, but at that point I don't see a major benefit typer is providing.

alkonaut

> program -iinput.txt -ooutput.txt

What good is that? Who wants to save a space? Given -abcfoo.txt I can't tell whether it's abcf oo.txt or -abc foo.txt? So that's a definite drawback, and the benefit is?

ibejoeb

Right, for all we know, that's 8 options and two duplicates. It's also harder to implement because now I have to keep a symbol table (or I have to refer more frequently to it). I can't see much of a utility argument here except terseness for its own sake. In that case, why don't we just pack our own bytes?

ohazi

> Go’s [...] intentionally deviates from the conventions.

Sigh

Of course it does.

programd

My impression is that nobody bothers with the Go flags package and most people use the POSIX-compatible pflag [1] library for argument parsing, usually via the awesome cobra [2] cli framework.

Or they just use viper [3] for both command line and general configuration handling. No point reinventing the wheel or trying to remember some weird non-standard quirks.

[1] https://github.com/spf13/pflag

[2] https://github.com/spf13/cobra

[3] https://github.com/spf13/viper

akdor1154

Nobody.. except Hashicorp :(

DangitBobby

I always wondered who to blame for the single - long arguments in terraform.

trasz

Go -options sound like what X11 uses.

crehn

Well it does keep things simple, and as a side effect removes the cognitive burden of choosing a certain argument style (for both the developer and user).

DangitBobby

It definitely doesn't make things simpler. Maybe in a vacuum, it would have been simpler. But it's not in a vacuum, and it breaks with well established conventions, so we have to remember which special snowflake programs decided not to do what we expect.

kazinator

> When grouped, the option accepting an argument must be last ... program -abcooutput.txt

Good grief, no.

There may be some utilities out there which work this way, but it is not convention and should not be regarded as one.

Single letter options should almost always take an argument as a separate argument.

Some traditional utilities allow one or more arguments to be extracted as they are scanning through a "clump" of single letter options:

  -abc b-arg c-arg

Implementations of tar are like this.

Newly written utilities should not allow an option to clump with others if it requires an argument. Only Boolean single-letter options should clump.

Under no circumstances should an option clump if its argument is part of the same argument string. For instance consider the GCC option -Dfoo=bar for defining a preprocessor macro, and the -E option doing preprocessing only. Even if -Dfoo=bar is last, we don't want it to clump with -E as -EDfoo=bar --- and it doesn't.

But, in the first place, even if it did, we don't want to be looking to C compilers for convention, unless we are specifically making a C compiler driver that needs to be compatible.
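
The clumping rule for newly written utilities could be sketched like this in Python. It is a hand-rolled illustration, not a recommendation to skip a library, and the flag letters are hypothetical.

```python
BOOLEAN = {"a", "b", "c"}   # hypothetical flags that take no argument
WITH_ARG = {"o"}            # hypothetical option that requires one


def parse(argv):
    """Parse short options; only Boolean flags may be clumped together."""
    flags, values, i = set(), {}, 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--":
            i += 1
            break  # explicit end of options
        if not arg.startswith("-") or arg == "-":
            break  # first positional argument ends option parsing
        letters = arg[1:]
        if letters in WITH_ARG:
            # An option with an argument stands alone; its value is the next word.
            if i + 1 >= len(argv):
                raise ValueError(f"-{letters} requires an argument")
            values[letters] = argv[i + 1]
            i += 2
            continue
        for letter in letters:
            if letter not in BOOLEAN:
                raise ValueError(f"-{letter} cannot appear in the clump {arg}")
            flags.add(letter)
        i += 1
    return flags, values, argv[i:]


print(parse(["-abc", "-o", "out.txt", "in.txt"]))
```

Under this scheme `-abco out.txt` is rejected, because `-o` is not allowed inside a clump.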

desc

Some other commenters have mentioned environment variables as input.

IMO there are broadly two types of command: plumbing and porcelain. There's a certain amount of convention and culture in distinguishing them and I'm not going to try to argue the culture boundary...

For the commands which are plumbing (by whatever culture's rules), the following apply:

* They are designed to interact with other plumbing: pipes, machine-comprehension, etc

* Exit code 0 for success, anything else for error. Don't try to be clever.

* You can determine precisely and unambiguously what the behaviour will be, from the invocation alone. Under no circumstances may anything modify this; no configuration, no environment.

For the commands which are porcelain (by the same culture's rules, for consistency), the following apply:

* Try to be convenient for the user, but don't sacrifice consistency.

* If plumbing returns a failure which isn't specifically handled properly, either something is buggy or the user asked for something Not Right; clean up and abort.

* Environment and configuration might modify things, but on the command line there must be the option to state 'use this, irrespective of what anything else says' without knowing any details of what the environment or configuration currently say.

To make things more exciting, some binaries might be considered porcelain or plumbing contextually, depending on parameters... (Yes, everyone sane would rather this weren't the case.)

mmphosis

Do I add more to this code just for convention? The command line option parsing (or broken ParseOptions dependency) will become magnitudes larger and more complex than what the program does.

  usage = 0
  argc = len(sys.argv)
  if argc == 2 and sys.argv[1] == "-r":
   hex2bin()
  else:
   if argc == 2 and sys.argv[1].startswith('-w', 0, 2):
    s = sys.argv[1][2::]
   elif argc == 3 and sys.argv[1] == '-w':
    s = sys.argv[2]
   elif argc >= 2:
    usage = 1
   if usage == 0:
    try:
     width = int(s)
    except ValueError:
     print("Error: invalid, -w {}".format(s))
     usage = 1
    except NameError:
     width = 40
   if usage == 0:
    bin2hex(width)
   else:
    print("usage: mondump [-r | -w width]")
    print("       Convert binary to hex or do the reverse.")
    print("            -r reverse operation: convert hex to binary.")
    print("            -w maximum width: fit lines within width (default is 40.)")
  sys.exit(usage)

tom_

No, you take code away by using argparse! Handles all this GNU longopt and argument parsing stuff for you, and autogenerates the --help display. Probably something like this:

    import argparse
    import sys

    def auto_int(x): return int(x,0) # http://stackoverflow.com/questions/25513043/

    def main(argv):
        parser=argparse.ArgumentParser()
        # argparse uses add_argument (add_option belongs to the older optparse)
        parser.add_argument('-r',dest='reverse',action='store_true',help='reverse operation: convert hex to binary')
        parser.add_argument('-w',default=40,dest='width',type=auto_int,help='maximum width: fit lines within width (default is %(default)s.)')
        options=parser.parse_args(argv)
        if options.reverse: hex2bin() # -r converts hex back to binary
        else: bin2hex(options.width)

    if __name__=='__main__': main(sys.argv[1:])

misnome

I recently ran into a case I hadn't seen before with Python's argparse: multiple arguments in a single option. For example, "--foo bar daz" with --foo set to nargs='*' swallows both bar and daz, where I would have expected to have to explicitly specify "--foo bar --foo daz" to get that behaviour. I guess this is a side effect of treating non-option arguments the same as dash-prefixed arguments, but I have no idea what the "standard" to expect with this is?

Otherwise, my main bugbear is software using underscore instead of dash for long-names, and especially applications or suites that mix these cases.

I really like the simplicity that docopt somewhat forces you into, which avoids most of these tricky edge cases, but I'm seeing less and less use of it nowadays.

mixmastamyk

Hmm, if you configure an option to take all args following, why would one be surprised by that?
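
For comparison, the two argparse configurations under discussion look like this; the option name `--foo` is taken from the comment above.

```python
import argparse

greedy = argparse.ArgumentParser()
greedy.add_argument("--foo", nargs="*")          # takes every following non-option word

repeated = argparse.ArgumentParser()
repeated.add_argument("--foo", action="append")  # one value per occurrence

print(greedy.parse_args(["--foo", "bar", "daz"]).foo)
print(repeated.parse_args(["--foo", "bar", "--foo", "daz"]).foo)
# Both print ['bar', 'daz']
```

With `action="append"`, a bare `daz` after `--foo bar` is no longer swallowed by the option and is treated as a separate positional argument instead.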

bschwindHN

I use clap/structopt in Rust for all my CLIs ever since those libraries came out and it's just stupid easy to make a nicely functioning CLI. You define a data structure which holds the arguments a user will pass to your program and you can annotate fields to give them short and long names. You can also define custom parsing logic via plain functions if the user is passing something exotic.

At the end you can parse arguments into your data structure in one line. At that point the input has been validated and you now have a type safe representation of it. Well-formatted help text comes for free.
