lolinder
tzhenghao
Yup. I've personally used both YAML and TOML for configurations, much more the latter recently and can see pros and cons for both.
> How well suited are their syntactic choices to the community they're targeting?
Also, "best" practices. One format could reduce the pain of the other, but that is by no means the right solution to a deeper problem at hand. For example, if one has very deep and complex nesting for configs, TOML "may be a lot nicer" compared to YAML, but that doesn't mean using TOML will make all the config parsing problems go away. It just masks the code smell. Maybe it's time to check if they're overcomplicating configurations in general.
jnxx
> This is the major problem with most comparisons of config file formats: the actual semantic domain of a config file format is extremely limited [ ... ]
So, why not use Scheme ?
mindslight
Scheme lacks most syntactic affordances that imply semantics. Even if some of those implications are dead wrong, they're still useful.
Personally I think the right answer for configuration files is to define them in terms of a generic object model. A program could even support multiple formats (TOML+JSON+YAML). If a user dislikes all the supported formats or the file is generated with something like NixOS, it can be handled with straightforward conversion.
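A minimal sketch of that idea, assuming Python: every supported format is parsed into the same plain dicts and lists, and the rest of the program only ever sees that object model. The `parse_config` and `load_config` helpers and the `PARSERS` table are hypothetical names; `tomllib` is in the standard library from Python 3.11, and a YAML entry would need a third-party parser.

```python
import json
from pathlib import Path

# Each parser turns text into the same generic object model:
# dicts, lists, strings, numbers, booleans.
PARSERS = {".json": json.loads}

try:
    import tomllib  # standard library from Python 3.11 onward

    PARSERS[".toml"] = tomllib.loads
except ModuleNotFoundError:
    pass  # older Python: TOML support is simply not registered


def parse_config(text: str, suffix: str) -> dict:
    """Parse config text according to a file suffix such as '.json'."""
    try:
        parser = PARSERS[suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported config format: {suffix!r}") from None
    return parser(text)


def load_config(path: str) -> dict:
    """Read a config file and pick the parser from its extension."""
    p = Path(path)
    return parse_config(p.read_text(encoding="utf-8"), p.suffix)
```

Converting a format the program does not support then becomes a matter of parsing it elsewhere and dumping it as one of the registered formats.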
withinboredom
> A program could even support multiple formats
I invite you to check out Symfony where you configure your app using yaml, attributes in code, code itself, or a mix of all the above.
You will cry.
rini17
Having syntactic affordances for every nuance of semantics is what led to the current state of zoo. What is wrong with having trivial syntax and distinguishing semantics by labeling parts of the syntax tree with symbols?
lapinot
Some do, using s-expr as config files is pretty common in the ocaml ecosystem (ie dune).
HideousKojima
>Make white space significant and you'll frustrate people.
Or worse: make whitespace significant and strike a blow in the eternal tabs vs. spaces holy war at the same time, like YAML.
eternityforest
It's kinda crazy we have 2 kinds of whitespace.
Even the customizability argument makes less sense now, IDEs could just change the width of leading spaces, I'm not sure why they don't.
gabereiser
Isn’t the whole premise behind this article that, coming from Python where indentation is program structure, TOML confuses the reader with syntax foreign to the reader?
Like a C++ developer crying foul because inheritance doesn’t exist in YAML.
giaour
Make that C++ developer's day by pointing out that YAML does support inheritance: https://dmitryrck.com/how-to-use-inheritance-in-yaml-files/
riwsky
Thankfully, YAML supports SFINAE
sundarurfriend
The first and the last ones, at least, are tradeoffs where TOML made the right decision for most users.
Not being DRY is a good thing in a config file - it makes it much easier to understand and work with just one section of the file (which is what you most often want to do), because the context information is right there without having to jump around and figure things out.
And whatever the downsides of syntactic typing are, requiring a schema file to go along with your config file is far more of a downside; it's one more point of potential failure, another thing to maintain and sync up and keep in your head, and not worth it for most use cases.
And that's the crux of it: it all depends on what you need from your markup language, what your use case is, today and through the lifetime of your project. "What's wrong with TOML" makes much less sense as a question than "What's wrong with TOML(/JSON/YAML/etc.) for _this project and its needs_".
nerdponx
I think the primary limitation with TOML is the restriction that in-line tables cannot cross multiple lines. This is not done for technical reasons, it's an aesthetic choice on the part of the designer. It's analogous to forbidding comments in JSON.
I love TOML and will continue to use it as my default choice for configuration files, because I think most applications simply do not need the power and flexibility of YAML, even if the outright safety problems are mostly resolved in YAML 1.2. But I do agree that the inability of the syntax to convey nested structure is a limitation and it definitely gets annoying in larger configuration files, such as pyproject.toml files that tend to accumulate in larger Python projects. I have considered just manually indenting nested table blocks, even though that would look pretty ugly and is decidedly non-standard.
masklinn
> I think the primary limitation with TOML is the restriction that in-line tables cannot cross multiple lines. This is not done for technical reasons, it's an aesthetic choice on the part of the designer.
I'm sure you'll be happy to know this is getting relaxed in toml 1.1, whenever that comes out (and the implementations adopt it): https://github.com/toml-lang/toml/issues/516
Though the difficulty will then be knowing whether a given piece of software uses 1.0 (and single-line tables) or 1.1 (and more flexible tables).
eternityforest
YAML has a big problem in that you can't work with it in standard tools.
Most every common GC'ed language these days supports native JSON-like objects, but YAML can represent things outside of strings, lists, dicts, bools, numbers, and null.
Lack of nested structure is a positive in some applications. Flat is better than nested. I've seen way too many config files where someone says "add foo=3" to the file and you can't even figure out where in the structure it goes.
And worse, sometimes people reorganize things into options. They'll move all the stuff for one subcomponent into its own nested thing, and you can't configure it without knowing the full architecture.
With flat stuff you get an obvious single way to represent any given config options. Maybe not the nicest way, but it's obvious and unique.
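To make the "obvious and unique" point concrete, here is a small sketch, assuming Python and a hypothetical `flatten` helper, that turns a nested config into one level of dotted keys. Each option ends up with exactly one full name, which is also what you would search for.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Turn nested dicts into a single level of dotted keys.

    Assumes keys themselves contain no dots; empty nested dicts are dropped.
    """
    flat = {}
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


nested = {"server": {"tls": {"enabled": True}, "port": 8080}, "debug": False}
for key, value in sorted(flatten(nested).items()):
    print(f"{key} = {value!r}")
```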
hitchstory
Hi, I'm the author of this piece. Thanks for your comments.
>Not being DRY is a good thing in a config file - it makes it much easier to understand and work with just one section of the file (which is what you most often want to do), because the context information is right there without having to jump around and figure things out.
If the contextual information is relevant that's true. However, syntactic noise of the form of lots of [s, ]s and equal signs isn't necessarily relevant. A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain - especially if they are information dense.
>And whatever the downsides of syntactic typing are, requiring a schema file to go along with your config file is far more of a downside; it's one more point of potential failure
Schemas aren't technically required by strictyaml provided you're happy mapping everything to a dict, list and string, but they're recommended because they make it much easier to prevent something from going wrong and it means you can directly use the types you were expecting.
Schemas in config files are equivalent to static types that generate compiler errors in a program. If you can use them, it's an easy way to get your program to fail fast on invalid input and save on debugging time.
If you don't have a schema and some invalid data gets put into your config file, instead of getting an error that says "didn't expect key "ip addresses" on line 14" you tend to get a really cryptic error a bit later on when your program tries to get a key from a dictionary that doesn't exist.
This is an example of the principle of https://en.wikipedia.org/wiki/Fail-fast design.
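A rough illustration of the fail-fast idea, assuming Python and a hand-rolled `validate` function rather than strictyaml's actual API: the config is checked against a schema once, at load time, so a stray key is reported by name instead of surfacing later as a cryptic lookup error. Unlike a real parser, this sketch has no line numbers to report.

```python
def validate(config: dict, schema: dict, where: str = "config") -> None:
    """Raise ValueError at load time if config does not match schema.

    schema maps each allowed key to a type, or to a nested schema dict.
    """
    for key in config:
        if key not in schema:
            raise ValueError(f"didn't expect key {key!r} in {where}")
    for key, expected in schema.items():
        if key not in config:
            raise ValueError(f"missing key {key!r} in {where}")
        value = config[key]
        if isinstance(expected, dict):
            if not isinstance(value, dict):
                raise ValueError(f"{where}.{key} should be a mapping")
            validate(value, expected, f"{where}.{key}")
        elif not isinstance(value, expected):
            raise ValueError(
                f"{where}.{key} should be {expected.__name__}, "
                f"got {type(value).__name__}"
            )


# Hypothetical schema for illustration only.
SCHEMA = {"name": str, "port": int, "tls": {"enabled": bool}}
```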
camgunz
> However, syntactic noise of the form of lots of [s, ]s and equal signs isn't necessarily relevant.
I don't know that ': ' is any better than ' = '. I get being opinionated about it but this feels squarely in the realm of the subjective to me. Further, adding an errant '-' and accidentally creating another object is real common in YAML, which is something you can't do with TOML's lists. I think this washes out, tbh.
hitchstory
I actually tried to demonstrate this with numbers a while back. I tried taking a few random JSON files as a control and representing them with StrictYAML and TOML and the TOML varied from 30% to 100% longer.
There is an element of opinion here, but there is no question that equivalent TOML files are longer, and most of that is syntax.
It's much more pronounced when you have more than one or two levels of nesting. With 4 or 5 levels of object nesting TOML files grow huge, whereas YAML is still fine.
>Further, adding an errant '-' and accidentally creating another object is real common in YAML
Yep, this is one of the things that type safety helps with though. Similarly it's quite easy to mess up an indent in YAML, but a schema can catch that stuff.
aeurielesn
I don't find TOML files easy to read/understand.
Especially when I'm scrolling through a file, I find myself backtracking to understand its structure again.
taeric
Similarly, I don't find yaml easy to read/understand. XML had the curse of people trying to use every feature possible in most documents. And as much as I do prefer the "program" approach of emacs, I will make no defense of giant emacs config files, either.
dwattttt
> A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain - especially if they are information dense.
I don't think this is a particularly good yardstick. Code without comments is shorter than code with comments, but I wouldn't call comment-less code easier to read; the more information dense, the worse, really.
AndyKluger
Thanks for your work!
I first encountered strictyaml years ago and have used it, happily. I especially appreciate that you made clear arguments for what ought to be excluded from a configuration format itself, and how proper validation ultimately requires real code anyway.
I was always disappointed however that your project didn't amount to a formal specification (at least at the time, unsure if that's changed).
More recently I came to know and love NestedText, which seems very close to what a strictyaml spec could be/have been. I'm curious if you've engaged with that project/format, and what you think of it.
hitchstory
>I was always disappointed however that your project didn't amount to a formal specification (at least at the time, unsure if that's changed).
I would dearly love to do this. I would ideally like to work with somebody who can help me though because it's a lot of work and I struggle to find the time.
slowmovintarget
Doesn't YAML have the unfortunate issue of ambiguity in the variety of parsing and versions? (Edit: I see you're advocating for a new subset... never mind. :) )
If you want a "language" for expressing data (like configuration data), you might be interested in having a look at EDN. https://github.com/edn-format/edn
Hendrikto
> A YAML file can exhibit identical information with fewer characters and that makes the files easier to read and maintain
According to that logic, binary files would be easiest to maintain and read, which is obviously bogus.
politelemon
I don't even think this is about DRY and possibly misunderstands the DRY principle. DRY is about having a single authoritative source of information ("Every piece of knowledge must have a single, unambiguous, authoritative representation within a system"). Repeating a portion of a key in a configuration file is not in violation, it would be if it were the value being repeated.
stared
I beg to differ. If there are indentations, it is easy to fold long lists. For TOML, it takes mental effort to check whether items are from the same list or another. Additionally, in TOML, there are multiple ways to write a list (unlike in JSON), which makes it harder to parse - at least for me.
taeric
The problem with that is that indentations within indentations are just obnoxious. It is far from uncommon to add an item at the wrong level in a large config file. Worst is when you have a large file that is showing several meaningful lines of indentation on one screen, where the roots of several of those levels are not visible.
hot_gril
> Not being DRY is a good thing in a config file
Also unit test code is expected to be a lot less DRY.
arp242
The (upcoming) TOML 1.1 will alleviate some of this; that example document could then be written as:
[params]
profile = {
    name = "Gareth",
    tagline = "..",
}
contact = {
    enable = true,
    list = [
        {class = "email", icon = "fa-envelope"},
        {class = "phone"},
    ]
}
The whole business with the syntax typing has no "one correct way" to do it. No matter what you do it will cause problems and headaches for someone at some point somewhere.
> Dates and times, as many more experienced programmers are probably aware is an unexpectedly deep rabbit hole of complications and quirky, unexpected, headache and bug inducing edge cases. TOML experiences many of these edge cases because of this.
Eh? In the original text it links to three issues to back this up:
That first issue it links to is "failed to parse long floats like x = 0.1234567891234567891" and the third is a feature request for hex values (v = 0xff, not even a bug report). That has nothing to do with dates? The second issue did relate to dates, but was just a simple bug, not an "unexpectedly deep rabbit hole of complications and quirky, unexpected, headache and bug inducing edge cases".
This just seems repeating a tautology. I maintain a TOML implementation that sees some reasonable use. Dates have not been a huge source of bugs, confusion, or other issues. All you need is to be able to parse RFC 3339 style dates (and some things derived from that), which is usually just calling strftime() or whatever your language has for this.
I do think TOML had some bits I wouldn't have added (not dates though), but the feature sets and complexity of TOML and YAML are not even comparable; it's like comparing Iceland (pop: ~300k) to Ireland (pop: ~5M). Yes, they're both islands and both are small countries, yet the scale of their "smallness" is just completely different.
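On the date-parsing point above, a sketch of what "just calling your language's date routine" might look like, assuming Python, where the parsing counterpart of strftime() is `datetime.strptime`. The `parse_offset_datetime` helper is hypothetical and only covers offset date-times written with a `T` separator, not TOML's local dates and times.

```python
from datetime import datetime

# Formats tried in order; %z accepts both "Z" and "+HH:MM" offsets (Python 3.7+).
_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S.%f%z")


def parse_offset_datetime(text: str) -> datetime:
    """Parse an RFC 3339 style timestamp that carries an explicit UTC offset."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next accepted layout
    raise ValueError(f"not an RFC 3339 offset date-time: {text!r}")
```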
hddqsb
The inline table syntax is awesome! Details:
https://github.com/toml-lang/toml/blob/main/toml.md#inline-t...
ollien
This might convince me to start using TOML. I hate JSON for configs (not all parsers support comments, yes I know JSON5 exists), and TOML's table syntax really sucks. YAML has its flaws, but it fits the bill the best IME.
Now I just have to hope enough TOML tools support this syntax, lest I end up in the same boat as JSON5.
mdaniel
> The (upcoming) TOML 1.1
Out of curiosity, how does the deserializer know which "standard" to use?
AIUI one can pin a version of YAML via its directive: %YAML <https://yaml.org/spec/1.2.2/#681-yaml-directives> with a missing one implying 1.2 although (heh) the 1.1 version says that documents which are missing their yaml directive are implied to be 1.1 <https://yaml.org/spec/1.1/#YAML%20directive/> so ... versioning, it's hard!
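For what it's worth, reading the directive itself is the easy part. A sketch assuming Python and a hypothetical `declared_yaml_version` helper that only inspects the directive lines at the top of a document; what a parser should assume when it returns `None` is exactly the ambiguity described above.

```python
import re
from typing import Optional, Tuple

# "%YAML <major>.<minor>", optionally followed by a trailing comment.
_DIRECTIVE = re.compile(r"^%YAML[ \t]+(\d+)\.(\d+)(?:[ \t]+#.*)?[ \t]*$")


def declared_yaml_version(text: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a leading %YAML directive, or None."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments may precede directives
        if not line.startswith("%"):
            break  # document content (or "---") reached: no more directives
        match = _DIRECTIVE.match(line)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None
```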
arp242
There is nothing for this, which is probably fine. Previous thread on this: https://news.ycombinator.com/item?id=36023321
tgv
What's the difference with JSON in that example? No double quotes around keys, that seems to be about it. It seems more practical/readable than JSON if you have very limited need for nesting, but the old format would suffice then too.
arp242
In that specific example not too much, but obviously there's a whole bunch of differences between TOML and JSON, starting with the fact that you can add comments as has already been discussed, and many more.
rightbyte
> you can add comments
Given how many comment remover util functions I have written to make almost JSON config files proper JSON ... why could they not just have included those in the spec.
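For anyone who has not written one of these yet, a sketch of such a util, assuming Python and `//` and `/* */` style comments. The `strip_json_comments` name is made up; the only subtle part is leaving comment-like text inside string literals alone.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that appear outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            while i < n and text[i] != "\n":
                i += 1  # drop everything up to, but not including, the newline
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_almost_json(text: str):
    """Parse JSON that may contain comments."""
    return json.loads(strip_json_comments(text))
```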
mardifoufs
Will python support the new version for pyproject.toml?
arp242
I would expect so, yes.
kzrdude
It already supports inline tables
AndyKluger
But the whitespace handling involved is an extra complication, at least the provided example fails to be parsed by `tomli`, which is following the currently released TOML spec: https://github.com/hukkin/tomli/issues/199
hprotagonist
> StrictYAML, by contrast, was designed to be a language to write readable 'story' tests where there will be many files per project with more complex hierarchies, a use case where TOML starts to really suck.
I can’t be the only one who feels this way; isn’t it this use case the thing that sucks? “plain text but not really” config formats that aren’t “code” but have special syntax but lack a debugger and handy IDE tools and you’re never really sure of what you’re doing … isn’t that the thing that sucks?
masklinn
That was also my first reaction opening the example, I’m perfectly fine with a half-assed programming language (embedding more programming languages) not working in TOML, I don’t think it works in yaml in the first place, and it’s exactly the sort of usual mess which makes me recoil at the sight of a yaml extension.
This specific example is like somebody saw cucumber and went “I think this should be a lot worse, and in an ecosystem which doesn’t want to come anywhere near that too”.
This would probably be half the size and actually comprehensible if it were just a pytest test file.
hitchstory
Hi, I'm the author of hitchstory.
With the stories in YAML two new use cases (which are demonstrated in the example projects) are enabled:
* Automatically updated how-to docs. I used to write these types of docs manually and if they existed at all they would ALWAYS get out of sync with the code and were painful to maintain manually. Now I have YAML files and a simple jinja2 template I can push out new markdown how-to docs on each new build - with snippets of JSON, screenshots from the app, whatever.
* Tests that rewrite themselves. E.g. if I have a REST API test of the form "call API x and expect y blob of json", I don't have to actually write that blob of json into the test, I just write the code that produces it and run the test in rewrite mode so it updates the "expected JSON" field with actual json. I can then eyeball it and 20 seconds later it's part of the test and part of the docs.
The productivity improvements from doing both of these things means that writing tests is cheaper so I do more of them. Having how-to docs for all scenarios is way cheaper so I now always have them.
These use cases are impossible with pytest. They are impossible with cucumber.
They would be too painful to maintain with regular (i.e. not type-safe) YAML and those stories have enough indents that they would be an epic unreadable mess of syntactic noise if they were built in something like TOML or JSON.
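The "rewrite mode" idea can be sketched independently of any particular tool. This is not hitchstory's API, just a minimal illustration assuming Python and a hypothetical `check_snapshot` helper that keeps the expected JSON in a file next to the test.

```python
import json
from pathlib import Path


def check_snapshot(path: Path, actual: dict, rewrite: bool = False) -> bool:
    """Compare actual against the JSON stored at path.

    In rewrite mode (or when no snapshot exists yet) the file is overwritten
    with the actual value and the check passes, so a human can review the diff.
    """
    rendered = json.dumps(actual, indent=2, sort_keys=True) + "\n"
    if rewrite or not path.exists():
        path.write_text(rendered, encoding="utf-8")
        return True
    return path.read_text(encoding="utf-8") == rendered
```

The stored file doubles as documentation input: the same expected JSON that the test compares against can be pulled into a generated how-to page.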
dharmab
The first thing reminds me of example tests in Go: https://go.dev/blog/examples
fishyjoe
> Tests that rewrite themselves.
That sounds like expect tests in OCaml [1]. I've found them quite a joy to work with and I'm surprised more languages don't have something similar.
[1] https://dev.realworldocaml.org/testing.html#expect-tests
skrebbel
The second thing reminds me of how Jest does snapshot testing: the first time you run it, it simply edits the JS source file with the result. For a test runner to edit test code feels weird at first but it works spectacularly well.
Terretta
Your way of thinking, shared transparently not just in your "why not" section but throughout, is a breath of fresh air in the miasma of grabbing a tech because dogma or hype.
It's not that I agree with all your choices. It's that I'll defend to the end your method of making them.
schmuelio
Yeah that use case really sounds like they're on the cusp of just making a DSL from a config language.
If you need it to be _that_ complex then just write your configs as code...
marcosdumay
I don't have any problem with complex "symbol vs literal" resolving in configuration languages. Unless your syntax is very weird, it should be very hard to confuse those.
That said, YAML's syntax is very weird. YAML just sucks. And any such implementation must necessarily be unityped (up to the point where the data is coerced into the configuration structure at your program) and completely preserve the original data.
TOML could be extended to support it. I don't think it's "tasteful", but I see no practical problems with it.
masklinn
> TOML could be extended to support it.
Seems doubtful as that's specifically something TOML was created not to support. If you want unityped ini files you can define that dialect of ini files.
myaccountonhn
I agree, I quite like https://dhall-lang.org/ for that reason. It strikes a good balance between features and being a config language.
baq
Eagerly waiting for a Yet Another Human Readable and Writable Graph (Which Is Mostly a Tree But Not Always) Serialization Language With Built In Schemas and Pure Functions, Maybe made with <3 for humans.
Actually, that, but without sarcasm. YAML is crap. TOML is crap. JSON is crap. INIs are actually fine for flat lists but major crap otherwise. Anything Turing-complete is completely inadequate crap for the purpose.
RedNifre
dkersten
I especially love the Integrant[1] version which uses the E from EDN (extensible) to add references:
{:adapter/jetty {:port 8080,
                 :handler #ig/ref :handler/greet}
 :handler/greet {:name "Alice"}}
[1] https://github.com/weavejester/integrant
masklinn
I like edn but I’m not convinced it’s great for configuration.
First off, it shares the desire for extensibility with yaml (in the form of tags), which is a giant trap. And then it’s quite syntactically noisy.
dkersten
I love EDN (especially the integrant extensions) for configuration of the stack — that is, dependency injection and so on, the developer-focused configuration.
For user-facing configuration I still favour TOML. I think it’s a bit application dependent, some applications (eg nginx) have complex configuration needs and for that it makes sense to use something more sophisticated, but for many user-facing config settings, a simpler-TOML would be a great fit. Basically just some basic key-value pairs that can be collected into groups. As the article states, the parsing types should perhaps be enforced by the parser not the written config.
eternityforest
Too many features that can't be represented in standard programming language constructs..
JSON-alikes are opinionated. They don't let you do stuff that requires a plugin for the parser to work with
jmaker
Dhall is superb, https://dhall-lang.org
37469920away
Thanks for that, it is interesting.
Is this actually something I want to see in a configuration script? Why not just use a scripting language and be done with it? I wonder if the safety features can't be replicated with rigorous testing of say python as config scripts instead of learning yet another programming language?
https://prelude.dhall-lang.org/Text/concatSep.dhall
I think this is the key fulcrum for me: "config is code", sure, but not the same kind of "code".
It -is- compelling to argue for statistically deterministic config code but my practical objection here is 'can we arrive at same safety using testing with a known language?'
Writing this has made me consider whether configuration should be conceptually looked at as a database instead of "code". How many people even know how e.g. postgres stores its tables, and why, modulo some performance niche, would you care anyway?
It seems configuration management is a graph db query and update matter. Standardize on configuration query language (if necessary) and stop worrying how the damn thing is represented by the config management tool.
jmaker
Depends on your requirements. If your config complexity is getting beyond manageable, the benefit of something more reliable is apparent. Type safety clears lots of common bugs no test suite would be certain to filter out. How complex should your tests get? Do you want to take on that responsibility or rather delegate it to something that provides you certain guarantees? It’s all very subjective in the end.
I run my configs mostly as YAML in Consul and Vault, sometimes in Spring Cloud Config with a git backend. This way I have dynamic config evolution. But I prefer to generate those yaml files from Dhall to avoid unnecessary bugs. After years with Haskell, the syntax is very natural, too.
As for Postgres internals, they do matter if your data set keeps growing.
Xkcd covered standardization. YAML and JSON ASTs are graphs, YAML not necessarily a tree. JSON extensions also support references. As for the ops side, YAML has become a de facto standard, HCL is used with the HashiCorp tools. Nix has its own language.
It’s not about how it’s represented but how you express dependencies across config key nodes. It’s good to avoid repetition and have a syntax linter, a compiler even better. Small static configs are amenable to querying and writing. But you need to separate the writing from the querying the configs. With Dhall you write code to generate the actual config, whether as a Dhall AST or exported to YAML or JSON, with certain correctness guarantees upfront.
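The "write code that generates the config" workflow can be imitated, with much weaker guarantees, in a general-purpose language. The sketch below uses Python dataclasses as a stand-in for Dhall, so the checks run when the generator runs rather than being proven statically; the `Service` type, `render` function and port values are all made up for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Service:
    name: str
    port: int

    def __post_init__(self) -> None:
        # Checks run when the config is generated, not when it is consumed.
        if not 0 < self.port < 65536:
            raise ValueError(f"{self.name}: port {self.port} out of range")


def render(services: list) -> str:
    """Export the checked definitions as plain JSON for the consuming tool."""
    names = [s.name for s in services]
    if len(names) != len(set(names)):
        raise ValueError("duplicate service names")
    return json.dumps({"services": [asdict(s) for s in services]}, indent=2)


# One shared value instead of a repeated literal in every entry.
BASE_PORT = 8000
SERVICES = [Service("api", BASE_PORT), Service("admin", BASE_PORT + 1)]
```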
baq
> Why not just use a scripting language and be done with it?
I want my configuration to be guaranteed to halt. Turns out it's hard to not make anything useful accidentally Turing-complete!
Hendrikto
At this point, I‘d rather just use a proper language.
conradludgate
Maybe kdl[0]? It's a document language somewhere in between xml and yaml without all the crap of either IMO
[0]: https://kdl.dev/
alpaca128
I use KDL in a (not too complicated, yet) config file of a project and I like it a lot. Tree structure with attributes like XML but with less syntax than JSON. Nothing redundant but has basics like comments.
BiteCode_dev
Let's add another one: CUE
larschdk
INI-files are crap too, as there is no standard. Every program has its own dialect, and automating around them can be a pain.
rapsey
INI files are toml
PennRobotics
I have a beef with YAML/TOML/JSON as an inline front matter (header) format for SSG (static site generator) posts, but that's because if I want to save a mostly Markdown SSG post on Github...
1. the post is previewed with Github's dialect of Markdown and not the SSG's (and definitely with none of the inline configuration applied)
2. the preview is still a Markdown document, so you get no benefits of syntax highlighting or auto-formatting w.r.t. the config header (short of opening in Vim and explicitly declaring the filetype or temporarily changing the extension in Github's editor)
3. you HAVE to put trailing spaces in the YAML/TOML/JSON or the preview pane crams everything into an unbroken paragraph
4. there's not a quick preview of how the configuration will parse, just specific workflows (live update, compile single page) that you can test and then modify as needed. This is either in the rich online editor or your own machine and will require console commands and a browser window
5. I still have to know all of the modifiable attributes as well as defaults, which will be in a separate document and probably not in a Ctrl+Space dropdown
-----
For point 5, it would be nice if configuration formats had completions for common editors and/or their own scripts:
* "generate big config file with all possible keys and default values"
* "condense modified config file so it only contains non-defaults (and hope the schema doesn't change hahaha)"
* "suggest a valid fix for a currently invalid config file"
-----
Sure, this all isn't a direct criticism of TOML, but inlining configuration is a great-to-okay idea that is simply poorly executed. It is extremely unfriendly to non-technical users; I can fully understand why someone would pay a few hundred a year for a WYSIWYG templated website builder to just handle everything.
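The "condense" script wished for under point 5 is small once the config and its defaults are both available as parsed dicts. A sketch assuming Python; `non_defaults` and the front matter keys in the usage note are hypothetical.

```python
def non_defaults(config: dict, defaults: dict) -> dict:
    """Return only the entries of config that differ from defaults."""
    result = {}
    for key, value in config.items():
        default = defaults.get(key)
        if isinstance(value, dict) and isinstance(default, dict):
            nested = non_defaults(value, default)
            if nested:  # keep a section only if something inside it changed
                result[key] = nested
        elif key not in defaults or value != default:
            result[key] = value
    return result
```

For example, with defaults `{"draft": False, "toc": {"enabled": True, "depth": 3}}`, a front matter of `{"draft": False, "toc": {"enabled": True, "depth": 2}}` condenses to `{"toc": {"depth": 2}}`.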
WorldMaker
I'm surprised no one has thrown in raw S-Expressions, yet, in this thread. That's an HN perennial favorite. Great at trees, decent at graphs. You can easily go Turing complete or not on the whim of any Lisp at hand and the hammer and forge of macros until your heart's content.
tikhonj
I interned at a company that used an s-expression format as an alternative to JSON and it was great—much better than any mainstream format for config files and human-readable representations for API data. It was great at representing structured data (including tagged variants, so full algebraic data types) and even pretty good for text markup.
It also has best-in-class editor support thanks to paredit :)
The only real downside is the hassle of convincing people to use something weird and non-standard, which is really more a problem with people than with the format.
ducktective
Nickel[1] maybe? Though I'm not sure about "for humans" part :)
thayne
> It's very verbose. It's not DRY. It's syntactically noisy.
I don't completely disagree with this. However, in most cases TOML is used, it isn't that much of a problem.
And I actually like that the full key is repeated. When you have several layers of nested mappings, it can be hard to determine exactly where the current value is in the hierarchy. Especially if the top level key is above the current screen of text. It can also make it easier to search for a specific key. IMO, this is a case where more verbosity and repetition makes it more readable.
That said, it seems a little arbitrary to me that inline tables don't allow newlines within them. If they did, then if you didn't like repeating the keys, you could use inline tables.
> TOML's hierarchies are difficult to infer from syntax alone
This is a little subjective, and depends on the actual data represented in the config.
But in general, my experience is that when you have several layers of nesting, and the only indication of the hierarchy is indentation, it can be a little hard to follow where a specific value fits in the hierarchy. See above.
And I disagree that meaningful indentation is "generally considered a good idea". I won't enumerate the pros and cons here, as it has been discussed a lot elsewhere, but it is definitely controversial, and subjective.
> Overcomplication: Like YAML, TOML has too many features
This section lists exactly one feature that it thinks TOML shouldn't have. Maybe dates shouldn't have been included, but it isn't anywhere close to the complexity of YAML.
bemusedthrow75
I'm not sure it's clear from the article -- only the URL -- that the writer is also the author/maintainer of StrictYAML.
This article has been written in a way that (most likely inadvertently) implies a measure of distance from StrictYAML.
dagw
One of the more interesting config formats I've come across was an application that used an Excel file. Once you get over the horror of such a terrible decision, it was actually a quite interesting choice that allowed a fair few advantages. First of all each config subcategory was on a separate sheet, making it easy to navigate and find what you were looking for. You could use formulas to relate different config options (If you wanted A to be 20% of B you just set that in a formula). You could use drop-downs for fields where there were only a limited number of valid values. You could include as many comments and as much documentation as you wanted (including diagrams and images) as long as you only wrote in unused cells. And finally, my favourite, when configuring colours, instead of typing in RGB or hex values you simply changed the colour of the cell to the colour you wanted.
Now I would obviously never ever recommend doing this, but it was certainly an interesting and eyeopening experience.
macNchz
Too many encounters with Excel’s “smart” date parsing would make me very concerned about using it this way.
https://support.microsoft.com/en-gb/office/stop-automaticall...
Rygian
That page is such a gem.
> "make it easier to enter dates. For example, 12/2 changes to 2-Dec"
12/2 is obviously the 12th of February in my locale. But I need to keep Excel in English as a company policy, so this is not only unhelpful, it's outright wrong.
> Unfortunately there is no way to turn this off.
How does Microsoft justify this choice?
macNchz
I’ve never understood why they don’t provide the option to disable it, it’s not like Excel is a sleek, minimalist piece of software with strong opinions and limited configuration.
I also love how the support article describes a behavior of their own software as “very frustrating”.
fluidcruft
At my co-op decades ago we used to use Excel "templates" as masters to generate/maintain text config files (and iirc a few C headers). You would save as text to make it usable. The grid layout, ability to highlight/color/border/style etc and use formulas and plot was very helpful.
ics
Underrated technique. I’ve used it similarly to generate scripts and it works great when used with care (error checking generally must happen at a different stage).
btreecat
That's just begging to be an actual database.
Imagine if you said all of that about SQLite instead of Excel: all of a sudden you're just talking about structured config in a database. Not new, not crazy, and generally a decent practice.
dagw
The big difference is that every Windows computer (in a professional environment) comes with a very nice GUI tool for easily editing Excel files. A tool that basically everybody knows how to use. The same cannot be said for SQLite.
SQLite is great for storing application state and config options set from within the app, but it is a pretty terrible format for end users to edit.
johannes1234321
Though I'm not sure I'd be happy about "end users" editing the Excel files as config. Somehow they get the cells mixed up and you get an utter mess. And then Excel mistakes some value for a date and stores some other mess ...
For a somewhat trained audience however it can be quite interesting for some specific problem domain ...
btreecat
>SQLite is great for storing application state and config options set from within the app, but it is a pretty terrible format for end users to edit.
I think you are conflating the file with the workflow. A proper UI is the solution to making something not "terrible for end users to edit".
eternityforest
Why doesn't Microsoft just add SQLite to Excel, so your formulas can query it?
harperlee
Well, that's a complete apples-to-oranges proposal with completely different requirements.
With the original solution, you have a self-contained file that virtually any user knows how to edit, structure, expand, version, email, compare, discuss...
The level of user knowledge that you need for a similar solution based on a standalone SQLite file that you can version is a different world entirely; e.g. to relate two values you would perhaps need to create a view or a trigger. And even with the most knowledgeable user you would still lack functionality such as simply pasting in an image as a means of documentation and being able to see it, or WYSIWYG colors.
sofixa
> version, compare
Gonna have to disagree with you here. Very few people know how to version or compare Excel files, let alone with multiple collaborators. Is there even a decent way of doing that outside of Excel Online / Google Sheets?
btreecat
Fundamentally they need a tabular data store. Most everything else is a nice to have/usability point.
SQLite is also a self-contained file. There are similar tabular GUI tools that could let you interact with SQLite using a similar workflow. Users' knowledge of Excel is not inherent; they had to be trained on it, and they can be retrained, as they will be for other workflows and business tools.
Remember, you still need an external app (Excel) to open its files; the files themselves are just data, exactly like SQLite. So you could just make an Excel plugin to interface with SQLite.
SQLite is as version-able as an excel file, as in not very with standard tools.
Why are you pasting images into excel? Doesn't matter, sqlite handles that fine actually. https://www.sqlite.org/fasterthanfs.html
The main argument against sqlite is that you would need to build an interface or figure out how to train folks on existing tools. That's not a huge argument against it in my experience, it's a strong social/political one in many orgs but rarely a technical issue.
michaelbuckbee
Now I just want a spreadsheet style front end to SQLite and I'm nerd sniped into trying to figure out how to do formulas (triggers I suppose).
speed_spread
Congratulations, you're about to reinvent MS Access / DBase / FoxPro... 30 years later.
masklinn
For the most simplistic row-wise formulas, sqlite has generated columns.
I’d assume the issue will be that an sql table is not free form, you can’t randomly decide to write in some other cell.
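For anyone curious what that looks like, here is a minimal sketch of the "A is 20% of B" case from upthread as a generated column, using Python's built-in sqlite3 module. The table and column names are invented for illustration, and it assumes the bundled SQLite is 3.31 or newer, which is where generated columns were introduced.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE config (
        name TEXT PRIMARY KEY,
        b REAL NOT NULL,
        -- "A is 20% of B": derived by SQLite, never stored by hand
        a REAL GENERATED ALWAYS AS (b * 0.2) VIRTUAL
    )
    """
)
conn.execute("INSERT INTO config (name, b) VALUES ('width', 50.0)")
a_before = conn.execute("SELECT a FROM config WHERE name = 'width'").fetchone()[0]

# Changing b is enough; a follows along like a spreadsheet formula.
conn.execute("UPDATE config SET b = 100.0 WHERE name = 'width'")
a_after = conn.execute("SELECT a FROM config WHERE name = 'width'").fetchone()[0]

print(a_before, a_after)
```

This only covers formulas within a single row; relating values across rows or tables would still need a view or a trigger, as mentioned elsewhere in the thread.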
lucumo
Were there any downsides? Because this sounds kind of awesome.
dagw
Once you got used to it, honestly not really, other than needing Excel (not a huge deal since it was a Windows only application). I've no idea what the config parsing code looked like, but with the right library I doubt it was worse than any other non-trivial config parsing code. Mainly it just felt very wrong to my Unix, everything must be a text file, brain.
sowbug
Never thought I'd get a chance to tell this story.
Early 1990s, college internship. The company did presentations for clients, like many do. They had an unusual way of presenting data that required using actual protractors to draw circles and curves, with pencil, on otherwise computer-generated charts. They read numbers from Excel spreadsheets and plotted them on paper.
I was shocked, to say the least. I proposed writing a program that read Excel spreadsheets and emitted the graphics. They loved the idea, especially from a summer intern.
So I wrote a letter to Microsoft asking for documentation of the Excel file format. A week later I got a thick envelope with a photocopied manual completely describing the format. I remember the word BIFF throughout. I wrote the program, it worked great, and I even negotiated a hefty lump-sum payment to sell it to them at the end of the summer.
It left me with a very positive impression of Microsoft as a developer-friendly company. Makes sense; developers are their platform's customers, and they're good at serving their customers.
arethuza
There are some pretty awesome libraries for reading and working with Excel files - Aspose.Cells being one I used for years - basically a headless re-implementation of most of Excel usable via an API.
throwaway290
Well, xlsx is just a zipped dir of text (XML) files, so you can think of it as sort of following the .d pattern...
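A quick way to see this for yourself, sketched with Python's standard zipfile module. The archive built here is only a stand-in so the snippet runs without Excel; a real workbook contains more parts, and the helper name list_xml_parts is made up.

```python
import io
import zipfile


def list_xml_parts(xlsx_file):
    """Return the names of the XML parts inside an .xlsx (a zip archive)."""
    with zipfile.ZipFile(xlsx_file) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".xml"))


# Stand-in archive mimicking the layout of a workbook; the part
# contents are placeholders, not valid spreadsheet XML.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("xl/workbook.xml", "<workbook/>")
    archive.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")

parts = list_xml_parts(buffer)
print(parts)
```

Passing the path of a real .xlsx file to list_xml_parts should work the same way, since ZipFile accepts either a path or a file-like object.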
andrew_eu
For better or worse I created a very similar system, but using Google Sheets instead of Excel files and fetched with the GSheets API. In that case the configuration was gigantic (many of the configuration values were product-decisions, and it ran in hundreds of different environments with different tweaks), so the tabular structure made navigating things very natural. It also had the advantage of structurally highlighting how environments differed. Doing it with Google Sheets came with some extra nice benefits: online sharing, versioning, access control down to specific ranges, etc.
Basically every engineer who joined the team thought it was an unforgivable blemish on the system, yet it survived a few years with no major issues, long enough for the team to build an internal backoffice and port the whole sheet structure into a proper CRUD API.
hot_gril
Biggest downside I've discovered by doing this is that there's not a very good way to diff separate versions of one sheet. Best I've done is export to CSV and use a generic text diff.
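A rough sketch of that CSV-plus-text-diff approach using Python's standard difflib. The two exports are inlined as strings here, and the file names and keys are invented; in practice you would read the two CSV files exported from the sheet.

```python
import difflib


def diff_csv(old_csv, new_csv):
    """Return unified-diff lines between two CSV exports of one sheet."""
    return list(
        difflib.unified_diff(
            old_csv.splitlines(),
            new_csv.splitlines(),
            fromfile="config_v1.csv",  # hypothetical export names
            tofile="config_v2.csv",
            lineterm="",
        )
    )


old_export = "key,value\ntimeout,30\nretries,3\n"
new_export = "key,value\ntimeout,45\nretries,3\n"

changes = diff_csv(old_export, new_export)
print("\n".join(changes))
```

The obvious limitation is that this diffs rows as opaque lines, so reordering rows or columns shows up as noise, and formulas and formatting are lost in the export.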
nonethewiser
Most of these sound like things a native programming language can do. Without being a damn excel file. So it’s basically an argument for making your config files JavaScript, Python, etc.
dagw
You're missing the point. The power came not from Excel the file format, but from Excel the GUI tool for editing Excel files. You cannot have drop-downs and colour pickers and separate tabs and embedded images in a single Python file without writing a whole custom GUI config tool.
nonethewiser
Python can be, and often is, edited with GUI tools that have color pickers and dropdowns. IDEs such as VS Code have these.
Also, I said “most.” Not, “there exist no exceptions.”
mnstngr
Having a visual way to configure this makes it much more accessible to non-programmers, with error checking available through the host (Excel, in this case), while also reducing the eng effort in building this.
At Google, many internal tools use Sheets as their source of truth for config data, and it works really well.
nonethewiser
> Having a visual way to configure this makes it much more accessible to non-programmers
I agree completely, but that's not the point he appears to be making. He never stated this was the use case, and he reiterated that it was a bad idea which should never be done.
Regardless, I'm saying those aren't really unique advantages of Excel. They just look unique compared to JSON, TOML, YAML, etc.
LeonenTheDK
I think the difference there though is that an Excel file is a lot more approachable to non-technical folks (even superficially). Depends on what's being configured and by whom though.
speed_spread
Excel sheets are an underappreciated tool for sharing technical info with analysts. They're easy to write and read from both dev and analyst side. They can be safely zipped, archived, emailed, saved on a USB drive by the user without any additional programming. Editing in Excel makes the expected schema clear. It's an excellent data interface.
WeAddValue
I use a Google Sheet as it can be accessed from anywhere. Even my non-techie business folks can make changes (they feel at home in spreadsheets). I even added a menu button to the gsheet to launch a re-build (it is a static site on Netlify; the build fetches its config/data from the gsheet).
kagevf
Did they version the spreadsheets? Or maybe converted them to text and versioned the text format?
Tainnor
I feel that as long as you use it for some simple configuration, you can use JSON, YAML, .properties files, TOML, whatever.
The problem IMHO is that we're using "configuration" files for things that aren't configuration. The "story" example from the blog post illustrates this. I find it hard to read YAML files that are dozens or hundreds of lines long.
Furthermore, once your "configuration" starts being so long, there's usually going to be enough repetition that you want to extract some duplication. YAML does have some facilities for this (anchors), unlike some other formats, but they're extremely limited.
So what happens is that different tools using YAML all start designing their own mechanism for sharing behaviour. It's all usually very ad-hoc, has edge cases and may not do things in the way you expect them. It also forces you to learn the specific rules for these facilities instead of allowing you to reuse your general programming knowledge.
On top of it all, YAML is essentially just a structureless key/value data structure. You can add schemas, but as far as I know, this isn't really standardised and editor support is... variable. In the worst case, you don't get any indication that you've configured something wrong. This is also part of the reason why I think that significant whitespace is OK for a programming language (still not a fan of it though), but bad for a configuration format, because bad indentation in a program either won't parse or will lead to obvious runtime errors, whereas bad indentation in a YAML file might just mean that a key isn't being set even though you think it should be.
For authors of tools that consume YAML, this means writing a lot of custom validation logic instead of relying on standard techniques like type systems.
I think we're on the wrong track and essentially just repeating XML's mistakes (just slightly less verbose, but also without schemas). We should rather use the programming constructs we know, e.g. by leveraging internal DSLs (I think that's part of the reason why Ruby was popular for tools like Chef for a short period, why Jenkins uses Groovy and Gradle now uses Groovy or Kotlin: these languages make internal DSLs easy). If we're worried about Turing completeness, maybe Dhall or something like it is the answer. But 400-line YAML files with custom "!reference" tags that my editor doesn't understand don't seem like the solution.
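As a small illustration of the "use the programming constructs we know" idea, here is a sketch in plain Python rather than any of the tools named above. The Job type, its fields, and the image name are all hypothetical. Sharing is done with an ordinary function call instead of anchors or custom tags, and a misspelled key fails loudly instead of being silently ignored.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Job:
    """Hypothetical typed config entry; not taken from any real tool."""
    name: str
    image: str
    retries: int = 0

    def __post_init__(self):
        # Validation lives next to the type rather than in ad-hoc checks.
        if self.retries < 0:
            raise ValueError(f"{self.name}: retries must be >= 0")


# Shared defaults are an ordinary value, reused with dataclasses.replace.
base = Job(name="base", image="python:3.12", retries=2)
jobs = [
    replace(base, name="lint"),
    replace(base, name="test", retries=5),
]

# A typo such as replace(base, retires=1) raises TypeError at load time.
print(jobs)
```

The trade-off raised in the comment still applies: this is a full programming language, so it does nothing for the Turing-completeness worry that something like Dhall is meant to address.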
seanhunter
This exactly. When the author is complaining that the configuration syntax doesn't support DRY you know something has gone wrong and configuration isn't really to blame.
hk1337
I've thought this for as long as I can remember. People overcomplicate the config file and try to make the one config file to rule them all.
I like TOML, I started to look into using Hugo over Jekyll though and the TOML seems weirdly abstract and difficult to follow.
fomine3
Great wrap up.
NoboruWataya
It is the least bad configuration format I have found. Granted I have only ever used it for fairly simple projects. But every config format is plagued with issues. A bit like programming languages, the fundamental problem is that they need to be easily understandable by both humans and computers which is an impossible problem to truly solve for any non-trivial use case. For example, TOML is criticised for verbosity but a lot of the abstractions that are used to implement DRY in a programming context may make the configuration confusing and unintuitive for non-programmers.
IshKebab
Jsonnet or JSON5 are much better than TOML or YAML.
Both are much easier to read, and don't have the footguns of YAML or StrictYAML.
I would generally say JSON5 is more appropriate because it is simpler, but Jsonnet does have some neat features and its IDE support is much better.
hot_gril
At least general programming languages like Python or JS are well-understood by many programmers. More than makes up for not being as specialized as a DSL.
AndyKluger
Have you looked at NestedText?
lr4444lr
I can't be the only person who thinks the monumental effort spent on config formats is bike-shedding.
JSON is good enough for anything I've done. Not perfect, but no serious flaw that can't be fixed by just adding a simple app-specific post-process step that I will inevitably do for any other format anyway. JSONSchema gives us some typing sanity.
Can we just move on already to more interesting problems? It's not like git fulfills every VCS wish I've ever had either, but I have to move on. Projects and libs that introduce new config formats that continually remake the wheel, whose quirks have to be learned, are not helping my net productivity.
</rant>
ihateolives
> JSON is good enough for anything I've done.
I want comments in config files.
ok_computer
A horrible workaround I use is a blank, redundant key with a leading // in the string to draw my eye to it. Only the last comment survives in my Python dictionary, but I only need the comments while working in the JSON file.
{
  "": "// comment here",
  "Entry": [-1, 0, 2],
  "": "// next comment",
  "Flag": true
}
timmytokyo
And trailing commas on final list elements and object properties.
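Both points in this sub-thread are easy to check against Python's standard json module. This is a small sketch; the helper name is_valid_json is made up. Duplicate keys are accepted with the last one winning, while comments and trailing commas are rejected outright.

```python
import json

# Duplicate "" keys: the parser keeps only the last value, so the
# earlier "comment" entry is silently dropped.
doc = '{"": "// comment here", "Entry": [-1, 0, 2], "": "// next comment", "Flag": true}'
parsed = json.loads(doc)


def is_valid_json(text):
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


trailing_comma_ok = is_valid_json('{"Flag": true,}')
comment_ok = is_valid_json('{\n  // a real comment\n  "Flag": true\n}')

print(parsed, trailing_comma_ok, comment_ok)
```

Other JSON parsers may treat duplicate keys differently, so the empty-key trick is only as portable as the parser behind it.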
skrebbel
JSONC is exactly that, JSON with comments. Works fine, eg typescript’s tsconfig file is JSONC and I’ve yet to find a problem with it.
hddqsb
I can relate. But after using JSON for a while (in files that I edit by hand), I found that I really want comments and trailing commas (which leads to https://nigeltao.github.io/blog/2021/json-with-commas-commen...). Next I'd probably want multiline strings (leading to https://github.com/json5/json5).
But if you use those extensions, all your tooling breaks.
(Aside: I think the real bike-shedding would start when you want to add some syntax for raw string literals, e.g. heredocs; it's one of those features that feels redundant, until the day when you really need it and you can't bear the pain of repeatedly escaping and unescaping.)
rewmie
> JSON is good enough for anything I've done. Not perfect, but no serious flaw that can't be fixed by just adding a simple app-specific post-process step that I will inevitably do for any other format anyway.
If someone wants JSON with extra features like comments and typing, they are better off switching to Ion.
GuB-42
JSON is good enough for anything I've done.
...
</rant>
Except for closing comments, for that, you need XML.
eviks
Yeah, don't know why some people don't want to settle on bad formats without such basics of human economics like comments and keep improving
mikece
Given how little I deal with config files compared to the rest of my work I prefer formats that are obvious, even if verbose, to those with sneaky syntax. I'll take JSON or even XML over TOML any day.
dystroy
I previously argued that TOML wasn't good enough in this blog post https://dystroy.org/blog/hjson-in-broot/ where I show an example of problem which frequently hurts my users and leaves them lost without even understanding that the problem is in how they wrote their TOML.
I moved the configuration of several of my programs to Hjson. There are still problems but they're less puzzling. Hjson isn't ideal either but might still be the best configuration format we have today.
eviks
hjson is indeed more H vs json
You've mentioned in the blog that ": it's meant to be written by humans, read and modified by other humans, then read by programs", but is it possible for apps to (roundtrip)-edit those configs preserving all the human syntax intact? It's rather common for apps to e.g. have font size changed, but unfortunately also common to destroy human formatting in the process
dystroy
This is theoretically possible, and I actually toyed with the idea.
I didn't do it in my deserializer because of the big value you get in Rust from being compatible with serde, and that wouldn't be. But this would be interesting, probably as a side library.
ftrobro
Round-tripping without destroying comments or formatting is supported in the JavaScript, Go and C++ implementations of Hjson, but not in the other implementations (I think).
throwawee
Thank you for introducing me to Hjson. I've been using simple colon delimited lists which seem to be, hilariously enough, already valid Hjson.
dystroy
A lot of formats are Hjson compatible, notably JSON, and also what users wrote thinking it was JSON but they forgot some quotes or had a trailing comma so the JSON parser refuses it while the Hjson one is perfectly happy.
moogly
Hjson is also the format I use for all my things. Strikes a good balance.
Honestly, all of these arguments feel pretty subjective to me.
This is the major problem with most comparisons of config file formats: the actual semantic domain of a config file format is extremely limited, which means the main thing left to disagree over is syntax, which is highly subjective and extremely difficult to get people to agree on.
Add too many syntactic features and a lot of people will disavow you for being too complicated. Add too few and you'll be missing someone's pet feature. Make white space significant and you'll frustrate people. Require extra characters to delineate and you'll frustrate another group.
It's worth noting that this article is primarily talking about TOML in the context of the Python ecosystem, and I think that's a healthier way to talk about config file formats: How well suited are their syntactic choices to the community they're targeting?