Tracing: Structured logging, but better - Hacker News

zoogeny

One thing about logging and tracing is the inevitable cost (in real money).

I love observability probably more than most. And my initial reaction to this article is the obvious: why not both?

In fact, I tend to think more in terms of "events" when writing both logs and tracing code. How that event is notified, stored, transmitted, etc. is in some ways divorced from the activity. I don't care if it is going to stdout, or over udp to an aggregator, or turning into trace statements, or ending up in Kafka, etc.

But inevitably I bump up against cost. Even for medium-sized systems, the amount of data I would like to track gets quite expensive. For example, many tracing services charge for the tags you add to traces. So doing `trace.String("key", value)` becomes something I think about from a cost perspective. I worked at a place that had a $250k/year New Relic bill and we were avoiding any kind of custom attributes. Just getting APM metrics for servers and databases was enough to get to that cost.
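
To make that cost decision concrete, here is a minimal sketch (hypothetical service and attribute names) of what adding one custom attribute looks like with the OpenTelemetry Go API; each `SetAttributes` call like this is the line item some vendors bill on:

    package main

    import (
        "context"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/attribute"
    )

    // handleOrder is a hypothetical handler; the span attribute below is the
    // kind of per-span custom tag that some vendors charge for.
    func handleOrder(ctx context.Context, orderID string) {
        _, span := otel.Tracer("orders").Start(ctx, "handleOrder")
        defer span.End()

        // The cost decision: one more custom attribute on every span.
        span.SetAttributes(attribute.String("order.id", orderID))

        // ... business logic ...
    }

    func main() {
        handleOrder(context.Background(), "ord-42")
    }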

Logs are cheap, easy, reliable and don't lock me into an expensive service to start. I mean, maybe you end up integrating Splunk or perhaps self-hosting Kibana, but you can get 90% of the benefits just by dumping the logs into CloudWatch or even S3 for a much cheaper price.

phillipcarter

FWIW part of the reason you're seeing that is, at least traditionally, APM companies rebranding as Observability companies stuffed trace data into metrics data stores, which becomes prohibitively expensive to query with custom tags/attributes/fields. Newer tools/companies have a different approach that makes cost far more predictable and generally lower.

Luckily, some of the larger incumbents are also moving away from this model, especially as OpenTelemetry is making tracing more widespread as a baseline of sorts for data. And you can definitely bet they're hearing about it from their customers right now, and they want to keep their customers.

Cost is still a concern but it's getting addressed as well. Right now every vendor has different approaches (e.g., the one I work for has a robust sampling proxy you can use), but that too is going the way of standardization. OTel is defining how to propagate sampling metadata in signals so that downstream tools can use the metadata about population representativeness to show accurate counts for things and so on.
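
As a rough illustration of head sampling on the application side (separate from any vendor-side sampling proxy), here is a sketch using the OpenTelemetry Go SDK that keeps about 10% of traces while following the parent's decision so traces stay whole:

    package main

    import (
        "context"

        "go.opentelemetry.io/otel"
        sdktrace "go.opentelemetry.io/otel/sdk/trace"
    )

    func main() {
        // Keep roughly 10% of traces; ParentBased makes child spans follow the
        // sampling decision already made upstream.
        tp := sdktrace.NewTracerProvider(
            sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.10))),
        )
        otel.SetTracerProvider(tp)
        defer tp.Shutdown(context.Background())

        // ... run the application ...
    }

Tail sampling (deciding after a trace completes, e.g. in a collector or proxy) is a separate mechanism from this head-sampling sketch.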

_moog

> Newer tools/companies have a different approach that makes cost far more predictable and generally lower.

What newer tools/companies are in this category? Any that you recommend?

hooverd

I haven't used anything else, but I'll gladly shill for https://honeycomb.io.

mikeshi42

I think we fit in that bucket [1] - open source, self-hostable, based on OpenTelemetry and backed by ClickHouse (columnar, not time-series).

ClickHouse gives users much greater flexibility in tradeoffs than either a time-series or inverted-index based store could offer (along with S3 support). There's nothing like a system that can balance high performance AND (usable) high cardinality.

[1] https://github.com/hyperdxio/hyperdx

disclaimer (in case anyone just skimmed): I'm one of the authors of HyperDX

makeavish

Companies like https://signoz.io/ are OpenTelemetry native and have a very transparent approach to predictable pricing. You can self-host easily as well.

alexisread

Any of the ClickHouse-based OTel stores can dump the traces to S3 for long-term storage, and can be self-hosted. I know the following use CH: https://uptrace.dev/ https://signoz.io/ https://github.com/hyperdxio/hyperdx

hosh

I have made use of tracing, metrics, and logging all together and find that each has its own place, as well as synergies from being able to work with all three together.

Cost is a real issue, and not just in terms of how much the vendor costs you. When tracing becomes a noticeable fraction of CPU or memory usage relative to the application, it's time to rethink doing 100% sampling. In practice, if you are capturing thousands of requests per second, you're very unlikely to actually look through each one of those thousands (thousands of req/s may not be a lot for some sites, but it already exceeds human scale without tooling). In order to keep accurate, useful statistics with sampling, you end up using metrics to record trace statistics prior to sampling.

thangalin

> In fact, I tend to think more in terms of "events" when writing both logs and tracing code.

They are events[1]. For my text editor, KeenWrite, events can be logged either to the console when run from the command-line or displayed in a dialog when running in GUI mode. By changing "logger.log()" statements to "event.publish()" statements, a number of practical benefits are realized, including:

* Decoupled logging implementation from the system (swap one line of code to change loggers).

* Publish events on a message bus (e.g., D-Bus) to allow extending system functionality without modifying the existing code base.

* Standard logging format, which can be machine parsed, to help trace in-field production problems.

* Ability to assign unique identifiers to each event, allowing for publication of problem/solution documentation based on those IDs (possibly even seeding LLMs these days).

[1]: https://dave.autonoma.ca/blog/2022/01/08/logging-code-smell/

jameshart

But events that another system relies upon are now an API. Be careful not to lock together things that are only superficially similar, as it affects your ability to change them independently.

thangalin

Architecturally, the decoupling works as follows:

    Event -> Bus -> UI Subscriber -> Dialog (table)
    Event -> Bus -> Log Subscriber -> Console (text)
    Event -> Bus -> D-Bus Subscriber -> Relay -> D-Bus -> Publish (TCP/IP)

With D-Bus, published messages are versioned, allowing for API changes without breaking third-party consumers. The D-Bus Subscriber provides a layer of isolation between the application and the published messages so that the two can vary independently.
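
A minimal, hypothetical sketch of that Event -> Bus -> Subscriber shape (KeenWrite itself is Java; this is just the pattern, not its code):

    package main

    import (
        "fmt"
        "log"
    )

    // Event carries a stable identifier plus structured fields; subscribers
    // decide whether it becomes a console line, a dialog row, or a bus message.
    type Event struct {
        ID      string
        Message string
        Fields  map[string]string
    }

    type Subscriber func(Event)

    // Bus fans each published event out to every registered subscriber.
    type Bus struct{ subs []Subscriber }

    func (b *Bus) Subscribe(s Subscriber) { b.subs = append(b.subs, s) }

    func (b *Bus) Publish(e Event) {
        for _, s := range b.subs {
            s(e)
        }
    }

    func main() {
        bus := &Bus{}

        // The "Log Subscriber -> Console" path.
        bus.Subscribe(func(e Event) { log.Printf("[%s] %s %v", e.ID, e.Message, e.Fields) })
        // A GUI dialog or D-Bus relay subscriber would be registered the same way.
        bus.Subscribe(func(e Event) { fmt.Println("dialog:", e.ID, e.Message) })

        bus.Publish(Event{ID: "E-1042", Message: "export finished", Fields: map[string]string{"path": "out.pdf"}})
    }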

jameshart

Observability costs feel high when everything’s working fine. When something snaps and everything is down and you need to know why in a hurry… those observability premiums you’ve been paying all along can pay off fast.

pondidum

As other posters have mentioned, the incumbent companies rebranding to Observability are definitely expensive, because they are charging the same way they do for logs and/or metrics: per entry and per unique dimension (metrics especially).

Honeycomb at least charges per event, which in this case means per span - however they don't charge per span attribute, and each span can be pretty large (100 KB / 2,000 attributes).

I run all my personal services in their free tier, which has plenty of capacity, and that's before I do any sampling.

csomar

How does one break into the industry though? I worked on a project tangentially related, and the problem is that sales were done by corporate salespeople rather than on technical merit. The companies buying the product didn't care because the people involved were making "deals". The company selling the product didn't care about making the product better because it was selling, and having high AWS bills made it sound like they were doing something (even though they were burning money).

BiteCode_dev

You don't have to keep traces for long though.

Logs for the long term, traces for short-term debugging and analysis, is a fine compromise.

layer8

> Log Levels are meaningless. Is a log line debug, info, warning, error, fatal, or some other shade in between?

I partly agree and disagree. In terms of severity, there are only three levels:

– info: not a problem

– warning: potential problem

– error: actual problem (operational failure)

Other levels like “debug” are not about severity, but about level of detail.

In addition, something that is an error in a subcomponent may only be a warning or even just an info on the level of the superordinate component. Thus the severity has to be interpreted relative to the source component.

The latter can be an issue if the severity is only interpreted globally. Either it will be wrong for the global level, or subcomponents have to know the global context they are running in to use the severity appropriate for that context. The latter causes undesirable dependencies on a global context. Meaning, the developer of a lower-level subcomponent would have to know the exact context in which that component is used, in order to choose the appropriate log level. And what if the component is used in different contexts entailing different severities?

So one might conclude that the severity indication is useless after all, but IMO one should rather conclude that severity needs to be interpreted relative to the component. This also means that a lower-level error may have to be logged again in the higher-level context if it’s still an error there, so that it doesn’t get ignored if e.g. monitoring only looks at errors on the higher-level context.

Differences between “fatal” and “error” are really nesting differences between components/contexts. An error is always fatal on the level where it originates.

Hermitian909

The OP is wrong, log levels are very valuable if you leverage them.

Here's a classic problem as an illustration: the storage cost of your logs is really prohibitive. You would like to cut out some of your logs from storage but cannot lower retention below some threshold (say, 2 weeks). For this example, assume that tracing is also enabled and every log has a traceId.

A good answer is to run a compaction job that inspects each trace. If it contains an error, preserve it. Remove X% of all other traces.

Log levels make the ergonomics for this excellent and it can save millions of dollars a year at sufficient scale.
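
A sketch of what such a compaction pass might look like, assuming structured records that each carry a trace ID and a level (all names are illustrative):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    type Record struct {
        TraceID string
        Level   string // "debug", "info", "warning", "error", ...
        Line    string
    }

    // compact keeps every record belonging to a trace that contains at least one
    // error, plus a deterministic keepFraction of the remaining traces. Hashing
    // the trace ID keeps the decision stable across compaction runs.
    func compact(records []Record, keepFraction float64) []Record {
        hasError := map[string]bool{}
        for _, r := range records {
            if r.Level == "error" {
                hasError[r.TraceID] = true
            }
        }

        keepTrace := func(traceID string) bool {
            if hasError[traceID] {
                return true
            }
            h := fnv.New32a()
            h.Write([]byte(traceID))
            return float64(h.Sum32())/float64(^uint32(0)) < keepFraction
        }

        var kept []Record
        for _, r := range records {
            if keepTrace(r.TraceID) {
                kept = append(kept, r)
            }
        }
        return kept
    }

    func main() {
        records := []Record{
            {TraceID: "t1", Level: "info", Line: "handling request"},
            {TraceID: "t1", Level: "error", Line: "upstream timeout"},
            {TraceID: "t2", Level: "info", Line: "handled request"},
        }
        fmt.Println(len(compact(records, 0.05)), "records kept") // t1 always survives
    }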

abraae

> In addition, something that is an error in a subcomponent may only be a warning or even just an info on the level of the superordinate component.

Or, keep it simple.

- error means someone is alerted urgently to look at the problem

- warning means someone should be looking into it eventually, with a view to reclassifying as info/debug or resolving it.

IMO many people don't care much about their logs, until the shit hits the fan. Only then, in production, do they realise just how much harder their overly verbose (or inadequate) logging is making things.

The simple filter of "all errors send an alert" can go a long way to encouraging a bit of ownership and correctness on logging.

layer8

> - error means someone is alerted urgently to look at the problem

The issue is that the code that encounters the problem may not have the knowledge/context to decide whether it warrants alerting. The code higher up that does have the knowledge, on the other hand, often doesn’t have the lower-level information that is useful to have in the log for analyzing the failure. So how do you link the two? When you write modular code that minimizes assumptions about its context, that situation is a common occurrence.

chii

> When you write modular code that minimizes assumptions about its context, that situation is a common occurrence.

so your code isn't modular after all, because the code is _doing_ logging as a side-effect of the actual functionality.

The modularity of your code should mean that the outcome of the functionality is packaged into a bundle of data, and this bundle includes information about errors (or warnings) - aka, a status result.

The caller of this module will inspect this data, and they themselves will decide to log (or, if they are a module of their own, pass the data up again). This goes on, until the data goes into a logging layer - solely responsible for logging perhaps.
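
A small sketch of that shape, with hypothetical function names: the module reports status in its return value and never logs; the caller, which has the context, picks the severity:

    package main

    import (
        "errors"
        "fmt"
        "log/slog"
    )

    var ErrCacheMiss = errors.New("cache miss")

    // The "module": it reports what happened in its return value and never logs.
    func fetchFromCache(key string) (string, error) {
        return "", fmt.Errorf("fetching %q: %w", key, ErrCacheMiss)
    }

    func loadFromDB(key string) string { return "profile-for-" + key }

    // The caller has the context to pick severity: a cache miss is routine here
    // (info), anything else is a real problem (error).
    func loadProfile(key string) string {
        v, err := fetchFromCache(key)
        switch {
        case err == nil:
            return v
        case errors.Is(err, ErrCacheMiss):
            slog.Info("cache miss, falling back to database", "key", key)
            return loadFromDB(key)
        default:
            slog.Error("cache lookup failed", "key", key, "err", err)
            return loadFromDB(key)
        }
    }

    func main() {
        fmt.Println(loadProfile("u123"))
    }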

abraae

If the code detecting the error is a library/subordinate service then the same rule can be followed - should this be immediately brought to a human's attention?

The answer for a library will often be no, since the library doesn't "have the knowledge/context to decide whether it warrants alerting".

So in that case the library can log as info, and leave it to the caller to log as error if warranted (after learning about the error from return code/http status etc.).

When investigating the error, the human has access to the info details from the subordinate service.

SkyPuncher

I agree with your premise, but do consider debug to be a fourth level.

Info is things like “processing X”

Debug is things like “variable is Y” or “made it to this point”

BillinghamJ

I tend to think of "warning" as - "something unexpected happened, but it was handled safely"

And then "error" as - "things are not okay, a developer is going to need to intervene"

And errors then split roughly between "must be fixed sometime", and "must be fixed now/ASAP"

layer8

> I tend to think of "warning" as - "something unexpected happened, but it was handled safely"

It was handled safely at the level where it occurred, but because it was unusual/unexpected, the underlying cause may cause issues later on or higher up.

If one were sure it would 100% not indicate any issue, one wouldn’t need to warn about it.

BillinghamJ

That would indicate an issue - i.e., something we don't want. It's just not something where an engineer needs to go and mop up, and in theory the system would continue to operate correctly indefinitely. I guess correct as in: safe, but not necessarily the most desirable behavior.

fnordpiglet

Tracing is poor at very long-lived traces and at stream processing, and most tracing implementations are too heavy to run in computationally bound tasks beyond a very coarse level. Logging is nice in that it has no context and no overhead, is generally very cheap to compose and emit, and, with a transaction id included and done in a structured way, gives you most of what tracing does without all the other baggage.

That said for the spaces where tracing works well, it works unreasonably well.

riv991

I think OpenTelemetry has solved the stream-processing issue with span links[1]: each unit of work is treated as an individual trace, but you can combine them and see a causal relationship. Slack published a blog about it pretty recently [2].

[1] https://opentelemetry.io/docs/concepts/signals/traces/#span-...

[2] https://slack.engineering/tracing-notifications/
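
For reference, creating a linked span with the OpenTelemetry Go API looks roughly like this sketch (in practice the producer's span context would come from the incoming message rather than being constructed empty):

    package main

    import (
        "context"

        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/trace"
    )

    // processChunk starts an independent trace for one unit of work, but links it
    // back to the span that produced the message so the causal chain stays queryable.
    func processChunk(producer trace.SpanContext) {
        _, span := otel.Tracer("consumer").Start(
            context.Background(), // fresh root span: each chunk is its own trace
            "process-chunk",
            trace.WithLinks(trace.Link{SpanContext: producer}),
        )
        defer span.End()
        // ... handle the message ...
    }

    func main() {
        // An empty SpanContext is used here only so the sketch compiles; real code
        // would extract it from the message headers.
        processChunk(trace.SpanContext{})
    }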

cschneid

When I worked at ScoutAPM, that list is basically the exact areas where we had issues supporting. We didn't do full-on tracing in the OpenTracing kind of way, but the agent was pretty similar, with spans (mostly automatically inserted), and annotations on those spans with timing, parentage, and extra info (like the sql query this represented in Active record).

The really hard things, which we had reasonable answers for, but never quite perfect:

* Rails websockets (ActionCable)

* very long-running background jobs (we stopped collecting at some limit, to prevent unbounded memory)

* profiling code: we used a modified version of Stackprof to do sampling instead of exact profiling. That worked surprisingly well at finding hotspots, with low overhead.

All sorts of other tricks came along too. I should go look at that codebase again to remind me. That'd be good for my resume.... :)

https://github.com/scoutapp/scout_apm_ruby

phillipcarter

Hmmm, for long-lived processes and stream processing we use tracing just fine. What we do is make a cutoff of 60 seconds, where each chunk is its own trace. But our backend queries trace data directly, so we can still analyze the aggregate, long-term behavior and then dig into a particular 60-second chunk if it's problematic.

fnordpiglet

So, here are a few examples -

Suppose you have a long data pipeline that you want to trace jobs across. There are not an enormous number of jobs, but each one takes 12 hours across many phases. In theory tracing works great here, but in practice most tracing platforms can't handle this. This is especially true with tail-based tracing, as traces can be unbounded and it has to assume that at some point they time out. You can certainly build your own, but most of the value of tracing solutions is the user experience, which is also the hardest part.

On stream processing, I've generally found it too expensive to instrument stream processors with tracing. Also, there's generally not enough variability to make it interesting. Context stitching and span management, as well as sweeping and shipping of traces, can be expensive in a lot of implementations, and stream processing is often CPU-bound.

A simple transaction id annotated log makes a lot more sense in both, queried in a log analytic platform.

alkonaut

I like a log to read like a book if it's the result of a task taking a finite time, such as an installation, a compilation, or the loading of a browser page. Users are going to look into it for clues about what happened, and they a) aren't always connected to those who wrote the tools, and b) don't have access to the source code or any special log analytics/querying tools.

That’s when you want a log and that’s what the big traditional log frameworks were designed to handle.

A web backend/service is basically the opposite. End users don’t have access to the log, those who analyze it can cross reference with system internals like source code or db state and the log is basically infinite. In that situation a structured log and querying obviously wins.

It’s honestly not even clear that these systems are that closely related.

WatchDog

It’s a good distinction to make: logging for client-based systems is essentially UI design.

For a web app serving lots of concurrent users, the logs are essentially unreadable without tools, so you may as well optimise them for tool-based consumption.

mrkeen

> If you’re writing log statements, you’re doing it wrong.

I too use this bait statement.

Then I follow it up with (the short version):

1) Rewrite your log statements so that they're machine readable

2) Prove they're machine-readable by having the down-stream services read them instead of the REST call you would have otherwise sent.

3) Switch out log4j for Kafka, which will handle the persistence & multiplexing for you.

Voila, you got yourself a reactive, event-driven system with accurate "logs".

If you're like me and you read the article thinking "I like the result but I hate polluting my business code with all that tracing code", well now you can create an independent reader of your kafka events which just focuses on turning events into traces.
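
A rough sketch of steps 2 and 3 under the assumptions above (hypothetical topic and event shape, using the segmentio/kafka-go client): the "log line" becomes a structured record on a topic that downstream services, including an independent trace builder, can read:

    package main

    import (
        "context"
        "encoding/json"
        "log"
        "time"

        "github.com/segmentio/kafka-go"
    )

    // Event is the machine-readable "log line": downstream services consume the
    // topic instead of receiving a REST call, and a separate consumer can turn
    // the same records into traces.
    type Event struct {
        Type      string    `json:"type"`
        OrderID   string    `json:"order_id"`
        Timestamp time.Time `json:"timestamp"`
    }

    func main() {
        w := &kafka.Writer{
            Addr:  kafka.TCP("localhost:9092"), // assumed broker address
            Topic: "order-events",              // hypothetical topic name
        }
        defer w.Close()

        payload, _ := json.Marshal(Event{Type: "order.shipped", OrderID: "ord-42", Timestamp: time.Now().UTC()})
        err := w.WriteMessages(context.Background(),
            kafka.Message{Key: []byte("ord-42"), Value: payload},
        )
        if err != nil {
            log.Fatal(err)
        }
    }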

rewmie

> 3) Switch out log4j for Kafka, which will handle the persistence & multiplexing for you.

I don't think this is a reasonable statement. There are already a few logging agents that support structured logging without dragging in heavyweight dependencies such as Kafka. Bringing up Kafka sounds like a case of a solution looking for a problem.

lmm

> I don't think this is a reasonable statement. There are already a few logging agents that support structured logging without dragging in heavyweight dependencies such as Kafka. Bringing up Kafka sounds like a case of a solution looking for a problem.

If it's data you care about then you put it in Kafka, unless you're big enough to use something like Cassandra or rich enough to pay a cloud provider to make redundant data storage their problem. Logs are something that you need to write durably and reliably when shit is hitting the fan and your networks are flaking and machines are crashing - so ephemeral disks are out, NFS is out, ad-hoc log collector gossip protocols are out, and anything that relies on single master -> read replica and "promoting" that replica is definitely out.

Kafka is about as lightweight as it gets for anything that can't be single-machine/SPOF. It's a lot simpler and more consistent than any RDBMS. What else would you use? HDFS (or maybe OpenAFS if your ops team is really good) is the only half-reasonable alternative I can think of.

jimbokun

OK, but then how do you perform ad hoc queries on everything you logged to Kafka when it's time to debug an issue?

There are plenty of well known, battle tested solutions for solving that problem with old school logging.

mrkeen

> There are already a few logging agents that support structured logging without dragging in heavyweight dependencies such as Kafka.

What are they? Because admittedly I've lost a little love for the operational side of Kafka, and I wish the client side were a little "dumber", so I could match it better to my use cases.

ahoka

I think OP meant event sourcing.

rewmie

> I think OP meant event sourcing.

That is really beside the point. Logging and tracing have always been fundamentally event sourcing, but that never forced anyone to onboard onto freaking Kafka of all event streaming/messaging platforms.

This kind of suggestion sounds an awful lot like resume-driven development instead of actually putting together a logging service.

bowsamic

How to get me to leave your company 101

mrkeen

I did write a pretty glib description of what to do ;)

That said, I've had conflicts with a previous team-mate about this. He couldn't wrap his head around Kafka being a source of truth. But when I asked him whether he'd trust our Kafka or our Postgres if they disagreed, he conceded that he'd believe Kafka's side of things.

crabbone

> The second problem with writing logs to stdout

Who on Earth does that? Logs are almost always written to stderr... in part to prevent the other problems the author is talking about (e.g. mixing with the output generated by the application).

I don't understand why this has to be either or... If you store the trace output somewhere you get a log... (let's call it "un-annotated" log, since trace won't have the human-readable message part). Trace is great when examining the application interactively, but if you use the same exact tool and save the results for later you get logs, with all the same problems the author ascribes to logs.

OJFord

Loads of people. It drives me around the twist too (especially when there's inevitably custom parsing to separate the log messages from the output), but it happens - probably well correlated with people who use more GUI tools. Not that there's anything wrong with that; I just think the more you use a CLI, the more aware you probably are of this being an issue, or of other lesser best practices that might make life easier, like newline and tab separation.

FridgeSeal

I do, as does everyone at my work? Along with basically everyone I’ve ever worked with, ever?

Like, I develop cli apps, so like, what else would go to stdout that you suppose will interfere?

crabbone

Nothing will go to stdout! Nothing is the best thing you can have when it comes to program output. Easiest validation! This is also how all Unix commands work -- they don't write to stdout unless you tell them to. But, if there's nothing extraordinary happening during the program execution -- nothing is written.

But why would you write your own logs instead of using something built into your language's library? I believe Python's logging module writes to stderr by default. Go's log package always goes to stderr.
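
A tiny illustration of that default separation in Go (the filenames are made up): diagnostics go to stderr, program output goes to stdout, so redirecting stdout doesn't mix the two:

    package main

    import (
        "fmt"
        "log"
    )

    func main() {
        log.Println("starting export") // Go's log package writes to stderr by default
        fmt.Println("col1,col2")       // program output goes to stdout
        fmt.Println("1,2")
        // Running `./export > data.csv` keeps the CSV clean; the log line still
        // shows up on the terminal (or wherever stderr is redirected).
    }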

But... today I've learned that console.log() in NodeJS writes to stdout... well, I've lost another tiny bit of faith in humanity.

FridgeSeal

> This is also how all Unix commands work -- they don't write to stdout unless you tell them to

Ok? But as per my other comment, I'm not writing CLI apps; it's mostly services, and I have supporting services which harvest the logs from each container's stdout.

> But why would you write your own logs instead of using something built into your language's library?

I’m not writing my own logging setup? I am using the provided tools?? Every language logging library I’ve ever used writes to stdout?

Structlog in Python, nodejs obvs, all the Rust logging libraries I’ve ever used, I know you can configure Java/scala 3 million different ways (hello yes log4j lol), but all the Spark stuff I’ve written has logged to stdout.

FridgeSeal

Can’t edit this now, but this is supposed to say “I don’t develop cli apps”

dalyons

Been doing it for a decade+, ever since the 12-factor app concept became popular. It's way more common IMHO for web apps than stderr logging.

benreesman

As a historical critic of Rust-mania (and if I’m honest, kind of an asshole about it too many times, fail), I’ve recently bumped into stuff like tokio-tracing, eyre, tokio-console, and some others.

And while my historical gripes are largely still the status quo: stack traces in multi-threaded, evented/async code that actually show real line numbers? Span-based tracing that makes concurrent introspection possible by default?

I’m in. I apologize for everything bad I ever said and don’t care whatever other annoying thing.

That’s the whole show. Unless it deletes my hard drive I don’t really care about anything else by comparison.

hardwaresofton

I think there's an alternate universe out there where:

- we collectively realized that logs, events, traces, metrics, and errors are actually all just logs

- we agreed on a single format that encapsulated all that information in a structured manner

- we built firehose/stream processing tooling to provide modern o11y creature comforts

I can't tell if that universe is better than this one, or worse.

andrewstuart2

Traces are just distributed "logs" (in the data structure sense; data ordered only by its appearance in something) where you also pass around the tiniest bit of correlation context between apps. Traces are structured, timestamped, and can be indexed into much more debug-friendly structures like a call tree. But you could just as easily ignore all the data and print them out in streaming sorted order without any correlation.

Honestly it sounds like you're pitching opentelemetry/otlp but where you only trace and leave all the other bits for later inside your opentelemetry collector, which can turn traces into metrics or traces into logs.

hardwaresofton

So this is kind of what I was talking about but it's more than that -- if your default is structured logs (simplest example is JSON) then all you have to do is put the data you care about into the log.

So I'm imagining something more like:

   {"level":"info", "otlp": { "trace": { ... }}}

   {"level":"info", "otlp": { "error": { ... }}}

   {"level":"info", "otlp": { "log": { ... }}}

   {"level":"info", "otlp": { "metric": { ... }}}

(standardizing this format would be non-trivial of course, but I could imagine a really minimal standard)

Your downstream collector only needs one API endpoint/ingestion mechanism -- unpacking the actual type of telemetry that came in (and persisting where necessary) can be left to other systems.

Basically I think the systems could have been massively simpler in most UNIX-y environments -- just hook up STDOUT (or scrape it, or syslog or whatever), and you're done -- no allowing ports out for jaeger, dealing with complicated buffering, etc -- just log and forget.
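
A back-of-the-envelope sketch of that "one ingestion mechanism" idea, assuming the hypothetical envelope format above arrives as JSON lines on stdin and just needs routing by signal type:

    package main

    import (
        "bufio"
        "encoding/json"
        "fmt"
        "os"
    )

    // envelope mirrors the hypothetical format above: one wrapper, with exactly
    // one of the otlp sub-objects populated per line.
    type envelope struct {
        Level string `json:"level"`
        OTLP  struct {
            Trace  json.RawMessage `json:"trace"`
            Error  json.RawMessage `json:"error"`
            Log    json.RawMessage `json:"log"`
            Metric json.RawMessage `json:"metric"`
        } `json:"otlp"`
    }

    func main() {
        sc := bufio.NewScanner(os.Stdin)
        for sc.Scan() {
            var e envelope
            if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
                continue // not an envelope; treat as a plain log line
            }
            switch {
            case e.OTLP.Trace != nil:
                fmt.Println("-> trace store")
            case e.OTLP.Metric != nil:
                fmt.Println("-> metrics store")
            case e.OTLP.Error != nil:
                fmt.Println("-> error tracker")
            default:
                fmt.Println("-> log store")
            }
        }
    }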

phillipcarter

That's more or less the model Honeycomb uses. Every signal type is just a structured event. Reality is a bit messier, though. In particular, metrics are the oddball in this world and required a lot of work to make economical.

hardwaresofton

Ah, thanks for noting this - that's exactly the insight I mean here.

Yeah, I think in the worst case you basically just exfiltrate metrics out to other subsystems (honestly, you could kind of exfiltrate all of this), but the default is piping heavily compressed stuff to short- and long-term storage, plus some processors for real time... blah blah blah.

Obviously Honeycomb is actually doing the thing and it's not as easy as it sounds, but it feels like if we had all thought like this earlier we might have skipped making a few protocols (zipkin, jaeger, etc), and focused on just data layout (JSON vs protobuf vs GELF, etc) and figuring out what shapes to expect across tools.

dalyons

Is that really an alternate universe? That’s the universe that splunk and friends are selling, everything’s a log. It’s really expensive.

hardwaresofton

Splunk does have margins and I think they're quite high. Same with Datadog (see: all the HN startups that are trying to grab some of that space).

There's a big gap between what it takes for the engineering to work and what all these companies charge.

My point is really more about the engineering time wasted on different protocols and stuff when we could have stuffed everything into minimally structured log lines (and figured out the rest of the insight machinery later). Concretely, that zipkin/jaeger/prometheus protocols and stuff may not have needed to exist, etc.

ec109685

Once you have logs, you can index them in a variety of ways to turn them into metrics, traces, etc., but having logs as the fundamental primitive is powerful.

jeffbee

This is a great article because everyone should understand the similarity between logging and tracing. One thing worth pondering though is the differences in cost. If I am not planning to centrally collect and index informational logs, free-form text logging is extremely cheap. Even a complex log line with formatted strings and numbers can be emitted in < 1µs on modern machines. If you are handling something like 100s or 1000s of requests per second per core, which is pretty respectable, putting a handful of informational log statements in the critical path won't hurt anyone.

Off-the-shelf tracing libraries on the other hand are pretty expensive. You have one additional mandatory read of the system clock, to establish the span duration, plus you are still paying for a clock read on every span event, if you use span events. Every span has a PRNG call, too. Distributed tracing is worthless if you don't send the spans somewhere, so you have to budget for encoding your span into json, msgpack, protobuf, or whatever. It's a completely different ball game in terms of efficiency.
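
If you want to measure that gap yourself, a rough (and very machine-dependent) micro-benchmark sketch comparing a formatted log line with a span start/end, using the OpenTelemetry Go SDK with a no-op exporter, might look like this; save it as a `_test.go` file and run `go test -bench=.`:

    package obsbench

    import (
        "context"
        "io"
        "log"
        "testing"

        sdktrace "go.opentelemetry.io/otel/sdk/trace"
        "go.opentelemetry.io/otel/sdk/trace/tracetest"
    )

    func BenchmarkFormattedLogLine(b *testing.B) {
        l := log.New(io.Discard, "", log.LstdFlags)
        for i := 0; i < b.N; i++ {
            l.Printf("handled request id=%d status=%d dur_ms=%d", i, 200, 12)
        }
    }

    func BenchmarkSpanStartEnd(b *testing.B) {
        tp := sdktrace.NewTracerProvider(
            sdktrace.WithSpanProcessor(sdktrace.NewSimpleSpanProcessor(tracetest.NewNoopExporter())),
        )
        tracer := tp.Tracer("bench")
        ctx := context.Background()
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            _, span := tracer.Start(ctx, "handle-request") // clock read + ID generation
            span.End()                                     // second clock read
        }
    }

Absolute numbers depend heavily on the exporter, processor, and hardware, so treat the results as relative, not universal.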

xyzzy_plugh

I will agree that conceptually logging can be much cheaper than tracing ever can, but in practice any semi-serious attempt at structured logging ends up looking very, very close to tracing. In fact I'd go so far as to say that the two are effectively interchangeable at a point. What you do with that information, whether you index it or build a graph, is up to you -- and that is where the cost creeps in.

Adding timestamps and UUIDs and an encoding is par for the course in logging these days, I don't think that is the right angle to criticize efficiency.

Tracing can be very cheap if you "simply" (and I'm glossing over a lot here) search for all messages in a liberal window matching each "span start" message and index the result sets. Offering a way to view results as a tree is just a bonus.

Of course, in practice this ends up meaning something completely different, and far costlier. Why that is I cannot fathom.

nithril

It is actually simple to conceptualize the difference: one is stateless, the other one is stateful.

Also, structured logging has existed for years, e.g. in Java: https://github.com/logfellow/logstash-logback-encoder

hyperpape

I don't generally disagree, but using JSON for structured logs is a growing thing as well.

perpil

I was recently musing about the 2 different types of logs:

1. application logs, emitted multiple times per request and serve as breadcrumbs

2. request logs emitted once per request and include latencies, counters and metadata about the request and response

The application logs were useless to me except during development. However, the request logs I could run aggregations on, which made them far more useful for answering questions. What the author explains very well is that the problem with application logs is that they aren't very human-readable, which is where visualizing a request with tracing shines. If you don't have tracing, creating request logs will get you most of the way there; it's certainly better than application logs. https://speedrun.nobackspacecrew.com/blog/2023/09/08/logging...
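
A minimal sketch of that second kind of log: one structured line per request, emitted from hypothetical HTTP middleware, carrying the latency and metadata you would aggregate on:

    package main

    import (
        "log/slog"
        "net/http"
        "time"
    )

    // requestLog emits exactly one structured line per request with the
    // latency, status, and metadata you would later run aggregations over.
    func requestLog(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            start := time.Now()
            rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
            next.ServeHTTP(rec, r)
            slog.Info("request",
                "method", r.Method,
                "path", r.URL.Path,
                "status", rec.status,
                "duration_ms", time.Since(start).Milliseconds(),
                "user_agent", r.UserAgent(),
            )
        })
    }

    type statusRecorder struct {
        http.ResponseWriter
        status int
    }

    func (s *statusRecorder) WriteHeader(code int) {
        s.status = code
        s.ResponseWriter.WriteHeader(code)
    }

    func main() {
        http.Handle("/", requestLog(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })))
        slog.Info("listening", "addr", ":8080")
        http.ListenAndServe(":8080", nil)
    }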

ec109685

Stripe is big believer in request logs: https://stripe.com/blog/canonical-log-lines

ducharmdev

Minor nitpick, but I wish this post started with defining what we mean by logging vs tracing, since some people use these interchangeably. The reader instead has to infer this from the criticisms of logging.

ryanklee

I've never encountered this confusion anywhere, so I wouldn't ever think to dispel it. Which isn't to say that I disagree with the more general point that defining your terms is a good thing.

In any case, the post itself (which is not long) illustrates and marks out many of the differences.

ducharmdev

I would guess that you're either not around junior engineers, or people are very good at hiding their confusion.

ryanklee

I wouldn't assert that the confusion is non-existent. But I think the audience for a post comparing technical differences between logging and tracing is unlikely to be a junior one.

But again, I do think the (brief) post marks out the differences throughout, so regardless, it still doesn't strike me as a problem here.

jlokier

I agree. I'm working with code that uses 'verbose "message"' for level 1 verbosity logs and 'trace "message"' for level 2 verbosity. Makes sense in its world, but it's not the same meaning as how cloud-devops-observability culture uses those words.

waffletower

There are logging libraries that include syntactically scoped timers, such as mulog (https://github.com/BrunoBonacci/mulog). While mulog is a great library, we preferred timbre (https://github.com/taoensso/timbre) and rolled our own logging timer macro that interoperates with it. More convenient to have such niceties in a Lisp of course. Since we also have OpenTelemetry available, it would be easy to wrap traces around code form boundaries as well. Thanks OP for the idea!
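
The libraries mentioned are Clojure; as a language-neutral illustration of the same idea, here is a hypothetical scoped logging timer in Go using a deferred closure instead of a macro:

    package main

    import (
        "log/slog"
        "time"
    )

    // timed logs the wall-clock duration of the enclosing block when deferred,
    // a rough analogue of a syntactically scoped logging timer.
    func timed(name string) func() {
        start := time.Now()
        return func() {
            slog.Info("timer", "name", name, "duration_ms", time.Since(start).Milliseconds())
        }
    }

    func buildIndex() {
        defer timed("build-index")()
        time.Sleep(25 * time.Millisecond) // stand-in for real work
    }

    func main() {
        buildIndex()
    }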
