siraben

Since this is built on top of tree-sitter, it can be extended[0] to work with other languages as well, no matter how obscure, as long as a tree-sitter grammar exists. This IMO really highlights the power of having an ecosystem built around tools like tree-sitter because they allow for powerful dev UX tools to be more democratized. Excellent syntax highlighting, error recovery, linting, now tree-sensitive diffing can be provided to languages big and small.

[0] https://difftastic.wilfred.me.uk/adding_a_parser.html

diffxx

As impressive as the tree-sitter ecosystem is, there is a certain danger in building tooling on top of it. Unless tree-sitter is used as the parser in the actual language compiler, it is almost always going to be an approximation to the language grammar. This is somewhat risky if it is important that any kind of semantic analysis, including diffing, is correct. In the case of difftastic, what worries me is that it might be possible to inject malicious code into a repo by exploiting a bug in the tree-sitter grammar/parsing implementation such that the malicious code is hidden from difftastic because, for example, it is treated as whitespace when it actually isn't. This of course wouldn't be a problem if PRs are being handled by github, but what if down the road github starts offering difftastic (or something similar) as a diff option for PRs?

Admittedly, this is a somewhat far-fetched scenario, but I am concerned that the proliferation of incorrect tree-sitter grammars is going to lead to the worst kind of problem down the road: tools that work great _most_ of the time.

kibwen

Isn't this the case for any tool that does semantic analysis, including any interesting feature of any IDE? Even in the realm of basic text editors, there's this classic exploit: if your editor and language disagree on what characters are allowed to end a line, you can smuggle lines of code into a program that appear to be commented out but are actually executed (or the inverse, lines that appear to be executed but are actually commented out).
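To make the line-terminator trick concrete, here is a minimal Python sketch. The two "tools" are hypothetical stand-ins, not any real editor or compiler: one only treats "\n" as a line ending, the other uses `str.splitlines()`, which also breaks on U+2028 and a few other characters. The comment stripper is deliberately naive and ignores string literals.

```python
# Two hypothetical "tools" that disagree on what ends a line.
# U+2028 (LINE SEPARATOR) sits between the comment and the second statement.
SOURCE = "x = 1  # harmless note\u2028x = 2\n"


def lines_newline_only(text):
    """Tool A: only "\n" ends a line."""
    return text.split("\n")


def lines_unicode_aware(text):
    """Tool B: str.splitlines() also breaks on U+2028 and similar characters."""
    return text.splitlines()


def code_outside_comments(lines):
    """Drop everything after '#' on each line (naive: ignores string literals)."""
    stripped = (line.split("#", 1)[0].strip() for line in lines)
    return [s for s in stripped if s]


seen_by_a = code_outside_comments(lines_newline_only(SOURCE))
seen_by_b = code_outside_comments(lines_unicode_aware(SOURCE))

print(seen_by_a)  # ['x = 1']
print(seen_by_b)  # ['x = 1', 'x = 2']
```

Tool A thinks `x = 2` is part of the comment; tool B thinks it is live code. Whichever one matches the real language decides whether the reviewer or the attacker is right.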

naikrovek

In theory a compiler could have a mode that gives any tool results it can display as errors, or whatever else anyone needs. I think this was one of the original goals of the language server protocol, which I still do not understand the need for at all.

I don't know if any languages do this.

In reality, damn near all languages are parsed easily enough that there is no difference.

titaniczero

My neovim is drooling over this!

> This IMO really highlights the power of having an ecosystem built around tools like tree-sitter because they allow for powerful dev UX tools to be more democratized

LSP is also an example of this and it is also an example of how dangerous it can be when these tools are backed by big companies: https://news.ycombinator.com/item?id=31760684

chii

Why is it dangerous?

If microsoft, tomorrow, pulled the plug on the pylance LSP and removed all access, and the only thing left is the opensource version, the world is still at a better position than if they hadn't invested any money or opensourced anything.

This is strictly different from google owning chrome, because with that ownership, they have the ability to dictate protocols for the web.

The LSP is a protocol, which is now quite widespread and will continue to survive and be enhanced regardless of Microsoft's involvement. The fact that they can choose to make some language servers closed-source sucks, but being able to access a closed-source component is _still_ strictly better than having zero access in the first place.

titaniczero

Embrace, extend, and extinguish [1]. This is the extend phase.

Microsoft has the resources and ability to make better language servers which could be closed source now, so people end up switching to those tools, owned by Microsoft, with better language servers, killing other editors if they are not able to catch up.

E.g.: Why would a C# programmer use Neovim if the language server is worse than the one that VSCode has? (Which is now proprietary and closed-source). The difference right now is insignificant, but the future tools and features that they are adding to the new proprietary C# language server will not be available for other editors.

[1] https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...

runeks

tree-sitter is the wrong choice here IMHO, because it requires that most languages reimplement their parser in tree-sitter’s DSL, rather than reusing a compiler's existing parser.

A better interface would be a simple binary that produces JSON output from the AST, and another binary that produces an AST from the same JSON. Then you'd diff the JSON and convert it back to an AST before displaying. You’d only need to modify the compiler to print out the AST as JSON (and read it back in), instead of reimplementing the parser.
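A rough sketch of the first half of that idea, using Python's own `ast` module as a stand-in for "the compiler's existing parser". The `ast_to_json` and `dump` helpers are made up for illustration, and the JSON-to-AST direction is left out. Source positions are not included because `ast.iter_fields` only yields the node's fields, so formatting-only changes produce identical JSON.

```python
import ast
import json


def ast_to_json(node):
    """Convert a Python AST node into plain JSON-compatible data."""
    if isinstance(node, ast.AST):
        data = {"type": type(node).__name__}
        # iter_fields yields child fields only, not lineno/col_offset.
        for name, value in ast.iter_fields(node):
            data[name] = ast_to_json(value)
        return data
    if isinstance(node, list):
        return [ast_to_json(item) for item in node]
    # Leaf values that JSON can represent directly.
    if isinstance(node, (str, int, float, bool)) or node is None:
        return node
    # Anything else (bytes, complex, Ellipsis) is stringified.
    return repr(node)


def dump(source):
    """Parse source text and return its AST as a canonical JSON string."""
    return json.dumps(ast_to_json(ast.parse(source)), sort_keys=True)


old = "total = price * (1 + tax)\n"
reformatted = "total = price*(1+tax)   # same expression\n"
changed = "total = price * (1 - tax)\n"

print(dump(old) == dump(reformatted))  # True
print(dump(old) == dump(changed))      # False
```

A diff tool would still have to compare the two JSON trees structurally rather than as text, so this only covers the "get the AST out of the real parser" part of the proposal.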

trishume

This is a really cool example of tree diffing via path finding. I noticed that this was the approach I used when I did tree diffing, and sure enough looks like this was inspired by autochrome which was inspired by my post (https://thume.ca/2017/06/17/tree-diffing/).

I'm curious exactly why A* failed here. It worked great for me, as long as you design a good heuristic. I imagine it might have been complicated to design a good heuristic with an expanded move set. I see autochrome had to abandon A* and has an explanation of why, but that explanation shouldn't apply to difftastic I think.

dan-robertson

I think (maybe I’m wrong) that your graph searches correspond to diffing single lists and you can have an expensive diagonal step to recurse into two sublists whereas the tool in this post has extra nodes for every token and extra edges for inserting/deleting delimiters. That seems to be the biggest difference to me and I guess is what you mean by it being complicated to design a good heuristic for the expanded move set. I agree it sounds complicated. I think that my guess was that bigger graphs would make things harder but that isn’t a reason for A* to fail.
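For anyone following along, this is roughly what "diffing as path finding" looks like in the simplest case of two flat token lists. It is a sketch, not difftastic's or autochrome's code: nodes are positions (i, j) in the two lists, a matching token is a free diagonal edge, and insertions and deletions cost 1. It uses plain Dijkstra; A* would add a heuristic such as the remaining length difference to the priority.

```python
import heapq


def diff_tokens(old, new):
    """Diff two token lists via a cheapest path from (0, 0) to (len(old), len(new))."""
    goal = (len(old), len(new))
    # Heap entries: (cost so far, position, edit script so far).
    frontier = [(0, (0, 0), ())]
    settled = set()
    while frontier:
        cost, (i, j), script = heapq.heappop(frontier)
        if (i, j) == goal:
            return cost, list(script)
        if (i, j) in settled:
            continue
        settled.add((i, j))
        if i < len(old) and j < len(new) and old[i] == new[j]:
            # Free diagonal edge: the token is unchanged.
            step = script + (("keep", old[i]),)
            heapq.heappush(frontier, (cost, (i + 1, j + 1), step))
        if i < len(old):
            step = script + (("delete", old[i]),)
            heapq.heappush(frontier, (cost + 1, (i + 1, j), step))
        if j < len(new):
            step = script + (("insert", new[j]),)
            heapq.heappush(frontier, (cost + 1, (i, j + 1), step))
    raise AssertionError("the goal is always reachable")


old = ["f", "(", "x", ")"]
new = ["f", "(", "x", ",", "y", ")"]
cost, script = diff_tokens(old, new)
print(cost)  # 2
for op, token in script:
    print(op, token)
```

The tree-aware versions discussed above add more node and edge types (entering and leaving sublists, delimiters), which is where picking a good heuristic gets harder.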

mfrw

I do not have much to add, but using `difftastic`[0] & `delta`[1] together is a very cool combo that makes _git_ a little more approachable for newbies like me.

I use delta as my daily driver but sometimes when I want the contextual info, switching to `env GIT_EXTERNAL_DIFF=difft git log -p --ext-diff` gives a better picture.

[0]: https://github.com/Wilfred/difftastic

[1]: https://github.com/dandavison/delta

lapser

I've been using delta for a long while, but this is the first time I've heard of difftastic. Why wouldn't you want the context all the time?

mfrw

I find the visual eye-candy provided by delta far more appealing than what difft does. Also, delta is much more configurable and has nice bindings for moving through hunks.

That's probably the only reason, otherwise, you are correct. I would want the context always.

WakiMiko

difftastic doesn't show whitespace or formatting changes

robin_reala

> Fun fact: I thought of diffing programs as working out what has changed. The goal of diffing is actually to work out what hasn’t changed!

+1 insightful

grogers

My dream would be to have a three-way merge tool that worked like this at a semantic level. It feels like merges almost always have the information needed to automatically resolve, but our line-based tools are too simple to see it.
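A toy illustration of why structure helps, assuming (hypothetically) that a file has already been parsed into a map from definition name to definition body. The `merge3` helper is made up for this sketch; real tools work on full syntax trees rather than opaque strings.

```python
def merge3(base, ours, theirs):
    """Three-way merge of {name: definition} maps; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:
            result = o  # both sides agree (including both deleting it)
        elif o == b:
            result = t  # only theirs changed this definition
        elif t == b:
            result = o  # only ours changed this definition
        else:
            conflicts.append(name)
            result = o  # real conflict: keep ours as a placeholder
        if result is not None:
            merged[name] = result
    return merged, conflicts


base = {"area": "w * h", "perimeter": "2 * (w + h)"}
ours = {"area": "width * height", "perimeter": "2 * (w + h)"}
theirs = {"area": "w * h", "perimeter": "2 * (w + h)", "diagonal": "hypot(w, h)"}

merged, conflicts = merge3(base, ours, theirs)
print(merged)
print(conflicts)  # []
```

Because the unit of comparison is a named definition instead of a line, an edit to one function and an insertion of another next to it never collide, even if a line-based merge would have flagged the hunk.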

ruricolist

Resolve is another tree-sitter based tool that does this:

https://github.com/grammatech/resolve

werdnapk

I used to use Araxis Merge [1] years ago and found it worked very well.

[1] https://www.araxis.com/merge/index.en

swozey

Geez, I've been looking at diff programs and the personal licenses for them are so expensive. I get they're incredibly useful when you need them but I personally don't need a 3rd party differ very often. Kaleidoscope is $250ish IIRC.

The VSCode extension "Diff & Merge" gives you the right/left arrows to merge lines, if anyone is looking for a tool that does that. I haven't needed another one since I found it.

activitypea

Sublime Merge was pretty alright, don't know if it's still around

jereees

This is now supported without extensions in latest vscode

yboris

My favorite diff tool is diff2html - see the diff in your browser as HTML!

https://diff2html.xyz/

Install the CLI, run the command (alias diff='diff2html -s side') - I run this at least every time before committing to quickly see all I've done.

bufferoverflow

What's the advantage of seeing the diff in HTML?

Seems like an unnecessary extra step.

yboris

Visually easier on my eyes, easier to scroll, I can filter out `package-lock.json` file from the diff (with a command line argument). In the end it's a preference thing, not an "objectively better" thing.

Wilfred

Side-by-side diff displays work really great in a browser. You usually have more screen real estate, and you can offer a responsive UI.

yewenjie

One more tree-sitter based diffing tool - diffsitter

https://github.com/afnanenayet/diffsitter

X-Cubed

SemanticMerge is an existing commercial product that works in a similar fashion. I've found it much nicer to use than text-based diff tools.

https://www.plasticscm.com/semanticmerge/documentation/intro...

radicalbyte

One of the better merge tools but I'd only recommend it if your language of choice is supported. Otherwise, BeyondCompare is just as good.

soperj

I like BeyondCompare. As a free option, I really like Meld; it's similar.

ziftface

It looks like difftastic can support way more languages. I haven't used semantic merge (or even heard about it before this) so idk if its language support is somehow better though.

Gehinnn

It would be so cool if there was a json output, so that other tools (eg VS Code) can use this diffing algorithm! Thanks for explaining the algorithm!

ChadNauseam

It’s a bit out of date but my fork adds this functionality https://github.com/anchpop/difftastic

luke-stanley

This post reminded me of the Trail Of Bits post about Graphtage: https://blog.trailofbits.com/2020/08/28/graphtage/ Graphtage is written in Python and can be used as a library. But it seems Difftastic can be used as a git diff tool directly.

prirun

A friend (Hi Jeff!) wrote DiffMerge: https://sourcegear.com/diffmerge - another alternative diff & merge.

emsixteen

I use this occasionally on Mac, and it's pretty handy. Crashes/throws an error when closing though.

janaagaard

Is this something you can turn on by default for Git when working with others who don’t use Difftastic, or could that lead to some weird behaviors?

(I don’t know enough about the internals of Git to answer this myself.)

Sander_Marechal

IIRC the configurable diff tool is just to show diffs to the user, not for internal storage. So everyone on your project can use a different diff tool without problems.

mfrw

An analogy would be the editor. Think of this tool like the editor that one uses. It does not matter to git what editor one uses; similarly, it does not matter what diff pager one uses :)

Yes, you can turn it on without having any side-effects for others.

Wilfred

You can use difftastic as your default git diff tool, but you can also use it as an opt-in diffing tool. I recommend using it as an opt-in, but defining a git alias so you can do 'git difft'.

https://difftastic.wilfred.me.uk/git.html has docs.

tuetuopay

You can turn it on safely. Git does not store diffs (like a diff chain); it stores a snapshot of the files at each commit.
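A small Python sketch of what "snapshot" means here: Git names each file version (a blob) by hashing its full contents with a short header, so the object ID never depends on how any tool chose to display the change. This assumes the default SHA-1 object format; `git_blob_id` is just a name picked for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a file's full contents (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


v1 = b"print('hello')\n"
v2 = b"print('hello, world')\n"

# Git's well-known ID for an empty file.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Each version is stored and addressed as a whole, not as a patch against v1.
print(git_blob_id(v1) != git_blob_id(v2))  # True
```

Diffs are computed on demand from two such snapshots, which is why swapping in difftastic locally cannot change what your collaborators see.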

Difftastic, the fantastic diff - Hacker News