Get the top HN stories in your inbox every day.
siraben
diffxx
As impressive as the tree-sitter ecosystem is, there is a certain danger in building tooling on top of it. Unless tree-sitter is used as the parser in the actual language compiler, it is almost always going to be an approximation to the language grammar. This is somewhat risky if it is important that any kind of semantic analysis, including diffing, is correct. In the case of difftastic, what worries me is that it might be possible to inject malicious code into a repo by exploiting a bug in the tree-sitter grammar/parsing implementation such that the malicious code is hidden from difftastic because, for example, it is treated as whitespace when it actually isn't. This of course wouldn't be a problem if PRs are being handled by github, but what if down the road github starts offering difftastic (or something similar) as a diff option for PRs?
Admittedly, this is a somewhat far-fetched scenario, but I am concerned that the proliferation of incorrect tree-sitter grammars is going to lead to the worst kind of problem down the road: tools that work great _most_ of the time.
kibwen
Isn't this the case for any tool that does semantic analysis, including any interesting feature of any IDE? Even in the realm of basic text editors, there's this classic exploit: if your editor and language disagree on what characters are allowed to end a line, you can smuggle lines of code into a program that appear to be commented out but are actually executed (or the inverse, lines that appear to be executed but are actually commented out).
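To make the line-terminator disagreement concrete, here is a minimal sketch in Python. It models two hypothetical tools (the function names and the sample text are made up for illustration): one that only treats `\n` as a line break, and one that accepts every boundary `str.splitlines()` recognizes, including U+2028 LINE SEPARATOR. It does not reproduce any specific editor or compiler bug; it only shows how the same bytes can yield different answers to "is this line commented out?".

```python
# Hypothetical source text: U+2028 (LINE SEPARATOR) sits right after the comment.
SOURCE = "# harmless comment\u2028do_something()\n"


def lines_newline_only(text):
    """Model a tool that treats only '\\n' as a line terminator."""
    return [line for line in text.split("\n") if line]


def lines_unicode_aware(text):
    """Model a tool that honours every line boundary str.splitlines() knows."""
    return text.splitlines()


def commented_out(line):
    """A line counts as a comment if its first non-blank character is '#'."""
    return line.lstrip().startswith("#")


if __name__ == "__main__":
    for splitter in (lines_newline_only, lines_unicode_aware):
        verdicts = [commented_out(line) for line in splitter(SOURCE)]
        print(splitter.__name__, verdicts)
```

The first tool sees a single commented-out line; the second sees a comment followed by a live call. Whichever of the two is the compiler decides what actually runs.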
naikrovek
in theory a compiler could have a mode which gives any tool results it can display as errors or whatever anyone needs. I think this was one of the original goals of the language server protocol, which I still do not understand the need for at all.
I don't know if any languages do this.
in reality damn near all languages are parsed easily enough that there is no difference.
titaniczero
My neovim is drooling over this!
> This IMO really highlights the power of having an ecosystem built around tools like tree-sitter because they allow for powerful dev UX tools to be more democratized
LSP is also an example of this and it is also an example of how dangerous it can be when these tools are backed by big companies: https://news.ycombinator.com/item?id=31760684
chii
Why is it dangerous?
If microsoft, tomorrow, pulled the plug on the pylance LSP and removed all access, and the only thing left is the opensource version, the world is still at a better position than if they hadn't invested any money or opensourced anything.
This is strictly different from google owning chrome, because with that ownership, they have the ability to dictate protocols for the web.
The LSP is a protocol, which now is quite widespread, and will continue to survive and be enhanced, regardless of microsoft's involvement or not. The fact that they can choose to make some language servers closed-source sucks, but being able to access a closed source component is _still_ strictly better than having zero access in the first place.
titaniczero
Embrace, extend, and extinguish [1]. This is the extend phase.
Microsoft has the resources and ability to make better language servers which could be closed source now, so people end up switching to those tools, owned by Microsoft, with better language servers, killing other editors if they are not able to catch up.
E.g.: Why would a C# programmer use Neovim if the language server is worse than the one that VSCode has (which is now proprietary and closed-source)? The difference right now is insignificant, but the future tools and features that they are adding to the new proprietary C# language server will not be available for other editors.
[1] https://en.wikipedia.org/wiki/Embrace,_extend,_and_extinguis...
runeks
tree-sitter is the wrong choice here IMHO, because it requires that most languages reimplement their parser in tree-sitter’s DSL, rather than reusing a compiler's existing parser.
A better interface would be a simple binary that produces JSON output from AST, and another binary that produces AST from the same JSON output. Then you'd do a diff on the JSON and converting it to AST before displaying. Then you’d only need to modify the compiler to print out the AST as JSON (and read it in as JSON), instead of reimplementing the parser.
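A minimal sketch of the first half of this idea, using Python's own `ast` module as a stand-in for "the compiler's existing parser". That choice is mine for illustration and is not something difftastic does. The helper names `to_jsonable` and `ast_json` are hypothetical, and only the AST-to-JSON direction is covered; reading the JSON back into an AST is left out.

```python
import ast
import json


def to_jsonable(node):
    """Recursively convert an ast node into plain dicts, lists, and scalars."""
    if isinstance(node, ast.AST):
        out = {"_type": type(node).__name__}
        # iter_fields yields the node's fields only; line and column
        # attributes are not included, so layout does not leak in.
        for name, value in ast.iter_fields(node):
            out[name] = to_jsonable(value)
        return out
    if isinstance(node, list):
        return [to_jsonable(item) for item in node]
    return node  # str, int, None, and other constants


def ast_json(source):
    """Parse Python source and return its AST as a JSON string."""
    tree = ast.parse(source)
    # default=repr covers constants JSON cannot encode (bytes, complex, ...).
    return json.dumps(to_jsonable(tree), sort_keys=True, default=repr)


if __name__ == "__main__":
    # Same program, different spacing: the JSON is identical.
    print(ast_json("x = 1+2") == ast_json("x  =  1 + 2"))
    # A real change shows up as a difference in the JSON.
    print(ast_json("x = 1+2") == ast_json("x = 1+3"))
```

A plain text diff of the two JSON strings would still be line-oriented, so a structural diff over the parsed JSON is probably what you would want on top of this.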
trishume
This is a really cool example of tree diffing via path finding. I noticed that this was the approach I used when I did tree diffing, and sure enough, it looks like this was inspired by autochrome, which was inspired by my post (https://thume.ca/2017/06/17/tree-diffing/).
I'm curious exactly why A* failed here. It worked great for me, as long as you design a good heuristic. I imagine it might have been complicated to design a good heuristic with an expanded move set. I see autochrome had to abandon A* and has an explanation of why, but that explanation shouldn't apply to difftastic I think.
dan-robertson
I think (maybe I’m wrong) that your graph searches correspond to diffing single lists and you can have an expensive diagonal step to recurse into two sublists whereas the tool in this post has extra nodes for every token and extra edges for inserting/deleting delimiters. That seems to be the biggest difference to me and I guess is what you mean by it being complicated to design a good heuristic for the expanded move set. I agree it sounds complicated. I think that my guess was that bigger graphs would make things harder but that isn’t a reason for A* to fail.
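For readers who have not seen diffing framed as path finding, here is a minimal sketch of the single-list case only, assuming unit cost for insert and delete and zero cost for a match. It is not difftastic's or autochrome's algorithm, and it has no "recurse into sublists" move or delimiter edges. The function name `diff_cost` and the heuristic (the difference in remaining lengths, which is a lower bound on the inserts or deletes still required) are my own choices for illustration.

```python
import heapq


def diff_cost(a, b):
    """Return the minimum number of inserts plus deletes turning a into b.

    Nodes are positions (i, j) in the two sequences; A* searches for the
    cheapest path from (0, 0) to (len(a), len(b)).
    """
    goal = (len(a), len(b))

    def heuristic(i, j):
        # If one side has more items left than the other, at least that
        # many inserts or deletes are unavoidable.
        return abs((len(a) - i) - (len(b) - j))

    best = {(0, 0): 0}
    heap = [(heuristic(0, 0), 0, 0, 0)]  # (estimated total, cost so far, i, j)
    while heap:
        _, cost, i, j = heapq.heappop(heap)
        if (i, j) == goal:
            return cost
        if cost > best.get((i, j), float("inf")):
            continue  # stale queue entry; a cheaper route was found already
        moves = []
        if i < len(a) and j < len(b) and a[i] == b[j]:
            moves.append((i + 1, j + 1, 0))  # unchanged item: free diagonal step
        if i < len(a):
            moves.append((i + 1, j, 1))      # delete a[i]
        if j < len(b):
            moves.append((i, j + 1, 1))      # insert b[j]
        for ni, nj, step in moves:
            new_cost = cost + step
            if new_cost < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = new_cost
                heapq.heappush(heap, (new_cost + heuristic(ni, nj), new_cost, ni, nj))
    raise AssertionError("goal is always reachable")


if __name__ == "__main__":
    print(diff_cost("abcd", "abd"))
```

With a larger move set, every extra edge type has to be accounted for in the heuristic's lower bound, which is presumably where designing a good one gets harder.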
mfrw
Although I do not have much to add, using `difftastic`[0] & `delta` [1] is a very cool combo to make _git_ a little more approachable for newbies like me.
I use delta as my daily driver but sometimes when I want the contextual info, switching to `env GIT_EXTERNAL_DIFF=difft git log -p --ext-diff` gives a better picture.
lapser
I've been using delta for a long while, but this is the first time I've heard of difftastic. Why wouldn't you want the context all the time?
mfrw
I find the visual eye candy provided by delta far more appealing than what difft does. Also, delta is much more configurable and has nice bindings for moving through hunks.
That's probably the only reason, otherwise, you are correct. I would want the context always.
WakiMiko
difftastic doesn't show whitespace or formatting changes
robin_reala
Fun fact: I thought of diffing programs as working out what has changed. The goal of diffing is actually to work out what hasn’t changed!
+1 insightful
grogers
My dream would be to have a three-way merge tool that worked like this at a semantic level. It feels like merges almost always have the information needed to automatically resolve, but our line-based tools are too simple to see it.
ruricolist
Resolve is another tree-sitter based tool that does this:
werdnapk
I used to use Araxis Merge [1] years ago and found it worked very well.
swozey
Geez, I've been looking at diff programs and the personal licenses for them are so expensive. I get they're incredibly useful when you need them but I personally don't need a 3rd party differ very often. Kaleidoscope is $250ish IIRC.
The VSCode extension "Diff & Merge" gives you the right/left arrows to merge lines if anyone is looking for a tool that does that. I haven't needed another one since I found that.
activitypea
Sublime Merge was pretty alright, don't know if it's still around
jereees
This is now supported without extensions in latest vscode
yboris
My favorite diff tool is diff2html - see the diff in your browser as HTML!
Install the CLI, run the command (alias diff='diff2html -s side') - I run this at least every time before committing to quickly see all I've done.
bufferoverflow
What's the advantage of seeing the diff in HTML?
Seems like an unnecessary extra step.
yboris
Visually easier on my eyes, easier to scroll, I can filter out `package-lock.json` file from the diff (with a command line argument). In the end it's a preference thing, not an "objectively better" thing.
Wilfred
Side-by-side diff displays work really well in a browser. You usually have more screen real estate, and you can offer a responsive UI.
yewenjie
One more tree-sitter based diffing tool - diffsitter
X-Cubed
SemanticMerge is an existing commercial product that works in a similar fashion. I've found it much nicer to use than text-based diff tools.
https://www.plasticscm.com/semanticmerge/documentation/intro...
radicalbyte
One of the better merge tools but I'd only recommend it if your language of choice is supported. Otherwise, BeyondCompare is just as good.
soperj
I like BeyondCompare. As a free option, I really like Meld; it's similar.
ziftface
It looks like difftastic can support way more languages. I haven't used semantic merge (or even heard about it before this) so idk if its language support is somehow better though.
Gehinnn
It would be so cool if there was a json output, so that other tools (eg VS Code) can use this diffing algorithm! Thanks for explaining the algorithm!
ChadNauseam
It’s a bit out of date but my fork adds this functionality https://github.com/anchpop/difftastic
luke-stanley
This post reminded me of the Trail Of Bits post about Graphtage: https://blog.trailofbits.com/2020/08/28/graphtage/ Graphtage is written in Python and can be used as a library. But it seems Difftastic can be used as a git diff tool directly.
prirun
A friend (Hi Jeff!) wrote DiffMerge: https://sourcegear.com/diffmerge - another alternative diff & merge.
emsixteen
I use this occasionally on Mac, and it's pretty handy. Crashes/throws an error when closing though.
janaagaard
Is this something you can turn on by default for Git when working with others that don’t use Difftastic, or could that lead to some weird behaviors?
(I don’t know enough about the internals of Git to answer this myself.)
Sander_Marechal
IIRC the configurable diff tool is just to show diffs to the user, not for internal storage. So everyone on your project can use a different diff tool without problems.
mfrw
An analogy would be the editor. Think of this tool as the editor that one uses. It does not matter to git what editor one uses; similarly, it does not matter what diff pager one uses :)
Yes, you can turn it on without having any side-effects for others.
Wilfred
You can use difftastic as your default git diff tool, but you can also use it as an opt-in diffing tool. I recommend using it as an opt-in, but defining a git alias so you can do 'git difft'.
tuetuopay
You can turn it on safely. Git does not store diffs (like a diff chain), but a snapshot of the files at each commit.
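A toy model of that point, in Python. This is not Git's actual object format; `SnapshotStore` and the two diff helpers are hypothetical names for illustration. The store keeps whole file contents keyed by a hash, and diffs are computed on demand by whatever viewer each person prefers, so the choice of viewer never touches what is stored.

```python
import difflib
import hashlib


class SnapshotStore:
    """Toy content-addressed store: keeps whole contents, never diffs."""

    def __init__(self):
        self.objects = {}

    def commit(self, content):
        key = hashlib.sha1(content.encode("utf-8")).hexdigest()
        self.objects[key] = content
        return key


def line_diff(old, new):
    """One person's viewer: a classic line-based unified diff."""
    return list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))


def token_diff(old, new):
    """Another person's viewer: compares whitespace-separated tokens."""
    return list(difflib.ndiff(old.split(), new.split()))


store = SnapshotStore()
v1 = store.commit("x = 1\ny = 2\n")
v2 = store.commit("x = 1\ny = 3\n")
stored_before = dict(store.objects)

# Two collaborators look at the same pair of snapshots with different tools.
view_a = line_diff(store.objects[v1], store.objects[v2])
view_b = token_diff(store.objects[v1], store.objects[v2])

if __name__ == "__main__":
    print("\n".join(view_a))
    print("\n".join(view_b))
    print(store.objects == stored_before)
```

Both views are derived from the same two snapshots, and the store is identical before and after either one is rendered.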
Since this is built on top of tree-sitter, it can be extended[0] to work with other languages as well, no matter how obscure, as long as a tree-sitter grammar exists. This IMO really highlights the power of having an ecosystem built around tools like tree-sitter because they allow for powerful dev UX tools to be more democratized. Excellent syntax highlighting, error recovery, linting, now tree-sensitive diffing can be provided to languages big and small.
[0] https://difftastic.wilfred.me.uk/adding_a_parser.html