Tree Sitter and the Complications of Parsing Languages

matklad

> Well, because it’s gosh-darn hard to do it the right way.

I think this overstates the difficulty. This of course depends a lot on the language, but for a reasonable one (not C++) you can just go and write the parser by hand. I’d ballpark this as three weeks, if you know how the thing is supposed to work.

> it doesn’t have to redo the whole thing on every keypress.

This is probably what makes the task seem harder than it is. Incremental parsing is nice, but not mandatory. rust-analyzer and most IntelliJ parsers re-parse the whole file on every change (IJ does incremental lexing, which is simple).

> The reason (most) LSP servers don’t offer syntax highlighting is because of the drag on performance.

I am surprised to hear that. We never had performance problems with highlighting on the server in rust-analyzer. I remember that for Emacs specifically there were client side problems with parsing LSP JSON.

> Every keystroke you type must be sent to the server, processed, a partial tree returned, and your syntax highlighting updated.

That’s not the bottleneck for syntax highlighting, typechecking is (and it’s typechecking that makes highlighting especially interesting).

In general, my perception of what’s going on with proper parsing in the industry is a bit different. I’d say the status quo from five years back boils down to people just getting accustomed to the way things were done. Compiler authors generally didn’t think about syntax highlighting or completions, and editors generally didn’t want to do the parsing stuff. JetBrains were the exception, as they just did the thing. In this sense, LSP was a much-needed stimulus to just start doing things properly. People were building rich IDE experiences before LSP just fine (see dart analyzer), it’s just that relatively few languages saw it as an important problem to solve at all.

chubot

I don't think you can write a production quality parser for any "real" language in 3 weeks ... You can get something working in 3 weeks, but then you'll be adding features and fixing bugs for a year or more.

If you take something like Python or JavaScript, the basics are simple, but there are all sorts of features like splatting, decorators, a very rich function argument syntax, etc. and subtle cases to debug, like the rules of what's allowed on the LHS of an assignment. JavaScript has embedded regexes, and now both languages have template strings, etc. It's a huge job.

It's not necessarily hard, but it takes a lot of work, and you will absolutely learn a lot of stuff about the language long after 3 weeks. I've programmed in Python for 18 years and still learned more about the language from just working with the parser, not even implementing it!

And this doesn't even count error recovery / dealing with broken code ...

Kranar

I don't see what is challenging about any of what you mentioned. Furthermore, parsing a language is not the same thing as verifying that what is parsed is semantically valid. Python is almost a context free language with the exception of how it handles indentation. With indentation taken into account, the entire language can be parsed directly from the following grammar using something like yacc:

https://docs.python.org/3/reference/grammar.html

JavaScript is not a strictly context free grammar either, but like Python the vast majority of it is and the parts that are not context free can be worked around. Furthermore the entire grammar is available here:

https://262.ecma-international.org/5.1/#sec-A

It isn't trivial to work around the parts that aren't context free, but it's also nothing insurmountable that requires more than 120 hours of effort. The document explicitly points out which grammar rules are not context free and gives an algorithm that can be used as an alternative.

Parsing is really not as challenging a job as a lot of people make it out to be and it's an interesting exercise to try yourself and get an intuitive feel for. You can use a compiler compiler (like yacc) if you feel like it to just get something up and running, but the downside of such tools is they do very poorly with error handling. Rolling out a hand written parser gives much better error messages and really is nothing that crazy. C++ is the only mainstream language I can think of that has a grammar so unbelievably complex that it would require a team of people working years to implement properly (and in fact none of the major compilers implement a proper C++ parser).

For statically typed languages things get harder because you first need to parse an AST, and then perform semantic analysis on it, but if all you need is syntax highlighting, you can skip over the semantic analysis.
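
To illustrate how far purely syntactic information goes, here is a minimal sketch that classifies Python source for highlighting using only the standard library tokenizer. The function name `classify` and the class labels are made up for this example; no name resolution or type information is involved.

```python
import io
import keyword
import tokenize

def classify(source):
    """Return (text, label) pairs for the highlightable tokens in Python source."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            result.append((tok.string, "comment"))
        elif tok.type == tokenize.STRING:
            result.append((tok.string, "string"))
        elif tok.type == tokenize.NUMBER:
            result.append((tok.string, "number"))
        elif tok.type == tokenize.NAME:
            # Keywords are decided by spelling alone; no semantic analysis needed.
            label = "keyword" if keyword.iskeyword(tok.string) else "name"
            result.append((tok.string, label))
        # Operators, newlines and indentation tokens are left uncoloured.
    return result
```

Telling a local variable from a global, or a type from a function, is where this approach stops and semantic analysis would have to start.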

ModernMech

> but if all you need is syntax highlighting, you can skip over the semantic analysis.

I wish we could move toward semantics highlighting.

I will chime in with you though and agree, as a writer and teacher of parsers, it doesn’t have to be that hard. In fact, if you implement your parser as a PEG, it really doesn’t have to be much longer than the input to a parser generator like YACC. Parser combinators strongly resemble the EBNF notation, it’s almost a direct translation. That’s why parser generators are possible to write in the first place. But in my opinion they are wholly unnecessary, since the grammar itself is really all you need if you’ve designed your grammar correctly. Just by expressing the grammar you’re 90% of the way to implementing it.
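
A rough sketch of that "almost a direct translation" claim, using a handful of hand-rolled combinators. The helper names (`tok`, `seq`, `alt`, `many`, `forward`) and the two-rule grammar are invented for illustration and do not come from any particular library.

```python
import re

# A parser is a function: (text, pos) -> (value, new_pos), or None on failure.

def tok(pattern):
    """Match a regex at the current position, skipping leading whitespace."""
    rx = re.compile(r"\s*(" + pattern + ")")
    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return parse

def seq(*parsers):
    """EBNF concatenation: A B C."""
    def parse(text, pos):
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    """EBNF alternation: A | B (ordered choice, as in a PEG)."""
    def parse(text, pos):
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

def many(p):
    """EBNF repetition: { A }. Assumes p always consumes input when it succeeds."""
    def parse(text, pos):
        values = []
        while (r := p(text, pos)) is not None:
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def forward():
    """Placeholder for a rule defined later, so rules can be recursive."""
    slot = []
    def parse(text, pos):
        return slot[0](text, pos)
    parse.define = slot.append
    return parse

# EBNF:  expr = term { "+" term } ;
#        term = number | "(" expr ")" ;
expr = forward()
number = tok(r"\d+")
term = alt(number, seq(tok(r"\("), expr, tok(r"\)")))
expr.define(seq(term, many(seq(tok(r"\+"), term))))
```

Calling `expr("1 + (2 + 3)", 0)` returns a nested list plus the end position, or `None` if the input does not match. The last four lines mirror the two EBNF rules almost symbol for symbol, which is the point being made.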

matklad

The thing is, for IDE purposes “production ready” has a different definition. The thing shouldn’t have 100% parity with the compiler, it should be close enough to be useful, and it must be resilient. This is definitely not trivial, but is totally within the reach of a single person.

> And this doesn't even count error recovery / dealing with broken code ...

With a hand written parser, you mostly get error resilience for free. In rust-analyzer’s parser, there’s very little code which explicitly deals with recovery. The trick is, during recursive descent, to just not bail on the first error.
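
A minimal sketch of that trick on a made-up grammar (`let NAME = NUMBER ;`). This is not rust-analyzer's code; `Parser`, `expect` and the tuple-based tree are assumptions for the example. The key point is that `expect` records an error and returns a hole instead of raising.

```python
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")

def lex(src):
    """Split source into (kind, text) pairs, ending with an 'eof' token."""
    tokens = []
    for num, word, punct in TOKEN.findall(src):
        if num:
            tokens.append(("number", num))
        elif word:
            tokens.append(("let" if word == "let" else "ident", word))
        else:
            tokens.append((punct, punct))
    tokens.append(("eof", ""))
    return tokens

class Parser:
    def __init__(self, src):
        self.tokens = lex(src)
        self.pos = 0
        self.errors = []

    def peek(self):
        return self.tokens[self.pos][0]

    def bump(self):
        tok = self.tokens[self.pos]
        if tok[0] != "eof":
            self.pos += 1
        return tok

    def expect(self, kind):
        """Consume `kind` if present; otherwise record an error and carry on."""
        if self.peek() == kind:
            return self.bump()[1]
        self.errors.append(f"expected {kind}, found {self.peek()!r}")
        return None  # a hole in the tree, not an exception

    def parse_let(self):
        self.expect("let")
        name = self.expect("ident")
        self.expect("=")
        value = self.expect("number")
        self.expect(";")
        return ("let", name, value)

    def parse_file(self):
        stmts = []
        while self.peek() != "eof":
            if self.peek() == "let":
                stmts.append(self.parse_let())
            else:
                # Stray token between statements: keep it as an error node and move on.
                _, text = self.bump()
                self.errors.append(f"unexpected {text!r}")
                stmts.append(("error", text))
        return stmts
```

For the input `let x = 1; let = 2; let y 3 let z = 4;` this yields four `let` nodes (one with a missing name) and three recorded errors, so later statements are still available for highlighting and completion.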

jhck

Those are some very nice insights, thanks for sharing them! Can you recommend a good resource on writing a parser by hand that doesn't bail on the first error? Or would you instead suggest studying the source code for e.g. the rust-analyzer parser?

natrys

> I remember that for Emacs specifically there were client side problems with parsing LSP JSON.

I am given to understand that this is not a problem any more (since Emacs 27.1). Before that, the JSON parser was written in elisp, which is a slow language (though somewhat mitigated by the recent native compilation). But now Emacs prefers to use native bindings (jansson), and afaik this has solved most of the performance grievances raised by LSP clients.

bsder

> I think this overstates the difficulty. This of course depends a lot on the language, but for a reasonable one (not C++) you can just go and write the parser by hand.

I don't agree. Newer languages are all being designed with the constraint that the grammar should be easy to parse and not require indefinite lookahead and full compilation to get back on track after an error.

That's a big change from the C/C++ heritage.

It's no coincidence that "modern" languages (call it the last 10 or so years) tend to have things like explicit variable assignment (let-statement-like) and delimiters between variable and type, for example.

zesterer

> Newer languages are all being designed with the constraint that the grammar should be easy to parse

I think that says less about the difficulty of parsing and more that language designers have realised that 'easy to parse' is not incompatible with good readability and terse syntax. In fact, the two go hand in hand: languages that are easy for computers to understand are often easy for users to understand too.

patrec

This has nothing to do with old or new and everything with both C and C++ being serious aberrations in programming language design. Most languages not directly influenced by C (new or old) simply don't have these bizarre issues. Also a lot of languages are becoming significantly harder to parse as time goes on (python for example).

bsder

> Most languages not directly influenced by C (new or old) simply don't have these bizarre issues

I don't agree. Lisp is "easy" to parse, but difficult to add structure to. Tcl similarly. Typeless languages are now out of favor--everybody wants to be able to add types.

Perl is a nightmare and probably undecidable. Satan help you if you miss a period in COBOL because God sure won't. FORTRAN is famous for its DO LOOP construct that would hose you.

About the only language that wasn't hot garbage to parse was Pascal. And I seem to recall that was intentional.

jiggawatts

I had an interesting experience making a simple "Wiki" style editor for a web app back around 2008 or so. To my surprise, even an ANTLR-generated JavaScript parser could easily handle keystroke-by-keystroke parsing and fully updating the entire DOM in real time, up to about 20-30KB of text. After 60KB the performance would drop visibly, but it was still acceptable.

A hand-tuned Rust parser on a 2021 machine? I can imagine it handling hundreds of kilobytes without major issues.

Still, there's some "performance tuning itch" that this doesn't quite scratch. I can't get past the notion that this kind of thing ought to be done incrementally, even when the practical evidence says that it's not worth it.

BobbyJo

> This is probably what makes the task seem harder than it is. Incremental parsing is nice, but not mandatory. rust-analyzer and most IntelliJ parsers re-parse the whole file on every change (IJ does incremental lexing, which is simple).

Glances at the memory usage of Goland in a moderately sized project and weeps

deng

> I think this overstates the difficulty. This of course depends a lot on the language, but for a reasonable one (not C++) you can just go and write the parser by hand. I’d ballpark this as three weeks, if you know how the thing is supposed to work.

Having a parser which generates an AST is just the first step. Then, you actually need to implement all the rules of the language, so for instance the scoping rules, how the object system works, any other built-in compound/aggregate types, other constructs like decorators, generics, namespaces or module systems, and on and on and on. Depending on the language, this will usually be the main work.

And then of course there's dynamic typing - if you want to enable smart completions for a dynamically typed language, you need to implement some kind of type deduction. This alone can take a lot of time to implement.

loup-vaillant

If you want syntax highlighting, the AST is enough to generate pretty colours for the source code. If you want semantic highlighting… sure, that's another story entirely. And even then you don't necessarily have to do as much work as the compiler itself.

And don't even try to be smart with dynamically typed languages, it cannot possibly be reliable, short of actually executing the program. If your programs are short enough you won't need it, and if you do need such static analysis… consider switching to a statically typed language instead.

lenkite

Rust-analyzer uses Salsa - incremental computation library that uses memoization.

https://github.com/rust-analyzer/rust-analyzer/blob/master/d...

zesterer

Interesting choice to reply to matklad, one of rust-analyzer's primary authors, to explain how it works.

matklad

Nah, it’s a totally valid question: I indeed didn’t clarify where incrementality starts to happen.

lenkite

I slapped myself in the face.

matklad

Yeah, to clarify, memoization happens after parsing. So for syntax highlighting we have a situation where from-scratch parsing is faster than incremental typechecking.
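
A toy sketch of that split, assuming nothing about Salsa's real API: the file is re-parsed from scratch on every call, while the expensive per-item step is memoized on the item's text. The class `Db`, the blank-line "parser" and the word-count "analysis" are all stand-ins.

```python
import hashlib

class Db:
    """Toy query cache: reparse everything, memoize the expensive per-item step."""

    def __init__(self):
        self.cache = {}
        self.analyzed = 0  # counts real (non-cached) analysis runs

    def parse(self, text):
        # "Parsing": split the file into items on blank lines. Always from scratch.
        return [item.strip() for item in text.split("\n\n") if item.strip()]

    def analyze_item(self, item):
        key = hashlib.sha256(item.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.analyzed += 1
            self.cache[key] = len(item.split())  # stand-in for type checking
        return self.cache[key]

    def analyze(self, text):
        return [self.analyze_item(item) for item in self.parse(text)]
```

Editing one item and calling `analyze` again re-runs the analysis only for the item whose text changed. A real incremental framework tracks dependencies between queries rather than hashing text, so treat this only as a picture of where the memoization boundary sits.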

lenkite

Thanks for explaining and making rust-analyzer!

afranchuk

I was under the impression that rust-analyzer (and more generally LSP) provides augmentative (contextual) syntax highlighting, whereas most of the highlighting still comes from editor-specific configuration. Is this not the case? If so I would be thrilled; as someone authoring a custom language right now it has been very frustrating to not be able to provide a single source of syntax highlighting for all popular editors.

matklad

rust-analyzer highlights everything, I have an empty vs code theme for it somewhere. But yeah, in general LSP highlighting is specified in augmentative way.

aidos

Before this conversation is railroaded by talk about language servers, as the article points out, tree sitter tends to need to be a bit closer to the environment to be effective.

There’s still work to do, but having tree sitter in neovim feels like a great step forward.

omnicognate

Yes, it's more for syntax highlighting where you don't want the lag of an external server and don't need the deep language analysis needed for diagnostics, refactoring, etc. I'm not sure what other use cases it would be superior to LSP for, but I'm sure there are some.

clktmr

It can also be used for text editing, e.g. changing, deleting, swapping of function arguments or any other text object defined by the language syntax.
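
As a rough illustration of a syntax-aware edit, here is a sketch that swaps the first two arguments of a call. It uses Python's built-in `ast` module as a stand-in for a tree-sitter tree, and the function name `swap_first_two_args` is invented; it only handles a single-line snippet.

```python
import ast

def swap_first_two_args(source):
    """Swap the first two positional arguments of the outermost call on one line."""
    tree = ast.parse(source)
    # ast.walk is breadth-first, so the first Call found is the outermost one.
    call = next(n for n in ast.walk(tree) if isinstance(n, ast.Call))
    a, b = call.args[0], call.args[1]
    # col_offset/end_col_offset are UTF-8 byte offsets, so edit the encoded text.
    line = source.encode("utf-8")
    swapped = (
        line[:a.col_offset]
        + line[b.col_offset:b.end_col_offset]   # second argument moves first
        + line[a.end_col_offset:b.col_offset]   # separator between them is kept
        + line[a.col_offset:a.end_col_offset]   # first argument moves second
        + line[b.end_col_offset:]
    )
    return swapped.decode("utf-8")
```

The edit works on node boundaries rather than on commas, so a nested call such as `compute(h)` moves as one unit.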

yakubin

Cursor movement as well.

mickeyp

Author here. Yes, both are very useful, with some overlap of purpose, but work well together.

aidos

Thanks for the article. Even though I'm not active on the development side in either editor, I love the idea that people are toiling away on these same sorts of enhancements in both environments (and I get the benefit in neovim).

smcameron

> Semantic Bovinator

Heh. A long time ago I wrote a video game[1] somewhat similar to Williams Defender, and casting about for some sort of "theme" for the game, I hit upon the "editor wars", the ancient storied battle between vi and emacs. You are ostensibly "vi", (a little spaceship vaguely reminiscent of the Vipers from Battlestar Galactica) cruising through system memory, evading system processes, GDB instances, etc trying to recover your ".swp" files. How to represent Emacs? Obviously, via a giant blimp! and I could display all sorts of messages on the side of the blimp, singing the praises of Emacs, and disparaging fans of vi. And the Emacs blimp had a "memory leak", which meant that pieces of the xemacs source code would literally leak out of the back end of the blimp, with the letters floating lazily away, like smoke. So that meant I had to take a look at the xemacs source, dig through it and try to find some funny bits to put in. Of course, "semantic bovinate" jumped out at me.[2]

[1] https://github.com/smcameron/wordwarvi

[2] https://github.com/smcameron/wordwarvi/blob/master/wordwarvi...

krylon

That is gorgeous! Thanks for sharing!

dgellow

Check out the project page here: https://tree-sitter.github.io/tree-sitter/

Quite a lot of languages are already supported, it's really nice to see. I might have a use for such a library for a personal project :)

You can play around with the playground here: https://tree-sitter.github.io/tree-sitter/playground

kieckerjan

I suppose that these days I am one of the few professional programmers who has an active dislike of syntax highlighting. I find it immensely distracting. The only stuff I allow the highlighter to touch are my comments (I turn them bold) and I consider this a somewhat frivolous indulgence.

(I appreciate the complexity of the problem, btw)

catskul2

To each their own, and fortunately most (all?) editors allow such features to be turned off.

On the other hand, I find the "frivolous indulgence" perspective extremely obnoxious along with the related implication of moral or technical superiority of not using syntax highlighting.

As a side note, the way it helps many people who prefer it has some fascinating cog-psych underpinnings: https://en.wikipedia.org/wiki/Visual_search

Sometimes I wonder if those who don't prefer it might have some synesthesia which might allow their brain hardware to provide what the syntax highlighting does for the rest of us.

IshKebab

Yes it's pure egotism. "I don't need silly colours to code, not like those noobs."

You get the same attitude a lot for things like autocomplete and even for static typing.

orf

I guess it turns out actively depriving yourself of relevant information at a glance isn’t that popular, no.

mst

While I'm aware that OP and I are in a minority - there is a cognitive overhead to having that information surfaced at all times when you may be trying to focus on something at e.g. the method level rather than the individual syntactic element level, and if that cognitive overhead exceeds the utility of having that information available, the sensible answer is to turn the highlighting off.

If I could have some sort of focus follows mind where highlighting automatically happens commensurate to what level of granularity I'm currently thinking about the code at I would be extremely interested, but absent "focus follows mind" it's a trade-off that everybody has to make for themselves.

Some people prefer to highlight almost everything, some almost nothing, some people find it helpful for some languages/tasks but not for others.

It's similar IME to the extent to which preferred debugging styles (printf versus interactive versus hybrid versus situational choices) are also something people have to figure out, and, well, different people are different, and that's neither a bad thing nor an avoidable thing.

nusaru

I wonder if perhaps this is also a generational thing. Programmers from before syntax highlighting became popular would be less likely to prefer it, no? I’m not even sure if programmers from current/recent generations ever prefer not to have syntax highlighting, but I’m genuinely curious if there are such people out there.

azeirah

Not sure which editor I'm thinking about, but I do remember the exact feature you're describing implemented in one I used a while ago.

Ie, the paragraph (or block of code) your cursor is focused on is visible, the rest of the code is blurred out.

Kinrany

The IDE could temporarily turn off all syntax highlighting outside the node that has the cursor.

kieckerjan

The way I see it, most syntax highlighting is actively adding mostly irrelevant information to the cognitive load of programming: stuff that should be obvious if you know the language. It does as little for my understanding as a novel in which, say, every proper noun was printed in red.

I can imagine more useful highlighting than color coding the types of the symbols encountered. Lighting up the active scopes. Giving the same hue to names that look like each other. There are probably highlighters out there that do that. But "simple" syntax highlighting is still the norm.

orf

The same argument could be made for seeing anything in colour:

> most colours are actively adding irrelevant information to the cognitive load of existing. It should be obvious that apples are red and the sky is blue.

That’s silly, because it does add relevant information. Obviously it’s a spectrum - too many colours can hide information, but when used appropriately it’s fine.

Also everyone is different. Perhaps your brain gets distracted by the colours more than the majority of people.

mijoharas

> Lighting up the active scopes

As you had guessed a little later, there are a few different emacs packages that do this. One of them is "rainbow parentheses" that gives every bracket a different colour (remember that emacs supports lisp, so differentiating between lots of different parentheses is arguably more useful in emacs than any other editor). [0].

Another one is highlight parentheses [1] which highlights all parens that enclose the cursor position, and gives a darker colour to those "further away" from the cursor.

[0] https://github.com/Fanael/rainbow-delimiters

[1] https://sr.ht/~tsdh/highlight-parentheses.el/
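
The core of the second idea can be sketched in a few lines. This is not how either package is implemented; `enclosing_parens` is a hypothetical helper, and it naively ignores parentheses inside strings and comments.

```python
def enclosing_parens(text, cursor):
    """Return (open_index, close_index) pairs enclosing `cursor`, innermost first."""
    stack, pairs = [], []
    for i, ch in enumerate(text):
        if ch == "(":
            stack.append(i)
        elif ch == ")" and stack:
            start = stack.pop()
            # The cursor sits after the opening paren and at or before the closing one.
            if start < cursor <= i:
                pairs.append((start, i))
    return pairs
```

The position of each pair in the returned list is its distance from the cursor, so an editor could map index 0 to the brightest colour and fade from there.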

ReleaseCandidat

> Giving the same hue to names that look like each other.

Emacs' 'Rainbow Identifiers' does that. I like it.

https://github.com/Fanael/rainbow-identifiers
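
One plausible way to get a stable colour per identifier is to hash the name into a hue. This is a sketch of the general idea, not necessarily what the package does; `identifier_color` and its default saturation and brightness are assumptions.

```python
import colorsys
import hashlib

def identifier_color(name, saturation=0.5, value=0.9):
    """Map an identifier to a stable hex colour derived from a hash of its name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    hue = digest[0] / 256.0  # first hash byte picks a position on the colour wheel
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
```

Every occurrence of the same name gets the same colour. A plain hash does not make similar-looking names similar in hue, so that part of the wish would need a different mapping.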

turminal

Define relevant

AkshitGarg

While I don't fully disable syntax highlighting, I use a minimal theme [0,1] that only has highlighting for comments, strings and globals. It reduces eye strain for me, and I never find myself relying on highlighting to navigate through code.

LSPs provide an "outline" which can be very useful to navigate through code. I find "jump to symbol" function in my text editor to be faster than scanning all of the code to find the line.

Also most themes dim the comments, but IMO if something in the code needed an explanation, it should be brighter, not dimmer.

[0]: https://github.com/tonsky/sublime-scheme-alabaster

[1]: https://github.com/gargakshit/vscode-theme-alabaster-dark

llimllib

> Also most themes dim the comments, but IMO if something in the code needed an explanation, it should be brighter, not dimmer.

That makes me crazy! I use base2tone, which is not nearly as minimal as your theme but more than most, and I modify the comments to be bright.

chriswarbo

Syntax highlighting is pretty redundant. Some interesting alternative uses of colour information are given at https://buttondown.email/hillelwayne/archive/syntax-highligh... (e.g. colouring different scopes, or different imports)

I also like the idea of using colour to distinguish different identifiers, e.g. https://wordsandbuttons.online/lexical_differential_highligh...

https://medium.com/@evnbr/coding-in-color-3a6db2743a1e

https://zwabel.wordpress.com/2009/01/08/c-ide-evolution-from...

Derbasti

A few years ago I switched my color theme to something very simple, just as an experiment.

Somehow I never found a need to change that. I highlight comments, keywords, and strings. Comment and string highlights are helpful if they contain code-like text, to make them obviously not-code. Keywords give some structure to the text.

Everything else is frivolous to me. Books do not highlight verbs in green, either.

xyzzy_plugh

> Books do not highlight verbs in green, either.

While I will not argue with your general point -- I also don't really need highlighting and I read a lot of plaintext code -- I wonder about this.

Would this make languages easier for non-native speakers? Would it improve comprehension?

It's funny that the industry spends so much time on syntax highlighting for programming languages, when humanity's written languages are arguably more complex and difficult to parse and master.

unhammer

> Would this make languages easier for non-native speakers? Would it improve comprehension?

When I've been trying to learn languages, I can typically part-of-speech tag unknown words quite easily (common prefixes/suffixes/word length/sentence position give lots of information – and some of this is shared across languages as well). The comprehension difficulty is nearly always due to content words I haven't seen before (or have forgotten).

Derbasti

I think the point is that books do highlight things: Headlines, italics, Capitalization. Just not silly technicalities like parts of speech.

NoGravitas

Not to bikeshed on this, but I have a pretty strong preference for minimalist syntax highlighting. I'm currently using tao-themes in emacs: matching light and dark themes that are grayscale or sepiatoned, and mostly use character properties like bold or italic along with a few shades of gray. Much more calming than the usual "angry fruit salad on black" programmer themes, but also providing more intuitive information than no syntax highlighting.

jpe90

Thanks for the recommendation! I've been on the lookout for a good monochrome theme, this looks great to me w/ boxes off

harrisfarris

I feel the same way. Never understood what the point of highlighting certain keywords or if something is a type or a function would be, it's all obvious from the grammar and where things are positioned anyway. And when I read code I want to read all of it, not draw any particular attention to "if" or "else".

robert-brown

Keyword highlighting is explicitly called out as an antipattern in the book Human Factors and Typography for More Readable Programs, which I highly recommend.

brabel

I would normally respond that, as others have pointed out, you're basically saying you prefer to "hide" information that, to most people, is relevant (is this a keyword, a global or local variable, a type, a method, a static function...), but I've noticed that when I'm doing code review, using the shitty BitBucket interface which shows everything red or green, without any code highlighting, actually helps me a little bit to focus on the changeset as opposed to what the code is actually doing in general. This is helpful because the changeset is what I care about when doing a review (what's different than before is the first question; understanding what the code is actually doing comes second)... Later, I might need to look at the code in my IDE with proper highlighting to better understand what the changed code is actually doing in more detail, but that's rarely needed (unless it's comprehensive changes).

So, it occurred to me that whether syntax highlighting is actually useful depends somewhat on the context, what are you trying to do?!

I suppose it's easy to extend that realization to people who are different and might feel overloaded by information more easily, so I can sympathize with what you're saying (hope this doesn't sound condescending, I am just trying to say people can have very different cognition overload levels, regardless of how capable they actually are in general).

grenoire

I am in love with language servers, the quality of life improvement is just unreal.

5e92cb50239222b

Wait until you try a fully featured "real" IDE. The features language servers provide are only some of the many things that IDE users have had for literally decades.

omnicognate

It's kind of hilarious that programmers, who learn again and again the value of decoupling and cohesion, fell so hard for the idea of an Integrated Development Environment. There's nothing about syntactic/semantic code analysis, to pick one example, that requires it to be packaged along with a particular text editor in a single big blob.

Ironically, the most successful IDEs today, the Jetbrains ones, are demonstrations of this. They are built out of reusable components that are combined to produce a wide range of IDEs for different languages.

LSP and DAP aren't perfect, but they're a huge step in the right direction. There's no reason people shouldn't be able to use the editor of their choice along with common tooling for different languages. The fact that IDEs had (for a while) better autocomplete, for example, than emacs wasn't because of some inherent advantage an IDE has over an editor. It's because the people that wrote the language analysis tools behind that autocomplete facility deliberately chose to package them in such a way that they could only be used with one blessed editor. It's great to see the fight back against that, and especially so to see Microsoft (of all people) embracing it with LSP, Roslyn, etc.

gmueckl

The technical design isn't the user experience. IDEs are an integrated user experience. It literally doesn't matter to the user how nicely decoupled everything is or isn't under the hood if the end results are indistinguishable.

One point in favor of tight integration and against LSPs is that editing programs isn't like editing unstructured text at all and shouldn't be presented as such. There are tons of ways in which the IDE UX can be enhanced using syntactic and semantic knowledge of programs. Having a limited and standardized interface between the UI and a module providing semantic information will just hamper such innovation.

mst

DAP being the Debug Adapter Protocol described here? https://microsoft.github.io/debug-adapter-protocol/

azeirah

I think the primary reason why IDEs are generally better than maximally customised editors like vim, emacs, sublime or vscode and whatnot is pretty simply put: money.

People buy IDE -> money goes to improving the IDE -> IDE gets better

People download one of 6 competing open-source plugins -> a couple of people improve it a little -> 3 years pass, the author loses interest -> someone else reinvents the wheel, there are now 7 competing open-source plugins 3 of which are good but not maintained anymore.

Great features require time, I just don't see non-commercial work succeeding here.

Doesn't mean it's not possible to create fantastic commercial open-source standalone language tools, it's just not happening for some reason. Probably just because most businesses are still hesitant to open-source their core business?

grenoire

I'm old enough to have used IDEs, the issue is that my job involves dealing with multiple different languages and markup files. In turn, a general purpose editor with language servers just suits my workflow better.

smw

Try a jetbrains ide? They handle pretty much any language you can think of, including fun things like lua with ERB substitutions.

riffraff

surely any decent IDE will handle that nicely?

I used to work with Eclipse and it supported everything* through plugins just nicely.

* ..that I was using at the time: java, xml, python, html, jsp, javascript

tapia

For me, there is no IDE feature that can compete with the experience of editing in vim/neovim. When I use any other editor I just feel like I have a hand tied. The development of LSP and tree-sitter just makes the whole experience even better.

nick_

LSP is just a generalization of the implementations IDEs have had for decades.

Lio

Could you provide some concrete examples?

I ask because modern editors can do most things people often regard as IDE only but there is still the odd gem that’s worth hearing about.

sa46

I'm not familiar with state of the art for language servers but here's common IntelliJ refactors I use across Go and TypeScript (and Java a while ago):

- Add a parameter to a method signature and fill it in with a default value.

- Reorder method parameters.

- Extract a block of code to a method that infers the correct input and output types.

The most advanced refactoring I've done with IntelliJ is structural replace for Java which can do something like: for every private method matching the name "*DoThing" defined on a class that extends Foo, insert a log line at the beginning of the method: https://www.jetbrains.com/help/idea/structural-search-and-re...

I make heavy use of the "integrated" aspect of IntelliJ. One of the nicer benefits is that SQL strings are type-checked against a database schema.

colordrops

I want to customize every last element of my editor, have native VI bindings for everything, and run in a terminal. What IDE does that?

rowanG077

I have not ever had a good experience with an IDE. They are always bloated messes that try to force you into their shitty project structure.

aidos

Sure, but a lot of us don’t like that extra overhead. LSP is great in that it’s a tool you can tap into to use in a workflow that’s best for you. More of a library than an application (not technically, but in terms of how you use it).

IshKebab

VSCode is a "real" IDE.

> The features language servers provide are only some of the many things that IDE users have had for literally decades.

Yes of course, because that's what they were explicitly designed to do. The novel thing about language servers isn't that they enable code intelligence features like auto-complete and variable renaming. It's that they do so over a standard protocol that any editor or IDE (or website or CI system or ...) can use.

mickeyp

It's hard for an open source community to build features that compete with a commercial offering like that of Microsoft (or Borland in the 90s.)

And the reason for that is mostly down to fragmentation: the vim guys are doing their thing; the Emacs theirs, etc.

Now focus that energy into a singular project like a Language Server and the payout is likely to be many orders of magnitude greater.

turminal

Language server is a Microsoft offering

jicea

I'm a maintainer of a cli HTTP client with a plain text file format, Hurl [1]. I would like to begin adding support for various IDEs (VSCode, IntelliJ), starting with syntax highlighting, but I have a hard time getting started.

I struggle with many "little" details, for instance: syntax errors should be exactly the same in the terminal and in the IDE. Should I reimplement exactly the same parsing or should I reuse some of the cli tool's parser? If I reuse it, how do I implement things given that, for instance, IntelliJ plugins are written in Java/Kotlin, while VSCode plugins are JavaScript/TypeScript, and Hurl is written in Rust...

Very hard to figure it all out when it's not your core domain.

[1] https://hurl.dev

IshKebab

If it's something simple (and it sounds like it is) then I would strongly recommend just making a single parser library that you use in both the language server and CLI. That's what I've done for my RPC format.

I used Nom. Even though it's not incremental, parsing is easily fast enough to just reparse the entire document on each change.

An alternative is to just use Tree Sitter as your parser for the CLI too. You won't use the incremental parsing feature in the CLI but that's fine.

Supporting IntelliJ may be tricky but there is a WIP plugin that adds LSP support.
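
A sketch of the "single parser library" shape, so that the terminal and the editor report the same errors. The one-line `METHOD url` format below is a made-up toy, not Hurl's real syntax, and `SyntaxProblem`, `to_cli` and `to_lsp` are hypothetical names. The dictionary returned by `to_lsp` follows the LSP `Diagnostic` layout (zero-based positions, severity 1 for errors).

```python
from dataclasses import dataclass

@dataclass
class SyntaxProblem:
    line: int      # zero-based
    column: int    # zero-based
    message: str

METHODS = {"GET", "POST", "PUT", "DELETE"}

def parse(text):
    """Toy parser: every non-blank line must be '<METHOD> <url>'. Returns problems."""
    problems = []
    for lineno, line in enumerate(text.splitlines()):
        parts = line.split()
        if not parts:
            continue
        if parts[0] not in METHODS:
            problems.append(SyntaxProblem(lineno, 0, f"unknown method {parts[0]!r}"))
        elif len(parts) != 2:
            problems.append(SyntaxProblem(lineno, len(parts[0]), "expected exactly one URL"))
    return problems

def to_cli(path, problem):
    """Format a problem the way a terminal tool would (one-based positions)."""
    return f"{path}:{problem.line + 1}:{problem.column + 1}: error: {problem.message}"

def to_lsp(problem):
    """Convert a problem into an LSP Diagnostic dictionary (zero-based positions)."""
    pos = {"line": problem.line, "character": problem.column}
    return {"range": {"start": pos, "end": pos}, "severity": 1, "message": problem.message}
```

Both front ends call the same `parse`, so they cannot disagree about what counts as an error; only the presentation differs.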

billconan

https://microsoft.github.io/language-server-protocol/

rcshubhadeep

tree-sitter is a great framework. I have used it quite a bit in the past. I even created a small library on top of it, called tree-hugger (https://github.com/autosoft-dev/tree-hugger). Really enjoyed their playground as well.

IshKebab

> The reason (most) LSP servers don’t offer syntax highlighting is because of the drag on performance. Every keystroke you type must be sent to the server, processed, a partial tree returned, and your syntax highlighting updated. Repeat that up to 100 words per minute (or whatever your typing speed is) and you’re looking at a lot of cross-chatter that is just better suited for in-process communication.

While I agree... he might be surprised to know that that is what all language servers do anyway, even if they don't provide syntax highlighting. Every keystroke gets sent over the LSP. As JSON. It's amazing it works as well as it does.
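
For a sense of what goes over the wire, here is a sketch that builds one `textDocument/didChange` notification for a single typed character and frames it with the `Content-Length` header the LSP base protocol uses. The helper names `frame` and `did_change` and the example URI are assumptions; the sketch also assumes the server negotiated incremental sync, so only the changed range is sent.

```python
import json

def frame(message):
    """Encode a JSON-RPC message with the LSP base-protocol header."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return b"Content-Length: %d\r\n\r\n" % len(body) + body

def did_change(uri, version, line, character, text):
    """Build a didChange notification that inserts `text` at a zero-based position."""
    pos = {"line": line, "character": character}
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            # An empty range (start == end) means a pure insertion.
            "contentChanges": [{"range": {"start": pos, "end": pos}, "text": text}],
        },
    }

wire = frame(did_change("file:///tmp/main.rs", 2, 0, 5, "x"))
```

One keystroke becomes a couple of hundred bytes of JSON, which is small next to the analysis the server then performs.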

0x008

Not coming from the vim/Emacs world, I fail to understand what tree-sitter is compared to a language server. Why would I need both?

dbalatero

The article talks about why LSP servers don't typically implement syntax highlighting (performance).

petepete

Here's a great video on the topic by one of Neovim's core team

https://www.youtube.com/watch?v=c17j09vY5sw

0x008

Thanks, that is very helpful.

I was wondering whether it makes sense to run them both if they achieve similar goals, but now I can educate myself.

deriramdani

Momok
