chearon
nicoburns
We have a decent chunk of layout and paint implemented in an HTML renderer I'm working on (https://github.com/DioxusLabs/blitz), which targets the "Electron" use case (but with a Rust scripting interface rather than a JS one).
The implementation is currently very immature, with a lot of bugs and missing features: I only got a first cut of inline layout working yesterday, although flexbox and grid are already implemented. Even so, we're already seeing pretty decent results on a bunch of real-world web pages, and we hope to be able to render most of the web (excluding JS) in the next 6-12 months.
There are some screenshots on the PR for the inline layout branch: https://github.com/DioxusLabs/blitz/pull/63
yencabulator
Sometimes it's really hard to tell the exact boundary between present-day software development and elaborate jokes:
> Blitz builds upon:
> Parley for text/inline-level layout
> Currently, Parley directly depends on four crates: Fontique, Swash, Skrifa, and Peniko.
> Peniko builds on top of kurbo
bratao
We have been using https://github.com/rushter/selectolax as a faster alternative to BeautifulSoup with html5lib because many malformed webpages in the wild don't work with lxml.
nwellnhof
The problem is that libxml2's 20-year-old HTML parser never supported HTML5 [1], which causes more and more problems for downstream consumers like lxml, PHP or Nokogiri. PHP recently switched to Lexbor [2] and Nokogiri to libgumbo [3]. That said, I'm hopeful that I'll receive enough funding to implement an HTML5 parser in libxml2.
[1] https://gitlab.gnome.org/GNOME/libxml2/-/issues/211
postepowanieadm
libxml is an XML parser; HTML5 is not XML.
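To make that concrete, here is a minimal sketch using only the Python standard library (not libxml2 or Lexbor). The snippet and the helper names `parses_as_xml` and `html_start_tags` are made up for illustration. Note that `html.parser` is just a lenient tokenizer, not a spec-compliant HTML5 tree builder, so this only shows that strict XML parsing rejects markup that HTML5 allows.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Valid HTML5: <br> is a void element and the closing </p> may be omitted.
SNIPPET = "<p>Hello<br>world"


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record the start tags reported by the lenient stdlib HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup):
    """Return the start tags found in the markup (hypothetical helper)."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    print(parses_as_xml(SNIPPET))                   # strict XML view of the HTML5 snippet
    print(parses_as_xml("<p>Hello<br/>world</p>"))  # the same content written as XML
    print(html_start_tags(SNIPPET))                 # lenient HTML view
```

The XML parser needs every element closed, so the first snippet is an error rather than a document, while an HTML tokenizer reads it without complaint.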
tedunangst
It's a bit late to be saying that to people already using libxml because "It should be able to parse "real world" HTML." https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2...
sgc
Speaking of which, I don't understand why not. It seems like it would have been trivial to keep HTML5 as true XML, and I do not understand what the actual technical reason for not doing that was. Naively, it just seems like breaking compatibility out of disdain rather than actually useful progress. Saving a couple of characters every once in a while does not justify the change, so I presume there must be a better reason?
thomasfromcdnjs
Ah this answers my question in another comment.
Thanks!
hliyan
Rarely does one see a C++ quick start guide that's actually this quick: https://lexbor.com/docs/lexbor/#quick_start
lelanthran
> Rarely does one see a C++ quick start guide that's actually this quick: https://lexbor.com/docs/lexbor/#quick_start
Could be because it isn't C++?
zamadatix
Step 1 is a bit of a "draw the rest of the owl" step in that it's either done for you on your specific platform with default settings already or you have to go do all of the actually hard stuff of building the app (and sure enough that's where the typical cmake build step is hidden as well). Step 2 is just "and remember to link your code against the hard part when you compile it, by the way here's a single minimal example".
Maxatar
Step 1 is:
cmake .
make
make install
boxed
C, not C++
hartator
We open sourced our Ruby bindings and port:
- https://github.com/serpapi/nokolexbor
- https://serpapi.com/blog/nokolexbor-a-performance-focused-ht...
It is super fast compared to Nokogiri with libxml.
thomasfromcdnjs
Inspiring infrastructure.
The module aspect is super cool. Is there much adoption by other projects using the individual modules, e.g. a web parser using the DOM module?
troupo
Quite unusual to see Elixir among languages supported via bindings
lelanthran
> Quite unusual to see Elixir among languages supported via bindings
Not due to difficulty, usually. Bindings to non-mainstream languages are unusual to see.
I never heard of a language that couldn't interface to C in one way or another; it's one of the advantages of using C over (say) C++.
The title made me think this could actually lay out and paint HTML, but I couldn't find anything remotely layout-related in the source tree. Then I found this comment saying even block sizing isn't done: https://github.com/lexbor/lexbor/issues/219#issuecomment-207.... It looks like nice groundwork, though. It's nice to see things like parsing and Unicode being part of the same source tree.