WebVM: Server-less x86 virtual machines in the browser

Daily Digest email

Get the top HN stories in your inbox every day.

williamstein

The JIT compilation to WebAssembly they are doing with WebVM is pretty cool!

I didn't see any benchmarks on the linked page. I tried their sample Fibonacci program, but going up to 100000 and timing ONLY the actual execution (using Python's time module) so that startup time was excluded. WebVM took only 6.7 times as long as native for me. That's very impressive.

There's a similar open source project called https://copy.sh/v86/. Using their Arch Linux image with the exact same Fibonacci benchmark, it took 44 times as long as native.
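
The methodology above (time only the computation, not interpreter or VM start-up) can be sketched in JavaScript for Node.js, which WebVM ships. This is a minimal illustration under assumptions: WebVM's own sample is a Python script whose contents are not shown here, `fibonacci` and `timeIt` are hypothetical helpers, and "up to 100000" is read as computing the 100000th Fibonacci number.

```javascript
// Compute F(n) with BigInt: Numbers exceed Number.MAX_SAFE_INTEGER after F(78).
function fibonacci(n) {
  let a = 0n;
  let b = 1n;
  for (let i = 0; i < n; i++) {
    [a, b] = [b, a + b];
  }
  return a;
}

// Time only the callback, so process start-up is not part of the measurement.
function timeIt(label, fn) {
  const start = performance.now(); // monotonic, sub-millisecond clock
  const result = fn();
  const elapsed = performance.now() - start;
  console.log(`${label}: ${elapsed.toFixed(1)} ms`);
  return result;
}

const f = timeIt("fibonacci(100000)", () => fibonacci(100000));
console.log(`F(100000) has ${f.toString().length} decimal digits`);
```

Running the same file natively and inside WebVM, then comparing only the printed times, gives a ratio that leaves out VM boot and interpreter start-up.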

Recursing

As a smoke test, I tried running `time python3 -c 'print(max(range(2*10**7)))'`

It's ~10x faster on webvm.io than on copy.sh/v86 and only ~20x slower than native. Impressive stuff.

hughrr

20x slower than native suggests this solution has its own ozone layer hole.

syrusakbary

Thinking more about this.

I'd love to try V8 there, so we can benchmark V8 in WebVM against native JS in the browser... all using the same engine (it's a bit meta, isn't it?)

apignotti

Well, nodejs uses V8 and it's installed

syrusakbary

Exciting... trying it as I type!

Edit: I'm trying to run the following benchmark [1]:

    function mySlowFunction(baseNumber) {
      console.time('mySlowFunction');
      let result = 0;
      for (let i = Math.pow(baseNumber, 7); i >= 0; i--) {
        result += Math.atan(i) * Math.tan(i);
      }
      console.timeEnd('mySlowFunction');
    }

    mySlowFunction(8); // higher number => more iterations => slower

Results: 99 ms in my Chromium browser (V8 JIT enabled). In WebVM (after typing `node` and pressing Enter) it breaks with `TODO: FAULT af5147bf / CODE da d9 83`.

Results (without console.time): 99 ms in my Chromium browser (V8 JIT enabled), 680 ms in WebVM (1050 ms on the first run).

Conclusion: Node in WebVM is only 7 times slower than native V8 on my local machine (macOS, M1 Max).

[1]: https://gist.github.com/sqren/5083d73f184acae0c5b7

benou

I timed a very simple loop in C: "for (volatile int i=0; i<N; i++);" (a handful of arithmetic, compare, and branch instructions) with N=1e9, and it ran at 70% of native speed, which looks really good. I'd love to see LINPACK now :)

syrusakbary

Thanks for the benchmarks! I was curious about timing.

Python is also a bit tricky because it does things with pointers that I believe are hard to optimize (or maybe not, who knows!). Have you tried other languages/programs?

westurner

Is WebVM a potential solution to "JupyterLite doesn't have a bash/zsh shell"? The current Pyodide CPython Jupyter kernel takes ~25 s to start at present, and it can load Python packages precompiled to WASM, or unmodified Python packages with micropip: https://pyodide.org/en/latest/usage/loading-packages.html#lo...

Does WebVM address workload transparency, CPU overutilization by one tab, or end-to-end code signing, maybe with W3C ld-proofs and whichever future-proof signature algorithm with a URL?

miohtama

The VM cannot have a full TCP/IP stack, so any data research tasks are likely to need special code paths and support for downloads. No SQL databases, etc.

westurner

From "Hosting SQLite Databases on GitHub Pages" https://news.ycombinator.com/item?id=28021766 https://westurner.github.io/hnlog/#comment-28021766 :

DuckDB can query [and page] Parquet from GitHub, sql.js-httpvfs, sqltorrent, File System Access API (Chrome only so far; IDK about resource quotas and multi-GB datasets), serverless search with WASM workers

https://github.com/phiresky/sql.js-httpvfs :

> sql.js is a light wrapper around SQLite compiled with EMScripten for use in the browser (client-side).

> This [sql.js-httpvfs] repo is a fork of and wrapper around sql.js to provide a read-only HTTP-Range-request based virtual file system for SQLite. It allows hosting an SQLite database on a static file hoster and querying that database from the browser without fully downloading it.

> The virtual file system is an emscripten filesystem with some "smart" logic to accelerate fetching with virtual read heads that speed up when sequential data is fetched. It could also be useful to other applications, the code is in lazyFile.ts. It might also be useful to implement this lazy fetching as an SQLite VFS [*] since then SQLite could be compiled with e.g. WASI SDK without relying on all the emscripten OS emulation.
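
The "virtual read head" idea in the quote can be sketched as follows. This is a hypothetical, simplified reader written for illustration, not the code in lazyFile.ts: it keeps one buffered window, doubles the prefetch size while reads stay sequential (capped at `maxChunk`), and drops back to `baseChunk` on a random jump. `LazyRangeReader` and `httpRangeFetcher` are names invented here, and the fetcher assumes a server that honours `Range` headers with `206 Partial Content`.

```javascript
// One "virtual read head" over a remote file fetched with HTTP Range requests.
class LazyRangeReader {
  constructor(fetchRange, fileSize, baseChunk = 4096, maxChunk = 1 << 20) {
    this.fetchRange = fetchRange; // (start, endInclusive) => Promise<Uint8Array>
    this.fileSize = fileSize;
    this.baseChunk = baseChunk;
    this.maxChunk = maxChunk;
    this.chunk = baseChunk; // current prefetch size in bytes
    this.window = null;     // last fetched span: { start, end (exclusive), bytes }
    this.requests = 0;      // number of Range requests issued so far
  }

  async read(offset, length) {
    if (offset >= this.fileSize) return new Uint8Array(0);
    const w = this.window;
    if (w !== null && offset >= w.start && offset + length <= w.end) {
      return w.bytes.subarray(offset - w.start, offset - w.start + length); // buffer hit
    }
    // A read starting inside or right after the buffer counts as sequential.
    const sequential = w !== null && offset >= w.start && offset <= w.end;
    this.chunk = sequential ? Math.min(this.chunk * 2, this.maxChunk) : this.baseChunk;
    const end = Math.min(offset + Math.max(length, this.chunk), this.fileSize);
    const bytes = await this.fetchRange(offset, end - 1); // HTTP byte ranges are inclusive
    this.requests += 1;
    this.window = { start: offset, end, bytes };
    return bytes.subarray(0, Math.min(length, end - offset));
  }
}

// Fetcher for a real static host that supports Range requests.
function httpRangeFetcher(url) {
  return async (start, endInclusive) => {
    const res = await fetch(url, { headers: { Range: `bytes=${start}-${endInclusive}` } });
    if (res.status !== 206) throw new Error(`expected 206 Partial Content, got ${res.status}`);
    return new Uint8Array(await res.arrayBuffer());
  };
}
```

With 4 KiB pages read in order from offset 0, seven page reads cost three requests (4 KiB, then 8 KiB, then 16 KiB), which is the speed-up on sequential access the quote describes; a jump elsewhere costs one small request again.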

westurner

Also, I'm not sure whether jupyterlab/jupyterlab-google-drive works in JupyterLite yet. Is it possible yet to save notebooks and other files from JupyterLite running in WASM in the browser to one or more cloud storage providers?

https://github.com/jupyterlab/jupyterlab-google-drive/issues...

https://github.com/jupyterlite/jupyterlite/issues/464

Klasiaster

The VM could have its own TCP/IP stack, possibly with a SLIRP layer for translation of connections to the outside. Internet connectivity can be done by limiting it to AJAX, or forwarding the packets to a proxy (something like http://artemyankov.com/tcp-client-for-browsers/), or including a Tor client that connects to a Tor bridge, etc.

westurner

Is all of that necessary to LD_PRELOAD sockets and tunnel them over WebSockets, WebRTC, etc?

So, e.g., curl doesn't work without (the File System Access API,) local storage, and translation of at least curl's normal syscalls to just HTTP/3?

marwis

StackBlitz' WebContainers have an in-browser TCP/IP stack, I think from MirageOS.

dboreham

How does it get raw ip packets out from inside the browser?

mahoro

Wow, this is amazing!

Now we could create a /dev/dom virtual device, and write dynamic web pages in pure bash. I love this.

colejohnson66

Or a DOMFS in /dom that's organized in the same hierarchy as the browser DOM. For example, to write a whole page:

    echo "...." > /dom

Update the <title> tag:

    echo "TITLE" > /dom/html/head/title

Change the charset:

    echo "EBCDIC" > /dom/html/head/meta[1].charset  # second <meta> tag
    echo "EBCDIC" > /dom/html/head/1.charset        # second child of <head>

Even go full XPath and replace a tag's inner HTML:

    echo "<div>abc</div>" > /dom/[@id='myID']

This is a horrible idea...

eurasiantiger

Oracle Acquisitions team would like to discuss a business transaction.

antocv

This comment is worse and more horrible than the parent.

asplake

XPath? What's wrong with the find command?

colejohnson66

The way I envisioned it was that attributes are also files themselves, and the contents contain the values. So:

    // given:
    <div id="myID">

    // `id` attribute is located in
    /dom/html/body/…/div[n].id

So unless you look at the contents of the files, you wouldn’t be able to find a certain ID. Because of that, DOMFS (when XPath is enabled) would expose that same "file" at `/dom/[@id='myID']` as well.

I guess you could do something like this?

    grep "myID" /dom/**/*.id
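
The path scheme from these examples can be sketched as a resolver over a toy in-memory tree, treating elements as directories and attributes as files. Everything here is hypothetical (DOMFS does not exist): `el`, `step`, and `resolveDomPath` are illustrative names, indexes are zero-based as in the `meta[1]` comment, and XPath aliases such as `/dom/[@id='myID']` are not handled.

```javascript
// Toy element node: { tag, attrs, children }.
const el = (tag, attrs = {}, ...children) => ({ tag, attrs, children });

// One path segment: "meta" (first <meta> child), "meta[1]" (second <meta>
// child), or "1" (second child of any tag). Returns null if absent.
function step(node, segment) {
  if (/^\d+$/.test(segment)) return node.children[Number(segment)] ?? null;
  const match = /^([a-z][a-z0-9-]*)(?:\[(\d+)\])?$/i.exec(segment);
  if (!match) throw new Error(`unsupported segment: ${segment}`);
  const [, tag, index = "0"] = match;
  const sameTag = node.children.filter((c) => c.tag === tag.toLowerCase());
  return sameTag[Number(index)] ?? null;
}

// "/dom/html/head/meta[1].charset" -> attribute value; without ".attr" -> element.
function resolveDomPath(documentNode, path) {
  const prefix = "/dom/";
  if (!path.startsWith(prefix)) throw new Error(`not a DOMFS path: ${path}`);
  const segments = path.slice(prefix.length).split("/");
  const last = segments.pop();
  const dot = last.indexOf(".");
  const attribute = dot === -1 ? null : last.slice(dot + 1);
  segments.push(dot === -1 ? last : last.slice(0, dot));

  let node = documentNode;
  for (const segment of segments) {
    node = step(node, segment);
    if (node === null) return null;
  }
  if (attribute === null) return node;
  return Object.prototype.hasOwnProperty.call(node.attrs, attribute) ? node.attrs[attribute] : null;
}

const page = el("#document", {},
  el("html", { lang: "en" },
    el("head", {},
      el("meta", { charset: "utf-8" }),
      el("meta", { name: "viewport", content: "width=device-width" }),
      el("title")),
    el("body", {}, el("div", { id: "myID" }))));

console.log(resolveDomPath(page, "/dom/html/head/meta[1].name")); // viewport
console.log(resolveDomPath(page, "/dom/html/head/1.name"));       // viewport
```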

But why would you even use this “DOMFS”?

syrusakbary

This is awesome. Really. Props to all the Leaning Tech team (creators of Cheerp, an alternative to Emscripten)!

I believe it will be possible to achieve something similar in the future using just native Wasm/WASI (so no x86 -> Wasm transpilation would be needed), but we are far from that given how slowly the WASI standards move.

The shell is impressive: https://webvm.io/ (it only downloads ~5 MB of resources for a full Debian distro)

apignotti

Thanks, appreciated.

By the way, it's spelled "Cheerp", with a lowercase p :-)

syrusakbary

Corrected!

iFire

License check without commentary.

https://github.com/leaningtech/webvm/commit/6efab7e60bf6f173...

parksy

The repo doesn't contain the actual distro itself; it appears to load CheerpX's virtualisation engine and feed it a disk image here https://github.com/leaningtech/webvm/blob/6efab7e60bf6f173a2...

Assuming they wrote their own xterm interface (no idea if they did; that's as far as I got), it seems everything open-source is fetched by the client at runtime. This feels to me more like a bootloader than an OS. I'm not sure where that leaves it license-wise, i.e. whether merely linking to the image requires appropriate licensing and attribution, but either way the work seems pretty straightforward to replicate, assuming you have (or can supply) an xterminal-esque interface and can compile your OS image appropriately.

I don't think they're doing anything wrong licensing-wise but I guess it depends on how the law defines including software as a library, whether that needs to happen at compile time or run time, or whatever. Seems like a grey area?

apignotti

Hello HN, author of the post here, happy to answer questions.

brian_herman

How does this compare to https://bellard.org/jslinux/tech.html? https://bellard.org/jslinux/

apignotti

Perf. Our JIT is extremely advanced. Of course different workloads will behave differently, but you are welcome to try multiple payloads and see for yourselves.

mhh__

What constitutes advanced?

easrng

And also https://copy.sh/v86

remisharrock

And jor1k (a JavaScript OpenRISC processor emulator). Well, these emulate processors (x86, OpenRISC...) in JavaScript, while WebVM executes (transpiled?) code in WebAssembly.

Klasiaster

So, this is a reimplementation of the Linux ABI and no Linux kernel source is involved, right?

apignotti

That's correct.

Klasiaster

Have you tried compiling Linux as User Mode Linux with emscripten? I imagine something like this https://github.com/nabla-containers/nabla-linux would run on wasm, too?

arturventura

Hey dude, I've been screwing around implementing Plan 9 semantics in an OS-like system for the browser (https://github.com/intigos/possimpible). I'm interested in running an x86 emulator inside the web worker I use for processes, so I can run x86 code. How hard is something like this? Can you give me some pointers on how to start working on it? Thanks!

btdmaster

It doesn't seem to work with my aggressive blocking settings. It does work with a fresh Firefox profile, though, so it's not clear exactly what the issue is. I know for sure that the ext2 image is never actually downloaded (0-byte response), and when I try to inspect anything in DevTools, cxcore.wasm triggers a pause on a debugger statement, which spikes the CPU.

Any chance there could be a version with all the assets in one place (say, GitHub Pages)?

easrng

Is there support for loopback networking (for IPC)? Is there a way to translate HTTP(S) requests to `fetch` requests? How difficult would it be to port a Go app that uses https://github.com/pion/webrtc to use the browser's native WebRTC?

apignotti

HTTP requests could be intercepted, but due to CORS they would most likely not succeed. I have not studied the WebRTC protocol in detail, but it might be possible.

bonzini

What performance bottlenecks are there that can still be improved?

Right now I see a ~6x slowdown on small-code benchmarks like sieve, but it goes up to 50x or more for large code like GCC. For QEMU it's roughly 1.6x and 2-3x respectively, so it seems like your JIT is slower than QEMU's.

PeterisP

Is this tech a reasonable approach to cross-platform emulation, e.g. an x86 VM running in a browser on ARM hardware like the new Apple chips?

In essence, for this approach, would the x86-on-x86 performance hit be similar or very different than the x86-on-ARM performance hit?

apignotti

The actual host hardware does not matter, in terms of either support or performance, theoretically. We currently optimize the codegen for Chrome/V8 on x86.

xmly

How about the network stack? Can the VM talk to VMs in other browsers?

apignotti

Not in the current implementation, but absolutely possible with WebRTC. We have done something equivalent some time ago: https://medium.com/p/29fbbc62c5ca

neurostimulant

> HTTP servers (microservices): By combining Service Workers with virtualized TCP sockets, we will allow HTTP servers inside the WebVM to be reachable from the browser itself. A JavaScript API will be exposed allowing to connect an iframe to the internal servers. This is mostly supported already, although not exposed on the current demo page. A similar solution could be used to support REST microservices as well since Service Workers also handle fetch/XHR requests.

I wonder if this can be used to create a semi-decentralized website where visitors are automatically served a VM to run, turning them into edge servers that offload requests from other visitors. The more active visitors, the more edge servers you have. Infinite scaling on the cheap! The visitors may not like you abusing their browsers, though, but there might be use cases where this is acceptable, such as popular community-run websites that are too expensive to run due to huge amounts of traffic.
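
The mechanism in the quoted passage (a Service Worker answering requests on behalf of servers inside the VM) might look roughly like the sketch below. WebVM's actual JavaScript API is not described in the thread, so the `/__vm/<port>/` URL prefix, `parseVmRequestUrl`, and `forwardToVm` are all hypothetical; `forwardToVm` is a stub where a VM-specific bridge to the virtual TCP socket would go. The `fetch` event and `event.respondWith` are the standard Service Worker API.

```javascript
// Hypothetical URL scheme: /__vm/<port>/<path> targets a server inside the VM.
const VM_PREFIX = "/__vm/";

function parseVmRequestUrl(urlString) {
  const url = new URL(urlString);
  if (!url.pathname.startsWith(VM_PREFIX)) return null;
  const rest = url.pathname.slice(VM_PREFIX.length); // "<port>/<path>"
  const slash = rest.indexOf("/");
  const portText = slash === -1 ? rest : rest.slice(0, slash);
  if (!/^\d{1,5}$/.test(portText)) return null;
  const port = Number(portText);
  if (port < 1 || port > 65535) return null;
  const path = (slash === -1 ? "/" : rest.slice(slash)) + url.search;
  return { port, path };
}

// Hypothetical bridge: a real version would write the request as HTTP/1.1
// bytes to the VM's virtual socket on `port` and parse the reply.
async function forwardToVm(port, path, request) {
  return new Response(`no VM bridge for ${request.method} ${path} on port ${port}`, { status: 502 });
}

// Register the handler only when running as a Service Worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("fetch", (event) => {
    const target = parseVmRequestUrl(event.request.url);
    if (target === null) return; // not for the VM: fall through to the network
    event.respondWith(forwardToVm(target.port, target.path, event.request));
  });
}
```

In this hypothetical scheme, an iframe pointed at `/__vm/8080/` on a page controlled by the worker would be answered by whatever listens on port 8080 inside the VM.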

geocar

> I wonder if this can be used to create a semi-decentralized website where visitors automatically served a vm to run, turning them into an edge server to offload requests from other visitors. The more active visitors, the more edge servers you have. Infinite scaling on the cheap! The visitors may not like you abusing their browser though, but there might be use case where this is acceptable, such as popular community run websites that too expensive to run due to huge amount of traffic.

You can buy ads. Ads cost about as much as ads, so you can buy and sell the same unit, and then run some compute for free.

I ported k(5) to html+js (ecmascript) during Iverson College (I think this was 2014?) and used webtorrent to connect to secondaries to run a scale test. The cpus are cheap, and they are slow, but it was a lot of (distributed) fun. I pitched the idea to KX (and a few others) to sell compute for fractions-of-pennies-per-hour but I think it was still a little early.

Do you think Now's the time?

neurostimulant

Hold on, so you buy an ad slot, serve your code in that ad slot, AND serve another ad inside that slot to resell? Is that actually allowed by ads provider?

The idea of running a VM inside an ad to harvest compute from unsuspecting visitors... I think this might accelerate widespread use of adblockers even more if you successfully deploy it in the wild, because people will notice that ads are getting heavier.

geocar

> Hold on, so you buy an ad slot, serve your code in that ad slot, AND serve another ad inside that slot to resell?

Exactly.

> Is that actually allowed by ads provider?

On some ad networks it's prohibited by ToS and (in some cases) a review process, but it is extremely difficult to prevent in practice, especially if you have any understanding of how this works. I estimate perhaps as much as 50-90% of Google's adsense revenue comes from this, so they aren't (directly) incentivized to stop it.

> The idea of running vm inside an ad to harvest compute from unsuspecting visitors... I think this might accelerate widespread use of adblockers even more if you successfully deploy this in the wild because people will notice ads are getting heavier.

Perhaps, but people also have a lot of idle cores, so if you don't block networking and you monitor system performance carefully to ensure you don't affect things, for the most part people simply won't notice.

Quite a few people have been caught out doing wasteful things like trying to generate "coin", which definitely doesn't help, but there's also some interesting applications that have been run on volunteer-cpu-time (folding, seti, etc), so it seems plausible with some charitable examples and some care to avoid impact, this might be a doer?

foota

I don't see how these could be exposed to another person's browser as an http site, my reading of this is that they are exposed to the user's environment running inside WebVM.

To my knowledge, the only way one browser can talk to another is through peer-to-peer WebRTC, but that requires a handshake.

neurostimulant

Rendering websites over WebSocket instead of HTTP is already possible these days with various frameworks (e.g. Phoenix.LiveView), right? It might be possible to add WebRTC as a transport alongside WebSocket. The initial connection is served from a centralized server, just enough to render the initial page and initiate the WebRTC handshake. After that, subsequent page navigation is handled over WebRTC by available peers. The VM also gets downloaded over this channel, turning the visitor into a node that expands the network.

foota

Yeah, I was thinking about this, but at that point there's not much reason to prefer a vm over just writing a server that's compiled to wasm, if you're going to have to write something that renders via javascript instead of serving http?

m3at

Sibling comments have mentioned web torrents, you could also look at IPFS for something related to what you're describing (without the abuse) https://ipfs.io/

NavinF

If you just want to serve static files between browsers, you can do that with WebTorrent. I also searched “webrtc rpc” and found another piece of the puzzle.

I suspect that running arbitrary computations on peer browsers is uncommon today because verifying the output of an RPC executed on hostile machines is nontrivial unless you’re serving static files with a known hash, mining crypto, or solving an NP problem. I guess you’d also need a quota system to prevent users from taking over your botnet’s CPU time by spamming peer browsers with expensive RPCs.
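
The two safeguards mentioned above can be sketched in Node.js, as an illustration under assumptions rather than a complete protocol. `verifyChunk` accepts peer-served bytes only if their SHA-256 matches a digest the origin published, which covers the "static files with a known hash" case; `TokenBucket` is one common way to implement the per-peer quota. Both names are hypothetical, neither helps verify arbitrary computation results, and in a browser `crypto.subtle.digest("SHA-256", data)` (which returns a Promise) would replace Node's `createHash`.

```javascript
const { createHash } = require("node:crypto");

// Accept data from an untrusted peer only if it matches the published digest.
function verifyChunk(bytes, expectedSha256Hex) {
  const actual = createHash("sha256").update(bytes).digest("hex");
  return actual === expectedSha256Hex.toLowerCase();
}

// Per-peer quota: each RPC spends `cost` tokens; tokens refill continuously
// at `refillPerSec` up to `capacity`, so bursts are bounded.
class TokenBucket {
  constructor(capacity, refillPerSec, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now; // injectable clock (milliseconds), handy for testing
    this.tokens = capacity;
    this.last = now();
  }

  tryTake(cost = 1) {
    const t = this.now();
    const refilled = ((t - this.last) / 1000) * this.refillPerSec;
    this.tokens = Math.min(this.capacity, this.tokens + refilled);
    this.last = t;
    if (this.tokens < cost) return false; // over quota: refuse the request
    this.tokens -= cost;
    return true;
  }
}
```

Keeping one `TokenBucket` per peer id means a single peer spamming expensive RPCs exhausts only its own budget.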

raggi

`su` sends the VM into a spin on a single core; is it going to get anywhere?

Is there anywhere that documents the covered/uncovered syscall surface?

zekrioca

Yeah, happened to me as well..

didip

I always wonder whether someone will eventually run Docker containers in the browser. It would make tons of experimentation easy.

mrtesthah

is this some kind of joke to make computing as slow as possible?

koolba

If you thought web pages were bloated before, just wait until they download an entire Ubuntu image on every page load.

emteycz

Some people don't care that it's slow. Availability and uniformity are much more valuable, for example in a school environment, especially one where they teach IT one hour per week and the teacher is not really a programmer.

geuis

Yes but...

The more likely case is you get corporate and military clients who adopt this for security (ya know, load a known 0-state image) to check email (which is loading some old version of outlook from the image), and it ends up taking the entire workday before you can briefly use your system.

Basically giving a take on the recent https://www.airforcemag.com/fix-my-computer-cry-echos-on-soc...

didip

I can see why running the Docker daemon in the browser can be seen as excessive.

But running a Docker container doesn't require dockerd. One can flatten the filesystem and have systemd run the container directly.

dboreham

It'd only be a joke if it ran a VAX emulator in that Docker container, then ran VMS on that emulator, then ran the RSX-11 emulator on VMS, and then ran Adventure on that RSX-11.

indigodaddy

This would actually be super useful on Chromebook just as an alternative ssh client as I’m not super happy with existing ssh client solutions on Chromebook— especially that chrome ssh extension thingy which sucks pretty hard imo.

Note I don’t like to put my Chromebooks in developer mode or do the crouton stuff or whatever the latest is on that front..

EDIT, nm it has no tcp stack or outbound connectivity

yonixw

If you already simulate linux on linux why not transfer TCP over HTTPS over TCP using a tunnel server?

https://stackoverflow.com/questions/14080845/tunnel-any-kind...
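
Tunnelling a single TCP stream over HTTPS or a WebSocket is straightforward; carrying several connections over one tunnel needs framing so the relay knows which bytes belong to which connection. A minimal sketch, assuming a made-up frame layout (stream id, type, payload length, all big-endian) rather than any existing tunnel protocol. `encodeFrame`, `FrameDecoder`, and `FrameType` are hypothetical names, and the relay that turns `OPEN` frames into real TCP connections is not shown.

```javascript
// Hypothetical layout: u32 streamId | u8 type | u32 payloadLength | payload
const FrameType = Object.freeze({ OPEN: 1, DATA: 2, CLOSE: 3 });
const HEADER_BYTES = 9;

function encodeFrame(streamId, type, payload = new Uint8Array(0)) {
  const frame = new Uint8Array(HEADER_BYTES + payload.length);
  const view = new DataView(frame.buffer);
  view.setUint32(0, streamId); // DataView writes big-endian by default
  view.setUint8(4, type);
  view.setUint32(5, payload.length);
  frame.set(payload, HEADER_BYTES);
  return frame;
}

// Incremental decoder: transport messages may split or merge frames,
// so bytes are buffered until a complete frame is available.
class FrameDecoder {
  constructor() {
    this.buffer = new Uint8Array(0);
  }

  push(chunk) {
    const merged = new Uint8Array(this.buffer.length + chunk.length);
    merged.set(this.buffer, 0);
    merged.set(chunk, this.buffer.length);
    this.buffer = merged;

    const frames = [];
    while (this.buffer.length >= HEADER_BYTES) {
      const view = new DataView(this.buffer.buffer, this.buffer.byteOffset, this.buffer.byteLength);
      const length = view.getUint32(5);
      if (this.buffer.length < HEADER_BYTES + length) break; // wait for more bytes
      frames.push({
        streamId: view.getUint32(0),
        type: view.getUint8(4),
        payload: this.buffer.slice(HEADER_BYTES, HEADER_BYTES + length),
      });
      this.buffer = this.buffer.subarray(HEADER_BYTES + length);
    }
    return frames;
  }
}
```

On the VM side, an `LD_PRELOAD` shim or virtual socket layer would emit these frames; the relay would map each `streamId` to one outbound TCP socket.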

xiphias2

This looks awesome.

Would it be possible to compile GNU/Linux to WASM as a target platform? What's missing for that?

s5806533

Truly an impressive feat, and a lot of work no doubt. But why? Recently it seems to be some kind of fad to demonstrate that everything can be done inside a web browser. Again: why? Scope creep of web browsers is already beyond repair.

remisharrock

For educational uses, I have plenty of use cases for large scale teaching and learning, without backend servers and without installing anything complex on the client side.
