Virgil: A fast and lightweight programming language that compiles to WASM

syrusakbary

I discovered this project while taking a look at Wizard, a fast and low-resource Wasm interpreter [1].

One of the things that excites me about Virgil is that it is completely self-hosted (the runtime is fully implemented in Virgil, and it can compile itself).

Curiously enough, the person who created both Virgil (programming language) and Wizard (Wasm runtime) is Ben L. Titzer, who worked on the V8 team at Google and co-created Wasm. I'm pretty excited about what he can bring to the table :)

Update: I'm trying to get Virgil to self-compile into Wasm/WASI, and will try to upload it to WAPM [2] so everyone can use it easily (as the first compilation of v3c currently requires the JVM)... stay tuned!

[1]: https://github.com/titzer/wizard-engine

[2]: https://wapm.io/

titzer

> (as the first compilation of v3c currently requires the JVM)... stay tuned!

It will bootstrap using whichever stable binary appears to work on your platform.

The compiler does have support for specifying arbitrary Wasm imports and I wrote most of the "System" module against WASI. The only thing missing is 'chmod' to flip the permissions on a just-generated executable, but other than that, it should be possible to bootstrap on WASI.

seg_lol

Schism (Scheme) also self hosts on Wasm. https://github.com/schism-lang/schism

spaghetti_regex

Not to be confused with Vigil, an esolang by the great Munificent.

“It goes without saying that any function that throws an exception which isn't caught is wrong and must be punished.”

https://github.com/munificent/vigil

krylon

This cracked me up.

> Q: Is this serious?

> A: Eternal moral vigilance is no laughing matter.

pcwalton

I'm glad to see that this language has a garbage collector, in a time when "lightweight" languages increasingly forgo GC and memory safety. Even in the Harvard architecture of WebAssembly, which mitigates many of the security problems that lack of memory safety causes, memory safety is the right choice.

titzer

Thanks. I feel I have gotten tons more done over the years by having GC. It's just so much more productive. For the really nitty-gritty low-level details of a particular platform, like directly calling the kernel to set up signal-handling and memory protection, do I/O, etc Virgil has unsafe pointers. On targets like the JVM or the built-in interpreter where there are no pointers, there is no Pointer type.

I am also working on a new Wasm engine and I have a neat trick (TM) where the proposed Wasm GC features are implemented by just reusing the Virgil GC. So the engine is really a lot simpler, doesn't need a handle mechanism, doesn't have a GC itself, and the one Virgil GC has a complete view of the entire heap, instead of independent collectors that need to cooperate.

hollerith

Will your new Wasm engine be written in Virgil?

b3morales

I am curious about your comment. What new languages have you seen that are not memory safe by default?

raphlinus

Not GP, but the following come to mind: Jai, Zig, Odin, and Hare, all of which aspire in one way or other to be a modernized take on C. There is also a larger class of languages I call "safe-ish" as they are generally safe for single-threaded programs but can exhibit undefined behavior from data races; this includes Swift and Go, and likely other newer languages inspired by those.

saghm

> There is also a larger class of languages I call "safe-ish" as they are generally safe for single-threaded programs but can exhibit undefined behavior from data races; this includes Swift and Go, and likely other newer languages inspired by those.

Go does have a garbage collector though, so maybe the conflation of GC with safety (not by you, but earlier in the thread) is a bit misleading.

mojuba

GC is not the only way though, see Rust and Swift, both very safe languages.

m0th87

For what it’s worth, the person you’re replying to is one of the heavyweights of the rust community.

littlestymaar

Not only is he one of the “heavyweights of the rust community”, he was literally one of the main designers of the language at Mozilla (he's still contributor #6 by commits [1] despite not having worked on it for the past 7 years!)

[1]: https://github.com/rust-lang/rust/graphs/contributors

mojuba

I don't understand, how does that invalidate my response to that person's comment about GC's?

ihnorton

Skimming the docs, I was surprised to see that there appears to be no built-in support for foreign function calls to C libraries ("It's a bold strategy Cotton...")

This makes more sense in light of a comment by the author in a previous discussion about C ABIs [1]:

> Virgil compiles to tiny native binaries and runs in user space on three different platforms without a lick of C code, and runs on Wasm and the JVM to boot. [...] No C ABI considerations over here.

[1] https://news.ycombinator.com/item?id=30705383

titzer

Indeed, I've wanted to see how far I can get rebuilding userspace from the ground up. Virgil does that by exposing a raw syscall primitive on each target architecture, so when you target x86-linux you get "Linux.syscall<T>(int, T)". The compiler knows the calling convention of the kernel on each target platform, so it just puts things in the right registers and does "int 80" or "syscall". So library and runtime code that implements I/O, signals, etc just dial up the syscalls they want to get off the ground. So there's no need to resort to assembly; I view asm as the compiler's job.

vlovich123

Doesn’t that fail on many OSes, since they treat libc as the stable ABI surface and make no guarantees about the syscall interface? If I recall correctly, that’s what gave Go so many headaches on MacOS: they chose a similar strategy until they abandoned it as untenable.

titzer

> since they treat libc as the stable ABI surface

For the subset that Virgil uses to get off the ground, I haven't been broken by the kernel changing system calls. MacOS has been a pain for a number of other reasons though, not the least of which is deprecating 32-bit altogether...with a student's help I finally got around to generating x86-64 Mach-O binaries. That works on x86 macs again. But something is still wonky and they don't run under Rosetta 2.

Linux is rock solid though. I've never been broken by the kernel.

NonEUCitizen

What is the definition of "tiny" here? 10's of KB? Thanks.

titzer

The smallest x86-linux binary, without runtime or GC, is 248 bytes. With runtime 6KiB, with runtime and GC, 11KiB.

martinflack

The 'Coming From' page compares Virgil to a couple other languages: https://github.com/titzer/virgil/blob/master/doc/tutorial/Co...

sanxiyn

The title is somewhat misleading in that Virgil has a native x86 backend as well as a WebAssembly backend.

boardwaalk

I almost skipped over this because of WASM in the title. For me, WASM is neat but not actually something I need and just an extra layer for just running a program on a computer. So it's good to see it supports other targets. Now it just needs ARM support ;).

seg_lol

Except you can run the wasm through wasm2c and run that on Arm. So that needless thing gives you exactly what you want.

extheat

Having no code samples without digging into links in the documentation doesn’t seem very inviting, especially when the project itself is a programming language.

syrusakbary

I was able to find a link to examples in the main README, although it's true that the Documentation markdown file itself is missing the reference.

In case it's useful for future readers, here are some samples: https://github.com/titzer/virgil/tree/master/doc/tutorial/ex...

conaclos

The syntax is not the same.

This is an example from titzer/virgil [1]:

  def fib(i: int) -> int {
    if (i <= 1) return 1;
    return fib(i - 1) + fib(i - 2);
  }

And an example from munificent/vigil [2]:

  def fib(n):
    if n < 2:
        result = n
    else:
        result = fib(n - 1) + fib(n - 2)
    swear result >= 0
    return result

[1] https://github.com/titzer/virgil/blob/master/doc/tutorial/Me...

[2] https://github.com/munificent/vigil

titzer

Documentation could definitely use a nicer coat of paint. Not sure how to do syntax highlighting in raw markdown, and GitHub won't accept a new language definition unless it is in "hundreds" of projects. So maybe an image?

brabel

I've used a markdown to html converter to convert my blog posts into HTML with very nice and customizable code samples... in my case I used Go's Blackfriday library with bfchroma[1] doing syntax highlighting with Chroma[2]. To add your language to Chroma you have to provide a lexer, which in turn is written in Pygments[3] syntax.

Once you have that, you can post your docs on GitHub Pages (or something like Netlify[4] or Cloudflare[5], both of which can run a command to build your website, from markdown to HTML, every time you push to a branch) and then serve the generated HTML as a static site.

Before this though, your language seems similar enough to others (maybe Java or C#?) that if you tell the converter to use those languages, you'll get decent enough highlighting. I did this to highlight Zig code before it became supported by telling the converter it was typescript code (coincidentally, many keywords seem to have aligned well enough)!

[1] https://github.com/Depado/bfchroma/

[2] https://github.com/alecthomas/chroma#supported-languages

[3] https://pygments.org/docs/lexerdevelopment/

[4] https://www.netlify.com/blog/2016/10/27/a-step-by-step-guide...

[5] https://pages.cloudflare.com/

tiffanyh

Dumb question: do all languages that compile to WASM perform relatively the same?

Or are there large performance differences between compiled WASM?

eatonphil

Language implementations that share a compile target have the same lower bound for performance (being fast) but they have infinite and unrelated upper bounds for performance (being slow).

Let's say there's a Python to C compiler (there is) and let's say there's a C++ to C compiler (there was, may still be idk). The performance characteristics of programs compiled via both won't be very similar at all. They can't be any faster than C. They can both be much slower than it. They'll differ wildly from each other.

mkl

No, they don't perform the same, just like different languages that compile to native machine code don't all perform the same. It depends a lot on the compiler, which depends on both the language and other goals like compilation speed, memory/storage usage, effort making the compiler, etc.

mayama

Wasm doesn't yet have a GC. So when you compile languages that require GC, like Python or Go, they have to include that GC in the resulting Wasm binary, and their Wasm GC implementation might not be as performant as the native one either. On the other hand, languages like C or Rust don't have this overhead. This makes Wasm binaries from GC languages bigger and affects their performance.

ricardobeat

Are there any examples with more complex applications? It's hard to grasp what the language feels like from looking at purely academic code.

https://github.com/titzer/virgil/tree/master/doc/tutorial/ex...

akavel

AFAIR, the JVM used to not have a u64 type, only i64; does the JVM now support it? Or how does Virgil handle it? If there's some custom overhead added to simulate u64 semantics on the JVM, that should be clearly documented in a language purporting to be fast - yet I didn't see it. Did I miss it?

I'm asking also because if the compiler supports JVM as a target, it shouldn't be hard to add support for a Dalvik target; however, u64 is also a problem in case of this one.

One more question I have is about FFI - I didn't find a mention of it after a quick skim; can I call functions from some thirdparty JARs or JS/WebAPI?

titzer

You're right that the "long" type in Java is signed. The JVM uses two's complement representation, so for many operations, unsigned arithmetic is bit-equivalent to signed arithmetic. For the remaining ones, like less-than, divide, shift, etc, Virgil generates more complicated code to do unsigned arithmetic or comparison manually, e.g. by first checking if an input is negative.
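A rough Java sketch of the idea follows. It is not the code Virgil actually emits: the class and method names are hypothetical, and the division routine is the well-known sign-check approach, shown here only to illustrate what "checking if an input is negative" can look like on a signed `long`.

```java
// Hypothetical sketch: unsigned 64-bit semantics on Java's signed long.
final class UnsignedLongSketch {
    // Addition, subtraction and multiplication need no special handling:
    // in two's complement the low 64 bits are the same either way.
    static long add(long a, long b) {
        return a + b;
    }

    // Unsigned a < b, done by checking the sign bits first.
    static boolean lessThan(long a, long b) {
        if ((a < 0) == (b < 0)) {
            return a < b;   // same top bit: signed order equals unsigned order
        }
        return b < 0;       // top bits differ: the "negative" one is the larger
    }

    // Unsigned a / b (throws ArithmeticException when b is zero).
    static long divide(long a, long b) {
        if (b < 0) {
            // b >= 2^63 as unsigned, so the quotient can only be 0 or 1.
            return lessThan(a, b) ? 0L : 1L;
        }
        if (a >= 0) {
            return a / b;   // both operands fit in the signed range
        }
        // a >= 2^63: divide a/2, double the result, then fix up by at most 1.
        long q = ((a >>> 1) / b) << 1;
        long r = a - q * b;
        return q + (lessThan(r, b) ? 0L : 1L);
    }

    // Unsigned right shift is Java's logical shift operator.
    static long shiftRight(long a, int n) {
        return a >>> n;
    }

    public static void main(String[] args) {
        long max = -1L; // bit pattern of 2^64 - 1
        System.out.println(lessThan(1L, max));                       // prints true
        System.out.println(Long.toUnsignedString(divide(max, 10L))); // prints 1844674407370955161
    }
}
```

The extra branches only show up for comparison, division and right shift; everything else compiles to the same instruction as the signed case.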

> One more question I have is about FFI - I didn't find a mention of it after a quick skim; can I call functions from some thirdparty JARs or JS/WebAPI?

For the Wasm target, Virgil allows you to write an imported component, so the module that gets generated has the imports with the signatures that you want. You can then load the module in JS and supply Web bindings and such. That latter process is quite clunky, but in theory gives you access to any API that can be expressed in terms of wasm (before externref).

kaba0

The JVM doesn’t have u64, but it does have standard lib functions like unsigned add that do view their i64 arguments as unsigned and will reliably get JIT compiled down to the correct machine code. The resulting code in Java would not be the most readable, but it is absolutely fine as a target.
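A small sketch of those helpers, using only `java.lang.Long` methods available since Java 8 (`compareUnsigned`, `divideUnsigned`, `remainderUnsigned`, `toUnsignedString`, `parseUnsignedLong`). The class name and the `divRem` helper are made up for illustration; plain `+`, `-` and `*` already produce the right bits, so they are not shown.

```java
// Hypothetical sketch: treating long values as unsigned via java.lang.Long.
final class UnsignedStdlibSketch {
    // Formats the unsigned quotient and remainder of two 64-bit values.
    static String divRem(long a, long b) {
        return Long.toUnsignedString(Long.divideUnsigned(a, b))
            + " r " + Long.toUnsignedString(Long.remainderUnsigned(a, b));
    }

    public static void main(String[] args) {
        long max = Long.parseUnsignedLong("18446744073709551615"); // stored as -1L
        System.out.println(max);                               // prints -1
        System.out.println(Long.toUnsignedString(max));        // prints 18446744073709551615
        System.out.println(Long.compareUnsigned(max, 1L) > 0); // prints true
        System.out.println(divRem(max, 10L));                  // prints 1844674407370955161 r 5
    }
}
```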

boardwaalk

Flipping through the tutorial, given the mentions of functional-style and concision, I'm surprised the control structures aren't expressions à la Rust.

pabs3

> Virgil is fully self-hosted: its entire compiler and runtime system is implemented in Virgil.

I wonder if they have a bootstrap compiler, so you can build entirely from source without their pre-existing binaries.

https://bootstrappable.org/

titzer

You cannot bootstrap any language entirely from source, you always need a compiler or interpreter written in another language.

Virgil bootstrapped for the first time using an interpreter I wrote in Java, back around 2009. When the new compiler could finally compile itself well enough to be stable, I checked in the first bootstrap binary, a jar file. Since then, every once in a while (41 times so far), when a major set of bugfixes or new features is done, I've rev'd stable by checking in binaries that are generated by first compiling the existing code with the stable compiler, and then compiling the compiler with that compiler. I generally wait several stable revisions before using new features in the compiler. What that means is that you can always compile the source in the repo with the stable binary in the repo, and that compiler can compile itself again too, and both should behave identically. You can usually even go back a revision, but I've never had to do that.

There is a full interpreter built into the compiler as well, so if there is a bug in the stable compiler's codegen that is a showstopper, it can be fixed in the source and then the new source run in the interpreter of the old compiler in order to get a new stable binary. I've never had that happen, though.

pabs3

I'd encourage you to write a smaller implementation of Virgil for bootstrapping in a couple of other languages, so that folks who don't trust binaries can still use your language.

The Bootstrappable Builds folks are working on this for every part of a modern Linux distro. The main approaches are alternative smaller implementations written in other languages (including interpreters) and also compiling with chains of older versions that were written in other languages. They are also working on a full bootstrap from ~512 bytes of machine code plus a ton of source all the way up to a full Linux distro. They have gotten quite far in that and are continually improving the situation everywhere.