Mangle, a programming language for deductive database programming

triska

This seems already almost valid Prolog syntax, which is also a syntactic superset of Datalog.

The main question I have for implementors of Datalog and Prolog variants like this: If you are that close to using Prolog syntax, why not go all the way and rely fully on Prolog, a language for which a well-defined ISO standard and several interesting implementations already exist? One of the key benefits you get in this way is Prolog's strength for meta-programming and reasoning about programs with the same formalism you use to state the specifications and queries. Abstract interpretation, query optimization etc. can be easily implemented in this way, instead of having to parse an additional formalism.

It may be possible to implement such Prolog-"variants" entirely within Prolog by defining suitable infix or prefix operators, or adding conforming extensions in implementations. A conforming extension is one that does not conflict with existing ISO syntax. For example, something that would be a syntax error in conforming Prolog implementations could be used as an implementation-specific extension.

PaulHoule

Prolog is not purely logical in that you can write imperative programs by taking advantage of the fixed order of search and also alter the execution with cuts. One of the great disasters of symbolic AI in the 1980s was the discovery that you can’t parallelize Prolog.

egl2021

"...you can write imperative programs...". It's worse than that: you have to look carefully at the order of the alternatives to understand the computation. I re-read Clocksin and Mellish during a recent programming language binge and realized that an imperative core lurks under the logical veneer.

tmptmpgo

Because a syntactic subset isn’t necessarily a semantic subset. Datalog uses bottom-up evaluation by default while Prolog uses top-down. As such they are computationally very different.
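To make the bottom-up idea concrete, here is a minimal sketch of naive bottom-up evaluation for a transitive-closure program. It is only an illustration, not how Mangle or any particular Datalog engine is implemented; the relation names `edge` and `path` and the sample facts are made up for the example.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the (hypothetical) program:

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    Start from the known facts and apply the rules repeatedly
    until no new facts appear (a fixpoint).
    """
    path = set(edges)  # first rule: every edge is a path
    while True:
        # second rule: join path(X, Y) with edge(Y, Z)
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            return path  # fixpoint reached
        path |= new_facts


# A cyclic graph: a -> b -> c -> a
edges = {("a", "b"), ("b", "c"), ("c", "a")}
closure = transitive_closure(edges)
```

The loop stops even though the graph has a cycle, because only finitely many `path` pairs can exist over three constants. A depth-first, top-down evaluation of the same left-recursive rule, taken as written and without tabling, would typically not terminate.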

triska

Yes well, as you mention, that is only the default execution strategy, and nothing prevents us from using other execution strategies for either language. In fact, the main advantage of using a declarative language such as Prolog or Datalog is precisely that it can be readily interpreted with different execution strategies, and indeed many Prolog implementations already provide alternative execution strategies, the most common of which is currently SLG resolution, also known as tabling. Alternative execution strategies are applicable as long as you keep to the pure monotonic core of the language.

The fact that the default execution strategies of Prolog and Datalog differ is not a valid argument for or against using a slightly different syntactic formalism. If, as seems plausible, an alternative execution strategy of Prolog can provide the same semantics as Mangle and other Datalog "variants", then it may be worth considering doing that instead of devising a different syntactic formalism, especially if the alternative syntactic formalism is already so close to valid Prolog syntax as it is in this concrete case.
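One common reading of the "pure monotonic core" mentioned above is that adding facts can only add conclusions, never retract them. The sketch below illustrates that reading with a made-up `edge` relation; the function names are hypothetical, and the second function uses negation to show what falls outside the monotonic core.

```python
def reachable(edges, start):
    """Monotonic: nodes derivable via
        reach(Y) :- edge(start, Y).
        reach(Z) :- reach(Y), edge(Y, Z).
    """
    seen = set()
    frontier = {start}
    while frontier:
        node = frontier.pop()
        for (x, y) in edges:
            if x == node and y not in seen:
                seen.add(y)
                frontier.add(y)
    return seen


def unreachable(edges, start, nodes):
    """Non-monotonic: relies on negation (NOT reachable)."""
    return set(nodes) - reachable(edges, start)


nodes = {"a", "b", "c"}
small = {("a", "b")}
large = small | {("b", "c")}  # the same facts plus one more

# More facts in, never fewer conclusions out.
monotonic_ok = reachable(small, "a") <= reachable(large, "a")

# With negation, adding a fact retracts an earlier conclusion about "c".
before = unreachable(small, "a", nodes)
after = unreachable(large, "a", nodes)
```

Here `before` contains `"c"` and `after` does not, which is why an evaluation strategy cannot be swapped freely once negation, cuts, or side effects enter the program.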

tmptmpgo

Yes, I’m well aware of tabling. But high performance Datalog implementations will fundamentally be very different from the usual Prolog implementations. But sure, it isn’t impossible. Just a long way from ISO Prolog.

Consider Soufflé, designed for large scale program analysis. Or LogiQL, designed for efficient incremental evaluation.

You also failed to disclose that you are not exactly an unbiased individual here.

rad_gruchalski

Datalog programs guarantee termination while Prolog programs do not.

javajosh

>pure monotonic core of the language

What does that mean? Monotonic describes a function, or a sequence, that only ever increases or decreases, never a combination of both. What does an "always decreasing" or "always increasing" language core look like? This may make sense for concatenative languages.

philzook

I agree this would be very useful. There are a number of datalog idioms (demand transformations is a big one for encoding functional programs, adding provenance, inlining relations, doing some light compile time backwards proof search) that it would be nice to have a good meta-programming/macro language to express. Prolog seems like a natural choice. I briefly tried going this route programming in prolog syntax so prolog could parse it, but generating souffle syntax out of the prolog metaprogram.

burakemir

This is my project! So soon ... I was considering a Show HN but wanted to wait until things like documentation are a bit more complete, but here we are.

Ask me anything.

cmrdporcupine

What is the provenance of the project? Is it for something you're working on at Google? Or is it a personal open source project?

I never got to work on anything this interesting in my 10 years at Google :-)

burakemir

Mangle is used in an internal application. That application is a staffed project with internal usage.

In mid 2021, I saw an opportunity to apply some ideas on query languages to support development of that particular application. That application was an early idea and a prototype back then.

It was clear that it would need integration with many data sources and that this data would need to be queried in various ways that were not fixed in advance and would evolve. A very common situation...

With the team, we decided to try a novel approach using datalog, and it did indeed help development. Soon they started asking for features, found my bugs, optimized a few things and pushed for design changes.

As the programming language person, I tried to keep internal consistency of the language design and ease of use (static checks!) and wrote most code, yet Mangle certainly would not exist without this team, the application, its users and management support at various levels.

I insisted on keeping the language as a separate thing since I had these other uses in mind. Open sourcing was therefore not difficult. For the mentioned application, Mangle is merely a mechanism, a library that enables flexible access to data.

So Mangle is not my or anyone's main project, yet many people (including management support at various levels) contributed to its development.

I am glad you find it interesting! I feel very privileged to work with the larger team of people whose efforts led to this opportunity to do this kind of applied language research.

grose

Other resources for logic programming and Go:

ichiban/prolog - ISO Prolog interpreter in pure Go, getting close to v1: https://github.com/ichiban/prolog

trealla-prolog/go - ISO Prolog interpreter embedded via WASM: https://github.com/trealla-prolog/go

guregu/pengine - library for interfacing with Pengines (SWI-Prolog's RPC protocol): https://github.com/guregu/pengine

biscuit-auth/biscuit-go - Biscuits are a fancy auth token with a little Datalog engine: https://github.com/biscuit-auth/biscuit-go

I'm a big fan of logic programming. We've been seeing a small resurgence of interest in it (for example Yarn using Prolog made some waves) and I have some optimism for its future.

burakemir

Thanks for sharing Biscuit, I was collecting examples of authentication policy languages.

Datalog is also the basis for Open Policy Agent https://www.openpolicyagent.org/docs/latest/ , more specifically its Rego language, which is also implemented in Go https://github.com/open-policy-agent/opa/tree/main/rego

jitl

Interesting; Google engineer previously published Datalog variants for BigQuery: https://research.google/pubs/pub43462/ & https://logica.dev/

This new language seems similar to differential-Datalog (which is sadly in maintenance mode): https://news.ycombinator.com/item?id=33521561

evgskv

That's right. And btw Logica is also open source: https://github.com/evgskv/logica

linkdd

Inventing a language seems to be a rite of passage for every engineer at google.

Go, Dart, Carbon, Mangle, am I missing some?

I'm not criticizing, I would not dare as I'm creating my own language as well :P

throwaway484

I can think of a few others…

Sawzall, a language focused around processing logs. Rob Pike led on this but use has pretty much all been replaced by Go. https://en.m.wikipedia.org/wiki/Sawzall_(programming_languag...

Dex, a language focused around array processing from the team behind the Jax machine learning library. Early stage research project. https://github.com/google-research/dex-lang

Rune, a language focused on security, early stage research project. https://github.com/google/rune

Wuffs, a language focused on writing safe file format handlers (parsing, encoding, decoding) https://github.com/google/wuffs

d3nj4l

Cue and skylark (now starlark), too.

fidgewidge

Also other internal languages: borgmon, borgcfg. Don't think they use borgmon anymore though.

kccqzy

The internal GCL language, as well as its failed replacements. There's a nice introduction in this paper: https://pure.tue.nl/ws/portalfiles/portal/46927079/638953-1.... Back when I worked there, I actually quite liked the syntax of the language, though the semantics of the language was quite a dumpster fire.

Another internal language for querying Monarch. See section 5.1 of http://www.vldb.org/pvldb/vol13/p3181-adams.pdf

nicoburns

4 languages doesn't really seem that many given how many engineers there are at google...

operator-name

Inventing a language is a rite of passage for every engineer. Domain specific problems are best appreciated in the constraints (and flexibility) of a domain specific language.

jstx1

Rune - posted very recently -

https://github.com/google/rune

https://news.ycombinator.com/item?id=33761193

pjmlp

To be fair, it is a rite of passage for anyone doing a proper software engineering degree anyway.

Hence why there should be a very good reason to start an ecosystem from scratch, given how many languages get invented per year across the world universities.

drittich

Yes - my only attempt was in the early 1980s when I started to develop a language implemented on top of interpreted BASIC. It was called MAL (My Attempt at a Language). I wish I still had it - I have no recollection what my design goals were, if any! The only thing I have left is the name, and I'm quite sure that was the best part about it anyway.

capableweb

The initial version of AngularJS by Miško could almost be considered its own (compile to JS) language.

theodpHN

Some of the other stuff looks intriguing, but regarding the claim that "Unlike SQL, our Mangle rule projects_with_vulnerable_log4j has a name and can be referenced in other queries.": a SQL VIEW or common table expression (CTE) can also be referenced in other queries.

burakemir

Yes, there are CTE and recursive queries, and various DBMS also offer views. There are even table-valued functions.

These things are not widespread and differ by implementation, and the way clients use them is copy-and-paste. Even something as thoughtful as ZetaSQL https://github.com/google/zetasql does not have mechanisms for structuring (modules, packages, interfaces). SQL will not, cannot evolve in such a direction (or whatever it evolves into will not be recognizable as SQL).

rad_gruchalski

I don’t think a CTE or a VIEW is an intended reasoning behind this claim. It’s more like choosing a part of a WHERE statement based on previous criteria and prior results.

zozbot234

CTEs and VIEWs can indeed express inference in relational databases. A "projects_with_vulnerable_log4j" view is a nice example, but possibly-recursive CTEs can achieve pretty much arbitrary inference, e.g. on graph-like data.
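As a rough sketch of that point, the following uses SQLite through Python's standard `sqlite3` module to define a named view on top of a recursive CTE. The table `depends_on`, the view `projects_with_log4j`, and the sample rows are all hypothetical, and version checks for "vulnerable" releases are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE depends_on (project TEXT, dependency TEXT)")
conn.executemany(
    "INSERT INTO depends_on VALUES (?, ?)",
    [
        ("app", "web-framework"),
        ("web-framework", "log4j"),
        ("cli", "argparse-lib"),
    ],
)

# A named view built on a recursive CTE: the transitive closure of
# depends_on, filtered to projects that reach log4j. UNION (rather than
# UNION ALL) removes duplicates, so the recursion also stops on cycles.
conn.execute(
    """
    CREATE VIEW projects_with_log4j AS
    WITH RECURSIVE reaches(project, dependency) AS (
        SELECT project, dependency FROM depends_on
        UNION
        SELECT r.project, d.dependency
        FROM reaches AS r
        JOIN depends_on AS d ON r.dependency = d.project
    )
    SELECT DISTINCT project FROM reaches WHERE dependency = 'log4j'
    """
)

# The view has a name and can be referenced from other queries.
rows = sorted(r[0] for r in conn.execute("SELECT project FROM projects_with_log4j"))
```

With the sample data, `rows` holds `app` (a transitive dependency) and `web-framework` (a direct one), while `cli` is excluded.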

JimmyRuska

RDFox looks like the best bet for datalog databases, it computes changes incrementally, also with aggregation extensions. Logicblox, Soufflé, datomic, inter4ql, corese are also worth a look. Looks like there's a lot of innovation possible in the space, like distributed logic processing, incremental sorting, adding assert statements, figuring out why specific rules don't match recursively, etc

burakemir

RDFox using datalog underneath is definitely interesting. Maybe worth making a distinction between products/services and open source projects that can be used as building blocks for such products/services or whatever custom setup one may have.

UncleEntity

I’m probably wrong (I’ve been deep diving into RDF triplestores lately) but I think sparql does all that and has a W3C specification.

Maybe the difference is you don’t have to convert your data into (subject, property, object) triples?

Been reading this hexastore paper and they seem to be trying to solve the same problems but I’m no data scientist so who knows.

cmrdporcupine

Datalog generally has n-ary relations instead of binary relations like RDF. (That's not to say there aren't Datalogs restricted to binary relations. I believe the Datomic stuff is that way, but not this one).
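To illustrate the n-ary versus binary difference, here is a small sketch that flattens one 3-column fact into (subject, property, object) triples by introducing an intermediate node. This is one common encoding pattern rather than a claim about how RDF stores or Datomic represent data; the relation, column names, and the `type` property are invented for the example.

```python
from itertools import count


def to_triples(relation, columns, rows):
    """Encode n-ary facts as triples via one fresh node per fact."""
    ids = count(1)
    triples = []
    for row in rows:
        node = f"_:{relation}{next(ids)}"  # fresh identifier for this fact
        triples.append((node, "type", relation))
        for column, value in zip(columns, row):
            triples.append((node, column, value))
    return triples


# One ternary fact, as a Datalog-style relation would hold it directly:
#   depends_on("app", "log4j", "2.14.0")
triples = to_triples(
    "depends_on",
    ("project", "library", "version"),
    [("app", "log4j", "2.14.0")],
)
```

A single ternary fact becomes four triples here (one for the type plus one per column), which is the kind of conversion an n-ary Datalog avoids.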

Datalog precedes SPARQL by years... decades. Though not standardized AFAIK. But also ... honestly... SPARQL also is not nearly as elegant to my eyes as Datalog.

I'm glad to see a recent flush of interest for this stuff. Seems like there's a new Datalog article every week.

(I am not a Datalog expert, but I'm a relational-model-nerd. Also as of a couple weeks from now I will be working at https://relational.ai/. Check out their 'Rel' language, it's got similar vibes)

sonicgear1

Another useless and overly convoluted project just like all things google.
