anilgulecha
manigandham
There are plenty of serverless database options already: Firestore, DynamoDB, CosmosDB, FaunaDB, even MongoDB, and there are "newsql" distributed relational systems like CockroachDB and Planetscale with serverless plans.
jillesvangurp
Many teams would prefer a postgresql compatible database with full sql support without compromises, missing features, etc. So, this could break that market open a little. Both AWS and Google are unreasonably expensive for this stuff. Most teams don't need a huge database and would be able to run postgresql on a tiny instance and get away with it. Have 2 of those and failover and backups and it's good enough for a lot of small shops.
Most managed/serverless options begin at hundreds of dollars per month. So, you get lots of companies either just handing over the cash or jumping through hoops to get something more reasonable. The latter is a stupid waste of time if you can afford the former. That's how Google and Amazon make money: they make the expensive option more tempting and the cheap option needlessly hard. They are not interested in supporting frugal teams. The whole point is squeezing their customers hard.
So, this is potentially very nice if it offers some competition on the cost front. I'd certainly consider using this if it proves reliable. In fact, the whole reason I opted out of a relational database is the above. What I'd need is something that is reasonable in cost relative to the modest data I store and retrieve.
davidzweig
We have a single big bare-metal machine. We run Postgres with a ~1TB DB, moderate load, on a Hetzner AX101 (16C/128GB RAM). It has 2x 3.84TB NVMe drives (ZFS mirror with hourly snapshots) used for Postgres storage only, and a separate pair of mirrored SATA drives for system/boot (had to request the extra drive, ask support to change the boot option in BIOS, and reinstall the OS using the rescue system). It's about 100 EUR/mo with unlimited data.
We bounce all incoming requests from clients (the machine also runs a Node backend) through a DigitalOcean machine (NGINX proxy), as their peering agreements are better; without this some users in Brazil, Turkey etc. have very slow access. OVH I think would be even better for this use (better peering and IIRC cheaper data). ZFS snapshots are backed up with sanoid to a machine under my desk with spinning disks.
The AX101 can be fitted with up to six 3.84TB drives, which is almost 12TB of mirrored storage, so we should be good for a while. You can (should) use at least lz4 compression on ZFS; consider zstd-1, a bit slower, which could double the effective space. The compression also applies to the in-RAM ZFS cache, which can be 100GB+.
We used firestore before.. got a bit tired of some of the limitations (latency, indexing). Cost-wise I don't think it's that different actually, but we aren't using much bandwidth, then self-hosted can be dramatically cheaper. Have to manage some details of course (zfs filesystem parameters, set up backups, config postgres etc.), but I found that stuff quite interesting and it's knowledge that will always be useful.
nikita
Hi Mani! For sure there are many serverless options - fewer that separate storage and compute, and fewer that are open source end to end. Neon is also 100% compatible with Postgres (unlike CockroachDB) because the compute is Postgres.
Our intention is to standardize the separation of storage and compute cloud architecture - that's why it's open source under the Apache 2.0 license.
boomskats
You should update your HN profile :). Neon looks really great. Far more interesting to me and my team than S2.
I noticed you mention Azure BS in your RFCs as a potential backend. Have you done much work towards that yet?
anilgulecha
(Some of these are options I've not looked deeper into - Fauna, Planetscale)
This sentiment is perhaps right, but I was careful about calling out scale-to-zero. We do have options that are zero cost (or pay as you use), but there's a fundamental difference when something is zero cost only because a cloud provider is using it as a customer-acquisition ploy.
Options like litestream+sqlite+s3, or what Neon seems to be, are verifiably pay-only-while-the-DB-is-booted; otherwise the verifiable cost is storage only.
So the trifecta that will be very productive for the masses is: 1) a database where compute is scale-to-zero, 2) open source or commoditised, and 3) an RDBMS.
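The scale-to-zero half of that trifecta is easy to see in a back-of-the-envelope cost model. A hedged Python sketch (the rates below are made-up placeholders, not any vendor's real pricing):

```python
def monthly_cost(active_hours, storage_gb,
                 compute_per_hour=0.10, storage_per_gb=0.02):
    """Scale-to-zero billing model: pay compute only while the DB is
    booted, plus flat storage. All rates are illustrative placeholders."""
    return active_hours * compute_per_hour + storage_gb * storage_per_gb

# An always-on instance bills all 720 hours of a 30-day month.
always_on = monthly_cost(720, 10)
# A side project that is live 20 hours a month pays mostly for storage.
scale_to_zero = monthly_cost(20, 10)
```

The gap between the two numbers is exactly the "verifiable you-pay-only-when-booted" property: for mostly-idle workloads the bill collapses toward the storage line.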
nikita
There is a big difference in architecture between Neon and PlanetScale, CockroachDB, and Yugabyte. Neon is shared storage (the storage is distributed but shared) and the others are shared nothing. Shared-nothing systems are hard to build while supporting all the features of the base system. E.g. https://vitess.io/docs/13.0/reference/compatibility/mysql-co....
Neon is 100% compatible with Postgres because we didn't (or almost didn't) change the Postgres engine.
manigandham
What is the effective difference to you? Technically all compute can enter hibernation by dumping RAM (or just using virtualized memory backed only by SSDs).
CockroachDB already does true scale-to-zero if that's your requirement: https://www.cockroachlabs.com/blog/how-we-built-cockroachdb-...
pid-1
Most stuff you mentioned:
1) Is not an actual relational DB
2) Doesn't really scale to zero
Planetscale does scale to zero, but has a ridiculous billing model.
manigandham
Both CRDB and Planetscale "scale" to zero - but neither expose any concept of individual instances so I'm still not sure what difference it makes.
acjohnson55
What's the billing model / how is it ridiculous?
rektide
MongoDB & CockroachDB are the only open source ones, the only ones we can hack on & improve & grow.
Neon seems like a vast vast improvement & great & desperately needed potential leap for mankind.
manigandham
Planetscale is Vitess which is also opensource: https://vitess.io/
> "great & desperately needed potential leap for mankind"
Are you being serious? That's very hyperbolic if so.
avinassh
Minor correction: both of them are source-available, but not open source.
Cockroach DB license - https://github.com/cockroachdb/cockroach/blob/2c4e2c6/LICENS...
Mongo license - https://github.com/mongodb/mongo/blob/39e4b70/LICENSE-Commun...
tluyben2
There is also YugabyteDB as open source. In our tests (which mean nothing in general, as they're specific to our business case), it outperforms CockroachDB.
logifail
> This opens up a world of try-out mini applications that cost cents to host
Given how much performance you can squeeze out of a $5/month VPS (I've been spinning them up and indeed down regularly over the last couple of years), is this really a paradigm shift?
sofixa
On that $5/month VPS there's management overhead - you need to have at least basic Linux knowledge, and ideally more than that to know not to do stupid things like chmod 777 and database exposed on the public internet. You also need to do your updates, etc.
I'm a (former) SRE and run my own Kubernetes cluster for fun, but I still use serverless (containers as a service, static website hosting) depending on the project.
lelanthran
> On that $5/month VPS there's management overhead - you need to have at least basic Linux knowledge
Don't you need that as well as cloud-specific knowledge if you go serverless?
> and ideally more than that to know not to do stupid things like chmod 777 and database exposed on the public internet.
You still need some arcane knowledge to make sure your serverless doesn't experience cost overruns, right?
IME (yours obviously differs), the amount of cloud-specific + vendor-specific knowledge needed to avoid using a $5/m VM is a lot more in volume and a lot less in stability[1] than learning basic Linux once and using VMs everywhere[2].
[1] How the different cloud providers bill, when they bill, how to control your limits, etc changes much more often than knowing how to keep your server patched. Knowing how to get your serverless DB going on AWS doesn't help when you want to use Azure. And each cloud vendor regularly requires you to update your knowledge. Knowing how to keep a PostgreSQL-on-Linux up-to-date can be learned once and used for years. Even if running a managed DB, you'll still need to gain some of that knowledge anyway.
[2] Once you get to a scale where you treat your machines like cattle rather than pets, you'll obviously have the team required to use cloud stuff optimally.
vidarh
With every cloud service there's a management overhead too. Just different skills you need to learn.
I've done SRE/devops work in various capacities including consulting longer than cloud services have existed, and my experience is that I've consistently earned more from clients who insisted on cloud services because they consistently need more help. Nothing is driving more demand for devops consulting services than cloud providers.
nikita
We don't know. But we built it anyway because it may be that.
anilgulecha
Millions of students and enthusiasts around the world would find that cost sufficiently friction-ful to not try out things.
logifail
> Millions of students and enthusiasts around the world would find that cost sufficiently friction-ful to not try out things
I appreciate there are indeed billions of people for whom $5 is a lot of money, but just how many of them are "students and enthusiasts" itching to get started with Postgres?
I realise that - perhaps particularly here - a $5/month VPS is a deeply unsexy thing. You can, however, achieve (and learn) an awful lot with one.
kortilla
Who absolutely would not want to give a credit card to AWS where a bill is dynamic. If $5/mo is bad they definitely can’t handle the screw up that scales up and runs overnight for $500.
vidarh
Free tier instances on various providers provide an option. And if you can't afford $5 a month, you really shouldn't be playing with services where there's a risk of huge overages if you face a sudden spike in users.
pid-1
Fly.io does not scale to zero.
Lambda has many limitations.
In particular, for some reason AWS is allergic to providing a container deployment service that actually scales to zero.
russellendicott
> AWS is allergic to providing a container deployment service that actually scales to zero
Isn't this what Fargate is?
pid-1
No.
hamandcheese
Not yet, but soon.
dragonwriter
AWS Aurora Serverless v1 (in MySQL and Postgres flavors) has had serverless, scale-to-zero for quite a while.
shaicoleman
Aurora Serverless v1 has cold boot times of ~30 seconds when scaling up from zero, which precludes it from being a viable option for most use cases.
pid-1
Unfortunately V1 is getting very little love from AWS and the new one, V2, does not scale to zero.
8organicbits
What's the cold start time for something using sqlite+litestream on scale-to-zero compute? I think you'd need to pull the DB out of storage, so it would be slow to go from 0->1 instances. Anyone know if that's right?
Is there any cold start delay for neon?
nikita
Right now it is 2 seconds. We are working on improving it.
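For anyone who wants to measure this themselves, a hedged harness sketch: `connect` here is whatever zero-argument driver call you use (e.g. a lambda wrapping your Postgres driver's connect); nothing below is Neon-specific:

```python
import time

def first_vs_warm_latency(connect, warm_attempts=3):
    """Time the first (possibly cold-start) connection against warm ones.

    `connect` is any zero-argument callable returning an object with a
    close() method -- e.g. `lambda: psycopg2.connect(dsn)` in practice.
    """
    t0 = time.monotonic()
    connect().close()
    cold = time.monotonic() - t0

    warm = []
    for _ in range(warm_attempts):
        t0 = time.monotonic()
        connect().close()
        warm.append(time.monotonic() - t0)
    return cold, min(warm)
```

The difference between the first number and the warm minimum is the scale-from-zero penalty the thread is discussing.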
rektide
> This is the missing piece on cloud for masses
I like this perspective a lot & think it's absolutely key here.
We - the world - still pick single-node-writer Postgres & read replicas when we have to store & query data. There are great Kubernetes Postgres operators, but it's still a distinctly pre-cloud, pre-scale type of technology, & this decoupling & shared storage sounds ultra promising, allows independent & radical scale up & scale down, sounds principally much more manageable.
hamandcheese
If you can scale your app to zero, couldn’t you also just scale your database to zero once no more app servers are running?
Or for try-out apps, as you mention, you could just run Postgres next to your app in the same container.
This might be possible with fly.io, or will soon, I think.
I’m not sure how comfortable I am using a custom flavor of Postgres (even if it’s just the storage layer).
antender
We've already had a serverless DB for ages and it's called ... Google Sheets. You can even query it with a simple SQL-like language.
The problem with most other "serverless" databases is that they don't offer an HTTP API to query them from restricted environments like serverless functions.
SonOfLilit
> Neon allows to instantly branch your Postgres database to support a modern development workflow. You can create a branch for your test environments for every code deployment in your CI/CD pipeline.
> Branches are virtually free and implemented using the "copy on write" technique.
Unless I missed that everyone supports this, it could be a killer feature and should be advertised more prominently.
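Why copy-on-write branches are "virtually free" is easy to illustrate: a branch starts as an empty overlay over its parent, so creating it costs nothing regardless of data size, and only pages written after the branch point get copied. A conceptual toy sketch (not Neon's actual storage code):

```python
class Branch:
    """Toy copy-on-write page store."""
    def __init__(self, parent=None):
        self.parent = parent
        self.pages = {}          # only pages written on *this* branch

    def read(self, page_id):
        if page_id in self.pages:
            return self.pages[page_id]
        return self.parent.read(page_id) if self.parent else None

    def write(self, page_id, data):
        self.pages[page_id] = data

main = Branch()
main.write("users#1", "alice")
main.write("users#2", "carol")
test = Branch(parent=main)       # instant: no data is copied
test.write("users#1", "bob")     # diverges only where written
```

Reads on the branch fall through to the parent for untouched pages, so a branch of a terabyte database starts at near-zero extra storage.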
samokhvalov
Agreed, this direction is underestimated and should be developed better -- we (Postgres.ai) do it for any Postgres with our Database Lab Engine [1], and Neon would bring even more power if it's installed on production
zxspectrum1982
You can get that feature on any Postgres server by installing Citus
mattashii
Does Citus provide any such storage-level multi-cluster features? I can't seem to find any documentation on that...
zxspectrum1982
I don't understand what you are looking for. Care to explain?
jvolkman
AWS Aurora Postgres supports this to an extent with "clones". You can even clone cross-account. The same copy-on-write stuff applies, so they're relatively cheap and fast. I hope that Google's new AlloyDB will also support it.
https://aws.amazon.com/about-aws/whats-new/2019/07/amazon_au...
There are some annoying restrictions, though. You can only have a single cross-account clone of a particular db per account.
samokhvalov
The problem with Aurora's thin clones is the extra cost each clone adds.
For CI/CD, you want multiple clones running on the same compute power, in a shared environment, to keep the budget constant
jhgb
It sounds like something you might be able to accomplish with a copy-on-write VFS on top of a Firebird database file. (Not sure about PostgreSQL, but with Firebird, you only deal with one file, so with Firebird, this should definitely work.)
nikita
There is an enterprise company called Delphix that does it on top of ZFS, so the idea was in the air.
Instead of duct-taping this together with a filesystem, we purpose-built database storage. The advantage is that we can control execution paths much more tightly and can profile them end-to-end. Additionally, this allows us to integrate with S3 and makes it much, much cheaper to run.
jhgb
Technically Firebird just requires a block device, so you might not even need a filesystem.
rkwz
What are the intended usecases for "branching" a database? Currently, I use separate databases for different environments, are branches better?
kelvich
Now the most common setup is to copy the production database to the staging once in a while and test migration against staging. With branching, you can test each PR against its own production database branch -- just put branch creation in your CI config. Hence, it has fewer moving parts, is a bit easier to set up, and reduces the lag between prod and staging.
ukd1
Have a staging / qa env, then fork it for a branch for testing. Much faster than reseeding / restoring.
thejosh
It's a great feature on Heroku for branches; it shares data between review apps. Quite nice.
rektide
Really interesting. I've seen so much disaggregated database work, and so so so much of that exposes Postgres interfaces. But all the good stuff has been closed source!
I'm very very excited to hear about a team taking this effort to postgres itself, in an open source fashion! From the Architecture[1] section of the README:
> A Neon installation consists of compute nodes and Neon storage engine.
> Compute nodes are stateless PostgreSQL nodes, backed by Neon storage engine.
> Neon storage engine consists of two major components: A) Pageserver. Scalable storage backend for compute nodes. B) WAL service. The service that receives WAL from compute node and ensures that it is stored durably.
Sounds like a very reasonable disaggregation strategy. Really hope to hear about this wonderful effort for many more years. Ticks the boxes: open-source with a great service offering: nice. Rust: nice.
nikita
We are committed to building a durable company and we are well funded. So yes, you will hear from us for years to come as we will be shipping more and more features.
avinassh
I could not find funding information on the Neon site. Is that information not public?
edit: I found the info here: https://boards.greenhouse.io/neondatabase/jobs/4506003004
nikita
We will announce in a few weeks. Top tier Silicon Valley investors.
ranguna
Just yesterday I was comparing managed serverless Postgres offerings and was sad to end my investigation, temporarily compromising on managed AWS RDS for development while hoping a fully serverless Postgres with a nice free tier would pop up before we go to production. And here we are!
Congrats to the team for what feels like an amazing product. Signed up for the early access, can't wait to get my hands on this!
For anyone interested, these were the DB offerings I looked into:
* DO managed postgres, no free tier but price scaling was not too aggressive, the issue is that it's not natively serverless and we're gonna get 100s of ephemeral connections.
* Cockroach, was the best option for our use case but it doesn't support triggers and stored procedures, so we can't use it right now (closely following https://github.com/cockroachdb/cockroach/issues/28296)
* Fly.io price scaling is too aggressive 6$ -> 33 -> 154 -> 1000s a month and no free tier that I could find.
* Aurora serverless v2 is only for aws internal access and we are using gcp.
* Aurora v1 was what we were gonna go with, but a lot of people online have shown negative opinions about its slow scaling. I didn't investigate enough, but I'm thinking we'd need to set up RDS Proxy for it to handle all our connections, which would've bumped up the price by a good amount. Also no free tier.
* Alloydb looked promising but also no free tier and starting price is a bit much for our current phase of development, but it was definitely something we'd look into in the future.
And now Neon, natively serverless with a (hopefully) good free tier to test things out and some hints about cross region data replication, amazing stuff!
rad_gruchalski
If CockroachDB was fitting your use case the best, you should have a look at YugabyteDB. It does triggers, stored procedures, extensions, almost everything. Some alter table features aren’t working yet but it’s getting there.
Not associated with the company but a very happy user.
Bonus point: YugabyteDB is full Apache 2-licensed so you can roll your own.
ranguna
Just took a look and it seems pretty nice!
But I found their pricing page (which was very hard to find behind the generic "contact sales" page) and it seems the starting price is 360 USD/month; that's not something we're comfortable with right now.
USD 0.25/vCPU/hour, minimum 2 vCPU = 0.25*2*24*30 = 360
https://www.yugabyte.com/yugabytedb-managed-standard-price-l...
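That floor-price arithmetic generalizes to any managed DB billed per vCPU-hour with a minimum node size; a tiny sketch using the rates quoted above:

```python
def managed_db_floor(per_vcpu_hour, min_vcpus, hours=24 * 30):
    """Minimum monthly bill for a managed DB billed per vCPU-hour,
    assuming a 30-day month and no scale-to-zero."""
    return per_vcpu_hour * min_vcpus * hours

# 0.25 USD/vCPU/hour with a 2 vCPU minimum, over a 30-day month:
floor = managed_db_floor(0.25, 2)
```

Without scale-to-zero, the minimum instance size times 720 hours is a hard floor on the bill, which is the whole complaint in this subthread.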
spiffytech
> Fly.io price scaling is too aggressive 6$ -> 33 -> 154 -> 1000s a month and no free tier that I could find.
Fly has a general purpose free tier of 3 of their smallest instances. You can use that to run their 2-node Postgres cluster plus an app server.
The pricing you pulled is examples of various compute + storage configurations, not the exhaustive list of options. It should look like $4 (or free tier) -> $11 -> $21 -> $62 -> $82 ... + storage, since it's just 2x their VM price (for the two nodes) + any storage above free tier.
nwienert
Last I used them (last year) their postgres offering, even scaled up to larger nodes, was significantly slower than the cheapest DO offering. I filed a few issues but haven’t checked back since.
ranguna
Ah nice!
So the prices I mentioned were just example configurations. That's pretty cool then, especially with that free tier.
Will put fly.io back on the list and do some benchmarking in the future.
Thanks a lot!
sitkack
Curious why a free tier is so important?
I think a FT encourages bad behaviors on both sides. I don't think pricing should be linear at all. But even for development, one is using resources, but most of the time they can be minuscule for individual devs.
Aside from production reliability, Postgres is one of the easiest things to get running on a VM and runs fine on a 5$ a month instance.
ranguna
Free tier, like all things, is pretty bad if misused.
The reason we want a free tier is to try things out before we can actually commit to something. We don't know if what we are doing is actually gonna make money and sometimes we go a few months without working on it. So it's kind of a pain to pay for something we don't use.
That's why serverless is also nice to have on our current stage, things can just scale to 0 and there's no wasting of resource.
> running on a VM and runs fine on a 5$ a month instance
Easier said than done, unfortunately.
lysecret
"Aurora serverless v2 is only for aws internal access and we are using gcp." You can have public access to Serverless v2. I'm using it with Retool, for example. That said, I moved a Postgres DB to Aurora and the process was hilarious in how crazy it was. Also they haven't implemented scaling to 0 yet!!!! And the minimum 0.5 compute units are actually pretty expensive.
ranguna
Nice point about the minimum 0.5 ACU, forgot about that one. From what I've read 0.5 on v2 is the same price as 1 on v1, which seems pretty dumb coming from aws.
Could you elaborate more on this:
> You can have public access to serverless v2
Because the docs mention the following:
> You can’t give an Aurora Serverless DB cluster a public IP address; you can only access it from within a VPC based on the Amazon VPC service.
Potentially I could setup an RDS proxy or vpn inside the vpc and give that public access, but that seems a bit of a roundabout way of handling this. https://aws.amazon.com/blogs/database/best-practices-for-wor...
lysecret
I can 100% confirm public access works :)
jvolkman
AlloyDB is free during its preview phase (not sure how long that is).
ranguna
Read gcp's policy and they say preview periods can last around 6 months, and I'm not sure when alloydb preview started.
But even if there's a free period, it'd be complicated to develop stuff around the DB for free, just to turn into 100s of dollars after 6 months, that's not something we want to see happening. So an indefinite free tier with limited resources would be better. Like aws lambda 1M or firebase function 2M request free tier.
rattray
Did you look at Crunchy Bridge? Not sure if they support that use case.
ranguna
Took a look just now and they start at 35/month. They have some nice points around support, backup and disaster recovery. But if that's the starting point, I'd prefer something like digitalocean that has a similar product offer starting at 15$.
Thanks for the tip though!
gorgoiler
Postgres is mind boggling, coming from sqlite. In a good way, and both are amazing tools.
with ordinal
jsonb_*
'3 minutes'::interval
create index on my_json ->> 'a key'
It's amazing how much stuff there is available. All the toys!
CGamesPlay
Just a quick point in defense of SQLite: that last one is almost verbatim possible in SQLite, and it is possible to calculate ordinals, although the syntax is with standard SQL rather than a custom syntax. The SQLite docs mention that they never found a use case for jsonb that ended up being faster or more efficient than json, so they left it out, although they do reserve the BLOB data type for jsonb if such a use case is discovered.
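CGamesPlay's point is easy to verify with Python's stdlib sqlite3 module; expression indexes over `json_extract` work out of the box (a minimal sketch, assuming your bundled SQLite ships the JSON1 functions, which modern builds do):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (body TEXT)")
# Expression index over a JSON path -- the SQLite analogue of
# Postgres's  create index on ... ((my_json ->> 'a key'))
db.execute("CREATE INDEX idx_key ON docs (json_extract(body, '$.key'))")
db.execute("""INSERT INTO docs VALUES ('{"key": "hello"}')""")
row = db.execute(
    "SELECT json_extract(body, '$.key') FROM docs "
    "WHERE json_extract(body, '$.key') = 'hello'"
).fetchone()
```

The `WHERE` clause matches the indexed expression exactly, so SQLite can use `idx_key` for the lookup.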
gorgoiler
Well this is a doozy: so you’re saying they are both equally awesome as opposed to being individually awesome in different ways.
What a time to be a developer.
manigandham
From the team page, the CEO of Neon is the cofounder of MemSQL/SingleStore, which is one of the best database products I've used. Looks like a solid team to get this done. Very similar approach to Yugabyte (real Postgres compute layer + custom scale-out data layer) and many others in the OLAP space.
ignoramous
Manish Jain of dgraph.io noted that building on top of Postgres or betting on Postgres seems like a necessary condition for database startups to be successful.
Some are commoditizing Postgres' wire format but implementing their own query and storage layers (like CockroachDB / Aurora / AlloyDB), while others are modifying parts of Postgres (like Timescale / EdgeDB / YugaByte), and others still are building atop it (Supabase).
manigandham
Interesting note, but that seems to be more recency bias than anything concrete. Companies from MongoDB to Snowflake to FaunaDB have been successful. Manish himself is from Dgraph, which is a brand-new graph database with no relation to Postgres.
nikita
Thank you for the kind words Mani! Singlestore is indeed an amazing product and company. I'm really proud of it!
nikita
Nikita - CEO of Neon here. We intended to post this at the launch next month, but since it's here, I'm happy to answer any questions.
We have been hard at work and looking to open the service to the public soon.
zeusly
Hey Nikita, could you maybe put some more legal information on the webpage?
I'm trying to find out if you're a company and where you are located. Is there no legal entity behind this? Do you have a privacy policy?
mattashii
The company is Neon, Inc., which is registered in the USA. We're a remote company, with a significant portion of the developers being located in Europe.
Privacy policy and related stuff will be ready when we publish the public beta, which we expect to happen soon.
nikita
Yep, Delaware corp with top tier US investors. I'm in the Bay Area. Heikki is in Finland. Stas is in Cyprus. Majority of engineering is in Europe, some in the US and Canada.
Postgres is a global phenomenon.
timmg
How “cheap” is it to create new db instances?
I can imagine a world where it might be practical to have one master db for all of your customers/accounts. But a separate db instance for each customer’s data.
Is that the kind of architecture you think might be workable with your system?
nikita
It's cheap. The storage footprint is 15 MB and will be shrunk further. The minimum compute footprint is a 1-core container that shuts down when not used.
We are already working with customers that do that. This is for sure a great use case for Neon.
unraveller
Won't the tiny compute units on AWS have relatively slow storage? (No NVMe allowed for them, I think.) Fine for small datasets that fit in RAM, but benchmarks are needed to show the bigger picture.
avinassh
This is really exciting and thank you for making it open source. I am still trying to wrap my head around the Neon, but is there any design document or architecture description? I want to learn more about the Neon storage engine and how it all fits together.
Also, how do I get an invite code to try?
edit: found this to get started - https://neon.tech/docs/storage-engine/architecture-overview/
akmodi
Hey Nikita! I was just looking at the docs but I was a bit confused about what the various compute instances were doing. Do they all serve reads and writes? If so, is there data partitioning or does this support distributed transactions?
nikita
Various compute instances are different endpoints to separate databases. So for now it's a single-writer system. You can get a lot of power out of a 128-core compute node. In the future we will also spin up extra compute to scale reads.
Further in the future we will introduce data partitioning - we have a cool design for it, but one step at a time.
akmodi
Ah got it thanks! And what's the consistency on the instances that serve reads?
Super interested in this space since we're always looking for ways to evolve our pg!
httgp
Do you plan to solve for global data-at-the-edge availability? That to me is the killer feature for databases and one I’m direly in need of at work.
nikita
Yes, we are discussing either simply using Postgres replication to move data to other regions and using our proxy to route reads to the datacenter closest to the user (like fly.io). This will have issues with supporting more than ~5 regions.
OR we can separate storage from replication and purpose build a multi-tenant replication service. This will support as many regions as you want (over 200) but it's more work. We will publish an RFC for that.
code_biologist
Cool stuff! Is PostGIS support difficult?
nikita
It's supported. The beauty of the architecture is that it doesn't break plugins.
tuukkah
Could you include it in the tech preview? https://neon.tech/docs/cloud/compatibility/
lewisl9029
Seems like this might implement database branching in the way most people would assume: branching both the data and schema? I remember being a bit disappointed to learn that PlanetScale's database "branching" was only for the schema [1], which is still quite useful, but this would be so much cooler!
I couldn't find much info about the replication models available/planned however. I would consider this to be table stakes at this point for a serverless database with the recent trend of pushing compute to the edge. This is much more interesting to me than scaling to 0, which is only really useful during the prototyping phase.
PlanetScale is single-primary with eventually consistent read replicas; Fauna has strongly consistent global writes (or regional if you choose, but no option for replication between regions if you do) with a write-latency cost; Dynamo/Cosmos are active-active with eventually consistent replication and fast writes globally. All useful in different scenarios, but I'd love to have one DB tech that can operate in all of these modes for different use cases within the same app, using the same programming model to interact with data across the board.
I think the decoupled storage engine here would open up some really interesting strategies around replication. What are the team's plans here?
nikita
Great questions!
1. Yes schema and data via "copy on write". This will let you instantly create test environments, backups, and run CI/CD. There is a long video here that shows a prototype with GitLab: https://www.youtube.com/watch?v=JVCN9X-vO1g&t=1s.
2. We don't have this feature at the launch, but Matthias van de Meent is already working on it. We will publish an RFC and solicit comments from the community.
3. We are working on two: regional read replicas and consistent multi-region writes (together with Dan Abadi, who helped design FaunaDB). The former is much, MUCH easier.
4. An obvious one is a time machine - we want to allow you to query at an LSN (or timestamp). A less obvious one is templates: you can start your project with a pre-populated database. We will allow you to create and publish such "templates". Disclaimer - it might not be called templates when we ship it.
rattray
For those unfamiliar, LSN is "Log Sequence Number", a pointer to a location in the WAL (Write-Ahead Log).
https://www.postgresql.org/docs/current/datatype-pg-lsn.html
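As a small illustration of that format (a hedged sketch, not taken from any Postgres client library): an LSN prints as two 32-bit hex halves separated by a slash, so it can be parsed into a single comparable 64-bit integer:

```python
def lsn_to_int(lsn: str) -> int:
    """Parse a Postgres LSN like '16/B374D848' into a 64-bit integer.
    The part before the '/' is the high 32 bits, the part after is
    the low 32 bits, both in hex."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)
```

Once converted, LSNs compare as plain integers, which is what makes "query at LSN" a well-ordered point-in-time notion.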
vira28
Amazing work by the team. Congrats y'all. It was one of the best presentations at PGCon 22.
I did email Heikki the following questions, in case someone from Neon is around here.
a) How does Neon compare to polardb https://github.com/ApsaraDB/PolarDB-for-PostgreSQL.
b) The readme mentions a component "Repository - Neon storage implementation". Does it use any special FileSystem? Any links to read more about it?
c) Heard the cold start is a second (IIRC), how does that value differ if one runs Neon on bare metal instead of k8s?
nikita
Thank you!
a. PolarDB is based on a similar idea. https://www.cs.utah.edu/~lifeifei/papers/polardbserverless-s.... This paper describes it. The biggest difference that I see glancing through the paper is that we really integrated S3 into the storage. In Neon's architecture, branches, backups, and checkpoints are all the same thing and instant to run. This simplifies a good amount of database management AND delivers better costs. S3 is cheap.
b. Neon doesn't need a special filesystem. Neon storage is in a way a filesystem, but it doesn't expose a filesystem API. It's a key-value store: it serves 8K pages to Postgres, and a consensus-based update API feeds the key-value store. Pages are organized in LSM trees, and background processes push layers of the LSM trees to S3.
c. The cold start is 2 seconds right now. There is a dependency on k8s; a bare-metal implementation will require new code to orchestrate starts and stops.
ignoramous
> S3 is cheap.
S3 has its limitations though: too many small files, and GET/DELETE/LIST ops get very expensive. There's also an upper limit on throughput per S3 bucket partition. I guess the SSTables that the pageserver flushes periodically help work around these issues?
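For intuition on why SSTable-style layers sidestep the small-object problem: buffer many 8K page images in memory and upload them as one large immutable sorted object, so S3 sees a few big PUTs instead of one PUT per page. A toy sketch (hypothetical `LayerWriter`, not Neon's actual format):

```python
class LayerWriter:
    """Buffer many small page writes and flush them as one large immutable
    layer object (one S3 PUT) instead of one object per page."""

    def __init__(self, flush_bytes=128 * 1024 * 1024):
        self.flush_bytes = flush_bytes
        self.buf = []          # list of (key, page_bytes)
        self.size = 0
        self.flushed = []      # stand-in for objects uploaded to S3

    def put(self, key, page):
        self.buf.append((key, page))
        self.size += len(page)
        if self.size >= self.flush_bytes:
            self.flush()

    def flush(self):
        if not self.buf:
            return
        # Sort by key so the layer is an SSTable-style sorted run,
        # and record each page's byte offset in a small index.
        self.buf.sort(key=lambda kv: kv[0])
        index, offset, parts = {}, 0, []
        for k, p in self.buf:
            index[k] = offset
            offset += len(p)
            parts.append(p)
        self.flushed.append((index, b"".join(parts)))  # in reality: one PUT to S3
        self.buf, self.size = [], 0
```

A point lookup then costs one ranged GET into a big object (via the index) rather than a GET per tiny file, and object counts stay low enough that LIST and per-partition throughput limits stop hurting.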
> Neon storage is in a way a filesystem, however it doesn't expose filesystem API.
Genuinely curious: when would anyone consider using a filesystem like Amazon FSx for Lustre (which is backed by S3 anyway) over implementing a filesystem-esque abstraction of their own, like neon.tech does, and other solutions like rockset.com, tiledb.com, xata.io, and quickwit.io do?
> Pages are organized in LSM trees and background processes put layers of the LSM trees to S3.
Curious how merges are handled? Also, are you using RocksDB / some other engine underneath?
> Bare metal implementation will require new code to orchestrate starts and stops.
Speaking of new code... SingleStore started as a very high-throughput OLTP database and eventually evolved into an HTAP (?) database. Do you see Neon evolving in a similar manner, too?
Thanks!
nikita
1. Yes. Our first attempt at a storage implementation had a problem with many small files. Then the team rearchitected it around LSM trees and it got a LOT better. Our benchmarks show that we are very close in performance to vanilla Postgres and Aurora. There are some "worst case" scenarios where Neon is worse than vanilla Postgres; Aurora has similar problems too.
2. It's best to custom build a storage system here. External distributed filesystems introduce complexity, cost, and bottlenecks that you don't control.
3. Purpose built. LSM trees also have a temporal dimension - LSN. You can fetch a page by pageId and LSN. This is what allows time machine and branching.
4. I call it convergence when OLTP and OLAP are one system - the ultimate dream for a database systems engineer. Since I spent 10 years building that, I have both scars and aspirations. I think it will come, but it will take a long time. HTAP is in a way a subset of convergence - most systems will have some HTAP. Neon will have some too, but for now it's squarely focused on OLTP and helping developers build apps.
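On the merge question above: in an LSM keyed by (pageId, LSN), compaction can merge sorted runs while keeping every version still needed for time-travel reads. A toy sketch under that assumption (not Neon's actual algorithm), where `gc_lsn` is the horizon below which only the newest version per page must survive:

```python
from heapq import merge
from itertools import groupby

def compact(layers, gc_lsn):
    """Merge sorted runs of (page_id, lsn, data) records into one run,
    dropping versions that are superseded at or before the GC horizon."""
    out = []
    for page_id, recs in groupby(merge(*layers), key=lambda r: r[0]):
        recs = list(recs)  # within a page, records arrive in LSN order
        old = [r for r in recs if r[1] <= gc_lsn]
        new = [r for r in recs if r[1] > gc_lsn]
        # Keep only the newest pre-horizon version (the base image) ...
        if old:
            out.append(old[-1])
        # ... plus every post-horizon version, for reads at historical LSNs.
        out.extend(new)
    return out
```

The temporal dimension is the only twist on a textbook LSM merge: instead of keeping one winner per key, you keep one winner per key *at the horizon* plus all newer versions, which is what preserves branching and the time machine through compaction.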
timmg
The way you describe it, to me, is one of those “this sounds obvious in retrospect”. Sounds completely elegant and “right”. Congratulations on a great idea. I really hope you pull it off!
nikita
Thank you! We are super hard at work. You can see our velocity here: https://github.com/neondatabase/neon
1500100900
> we really integrated S3 into the storage
Will it be possible to use something else in place of S3? I'm thinking on-premise or what some would call a private cloud.
mattashii
Right now, it should be possible to use anything that is compatible with the S3 API, as our current focus is on getting the product to the market. Once the business model is proven, we'll likely branch out to other clouds, with their storage providers.
If you can't wait that long to run Neon on your own cloud, feel free to contribute an integration with your persistent blob storage: the code is available under the Apache License 2.0 here: https://github.com/neondatabase/neon/
unraveller
>serves 8k pages to Postgres
will page size be tunable on neon cloud for larger datasets?
nikita
No. Postgres only uses 8K pages. One can imagine adapting Neon storage to other engines; then of course this could be extended.
avinassh
> It was one of the best presentations in the PGcon22.
I can't find it on Youtube, do you have the link?
edit: I found the link; it seems it is not on YouTube yet: https://www.pgcon.org/events/pgcon_2022/schedule/session/236...
nikita
I can't recommend this presentation enough!
ololobus
> c) Heard the cold start is a second (IIRC), how does that value differ if one runs Neon on bare metal instead of k8s?
Yeah, as Nikita mentioned it's 2 seconds now. We did some tests and measurements and on bare metal, it's sub 500 ms usually, so the remaining part is the k8s (+ our own control plane) orchestration overhead. For example, with plain Docker (which we use in CI in addition to k8s) it's around 1 second already.
K8s provides a convenient abstraction layer, though, so I think we'll continue using it. Optimization will come from a pod pool / over-provisioning, and it's realistic to bring the startup time closer to bare metal.
-- Cloud engineer @ Neon
talkingtab
Why is this a good idea? In my experience, getting Postgres up and running is trivial. Docker anyone? And in many cases your data is your business so why hand it off? And if you are going to offer this product why not just call it what it is, "Postgres as service", instead of serverless which seems a bit misleading. Really it is simply Postgres running on your server.
chimen
Not everyone can manage a database properly and, sorry to say this to you, but Docker is a terrible idea for a database in general. Setting up your own database somewhere still puts your data on someone else's server, more or less.
All these "serverless" keywords pretty much mean you don't have to spin up servers (cloud) or set up & maintain one. Nothing is "serverless" per se, so it's time to move on from picking on this (I agree, bad) choice of words.
smokey_circles
> Docker is a terrible idea for a database in general
Why? Genuine question, my gut feel is there's something wrong about it too but I can't put words to it nor have I found a benchmark that convinced me, but it's worth noting I'm not sure what I'm looking for
chimen
I manage about 200 servers, and Docker crashing accounts for 20%+ of my issues so far. The servers come back up easily after a crash, and that's not an issue for my services. For a database, Docker is nothing more than an extra layer of complications on top, with iptables, the volume system, and all the layers it brings. It's just a bad wrapper for a production database, which needs stability.
iknownothow
I knew my bet on sticking with Postgres would pay off! This looks super exciting.
I thought of doing something similar for our data warehouse with AWS Fargate and Postgres but the cold starts and limited disk space required too much engineering on top to make it work.
Moving to Snowflake comes at the cost of losing so many Postgres features in exchange for speed - things like foreign keys, constraints, extensions, etc., which require so much engineering to replace in Snowflake. I would happily pay 25x the price for a 10x speed increase on a specific query.
nikita
Thank you!
Snowflake is a better cloud data warehouse than Postgres, but of course Postgres is so versatile. Neon will give you some of the Snowflake features: time machine, cloning - we call this branching, data sharing.
thejosh
Snowflake is a data warehouse though. Completely different use case.
If your data can be done via PG, highly recommend that over SF. Especially with this concept.
Snowflake is great when you use a tool like dbt; their modern SQL approach and functions are fantastic. The downside is it's pretty pricey, and it can catch you out.
iknownothow
We already use DBT and the data is less than 10TB, something Postgres can handle well. And most of the data is concentrated in a few tables. With a serverless approach I'd be happy to allocate 10x resources for just a query or two and for the rest a minimal server is fine.
I manage the data warehouse mostly alone because Postgres offers guarantees: unique constraints, triggers, and relationships between columns of different tables. It does the work of two engineers. Snowflake is fast but not Postgres-compatible; in order to move to Snowflake, I'd have to write tests and maintain them, which Postgres does for me for free.
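The "database as test suite" point can be made concrete. Using Python's built-in sqlite3 as a stand-in for Postgres (the UNIQUE and FOREIGN KEY semantics shown are the same kind of guarantee the comment relies on), bad rows are rejected at write time with no test code at all:

```python
import sqlite3

# sqlite3 stands in for Postgres here; the schema and constraints are illustrative.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # sqlite needs FK enforcement enabled
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY,"
           " customer_id INTEGER NOT NULL REFERENCES customers(id))")
db.execute("INSERT INTO customers VALUES (1, 'a@example.com')")
db.execute("INSERT INTO orders VALUES (100, 1)")        # ok: customer 1 exists

try:
    db.execute("INSERT INTO orders VALUES (101, 999)")  # no such customer
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True   # the database itself caught the bad row
```

In a warehouse without these declarative checks, each constraint becomes a data-quality test someone has to write, schedule, and maintain - the "work of two engineers" in the comment above.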
I'd stick with Postgres at least until 20TB before considering Snowflake.
manigandham
Snowflake is an OLAP system. It's an entirely different kind of "speed" designed for analyzing vast amounts of data through scans and aggregations.
iknownothow
Neon now opens the door (at least in my mind) for Postgres to be used for analytics or a data warehouse for almost an order of magnitude more data before having to consider Snowflake.
Basically if someone is already using Postgres as a warehouse, then they can prolong their migration to Snowflake by at least a year by using something like Neon.
manigandham
Sure but there are plenty of OLAP solutions like Greenplum, and extension-based offerings like Citus and Timescale, that can all partition and scale across nodes to massive datasets with column-oriented storage.
AWS Redshift is also built on Postgres (although a much older and customized version).
captnObvious
I hope y’all have a plan for when AWS decides to pick up your open source project and turn it into a managed cloud solution. It’s a pattern of theirs. And with the way egress charges are structured they’re likely to snap up any clients straddling their cloud and yours.
mattashii
AWS already has Aurora, which is their own in-house closed-source variant that does very similar things.
We think we'll be able to provide a better experience at lower cost for smaller developers, while having some very useful quality-of-life features like zero-cost branching and instant PITR.
oxfordmale
AWS already has this - Google "Aurora Serverless". It is not cheap though, and this might well be cheaper.
onphonenow
AWS already has Aurora Serverless v2 out for Postgres, along with RDS for Postgres. It doesn't scale to zero, though; the floor is about $40/month.
This is the missing piece of the cloud for the masses.
With something like this we get a solid RDBMS engineered to scale to zero, with a good developer experience. This opens up a world of try-out mini applications that cost cents to host: serverless DB (Postgres) + serverless compute (Cloud Run) + pay-as-you-go storage/network. This is a paradigm-shift stack. Exciting days ahead.