AdamProut
A nice outcome here for distributed SQL databases that store multiple copies of the data by default is that they can just fail over on an fsync failure rather than try to "handle it". If fsync starts failing with EIO, there's a good chance the disk is going to die soon anyway.
simonz05
That depends. Replicated state machines need to distinguish between a crash and corruption, and most systems don't do that. It can be disastrous to truncate the journal when encountering a checksum mismatch, for instance.
See "Protocol-Aware Recovery for Consensus-Based Storage" for more research on that topic. [1][2]
- [1] https://www.usenix.org/system/files/conference/fast18/fast18... - [2] https://www.youtube.com/watch?v=fDY6Wi0GcPs
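To make the crash-vs-corruption distinction concrete, here is a minimal Python sketch (my own illustration, not code from the paper) of scanning a journal of checksummed records. A partial record at the end of the file is a torn tail from a crash and is safe to truncate; a checksum mismatch on a complete record mid-log is corruption, and truncating there would silently drop the committed entries after it.

```python
import struct
import zlib

def append_entry(buf: bytearray, payload: bytes) -> None:
    # Each entry: 4-byte length + 4-byte CRC32 header, then the payload.
    buf += struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def scan_journal(data: bytes):
    """Return (verdict, entries); verdict is 'ok', 'torn_tail', or 'corrupt'."""
    entries, off = [], 0
    while off < len(data):
        if off + 8 > len(data):
            return "torn_tail", entries   # partial header: crash mid-write
        size, crc = struct.unpack_from("<II", data, off)
        payload = data[off + 8 : off + 8 + size]
        if len(payload) < size:
            return "torn_tail", entries   # partial payload: crash mid-write
        if zlib.crc32(payload) != crc:
            # Bad checksum on a complete record. Mid-log this is corruption,
            # not a crash artifact; truncating here would drop the committed
            # entries that follow. (On the very last record it is ambiguous,
            # which is exactly the distinction PAR makes protocols aware of.)
            return "corrupt", entries
        entries.append(payload)
        off += 8 + size
    return "ok", entries
```

A naive "truncate at first bad checksum" recovery treats both cases the same, which is the disaster scenario above; PAR instead repairs the corrupt entry from replicas.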
_vvhw
Indeed!
I was so glad to see you cite not only the Rebello paper but also Protocol-Aware Recovery for Consensus-Based Storage. When I read your first comment, I was about to reply to mention PAR, and then saw you had saved me the trouble!
UW-Madison are truly the vanguard where consensus hits the disk.
We implemented Protocol-Aware Recovery for TigerBeetle [1], and I did a talk recently at the Recurse Center diving into PAR, talking about the intersection of global consensus protocol and local storage engine. It's called Let's Remix Distributed Database Design! [2] and owes the big ideas to UW-Madison.
AdamProut
An fsync() failing doesn't necessarily mean there is disk corruption. I agree all logging and recovery protocols should have different handling for corruption vs. a torn tail of the log, for example, but I view that as mostly orthogonal.
I'm talking about exiting the process if fsync() fails and letting the distributed database's normal failover processing do its thing. This is a normal scenario for a failover (i.e., it's the same as the process crashing or getting OOM-killed by Linux, etc.).
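The "exit on fsync failure" policy can be sketched roughly like this in Python (names and structure are my own; the hazard noted in the comment is the fsyncgate behavior discussed below):

```python
import os
import sys

def write_durably(path: str, data: bytes) -> None:
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        try:
            os.fsync(fd)
        except OSError as err:
            # After a failed fsync, the kernel may already have marked the
            # dirty pages clean, so a retry can "succeed" without making the
            # data durable. Exit instead, and let the replicated database's
            # normal failover promote another node that has the data.
            sys.stderr.write(f"fsync failed ({err}); exiting for failover\n")
            os._exit(1)
    finally:
        os.close(fd)
```

The point is that the error path does no local cleverness at all: crashing converts an ill-defined I/O error into the one failure mode the consensus layer already handles well.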
AdamProut
To be clear, if fsync() on Linux had well-defined behavior on errors (no matter the file system used), I wouldn't suggest failing over as a reasonable solution for a distributed database. It's mentioned in the parent video, but fsyncgate[1] was a recent reminder of this.
eis
I recommend the excellent articles by the SQLite authors themselves like
How To Corrupt An SQLite Database File: https://www.sqlite.org/howtocorrupt.html
Atomic Commit In SQLite: https://www.sqlite.org/atomiccommit.html
and more under the "Technical and Design Documentation" section of their docs at https://www.sqlite.org/docs.html
They go into the details of challenges SQLite faces due to how hardware and OSs work or not work and how they solve them.
pachico
Don't get me wrong, I run SQLite in production with millions of records, but it seems people have discovered it just very recently. All of a sudden, there's always a post about it with hundreds of comments.
What changed?
benbjohnson
Author here. I agree there's been a resurgence lately. My working theory is that the performance of the underlying hardware has increased so much in the last decade that the simplicity of SQLite has become a more important tradeoff than the extra performance boost you may see with Postgres/MySQL. Also, network overhead tends to dominate small queries so SQLite can actually end up a lot faster since it's in-process.
I wrote up a longer version here: https://fly.io/blog/all-in-on-sqlite-litestream/
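The "network overhead tends to dominate small queries" point is easy to see for yourself: with SQLite a point query is a function call into the same address space, with no round trip at all. A rough illustration (my own sketch; absolute numbers will vary by machine):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i, f"val{i}") for i in range(1_000)])

start = time.perf_counter()
for i in range(1_000):
    conn.execute("SELECT v FROM kv WHERE k = ?", (i,)).fetchone()
per_query_us = (time.perf_counter() - start) / 1_000 * 1e6
# Typically single-digit microseconds per point query. A networked database
# adds at least one round trip on top, often hundreds of microseconds even
# within a datacenter, which swamps the query itself.
```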
_vvhw
“the performance of the underlying hardware has increased so much in the last decade that the simplicity of SQLite has become a more important tradeoff”
Thanks for articulating this so clearly! It's also going to be interesting to see how much simplicity is realized from new thread-per-core designs leveraging io_uring to get rid of the multithreading, locks, and memory barriers scattered throughout the control plane. The performance dynamics are all changing, and it's easier than ever to write simple, fast systems that hit the 90% sweet spot.
You've probably also seen McSherry et al.'s "Scalability! But at what COST?" [1]
[1] http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...
EMM_386
It seems to come in waves; there was another round of SQLite posts hitting the front page around six months ago, as I recall.
I'm not sure why; SQLite is the most widely deployed database engine on the planet, so it's not as if the technology is really "gaining" in popularity ... it's already there.
I was always curious if this is due to the huge number of new developers, many front-end, who have joined IT over the past decade or so, now venturing out and beginning to learn about relational databases and their benefits.
viraptor
Fly.io is bullish on SQLite. They post a lot of tech content about it. That helps SQLite, but also keeps Fly.io in the news.
tptacek
That's true as far as it goes, but also Ben works here, exclusively on the open-source Litestream project, so his blog posts end up on our blog sort of naturally.
tyingq
I suspect part of it is things like Litestream, dqlite, rqlite, etc. They open SQLite up to people who need it to work in some distributed fashion.
TylerE
Notice how all these articles have the same author?
likeabbas
NoSQL is no longer cool
Scarbutt
The new edge computing fad. These companies need a way to bring the database to the edge too, so they're heavily promoting SQLite as the best thing since sliced bread, to sell their services as the one solution to all your problems :)
DimmieMan
Possibly confirmation bias, but 'simple' has gotten really trendy, and for databases it doesn't get much simpler than SQLite.
Quite a few posts appealing to the idea of "why are we doing all this nonsense? X does the job just fine".
simonw
This was so useful to me. I feel like I have a pretty solid idea of how journal mode actually works now.
I cannot wait for the follow-up explaining WAL mode!
benbjohnson
Thanks! WAL mode is really interesting because it changes the design just a little bit, but you get a ton of additional benefits. I have that post in the works right now.
kretaceous
Loved the article.
Whenever I read articles or releases by Fly.io, it makes me want to work with them. As a newbie who's interested in databases and networking, I reckon it would be an amazing experience to be on the team.
Unfortunately, they do not hire interns or likewise (from their job page) and I assume, from the amazing work they do, they aren't looking for beginners. =)
EDIT: Thanks to people encouraging me to apply. I'll definitely try!
michaeldwan
We do hire entry-level people; in fact, we're wrapping up our first cohort of interns right now. Exclusively hiring experienced dudes from the tech bubble is a failure mode in the long run. We're invested in hiring folks with diverse backgrounds and experience levels, and we need to talk about that more for sure.
Our jobs page is sparse right now because we’re focusing on hiring EMs to help grow a healthy eng org. We’ll have more openings before long, and you should absolutely apply when something sounds right. Feel free to hunt me down online in the mean time!
mhlakhani
Since you mention hiring EMs, is there a place to view the open roles for EMs?
The jobs page seemed mostly IC-focused.
davnicwil
Start the application transaction. If it fails you'll just roll back to current state in your career log (resumé). Nothing ventured nothing gained.
klysm
Just make sure you are in at least read committed isolation
corobo
More likely to succeed with an application than an assumption of failure :)
krono
If there's something you want, it might work better to just straight up ask them for it.
spike021
Not only should you apply, but definitely try to network with people on the team. This works well if they're active on Twitter. I got my first internship by building a rapport with someone on Twitter; they then worked with their org to find an internship opening for me (I still had to interview and such) that wasn't on their careers site.
ripley12
> After that, 510 bytes are used for the SHARED lock. A byte range is used here to accommodate older Windows versions with mandatory locks.
I was curious how old, and... wow, that code is for Windows versions that predate the NT kernel (Win95/98/ME). I'm surprised that it's still around, but the comment does a great job of explaining it.
https://github.com/sqlite/sqlite/blob/3cf46ee508e97b46736a26...
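For reference, the offsets involved (as I read them from the SQLite source; they are compile-time defaults, not guaranteed for custom builds) sit just past the 1 GiB mark, so the locked bytes never overlap real data in any database under 1 GiB, and page 0x40000000 is simply never used in larger ones:

```python
# Lock-byte constants from SQLite's pager/OS layer (default build).
PENDING_BYTE  = 0x40000000        # byte at the 1 GiB offset
RESERVED_BYTE = PENDING_BYTE + 1
SHARED_FIRST  = PENDING_BYTE + 2  # first byte of the SHARED lock range
SHARED_SIZE   = 510               # range width, for the old Windows
                                  # mandatory-locking systems mentioned above

shared_range = (SHARED_FIRST, SHARED_FIRST + SHARED_SIZE - 1)
```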
icedchai
The NT kernel is older than you think and predates Windows 95. NT 3.1 was released way back in 93!
Karupan
Question for people who actually use SQLite in production: when do you start seeing performance degrade to the extent that you have to migrate to a "proper" DB? Can a transactional table scale to say 10 million+ records with a couple of indexes before it becomes a problem?
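Not production evidence, but it's cheap to sanity-check the shape of this question yourself before committing either way. A rough sketch (my own, in-memory and 100k rows to keep it quick; scaling the row count up is the same code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txns (id INTEGER PRIMARY KEY, acct INTEGER, amount REAL)"
)
conn.execute("CREATE INDEX idx_acct ON txns(acct)")

# One big transaction: the commit cost is paid once, not once per row.
with conn:
    conn.executemany(
        "INSERT INTO txns (acct, amount) VALUES (?, ?)",
        ((i % 1_000, i * 0.01) for i in range(100_000)),
    )

# Indexed lookups stay fast regardless of table size (B-tree, O(log n)).
hits = conn.execute(
    "SELECT COUNT(*) FROM txns WHERE acct = 42"
).fetchone()[0]
```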
agildehaus
A company I work for runs MySQL on an IoT base station product (Beaglebone-like hardware). Tables seem to get corrupted every so often; these are typically repairable with a "mysqlcheck --auto-repair" that they run as part of the boot sequence, but not always. The corruptions seem to be due to bad batteries or the field team holding down the power button too long.
Would sqlite be less prone to table corruption?
bob1029
I think yes - if you configure something like:
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
This should be very safe & still reasonably fast. See: https://www.sqlite.org/pragma.html#pragma_synchronous and https://www.sqlite.org/wal.html
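From application code that might look like the following (Python's built-in sqlite3 here; note that journal_mode=WAL persists in the database file, while synchronous is per connection):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# WAL lets readers proceed while the single writer appends to the log;
# synchronous=NORMAL syncs the WAL at checkpoints rather than every commit,
# trading "durable on power loss" for "consistent on power loss" plus speed.
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("PRAGMA synchronous=NORMAL;")

mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]  # "wal"
```

WAL needs a real file (it won't engage for :memory: databases), which is why the sketch uses a temp path.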
eli
You should take a look at your mysql settings e.g. `innodb_flush_log_at_trx_commit` but yeah mysql isn't really built for frequent sudden shutdowns.
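For completeness, the setting in question as a my.cnf fragment (per my reading of the MySQL docs, 1 is the default and the only fully durable option across a power cut):

```ini
[mysqld]
# 1 = write and fsync the redo log at every commit (durable; the default)
# 2 = write at commit, fsync ~once per second: a power cut can lose ~1s
# 0 = write and fsync ~once per second: even a crash alone can lose ~1s
innodb_flush_log_at_trx_commit = 1
```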
RedShift1
I run about 100 computers that regularly receive power cuts and have not had databases go corrupt. However, we also write to the database only every minute or so; all data is packed into compressed chunks that each contain a minute of data.
slt2021
The problem is in the application layer and MySQL settings.
You can try switching from the MyISAM engine to InnoDB, tuning the InnoDB settings, writing inserts in batches rather than continuously, and wrapping the inserts in a transaction.
iddan
I love the writing style. Technical blogs, take note.
im3w1l
Good article but on a lighter note the thought of undoing a half-made sandwich is surreal to me. I always start by buttering the bread!
benbjohnson
The SQLite Sandwich Shop is condiment free. :)
Aperocky
Minimalism and simplicity always bring great joy.
I wonder what is the largest server side system supported by SQLite as its primary data storage.
truth_seeker
F2FS (Flash-Friendly File System, a log-structured file system) natively supports atomic writes.
https://github.com/sqlite/sqlite/blob/master/doc/F2FS.txt
https://lore.kernel.org/lkml/1411707287-21760-2-git-send-ema...
simonz05
For people interested in this topic I can recommend "Can Applications Recover from fsync Failures?" by A. Rebello et al[1]. The paper analyzes how file systems and PostgreSQL, LMDB, LevelDB, SQLite, and Redis react to fsync failures. It shows that although applications use many failure-handling strategies, none are sufficient: fsync failures can cause catastrophic outcomes such as data loss and corruption.
- [1] https://www.usenix.org/conference/atc20/presentation/rebello