AdamProut
A nice outcome here for distributed SQL databases that store multiple copies of the data by default is that they can just fail over on an fsync failure rather than try to "handle it". If fsync starts failing with EIO, there's a good chance the disk is going to die soon anyway.
simonz05
That depends. Replicated state machines need to distinguish between a crash and corruption, and most systems don't do that. It can be disastrous to truncate the journal when encountering a checksum mismatch, for instance.
See "Protocol-Aware Recovery for Consensus-Based Storage" for more research on that topic. [1][2]
- [1] https://www.usenix.org/system/files/conference/fast18/fast18... - [2] https://www.youtube.com/watch?v=fDY6Wi0GcPs
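To make the crash-vs-corruption distinction concrete, here is a minimal Python sketch (my own illustration, not code from the paper) of scanning a journal of checksummed records. A partial record at the end of the file is a torn tail from a crash and is safe to truncate; a checksum mismatch on a complete record mid-log is corruption, and truncating there would silently drop the committed entries after it.

```python
import struct
import zlib

def append_entry(buf: bytearray, payload: bytes) -> None:
    # Each entry: 4-byte length + 4-byte CRC32 header, then the payload.
    buf += struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def scan_journal(data: bytes):
    """Return (verdict, entries); verdict is 'ok', 'torn_tail', or 'corrupt'."""
    entries, off = [], 0
    while off < len(data):
        if off + 8 > len(data):
            return "torn_tail", entries   # partial header: crash mid-write
        size, crc = struct.unpack_from("<II", data, off)
        payload = data[off + 8 : off + 8 + size]
        if len(payload) < size:
            return "torn_tail", entries   # partial payload: crash mid-write
        if zlib.crc32(payload) != crc:
            # Bad checksum on a complete record. Mid-log this is corruption,
            # not a crash artifact; truncating here would drop the committed
            # entries that follow. (On the very last record it is ambiguous,
            # which is exactly the distinction PAR makes protocols aware of.)
            return "corrupt", entries
        entries.append(payload)
        off += 8 + size
    return "ok", entries
```

A naive "truncate at first bad checksum" recovery treats both cases the same, which is the disaster scenario above; PAR instead repairs the corrupt entry from replicas.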
_vvhw
Indeed!
I was so glad to see you cite not only the Rebello paper but also Protocol-Aware Recovery for Consensus-Based Storage. When I read your first comment, I was about to reply to mention PAR, and then saw you had saved me the trouble!
UW-Madison are truly the vanguard where consensus hits the disk.
We implemented Protocol-Aware Recovery for TigerBeetle [1], and I did a talk recently at the Recurse Center diving into PAR, talking about the intersection of global consensus protocol and local storage engine. It's called Let's Remix Distributed Database Design! [2] and owes the big ideas to UW-Madison.
AdamProut
An fsync() failing doesn't necessarily mean there is disk corruption. I agree all logging and recovery protocols should have different handling for corruption vs. a torn tail of the log, for example, but I view that as mostly orthogonal.
I'm talking about exiting the process if fsync() fails and letting the distributed database's normal failover processing do its thing. This is a normal scenario for a failover (i.e., it's the same as the process crashing or getting OOM-killed by Linux, etc.).
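The "exit on fsync failure" policy can be sketched roughly like this in Python (names and structure are my own; the hazard noted in the comment is the fsyncgate behavior discussed below):

```python
import os
import sys

def write_durably(path: str, data: bytes) -> None:
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        try:
            os.fsync(fd)
        except OSError as err:
            # After a failed fsync, the kernel may already have marked the
            # dirty pages clean, so a retry can "succeed" without making the
            # data durable. Exit instead, and let the replicated database's
            # normal failover promote another node that has the data.
            sys.stderr.write(f"fsync failed ({err}); exiting for failover\n")
            os._exit(1)
    finally:
        os.close(fd)
```

The point is that the error path does no local cleverness at all: crashing converts an ill-defined I/O error into the one failure mode the consensus layer already handles well.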
AdamProut
To be clear, if fsync() on Linux had well-defined behavior on errors (no matter the file system used), I wouldn't suggest failing over as a reasonable solution for a distributed database. It's mentioned in the parent video, but fsyncgate[1] was a recent reminder of this.
eis
I recommend the excellent articles by the SQLite authors themselves like
How To Corrupt An SQLite Database File: https://www.sqlite.org/howtocorrupt.html
Atomic Commit In SQLite: https://www.sqlite.org/atomiccommit.html
and more under the "Technical and Design Documentation" section of their docs at https://www.sqlite.org/docs.html
They go into the details of challenges SQLite faces due to how hardware and OSs work or not work and how they solve them.
pachico
Don't get me wrong, I run SQLite in production with millions of records, but it seems people have discovered it just very recently. All of a sudden, there's always a post about it with hundreds of comments.
What changed?
benbjohnson
Author here. I agree there's been a resurgence lately. My working theory is that the performance of the underlying hardware has increased so much in the last decade that the simplicity of SQLite has become a more important tradeoff than the extra performance boost you may see with Postgres/MySQL. Also, network overhead tends to dominate small queries so SQLite can actually end up a lot faster since it's in-process.
I wrote up a longer version here: https://fly.io/blog/all-in-on-sqlite-litestream/
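The "network overhead tends to dominate small queries" point is easy to see for yourself: with SQLite a point query is a function call into the same address space, with no round trip at all. A rough illustration (my own sketch; absolute numbers will vary by machine):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i, f"val{i}") for i in range(1_000)])

start = time.perf_counter()
for i in range(1_000):
    conn.execute("SELECT v FROM kv WHERE k = ?", (i,)).fetchone()
per_query_us = (time.perf_counter() - start) / 1_000 * 1e6
# Typically single-digit microseconds per point query. A networked database
# adds at least one round trip on top, often hundreds of microseconds even
# within a datacenter, which swamps the query itself.
```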
_vvhw
“the performance of the underlying hardware has increased so much in the last decade that the simplicity of SQLite has become a more important tradeoff”
Thanks for articulating this so clearly! It's also going to be interesting to see how much simplicity is realized from new thread-per-core designs leveraging io_uring to get rid of the multithreading, locks, and memory barriers scattered throughout the control plane. The performance dynamics are all changing, and it's easier than ever to write simple, fast systems that hit the 90% sweet spot.
You've probably also seen McSherry et al.'s "Scalability! But at what COST?" [1]
[1] http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...
EMM_386
It seems to come in waves; there was another round of SQLite posts hitting the front page around six months ago, as I recall.
I'm not sure why; SQLite is the most widely deployed database engine on the planet, so it's not as if the technology is really "gaining" in popularity ... it's already there.
I was always curious if this is due to the huge number of new developers, many front-end, who have joined IT over the past decade or so, now venturing out and beginning to learn about relational databases and their benefits.
viraptor
Fly.io is bullish on SQLite. They post a lot of tech content about it. That helps SQLite, but also keeps Fly.io in the news.
tptacek
That's true as far as it goes, but also Ben works here, exclusively on the open-source Litestream project, so his blog posts end up on our blog sort of naturally.
tyingq
I suspect part of it is things like Litestream, dqlite, rqlite, etc. They open SQLite up to people who need it to work in some distributed fashion.
TylerE
Notice how all these articles have the same author?
likeabbas
NoSQL is no longer cool
Scarbutt
The new edge computing fad. These companies need a way to bring the database to the edge too, so they're heavily promoting SQLite as the best thing since sliced bread, to sell their services as the one solution to all your problems :)
DimmieMan
Possibly confirmation bias, but 'simple' has gotten really trendy, and for databases it doesn't get much simpler than SQLite.
Quite a few posts appealing to the idea of "why are we doing all this nonsense? X does the job just fine".
simonw
This was so useful to me. I feel like I have a pretty solid idea of how journal mode actually works now.
I cannot wait for the follow-up explaining WAL mode!
benbjohnson
Thanks! WAL mode is really interesting because it changes the design just a little bit, but you get a ton of additional benefits. I have that post in the works right now.
kretaceous
Loved the article.
Whenever I read articles or releases by Fly.io, it makes me want to work with them. As a newbie who's interested in databases and networking, I reckon it would be an amazing experience to be on the team.
Unfortunately, they do not hire interns or likewise (from their job page) and I assume, from the amazing work they do, they aren't looking for beginners. =)
EDIT: Thanks to people encouraging me to apply. I'll definitely try!
michaeldwan
We do hire entry-level people; in fact, we're wrapping up our first cohort of interns right now. Exclusively hiring experienced dudes from the tech bubble is a failure mode in the long run. We're invested in hiring folks with diverse backgrounds and experience levels, and we need to talk about that more for sure.
Our jobs page is sparse right now because we’re focusing on hiring EMs to help grow a healthy eng org. We’ll have more openings before long, and you should absolutely apply when something sounds right. Feel free to hunt me down online in the mean time!
mhlakhani
Since you mention hiring EMs, is there a place to view the open roles for EMs?
The jobs page seemed mostly IC-focused.
davnicwil
Start the application transaction. If it fails you'll just roll back to current state in your career log (resumé). Nothing ventured nothing gained.
klysm
Just make sure you are in at least read committed isolation
corobo
More likely to succeed with an application than an assumption of failure :)
krono
If there's something you want, it might work better to just straight up ask them for it.
spike021
Not only should you apply, but definitely try to network with people on the team. This works well if they're active on Twitter. I got my first internship by building a rapport with someone on Twitter; they then worked with their org to find an internship opening for me (I still had to interview and such) that wasn't on their careers site.
ripley12
> After that, 510 bytes are used for the SHARED lock. A byte range is used here to accommodate older Windows versions with mandatory locks.
I was curious how old, and... wow, that code is for Windows versions that predate the NT kernel (Win95/98/ME). I'm surprised that it's still around, but the comment does a great job of explaining it.
https://github.com/sqlite/sqlite/blob/3cf46ee508e97b46736a26...
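For reference, the offsets involved (as I read them from the SQLite source; they are compile-time defaults, not guaranteed for custom builds) sit just past the 1 GiB mark, so the locked bytes never overlap real data in any database under 1 GiB, and page 0x40000000 is simply never used in larger ones:

```python
# Lock-byte constants from SQLite's pager/OS layer (default build).
PENDING_BYTE  = 0x40000000        # byte at the 1 GiB offset
RESERVED_BYTE = PENDING_BYTE + 1
SHARED_FIRST  = PENDING_BYTE + 2  # first byte of the SHARED lock range
SHARED_SIZE   = 510               # range width, for the old Windows
                                  # mandatory-locking systems mentioned above

shared_range = (SHARED_FIRST, SHARED_FIRST + SHARED_SIZE - 1)
```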
icedchai
The NT kernel is older than you think and predates Windows 95. NT 3.1 was released way back in 93!
Karupan
Question for people who actually use SQLite in production: when do you start seeing performance degrade to the extent that you have to migrate to a "proper" DB? Can a transactional table scale to say 10 million+ records with a couple of indexes before it becomes a problem?
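Not production evidence, but it's cheap to sanity-check the shape of this question yourself before committing either way. A rough sketch (my own, in-memory and 100k rows to keep it quick; scaling the row count up is the same code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txns (id INTEGER PRIMARY KEY, acct INTEGER, amount REAL)"
)
conn.execute("CREATE INDEX idx_acct ON txns(acct)")

# One big transaction: the commit cost is paid once, not once per row.
with conn:
    conn.executemany(
        "INSERT INTO txns (acct, amount) VALUES (?, ?)",
        ((i % 1_000, i * 0.01) for i in range(100_000)),
    )

# Indexed lookups stay fast regardless of table size (B-tree, O(log n)).
hits = conn.execute(
    "SELECT COUNT(*) FROM txns WHERE acct = 42"
).fetchone()[0]
```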
agildehaus
A company I work for runs MySQL on an IoT base station product (Beaglebone-like hardware). Tables seem to get corrupted every so often; these are typically repairable with a "mysqlcheck --auto-repair" that they run as part of the boot sequence, but not always. The corruptions seem to be due to bad batteries or the field team holding down the power button too long.
Would sqlite be less prone to table corruption?
bob1029
I think yes - if you configure something like:
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
This should be very safe & still reasonably fast. See: https://www.sqlite.org/pragma.html#pragma_synchronous and https://www.sqlite.org/wal.html
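From application code that might look like the following (Python's built-in sqlite3 here; note that journal_mode=WAL persists in the database file, while synchronous is per connection):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# WAL lets readers proceed while the single writer appends to the log;
# synchronous=NORMAL syncs the WAL at checkpoints rather than every commit,
# trading "durable on power loss" for "consistent on power loss" plus speed.
conn.execute("PRAGMA journal_mode=WAL;")
conn.execute("PRAGMA synchronous=NORMAL;")

mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]  # "wal"
```

WAL needs a real file (it won't engage for :memory: databases), which is why the sketch uses a temp path.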
eli
You should take a look at your mysql settings e.g. `innodb_flush_log_at_trx_commit` but yeah mysql isn't really built for frequent sudden shutdowns.
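For completeness, the setting in question as a my.cnf fragment (per my reading of the MySQL docs, 1 is the default and the only fully durable option across a power cut):

```ini
[mysqld]
# 1 = write and fsync the redo log at every commit (durable; the default)
# 2 = write at commit, fsync ~once per second: a power cut can lose ~1s
# 0 = write and fsync ~once per second: even a crash alone can lose ~1s
innodb_flush_log_at_trx_commit = 1
```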
RedShift1
I run about 100 computers that regularly receive power cuts and have not had databases go corrupt. However, we also write to the database only every minute or so; all data is packed into compressed chunks that each contain a minute of data.
slt2021
The problem is in the application layer and MySQL settings.
You can try switching from the MyISAM engine to InnoDB, tuning the InnoDB settings, writing inserts in batches rather than continuously, and wrapping the inserts in a transaction.
iddan
I love the writing style. Technical blogs, take note.
im3w1l
Good article but on a lighter note the thought of undoing a half-made sandwich is surreal to me. I always start by buttering the bread!
benbjohnson
The SQLite Sandwich Shop is condiment free. :)
Aperocky
Minimalism and simplicity always bring great joy.
I wonder what is the largest server side system supported by SQLite as its primary data storage.
truth_seeker
F2FS (Flash-Friendly File System, a log-structured file system) natively supports atomic writes.
https://github.com/sqlite/sqlite/blob/master/doc/F2FS.txt
https://lore.kernel.org/lkml/1411707287-21760-2-git-send-ema...
simonz05
For people interested in this topic I can recommend "Can Applications Recover from fsync Failures?" by A. Rebello et al[1]. The paper analyzes how file systems and PostgreSQL, LMDB, LevelDB, SQLite, and Redis react to fsync failures. It shows that although applications use many failure-handling strategies, none are sufficient: fsync failures can cause catastrophic outcomes such as data loss and corruption.
- [1] https://www.usenix.org/conference/atc20/presentation/rebello