Brian Lovin
/
Hacker News
Daily Digest email

Get the top HN stories in your inbox every day.

iagovar

As a non-dev intruder I have to say that I love SQLite. I do a lot of data-analysis and it makes everything easy, from fast SQL Wizardry to sharing the DB just coping a file! Just how amazing is that?!

It must sound naive to some of you, but the first time stumbled upn sqlite I was so excited!

mmsimanga

I work in data analysis too and SQLite is to me what MS Excel is to other people in the industry. I have development background so I never picked up in MS Excel to same degree as my colleagues from statistics backgrounds. In corporate organisations where DBAs monitor the use of their database with an eagle eye SQLite is a fantastic option to analyse several millions of lines of data on your desktop. I find DB Browser for SQLite (DB4S)[0] is invaluable when using SQLite.

[0]https://sqlitebrowser.org/

kyllo

If you like SQLite for data analysis, you might want to check out DuckDB https://github.com/cwida/duckdb which is billed as "SQLite for analytics."

SQLite is a row store, which is best for OLTP (point queries, inserting/updating/deleting one or a few rows at a time), while DuckDB is a column store, which means the data layout has values from the same column stored contiguously, making aggregation queries (GROUP BY) perform much better.

erichocean

Wow, why isn't DuckDB not more widely known!?!?! Looks incredible!!!

pachico

This is great! Thanks for sharing!

edwinyzh

For Windows also check HeidiSQL (open source, very feature-rich)

mmsimanga

Thanks for this. I used to be a longtime user of HeidiSQL but drifted away because I landed up in an Oracle environment. At the time I drifted away SQLite was not supported. I look forward to being re-acquainted with HeidiSQL.

iagovar

Excel is nice as a simple sandbox, and PowerQuery is amazing IMO, the problem is that it's basically impossible to work with large datasets. If you didn't know about it I recommend you to play a bit with it.

And check SQLiteStudio for Windows, is nice too.

nojito

I handle hundreds of millions of rows in excel trivially.

PowerPivot is extremely amazing.

mkl

Does SQLiteStudio let you cancel an unexpectedly slow query part way through? DB Browser for SQLite doesn't seem to, and I don't really know what I'm doing so often accidentally set off a query that can't use indices without doing "EXPLAIN QUERY PLAN" first.

giancarlostoro

I used to use this but if you have a JetBrains sub you can open these files directly. Or even better if its a full subscription DataGrip is fantastic and they recently added MongoDB support. Which is not perfect but good enough!

undefined

[deleted]

fauigerzigerk

I have a lot of Excel-like use cases for which SQLite would be a great fit if only it had a decimal type.

lifepillar

I have developed SQLiteDecimal: http://chiselapp.com/user/lifepillar/repository/sqlite3decim...

Still very rough, but usable.

cordite

As in like Java Bigdecimal, with a BigInteger on the left side, and a fixed precision on the right?

darksaints

It does have a decimal type.

time0ut

My tools of choice are similar: sqlite, DB browser, Jupyter. I've found that DB browser struggles with analytical queries on tables over a few GB though. Still an invaluable exploratory tool.

iagovar

Check SQLiteStudio. DBeaver also supports SQLite as far as I remember, but it's a bit more cumbersome and really geared towards mysql etc.

StavrosK

> It must sound naive to some of you, but the first time stumbled upn sqlite I was so excited!

Why naive? That's the experience of all of us with SQLite.

jventura

Most of my web apps’ databases are an SQLite file. It’s more than enough for the ammount of traffic they serve and the db files are easy to set up and backup..

eska

I also always wondered why sqlite isn't used more in websites. Especially if you split heavy write workloads to a separate database file it scales quite far.

qes

There's no reasonable way to share a SQLite database between processes on separate machines or VMs.. What website doesn't use at least 2 instances for HA?

How do you make sure you don't lose data in a SQLite DB?

lenkite

How do you architect this ? This works only for single process web-apps right ? With no HA or failover ?

jventura

Right! But as I said, the low traffic volume doesn’t require more than this kind of architecture.

I do regular file backups and have a cron job restarting the web apps each 20 minutes, it’s enough for now. I would love to have a more solid architecture, but it matches the traffic volume and it’s good enough..

duskwuff

TBH, it'll work for a lot of multi-process apps, so long as the app runs on a single server and the app doesn't do anything foolish like write to the database on every page hit. For read-mostly applications -- like a blog, wiki, or even a small web forum -- SQLite is a surprisingly adequate database.

hobs

I can't even tell you how many times a dev came to me with some weird question that was very simply answered by "just use sqlite."

I've never thrown more than 20GB of data at it, but it always handles things admirably, and it often (to me) makes the right calls about the level of complexity required (accepts column type defs but really just doesnt care if it maps to string.)

darksaints

I have done several multi-terabyte queries on it. One time I had a race with another dev to see who could get the right answer on a 3TB dataset. I was finished before his spark cluster had even spun up.

nolok

SQLite handles dozens of DB databases surprisingly well, especially in a single user for data analysis scenario.

dragonshed

I totally agree.

I'm a front-end dev that can wing backend from time to time, and I use SQLite as much as possible. On multiple projects now I've run into complications due to complexity or environments, and adding a simplified local development backend with sqlite kept down time to a minimum.

SQLite is awesome.

ak39

Same. The last time I was this excited about an RDBMS in your pocket was when BerkleyDB released their Java database in a single JAR file. I am not sure when they initially released it but I remember doing some hobby projects in 2003.

mmsimanga

I do occasionally use Apache Derby[0] in similar manner to SQLite. Mostly when data types are critical. Works well when using DBeaver or any SQL editor that allows you to add JDBC connections.

[0]https://db.apache.org/derby/

giancarlostoro

At my job we started using H2 which much like SQLite has shareable databases. We now wish more databases were this simple. Its just too efficient to share a db in its total current state and someone with a little more experience or who does not have tunnel vision can probe and resolve issues. Plus you can always go between states of broken vs working.

iagovar

Is there any easy GUI for H2?

wener

I use DataGrip for all database

bob1029

For our B2B application, we've been using SQLite as the exclusive means for reading and writing important bytes to/from disk for over 3 years now.

We still have not encountered a scenario that has caused us to consider switching to a different solution. Every discussion that has come up regarding high availability or horizontal scaling ended at "build a business-level abstraction for coordination between nodes, with each node owning an independent SQLite datastore". We have yet to go down this path, but we have a really good picture of how it will work for our application now.

For the single-node-only case, there is literally zero reason to use anything but SQLite if you have full autonomy over your data and do not have near term plans to move to a massive netflix-scale architecture. Performance is absolutely not an argument, as properly implemented SQLite will make localhost calls to Postgres, SQL Server, Oracle, et. al. look like a joke. You cannot get much faster than an in-process database engine without losing certain durability guarantees (and you can even turn these off with SQLite if you dare to go faster).

rakoo

> build a business-level abstraction for coordination between nodes, with each node owning an independent SQLite datastore"

There even are solutions that do this already:

- rqlite (https://github.com/rqlite/rqlite)

- actordb (https://www.actordb.com/)

seektable

+1 we use SQLite in our BI tool to keep information about user accounts, data sources (cubes) / reports configs, access control rules etc. SQLite works like a charm even in cloud version where we have > 6,000 registered user accounts.

HelloNurse

A successful business, but not a demanding database: 6000 users accessing the database hundreds of times per day each, for small reads and small writes, are unlikely to need more than a few GB of space (the parts of the database in use should fit in the operating system's disk cache) and unlikely to exceed 20-30 small I/O operations per second.

untog

But it accurately describes a lot of use cases where something like Postgres (or MSSQL or even Oracle!) have been deployed because it’s what people assume is needed. It’s great to see accounts detailing how simple things can be.

hobs

I love sqlite, but just a wonder on how big you are going?

I regularly see 50TB total of databases on SQL Server, and scaling up to thousands of clients.

bob1029

With current volume, our largest transactional datastore is ~50GB.

That said, if we had a situation where we were pushing such volume that the transactional store was 50 TB, and we were still within the scope on the overall project, I see no reason why this would be a problem for us. As I mentioned prior, the context of our applications is single process, single server. Obviously, this is not suitable for all edge cases and you will potentially be better off with a hosted solution in such cases.

At the end of the day, there is no arbitrary number of bytes where B-Tree indicies just stop working because you didn't shell out for a SQL Server license.

hobs

Sure, but things like horizontally partitioning your data across multiple hard drives via splitting files, supporting partition elimination in your queries, etc are all things that I am pretty sure sqlite doesnt even want to bring to to the table.

CJefferson

Out of interest, what circles are you moving in? Are people filling their databses with videos or something?

I'm sure I just lack the imagination, but i cant imagine how there can be thwt many companies who can make TB of data they then want to shove in a database. The complete sales history of even a medium sized company should still easily fit in a GB or two.

hobs

If you decide you dont want to throw anything away forever you can grow pretty fast :)

Some of the huge ones absolutely do binary storage for HA/DR reasons, but payroll companies and realestate companies are the mainstay and they have a tendency to store a lot of XML crap on top of the lovely normalized rows.

combatentropy

> I regularly see 50TB total of databases on SQL Server

How big is each database? With SQLite, each database is its own file.

The theoretical limit for such a file is 140 TB, but the practical limit is probably much lower (https://www.sqlite.org/whentouse.html).

hobs

Anywhere from 40GB to 10-15TB.

Kaze404

I often connect to production databases in read only users to do various data analysis. Is this something you can do with SQLite (besides maybe SSHing into the machine)? If not, how do you get around it (if it ever even comes up)?

bob1029

We have a few paths for this type of thing.

One is to simply zip up the entire database and send it across the wire. This is most applicable for local development and QA testing scenarios.

Another is to have something in the business application and relevant tooling that allows for programmatic querying of the data we need to look into.

We also have some techniques where we do ETL of the data range we care about from 1 SQLite db to another, then pull down the consolidated db for analysis.

lenkite

How do you sync db data across nodes ? Custom built solution or use something off the shelf ?

dtf

While reading the documentation for iff(), I noticed the command line function edit(), which is pretty cool.

  UPDATE docs SET body=edit(body) WHERE name='report-15';
  UPDATE pics SET img=edit(img,'gimp') WHERE id='pic-1542';

thibran

If you like edit, you might like :memory: too :)

https://sqlite.org/inmemorydb.html

edwinyzh

I also didn't know about it before! Cool

ha470

While I love SQLite as much as the next person (and the performance and reliability is really quite remarkable), I can’t understand all the effusive praise when you can’t do basic things like dropping columns. How do people get around this? Do you just leave columns in forever? Or go through the dance of recreating tables every time you need to drop a column?

roenxi

SQLite is for storing data in an environment where SQL is meaningful. Anyone wanting to do database admin tasks (like adjusting table schema) would be well advised to go with a Real Database.

SQLite really shines when you have a more-or-less final idea of what tables you want to build and don't have to worry about users inputting crazy data. Anything up to that and it is a model of simplicity. Anything beyond that and it is undercooked.

cheez

I use SQLite as a Real Database and alembic helps me do silly things like alter columns by copying tables for me.

gwbas1c

I just sucked the existing table into RAM and recreated the table. I did it on a transaction so there was no risk of data loss.

In my case the data was always 10s of MBs.

Remember, the point of SQLite is a replacement for generating a file format. Although it's a database, it lets us (developers) re-use our knowledge of databases when doing basic file I/O.

72deluxe

How does dropping the existing table and recreating it affect FKs pointing to the table that is being dropped??

Do the FK relationships get destroyed??

kcolford

You gotta redo the foreign key constraints in the same transaction that you rebuild the table

gwbas1c

In my case foreign keys were enforced in the application business logic. The schema was extremely simple.

virvar

When do you need to drop a column in a production DB? Maybe my anecdotal bubble is about to burst, but I work in the public sector, and have for a while and on our 200 different production DBs behind around 300 systems we’ve never dropped a column.

calpaterson

Depends on the maturity of your schema - if it's all figured out based on your business domain it won't happen much. If you're still finding product-market fit (or equivalent) splitting a table into two happens sometimes.

HelloNurse

"Splitting" a table usually means creating two new ones and dropping the old one after migrating its content with a complex migration script followed by thorough testing. Dropping columns is not only abnormal (adding columns is far more common: features tent to be added, not removed, over time) but also a very crude tool.

james-mcelwain

There are also operational concerns here. Dropping columns may require rebuilding indices, which can have a high cost that isn't worth paying for just to keep the schema clean.

globular-toast

Splitting a table into two doesn't involve dropping fields. You just create two new tables then drop the old one.

eli

Don’t many MySQL backends also recreate the whole table when you drop a column? They just hide it from you better.

faceplanted

Pretty sure they must, row based storage on disk would practically require it just to not completely waste all of the space you've just gained from deleting the column by leaving a gap on every single row.

calpaterson

Adding a nullable column is constant time (ie: basically instant) in postgres and innodb, maybe also in other systems.

gwbas1c

And some of them have downtime during the schema update, too.

mmsimanga

I agree with you. SQLite drove me nuts when it came to changing your database. This is one of the reasons I just use DB Browser for SQLite (DB4S). It takes care of all the complexity.

Carpetsmoker

The general strategy is to create a new table, insert data from the old table, drop the old table, rename the new table, and re-create the indexes:

  create table foo2 (
   col1 int,
   col2 text
  );
  insert into foo2 sleect col1, col2 from foo;
  drop table foo;
  alter table foo2 rename to foo;
  create index on foo(col1);
See: https://www.sqlite.org/lang_altertable.html#making_other_kin...

As for the reason, see the next section on that link. It's not perfect, but it is what it is. SQLite wasn't designed to solve 100% of the use cases, and that's not a bad thing.

dirtydroog

We use sqlite as a smaller cache of a much larger postgres db. The cache is generated by a job and yes is regenerated every time before being pushed to machines that need it.

barrkel

Think of SQLite as a file format which happens to have a query interface, and not a database.

MySQL did DML for years with full table rewrites behind the scenes. It's not particularly hard to emulate, though not entirely pleasant.

(Although I really raise an eyebrow when people talk about performance. Every time I write anything server-like in sqlite I end up regretting it, and get enormous speed boost, not to mention actual concurrency, switching to PostgreSQL.)

why-el

One of the great things one can learn from SQLite is the degree to which they unit (and integration) test their source code. It's honestly the best unit test document I have read in my career to date: https://www.sqlite.org/testing.html.

ardy42

IIRC, some company wanted to use SQLite on an airplane, so they paid the devs enough to bring the test suite up FAA standards. IIRC, they have code coverage of every machine instruction.

SQLite

That was my business plan: Do the intense testing required for avionics, then sell the test cases to aviation manufacturers. That plan didn't work out - I've never sold the tests to any aviation manufacturer; not one. But the TH3 test harness has had side benefits that I did not anticipate, not the least of which is that it allows us to maintain a complex code base that is run on billions of devices with just a few developers.

why-el

Yep, I think (with 90% certainty) that Richard Hipp, the creator of SQLite, mentioned this in a Youtube Talk, but sadly I can't recall which one. :(

wenc

SQLite is great but its decision in not having a standard datetime/timestamp datatype -- a standard in all other relational databases -- has always struck me as a surprising omission, but in retrospect I kind of understand why. Datetimes are undeniably difficult.

So sqlite leaves the datetime storage decision to the user: either TEXT, REAL or INTEGER [1]. This means certain datetime optimizations are not available, depending on what the user chooses. If one needs to ETL data with datetimes, a priori knowledge of the datetime type a file is encoded in is needed.

In that sense, sqlite really is a "file-format with a query language" rather than a "small database".

[1] https://stackoverflow.com/questions/17227110/how-do-datetime...

combatentropy

"SQLite does not compete with client/server databases. SQLite competes with fopen()." --- https://www.sqlite.org/whentouse.html

undefined

[deleted]

trashburger

>Increase the default upper bound on the number of parameters from 999 to 32766.

I don't want to know the use case for this.

Keep rocking on, SQLite. It's the first tool I reach for when prototyping anything that needs a DB.

oefrha

Simple. Bulk insert with a 999-parameter limit is just painful; if each entry has 9 columns, you can’t even insert 112 rows at once. In practice distros already compile with higher default; e.g. Debian compiles with -DSQLITE_MAX_VARIABLE_NUMBER=250000, still way higher than this new default.

beachy

What's the point? Inserting batches of 1000 rows at once, or even 10k rows at once is hardly any faster overall than using batches of 100 rows, assuming there are no delays in presenting the batches to the DB.

Carpetsmoker

It's just easier: I won't have to split queries with 1,500 parameters in two because of some limit.

dtf

For instance, you might want to update a large subset of rows via WHERE id IN (?,?,?,...) instead of WHERE id IN (SELECT ...)

zubairq

Thanks so much for SQLite. Amazing and stable database. Yazz Pilot (https://github.com/zubairq/pilot) is built on it

oefrha

Good to see a ternary function iif() added. Case expressions are usually pretty painful and/or unreadable when using query builders.

RivieraKid

Is it reasonable to assume that in most current deployments of PostgreSQL or MySQL, SQLite would be at least an equally good choice?

I was recently choosing a database for a medium-size website and SQLite seemed like an obvious choice. One thing I was worried about was that the database locks for each write - but this is apparently not true anymore with write-ahead log.

duskwuff

Depends on the environment. SQLite will scale out reasonably well so long as it's only needed on one machine. As soon as you need a network-accessible database, traditional database servers start looking like a better option.

Carpetsmoker

I ran some performance/reliability benchmarks on the product I'm working on (which supports SQLite and PostgreSQL), and SQLite was about 30% faster than PostgreSQL.

This won't hold true for all use cases; one table now has 11 million rows, and I'm not sure how well SQLite would perform on that. The benchmark was very simple anyway, and it's mostly a read-only where users don't update/insert new stuff. Would be interesting to re-test all of this.

justinmeiners

Yes, most wordpress or joomla sites come to mind. There is typically only one application communicating with it, the user doesn't typically doesn't admin the database directly (and if they did they want a file), medium traffic load (hundreds per second), and most of the queries are reads, with the occasional content update.

As soon as you get into privilege levels or heavy loads, then those others make more sense.

undefined

[deleted]

therealdrag0

> “Most current deployments”

I doubt it but we’re both guessing. Personally I’ve never worked on a professional project that had all readers/writers on a single computer. So in my bubble SQLite is not an option.

rakoo

As usual, it depends. The official website has a page helping you decide (https://sqlite.org/whentouse.html)

TL;DR: if your data-access scenario is read-mostly write-sporadically, and your data being in a single place is ok, SQLite is fine.

It does make sense for many applications

emadda

I’ve been using SQLite on GCP for a few small projects and it seems to work well.

I use docker volumes to write to disk. I pass the disk directory to my process via a CLI arg.

When running on a VM these disk writes are replicated between zones (this is default for regional GCP disks). So you get zero config high availability (if you can tolerate down time during a reboot).

rhencke

You might find DQLite of interest.

https://dqlite.io/

emadda

Thanks I have seen this, but would prefer to use the data center provided replication at the disk level as I do not need to have real time failover (I just need to make sure I can recover data in case of a single zone failure). Also incremental disk snapshots are nice to have.

devwastaken

Are there resources for good practices on database formatting? I feel that what I make 'works', but I'd be curious on what experienced databases look like.

For example I have an app that you upload files through. Files can be local to the server or on s3 and have metadata. I end up making a new table for the API points. Like a table for listing files/directories. A table for local files and a table for s3 files. Then a table for the metadata, and a table for the kind of file it is, etc. It works, but it feels like a heavy hammer.

vbezhenar

You might want to check out Codd books. He invented relational model after all and his books cover database design.

Daily Digest email

Get the top HN stories in your inbox every day.

SQLite 3.32 - Hacker News