Brian Lovin
/
Hacker News
Daily Digest email

Get the top HN stories in your inbox every day.

wereallterrrist

I find it very, very hard to go wrong with Syncthing (for stuff I truly need replicated, code/photos/text-records) and ZFS + znapzend + rsync.net (automatic snapshots of `/home` and `/var/lib` on servers).

The only thing missing: I'd like to stop syncing code with Syncthing and instead build some smarter daemon. The daemon would take a manifest of repositories, each with a mapping of worktrees->branches to be actualized and fsmonitored. The daemon would auto-commit changes on those worktrees into a shadow branch and push/pull it. Ideally this could leverage (the very amazing, you must try it) `jj` for continuous committing of the working copy and (in the future, with the native jj format) even handle the likely-never-to-happen conflict scenario. (I'd happily collaborate on a Rust impl and/or donate funds to one.)

Given the number of worktrees I have of some huge repos (nixpkgs, linux, etc) it would likely mark a significant reduction in CPU/disk usage given what Syncthing is having to do now to monitor/rescan as much as I'm asking it to (given it has to dumb-sync .git, syncs gitignored content, etc, etc).
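The core auto-commit step I have in mind could be sketched like this, using plain git plumbing (no jj). The `shadow/<branch>` naming is a hypothetical convention of mine, and the push/pull half is omitted:

```shell
# Sketch of the daemon's core step: commit the current state of a worktree
# onto a shadow branch without touching HEAD, the real index, or the
# checked-out files. "shadow/<branch>" is a hypothetical naming convention.
shadow_snapshot() (
  cd "$1" || return 1
  branch=$(git rev-parse --abbrev-ref HEAD)
  shadow="refs/heads/shadow/$branch"
  # Build the commit in a throwaway index so the user's index is untouched.
  GIT_INDEX_FILE=$(mktemp); export GIT_INDEX_FILE
  git read-tree HEAD
  git add -A
  tree=$(git write-tree)
  parent=$(git rev-parse -q --verify "$shadow" || git rev-parse HEAD)
  git update-ref "$shadow" \
    "$(git commit-tree "$tree" -p "$parent" -m "auto-snapshot $(date -u +%FT%TZ)")"
  rm -f "$GIT_INDEX_FILE"
  # A real daemon would now push/pull the shadow ref in the background.
)
```

Run from a timer or an fsmonitor hook; the user's checked-out branch and uncommitted state stay exactly as they were.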

JeremyNT

> Given the number of worktrees I have of some huge repos (nixpkgs, linux, etc) it would likely mark a significant reduction in CPU/disk usage given what Syncthing is having to do now to monitor/rescan as much as I'm asking it to (given it has to dumb-sync .git, syncs gitignored content, etc, etc).

Are you really hitting that much of a resource utilization issue with syncthing though? I use it on lots of small files and git repos and since it uses inotify there's not really much of a problem. I guess the worst case is switching to very different branches frequently, or committing very large (binary?) files where it may need to transfer them twice, but this hasn't been a problem in my own experience.

I'm not sure you could really do a whole lot better than syncthing by being clever, and it strikes me as a lot of effort to optimize for a specific workflow.

Edit: actually, I wonder if you could just exclude the working copies with a clever exclude list in syncthing, such that you'd ONLY grab .git so you wouldn't even need the double transfer/storage. You risk losing uncommitted work I suppose.
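Something like this in the folder's .stignore might get there (an untested sketch; Syncthing patterns are first-match-wins, and note that negated patterns force Syncthing to traverse every directory anyway, so the scan cost doesn't fully go away):

```
// keep the git metadata, drop every working-copy file
!**/.git
*
```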

wereallterrrist

inotify has pretty paltry limits. My ~/code is only 40-50GB but there's no way inotify can watch it all.

Thus, Syncthing basically has to rescan constantly. It's not great.

And yes, rebasing linux+nixpkgs on even an hourly basis is absolutely devastating. lol
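For anyone hitting the same wall, the limit is easy to inspect and raise on Linux (defaults are often 8192 or 65536, distro-dependent):

```shell
# Current per-user inotify watch limit; Syncthing needs roughly one watch
# per *directory* it monitors, so big checkouts blow past the default fast.
cat /proc/sys/fs/inotify/max_user_watches

# Raising it (run as root); 524288 is a commonly used value:
raise_watch_limit() {
  sysctl fs.inotify.max_user_watches=524288
  # persist across reboots:
  echo 'fs.inotify.max_user_watches=524288' > /etc/sysctl.d/90-inotify.conf
}
```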

killingtime74

For code I just use a self hosted git server

than3

I hate to be the one to point out the obvious, but replication isn't a backup. It's for resiliency, just like RAID; the two aren't the same.

reacharavindh

Replication to another machine that has a COW file system with snapshots is backup though :-)

We back up the data storage for an entire HPC cluster, about 2 PiB of it, to a single machine with 4 disk shelves running ZFS with snapshots. It works very well. A simple rsync every night, then snapshotted.

We use the backup as a sort of Time Machine should we need data from the past that we deleted in the primary. Plus, we don't need to wait for tapes to load or anything; it is pretty fast and intuitive.

jerf

The person you're replying to said "Syncthing ... and ZFS + znapzend + rsync.net" though. You're ignoring the rsync.net part.

I have something similar; it's Nextcloud + restic to AWS S3, but it's the same principle. You can give people the convenience and human-comprehensibility of sync-based sharing, but also back that up too, for the best of both worlds. In my case the odds of needing "previous versions" of things approach zero and a full sync is fairly close to a backup, but even so, I do have a full solution here.

NelsonMinar

Syncthing has file versioning but I don't know for sure if it's suitable for backup. https://docs.syncthing.net/users/versioning.html

wereallterrrist

When I mentioned de-duping and append-only logs, I had this in mind. It's hard to imagine implementing a backup system with those two properties that doesn't include snapshotting almost by design necessity.

(Beyond even the fact that ~/code is also on a ZFS volume that is snapshotted and replicated off-site, which I argue can be used in all of the same important ways any other "backup" is used.)

Hence the comment! After all this blockchain hoopla and everyone's understanding of how "cool" Git is, we really, really deserve better in our backup tools.

jrm4

But, it makes things easy. I have e.g. a home computer, a server in the closet thing, a laptop and a work computer all with a shared Syncthing folder.

So to bolster that other thing, I just have a simple bash script that reminds me every 7 days to make a copy of that folder somewhere else on that machine. It's not precise because I often don't know which machine I will be using, but that creates a natural staggering that I figure should be sufficient if something goes weird and I lose something; I'm likely to have an old copy somewhere.

whalesalad

What is the actual difference between a backup and replication? If the 1’s and 0’s are replicated to a different host, is that any different than “backing up” (replicating them) to a piece of external media?

jjav

> What is the actual difference between a backup and replication?

Simplest way to think about it is that a backup must be an immutable snapshot in time. Any changes and deletions which happen after that point in time will never reflect back onto the backup.

That way, any files you accidentally delete or corrupt (or other unwanted changes, like ransomware encrypting them for you) can be recovered by going back to the backup.

Replication is very different, you intentionally want all ongoing changes to replicate to the multiple copies for availability. But it means that unwanted changes or data corruption happily replicates to all the copies so now all of them are corrupt. That's when you reach for the most recent backup.

That's why you always need to backup and you'll usually want to replicate as well.

chrishas35

When those 1s and 0s are deleted and that delete is replicated (or other catastrophic change, such as ransomware) you presumably don't have the ability to restore if all you're doing is replication. A strategy that layers replication + backup/versioning is the goal.

hk1337

I use Syncthing between Mac, Windows (I had Linux in the mix at one point), and my Synology NAS. Syncthing is more for my short-term backup though. I will either commit it to a repo, save it to a Synology share, or delete it.

*edit* my gitea server saves its backups to synology

ww520

Yes. I just let Syncthing sync among devices, using it to create copies of the backup. The daily backup scripts do their thing and create one backup snapshot, then Syncthing picks up the new backup files and propagates them to multiple devices.

acranox

SparkleShare does something kind of similar. It uses git as the backend to automatically sync directories on a few computers. https://www.sparkleshare.org/

fncivivue7

Sounds like you want Borg

https://borgbackup.readthedocs.io/en/stable/

My two 80% full 1TB laptops and a 1TB desktop back up to around 300-400G after dedupe and compression. Currently I have around 12TB of backups stored in that 300G.

Incremental backups run in about 5 minutes, even against the spinning disks they're stored on.

0cf8612b2e1e

Python programmer here, but I actually prefer Restic [0]. While more or less the same experience, the huge selling point to me is that the backup program is a single executable that can be easily stored alongside the backups. I do not want any dependency/environment issues to assert themselves when restoration is required (which is most likely on a virgin, unconfigured system).

[0] https://restic.net/

SomeoneOnTheWeb

You can also take a look at Kopia (https://kopia.io/).

I've been using Borg, Restic and Kopia for a long time, and Kopia is my personal favorite: very fast, very efficient, and it runs in the background automatically without having to schedule a cron job or anything like that.

Only downside is that the backups are made of a HUGE number of files, so when synchronizing it can sometimes take a bit of time to check the ~5k files.

wereallterrrist

No, I distinctly don't want Borg. It doesn't help with or solve anything that Syncthing doesn't already do. The obsession with Borg and bup is pretty baffling to me. We deserve better in this space. (see: Asuran and another whose name I forget...)

Critically, I'm specifically referring to code sync that needs to operate at a git-level to get the huge efficiencies I'm thinking of.

Syncthing, or borg, scanning 8 copies of the Linux kernel is pretty horrific compared to something doing a "git commit && git push" and "git pull --rebase" in the background (over-simplifying the shadow-branch process here for brevity.)

re: 'we deserve better' -- case in point, see Asuran - there's no real reason that sync and backup have to be distinctly different tools. Given chunking and dedupe and append-logs, we really, really deserve better in this tooling space.

formerly_proven

borg et al and "git commit" work in essentially the same way. Both scan the entire tree for changes using modification timestamps.

codethief

I don't think GP was talking about backups (which is what Borg is good for) but about synchronization between machines which is another issue entirely.

_dain_

They work together. I use syncthing to keep things synchronized across devices, including to an always-on "master" device that has more storage. Then borg runs on the master device to create backups.

anotherevan

I use a Raspberry Pi as my backup orchestrator. It backs up my Linux desktop, my wife's Windows desktop, and a couple of other Linux based devices including itself. Every night, it:

* Mounts the external drive.

* Starts Restic's Rest Server.

* SSHes into each machine to be backed up and kicks off the script that backs up to the above server.

* Stops the Rest Server.

* Rsyncs the external drive to my office (which is a 25 minute drive away) for off-site protection.

* Unmounts the external drive.

* Emails the results of it all to me.

Has been working really well so far.

Notes:

* The RPi has limited SSH access to each machine. The only thing it can really do is start the backup script on the machine.

* The Linux machines are on all the time, but the Windows machine sleeps. So first it sends a wake-on-lan. Using Cygwin for SSH and scripting on Windows. The script on the Windows machine sets the power configuration to not go to sleep during the backup, and restores the setting afterwards. Restic's ability to create a VSS snapshot on Windows is awesome.

* I still need to incorporate my two kids' Windows laptops into the backup somehow. I doubt the wake-on-lan tricks will work reliably with them. I've yet to explore UrBackup, which I think I can use to have them back up to my Linux desktop periodically while they are awake.
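The whole nightly sequence fits in a short script; here is a dry-run skeleton of it (the hostnames, paths, service name, and run-backup.sh are placeholders, not my real setup):

```shell
# Dry-run skeleton of the nightly run. With DRY_RUN=1 (the default here) it
# only prints the plan; every name below is a placeholder.
run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }

nightly() {
  run mount /mnt/backup
  run systemctl start rest-server            # Restic's REST backend
  for host in desktop wife-pc pi; do
    run ssh "backup@$host" ./run-backup.sh   # SSH key limited to this command
  done
  run systemctl stop rest-server
  run rsync -a --delete /mnt/backup/ offsite:/srv/backup/   # off-site copy
  run umount /mnt/backup
  run mail -s "backup report" me@example.com
}
```

Calling `nightly` as-is just prints the plan; flipping DRY_RUN=0 would execute it.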

scubbo

+1 for Restic. I'm only just barely scratching the surface of its use, and it's still just amazing. Here's the article I referenced: https://www.seanh.cc/2022/04/03/restic, though it looks like you're doing a lot more wizardry than that!

PopAlongKid

>I don't use Windows at the moment and don't really mount network drives, either. That might be a good alternative to consider.

Regarding Windows:

I have successfully mirrored a notebook and a desktop[0] (single user) with Windows using robocopy, which is a utility that comes with Windows (used to be part of the Resource Kit but I think it is now in the base product). When I say "mirror" I mean I can use either machine as my current workstation without any loss of data, as long as I run the "sync" script at each switch.

I use "net use" to temporarily mount a few critical drives on the local network, then robocopy does its work, it has maybe 85% of the same functionality of rsync (which I also used extensively when administering corporate servers and workstations). Back in the DOS days, I wrote my own very simple version of the same thing using C, but when robocopy came along I was glad to stop maintaining my own effort.

[0]or two desktops, using removable high-capacity media like Iomega zip drives.

EvanAnderson

Robocopy is very nice but has no delta compression functionality. For things like file server migrations (where I want to preserve ACLs, times, etc) robocopy is my go-to tool.

I've used the cwRsync[0] binary distribution of rsync on Windows for backups. I found it worked very well for simple file backups. I never did get around to trying to combine it with Volume Shadow Copy to make consistent backups of the registry and applications like Microsoft SQL Server. (I wouldn't expect to get a bootable restore from such a backup, though.)

[0] https://www.itefix.net/cwrsync

rzzzt

I used QtdSync, another frontend backed by a Windows rsync binary. A nice feature was that it supported the "duplicate entire target folder with hard links, then overwrite changes only"-style on NTFS volumes, so I could have lots of browseable point-in-time backup folders without consuming extra disk space: https://www.qtdtools.de/page.php?tool=0&sub=1&lang=en

gary_0

I use MSYS2 on Windows in order to run regular rsync and other such utilities. It's served me very well for years. I also have some bash scripts that I can conveniently run on either Linux or Windows via MSYS2.

paravz

I switched from Cygwin+rsync(over ssh) to robocopy+samba to speed up backups (up to saturating 1Gbit connection):

    for %i in (C D) do robocopy %i:\ \\backup-server\b-%COMPUTERNAME%\%i /MIR /DCOPY:T /NFL /NDL /R:0 /W:1 /XJ /XD "System Volume Information" /XD "$RECYCLE.BIN" /XD "Windows" /XD "Windows.old"

UI_at_80x24

ZFS snapshots + send/receive are an absolute game changer in this regard.

I have my /home in a separate dataset that gets snapshotted every 30 minutes. The snapshots are sent to my primary file-server, and can be picked up by any system on my network. I do a variation of this with my dotfiles similar to STOW but with quicker snapshots.

customizable

ZFS is a game changer for quickly and reliably backing up large multi-terabyte PostgreSQL databases as well. In case anyone is interested, here is our experience with PostgreSQL on ZFS, complete with a short backup script: https://lackofimagination.org/2022/04/our-experience-with-po...

pmarreck

Came here to say this. Can you list your example commands for snapshotting, zfs send, restoring single files or entire snapshots, etc.? (Have you tested it out?) I am actually in the position of doing this (I use ZFS on root as of recently, and I have a TrueNAS) but am stuck at the bootstrapping problem: I haven't taken a single snapshot yet. Presumably the first one is the only big one? Then how do I send incremental snapshots? And how do I restore these to, say, a new machine: do I remotely mount a snapshot somehow, or zfs recv, or? Do you set up systemd/cron jobs for this?

Also, having auto-snapshotted on Ubuntu in the past, eventually things slowed to a crawl every time I did an apt update. Is this avoidable?

customizable

Yes, the first snapshot is the big one; the rest are incremental. Restoring a snapshot is just one line, really. Something like ;)

    sudo zfs send -cRi db/data@2022-12-08T00-00 db/data@2022-12-09T00-00 | ssh me@backup-server "sudo zfs receive -vF db/data"
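The whole lifecycle, as a hedged sketch with hypothetical pool/dataset names (nothing here is executed, it is just the shape of each step): bootstrap once with a full send, then send only deltas; single files come back via the hidden .zfs directory, whole datasets via a send in the other direction. Scheduling is typically cron/systemd timers or a tool like znapzend (mentioned upthread):

```shell
# Hypothetical pool/dataset names throughout ("pool/home" local,
# "tank/home" on the backup host); a function so nothing runs here.
zfs_backup_lifecycle() {
  # 1. Bootstrap: the first snapshot + send is the only full-size transfer.
  zfs snapshot pool/home@day1
  zfs send pool/home@day1 | ssh backup 'zfs receive tank/home'

  # 2. From then on, send only the delta between two snapshots.
  zfs snapshot pool/home@day2
  zfs send -i pool/home@day1 pool/home@day2 | ssh backup 'zfs receive tank/home'

  # 3. Single-file restore: snapshots are browsable on the receiving side.
  cp /tank/home/.zfs/snapshot/day2/some/file ./file

  # 4. Whole-dataset restore to a new machine: send back the other way.
  ssh backup 'zfs send tank/home@day2' | zfs receive pool/home
}
```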

GekkePrutser

Zfs send/receive is nice but it does lack the toolchain to easily extract individual files from a backup. It's more of a disaster recovery thing in terms of backup.

customizable

You can actually extract individual files from a snapshot by using the hidden .zfs directory like: /mnt-point/.zfs/snapshot/snapshot-name

Another alternative is to create a clone from a snapshot, which also makes the data writable.

GekkePrutser

A snapshot, yes, but not a zfs send stream, which is a single file.

falcolas

So, quick trick with rsync that means you don't have to copy everything and then hardlink:

    --link-dest=DIR          hardlink to files in DIR when unchanged
Basically, you list your previous backup dir as the link-dest directory, and if the file hasn't changed, it will be hardlinked from the previous directory into the current directory. Pretty nice for creating time-machine style backups with one command and no SSH.

Also works a treat with incremental logical backups of databases.

paravz

--link-dest is also used in hrsync, another rsync wrapper: https://github.com/dparoli/hrsync/blob/master/hrsync#L52

amelius

This is good to know, I used an extra "cp -rl" step in my previous scripts.

rsync

Yes - they accomplish the same thing.

--link-dest is just an elegant, built-in way to create "hardlink snapshots" the same way that 'cp -al' always did.

But note:

A changed file - even the smallest of changes - breaks the link and causes you to consume (size of file) more space cascading through your snapshots. Depending on your file sizes and change frequency this can get rather expensive.

We now recommend abandoning hardlink snapshots altogether and doing a "dumb mirror" rsync to your rsync.net account - with no retention or versioning - and letting the ZFS snapshots create your retention.

As opposed to hardlink snapshots, ZFS snapshots diff on a block level, not a file level - so you can change some blocks of a file and not use (that entire file) more space. It can be much more efficient, depending on file sizes.

The other big benefit is that ZFS snapshots are immutable/read-only so if your backup source is compromised, Mallory can't wipe out all of the offsite backups too.

falcolas

It also reduces the amount of data transferred, making the backup faster.

> We now recommend

Who's we?

falcolas

One thing of note - the file is not transferred, so backups happen faster and consume less bandwidth (important if your target is not network-local to you).

e1g

Recent versions of rsync support zstd compression, which can improve speed and reduce the load on both sides. You can check whether your rsync supports it with "rsync -h | grep zstd" and instruct it to use zstd with "-z --zc=zstd".

However, compression is useful in proportion to how crappy the network is and how compressible the content is (e.g., text files). This repo is about backing up user files to an external SSD with high bandwidth and low latency, where applying compression likely makes the process slower.

greggyb

Compression is useful even with directly attached storage devices. Disk IO is still slower than compression throughput unless you are running very fast storage.

If your workload is IO-bound, then it is quite likely that compression will help. Most people, on their personal machines, would likely see IO performance “improve” with filesystem level compression.

kkfx

Oh, curious, it's the first backup tool in Clojure I've seen :-)

My personal recipe is less sophisticated:

- znapzend on all home machines send to a homeserver regularly (with enough storage), partially replicated between desktops/laptop

- the homeserver backs itself up offsite via simple incremental zfs send + mbuffer, with one snapshot per day (last 2 days), one per week (last 2 weeks), and one per month (last month)

- manually triggered offline local backup of the homeserver on external USB drives and a physically mirrored home server, normally on weekly basis

Nothing more, nothing less. On any major NixOS release update I rebuild one homeserver, and a month or so later the second one. Desktop and homeserver custom ISOs are built automatically every Sunday and just left there (I know; it simply took too much time to keep checking, so...).

Essentially, in case of a fault of a machine I still have data, config and a ready ISO for a quick reinstall. In case of logical faults (like a direct attack that compromises my data AND ZFS itself) there is not much protection besides the different sync times (I do NOT use all desktops/laptops at once; when they are powered off they lag behind, and I normally have plenty of time to spot most casual potential attacks).

Long story short, for anyone: when you talk about backups, talk about how you restore, or your backups will probably turn out to be just useless bits one day...

neilv

You can combine this with restricted SSH and server-side software, so that the client being backed up to the server can only add new incremental backups, not delete old ones.

(So, less data loss, in event of a malicious intruder on the client, or some very broken code on the client that gets ahold of the SSH private key.)
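One concrete way to do that with Borg, for example, is a forced command in the server's authorized_keys that pins a dedicated key to an append-only repo path (the key and path here are placeholders):

```
# ~/.ssh/authorized_keys on the backup server
command="borg serve --append-only --restrict-to-path /srv/backups/client1",restrict ssh-ed25519 AAAA... client1-backup
```

restic's rest-server offers a similar --append-only flag on the server side.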

yehia2amer

Has anyone tried Kopia (https://kopia.io/docs/features/)?

It is awesome!

It's very fast (I usually struggle with backup tools on Windows clients) and it ticks all my needs: deduplication, end-to-end encryption, incremental snapshots with error correction, mounting snapshots as a drive to use normally or to restore specific files/folders, and caching. The only thing that could be better is the GUI, but it works.

mekster

Backup tools are nothing until they prove their reliability, which can only be proved with many years of usage.

In that regard, I don't trust anything but Borg and zfs.

yehia2amer

ZFS is not an option with Windows clients, and not even with most Linux clients. Also, this set of features is really scarce elsewhere; not sure why! I am using ZFS on my server, though.

pmontra

His backup rotation algorithm is very close to what rsnapshot does.

https://rsnapshot.org/

NelsonMinar

I use rsnapshot still! It feels very old fashioned but it works reliably and is easy to understand.

russdill

You may really like https://github.com/bup/bup if you want something a bit more modern but in the same style

pmontra

> bup stores its data in a git-formatted repository.

I understand the benefits for deduplication etc. but this is a show stopper for me. I greatly prefer to be able to navigate my backups with cd and ls or the file manager in the GUI and inspect the files directly without having to extract them first. After all I only have to backup a laptop and little else.

mekster

It's good to keep multiple backups with different implementations, local and remote.

Rsnapshot is hard to break because it uses very basic primitives: plain files on a file system and hard links. If your file system isn't ZFS, I think it's a viable backup strategy for the local copy, while you use other tools to take remote backups.

LelouBil

Speaking of backups, I recently set up a backup process for my home server, including a recovery plan, and that makes me sleep better at night!

I have Duplicati [0] doing a daily backup of the data of my many self-hosted applications, encrypted and stored in a folder on the server itself.

Only the password manager backup is not encrypted by Duplicati, because it's encrypted using my master password, and it stores all the encryption keys of the other backups.

Then I have a systemd service that runs rclone [1] every day after the backups finish, syncing the backup folder to:

- Backblaze B2

- AWS S3 Glacier Deep Archive

For now I only use the free tier of B2, as I have less than a GB to back up, but that's only because I haven't installed Nextcloud yet!

However, I still like using S3 because I am paying for it (even though Deep Archive is very cheap), and I'm pretty sure that if something happens with my account, being a paying customer will keep AWS from unilaterally removing my data. (I have seen posts about Google accounts being closed without any recourse; I hope I'm protected from that with AWS.)

Right now I only have CalDAV/CardDAV, my password manager, and my configs being backed up, but I plan to use Syncthing to also back up other devices to the home server, to fit inside what I already configured.

If anyone has advice on what I did/did not do/could have done better please tell me !

[0] https://www.duplicati.com/

[1] https://rclone.org/

smm11

I gave up on this at home long ago, and just use Onedrive for everything. I don't even have "local" files. My stuff is there, and in the event my computer won't start up I lose what's open in the browser. I can handle that.

At work I use Windows backup to write to empty SMB-mounted drives nightly, then write those daily to another drive on an offline Fedora box.

My super critical files are on an encrypted SD card I sometimes put in my phone when cellular connection is off, and this is periodically backed up to Glacier. The phone (Galaxy) runs Dex and can be my computer when needed to work with these files.
