
chmod775

> The DHT crawler is not quite unique to bitmagnet; another open-source project, magnetico was first (as far as I know) to implement a usable DHT crawler, and was a crucial reference point for implementing this feature.

Heh. That was one of my first projects when I was still learning to code back in 2012: https://github.com/laino/shiny-adventure

The DHT crawler/worker lived separately, and I eventually put it here to rescue it from a dying HDD: https://github.com/laino/DHT-Torrent-database-Worker

The code is abhorrent and you absolutely shouldn't use it, but it worked. At least the crawler did - the frontend was never completed.

Since the first implementation of mainline DHT appeared in 2005 and crawling that network is really quite an obvious idea, I doubt we (a friend was working on it as well) were first either.

JP44

Nothing substantial; I just chuckled when I saw the commit history on your linked projects. I don't mean to belittle you (or the purpose/goal of the projects); I genuinely enjoyed the distraction and the 'results' from it:

Today's commits were the first in 11 years (since 9 Oct 2012) and 5 years (since 24 Nov 2018), respectively, on those projects. I think your repos might rank among the oldest repos that are still 'active' or 'not ported to another repo'.

From what I've found in ~10 minutes (Google/GPT), excluding Git projects that existed before spring 2008 (I couldn't get a quick consensus on February vs. April of that year), there aren't many.

(I'll edit this part if sources are requested)

Fatnino

I recently committed to an old repo of mine after a 9 year gap.

It holds several one file python script experiments and toys that I lumped into one place to get them off my hdd and make them available from wherever. Recently remembered it existed and added another one. And while I was in there I also ran 2to3 on the ones that needed it and polished the results up.

the8472

It seems like every single one of these things cuts corners and doesn't implement a proper, spec-compliant node that provides the same services it uses. You know, the "peer" in p2p. BEP51 was designed to make it easier not to trample on the commons, and yet...
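For context, BEP51 adds a `sample_infohashes` DHT query that lets an indexer ask nodes directly for a sample of the info-hashes they store, instead of passively hoovering up announces. A minimal sketch of what that KRPC query looks like on the wire; the node ID and target below are placeholders, and a real client would send this over UDP and decode the bencoded reply:

```python
# Minimal sketch of a BEP51 "sample_infohashes" KRPC query. The node ID
# and target are placeholders; a real DHT client sends this packet over
# UDP to a node's port and parses the bencoded response.

def bencode(value):
    """Encode ints, bytes, lists, and dicts per the bencoding rules."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys must be byte strings, emitted in sorted order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value)!r}")

query = {
    b"t": b"aa",                  # transaction ID
    b"y": b"q",                   # message type: query
    b"q": b"sample_infohashes",   # the BEP51 query name
    b"a": {
        b"id": b"0" * 20,         # our node ID (placeholder)
        b"target": b"1" * 20,     # target ID near which to sample (placeholder)
    },
}

packet = bencode(query)
```

The reply contains a `samples` field: a concatenation of 20-byte info-hashes the node is storing, plus an `interval` hinting how often it may be re-queried.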

mgdigital

Author here. FWIW I wasn't intending this to make it onto HN, having posted about this on Lemmy looking for beta testers. The current version of the app is very much a preview. There's much further work to be done and this will include as far as possible ensuring Bitmagnet is a "good citizen". The suggestions made on the GH issue look largely feasible and I'll get round to looking at them as soon as I can.

The issue and my response on GH: https://github.com/bitmagnet-io/bitmagnet/issues/11

Y-bar

Hey! I don't know if the GitHub repo or here is the best place to ask, since the Discussions on the repo aren't active.

I started the docker-compose.yml stack in WSL, and it has been running for an hour, slowly accumulating a few megabytes of Redis data at about 5% CPU usage. Inspecting it shows magnet links. It appears to be working.

But visiting the web interface at localhost:3333 just yields "Firefox can’t establish a connection to the server at localhost:3333." after a 30-second timeout.

Would you have a guess why?
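Hard to diagnose remotely, but a quick check that separates "nothing is listening on that port" from "the app is listening but misbehaving" can narrow it down. This is a generic sketch, not bitmagnet-specific; on WSL the usual suspects are the docker-compose port mapping and the container binding to 127.0.0.1 instead of 0.0.0.0:

```python
# Quick diagnostic: distinguish "nothing is listening on the port" from
# "the port is open but the app isn't responding". Host and port are taken
# from the comment above; adjust for your setup.
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: port_open("localhost", 3333)
```

If this returns True from inside WSL but the Windows browser still can't connect, the problem is likely WSL's localhost forwarding rather than the app itself.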

e12e

Appreciate that you took the time to file an issue:

https://github.com/bitmagnet-io/bitmagnet/issues/11

zolland

Without the peers, all we have is "to"!

coolspot

Please remove copyrighted movies from the screenshot on your website. It provides evidence that this program is designed for violating copyright, which makes a DMCA takedown trivial.

mgdigital

Thanks - I only condone accessing the legal content available on BitTorrent, and my screenshots now embody this moral stance.

dotBen

Yes, I'm grateful for this being built so I can locate and identify all the copies of Linux I could download and install...

sneak

Indexing what is available for download is useful for research into piracy even if you don’t engage in piracy yourself.

rsync

Please determine if these images fall under fair-use provisions and, if so, leave them in place.

Bad actors - whoever they may be - need to see your rights constantly reasserted.

crazygringo

This has nothing whatsoever to do with fair use.

It's about arguments in court about the intention of the software if sued. Images of copyrighted content indicate intent to infringe copyright. Without those, you can argue it's only meant to find and index Linux image torrents or whatever.

Fair use doesn't enter the picture at all.

KennyBlanken

The MPAA and other organizations use screenshots that show copyrighted material as "proof" that the tools are used for copyright violation, and then DMCA them.

If you want to help pay for lawyers to fight those DMCA notices with counterclaims and lawsuits, put up or shut up; the FSF, EFF, and ACLU have been noticeably uninterested in doing so.

gymbeaux

I don’t see any screenshots on the website

davidcollantes

There were, but they've been removed.

pipes

And they probably want to remove text like

"It then further enriches this metadata by attempting to classify it and associate it with known pieces of content, such as movies and TV shows. It then allows you to search everything it has indexed."
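The classification the quoted text describes is typically done by matching release-naming conventions. A hedged sketch of the general idea; the patterns below are illustrative assumptions, not bitmagnet's actual rules:

```python
# Illustrative sketch of classifying a torrent name as a TV episode or a
# movie from common release-naming conventions. These regexes are
# assumptions for demonstration, not bitmagnet's real implementation.
import re

TV_RE = re.compile(
    r"^(?P<title>.+?)[. _-]+S(?P<season>\d{1,2})E(?P<episode>\d{1,2})",
    re.IGNORECASE,
)
MOVIE_RE = re.compile(r"^(?P<title>.+?)[. _-]+\(?(?P<year>(19|20)\d{2})\)?")

def classify(name: str) -> dict:
    """Guess content type and extract title/season/episode/year fields."""
    m = TV_RE.match(name)
    if m:
        return {"type": "tv", "title": m["title"].replace(".", " "),
                "season": int(m["season"]), "episode": int(m["episode"])}
    m = MOVIE_RE.match(name)
    if m:
        return {"type": "movie", "title": m["title"].replace(".", " "),
                "year": int(m["year"])}
    return {"type": "unknown", "title": name}
```

Real indexers layer fuzzy title matching against databases like TMDB on top of this kind of parse, which is where the "associate it with known pieces of content" step comes in.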

cchance

There's nothing wrong with that, since there are LOTS of free-to-share movies and TV shows, especially those whose copyrights have expired.

dewey

I don't see anything wrong with that if the example titles are under the right licenses, like the often-used https://en.wikipedia.org/wiki/Big_Buck_Bunny, which can then be mapped to open databases like https://www.themoviedb.org/movie/10378-big-buck-bunny.

ipaddr

It might be too late.

Fnoord

e12e

Any comments on how these compare? Especially in relation to sibling comment about BEP51?

Fnoord

You want Torznab support. That is basically the metadata you want to export, to import into your application, which holds the database of what you're after. If there's a match, it should attempt to download it via your download client (a BitTorrent client).

Torrentinim is the successor of Magnetissimo, but it lacks Radarr/Sonarr integration (there is a pull request for Torznab support for both). Spotweb has Newznab support [1], and around Black Friday (soon) there are usually tons of deals on Newznab indexers.

I don't care about BEP51 as I don't have much upload bandwidth. That is also why I prefer Usenet over torrents. But torrents are a useful and sometimes necessary backup option; just not my preferred one.

[1] https://github.com/Spotweb/Spotweb/wiki/Spotweb-als-Newznab-...
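For anyone unfamiliar: a Torznab endpoint returns newznab-style RSS with extra `torznab:attr` elements carrying fields like seeders and size, which is what lets apps like Radarr/Sonarr consume arbitrary indexers. A small sketch of parsing such a feed; the sample XML and the query-URL shape shown in the comment are illustrative:

```python
# Hedged sketch of consuming a Torznab feed. Torznab responses are RSS XML
# with extra <torznab:attr> elements; the sample document below is
# illustrative. A real query looks roughly like
# http://host/api?t=search&q=...&apikey=...
import xml.etree.ElementTree as ET

TORZNAB_NS = "http://torznab.com/schemas/2015/feed"

SAMPLE = """<rss xmlns:torznab="http://torznab.com/schemas/2015/feed">
  <channel>
    <item>
      <title>Big Buck Bunny</title>
      <link>magnet:?xt=urn:btih:dd8255ecdc7ca55fb0bbf81323d87062db1f6d1c</link>
      <torznab:attr name="seeders" value="42"/>
      <torznab:attr name="size" value="276445467"/>
    </item>
  </channel>
</rss>"""

def parse_feed(xml_text: str) -> list[dict]:
    """Flatten each RSS <item> plus its torznab:attr pairs into a dict."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):
        entry = {"title": item.findtext("title"), "link": item.findtext("link")}
        for attr in item.iter(f"{{{TORZNAB_NS}}}attr"):
            entry[attr.get("name")] = attr.get("value")
        results.append(entry)
    return results
```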

diggan

From https://github.com/bitmagnet-io/bitmagnet/issues/11

> The DHT implementation was largely borrowed from Magnetico (https://github.com/boramalper/magnetico) a popular and widely used app which is likewise unable to service these requests.

the8472

From a brief look at each, it seems like they're scraping things like torrent websites, Usenet, or maybe RSS feeds, not the DHT.

INTPenis

Do you know if there is a simpler torrent tracker out there, like whatever the Fedora project is using?

https://torrent.fedoraproject.org/

I just want a simple list, and the backend.

forgetm3

Isn't this much the same as btdig.com, which is based on https://github.com/btdig/dhtcrawler2?

I use this service to do security research a fair bit. It'd be nice if there were a higher-quality self-hosted version I could use, so I'll be watching this project with interest!

executesorder66

I use btdig for finding torrents myself.

But I am curious what you mean when you say you use it "to do security research"?

Are you just looking for security information that is available in torrents, or does btdig have some other features that I am unaware of?

r3trohack3r

Have been playing around with DHT crawling for a while now, curious how you're getting around the "tiers" of the DHT?

IIUC peers favor nodes they've had longer relationships with to provide stable routes through the DHT.

This means short-lived nodes receive very little traffic; nobody routes much traffic through fresh nodes, since peers prefer nodes they've had longer relationships with.

The longer you stay up, the more you start seeing.

At least this is what I've observed in my projects. The only way I've been able to get anything interesting out of the DHT in the last ~5 years has been to put up a node and leave it up for a long time. If I spin up something, the first day I usually only find a handful of resolvable hashes.

Not to mention, the BitTorrent DHT seems very lax in what it will route compared to other DHTs (like IPFS's), meaning many of the hashes you receive aren't for torrents at all.
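The "tiers" effect described above falls out of Kademlia's routing-table rules: per BEP5, when a k-bucket is already full of known-good nodes, a newly seen node is simply dropped, so long-lived nodes accumulate routing-table slots while fresh ones stay invisible. A toy sketch of that bucket policy (simplified; real implementations also ping the questionable node before evicting it):

```python
# Toy sketch of a Kademlia k-bucket's insertion policy, showing why fresh
# crawler nodes see little traffic: a full bucket of good nodes ignores
# newcomers, so only long-lived nodes end up widely known.
from dataclasses import dataclass, field

K = 8  # standard bucket size in the mainline DHT

@dataclass
class Node:
    node_id: bytes
    good: bool = True  # responded to a recent query

@dataclass
class Bucket:
    nodes: list = field(default_factory=list)

    def observe(self, candidate: Node) -> bool:
        """Return True if the candidate made it into the bucket."""
        if any(n.node_id == candidate.node_id for n in self.nodes):
            return True  # already known; its slot is refreshed
        # Replace a bad (unresponsive) node if there is one.
        for i, n in enumerate(self.nodes):
            if not n.good:
                self.nodes[i] = candidate
                return True
        if len(self.nodes) < K:
            self.nodes.append(candidate)
            return True
        return False  # bucket full of good nodes: newcomer is ignored
```

Under this policy a new crawler only gets a slot when an established node goes bad or a bucket has room, which matches the "leave it up for a long time" experience described above.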

doakes

What kind of bandwidth usage should be expected from a DHT crawler like this?

KennyBlanken

After ~30-60 minutes of running, it's still using less than 100 kB/s combined, in and out. However, as others have noted, nodes don't communicate much with nodes that haven't been up for a while (days).

It's using roughly 6% CPU time for the crawler and another 1-2% for postgres, on a second-gen i7.

As a data point to set expectations: 4,000 torrents have been captured so far, and somewhat surprisingly, they aren't necessarily very current results.

For example, a certain wildly popular TV series about samurai in space swinging very hot swords around which just had its season ending episode last night (I think)...that ep isn't in my list so far, but the episode prior to it, and the first two episodes, are.

There's a ton of random, low-seed torrents, so it's actually kind of interesting to search by type, year, etc and see what comes up.

KennyBlanken

I've been unable to get this running; I gave it a postgres user and database, granted it ownership and all permissions on said DB, and there's nothing in the database.

Edit: found the init schema and things seem to be working now: https://github.com/bitmagnet-io/bitmagnet/blob/main/migratio...

It would be really nice to be able to sort by column (size, seeders) and/or have some filters on seeders/downloaders (for example, filtering out anything with fewer than X seeds).

drakenot

> Something that looks like a decentralized private tracker; by this I probably mean something that’s based partly on personal trust and manually weeding out any bad actors; I’d be wary of creating something that looks a bit like Tribler, which while an interesting project seems to have demonstrated that implementing trust, reputation and privacy at the protocol level carries too much overhead to be a compelling alternative to plain old BitTorrent, for all its imperfections

I've thought about this problem a lot. Having a federated / distributed tracker but with some form of trust based, or opt-in curation would be amazing.

synctext

Tribler developer here.

The "trust framework" that everybody is after: the lightness of tit-for-tat with little overhead, and trust that could actually make the Internet a nice place again! Curation is indeed the problem, as discovered by HN practitioners such as yourself. However, nobody in academia is working on this. As one old scientific article is titled: "Curation, Curation, Curation".

johng

I've been running this for about 5 days in Docker. I've had to restart bitmagnet a handful of times as it seems to crash. The web interface is available but the number of torrents indexed never increases. I'm currently at 180,744 indexed and I'm waiting to see how high it goes. I check on it a few times a day.

askiiart

And the predecessor to this: https://github.com/mgdigital/rarbg-selfhosted

It's now archived due to effort being redirected to Bitmagnet.

alexpotato

For those of you interested in the intersection of "make vs buy" decisions, self hosting torrent servers and working at a hedge fund, you might get a kick out of this Twitter thread:

https://twitter.com/alexpotato/status/1661334108180566016

Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, and torrent search - Hacker News