Brian Lovin / Hacker News / Daily Digest email


jrm4

Again, it's just insane to me that we don't even have a meaningful discussion of:

"Hey, wait, literally everyone could have the entire library of Alexandria in their house for a couple hundred bucks per person. Like, all the knowledge ever. Maybe that should be considered the good default of things.

At least one in every town that everyone could use, for free, forever, without restriction to ANY of the knowledge anyone desires."

gmiller123456

As a kid, my parents had a full copy of the Library of Alexandria on the coffee table. Everyone else just called it an ashtray. (Sorry, too soon?)

ta988

It likely didn't burn, and was mostly destroyed by the purging of scholars and gradual neglect. https://en.m.wikipedia.org/wiki/Library_of_Alexandria

waplot

have your upvote. now show yourself out.

throwaway742

I haven't actually laughed for like a month. Thanks.

NaturalPhallacy

This is why I'm vehemently against further increasing copyright law, software patents, and treating Imaginary Property as something that can be owned.

We finally build the tools to allow the free sharing of information and almost immediately a bunch of rent seeking lawyers lobby congress to make doing that illegal.

I've always lived by the rule of "If you put it on the Internet, it's not yours anymore," because that's de facto the truth. Only if you employ lawyers do you have even the slightest real power over something online, even if you're the original source. This isn't something 95% of people can afford. Turning the Internet into just another place where the rich get preferential treatment is a terrible thing.

popcube

can we just offer free textbooks for everyone? maybe also some online lectures. everything is there; we just need to incorporate it into schools, libraries, and local communities.

Barrin92

That's already the case minus the last ~70 years or so. The overwhelming majority of our knowledge is in the public domain, in particular cultural artifacts.

It's a nice sentiment but like, people can already go to gutenberg.org and download pretty much most important works of literature in existence and most books have like 5k downloads so there's that.

ls15

Hasn't more knowledge been published in the last 70 years than in all the time before? More than 2 million new books get published every year.

dhzhzjsbevs

I don't know what twitter you've been reading but I wouldn't call what I see "knowledge".

jjeaff

I assume they are referring to the fact that books older than about 70 years are in the public domain. The rest are protected by copyright and not free.

caslon

Project Gutenberg is missing a lot of content that is public domain, and the oldest entries are often pretty poor in quality (and some of the oldest entries aren't lucky enough to get redone like Carroll's work has been).

zozbot234

Which just goes to prove parent's point. As much as it might seem otherwise, IPR restrictions are not the main bottleneck to widespread availability of content (at least book-like content); actually making the content available is far more important!

(Also, keep in mind that content is now entering the public domain every year, and projects like PG are nowhere close to keeping up with that flow of newly-unrestricted stuff. So this dynamic is becoming more extreme over time, not less.)

incompatible

In a lot of countries it's life plus 70 years. If it was only 70 years, we'd already have everything from 1951 and earlier. However, we have to wait for the authors to die before we even start counting, and good luck if it's somebody obscure and you can't find their date of death.

dredmorbius

It's actually fairly unlikely that the bulk of published knowledge is in the public domain.

By copyright expiry, the public-domain cutoff in the US is 1927: later works may be in the public domain, but all works published prior to 1927 are in the public domain in the United States.

(This may not be the case in other countries.)

There were not many published books prior to the invention of the printing press, and many of those didn't survive. The total number of books (not individual titles, but actual bound volumes) in Western Europe as of 1400 may have been as few as 50,000.

By 1800, about 1 million titles had been printed.

Over the course of the 19th century, presses became vastly faster, as they evolved from hand-operated wooden screw-presses to iron frames to steam- and electric-powered rotary and ultimately web presses. Paper became much cheaper (and less durable --- a factor commented on at length in the Librarian of Congress's annual reports to Congress in the late 19th century). Literacy exploded from ~25% to 95%+ over the 19th century (and probably accounted for numerous revolutions and political upheavals).

Through much of the 20th century, certainly by 1950, US publishers were issuing about 300,000 new titles per year, a rate which stayed remarkably constant through the early 21st century. By the aughts, "nontraditional" self-publishing (a/k/a "vanity press") was nearing or exceeding 1 million titles per year --- more than had been published in all of history up to 1800.

Reports that all recorded data was doubling every few years date to at least the 1960s. That would mean that in any two year period ... half of all recorded information was less than two years old.

The catch is that not all recorded data is published. So I'm not sure what the time-distribution of all publishing looks like. But I'm pretty confident it's skewed far more recently than 1927. And would thus tend to be copyrighted rather than uncopyrighted.

If you want to measure works by significance, you might make a different argument --- there are many great works of literature, philosophy, history, and religion which were first published before 1927. But ranking and tabulating these is more challenging than a simple enumeration.
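The doubling claim reduces to simple arithmetic: if the total stock of recorded data doubles every period, the data created in the latest period equals everything recorded before it. A minimal sketch, with arbitrary units:

```python
# If recorded data doubles every two-year period, the amount created in
# the most recent period equals the entire prior stock -- so half of all
# recorded information is always less than two years old.

def stock_after(periods, initial=1.0):
    """Total recorded data after `periods` doubling intervals."""
    return initial * 2 ** periods

prior = stock_after(9)    # stock after 18 years (9 two-year periods)
now = stock_after(10)     # stock after 20 years
new_data = now - prior    # created in the last two years

print(new_data / now)     # 0.5 -- half of everything is under two years old
```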

randomNumber7

gutenberg.org is dns blocked in germany. It's easy to get around this (when you know it), but some governments work actively against it....

geysersam

Only a few works are actually illegal to distribute in Germany. Not the whole site. Project Gutenberg chose to block the whole site instead.

Source: https://cand.pglaf.org/germany/index.html

chrisMyzel

not entirely true, maybe for your ISP; I'm happily able to access

- Gutenberg
- Zlib
- libgen

via o2 (not promoting them in any way)

mylons

let's not forget what aaron swartz died for, though. quite a bit of science from publicly funded studies is behind paywalled publications.

godelski

> Like, all the knowledge ever.

Go to your local library. There is actually a lot more there than books. There are videos and audio recordings too. There are newspaper clippings too.

Worse, data is being generated very fast. Let's look[0]

> [In 2012], CenturyLink projects that 1.8 zettabytes of data will be created. By 2015, the projection is 7.9 zettabytes.

> MAST is currently home to an estimated *200 terabytes of data*, which… is nearly the same amount of information contained in *the U.S. Library of Congress.*

But that said, a few petabytes is only going to cost in the tens of thousands of dollars, at most the low hundreds of thousands. So this is well within the budget of even every modestly sized city and definitely every university. It would be even easier if this was done with torrenting.

I think the real question is^: why are we not, as a species, creating this system? It seems very reasonable that we could back up all data. (There is also a dark side to this too! So let's not forget about that) At least, why aren't we creating a full access world library for all scholarly data, which is probably something that could be housed for a few grand. Should we do this on our own given the relatively low cost? Books + sci-hub + arxiv + *xiv?

[0] https://blogs.loc.gov/thesignal/2012/04/a-library-of-congres...

^ maybe it is already being done and I don't know, please inform me if it is. (Other than the NSA)
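The cost claim above can be sanity-checked with back-of-envelope arithmetic. The per-terabyte price and redundancy factor below are assumptions (rough bulk hard-drive pricing), not figures from the comment:

```python
# ASSUMPTION: ~$15/TB for bulk hard drives, with a 2x factor for
# redundancy and replacement. These are illustrative numbers only.
USD_PER_TB = 15

def archive_cost(petabytes, overhead=2.0):
    """Rough drive cost in USD for an archive of `petabytes` PB."""
    return petabytes * 1000 * USD_PER_TB * overhead

print(archive_cost(1))  # 30000.0 -> "tens of thousands of dollars"
print(archive_cost(5))  # 150000.0 -> low hundreds of thousands
```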

Brian_K_White

Insane in the abstract, but not exactly unfathomable. Who would make money off of that, and who else makes money off of its absence?

olkingcole

Imagine for a split second an idea of value that doesn't start and end with private profit

xhkkffbf

I'm a writer. I can imagine it. But the landlord, grocer, car dealer and pretty much everyone else dealing in physical things wants me to pay for what I consume. I'm not independently wealthy. The only reason I can afford to write books is because the publisher pays me. The only reason the publisher can pay me is because people buy the books.

I can guarantee you that this type of behavior means I'm writing fewer books. It's very short-sighted.

permo-w

I get what you're saying by this, but I think OC is just being practical. This is the world we live in. It's shit and I hate it, but if you want something done, 9 times out of 10 you need to consider these questions

Brian_K_White

I'm not sure what the point is supposed to be of this imagining.

Yes obviously pure capitalism is as shit as pure anything else, and no system made by humans, and made of humans, actually works very well to create sanity and fairness. So even not-pure capitalism creates all kinds of undesirable results, like available tech doesn't get used in ways that would be wonderful for everyone.

What solution to human nature have you imagined in your split second?

culi

Society as a whole. Exactly the kind of initiatives governments are supposed to be taking. Instead our governments are totally captured by the profit-seeking organizations and are hellbent on using their monopoly of violence to imprison activists working on these initiatives

visarga

Interesting difference here between how YC reacts to Codex vs this library. Both are reusing copyrighted materials to help society.

rakoo

Why should anyone make a profit? Why is it important to guarantee that individuals can accumulate money belonging to others, instead of a common enrichment of everyone?

SequoiaHope

No one would make money off of it. That is the point. There are innumerable things which are worthwhile which are not profitable.

The idea that we do not do a world library of free digital copies of every book ever written really highlights the problem with the thinking your comment has demonstrated. The idea that individual pursuits must make a profit to be justified leads to us doing terrible things like: not making a free world library of every book.

Though in this particular case this also demonstrates a major problem with intellectual property concepts. Actually hosting the library isn't very expensive. But we have made doing so illegal. Of course, authors deserve to live a decent life just like everyone else. We currently do that by restricting all access to duplications of information they have produced so that they can charge a fee for access, and that fee provides for their survival.

But we suffer an incalculable loss by making all this information restricted. In my view we would be MUCH better off as a society with respect to creativity, innovation, and other popular metrics for progress, if we actually made sure as a society that every person's survival was provided for with no need for them to pay for it. Then authors wouldn't need to get paid, engineers could do what they love to do and post all their work as open source, and we could have a free library for everyone. This extreme openness would in my mind lead to more rapid innovation, and markets would still function as first movers would maintain an advantage for new product releases, though they would have to keep moving as anything they've done that is worthwhile would be copied. But since no one's livelihood would be at stake, this is not a real issue.

This can all be done in a voluntary, libertarian society as long as we have community ownership of the means of production, and promote these ideals of community support in this society. And I think we would be way better off. Doing this would allow us to offer every book ever recorded for free to every person on Earth. A big change, but one with obviously a very big benefit to humanity.

One note though: people who want to own a lot for themselves really mess this up. So people would need to dissuade those people from acting that way. My preferred method of doing so is by starving them of workers and customers, though when it comes to control of land matters get more serious.

rexpop

You're responding to a comment which aims to make your same point.

criddell

Libraries these days also offer movies, music, and some even have games. Should digital versions of all that stuff be free for all as well?

WalterBright

> libertarian society as long as we have community ownership of the means of production

Libertarianism includes the right to own property.

Also common ownership of the means of production is an old idea, and has been tried many times. It always results in poverty.

seoaeu

We've basically already had that for decades thanks to public libraries. In fact, between large collections and inter-library loans most provide 1-2 orders of magnitude more content than the library of Alexandria ever held

sircastor

My relatively uninformed opinion is that the Library of Alexandria was an amazing resource for its time, but in a modern context is tremendously overrated. While it certainly contained a vast amount of knowledge for the time, the amount of valuable, useful, accessible information in a modern mid-size city library is, I would guess, substantially greater.

Sakos

I think it's particularly insane that (ignoring scihub, which still faces legal battles and is at risk of being our next Library of Alexandria) the world's scientific knowledge is largely behind paywalls and inaccessible to most of humanity, even those millions whose taxes funded it.

dietr1ch

The best we can do is a library where you can rent pdfs.

I still can't understand why every public library can't host media that should be free and easy to get.

krick

So, if I get it right: first there was Libgen, which is mirrorable. Then Z-Library copied Libgen and added some more books, without making it mirrorable. The goal is to make these new books, which are not mirrorable, mirrorable (i.e. to "preserve" them).

So, why not just re-upload them to Libgen, then? I guess somebody will do that now anyway, but you could easily have done it in the first place, without making your own mirror that is not a mirror of Libgen. Just upload them to Libgen and make a mirror of Libgen.

facethewolf

From their FAQ:

> Q: Should the Z-Library collection be added to Library Genesis?

> A: Yes! However, it is tricky. Library Genesis splits out its collection between non-fiction and fiction. They also have relatively high quality standards. If you are interested in organizing all the books to meet their requirements, let us know.

krick

Oh, it didn't occur to me to navigate "backwards". Thanks. Actually, their whole FAQ is quite a bit more enlightening than just the linked page.

http://pilimi.org/faq.html

pnutjam

I found a book I've been looking for, but it was only an image scan pdf.

I OCR'ed it, and I'm slowly fixing the errors and converting to epub.

Tedious, but interesting.
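A sketch of the kind of text cleanup that workflow involves; the heuristics here are illustrative, not taken from the comment:

```python
import re

def clean_ocr_text(text):
    """Two illustrative cleanup passes for raw OCR output before
    converting to epub: rejoin words hyphenated across line breaks,
    then turn hard line wraps into spaces (keeping paragraph breaks)."""
    # "conver-\nsion" -> "conversion"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # single newlines become spaces; blank lines (paragraphs) survive
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    return text

raw = "This is a conver-\nsion test.\n\nNew paragraph."
print(clean_ocr_text(raw))  # word rejoined, line wrap removed, paragraph kept
```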

amelius

Can't they just tag these books as "not reviewed yet"? Anyone looking for these books can then just decide to include them.

killingtime74

Here are 2x 16tb drives for ~$900 AUD ($600 USD), RAID them. https://www.ozbargain.com.au/node/710309

Put it in a case with all parts, $800USD all up?
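As a sketch of the arithmetic (drive prices are the comment's own; RAID 1 mirrors the pair, so usable capacity is one drive's worth):

```python
# Two 16 TB drives at ~$600 USD for the pair, mirrored with RAID 1.
DRIVE_TB = 16
DRIVES = 2
PAIR_COST_USD = 600  # ~$900 AUD per the linked deal

raw_tb = DRIVE_TB * DRIVES    # 32 TB raw
mirrored_tb = raw_tb // 2     # RAID 1 keeps a full copy: 16 TB usable
cost_per_usable_tb = PAIR_COST_USD / mirrored_tb

print(raw_tb, mirrored_tb, cost_per_usable_tb)  # 32 16 37.5
```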

plmu

Excellent. I hope more and more people start to see how absurd and evil the concept of "intellectual property" is. It should be totally rejected, and any form of keeping useful information to yourself should be shunned and made taboo. In today's world, many have been programmed to believe that the world could not exist without such immoral restrictions, which is horrible.

grumbel

I can't imagine getting rid of it completely would have good effects, as it would make any large scale production impossible. You might still get a few blog posts, but getting books produced will be tricky, and something big like a movie might be outright impossible. This is doubly true in the modern digital world where everything can be copied in a fraction of a second and where the piracy site, not the author, will be what bubbles to the top of the search results.

It could also result in far more draconian DRM, as that would be the only way left to protect your work.

Now, drastically lowering the term of copyright might be well worth it; something in the realm of 20 years should be enough. Copyright needs to get back to a point where things you consumed in your lifetime make it into the public domain in your lifetime.

rightbyte

> It could also result in far more draconian DRM, as that would be the only way left to protect your work.

There is no way to protect video, audio or text from being copied. DRM just prevents low effort consumer copying.

grumbel

There are plenty of ways. Limit playback only to official DRM-locked devices and have those devices film the user. The moment a camera makes it into the picture you block their account for life. That's not even new tech, 3D face detection is standard part of many smartphones and laptops. And companies like Facebook have no problem locking you out forever from their services, for much milder infractions.

Or more practically, just look at cinemas. They already film the audience to prevent filming, and with great success: while you can still easily get illegal copies of a movie, they're only extremely low quality smartphone rubbish. The high quality piracy videos only show up months later, once the films hit streaming services or Bluray.

And all of that is just current tech; let's assume VR will become a success in the future. Now you have a device on your head that tracks every little one of your moves, including things like heart rate and eye-tracking. Furthermore, what streams to you isn't an easily rippable 2D copy of the movie, but the 3D view of sitting in a cinema. Good luck trying to rip that. And of course tamper-proof hardware is a thing as well, so any attempt at opening it up will automatically self-destruct it and phone home that you tampered with it.

account42

And without copyright, modified playback systems with the restrictions removed could also be freely distributed so DRM becomes even less effective.

half-kh-hacker

As a jumping-off point for thought: Piracy-by-default would work very well in a world where access to a work is not charged, but the social custom is that revenue is collected via pre- or during-production crowdfunding

You may see lower revenues, but how much cost is currently poured into resolving licensing / investing in DRM / etc, all for works to be pirated anyway?

grumbel

While that is not fundamentally impossible, Star Citizen is so far the only crowdfunded thing I know of that managed to collect AAA-game levels of money. Pretty much everything else isn't making nearly enough to actually finance the project and requires more funding outside of the crowdfunding. Furthermore, a lot of successful crowdfunding is built on prior copyrighted work. People spend money on Star Citizen because they liked Wing Commander and Freelancer. Without copyright, those earlier works might not have existed to begin with, or never gained the popularity they got.

bayindirh

Many ebook vendors I use (InformIT, Packt, No Starch Press, Pragmatic Bookshelf, Manning, and O'Reilly back in the day) work in piracy-by-default mode. Only InformIT watermarks PDF copies, and there's no DRM to speak of.

So, when you have good content with reasonable prices, people also come and buy.

Also, there are some eBooks in the Kobo store devoid of any DRM. So publishers are not forced to use DRM on Kobo either.

singron

Another idea is serialization. You release your work little chunks at a time, and if it's not sufficiently supported financially, you stop. A lot of Patreon is effectively funded like this. Obviously the medium has to be amenable to this (e.g. novels, graphic novels, visual novels, some video games).

Beldin

> something in the realm of 20 years should be enough.

God no. Right now, we'd be getting remakes from every piece of pop-culture that was semi-popular in the 80s-2002 time frame. Not just movies, but TV series, books, theatre, musicals, ...

Sure, copyright should be shortened, but I don't begrudge (e.g.) a one-hit winner making money off their hit decades later. Life of the artist is reasonable, I think. They take a gamble on a profession with risky pay-out; if it works out at least once for them, let them reap the benefits.

grumbel

Those remakes are exactly why it shouldn't last much longer. Those remakes aren't actually about the work itself, but about the brand recognition that work has in the public consciousness. It's just free advertisement that you don't get with an original work, which is why remakes and sequels are so popular, even if the connection to the original is little more than the title.

It's not the job of copyright to allow people to get lazy, or companies to profit forever from the rights they bought. The goal should be to encourage original works, and current copyright isn't very good at doing so.

Also, it's not like the author would go completely penniless here. Just because everybody can make a Star Wars doesn't mean there won't still be a George Lucas approved canon Star Wars. Slapping the author's name on your product to declare it the "Real Thing™" might still be worth a bit, and might frankly be better than today's sequels that happen completely without any of the original creators being involved.

zozbot234

> Right now, we'd be getting remakes from every piece of pop-culture that was semi-popular in the 80s-2002 time frame. Not just movies, but TV series, books, theatre, musicals, ...

We are getting these things anyway, except that the originals are far less accessible than they should be. Entertainment trends are cyclical.

acomjean

I think 30 years is a decent base. Let it be renewed a couple of times for extra years if the owners think it's worth it. That way most stuff flows into the public domain, but some works can keep creating value for their creators.

ATsch

> Right now, we'd be getting remakes from every piece of pop-culture that was semi-popular in the 80s-2002 time frame

That's exactly what we are getting right now, though. Looking at the top ten of the box office right now, only three are not part of an existing franchise. Three of them are reboots of 80s movies, four if you count comic books. Large IP holders recognize that under the current system, it is much more profitable to exploit their existing IP than to come up with new concepts. If copyright terms were significantly shorter, the pressure to be original would be far higher.

jdright

So shitty remakes are the reason the general poor population can never have access to information and education? Gotcha.

gatlin

It protects nothing important. That people must sell their creativity and creative labors to eat is a more fundamental bug in society, and as someone who makes a living off nothing but "IP", I can't wait for the day that my life is secured by something other than armed threats of violence against people sharing ideas and information.

zasdffaa

I've just spent 2 years writing something which ain't got anything else like it. It was technically pretty difficult and needed a lot of background knowledge.

Should I be disallowed to commercialise it?

I partly get where you stand, but if I were in the society you seem to endorse, my first question would be: other than for the love of doing it, why sink so much effort into a thing only to get nothing back? It is almost the opposite of a meritocracy.

atoav

As a musician and filmmaker I see it like this: once my work is published, I cease to be in total control of it. That means I cannot control (or sometimes even know) what paths my work will take, how it is understood, who will do what with it, etc.

Of course I like to be credited for my work — but some fan adding my work to a pirate page would not be a concern but rather a bit flattering. What would anger me would be someone claiming credit for themselves, some rich company taking the material without paying me, things of that sort.

freetanga

You can always publish it under a free license. Your choice; the previous poster can choose differently and charge.

Nothing wrong with expecting to get paid for your work. If as a consumer you don’t want to pay, stick to open source and freely licensed media.

xhkkffbf

You're being short-sighted if you think you're going to be able to sell material in the future if the recording company or studio goes out of business. The only reason they have money to support you is because the public supports them.

cookiengineer

If you as a private person own a patent, you're going to lose it anyway, because you cannot fight some mega corporation in court to defend it. It's too expensive, and the biggest corps just take what they want since they have more financial resources.

Note that if you don't defend it in court, our justice system treats it as less worth protecting. Which in itself is kind of ridiculous.

Also, the right to commercialize has nothing to do with intellectual property.

catchclose8919

The solution is simple:

ONLY private persons should own patents, and it should be illegal even for employers or institutions (academic, research) to own the patents of people employed to do research. At most, companies and institutions should be allowed to add a clause of "perpetual free usage of any patents resulting from an employee's direct work" - but an employee or group of employees holding a patent should still be able to license it to other companies too. If businesses are hurt, that's GOOD; most should not exist as coagulated entities.

We're not gonna have proper freedom-preserving capitalism until we properly decentralize: we all work like swarms of 1-person companies / solopreneurs contracting with each other. (No, not the gig economy; in that dystopia we're all still slaves who can't band together to fight the masters.) Legislation will automatically have to be refactored to make this work. With some exceptions, only human individuals should hold most property, not companies and not institutions. Groups/collectives only when the group members directly worked together and know each other.

And Intellectual Property would just "click in" in such a context. IP sounds hellish and dysfunctional because our own practically techno-communist society (yeah, even the USA is practically "communist" nowadays in a way - newsflash: "the reds" have won! even the f symbolism is there, "the red pill" is the good one now... all's backwards) is messed up. It makes perfect sense in a hyper-decentralized, hyper-individualistic, REALLY democratic and REALLY capitalist society.

zasdffaa

> If you as a private person own a patent, you are losing it anyways

There's always someone who just has to spread the despair and helplessness. Every bloody time. Tell me, in what way does this add to the discussion? I wish there was a ban on these kind of comments.

> right to commercialization has nothing to do with intellectual property.

I don't understand. If I don't own it I can't market it, right?

einpoklum

> Should I be disallowed to commercialise it?

I believe you are mis-phrasing the question. What you're actually asking is:

> Should the state criminalize and punish people who make copies of my work, to facilitate my commercial activity with it?

And our answer is "No".

You can go ahead and engage in whatever commercial activity you like, based on open access to your work.

zasdffaa

That's a constructive answer, so thanks, although I have to ask your view of how BSD was used by Apple to build a massive fortune without paying significantly back to the BSD community.

Overheard from an IP lawyer I stood near once - something about F/OSS software being incorporated into commercial products being a big issue (for the free stuff, not the company doing the 'stealing'). Your view?

worldshit

> based on open access to your work

Can I freely use your toilet then? your electricity? You pay for the plumber, why don't you pay for entertainment?

Scientific knowledge must be open, but most copyrighted work is entertainment.

andrepd

Surely there is a middle ground between zero rights and early 1900s works coming into public domain in the 2050s.

xhkkffbf

Exactly. I'm in complete agreement. If someone else wants to give away their work for free, that's fine with me. But if I want to charge, I should be able to charge for my labor just like the farmer or the baker. Certainly they'll charge me for bread.

Writing is hard work. Writing books and getting them technically correct is expensive. This is very short-sighted.

hansel_der

> Should I be disallowed to commercialise it?

ofc not

but since no one can ever prove that theirs was the first incarnation of an idea, nobody can be criminalized for also doing things in a certain way.

the concept of protecting invention for some time to facilitate reward is not without merit, but the implementation of IP law and practice has gone so far astray that it's overdue to rethink the whole thing.

ahmadmijot

I also do technical writing (including academic publications), and I do feel like I need monetary gains from my work. But I also believe that copyright law is too restrictive and lasts too long (15 years after the death of the original author/artist/writer/etc. should be enough).

alexb_

15 years, period, should be enough. If you haven't contributed anything to society within the past 15 years, why should you get to live off the one thing you did? Shouldn't you be incentivized to be productive? That's the point of copyright law in the first place, after all.

MYEUHD

You could offer a PDF version for free and charge money for a printed version. It would probably result in more sales.

https://news.ycombinator.com/item?id=23073126

fartsucker69

creating nontrivial intellectual property usually takes a lot of work, work that has to be paid for or it literally could not be done; i.e. the great people who create those works could attempt maybe one such work, and in most modern cases they would not get close to finishing it before they literally ran out of money for rent and food.

the system around intellectual property has some issues but some form of protection / ownership needs to be there.

if you had your wish and the concept of IP was shunned and taboo, you would quickly live in a world with a vastly diminished amount and quality of art, science and technology.

jesterson

There should be a fine line in intellectual property rights. I see where you are coming from: quite often intellectual property is used as a moat to protect insane revenues and, as a repercussion, delays or slows down our progress as humanity.

But it is also used to protect a unique creator's revenue and encourage them to create more.

If you ask where the fine line should be I have no immediate answer, but abolishing intellectual property rights just like enforcing them at all costs doesn't seem to be the optimal course of action to me.

thanatos519

> But it is also used to protect a unique creator's revenue and encourage them to create more.

This thinking is an artifact of an economic system so dependent on scarcity for its motivation that it is now generating most of the scarcity in the world.

We now have the technology for creative implementations of "From each according to its ability, to each according to its needs". Just keep track of how much each thing is used, and reward creators from a corporate-tax-funded pool. Every for-profit entity contributes proportionally to its profit, and can use any idea for free.
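That mechanism can be sketched as a toy allocation; the names, tax rate, and usage counts below are hypothetical: every for-profit entity pays into a pool proportionally to its profit, and the pool is split among creators proportionally to measured usage.

```python
def fund_pool(profits, tax_rate=0.02):
    """Each for-profit entity contributes in proportion to its profit.
    The 2% rate is a hypothetical placeholder."""
    return sum(p * tax_rate for p in profits.values())

def reward_creators(pool, usage):
    """Split the pool among creators in proportion to measured usage."""
    total = sum(usage.values())
    return {name: pool * count / total for name, count in usage.items()}

# Hypothetical numbers for illustration only.
profits = {"MegaCorp": 1_000_000, "SmallCo": 100_000}
usage = {"author_a": 750, "author_b": 250}

pool = fund_pool(profits)            # ~22,000
print(reward_creators(pool, usage))  # author_a gets ~3x author_b's share
```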

zozbot234

Even just having substantial "prizes" for the most widely used stuff (whether publicly or privately funded) would go a long way. (Most of our public funding for content creation now happens as grants, basically rewarding compelling ideas and then simply trusting that the reward money will be spent on doing worthwhile things. This does not work very well, for obvious reasons.)

account42

Well said. The reality is that even in our current world with copyright, a lot of value is added by creators who never see any real reward for it. Yet instead of looking at how we can better reward creators, we continue with this insane system, which has been shown to be easily abused to concentrate wealth while imposing a massive cost on society by restricting what we all can do and making everyone pay to enforce those restrictions.

jesterson

What you describe was partially implemented in the USSR. It killed productivity, because the author (inventor) is discouraged from doing more. Why bother?

js8

I think that's pretty much the case for shortening the time frame of IP rights. Which means they shouldn't be treated the way property has traditionally been treated. Although, as a socialist, I think perhaps we shouldn't have time-unlimited property rights in general (above a certain reasonable boundary, say $10M).

jesterson

Interesting. Being the very opposite of a socialist myself, I am wholeheartedly with you on limiting the timeframe of IP. It partially solves the problem.

And this arguably should be extended to tangible assets as well. I like the Singapore model, where housing property is sold for a specific timeframe; it simplifies redevelopment a lot.

d0mine

Such comments remind me of the provocateur methods used by police to suppress legitimate critique. It works like this:

  1. there is some legitimate issue
  2. people protest (peacefully)
  3. a provocateur does something over the top (violence, absurd statements like "defund the police")
  4. legitimate protesters are discredited because of 3.

risyachka

If you want to drastically reduce the number of new books, songs, and other content, then sure.

Otherwise, I am having a really hard time understanding how you can suggest that I don't own the book I spent a *decade* writing. It is just as much mine as the car you drive is yours.

To tighten regulations around intellectual property to make sure that it is not abused - sure.

To ban? Obviously never.

noselasd

Most people writing books, e.g. those at https://www.oreilly.com/, would not do so if they could not monetize them, which would be even harder without legal protection of the work they did. I don't like your idea at all.

nmz

It's funny that often the only people who think this are programmers; everyone else who hopes to make a living doesn't. At the moment, not even NYT bestsellers earn a livable wage, and now with DALL-E artists won't either. COVID basically killed a lot of musicians' income: DJs and so on. It's probably also why media itself has reached such a mediocre state: no authors, no novels, no adaptations. We are in a future in which much of the current media is a sea of mediocrity. There's lots of content, sure, but barely any that's worth a damn.

stavros

I see things like this, and I wonder why the following software doesn't exist:

I want a piece of software to which I can add a collection of files, say multiple TB. The software will then behave a bit like a BitTorrent tracker, and know which peer has which files. A peer joining this swarm will be able to say "I want to donate X GB of space", and the tracker would tell it "OK, then download and seed these files, which are the least seeded".

The peer would download those files from the rest of the swarm and make them available to it. Then, a request layer on top of the swarm could be used to request a file from whichever peer has it. Adding files to and removing files from the collection would also need to be supported.
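A minimal sketch of the allocation step such a tracker would need; the names and the greedy rarest-first policy are all hypothetical:

```python
def allocate(files, donated_bytes):
    """Pick the least-seeded files that fit in a peer's donated space.

    `files` is a list of (name, size_bytes, seed_count) tuples as the
    tracker might see them; returns the names the new peer should seed.
    """
    chosen, used = [], 0
    # Rarest first: fewest seeds, then smallest size as a tie-breaker.
    for name, size, seeds in sorted(files, key=lambda f: (f[2], f[1])):
        if used + size <= donated_bytes:
            chosen.append(name)
            used += size
    return chosen

catalog = [
    ("a.epub", 4_000_000, 12),
    ("b.pdf", 9_000_000, 1),
    ("c.djvu", 2_000_000, 3),
]
print(allocate(catalog, 10_000_000))  # seeds the rarest files that fit
```

A real tracker would also need to rebalance as peers churn, but the core decision is just this prioritization.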

Does anyone know if anything like this exists? If not, how easy would it be to make something like it out of BitTorrent? I might give it a go.

dchuk

I've always thought this would be a great way for many people to share well-organized Plex libraries: many partially overlapping libraries creating a virtual Netflix, with some way to stream from the whole library in Plex whether or not you actually have the content locally.

867-5309

>virtual Netflix

this already exists with addons for Kodi

npteljes

Freenet is built around similar ideas, combined with encryption and anonymization, which hopefully adds the benefit of you not being legally liable for distributing CSAM. The gist of Freenet is that you can operate a freesite or upload files, and the content is redundantly dispersed in encrypted parts among a number of other Freenet peers. They don't know what they're hosting and neither do you; the client just fills up the allotted space and uses some bandwidth, that's all.

https://en.wikipedia.org/wiki/Freenet

Regarding how easy it would be to make something like this network, I'd wager it's pretty hard. There will be a lot of questions even while establishing the happy path: for example, how you manage updates (especially when you update the protocol, not just the software), how you effectively handle the volume of search requests, how you distribute the files, etc.

And then there's the abuse the network will inevitably get: how you handle spammers, CSAM, malware, ISPs that throttle or block you, the legal risk you expose your clients to, etc. A nice big can of worms. To begin opening it, I suggest reading through Wikipedia's peer-to-peer file sharing article, and especially the file-sharing sidebar on the right, which nicely captures the ideas that have been tried so far.

https://en.wikipedia.org/wiki/Peer-to-peer_file_sharing

stavros

My idea is similar but slightly different. You'd run your client and then choose which providers' data to seed; e.g. you'd add the Internet Archive and Libgen to your datasets.

Only those entities would be able to push data to you, nobody else, so if you trust the providers you specified, you should be good.

npteljes

I remembered that I read about something like that before. I think this was it: https://en.wikipedia.org/wiki/Storage@home

That further led me to distributed data stores: https://en.wikipedia.org/wiki/Distributed_data_store#Peer_ne...

zidel

I think BitTorrent has all the pieces needed for a fully distributed version of your idea. My initial thought is that you could publish a magnet link that points to a mutable DHT item, which in turn points to a torrent that has a JSON file with some metadata and a list of infohashes the publisher cares about. The client could then scrape the "leaf" torrents from multiple lists to get the peer counts and use that for local prioritization of what to store. By reusing existing torrents you could then share resources with standard torrent clients that are unaware of your system.
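A sketch of what the publishing side might look like; the magnet format follows BEP 46 (mutable DHT items), while the JSON schema is purely invented for illustration:

```python
import json

def mutable_magnet(pubkey_hex: str) -> str:
    # BEP 46: a magnet link addressing a mutable DHT item by its
    # ed25519 public key; re-resolving it yields the latest torrent.
    return f"magnet:?xs=urn:btpk:{pubkey_hex.lower()}"

# Hypothetical metadata file published inside the "list" torrent.
collection = {
    "name": "example-archive-list",
    "infohashes": [
        "c12fe1c06bba254a9dc9f519b335aa7c1367a88a",  # placeholder
    ],
    "nested_lists": [],  # magnet links of other lists, for composition
    "excluded": [],      # salted hashes of torrents to drop, as suggested below
}
payload = json.dumps(collection, indent=2)

link = mutable_magnet(
    "77ff84905a91936367c01360803104f92432fcd904a43511876df5cdf3e7e548"
)
print(link)
```

Clients would re-resolve the mutable magnet periodically, fetch the list torrent it points to, and reprioritize what they seed from the `infohashes` it names.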

The list idea could be extended to nested lists (stavros recommends Internet Archive) for discoverability and composition.

If you go with v2 or hybrid torrents from the beginning you could deduplicate and cross seed files from different collections.

The lists could also be modified to have torrents to exclude, possibly using some salt + rehash idea to make it hard to reverse into a list of e.g. CSAM you don't want to publish as is.

Feels like a neat project that could interoperate nicely with existing torrents.

stavros

Thanks, that's exactly the feedback I was looking for! This sounds like it would work, though I'd have to see if it would scale to thousands or millions of files. Still, great for a PoC, thank you!

sedatk

Donating space is only half of the equation. I think donating bandwidth is the more significant aspect, especially with ISPs like Comcast that provide very little upload bandwidth compared to download. You'd expect uploads not to impact download speeds, but that's not the case: saturated upload bandwidth means ACK packets get delayed, which means connections are established much more slowly. So it's not a feasible prospect unless competing ISPs take over.

throwaway742

You can always implement QoS or something like FQ_Codel.

sedatk

I'm a senior software developer with experience developing communications protocols and networking software, and I've never been able to set up QoS properly, especially in a way that addresses all my needs. Expecting end users to do it is way beyond the realm of possibility, IMHO.

DataWraith

Something similar exists: iabackup [1][2]. It is designed to host an independent copy of (some of) the Internet Archive using git-annex. You tell it how much storage you want to donate, and git-annex fills your disk with data from the least-seeded files, IIRC. Its focus is on data backup rather than data serving, though.

[1]: https://git-annex.branchable.com/design/iabackup/ [2]: https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK/g...

thrdbndndn

That's literally how most Japanese P2P software works.

For example, Perfect Dark, Winny, and to a lesser extent Share (which is more similar to eDonkey/eMule).

sgtnoodle

It's proprietary, but it sounds like you're describing the "Google File System". https://en.m.wikipedia.org/wiki/Google_File_System

For your idea, once all the local storage everywhere is filled up with evenly distributed redundant copies and a new file is added, would peers arbitrarily choose other files to delete in order to make room for the new one?

stavros

Yes, it's very similar, though it would work slightly differently (BitTorrent under the hood, whole files instead of chunks, etc.).

er4hn

This sounds very similar to ipfs. How does it differ?

stavros

IPFS can't give you the least-seeded files. It can't give you any files automatically: you have to manually pin the set you want, you pin all of it, and it won't change on its own.

It's not very convenient for archival, whereas the system I'm talking about would be (in my opinion, anyway).

paskozdilar

(How) would you filter junk?

generalizations

I wonder if there are any search engines dedicated to indexing these kinds of libraries. I know there's a decent one just for Sci-Hub, but it would be awesome if I could do a Google-style search that returned the contents of books, magazines and journal articles instead of just websites.

sacrosanct

There is the Imperial Library of Trantor: https://trantor.is/

They offer a clearnet site and a hidden .onion service in case you don't want ISPs blocking access to it.

generalizations

Unfortunately that doesn't do full text search, which is what I was trying to get at with the Google comparison.

I don't search for websites based on their titles...

zozbot234

Book metadata is widely available via sites like e.g. Open Library. With good metadata, full text search is not as relevant.

scotty79

That's false. I often search for a specific citation.

delusional

Wasn't that what Google Books was supposed to be?

daniel_reetz

Google Books, like so many Google projects, had a dual purpose. Making books accessible is noble and on-mission. But more importantly, natural language models can be trained on the scanned corpus.

The same was true of the original GOOG 411, which provided a free service, but was really put in place to train up their voice recognition projects.

This is a long running strategy of Google, and it's a shrewd one. The main thing is not to mistake it for a public good. It is an act of privatization.

visarga

That can't be true; Google Books predates the advent of large language models by 15 years. Until 2020 nobody could train on such a large collection.

I think Google initially wanted to augment the web results with a large book collection to get "all the world's information and make it searchable"; same with Google News.

voisin

I assume Google abandoned this along with all their earlier mission statements in favour of building another chat app.

b112

I can't imagine a life at google. So much promise, all turned to ash.

emanuensis

Supposedly it's all at the Library of Congress (LoC). Does anyone have experience using it there?

lupire

Could you take a moment to check whether Google Books search still exists?

I'll give you a hint: https://books.google.com/?hl=en

> Search the world's most comprehensive index of full-text books.

natrys

Does anyone know of a good FOSS alternative to Google Books that could be self-hosted (for a personal library)?

MegaDeKay

Does Calibre fit the bill? The program itself is great, and it supports a plugin system that really puts it over the top. One plugin automagically strips Adobe DRM from any book loaded into it.

https://calibre-ebook.com/

mdp2021

One product could be "Ambar: Document Search Engine"

https://github.com/RD17/ambar

> Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search

> - Easily deploy Ambar with a single docker-compose file
> - Perform Google-like search through your documents and contents of your images
> - Tag your documents
> - Use a simple REST API to integrate Ambar into your workflow

jrm4

But Google could never do it right. It strikes me as obvious that anything that's going to try to be a genuinely modern library can't sidestep, or even "work with", the present capitalist-plus-copyright regime. It will just have to be a fight.

lupire

books.google.com

dkjaudyeqooe

To be fair, Z-Library doesn't charge unless you want to download more than 10 books per 24-hour period. That's per account, and although they ask you not to open multiple accounts, they don't seem to do anything to stop you.

PostOnce

What kind of fairness can there be in charging for stolen books?

I believe in free access to education, but charging for these books they have no rights to is a whole other thing.

mgaunard

Technically, they're not charging for the books, they're charging for the bandwidth.

fancyfredbot

You don't have to pay them if you go steal them yourself. If you find it more convenient to pay another thief to do it for you then I don't think that's significantly less fair.

UmbertoNoEco

All those India/Malaysia/Guatemala kids stealing a 200 USD PDF book from Pearson should go to jail... in America... Assange-style.

krick

I don't quite agree. They provide a useful service, and it costs money to run. It's fine that they earn from it (even if they're actually making a profit, not just covering costs).

That being said, 10 downloads/day feels a bit restrictive to me. I'd get it if it were 100, or 50, heck, maybe even 20. I don't appreciate that it's not mirrorable in the first place, but maybe they can't afford that, I don't know... But 10 feels like less than somebody researching a new topic might need in a day, even if they won't read everything immediately.

…That being said as well, it has a really nice UI. I wish somebody would do the same for Libgen.

Aachen

It's a good thing their hosting provider is okay with providing bandwidth per individual user account on the site of up to—how many books did you say again, 50?

Without sarcasm: the bandwidth bill doesn't care that you find it restrictive. That even more than a handful are free every day for every account is honestly a lot, since it means virtually nobody will need to contribute to the costs they're collectively incurring. And if you're unable to pay, you can still skim a few dozen books every day (making two or three accounts isn't that hard to do by hand), go back to any you've already downloaded, and offer them to friends to offload the server.

rg111

Whenever you reach the limit, you can always copy the ISBN and search for it on Libgen.

8 times outta 10, the book would be there.

lupire

You read more than 10 books per day?

jrm4

I worked as a volunteer fundraising for my local library. I can see no meaningful moral difference between this and that. I get that the law is different; I just think the law is wrong here.

KMnO4

They’re not charging for the books; they’re charging for the bandwidth.

sacrosanct

Golden rule of piracy: don’t profit from stolen works or it’s not piracy, it’s profiteering.

sudosysgen

Revenue != Profit. I really doubt they're doing anything beyond paying to keep the lights on.

II2II

At this point it is worth noting that there are reputable sources for free books, such as public libraries and Project Gutenberg.

I realize that neither source will satisfy many of the people on HN, simply because there is a need for current technical books.

harry8

What on earth does "reputable" mean here?

Total compliance with the regulatory capture of publishing companies? Full support of the landgrab claims of the Disney corporation et al?

You don't have to be an anarchist to look at the status quo and think some amount of civil disobedience is the correct, proper and right thing to do. And the claim, fully supported by Disney et al., that such an amount of civil disobedience is somehow ethically evil is, in fact, somewhat disreputable. As disreputable as the insanely high journal subscription fees for taxpayer-funded research, for example.

The publishing companies chose this path willingly and with prejudice, turning the relevant law against the people for their profit. Are they reputable, given that they did so? It's hardly an outlying position around here to think they really aren't anything of the sort. Refusing to accept that, en masse, until appropriate reform is supported and enacted could be considered quite worthwhile.

You may of course, disagree.

Sparkle-san

It would depend on how they use the funds. I wouldn't be surprised if bandwidth made up a majority of what it costs to run Z-Library, and that money has to come from somewhere.

terrycody

Yes, you are right, and because of this I doubt how long they can maintain the service. I hope it lasts a long time... (though it's illegal).

dkjaudyeqooe

They've been going for many years. Lots of people donate, and they have occasional donation drives where they raise 20,000 euros or more.

uniqueuid

It's really funny to think about how advances in technology keep changing how we perceive books.

7TB fits on a commodity disk these days. And it's a lot less than the torrent of scientific papers that floated around some time ago (~18TB, IIRC).

nonrandomstring

I foresee storage density reaching the point where, for most ordinary people, "online" becomes rather unimportant. What would be the effects of technology when computers behave as in early science fiction, as stand-alone oracles? [1]

[1] https://www.timeshighereducation.com/opinion/2048-informatio...

michaelmrose

If I could store all the world's present information on the head of a pin, I should still want to access the internet to find out how you feel about it, or to share my opinion with you. Virtually everything we do online isn't reading existing data; it's acting on it through communication.

themodelplumber

How would the appeal of streamers and live data/content settle out in that case? Sometimes context is available in the moment that makes it easier for all parties to consume and analyze in that moment as well.

Since transient, ethereal meme culture is also basically emergent culture now, it's difficult not to also foresee a greater cultural divide in such a case. This is saying nothing of live data tools as well, even weather data...

nonrandomstring

> How would the appeal of streamers and live data/content settle out in that case?

It would be mostly unaffected. As another commenter (Michael) says, communication, news and collaboration are distinct from storage as needs/functions.

What you say here is fascinating:

> Since transient, ethereal meme culture is also basically emergent culture now, it's difficult not to also foresee a greater cultural divide in such a case.

There have always been bookish or gossipy people. Jane Austen and Thomas Hardy both make note of that in stories about English culture. The balance of those qualities may have changed in the internet age. Perhaps the degree of reach via one-to-many communications has amplified the ephemeral gossip side; the scholarly life, less so. Though it's enhanced by repositories like Gutenberg, the Internet Archive and Sci-Hub, a "reader" (to use Bill Hicks's term) can still only process one medium at a time. But as Ted Nelson pointed out, if you have all of the world's writing at your fingertips, all hyperlinked and with awesome semantic search tools, reading becomes a quite different, non-linear experience. That's something centralised walled gardens subtracted from the WWW's 1990s conceit.

That 90's vision of a multi-modal, multi-media "internet community" in which people read common news, converse and reference together hasn't really survived "Social Media". But then it was always a weak approximation of something like a group seminar in the university library, rather than the town square or the local pub.

ars

It's 7TB compressed. If it's text, you'd need about 70TB once decompressed. It's probably mostly scanned images, though, so probably not quite that bad.

emj

I've tried to do lossy compression of epubs with a few lines of bash, i.e. removing images and fonts that weren't needed. Many epubs could be shrunk to a third of their size, but then I found a book that actually needed its bundled fonts and gave up; lossy compression can't afford that kind of bug.

What I also found was that many of the images in the epubs were already unusable, nothing like their counterparts in the physical books.
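The same lossy pass can be sketched with Python's stdlib, since an epub is just a zip archive; the stripped paths here are a guess, which is exactly the fragility described above:

```python
import io
import zipfile

# These prefixes are a guess; real epubs keep images/fonts in varying
# places, which is exactly where a naive script breaks.
STRIP_PREFIXES = ("OEBPS/images/", "OEBPS/fonts/")

def strip_epub(src, dst):
    """Copy an epub (a zip archive), dropping image and font members.

    A correct tool would also rewrite the OPF manifest so readers don't
    reference the now-missing files, and keep `mimetype` stored first
    and uncompressed; this sketch skips both.
    """
    with zipfile.ZipFile(src) as zin, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if item.filename.startswith(STRIP_PREFIXES):
                continue
            zout.writestr(item, zin.read(item.filename))

# Tiny in-memory demo standing in for a real .epub file.
src, dst = io.BytesIO(), io.BytesIO()
with zipfile.ZipFile(src, "w") as z:
    z.writestr("mimetype", "application/epub+zip")
    z.writestr("OEBPS/chapter1.xhtml", "<html>...</html>")
    z.writestr("OEBPS/images/cover.png", "fake image bytes")
strip_epub(src, dst)
with zipfile.ZipFile(dst) as z:
    print(z.namelist())  # the image member is gone
```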

solarmist

I don’t understand this. Are they epubs of comics or something? Epubs are already compressed (zip).

int_19h

Things can be rendered straight from compressed container files. For HTML with images, even slow-but-strong compression like LZMA is already fast enough to render pages as quickly as you can click through them, even on fairly old hardware.

The Kiwix .ZIM file format is a good example. The entire Gutenberg library is a single ~65 GB file, and you can read any book from it without unpacking anything.
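The access pattern can be shown in miniature with a plain zip; it's a weaker analogy than ZIM, which compresses in indexed clusters, but the idea is the same: read one member without unpacking anything.

```python
import io
import zipfile

# A tiny stand-in "container": many individually compressed members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    for i in range(100):
        z.writestr(f"book{i}.txt", f"text of book {i}\n" * 50)

# Random access: only the requested member gets decompressed,
# and nothing is extracted to disk.
with zipfile.ZipFile(buf) as z:
    page = z.read("book42.txt").decode()
print(page.splitlines()[0])  # prints "text of book 42"
```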

moritonal

It's a shame, because this would have been a textbook case for using IPFS, and yet that wasn't the default.

Books are naturally immutable and could be structured into sub-categories while enjoying the benefits of deduplication.

sixtyfourbits

All of Library Genesis is already available on IPFS (see https://freeread.org/ and https://libgen.fun/dweb.html). Hopefully someone will import this collection into Libgen, and then these books can be on IPFS too.

Torrents are simpler and more efficient for distribution, but IPFS is better for accessing individual files.

stavros

IPFS doesn't really work well for this because you'd never know if the peer hosting the last subset of some books went offline (and you'd lose those until someone who had them came online again).

I want a slightly different system, which I've posted about here: https://news.ycombinator.com/item?id=31972252

moritonal

That's my point about it being a shame. It seems obvious that IPFS would have both privacy and a self-balancing way to pin a partial set of data. But it doesn't, which makes it unsuitable.

AnotherDwellr

What makes you think IPFS has privacy? It's the complete opposite; it is not designed for privacy at all.

When you pin files, you announce to the whole world which files you're hosting. When you download files, you announce to the whole world what you're looking to download.

UkrainianJew

That would paint a huge target on Filecoin's back, given that they raised over a quarter of a billion dollars. They would take it down faster than you can have an 8TB HDD shipped to you.

sixtyfourbits

IPFS is a protocol, like BitTorrent. The only way to make content inaccessible is to go after the people and computers sharing it.

betwixthewires

They can't take it down.

UkrainianJew

They can and will use legal means to go after anyone trying to use the network this way, in order to make the content less discoverable and discourage others from following suit.

Being branded the Napster on the blockchain is the worst possible PR one can think of in their situation.

jmprspret

So you're telling me there's CSAM on IPFS that the feds "can't" take down? Somehow I doubt that.

hour_glass

What kind of books are on here? I can find what I'm looking for ~90% of the time on Libgen.

cookiengineer

This project kind of reminds me of "the eye" [1], which had a _huge_ collection of computer science and hacking-related books.

Sometime around last year they had datacenters burn down due to a natural disaster, but I'm hoping they can recover. It's an amazing project.

[1] https://the-eye.eu/

flanfly

One of the projects on my secret TODO list is feeding Libgen into Elasticsearch to get cross-referenced full-text search. For now the hardware is prohibitively expensive, but time is on my side: I'm sure redundantly indexing a few tens of TB will become trivial before the end of the decade.
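A first cut needs little more than Elasticsearch's `_bulk` REST endpoint. A stdlib-only sketch of building the newline-delimited payload; the index name and document fields are made up here:

```python
import json

def bulk_payload(books, index="libgen-fulltext"):
    """Build an NDJSON body for Elasticsearch's POST /_bulk endpoint."""
    lines = []
    for book in books:
        # Action line, then source line, per the bulk API's framing.
        lines.append(json.dumps({"index": {"_index": index, "_id": book["md5"]}}))
        lines.append(json.dumps({
            "title": book["title"],
            "author": book["author"],
            "text": book["text"],  # full text extracted beforehand, e.g. with pdftotext
        }))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = bulk_payload([{
    "md5": "d41d8cd98f00b204e9800998ecf8427e",
    "title": "Example Book",
    "author": "A. Author",
    "text": "the extracted full text goes here",
}])
# POST `body` to http://localhost:9200/_bulk with
# Content-Type: application/x-ndjson to index it.
```

The expensive part is not this plumbing but extracting text from tens of TB of PDFs/djvu and the disk for the resulting index.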

sitkack

I suggest you just do it. You can generate the indexes on some cloud VMs and save them; it would cost under $100.

Larrikin

Is there an index of what's included? 7 TB is a lot to ask for simply upholding an ideal.

robonerd

It apparently comes with an index in the form of a MySQL database that contains title, author, description, and filetype.


Pirate Library Mirror: Preserving 7TB of books (that are not in Libgen) - Hacker News