slashdev
AnotherGoodName
Some background before I explain why your suggestion is "not even wrong". If you can predict the next X bits given the previous Y bits of observation, you don't need to store the next X bits; the decompressor can just run the same prediction algorithm you used and write out its predictions without correction. This is the same as general-purpose AI at a high level. If you can predict the next X bits given the previous Y bits of observation, you can make a reasoned decision ("choose option A now so that the next X bits have a certain outcome").
The above is actually the premise of the competition and the reason it exists. What I've said above is laid out in detail in academic papers by Marcus Hutter et al., who runs this competition. That is, lossless compression can serve as a scorecard for prediction, which is the same as AGI.
Now, saying "they should just make it lossy" misses the point. Do you know how to turn lossy data into lossless? You store some data every time your prediction is wrong, i.e. store data wherever you have loss to make it lossless. This is arithmetic coding in a nutshell. You can turn any lossy data into lossless data with arithmetic coding; it just requires more data the more loss you have. The lossless requirement gives us a scorecard of how well the lossy prediction worked.
If you ask for this to be lossy compression, you throw out that scorecard and bring a large amount of unwanted subjectivity into the competition.
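The scorecard idea can be sketched in a few lines. Under an ideal arithmetic coder, encoding a symbol that the predictor assigned probability p costs -log2(p) bits, so a better predictor directly yields a smaller lossless file. This is a toy sketch (the fixed-probability predictors are hypothetical stand-ins, not real entries):

```python
import math

def ideal_code_length(bits, predict):
    """Bits an ideal arithmetic coder needs to losslessly encode `bits`,
    given a predictor returning P(next bit == 1) from the history so far."""
    total, history = 0.0, []
    for b in bits:
        p1 = predict(history)
        p = p1 if b == 1 else 1.0 - p1  # probability assigned to what actually happened
        total += -math.log2(p)          # cost of "correcting" the prediction
        history.append(b)
    return total

stream = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1] * 100  # a stream that is 80% ones

no_model = ideal_code_length(stream, lambda h: 0.5)  # no prediction: 1 bit/symbol
decent   = ideal_code_length(stream, lambda h: 0.8)  # predicts a 1 with 80% certainty

print(no_model)           # 1000.0
print(decent < no_model)  # True: good predictions are cheap to correct
```

The better the predictor, the fewer correction bits the lossless scorecard charges, which is exactly the competition's premise.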
mellosouls
> Some background before i explain why your suggestion is "not even wrong".
The phrase you quote is generally used to imply a stupid or unscientific suggestion. Your subsequent comments about what you think AGI is carry a certitude that isn't warranted.
It's good that you are trying to supply knowledge where you think it is lacking, and I understand there are fora where this sort of public school lecturing is amusing but I think your tone is misplaced here.
ltbarcly3
Years long compression challenge with dozens of geniuses participating: exists
Random person on the internet: let me improve this thing I've never heard of by using the one fact I know about compression, there are two kinds
It's absolute hubris and a waste of everyone's time to chime in with low value, trash comments like "they should make it lossy". It's not unreasonable at all to take a snarky tone in response. "not even wrong" absolutely applies here, and they carefully, patiently, and in great detail explained why.
Lanzaa
> The phrase ["not even wrong"] is generally used to imply a stupid or unscientific suggestion.
An unscientific suggestion is exactly what was offered. Forgive my ignorance, but why was the tone of the message misplaced, and what was the tone?
carapace
I feel your tone-policing is misplaced here. When someone is that ignorant or foolish they should be told so, so they can hopefully recalibrate or something. There's enough inchoate nonsense on the Internet that I appreciate the efforts to keep discussion at a little higher level here.
aeternum
Is there any evidence that arithmetic coding works at the level of concepts?
As a thought experiment, suppose Copernicus came up with this Hutter prize idea and declared that he would provide the award to whoever could compress the text of his book on the epicycle-based movement of planets around the sun (De revolutionibus orbium coelestium).
Today we can explain the actual motion to high accuracy with a single sentence that would have been understandable in that age: "A line drawn from the sun to any planet sweeps out equal areas in equal time"
This however is mostly useless in attempting to win the Copernican Hutter prize. Predicting the wording of some random human's choosing (especially at length) is very far removed from ability to predict in general.
AnotherGoodName
Arithmetic coding isn't the key thing here; it's just a bit-wise algorithm. If you predict the next bit is a 1 with 90% certainty, you don't need to store much data with arithmetic coding. That's all arithmetic coding is here.
What you're getting at is the 'predictor' that feeds into the arithmetic coder, and that's wide open and can work any way you want it to. LLMs absolutely have context, which is similar to what you're asking, and they are good predictors of output given complex input (pass GPT a mathematical series and ask it what comes next; if it's right, that's really helpful in compression, as you wouldn't need to store the whole series in full).
kaba0
So you don’t think that a wiki article about this could be made smaller by encoding this info in a few logical steps, plus some metadata on what kind of sentence should follow what? The decompressing part is the AI part. Where to place a comma can be added at constant cost between any contender programs.
canjobear
You could still come up with a scorecard for lossy compression, for example area under the rate distortion curve. It would be very hard to calculate.
eru
Lossless compression _is_ a scorecard for lossy compression. Thanks to the ideas from arithmetic coding.
gregw134
Or just require the results to be 99.99 percent accurate (within some edit distance)
nathanfig
"... lossless compression can be a scorecard of prediction which is the same as AGI."
Having not read the papers, this sentence strikes me as a bit of a leap. Maybe for very constrained definitions of AGI?
AnotherGoodName
There's a section in the link above, "Further Recommended Technical Reading relevant to the Compression=AI Paradigm", where they define it in a reasonably precise mathematical way. It's well accepted at this point. If you can take input and predict what will happen given some options, you can direct towards a certain goal. This ability to direct towards a goal effectively defines AGI. "Make paperclips", and the AI observes the world, works out what decisions need to be made to optimize for output paperclips, and then starts making decisions to output paperclips: that is essentially what we mean by AGI, and prediction is a piece of it.
I have no stake in this btw; I just had a crack at the above challenge in my younger days. I failed, but I want to get back into it. In theory, a small LLM without any existing training data (for size) that trains itself on the input as it passes predictions to an arithmetic coder that optimally compresses, with the same process on the decompression side, should work really well here. But I don't have the time these days. Sigh.
AbrahamParangi
If you had a perfect lossless compressor (one that could compress anything down to its fundamental Kolmogorov complexity), you would also, by definition, have an oracle that could compute any computable function.
Intelligence would be a subset of the capabilities of such an oracle.
version_five
[flagged]
AnotherGoodName
Sorry, but I hope my long-winded explanation explains it.
In computer science we have a way to score how good lossy data is: make it lossless and look at how much data the arithmetic coder needed to correct it from lossy to lossless.
This is a mathematically sound way to judge (you can't correct lossy data any more efficiently than an arithmetic coder). All the entries here do in fact make probabilistic predictions on the data, and they all use arithmetic coding. So the suggestion misses a key point of the CS involved here. I don't mean to be rude about it, but the idea does need correcting.
joshxyz
Your comment is far worse, lol. His comment actually supports the GP's suggestion, just from a different angle.
It looks like LLMs are like compression algorithms, with strengths and weaknesses in different things.
Losslessness doesn't always equate to usefulness. But yeah, maybe that's a different competition.
thelastparadise
Good lossy compression can be used to achieve lossless compression.
(information theory)
The more accurate the lossy compression is, the smaller the difference between the actual data (lossless) and the approximation. The smaller the difference, the fewer bits required to restore the original data.
So a naive approach is to use the LLM to approximate the text (this would need to be deterministic: zero temperature with a preset seed), then subtract that from the original data. Store this diff, then add it back to restore the original data bit-for-bit.
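That round trip can be sketched directly. The lossy step below is a hypothetical stand-in (byte quantization rather than an LLM's zero-temperature output), but the structure is the same: approximate, store the residual, add it back.

```python
def lossy_approx(data: bytes) -> bytes:
    # Stand-in for any deterministic lossy model (an LLM at zero temperature
    # would play this role): quantize each byte down to a multiple of 16.
    return bytes((b // 16) * 16 for b in data)

def residual(original: bytes, approx: bytes) -> bytes:
    # The "diff" that must be stored to make the lossy output lossless.
    return bytes((o - a) % 256 for o, a in zip(original, approx))

def restore(approx: bytes, diff: bytes) -> bytes:
    return bytes((a + d) % 256 for a, d in zip(approx, diff))

original = b"Don't stop me now"
approx = lossy_approx(original)
diff = residual(original, approx)

assert restore(approx, diff) == original  # bit-for-bit recovery
assert all(d < 16 for d in diff)          # residuals are small, so they compress well
```

The closer the approximation, the closer the residual is to all zeros, and the cheaper it is to store.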
hinkley
In psychoacoustics, one of the most important aspects of how lossy compression works is that they throw away sounds that a human can't even hear, because it's too short or subtle to be noticed. This is the difference between data and knowledge. Someone with perfect pitch and a good memory knows exactly what Don't Stop Me Now by Queen sounds like, and it's a lot smaller than digitizing an LP record in mint condition.
This person can cover that song, and everyone will be happy, because they have reproduced everything that makes that song what it is. If anything we might be disappointed because it's a verbatim reproduction (for some reason we prefer cover songs to introduce their own flavor).
If I ask you to play Don't Stop Me Now and you sound like alcoholic Karaoke, you haven't satisfied the request. You've lost. Actually we've all lost, please stop making that sound, and never do that again.
kaba0
But that’s the good thing about this competition: it is fair to everyone. The perfect-pitch version can use just a tiny amount of additional data to decompress to the same thing, while the alcoholic will have to correct for loads of differences, making that version much larger.
This difference is the same monotonically increasing function in both cases, so you basically don’t have to care about it: you can fairly compare the lossy versions as well and get the same ordering.
So the more advanced version wins, and the competition remains fair and unambiguous (otherwise, did perfect pitch A or perfect pitch B have the better cover?)
uoaei
Exactly right. There would be no confusion if the Hutter prize was a competition for compressing human data.
This is a parallel issue to the ones in conversations around understanding. There's a massive gulf between being able to build an ontology about a thing and having an understanding of it: the former requires only a system of categorizing and relating elements, and has nothing to do with correctness per se; the latter is an effective (note: lossy!) compression of causal factors which generate the categories and relations. It's the difference between "Mercury, a star that doesn't twinkle, orbits Earth, the only planet in existence, and we have inferred curves that predict why it goes backwards in its orbit sometimes" and "Mercury and Earth, planets, both orbit the Sun, a star, and all stable orbits are elliptical".
Xcelerate
Who downvoted you? This is correct; the better the lossy compression, the better the lossless compression as well.
CamperBob2
That's true until the lossy compression alters the data in a way that actually makes it harder to represent. As an example, a bitstream that 'corrects' a decompressed .MP3 file to match the original raw waveform data would be almost as difficult to compress as the raw file itself would be.
It wouldn't be a matter of simply computing and compressing deltas at each sample, because frequency-domain compression moves the bits around.
modeless
The problem with lossy compression in a competition is that in order to compare different approaches to lossy compression you have to define a metric for the quality of different lossy reconstructions of the data. This is practically impossible to do in an unbiased way. As soon as you define a quality metric then the competition becomes cheating the metric instead of useful compression.
So how do you define a metric that can't be cheated? You add an arithmetic coder after your lossy compressor, which turns it into a lossless compressor, and the size of the losslessly compressed data is your metric for the quality of your lossy compressor. It's the only metric that definitively can't be cheated.
lucb1e
Feel free to start your own competition where people are awarded money for a system that correctly answers some set of questions, divided by the amount of storage that system needs.
Sounds like a tough competition to run objectively, not that that makes it less worth doing, but I can see why the parameters were chosen as they were in 2006.
jharohit
Ted Chiang actually explores this concept in his article on ChatGPT and other LLMs https://www.newyorker.com/tech/annals-of-technology/chatgpt-...
vintermann
Lossy compression is lossless over the information you decided to care about.
You may decide that the sound outside the human range of hearing isn't interesting. Or that the colour variation in your camera sensor that is just thermal noise can be safely ignored. Then you can throw it away and do lossy compression.
What you care about, an algorithm can't answer for you. Sometimes you may want to keep the inaudible sounds in a recording to better understand the audible ones. Sometimes you may be glad you kept the noise in a picture as it allowed you to later identify the model of camera.
There's no context-free answer to what matters. There's no subject-free answer to what matters, there's always a "who" it matters to.
mrcgnc
True, but then I would argue (as a layman) that AGI would need to be able to guess what matters in a given context to really be AGI.
vintermann
Matters to who?
To me? In that case, it sometimes does, sometimes doesn't, depending on how lazy I've been in constraining to the algorithm what I might want.
To itself? Then who can say if it isn't doing that already?
hnfong
See a related benchmark: http://www.mattmahoney.net/dc/text.html
In particular, Fabrice Bellard is leading with a transformer model on "enwik9", which is 10 times larger than the file used in the Hutter Prize. It's doing quite well with enwik8 too, but perhaps the issue is that the "economy of scale" of on-the-fly training hasn't caught up on the smaller benchmark yet.
Transformers are definitely lossy, but afaict Bellard's entry uses the probabilities generated by the model to create an encoding that uses fewer bits.
YetAnotherNick
Lossless text compression is basically something like arithmetic coding[1] on top of a probability model. In fact, the loss metric that LLMs directly optimize is the number of bits required for lossless compression. The loss plotted in papers is generally in nats/token.
The only reason LLMs are not on the leaderboard is that this is not a task for compressing human knowledge. It's a task for compressing Wikipedia, which is tiny in comparison, and the model size plays a big role.
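The unit conversion being referenced is just a change of logarithm base: cross-entropy in nats per token divided by ln 2 gives bits per token, i.e. the per-token cost of an ideal arithmetic-coded output. A sketch with made-up numbers (the loss value and token count are hypothetical):

```python
import math

loss_nats_per_token = 2.0  # hypothetical cross-entropy training loss, in nats/token
bits_per_token = loss_nats_per_token / math.log(2)

# With an assumed ~250M tokens for a 1GB corpus, the implied size of the
# losslessly compressed output (model weights not counted) would be:
tokens = 250_000_000
compressed_bytes = tokens * bits_per_token / 8

print(round(bits_per_token, 3))  # 2.885
```

This is why a lower training loss is, quite literally, better lossless compression of the training stream.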
omoikane
500000 EUR is the prize pool. Each winner has to gain at least a 1% improvement over the previous record to claim a prize that is proportional to the improvement. Getting the full 500000 EUR prize would require a 100% improvement (i.e. compressing 1GB to zero bytes).
phobotics
Does it or does it just require 1% improvement over the last winner? As opposed to a static additional 1% improvement vs the initial best “score”.
omoikane
It's 1% over the last winner. The latest winner has a total size of 114156155, compared to previous winner of 115352938. The payout was
500000 * (1 - 114156155 / 115352938) = 5187
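A quick check of that arithmetic under the stated payout rule (prize pool times relative improvement):

```python
prize_pool = 500_000
previous, latest = 115_352_938, 114_156_155  # compressed sizes in bytes

improvement = 1 - latest / previous  # relative improvement, about 1.04%
payout = prize_pool * improvement

print(int(payout))  # 5187
```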
(see table near "Baseline Enwik9 and Previous Records Enwik8")
bigyikes
Probably if you succeed at this, 500,000 will be worthless to you
sytelus
Why? How does this improvement translate to more financial gains?
munchler
The idea is that sufficiently powerful compression is indistinguishable from artificial intelligence, which can be used to make even more money.
Eduard
Because with that knowledge, you will be able to decompress 0 dollars into infinite dollars, which the storage mafia will pay you for not publishing your breakthrough in making them obsolete.
Al-Khwarizmi
Well, if you achieve this, you'll basically have proven that something (a bunch of information) equals nothing (no information). So 1=0.
Once you have that, becoming rich is trivial. Multiplying both sides of the equation by one trillion, 1 trillion = 0. So open a bank account with $0, now you have one trillion. Easy.
A funnier (although grimmer) way: let p be the world's population. Multiplying both sides of the equation by (p-1) you get p-1=0. Thus, p=1. If you assume that you exist (which is a reasonable assumption, following Descartes's reasoning) you now own all the wealth in the world.
hegzploit
Because then you will have beaten Pied Piper in the market.
quickthrower2
Being that smart translates to more financial gains.
abroadwin
Form is emptiness, emptiness is form. 1GB == 0B. Full prize money delivered upon successfully achieving enlightenment.
arapacana
Holographic storage, if we can harness the zero-point field the way Karl Pribram postulated cognitive memory works (i.e. using the Void for storage), may actually get us to 0 "physical" B ;)
Alifatisk
> compressing 1GB to zero bytes
Is this even possible? Don't we need at least some data to decompress from?
jfengel
Not really 0, since any exe file will have some minimum length, and that's defined to be part of the calculation.
Conceptually you could cram it into some dozens of bytes (trivial initial state of the universe + basic physics + iteration). In practice, of course, that's ridiculous.
AnimalMuppet
Depends on your definitions. I could decompress 1GB from zero bytes, with a program that is just over 1GB in length...
albertzeyer
It's very well defined on the homepage:
> Create a Linux or Windows compressor comp.exe of size S1 that compresses enwik9 to archive.exe of size S2 such that S:=S1+S2 < L
So obviously your suggestion does not work.
lainga
Ah... I had professors who graded like that
TheAlchemist
I mean, come on man. For some reason, the nerd in me sees this and immediately adds it on my 'I really need to do this' list.
Just memories of old times doing similar (albeit probably less challenging) competitions on TopCoder almost a decade ago, and also the curiosity to see how I would manage it now, with experience. Given that the current scores are also very far from what they estimate the lower bound to be, this is really interesting! The prize is however very misleading: per their own FAQ, the total possible payout is ~223k euros.
Definitely not thanking you for the hours I will put into this !
RugnirViking
I think the total prize pool can be won with repeated 1% improvements by different people but never by a single person no?
demandingturtle
I have an idea if you want to code it. You know how we can drop the vowels from sentences and still understand the sentence? What if we do that on the first stage? May not work for every case so have to identify. Worse comes to worst, use full word. Probably not saving much though. Worth a try.
RugnirViking
You got a lot of flak for what is clearly a take from someone who isn't versed in compression techniques. But, as one might say to a student: you're on the right track! This idea is similar in form to "arithmetic coding", which is what people are using to chip away at this. Namely, finding smaller encodings which can be used to predict common parts of the full encoding (maybe a recognisable word, more likely a sequence of bits or characters), then storing "hints" for each part the predictor would get wrong until it can reproduce the exact desired output.
bob1029
I think you have it backwards, but I like the direction of thinking. There is redundancy in the language itself.
> However, vowel-only sentences were always significantly more intelligible than consonant-only sentences, usually by a ratio of 2:1 across groups. In contrast to written English or words spoken in isolation, these results demonstrated that for spoken sentences, vowels carry more information about sentence intelligibility than consonants for both young normal-hearing and elderly hearing-impaired listeners.
https://pubs.aip.org/asa/jasa/article-abstract/122/4/2365/98...
yosito
What you're describing is a form of lossy compression. Yes, it can compress the file, but you're losing some information, such that there's no way to convert it back to its original form without external information. And, as you noted, it may not work for every case, which means some information would be permanently lost.
falloutx
By your logic, sign language, which mostly communicates via symbols at roughly 1-2 symbols per word with no spaces, would be the best compression.
saulpw
I bet you think of yourself as an idea guy.
freilanzer
> I bet you think of yourself as an idea guy.
What does this even mean?
Dave_Rosenthal
While I know it wouldn't qualify for the prize due to the size of the model, I'm curious how well a big modern LLM can compress the prize file (enwik9) compared to the current record. (Ideally it would be an LLM not trained on Wikipedia, to minimize memorization.)
On that thought, with a modern LLM's weights dwarfing the enwik9 file used by the prize, it feels like this prize was set up with the right idea to advance AGI, but with the problem several orders of magnitude too small.
lIIllIIllIIllII
The site itself says that this is the general idea, proposing that intelligence can actually be defined as the ability to compress information as much as possible and, I guess, either just rehydrate it or also derive inferences from it.
So basically like enthalpy
Interesting - if you play the universe in reverse (reversing entropy) you wind up at the Big Bang, which is the final boss of compressing everything, into a singularity.
eru
> Interesting - if you play the universe in reverse (reversing entropy) you wind up at the Big Bang, which is the final boss of compressing everything, into a singularity.
Only if you believe that running the universe forward did not add any extra information. Ie everything is deterministic, and no randomness occurs.
(That's actually a reasonable stance to take. Quantum mechanics is probabilistic in the Copenhagen interpretation, but completely deterministic in the Many World interpretation.)
vjerancrnjak
A non-local formulation of QM is deterministic. [0] It's an addition to the "many worlds" interpretation: still one universe, but the underlying reality is fully connected (to go forward you need to take the effects of everything into account).
kaba0
It is more than likely that most physical interactions are not deterministic, in which case the Big Bang definitely didn’t encode anything about all the state we have now. (Also, simply by Occam’s razor, that seems much more likely.)
DaiPlusPlus
> The site itself says that this is the general idea, proposing that intelligence can actually be defined as the ability to compress information as much as possible
I don't understand how anyone can define "intelligence" like that...
GolDDranks
It's not an unreasonable definition if you are aware of Kolmogorov complexity and Solomonoff induction. Intelligence is intimately connected to the ability to predict and model things, and it turns out that data compression is _also_ connected to the ability to predict and model things.
Thoreandan
There is a different but similar lossless compression benchmark which just had a new front-runner using LLMs, see https://bellard.org
NooneAtAll3
If anyone is interested in lossless compression competitions but is too intimidated by the 100MB/1GB size and the level of optimization already achieved here, you can try the http://golf.horse/ challenges, which include the Wordle word list, Pokémon, and the OEIS database.
OscarCunningham
I don't understand why they sum the size of the compressor with the combined compressed file and decompressor. I think the compressed file and decompressor would make an ungameable challenge on their own. Their FAQ section 'Why is Compressor Length superior to other Regularizations?' is satisfied equally well by the decompressor length.
jcparkyn
I had the same question, and I think this FAQ entry answers it fairly well: http://prize.hutter1.net/hfaq.htm#addcomp
OscarCunningham
Thanks, that one answers my question.
userbinator
It's interesting to see a new winner of a different nationality beating the multiple previous records by the same guy (and what appears to be another who beat him 2 years ago). For some (cultural?) reason, there seems to be a historical association between Soviets and advanced data-compression technology in general, possibly starting with Markov's work in the 19th century.
eru
The causation for this might run along the lines of having plenty (in absolute terms) of well educated smart folks but pretty low wages.
Most of the really smart people in Europe and the US for example already have high paying jobs and thus less time for these relatively low paying competitions.
Rexxar
Why does the formula use the size of the compressor? Intuitively, I would expect the size of the decompressor plus the compressed data to be the only thing that really matters for knowing how much the total information has been compressed.
Klaus23
Otherwise you could just store the whole text in the decompressor.
Another way is to keep the file secret and just output a score, but if you want to use a public file, you have to include the program in the calculated size.
jcparkyn
> Otherwise you could just store the whole text in the decompressor.
I don't think GP was asking about the decompressor, but specifically the compressor.
The FAQ has a fairly detailed answer for this though: http://prize.hutter1.net/hfaq.htm#addcomp
Klaus23
Yes, I didn't read the comment carefully and quickly assumed it was the standard question about including the decompressor.
Rexxar
Thanks, I didn't see this part of the faq !
Conlectus
Perhaps because they include metrics for compression time, which could easily be gamed by doing significant precomputation in the compressor?
dang
Related. Others?
Saurabh Kumar's fast-cmix wins €5187 Hutter Prize Award - https://news.ycombinator.com/item?id=36839446 - July 2023 (1 comment)
Hutter Prize Submission 2021a: STARLIT and cmix (2021) - https://news.ycombinator.com/item?id=36745104 - July 2023 (1 comment)
Hutter Prize Entry: Saurabh Kumar's “Fast Cmix” Starts 30 Day Comment Period - https://news.ycombinator.com/item?id=36154813 - June 2023 (5 comments)
Hutter Prize - https://news.ycombinator.com/item?id=33046194 - Oct 2022 (3 comments)
Hutter Prize - https://news.ycombinator.com/item?id=26562212 - March 2021 (48 comments)
500'000€ Prize for Compressing Human Knowledge - https://news.ycombinator.com/item?id=22431251 - Feb 2020 (1 comment)
Hutter Prize expanded by a factor of 10 - https://news.ycombinator.com/item?id=22388359 - Feb 2020 (2 comments)
Hutter Prize: up to 50k € for the best compression algorithm - https://news.ycombinator.com/item?id=21903594 - Dec 2019 (2 comments)
Hutter Prize: Compress a 100MB file to less than the current record of 16 MB - https://news.ycombinator.com/item?id=20669827 - Aug 2019 (101 comments)
New Hutter Prize submission – 8 years since previous winner - https://news.ycombinator.com/item?id=14478373 - June 2017 (1 comment)
Hutter Prize for Compressing Human Knowledge - https://news.ycombinator.com/item?id=7405129 - March 2014 (24 comments)
Build a human-level AI by compressing Wikipedia - https://news.ycombinator.com/item?id=143704 - March 2008 (4 comments)
lucb1e
One would think it ought to be possible to automate generating these overviews. Or has it been automated and is it being posted with a non-service account because it's partially but not fully automated?
LeonB
Pretty sure Dang has some nice scripts that generate this, and I expect he does a little manual pruning before posting it. He’s very precise in his choices, definitely applies some intuition/judgement. We can’t losslessly compress him — yet!
hn_throwaway_99
This is probably a dumb question, but I recall from (way) back in my college days studying Shannon's source coding theorem that there are calculable limits regarding how much a data stream can be compressed. Has this limit been calculated for the sample file in question? And apologies if this was in the linked article or one of the footnotes.
mgraczyk
To establish bounds in this way, you have to start with some claim about the distribution of the input data. In this case, the data is natural human language, so it's difficult or impossible to directly state what distribution the input was drawn from. Even worse, the prize is for compressing a particular text, not a sample from a distribution, so tight bounds are actually not possible to compute.
There is some discussion on the Hutter prize page, under "What is the ultimate compression of enwik9?"
jftuga
From this article:
More empirically, Shannon's lower estimate suggests that humans might be able to compress enwik9 down to 75MB, and computers some day may do better.
londons_explore
Notably, enwik8/9 isn't really just human text: a good ~half of the data is XML markup, which may not have the same properties as prose.
AnotherGoodName
The current best compresses it to <15MB because that 75MB estimate refers to what a human could do, and the current best isn't a human. :)
AnotherGoodName
No. There are in fact limits, but unless you've already written a 'perfect compression tool' you can't actually know the limit.
https://en.wikipedia.org/wiki/Kolmogorov_complexity is ultimately what you're looking for, btw. Shannon is more about limits on transmission speed given noise, but Kolmogorov dealt with the limits of compression, which are actually uncomputable.
TheRealPomax
Q: Why do you restrict to a single CPU core and exclude GPUs?
A: The primary intention is to limit compute and memory to some generally available amount in a transparent, easy, fair, and measurable way. 100 hours on one i7 core with 10GB RAM seems to get sufficiently close to this ideal
Sorry, who are these people that don't have a GPU? Even laptops have GPUs. Why would you spend 100 hours on an "i7" (which generation? A 4790K, or a six-times-faster 13700K?) CPU when you can achieve orders of magnitude better performance on a consumer GPU that literally everyone has access to?
Zandikar
> Sorry, who are these people that don't have a GPU
Literally millions of people. Especially circa early 2000's when this started.
> GPU that literally everyone has access to?
They don't. (Money is a barrier to access).
> Why would you spend 100 hours on an "i7"
Because it's what they had/have.
This a competition to move understanding forward, not to see who has the biggest budget.
nomel
The level of privilege that some people are completely blind to is just incredible.
modeless
Sure, in the early 2000s. But today literally every laptop or desktop computer or phone with a touchscreen has a GPU. Even most feature phones have GPUs, I'd actually be interested to see if you could find a currently manufactured one that doesn't. And they have all been programmable for a very long time now. And almost without exception they have more FLOPS than all of the CPUs in the same machine put together, yes, even the integrated GPUs.
TheRealPomax
In which case: https://news.ycombinator.com/item?id=37502782
And 20 years changed a lot of things. It turns out the folks who can't afford a desktop or laptop don't need one: they have a cell phone. And even cell phones have GPUs that are more powerful than the CPUs of 20 years ago.
anon____
The first winner didn't even have a computer!
Q: What if I can (significantly) beat the current record?
A: In this case, submit your code and win the award and/or copyright your code and/or patent your ideas. You should be able to monetize your invention beyond the HKCP. This happened to the first winner, a Russian/Ukrainian who always had to cycle 8km to a friend to test his code because he did not even have a suitable computer, and who now has a lucrative job at QTR in Canada.
http://prize.hutter1.net/hfaq.htm#fametyingq
>lucrative job at QTR in Canada
Google failed me on this one. What company/entity is QTR?
anon____
Quantum Technology Recruiting Inc. (https://ca.linkedin.com/company/quantum-technology-recruitin...)
Our guy, Alex Rhatushnyak, is listed as employee.
BTW, this is the first result on DuckDuckGo. (https://duckduckgo.com/?q=qtr+company+canada)
Is it time to switch? ;)
lucb1e
Note that the competition is close to 20 years old
...though I also had a GPU in 2006, so idk. Then again, you need to define something as reference hardware and it doesn't really matter what it is. Better compression should win out over less-good compression no matter if you run both on a 100-core system or a 1-core system, I think?
TheRealPomax
In the category "then update your FAQ, you've had many, many years to do so" =D
(not to change the rules, but to explain why the rules haven't changed, and to clarify the super vague phrasing even back when the 3dfx Voodoo was the pinnacle of graphics. Level playing fields are a worthwhile pursuit)
ozzmotik
anecdata: for my entire life, I have never, until about 3 weeks ago, owned a laptop with any type of GPU other than an integrated graphics card built into the motherboard. for most of my life in fact I have never really had any modern or really serviceable hardware, as I've basically inherited all of my compute devices secondhand and never had the money to just go out and buy a decent rig (or to even put one together, for that matter), which perhaps elucidates the economic sector I have basically always occupied. well, not always, as now that I'm homeless, I somehow went even LOWER! )
though it truly does excite me that I finally have something with some type of GeForce-branded graphics device inside of it; now I can finally run local machine learning tasks and not have to cloud it out
stronglikedan
My two-year-old high-end productivity ThinkPad has an Intel Iris Xe display adapter, which I would hesitate to call a GPU.
nier
Maybe commodity servers with integrated, probably never-used graphics capabilities that no one would actually describe as a GPU?
caseyavila
I do think it's interesting that recent submissions use nearly the entire 50 hours. I wonder how much better people could do if faster hardware was allowed.
AnotherGoodName
Absolutely they could. All the entries so far could do even better with more RAM; the PAQ entries have command-line flags for exactly this. That's already well known.
What this competition is looking for is fundamental improvements, not "I brute-forced a known way to win".
londons_explore
An element of compression is usually a search problem: "let's try all these ways to encode the data and see which is smallest."
Therefore, to maximize compression, you tweak parameters to search as hard as possible, and so you use all of the allotted time.
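The search framing above can be sketched in a few lines of Python (my own toy illustration, not from any actual Hutter Prize entry): try several encoder settings and keep whichever output is smallest. Real entries search far larger model and parameter spaces, which is why they eat the whole time budget.

```python
import zlib

def smallest_encoding(data: bytes) -> bytes:
    # Search over encoder parameters and keep the smallest result.
    candidates = [zlib.compress(data, level) for level in range(1, 10)]
    return min(candidates, key=len)

payload = b"the quick brown fox jumps over the lazy dog " * 200
packed = smallest_encoding(payload)
assert zlib.decompress(packed) == payload  # still lossless after the search
```

Searching harder can only shrink the output, never grow it, so under a time limit the rational move is always to use all the time you are given.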
concurrentsquar
This competition was from 2006.
The consensus among AI experts (really just everyone) during the 1990s/2000s was that:
- AGI could be achieved by giant GOFAI (usually expert systems/knowledge bases) projects (like OpenCog and Cyc).
- ... or that AGI development is limited by lack of knowledge about key insights (mostly symbolic/rational) into intelligence, not computation/data. (IE AGI was viewed similarly to proving that NP = P or other very-hard math/computer science/psychology/philosophy problems).
- ... or that AGI will be achieved through brain scanning/connectomics.
- ... or that AGI is impossible.
Nobody (except for LeCun and Schmidhuber) paid much attention to neural networks until AlexNet (2012) showed that they could be run and trained at speed and beat the symbolic competition. In the 2000s, only a real, actual psychic could have told you that LLMs would be a valuable path for AGI research.
Here is a list of various expert (and "expert") perspectives on AGI during the 1990s/2000s (notice how nobody is talking about neural networks, and they are definitely not talking about anything remotely close to an LLM or transformer):
> Copycat is a computer program designed to be able to discover insightful analogies, and to do so in a psychologically realistic way. Copycat's architecture is neither symbolic nor connectionist, nor was it intended to be a hybrid of the two (although some might see it that way); ... [describes a very symbolic system to our modern-day eyes, though it was not really symbolic to 1990s AI researchers]
- Douglas Hofstadter and Melanie Mitchell, Fluid Concepts and Creative Analogies (Chapter 5), 1995
> Interviewer: Are you an advocate of furthering AI research?
> Dennett: I think that it’s been a wonderful field and has a great future, and some of the directions are less interesting to me and less important theoretically, I think, than others. I don’t think it needs a champion. There’s plenty of drive to pursue this research in different ways.
> Dennett (cont): What I don't think is going to happen, and I don't think it's important to try to make happen: I don't think we're going to have really conscious humanoid agents anytime in the foreseeable future. And I think there's not only no good reason to try to make such agents, but there are some pretty good reasons not to try. Now, that might seem to contradict the fact that I work on the Cog project [sic] at MIT, which of course is an attempt to create a humanoid agent, Cog, and to implement the multiple drafts model of consciousness, my model of consciousness, on it.
> Dennett (later): [Cog is intended as a] proof of concept [for AGI]. You want to see what works but then you don’t have to actually do the whole thing.
- Daniel Dennett, Daniel Dennett Investigates Artificial Intelligence, Big Think, 2009
> [Context: Marvin Minsky had a speech where he talked about how expert systems don't work, because they do not have any common sense (and the only solution seems to be to create a giant AGI project (without automatic data gathering)).]
> Only one researcher has committed himself to the colossal task of building a comprehensive common-sense reasoning system, according to Minsky. Douglas Lenat, through his Cyc project, has directed the line-by-line entry of more than 1 million rules into a commonsense knowledge base.
- Mark Baard, AI Founder Blasts Modern Research, Wired, 2003
> Section 1 discusses the conceptual foundations of general intelligence as a discipline, orienting it within the Integrated Causal Model of Tooby and Cosmides; Section 2 constitutes the bulk of the paper and discusses the functional decomposition of general intelligence into a complex supersystem of interdependent internally specialized processes, and structures the description using five successive levels of functional organization: Code, sensory modalities, concepts, thoughts, and deliberation. Section 3 ... [yada yada, this is old, wrong stuff]
- Eliezer Yudkowsky, Levels of Organization in General Intelligence, 2007, Machine Intelligence Research Institute
I could list more examples, but I have spent way too long on this post. What I will say is that Hutter probably had the most correct idea (from the 2000s) of how modern semi-general AI would work. He figured out that compression is an extremely important component of intelligence more than 10 years before everybody was doing LLMs. That is impressive.
I should probably write a blog post about this.
TheRealPomax
Trying to figure out who you're replying to, though; none of this has anything to do with compression algorithms.
concurrentsquar
> Being able to compress well is closely related to intelligence as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 1GB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI.
- Marcus Hutter, http://prize.hutter1.net/
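The "file sizes are hard numbers" point is easy to demonstrate with any off-the-shelf compressor (a toy illustration of mine, nowhere near a PAQ-class model): data the compressor can model, i.e. predict, shrinks dramatically, while unpredictable data barely shrinks at all.

```python
import os
import zlib

# Predictable, English-like text compresses far better than random bytes,
# because the compressor can model ("predict") its structure.
text = b"Wikipedia is an extensive snapshot of human knowledge. " * 200
noise = os.urandom(len(text))

text_ratio = len(zlib.compress(text, 9)) / len(text)
noise_ratio = len(zlib.compress(noise, 9)) / len(noise)
# text_ratio is tiny; noise_ratio is roughly 1 (essentially no savings).
```

A smarter model of the input yields a smaller file, which is exactly the objective scorecard the prize is built on.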
bob1029
> So the ultimate compressor of it should "understand" all human knowledge, i.e. be really smart. enwik9 is a hopefully representative 1GB extract from Wikipedia.
I spotted 2 really compelling examples of "compression is AI" posted recently.
I think the mistake here is to require lossless compression.
Humans and LLMs only do lossy compression. I think lossy compression might be more critical to intelligence: the ability to forget, to change your synapses or weights, is crucial to being able to adapt to change.
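The mechanism raised earlier in the thread, turning a lossy predictor into a lossless codec by storing a correction whenever the prediction misses, can be sketched like this (a toy scheme of my own, not a real arithmetic coder):

```python
# Toy lossless codec built from a lossy predictor: store a correction only
# when the prediction is wrong. A better predictor stores fewer corrections.

def compress(data: bytes) -> list:
    stream, prev = [], 0
    for b in data:
        # Prediction: "the next byte repeats the previous one."
        stream.append(None if b == prev else b)  # None = prediction was right
        prev = b
    return stream

def decompress(stream: list) -> bytes:
    out, prev = bytearray(), 0
    for item in stream:
        prev = prev if item is None else item
        out.append(prev)
    return bytes(out)

data = b"aaaaabbbbbccccc"
assert decompress(compress(data)) == data  # lossless despite a crude predictor
```

The count of stored corrections is the lossless "scorecard": the better the lossy model, the closer the output is to nothing at all.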