Search Google and Yandex for &quot;2020 election fraud.&quot; The results are VERY different. The Zach Vorhies leak shows that Google regularly does blatant censorship for political purposes.[1]
[1] <a href="https:&#x2F;&#x2F;www.breitbart.com&#x2F;tech&#x2F;2021&#x2F;08&#x2F;19&#x2F;google-whistleblower-search-engine-rewrote-algorithms-to-go-after-trump&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.breitbart.com&#x2F;tech&#x2F;2021&#x2F;08&#x2F;19&#x2F;google-whistleblow...</a>

&gt;They are the best search engine by far for politically controversial topics.
This is an interesting take given the political censorship in Russia (for some ineffable reason much harsher now than it was 4 months ago) and cases like <a href="https:&#x2F;&#x2F;twitter.com&#x2F;kevinrothrock&#x2F;status&#x2F;1510944781492531208" rel="nofollow">https:&#x2F;&#x2F;twitter.com&#x2F;kevinrothrock&#x2F;status&#x2F;1510944781492531208</a>.

&gt; This is the trend in the west. Technology is too powerful, we must control it!
I take it that you&#x27;re either too young or too untraveled to be aware of the level of state control of technology in &quot;the east&quot;. Xerographic machines, mimeographs, and other similar reprographic devices used to be highly controlled machinery behind the Iron Curtain. This is absolutely not something exclusive or even peculiar to &quot;the west&quot;.

Actually they&#x27;re not; some of the Yandex products are better and pretty innovative (ignoring the political stuff). Maps and Go are especially good. Ditto with Russian banking apps: they put American bank apps to shame.

&gt;&gt; They are the best search engine by far for politically controversial topics
FYI, Yandex is a Russian entity that follows ALL of Russia&#x27;s censorship laws (and oh boy, are there a lot of them).
&gt;&gt; probably to prevent western competitors from using them
The irony here. All Yandex products are exact copies of western ones, adjusted to the local market.

It&#x27;s widely accepted that OpenAI withholds its models in order to make money from them, not because it really thinks they would be harmful.

They literally have a blocklist of sites that the Kremlin doesn&#x27;t like, and in this respect search acts somewhat like Yandex News. The difference is more that Google filters stuff for the USA and Yandex for Russia.

&gt; Hey, we are the bad guys you&#x27;re talking about so who are we keeping this technology from?
Laughed out loud!

I feel like you could be paid, or coerced by some country...

I love Yandex. They are the best search engine by far for politically controversial topics. They also release a language model to benefit everyone even if it says politically incorrect stuff. They also name their projects &quot;cocaine&quot; probably to prevent western competitors from using them.
You look at OpenAI and how they don&#x27;t release their models mainly because they fear &quot;bad people&quot; will use them for &quot;bad stuff.&quot; This is the trend in the west. Technology is too powerful, we must control it! Russia is like... Hey, we are the bad guys you&#x27;re talking about so who are we keeping this technology from? The west has bigger language models than we do, so who cares. Also their attitude to copyright and patents, etc. They don&#x27;t care because that&#x27;s not how their economy makes money. Cory Doctorow&#x27;s end of general purpose computing[1] and locked down everything is very fast approaching. I&#x27;m glad the Russians are around and aren&#x27;t very interested in that project.
[1] <a href="https:&#x2F;&#x2F;csclub.uwaterloo.ca&#x2F;resources&#x2F;tech-talks&#x2F;cory-doctorow-the-war-on-general-purpose-computing&#x2F;" rel="nofollow">https:&#x2F;&#x2F;csclub.uwaterloo.ca&#x2F;resources&#x2F;tech-talks&#x2F;cory-doctor...</a>

What if Street Sharks were mormon missionaries? How would Emily Dickinson describe Angie Dickinson in a poem? How would Ramses II have used Bitcoin?
THESE are the important things to talk about when it comes to this topic.

This is one of the funniest threads I’ve ever seen on this website. People are yelling at each other about the CIA and the legitimacy of Israel and Assange and the definition of fascism and… anything that pisses anybody off about international politics in general. In a thread about a piece of software that’s (to me and likely many others) prohibitively expensive to play around with.
Anyway I hope somebody creates a playground with this so I can make a computer write a fan fiction about Kirby and Solid Snake trying to raise a human baby on a yacht in the Caspian Sea or whatever other thing people will actually use this for.

I started to work on something similar but I&#x27;m way behind your project. I really believe AI models can help us as humans learn better! Do you have a blog or any other writeups on how you approached these problems?

How does the vocabulary generation work?

There are tons of commercial uses for these models. I&#x27;ve been experimenting with an app targeted toward language learners [1]. We use large language models to:
- Generate vocabulary - e.g. for biking: handlebars, pedals, shifters, etc.
- Generate translation exercises for a given topic a learner wants to learn about - e.g. I raised the seat on my bike
- Generate questions for the user - e.g. What are the different types of biking?
- Provide more fluent ways to say things - I went on my bike to the store -&gt; I rode my bike to the store
- Provide explanations of the difference in meaning between two words
And we have fine-tuned smaller models to do other things like grammar correction, exercise grading, and embedded search.
These models are going to completely change the field of education in my opinion.
[1] <a href="https:&#x2F;&#x2F;squidgies.app" rel="nofollow">https:&#x2F;&#x2F;squidgies.app</a> - be kind it&#x27;s still a bit alpha

Sorry but every part of that sounds so terrible.

I hate this so much. These tools are getting better, so often you realise only halfway through that you are reading AI text. Then you have to flush your brain and make a mental note never to visit that site again.

Content made for machines. Probably a billion dollar industry.

You just know that some Amazon listings are written by GANs.

We&#x27;re using these where I work (a large retail site) to help make filler text for generated articles. Think the summary blurb no one reads at the top. As for why we&#x27;re writing these articles (we have a paid team that writes them too), the answer is SEO. This is probably the only thing I&#x27;ve seen done with a text model in production usage. I&#x27;m not 100% sure what model they&#x27;re using.

&quot;worthless&quot; huh, not everyone can afford inference with a ~500 GB model; depending on the speed&#x2F;rate you need, you might definitely go for a smaller model.
But maybe your sentence was more about &quot;after the BigScience model, open-sourcing anything smaller than that will be useless&quot;, which isn&#x27;t necessarily true either, because there is still room to improve parameter efficiency, i.e. smaller models with comparable performance.

Not necessarily: only ~30% of the dataset is in English, so it likely won&#x27;t be as good as a smaller model trained solely or mostly on English words.
<a href="https:&#x2F;&#x2F;bigscience.huggingface.co&#x2F;blog&#x2F;building-a-tb-scale-multilingual-dataset-for-language-modeling" rel="nofollow">https:&#x2F;&#x2F;bigscience.huggingface.co&#x2F;blog&#x2F;building-a-tb-scale-m...</a>

HuggingFace will soon release their BigScience model: <a href="https:&#x2F;&#x2F;twitter.com&#x2F;BigScienceLLM&#x2F;status&#x2F;1539941348656168961" rel="nofollow">https:&#x2F;&#x2F;twitter.com&#x2F;BigScienceLLM&#x2F;status&#x2F;1539941348656168961</a>
&quot;a 176 billion parameter transformer model that will be trained on roughly 300 billion words in 46 languages&quot;
So anything smaller than that will become worthless. That may be a factor: companies have a last chance to make a PR splash before it happens.
Read more about it: <a href="https:&#x2F;&#x2F;bigscience.huggingface.co&#x2F;blog&#x2F;model-training-launched" rel="nofollow">https:&#x2F;&#x2F;bigscience.huggingface.co&#x2F;blog&#x2F;model-training-launch...</a>

My guess is they&#x27;re mostly vanity projects for large tech companies. While the models have some value, they also serve as interesting research projects and help them attract ML talent to work on more profitable models like ad-targeting.

I agree that the lack of benchmarks makes it hard to determine how valuable this model is. But on the topic of dropout, dropout has been dropped for the pretraining stage of several other large models. Off the top of my head: GPT-J-6B, GPT-NeoX-20B, and T5-1.1&#x2F;LM.

They did not publish benchmarks about quality of the models, which is very suspicious.
I personally squinted hard when they said removing dropout improves training speed (which is in iterations per second), but said nothing about how it affects the performance (rate of mistakes in inference) of the trained model.

An equally plausible frame is that once a technology becomes replicated across several companies, it makes sense to open source it, since the only marginal competitive advantage left is the possible resultant external network effects.
I don&#x27;t know if that&#x27;s the right way to think about the open sourcing of large language models. I just think we really can&#x27;t read too much into such releases regarding their motivation.

Those models aren&#x27;t trained with the objective of being deployed in production.
They are trained to be used as teachers during distillation into smaller models that fit the cost&#x2F;latency requirements for whatever scenario those big companies have. That&#x27;s where the real value is.
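For readers who have not met distillation before, a minimal sketch of the usual soft-target loss may help: the student is trained to match the teacher's temperature-softened output distribution. This is plain Python with made-up logits; the function names and the temperature of 2.0 are illustrative assumptions, not anything these companies have published.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is the conventional correction that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over a 3-token vocabulary.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # identical outputs: zero loss
print(distillation_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # disagreement: positive loss
```

In a real pipeline this term would typically be mixed with the ordinary cross-entropy on ground-truth labels and minimised with an autodiff framework.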

Yandex uses it for search and voice assistant

From what I&#x27;ve seen, using these huge models for inference at any kind of scale is expensive enough that it&#x27;s difficult to find a business case that justifies the compute cost.

&gt; If you, for some reason, have a few hundred GPU lying around
Not to nitpick, but that is like saying that if you have a Lamborghini lying around, a Sunday trip in one is not so expensive.

I can&#x27;t think of anyone having a few hundred GPUs around unless:
- They were into Ethereum mining and quit.
- They&#x27;ve already built a cluster with them (e.g. in an academic setting).
- They live in a datacenter.
- They are a total psychopath.
But even assuming one magically has all those GPUs available and ready to train, I don&#x27;t want to calculate the power cost of it anyway. Unless one has access to free or extremely cheap electricity it would still be very expensive.

Maybe training it is not that expensive?
I know from practice that it takes a really, really long time to train even a small NN (thousands of params), so you&#x27;ll need a lot more hardware to train one with billions... But it&#x27;s expensive to buy the hardware, not necessarily to use it. If you, for some reason, have a few hundred GPU lying around, it might be &quot;cheap&quot; to do the necessary training.
Now, that&#x27;s not your point - cost != price. But, still...

To add a voice of skepticism: the recent rush to open source these models may indicate that the tens of millions spent training these things have a relatively poor ROI. There may be a hope that someone else figures out how to make these commercially useful.

Case in point: recently, I&#x27;ve noticed that I&#x27;m getting more and more emails with the sign off &quot;Warm regards.&quot; This is not a coincidence. It is an autosuggestion from Google. If you start signing off an email, it will automatically suggest &quot;Warm regards.&quot; It just appears there -- probably an idea generated from an AI network. There are more and more of these algorithmic &quot;suggestions&quot; appearing every day, in more and more contexts. This is true for many text messaging programs: There are &quot;common&quot; replies suggested. How often do people just click on one of the suggested replies, as opposed to writing their own? These suggestions push us into conforming to the expectations of the algorithm, which then reinforces those expectations, creating a cycle of further pushing us into the language use patterns generated by software -- as opposed to idiosyncratic language created by a human mind.
In other words, people are already behaving like bots; and we&#x27;re building more and more software to encourage such behavior.

A tangentially related thought:
Actors attempt to imitate humans. “Good acting” is convincing; the audience believes the actor is giving a reasonable response to the portrayed situation.
But the audience is also trying to imitate the actors to some degree. Like you point out, humans imitate. For some subset of the population, I’d imagine the majority of social situations they are exposed to, and the responses to situations they observe, are portrayed by actors.
At what point are actors defining the social responses that they then try to imitate? In other words, at what point does acting beget acting and how much of our daily social interactions actually are driven by actors? And is this world of actors creating artificial social responses substantially different than bots doing the same?

So maybe the Turing Test is not about whether AIs are smart enough, but about how stupid humans become?

It&#x27;s the commonly believed reason: the child was starting to take on habits from Gua, like making noises when she wanted something, and the way monkeys scratch themselves. No authoritative source for it though; it&#x27;s what I was told during a lecture back in college, and I think PlainlyDifficult mentions it too in their video about it.
<a href="https:&#x2F;&#x2F;youtu.be&#x2F;VP8DD9TGNlU" rel="nofollow">https:&#x2F;&#x2F;youtu.be&#x2F;VP8DD9TGNlU</a>

Nice post! But to me your analogy does not really hold: bots are the ones catching up with human conversation in an &quot;accelerated way&quot;, feeding on a corpus that predates them. Bots are not an invariant nature that netizens imitate.

I sincerely regret that I had only one upvote to give you. This shit is so insidious that IMO everyone should just simply stop doing it until they&#x27;ve thought it through a lot more.
&gt; ...humans condition themselves in a far more accelerated way to behave like bots than bots are potentially able to do.
Than bots can condition themselves to behave like humans, I presume. They can already behave exactly like bots. :-)

Wow this is such a mind bending perspective. Thanks for sharing it.

The bots&#x2F;machine vs human reminds me of that famous experiment from the 30s in which Winthrop Kellogg[0], a comparative psychologist, and his wife decided to raise their human baby (Donald) simultaneously with a chimpanzee baby (Gua) in an effort to &quot;humanize the ape&quot;. It was set to last 5 years but was cut short after only 9 months. The explicit reason wasn&#x27;t stated, only that it had successfully proved the hereditary limits of a chimpanzee within the &quot;nature vs nurture&quot; debate. The reticent statement reads as follows:
&gt;Gua, treated as a human child, behaved like a human child except when the structure of her body and brain prevented her. This being shown, the experiment was discontinued
There has been a lot of speculation as to other reasons for ending the experiment so prematurely. Maybe exhaustion. One thing which seemed to dawn on the parents - if one reads carefully - is that a human baby is far superior at imitating than the chimpanzee baby, frighteningly so, so they decided to abort the experiment early in order to prevent any irreversible damage to the development of their human child, who at that point had become far more similar to the chimpanzee than the chimpanzee to the human.
So, I would rephrase &quot;the internet is dead&quot; into &quot;the internet becomes increasingly undead&quot; because humans condition themselves in a far more accelerated way to behave like bots than bots are potentially able to do.
From the wrong side this could be seen as progress when in fact it&#x27;s the opposite of progress. It sure feels that way for a lot of people and is a crucial reciprocal element often overlooked&#x2F;underplayed (mostly in a benign effort to reduce unnecessary complexities) when analyzing human behaviour in interactions with the environment.
[0] <a href="https:&#x2F;&#x2F;en.m.wikipedia.org&#x2F;wiki&#x2F;Winthrop_Kellogg#The_Ape_and_The_Child" rel="nofollow">https:&#x2F;&#x2F;en.m.wikipedia.org&#x2F;wiki&#x2F;Winthrop_Kellogg#The_Ape_and...</a>

I think there will be a trend where model sizes shrink due to better optimization &#x2F; compression while hardware specs keep increasing.
You can already see this with Chinchilla:
<a href="https:&#x2F;&#x2F;towardsdatascience.com&#x2F;a-new-ai-trend-chinchilla-70b-greatly-outperforms-gpt-3-175b-and-gopher-280b-408b9b4510" rel="nofollow">https:&#x2F;&#x2F;towardsdatascience.com&#x2F;a-new-ai-trend-chinchilla-70b...</a>

&gt; I could watch a movie made for me
We&#x27;re a long, long way from this. Stringing words&#x2F;images together into a coherent sequence is arguably the easy bit of creating novels&#x2F;films, and computers still lag a long way behind humans in this regard.
Structuring a narrative is a harder, subtler step. Our most advanced ML solutions are improving rapidly, but often struggle with coherence over a single paragraph; they&#x27;re not going to be doing satisfying foreshadowing and emotional beats for a while.

I get the feeling that creative sci-fi used to kind of help inoculate us against these kinds of futures, but it seems like there&#x27;s much less of it than there used to be.
&quot;Black Mirror&quot; was good but it&#x27;s not nearly enough.

You really don&#x27;t want to live in Mindwarp (1992 Bruce Campbell movie) or in this 114-year-old (!) short story <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;The_Machine_Stops" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;The_Machine_Stops</a>

That&#x27;s definitely the future, personalized entertainment and social interactions will be big. I could watch a movie made for me, and discuss it with a bunch of chat bots. The future will be bubbly as hell, people will be decaying in their safe places as the hellscape rages on outside.

...what? 60 thousand dollars for a dedicated computer that you can&#x27;t use is not &quot;everyone&quot; and not &quot;on their own computers&quot;, and it is also a crazy large amount of money for nearly everyone. Sure, there are some who could, but that&#x27;s not what I said.

You could just run this on a desktop CPU; there&#x27;s nothing stopping you in principle, you just need enough RAM. A big-memory (256GB) machine is definitely doable at home. It&#x27;s going to cost $1-2k on the DIMMs alone, less if you use 8x32GB, but that&#x27;ll come down. You could definitely do it for less than $5k all in.
Inference latency is a lot higher in relative terms, but even for things like image processing, running a CNN on a CPU isn&#x27;t particularly bad if you&#x27;re experimenting, or even for low-load production work.
But for really transient loads you&#x27;re better off just renting seconds to minutes on a VM.

Nitpick: This uses 8x A100 which are at least $10k a piece to my knowledge. Add in the computer and you&#x27;re closer to $100k.

You&#x27;re grossly overestimating. People who make 60k annually are getting a bit rarer nowadays, it&#x27;s not like everyone can afford it. For the majority of people it&#x27;d be a multi-decade project, for a few it might only take 7 years, very few people could buy it all at once.

What kind of computer would they be?
Can you spec it out roughly?

&gt; I have to wonder if 10 years down the line, everyone will be able to run models like this on their own computers.
Isn’t that already the case? Sure, it costs $60K, but that is accessible to a surprisingly large minority, considering the potency of this software.

In what sense is the dichotomy between CPU and GPU contrived? Those are designed around fundamentally different use cases. For low power devices you can get CPU and GPU integrated into a single SoC.

You&#x27;re an optimist.
Before any of the things you describe happen, most states will mandate the equivalent of a carry permit to be able to freely use compute for undeclared and&#x2F;or unapproved purposes.

Unpopular opinion: something will stop egalitarian power for the masses. I had high hopes for multicore computing in the late 90s and early 2000s but it got blocked every step of the way by everyone doubling down on DSP (glorified vertex buffer) approaches on video cards, leaving us with the contrived dichotomy we see today between CPU and GPU.
Whatever we think will happen will not happen. A less-inspired known-good state will take its place, creating another status quo. Which will funnel us into dystopian futures. I&#x27;m just going off my own observations and life experience of the last 20 years, and the way that people in leadership positions keep letting the rest of us down after they make it.

If by running models you mean just the inference phase, then even today you can run a large family of ML models on commodity hardware (with some elbow grease, of course). The training phase is generally the one not easily replicated by non-corporations.

Historically, hard-to-falsify documents are an anomaly, the norm was mostly socially conditional and enforced trust. Civilizations leaned and still lean on limited-trust technologies like personal connections, word of mouth, word on paper, signatures, seals, careful custody etc. I agree losing cheap trust can be a setback, just want to point out we’re adaptable.

I know it&#x27;s a sort of exaggerated paranoid thought. But these things do all come down to scale, and some areas of the world definitely could have the amount of compute available to make DALL-E-level quality full-scale videos, which we might be consuming right now. It really does make you start to wonder at what point we will rationally have to assume that anything we watch online might be fabricated.

The upcoming Mac Pro will have pretty poor ML performance when compared to even an old Nvidia GPU, sadly.

I&#x27;m predicting that the upcoming Mac Pro will be very popular among ML developers, thanks to unified memory.
It should be able to fit the entire model in memory.
Combine that with the fact that PyTorch recently added support for Apple silicon GPUs, and it looks attractive.
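A small sketch of how one might pick the Apple silicon backend in PyTorch when it is present. It assumes PyTorch 1.12 or later for the `mps` backend; the helper name `pick_device` is made up for illustration, and the code falls back to "cpu" if PyTorch is not installed at all.

```python
def pick_device():
    """Return "mps", "cuda", or "cpu" depending on what this machine offers."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed, so there is nothing to accelerate
    # torch.backends.mps is missing in older PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple silicon GPU via Metal Performance Shaders
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(pick_device())
```

The returned string can be passed to `torch.device(...)` or `tensor.to(...)`. Whether a model of this size would run at a useful speed on that backend is a separate question that this snippet does not answer.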

I have to wonder if 10 years down the line, everyone will be able to run models like this on their own computers. Have to wonder what the knock-on effects of that will be, especially if the models improve drastically. With so much of our social lives being moved online, if we have the easy ability to create fake lives of fake people one has to wonder what&#x27;s real and what isn&#x27;t.
Maybe the dead internet theory will really come true; at least, in some sense of it. <a href="https:&#x2F;&#x2F;www.theatlantic.com&#x2F;technology&#x2F;archive&#x2F;2021&#x2F;08&#x2F;dead-internet-theory-wrong-but-feels-true&#x2F;619937&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.theatlantic.com&#x2F;technology&#x2F;archive&#x2F;2021&#x2F;08&#x2F;dead-...</a>

I think it is a stupid question, but does the power consumption needed by processors to infer, compared to human brains, demonstrate that there is something fundamentally wrong with the AI approach, or is it more physics related?
I am not a physicist or biologist or anything like that so my intuition is probably completely wrong, but it seems to me that for more basic operations (let&#x27;s say adding two numbers) the power consumption of a processor and a brain is not that different. Seeing how expensive it is for computers to infer with any NLP model, it&#x27;s as if humans should be continuously eating carbs just to talk.

I bought a 1500 W PSU soon after the previous crypto collapse for around $150, one of the best purchases I ever made.

The RAM is not using all that much of the power, and I think that scales more on bus width than capacity.

It&#x27;s also a power issue. The 4090 sounds like you&#x27;re going to need a much, MUCH higher-rated PSU than you currently use, or it&#x27;ll suddenly turn off as it uses 2-3x the power.
You&#x27;ll need your own wiring to run your PC soon :-)

What NVIDIA predominantly does on their consumer cards is limit the RAM sharing, not the RAM itself. The inability for each GPU to share RAM is the limiting factor. It is why I have RTX A5000 GPUs and not RTX 3090 GPUs.

Nvidia deliberately keeps their consumer&#x2F;gamer cards limited in memory. If you have a use for more RAM, they want you to buy their workstation offerings like the RTX A6000, which has 48 GB of GDDR6 RAM, or the A100, which has 80 GB.

200+ GiB of RAM still sounds like a pretty steep hardware requirement.

If you don&#x27;t care about inference speed being in the 1-5sec range, then that should be doable with CPU offloading, with e.g. DeepSpeed.

For the people that didn&#x27;t click on the link:
&gt;but is able to work with different configurations with ≈200GB of GPU memory in total which divide weight dimensions correctly (e.g. 16, 64, 128).
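A back-of-the-envelope sketch of where a figure like ≈200GB can come from. It assumes 100 billion parameters (taken from the model name) stored at 2 bytes each (fp16), which is my assumption rather than something stated in the quote; activations and framework overhead are ignored.

```python
def weight_gigabytes(n_params, bytes_per_param=2):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

total_gb = weight_gigabytes(100e9)  # assumed: 100B parameters in fp16
print(total_gb)  # 200.0

# Even split of the weights across GPUs (ignores activations and overhead).
for n_gpus in (8, 16):
    print(n_gpus, "GPUs:", total_gb / n_gpus, "GB each")
```

With 8 GPUs that is 25 GB of weights per card, which is why cards with 40 GB or 80 GB each come up elsewhere in the thread while a single 24 GB consumer card does not fit the bill.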

What&#x27;s the difference between Apple&#x27;s unified memory and the shared memory pool Intel and AMD integrated GPUs have had for years?
In theory you could probably assign a powerful enough iGPU a few hundred gigabytes of memory already, but just like Apple Silicon the integrated GPU isn&#x27;t exactly very powerful. The difference between the M1 iGPU and the AMD 5700G is less than 10%, and a fully loaded system should theoretically be tweakable to dedicate hundreds of gigabytes of VRAM to it.
It&#x27;s just a waste of space. An RTX 3090 is 6 to 7 times faster than even the M1, and the promised performance increase of about 35% for the M2 will mean nothing once the 4090 is released this year.
I think there are better solutions for this. The high throughput of PCIe 5 and resizable BAR support might be leveraged to quickly swap out banks of GPU memory, for example, at a performance decrease.
One big problem with this is that GPU manufacturers have an incentive not to implement ways for consumer GPUs to compete with their datacenter products. If a 3080 with some memory tricks can approach an A800 well enough, Nvidia might let a lot of profit slip through their hands and they can&#x27;t have that.
Maybe Apple&#x27;s tensor chip will be able to provide a performance boost here, but it&#x27;s stuck on working with macOS and the implementations all seem proprietary, so I don&#x27;t think cross-platform researchers will really care about using it. You&#x27;re restricted by Apple&#x27;s memory limitations anyway; it&#x27;s not like you can upgrade their hardware.

Unified memory is and always has been a cost-cutting tactic. It&#x27;s not a feature, no matter how much manufacturers who use it try to claim it is.

Apple is selling M1s with &gt; 200 GB of RAM? Have a link so I can buy one?

Take a look at Apple&#x27;s M1 Max, a lot of fast unified memory. No idea how useful though

Perhaps on quantity. Substantially slower though, around ~3x from what I can tell… a substantial roadblock if you’re training models that take weeks.

Wondering if Apple Silicon will bring large amounts of unified main memory with high bandwidth to the masses?
The Mac Studio maxes out at 128GB currently for around $5K, so 256GB isn&#x27;t that far out and might work with the ~200GB Yandex says is required.

Can Apple Silicon&#x27;s unified memory be an answer?

Seeing those gigantic models it makes me sad that even the 4090 is supposed to stay at 24GB of RAM max. I really would like to be able to run&#x2F;experiment on larger models at home.

I downloaded the weights and made a .torrent file (also a magnet link, see raw README.md). Can somebody else who downloaded the files as well double-check the checksums?
<a href="https:&#x2F;&#x2F;github.com&#x2F;lostmsu&#x2F;YaLM-100B&#x2F;tree&#x2F;Torrent" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;lostmsu&#x2F;YaLM-100B&#x2F;tree&#x2F;Torrent</a>

It&#x27;s not really random access. I bet the graph can be pipelined such that you can keep a &quot;horizontal cross-section&quot; of the graph in memory all the time, and you scan through the parameters from top to bottom in the graph.
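A toy sketch of that idea: walk the layers in order and hold only one layer's parameters in memory at a time. Everything here is hypothetical for illustration. Each "layer" is just a scalar weight and bias, and the loader callables stand in for reading one layer's weights from disk; this is not how any particular inference engine does it.

```python
def run_streamed(x, layer_loaders):
    """Apply layers in order, keeping only the current layer's weights resident."""
    for load in layer_loaders:
        weight, bias = load()  # stand-in for reading one layer from disk
        x = [weight * v + bias for v in x]
        # weight and bias are rebound on the next iteration, so the previous
        # layer's parameters can be freed before the next ones are used
    return x

# Toy "checkpoint" of three layers; a real one would be files on SSD.
layers_on_disk = [(2.0, 1.0), (0.5, 0.0), (3.0, -1.0)]
loaders = [lambda params=params: params for params in layers_on_disk]

print(run_streamed([1.0, 2.0], loaders))  # [3.5, 6.5]
```

Because the access pattern is a sequential scan from the first layer to the last, the next layer can be prefetched while the current one computes, which is what makes this pipelining-friendly rather than random access.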

These models are not character-based, but token-based. The problem with CPU inference is the need for random access to 250 GiB of parameters, meaning immense paging and orders of magnitude slower than normal CPU operation.
I wonder how badly it would perform with something like Optane?

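<pre><code> # Run as root: reserve a 250 GiB file to back the swap space
 $ dd if=&#x2F;dev&#x2F;zero of=&#x2F;swapfile bs=1G count=250 status=progress
 # Swap files must not be readable by other users
 $ chmod 600 &#x2F;swapfile
 # Format the file as swap (-U clear leaves the UUID unset), then enable it
 $ mkswap -U clear &#x2F;swapfile
 $ swapon &#x2F;swapfile</code></pre>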

Way too slow on CPU unfortunately.
But this does make me wonder if there&#x27;s any way to allow a graphics card to use regular RAM in a fast way? AFAIK built-in GPUs inside CPUs can, but those GPUs are not powerful enough.

What about using 250 GB of RAM and a CPU?

For those of us without 200GB of GPU RAM available... How possible is it to do inference loading it from SSD?
Would you have to scan through all 200GB of data once per character generated? That doesn&#x27;t actually sound too painful - 1 minute per character seems kinda okay.
And I guess you can easily do lots of data parallelism, so you can get 1 minute per character on lots of inputs and outputs at the same time.
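The "1 minute" guess can be checked with simple arithmetic, assuming every generated token requires one full sequential read of the weights. The 3.5 GB/s figure below is an assumed NVMe sequential read speed, not a measurement, and the batch size is just an example.

```python
def seconds_per_token(model_bytes, read_bytes_per_second):
    """Time for one full pass over the weights, assuming the SSD read is the bottleneck."""
    return model_bytes / read_bytes_per_second

def tokens_per_hour(model_bytes, read_bytes_per_second, batch_size=1):
    """One pass over the weights serves every sequence in the batch."""
    return batch_size * 3600 / seconds_per_token(model_bytes, read_bytes_per_second)

MODEL_BYTES = 200e9  # the ~200GB from the question
SSD_BYTES_PER_S = 3.5e9  # assumed sequential read speed of an NVMe drive

print(round(seconds_per_token(MODEL_BYTES, SSD_BYTES_PER_S), 1))  # 57.1
print(round(tokens_per_hour(MODEL_BYTES, SSD_BYTES_PER_S, batch_size=32)))  # 2016
```

So the estimate is in the right ballpark under these assumptions, and batching many prompts per pass is what makes the throughput tolerable even though latency per token stays at about a minute.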

Still a bit too expensive for my side project ;) To be honest it seems only big corps can do that kind of stuff. By the way, if you try to do hyperparameter tuning or some exploration of the architecture it becomes, I guess, 10x or 100x more expensive.

Lambda labs will rent you an 8xA100 instance for 3 months for $21,900. That would put it at around $2m
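The arithmetic behind the "around $2m" figure, as a quick sketch. It assumes the 800 A100s mentioned in the parent comment and the quoted price of $21,900 per 8-GPU instance for a 3-month term; the function name is made up for illustration.

```python
import math

def rental_cost(total_gpus, gpus_per_instance, price_per_instance):
    """Cost of renting enough whole instances to cover total_gpus for one term."""
    instances = math.ceil(total_gpus / gpus_per_instance)
    return instances * price_per_instance

print(rental_cost(800, 8, 21_900))  # 2190000
```

Note that the quoted price covers 3 months while the training run discussed here took 61 days, so this is an upper bound for that term rather than a prorated figure.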

<a href="https:&#x2F;&#x2F;coreweave.com&#x2F;" rel="nofollow">https:&#x2F;&#x2F;coreweave.com&#x2F;</a> offers some of the cheapest GPU compute out there

It&#x27;s just crazy how much it costs to train such models. As I understand it, 800 A100 cards would cost about 25,000,000, without considering the energy costs for 61 days of training.
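For a rough sense of the energy side, here is a sketch under loudly stated assumptions: 400 W per GPU (roughly the rated maximum of an SXM A100, and GPUs only, so no CPUs, networking, or cooling) and an electricity price of 0.10 per kWh. Both numbers are placeholders to be replaced with real ones.

```python
def training_energy_kwh(num_gpus, watts_per_gpu, days):
    """Energy drawn by the GPUs alone, running flat out for the whole period."""
    return num_gpus * watts_per_gpu / 1000 * days * 24

def energy_cost(kwh, price_per_kwh):
    return kwh * price_per_kwh

kwh = training_energy_kwh(num_gpus=800, watts_per_gpu=400, days=61)
print(kwh)  # 468480.0
print(round(energy_cost(kwh, price_per_kwh=0.10)))  # 46848
```

Even if the real draw were several times higher once the rest of the datacenter is counted, the electricity bill under these assumptions stays far below the hardware price quoted above.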

Now we just need someone to figure out how to compress the model to get similar performance in 10B parameters.
I assume some of the services that offer GPT-J APIs will pick this up, but it doesn&#x27;t look cheap or easy to get this running.

IMO the main reason these companies don&#x27;t release their models is not ethical concerns but money:
- NVIDIA sells GPUs and interconnect needed for training large models. Releasing a pretrained LM would hurt sales, while only publishing a teaser paper boosts them.
- Google, Microsoft, and Amazon offer ML-as-a-service and TPU&#x2F;GPU hardware as a part of their cloud computing platforms. Russian and Chinese companies also have their clouds, but they have low global market share and aren&#x27;t cost-efficient, so nobody would use them to train large LMs anyway.
- OpenAI are selling their models as an API with a huge markup over inference costs; they are also largely sponsored by the aforementioned companies, further aligning their interests with them.
Companies that release large models are simply those who have nothing to lose by doing so. Unfortunately, you need a lot of idle hardware to train them, and companies that have it tend to also launch a public cloud with it, so there is a perpetual conflict of interests here.

OpenAI should just rebrand since nothing they do is actually open.

The “ethical concerns” thing is just a progressive-sounding excuse for why they’re not going to give their models away for free. I guarantee you those models are going to be integrated into various Google products in some form or another.

This reminded me of a shitpost comparing Google and Yandex.
<a href="https:&#x2F;&#x2F;desuarchive.org&#x2F;g&#x2F;thread&#x2F;78144754&#x2F;#78145600" rel="nofollow">https:&#x2F;&#x2F;desuarchive.org&#x2F;g&#x2F;thread&#x2F;78144754&#x2F;#78145600</a>

Yandex Image Search today is what Google Image Search should have been. At the end of the day, I’ll use what actually gets the job done. The same goes for OpenAI and Google AI. If you never actually release your stuff and let others use it, and end up paralyzed by fear of what your models may do, then someone else is gonna release the same tech, and at this rate it seems like that’ll be Chinese or Russian companies who don’t share your sensibilities at all, and their models will be the ones that end up productized.

I regularly use it for a sample of what Google and Bing are intentionally omitting.

I agree with this. When I was still addicted to porn, Yandex Image was the only one that seemed to find relevant and useful links.

A 500-day-old product in beta? I hope they do well.

FWIW, <a href="https:&#x2F;&#x2F;same.energy&#x2F;" rel="nofollow">https:&#x2F;&#x2F;same.energy&#x2F;</a> seems to work fine for me

You misunderstood the parent post. It&#x27;s about Google not being sued for discrimination. <a href="https:&#x2F;&#x2F;www.washingtonpost.com&#x2F;news&#x2F;the-intersect&#x2F;wp&#x2F;2016&#x2F;08&#x2F;10&#x2F;study-image-results-for-the-google-search-ugly-woman-are-disproportionately-black&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.washingtonpost.com&#x2F;news&#x2F;the-intersect&#x2F;wp&#x2F;2016&#x2F;08...</a> <a href="https:&#x2F;&#x2F;www.theguardian.com&#x2F;technology&#x2F;2016&#x2F;apr&#x2F;08&#x2F;does-google-unprofessional-hair-results-prove-algorithms-racist-" rel="nofollow">https:&#x2F;&#x2F;www.theguardian.com&#x2F;technology&#x2F;2016&#x2F;apr&#x2F;08&#x2F;does-goog...</a> <a href="https:&#x2F;&#x2F;www.bloomberg.com&#x2F;news&#x2F;articles&#x2F;2021-10-19&#x2F;google-quietly-tweaks-image-search-for-racially-diverse-results" rel="nofollow">https:&#x2F;&#x2F;www.bloomberg.com&#x2F;news&#x2F;articles&#x2F;2021-10-19&#x2F;google-qu...</a> <a href="https:&#x2F;&#x2F;theconversation.com&#x2F;googles-algorithms-discriminate-against-women-and-people-of-colour-112516" rel="nofollow">https:&#x2F;&#x2F;theconversation.com&#x2F;googles-algorithms-discriminate-...</a>

IIRC it was mostly from groups like Getty Images. They and other image licensing companies didn&#x27;t want Google showing their images in search results. They claimed it was copyright infringement, and given the absolute state of IP law in the US they could have made Google&#x27;s life very difficult.

Maybe the view image link removal in 2018. <a href="https:&#x2F;&#x2F;www.theverge.com&#x2F;2018&#x2F;2&#x2F;15&#x2F;17017864&#x2F;google-removes-view-image-button-from-search-results" rel="nofollow">https:&#x2F;&#x2F;www.theverge.com&#x2F;2018&#x2F;2&#x2F;15&#x2F;17017864&#x2F;google-removes-v...</a>

&gt; Google overlords neutered their own product out of fear over lawyers&#x2F;regulation
What kind of lawyers&#x2F;regulation do you have in mind? If anything, I&#x27;d find the opposite: lawyers and copyright holders should be grateful for such a tool that - when it was still working - allowed you to trace websites using your images illegally. Now they all use Yandex for this purpose, with relatively good results.

Side note: Yandex search is awesome, and I really hope they stay alive forever. It&#x27;s the only functional image search nowadays, after our Google overlords neutered their own product out of fear over lawyers&#x2F;regulation and a disdain for power users. You can&#x27;t even search for images &quot;before:date&quot; in Google anymore.

You already have access to thousands of machines from your home computer. Naval Ravikant put it best here: <a href="https:&#x2F;&#x2F;twitter.com&#x2F;naval&#x2F;status&#x2F;1002106977273565184" rel="nofollow">https:&#x2F;&#x2F;twitter.com&#x2F;naval&#x2F;status&#x2F;1002106977273565184</a>

If you live in a datacenter, it already is!

I would recommend auditing these Stanford courses in the following order:
1. CS231n Machine Vision <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;playlist?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;playlist?list=PLkt2uSq6rBVctENoVBg1T...</a>
2. CS234 Reinforcement Learning <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=FgzM3zpZ55o&amp;list=PLoROMvodv4rOSOPzutgyCTapiGlY2Nd8u" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=FgzM3zpZ55o&amp;list=PLoROMvodv4...</a>
3. CS330 Meta Learning <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=0rZtSwNOTQo&amp;list=PLoROMvodv4rMC6zfYmnD7UG3LVvwaITY5" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=0rZtSwNOTQo&amp;list=PLoROMvodv4...</a>
Those will get you on track with general concepts of reasoning, AI engineering, and learning itself.
Language models are a bit of a headache for me because they sit in a different domain, at the intersection with linguistics and the humanities, but here&#x27;s a good course: <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=rha64cQRLs8&amp;list=PLoROMvodv4rPt5D0zs3YhbWSZA8Q_DyiJ" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=rha64cQRLs8&amp;list=PLoROMvodv4...</a>
Those are all free and high-quality but require a lot of brain power.

Speaking of which... I built a gaming PC a few years ago but I never use it these days. I want to install Linux on it and start playing around with machine learning. Can anyone recommend any open source machine learning project that would be a good starting point? I want one that does something interesting (whether using text, images, whatever), but simple&#x2F;efficient enough to run on a gaming PC and see some kind of results in hours, not months. I&#x27;m not sure what I want to do with ML yet, I just know I&#x27;m interested, and getting something up and running is likely to enthuse me to start playing and researching further. My spec is: GeForce RTX 2080 Ti (11GB), a 24-core AMD Ryzen Threadripper, and 128GB RAM. I&#x27;d be willing to spend on a new graphics card if it would make all the difference. I am a competent coder and familiar with Python but my experience with ML is limited to fawning over things on HN. Any recommendations gratefully received!

Or maybe the AI will own big companies that build bigger models for it. &#x2F;s

When it will be possible to run this at home, the big companies will have models way bigger than this...

Well, it used to be impossible to render on anything not a mainframe in a reasonable time. The day will come when we will be able to.

The hardware they mention can be rented from cloud providers. It’s just that it’s not very cheap.

I was about to comment exactly the same thing. Stuff like this makes me feel so much behind because there&#x27;s no way I can run this lol.

Disk makes no sense considering RAM is pretty cheap. But even then RAM is way too slow (and the communication overhead way too high). You probably get like a 100x slowdown or more.

If your disk has enough space to store the model, I think in theory you could run them, using the disk to store states. But it will be slow. I&#x27;m not sure how slow though, and also if anyone has implemented this. It actually should not be too difficult.
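To make the idea concrete, here is a toy sketch of streaming weights from disk during a forward pass. It is not how any real framework does it: the file layout, the function names `save_layers` and `run_from_disk`, and the tiny 2x2 "layers" are all made up for illustration, and a real model has far more than plain linear layers. The point is only that memory use is bounded by one row of weights at a time, at the cost of re-reading the file on every pass.

```python
import array
import os
import tempfile


def save_layers(path, layers):
    """Write each layer (a list of rows of floats) to disk back to back."""
    with open(path, "wb") as f:
        for rows in layers:
            for row in rows:
                array.array("f", row).tofile(f)


def run_from_disk(path, shapes, x):
    """Apply linear layers in order, reading one row of weights at a time."""
    with open(path, "rb") as f:
        for n_out, n_in in shapes:
            y = []
            for _ in range(n_out):
                row = array.array("f")
                row.fromfile(f, n_in)  # only this row is held in memory
                y.append(sum(w * v for w, v in zip(row, x)))
            x = y  # the output of this layer feeds the next one
    return x


# Hypothetical weights, chosen to be exactly representable as float32.
layers = [[[1.0, 2.0], [0.0, 1.0]], [[0.5, 0.0], [1.0, 1.0]]]
shapes = [(2, 2), (2, 2)]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "weights.bin")
    save_layers(path, layers)
    result = run_from_disk(path, shapes, [1.0, 1.0])

print(result)  # [1.5, 4.0]
```

Every generated token would need a full pass over the weights file, which is where the slowness comes from.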

Couldn’t the same thing be said about most things we do on our phones these days? Won’t incremental advancement cover this eventually? (i.e. no major breakthrough required, just patience).

I think that is unlikely, barring some breakthrough that takes us beyond the limits of silicon.

I hope one day it will be possible to run this kind of model at home.

With DeepSpeed, an SSD, and lots of RAM you should be able to run inference with an 8GB card. There’s a thread somewhere on <a href="https:&#x2F;&#x2F;discuss.huggingface.co&#x2F;" rel="nofollow">https:&#x2F;&#x2F;discuss.huggingface.co&#x2F;</a> doing the math around this.

So, err, the cheapest A100 I could find was EUR 10.579,79. Suddenly that 3090 I wanted to get does not seem so expensive...

&gt; It was tested on 4 (A100 80g) and 8 (V100 32g) GPUs, but is able to work with different configurations with ≈200GB of GPU memory in total which divide weight dimensions correctly (e.g. 16, 64, 128).
So we&#x27;re looking at crazy prices just for inference. RIP to the cloud billing account of the first guy who makes this public.
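A quick sanity check of the quoted requirements, as a sketch. The ≈200GB total and the two tested configurations come from the quote; the even split across cards ignores activations and any framework overhead, the reading of "divide weight dimensions correctly" as "GPU count divides the dimension evenly" is my interpretation, and the `HIDDEN` value is a hypothetical dimension picked only to show the divisibility check.

```python
TOTAL_GB = 200  # approximate total GPU memory from the quote


def per_gpu_gb(total_gb, n_gpus):
    """Memory each card must hold if the weights are split evenly."""
    return total_gb / n_gpus


def fits(total_gb, n_gpus, gpu_gb):
    """True if an even split fits on cards of the given size."""
    return per_gpu_gb(total_gb, n_gpus) <= gpu_gb


def splits_evenly(dim, n_gpus):
    """True if a weight dimension can be divided across n_gpus cards."""
    return dim % n_gpus == 0


print(per_gpu_gb(TOTAL_GB, 4), fits(TOTAL_GB, 4, 80))  # 50.0 True
print(per_gpu_gb(TOTAL_GB, 8), fits(TOTAL_GB, 8, 32))  # 25.0 True
print(fits(TOTAL_GB, 8, 24))                           # False

HIDDEN = 10240  # HYPOTHETICAL weight dimension, not from the source
print(splits_evenly(HIDDEN, 16), splits_evenly(HIDDEN, 12))  # True False
```

So eight 24GB consumer cards would fall just short on memory alone, before counting any overhead.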