Which GPU(s) to get for deep learning - Hacker News

ipsum2

The graphs ranking GPUs may not be accurate, as they don't represent real-world results and have factual inaccuracies. For example:

> Shown is raw relative performance of GPUs. For example, an RTX 4090 has about 0.33x performance of a H100 SMX for 8-bit inference. In other words, a H100 SMX is three times faster for 8-bit inference compared to a RTX 4090.

RTX 4090 GPUs are not able to use 8-bit inference (fp8 cores) because NVIDIA has not (yet) made the capability available via CUDA.

> 8-bit Inference and training are much more effective on Ada/Hopper GPUs because of Tensor Memory Accelerator (TMA) which saves a lot of registers

Ada does not have TMA, only Hopper does.

If people are interested I can run my own benchmarks on the latest 4090 and compare it to previous generations.

varunkmohan

Would be curious to see your benchmarks. Btw, Nvidia will be providing support for fp8 in a future release of CUDA - https://github.com/NVIDIA/TransformerEngine/issues/15

I think TMA may not matter as much for consumer cards given the disproportionate amount of fp32 / int32 compute that they have.

Would be interesting to see how close to theoretical folks are able to get once CUDA support comes through.
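For reference, here is a rough sketch of what FP8 training looks like through the Transformer Engine library linked above. This is a hedged illustration only: the recipe arguments and layer sizes are placeholders, and Hopper-class hardware plus a Transformer Engine install are assumed.

```python
# Rough sketch, not a benchmark: FP8 matmuls via NVIDIA Transformer Engine.
# Hopper-class hardware is assumed; arguments are illustrative.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

layer = te.Linear(768, 768, bias=True).cuda()   # TE drop-in replacement for nn.Linear
optim = torch.optim.AdamW(layer.parameters(), lr=1e-4)

x = torch.randn(16, 768, device="cuda")
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)                              # matmul runs in FP8 with delayed scaling
loss = out.float().pow(2).mean()
loss.backward()                                 # backward outside the autocast context
optim.step()
```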

touisteur

Well, ever since I read those papers doing FFTs faster than cuFFT using tensor cores (even though FFT is supposed to be limited by memory bandwidth rather than FLOPS), and getting fp32-level accuracy on convolutions with 3x tf32 tensor-core sweeps (available in CUTLASS), I'm quite ready to believe some of the hype about TMA.

Anything improving memory bandwidth for any compute part of the GPU is welcome.

Also I'd like for someone to crack open RT cores and get the ray-triangle intersection acceleration out of OptiX. Have you seen the FLOPS on these things?

touisteur

Before I forget, here's the link to the tcFFT paper https://ar5iv.labs.arxiv.org/html/2104.11471v1

And the fp32-gemm-with-tf32 https://arxiv.org/abs/2203.03341

lostmsu

Can you please run bench from https://github.com/karpathy/nanoGPT ?

onnodigcomplex

I'm not on my desktop, but I used nanoGPT extensively on my RTX 4090. I trained a GPT-2-small variant with a small context window (125M → 90M params) at a batch size of 1536 using gradient accumulation (3*512). This runs at just a smidge over 1 it/s. Some notes:

- Gradient checkpointing with a 2048 batch size in a single go allows a ~10% performance improvement on a per-sample basis.

- torch.compile doesn't work for me yet (the lowest CUDA version I got my 4090 to run on was 11.8, but the highest CUDA version on which I got the model to compile was 11.7).

- I applied the optimizations from https://arxiv.org/abs/2212.14034
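For anyone trying to reproduce numbers like these, here is a minimal sketch of the kind of loop described above (bf16 autocast plus gradient accumulation). The names model, optimizer, and train_loader are assumed to exist already, and the forward returning (logits, loss) follows nanoGPT's convention; this is not the exact nanoGPT code.

```python
# Minimal sketch, assuming model/optimizer/train_loader already exist.
# 3 micro-batches of 512 accumulate into an effective batch of 1536.
import torch

accum_steps = 3
optimizer.zero_grad(set_to_none=True)
for step, (x, y) in enumerate(train_loader):
    x, y = x.cuda(), y.cuda()
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        logits, loss = model(x, y)              # nanoGPT-style forward returns the loss
    (loss / accum_steps).backward()             # accumulate gradients across micro-batches
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```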

lostmsu

Can you share the code and numbers so that I could compare directly with my 3090?

Do you train in fp16/bf16?

Have you tried fp8?

p1esk

When people say “8 bit inference” they mean INT8, not FP8.

ipsum2

It's pretty clear from the article that 8-bit inference is referring to FP8, as the author claims int8 is similar to fp16 inference performance.

> Ada/Hopper also have FP8 support, which makes in particular 8-bit training much more effective. I did not model numbers for 8-bit training because to model that I need to know the latency of L1 and L2 caches on Hopper/Ada GPUs, and they are unknown and I do not have access to such GPUs. On Hopper/Ada, 8-bit training performance can well be 3-4x of 16-bit training performance if the caches are as fast as rumored. For old GPUs, Int8 inference performance for old GPUs is close to the 16-bit inference performance.

fxtentacle

I find this article odd with its fixation on computing speed and 8bit.

For most current models, you need 40+ GB of RAM to train them. Gradient accumulation doesn't work with batch norms so you really need that memory.

That means either dual 3090/4090 or one of the extra expensive A100/H100 options. Their table suggests the 3080 would be a good deal, but it's not. It doesn't have enough RAM for most problems.

If you can do 8bit inference, don't use a GPU. CPU will be much cheaper and potentially also lower latency.

Also: Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?

mota7

> Gradient accumulation doesn't work with batch norms so you really need that memory.

Last I looked, very few SOTA models are trained with batch normalization. Most of the LLMs use layer norms which can be accumulated? (precisely because of the need to avoid the memory blowup).

Note also that batch normalization can be done in a memory efficient way: It just requires aggregating the batch statistics outside the gradient aggregation.
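A hedged sketch of that idea: pool the batch statistics over the micro-batches first, then normalize each micro-batch with the shared mean and variance during gradient accumulation. This is an illustration only, not a drop-in replacement for nn.BatchNorm.

```python
# Illustration: pool mean/variance over micro-batches, then normalize
# each micro-batch with the shared statistics during gradient accumulation.
import torch

def pooled_stats(micro_batches):
    n = sum(x.shape[0] for x in micro_batches)
    s = sum(x.sum(dim=0) for x in micro_batches)
    ss = sum((x * x).sum(dim=0) for x in micro_batches)
    mean = s / n
    var = ss / n - mean * mean
    return mean, var

def batchnorm(x, mean, var, eps=1e-5):
    return (x - mean) / torch.sqrt(var + eps)

micro_batches = [torch.randn(512, 64) for _ in range(3)]   # 3 micro-batches, 64 channels
mean, var = pooled_stats(micro_batches)
outs = [batchnorm(x, mean, var) for x in micro_batches]
```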

fxtentacle

wav2vec2, whisper, HifiGAN, Stable Diffusion, and Imagen all use BatchNorm.


jszymborski

> It doesn't have enough RAM for most problems.

It might not be as glamorous or make as many headlines, but there is plenty of research that goes on below 40GB.

While I most commonly use A100s for my research, all my models fit on my personal RTX 2080.

ngcc_hk

I wonder: are you trying to work around the limit, or did it just happen like this?

jszymborski

We're trying to work around the limits:

I) My research involves biological data (protein-protein interactions) and my datasets are tiny (about 30K high-confidence samples). We have to regularize aggressively and use a pretty tiny network to get something that doesn't over-fit horrendously.

II) We want to accommodate many inferences (10^3 to 10^12) on a personal desktop or cheap OVH server in little time so we can serve the model on an online portal.

varunkmohan

I'm not sure any of this is accurate. 8 bit inference on a 4090 can do 660 Tflops and on an H100 can do 2 Pflops. Not to mention, there is no native support for FP8 (which are significantly better for deep learning) on existing CPUs.

The memory on a 4090 can serve extremely large models. Currently, int4 is starting to be proven out. With 24GB of memory, you can serve 40 billion parameter models. That, coupled with the fact that GPU memory bandwidth is significantly higher than CPU memory bandwidth, means that CPUs should rarely ever be cheaper / lower latency than GPUs.
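The 24GB claim checks out as back-of-the-envelope arithmetic (weights only, ignoring KV cache and activations):

```python
params = 40e9                   # 40B-parameter model
bytes_per_param = 0.5           # int4 = 4 bits per weight
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.0f} GB of int4 weights")   # ~20 GB, fits in a 24 GB 4090
```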

lostmsu

> Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?

They need to advertise it better. First time I hear about it.

What are the prices like there? GPUs/workstations?

fxtentacle

Depends on who you know, but I've seen as low as €799 per new 3090 TI. But you need to waive the right to resell them and there are quotas, for obvious reasons.

alwayslikethis

Consumer parts are dirt cheap compared to enterprise ones. Most companies are not able to use them at scale due to CUDA license terms. I don't think there is much of a need for rebates here. For hobbyists, it is somewhat of a steep price for the latest cards, but it's already way down from the height of ETH mining a year back.

nl

> For most current models, you need 40+ GB of RAM to train them. Gradient accumulation doesn't work with batch norms so you really need that memory.

There's a decision tree chart in the article that addresses this - as it points out there are plenty of models that are much smaller than this.

Not everything is a large language model.

meragrin_

> Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?

So maybe they were including information for the hobbyists/students which do not need or cannot afford the latest and greatest professional cards?

habibur

> If you can do 8bit inference, don't use a GPU. CPU will be much cheaper and potentially also lower latency.

Good advice. Does that mean that I can install like 64 gb ram on a PC and run those models in comparable time?

fxtentacle

That's how cloud speech recognition is usually deployed. OpenAI whisper is faster than realtime on regular desktop CPUs, which I guess is good enough.

And for a datacenter, a few $100 AMD CPUs will beat a single $20k NVIDIA A100 at throughput per dollar.
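For reference, a minimal sketch of CPU-only Whisper inference with the openai-whisper package; the model size and audio path are placeholders.

```python
# Minimal sketch: CPU-only Whisper inference (pip install openai-whisper).
import whisper

model = whisper.load_model("base", device="cpu")   # smaller checkpoints are faster than realtime on CPU
result = model.transcribe("audio.wav")             # placeholder path
print(result["text"])
```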

ftufek

> Also: Almost everyone using GPUs for work will join NVIDIA's Inception program and get rebates... So why look at retail prices?

Out of curiosity, does that also apply for consumer grade GPUs?

chrisMyzel

you can get RTX/A6000s but not 3090s or 4090s via inception.

lostmsu

Prices? Prebuilt workstations? Or should we just apply and see, is that easy?

fxtentacle

The client I work with can order 3090 TI and 4090 through their Inception link (in Germany). Apparently, it varies by partner.

tambre

I would be surprised if it did. But you probably shouldn't do professional work on GPUs that lack ECC memory.

nl

The lack of ECC memory is almost certainly not a factor. If you can train at FP8 your model will recover from a single flipped bit somewhere.

gbolcer

3060 12GB are the best things you can buy right now. They are cheap, have a ton of memory--which seems to be the issue w/ image generation--and you can fit four of them into the cheapest motherboards.

3060ti 8GB, 3090 24GB, and 4000 series all have performance benefits, but for now this one is off the charts.

BatteryMountain

With this card you can also run OpenAI's Whisper with the large model (the multilingual one!), as it requires 10GB.

crabbycarrot

Highly recommend quantizing the model (https://pytorch.org/tutorials/recipes/recipes/dynamic_quanti...). I converted the large model to use int8, and I'm able to run it 5x real-time on CPU with pretty low RAM requirements with still very good quality.
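A rough sketch of that recipe applied to Whisper's Linear layers; whether the large checkpoint quantizes this cleanly may depend on torch/whisper versions.

```python
# Sketch: dynamic int8 quantization of Whisper's Linear layers for CPU inference.
import torch
import whisper

model = whisper.load_model("large", device="cpu")
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8   # weights stored as int8, activations quantized on the fly
)
print(quantized.transcribe("audio.wav")["text"])  # placeholder path
```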

Const-me

My implementation of Whisper uses slightly over 4GB VRAM running their large multilingual model: https://github.com/Const-me/Whisper

layoric

Another one to consider is the A4000 16GB. I recently bought an ex-miner card for ~$500 USD. They are around a 3070 with a decent amount of memory for training scenarios, and are single-slot cards. I believe there are a lot of these ex-miner workstation cards which are pretty heavily discounted.

Combine this with a second hand X99 / 2011-v3 platform like the Dell Precision T7910 dual socket Xeons and you can have a pretty decent homelab for ML workloads. The Dell can come with a 1300 watt PSU and can fit 4 of those cards comfortably (5 with reduced PCIe lanes on one) since they are 150W each.

lithiumii

Also A2000 12GB, it's a slightly less powerful 3060, but it only requires 75 w of power, meaning you don't need to plug in a power cable.

gzer0

Any suggestions on which Motherboard would be ideal for a 4x 3060 12GB setup?

janekm

For image inference 12GB is ok for now (but you may not be able to use all future models given that T5 language model is becoming popular), for training I'd consider 24GB the bare minimum.

hnrodey

A few weeks ago I got a 3060 12GB for $250 from a guy on FB Marketplace.

moneycantbuy

Anyone know if high-RAM Apple silicon such as the 128 GB M1 Ultra is useful for training large models? RAM seems like the limiting factor in DL, so I'm hoping apple can put some pressure on nvidia to offer consumer GPUs with more than 24GB RAM.

dwrodri

Having large amounts of unified memory is useful for training large models, but there are two problems with using Apple Silicon for model training:

1. Apple doesn't provide public interfaces to the ANE or the AMX modules, making it difficult to fully leverage the hardware, even from PyTorch's MPS runtime or Apple's "Tensorflow for macOS" package

2. Even if we could fully leverage the onboard hardware, even the M1 Ultra isn't going to outpace top-of-the-line consumer-grade Nvidia GPUs (3090Ti/4090) because it doesn't have the compute.

The upcoming Mac Pro is rumored to be preserving the configurability of a workstation [1]. If that's the case, then there's a (slim) chance we might see Apple Silicon GPUs in the future, which would then potentially make it possible to build an Apple Silicon Machine that could compete with an Nvidia workstation.

At the end of the day, Nvidia is winning the software war with CUDA. There are far, far more resources which enable software developers to write compute intensive code which runs on Nvidia hardware than any other GPU compute ecosystem out there.

Apple's Metal API, Intel's SYCL, and AMD's ROCm/HIP platform are closing the gap each year, but their success in the ML space is dependent upon how many people they can peel away from the Nvidia Hegemony.

1:https://www.bloomberg.com/news/newsletters/2022-12-18/when-w...

samspenc

This basically seems to line up with what another tech community (outside of AI / ML) seems to agree on: that Apple hardware is not that great for 3D modeling or animation work. Their M1 and M2 chips rank terribly on the open-source Blender benchmark: https://opendata.blender.org/

Despite Apple engineers contributing hardware support to Blender: https://9to5mac.com/2022/03/09/blender-3-1-update-adds-metal...

So looks like NVidia (and AMD to some extent) are winning the 3D usage war as well for now.

hbbio

Specifically, a Macbook Air M2 ranks more or less like a NVidia GTX 970 of 2014.

But... the GTX by itself has a 145W TDP, whereas the whole laptop has a TDP of 20W!

This is super impressive imho, but of course not in the raw power department.

p1necone

They're pretty darn good for laptop GPUs.

This isn't really an Apple specific problem - GPUs with low power budgets are never going to beat GPUs intended to permanently run off of wall power.

(I know there's stuff like the Mac Studio and iMac, but those are effectively still just laptop guts in a (compact) desktop form factor rather than a system designed from the ground up for high performance workstation type use)

I'd love to see a dedicated PCIe GPU developed by Apple with their knowledge from designing the m1/m2, but it's not really their style.

alwayslikethis

The 4090 is able to draw 450 watts. Current apple silicon are all used on laptops, which cannot possibly sustain or cool that much power. It might change when they make a cooling-optimized desktop chip though.

carbocation

Not exactly an answer to your question, but from a pytorch standpoint, there are still many operations that are not supported on MPS[1]. I can't recall a circumstance where an architecture I wanted to train was fully supported on MPS, so some of the training ends up happening in CPU at that point.

1 = https://github.com/pytorch/pytorch/issues/77764
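A small sketch of the usual workaround: pick MPS when available and let unsupported ops fall back to CPU. The fallback flag needs to be set before PyTorch initializes the MPS backend.

```python
# Sketch: use the MPS backend with CPU fallback for ops it doesn't support yet.
import os
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")  # must be set before torch loads MPS

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(8, 3, 224, 224, device=device)
print(device, x.mean().item())
```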

magoghm

I have a Mac Studio Ultra with 64 GPU cores and 128 GB of RAM which I use to play around with neural networks (among other things). I can create models larger than what I would be able to on consumer GPU cards, but it isn't very fast. On average, I get about 20% of the speed of an RTX 3090.

threeseed

It is definitely useful but does not beat the top Nvidia cards. [1]

Would be very interesting to retest with the M2 and also in the months/years to come as the software reaches the level of optimisations we see on the PC side.

[1] https://wandb.ai/tcapelle/apple_m1_pro/reports/Deep-Learning...

JonathanFly

It's an interesting option. My gut instinct is that if you need 128GB of memory for a giant model but don't need much compute - like fine-tuning a very large model, maybe - you might as well just use a consumer high-core-count CPU and wait 10x as long.

5950X CPU ($500) with 128GB of memory ($400).

metadat

CPU-addressable RAM is not interchangeable with graphics card RAM, so unfortunately this strategy isn't quite in the right direction.

AFAIK it flat out won't work with the DL frameworks.

If I'm mistaken please do speak up.

Edit: Thank you JonathanFly, I didn't know this!

JonathanFly

All the frameworks work on CPU. At the time I tried it, the 5950X was about 10x slower than my GPU, which was a 1080Ti or 2080Ti. GAN not a transformer though.

nl

I think they are saying train (or at least fine-tune) on a CPU.

This can work in some cases (years ago I certainly did it for CNNs - was slow but you are fine tuning so anything is an improvement) but I don't know how viable it would be on a transformer.

zone411

If you are training a model that requires this much memory, it will also require a lot of compute, so it would be too slow and not cost-effective. It may be useful for inference.

htrp

in a word .... no

irusensei

My totally user/curious 2 sats experience:

I've recently tried to play with Stable Diffusion. I have an RTX 3060, a base M1 MBP and a laptop with an AMD RX6800m.

- The RTX 3060 just works. Setup was straightforward, performance is good.

- The M1 has those neural engine thingies but it's not compatible with SD. It can run SD on CPU but if you want to make use of the NE you need coreml specific programs and models. Issue here is that it's just not the same stuff. Prompt structure is also different. It doesn't recognize weights and put a lot more emphasis on prompt order. Most of the time it seems to ignore most of what you wrote. On the positive side running stuff on the NE is very fast and doesn't seem to be taxing to the system.

- Finally, the 6800M. It should be the powerhouse of the bunch considering its gaming performance is ahead of the 3060. Problem is, AMD's toolkit kinda sucks. They have this obscure ROCm HIP stuff that acts as some kind of translation layer for the CUDA API. It's complicated, it simply doesn't work without a bunch of obscure environment variables, and it only works in fp32 mode, which means it uses twice the amount of video RAM for the same thing. Support is iffy, as they seem to only support workstation cards. Using it often throws lots of obscure compilation errors. There are bugs celebrating their anniversary.

To sum things up, Nvidia is far ahead of the curve in both usability and performance. Apple is trying to do its thing but it's too early yet while AMD is in a messy situation. Hope it helps.

muxr

AMD's driver is much better on Linux than Nvidia's. And ROCm HIP stuff isn't obscure. It's what's needed to bridge Nvidia's proprietary vendor lock ins.

Nvidia is literally cancer in this space.

nicolaslem

AMD display driver on Linux is much better than nvidia's but the GPGPU situation is just ridiculous outside of the Pro cards.

irusensei

> AMD's driver is much better on Linux than Nvidia's.

1. Agree AMDGPU is awesome. 2. All tests were done in Linux with the exception of coreml on MacOS.

> And ROCm HIP stuff isn't obscure.

It is from my novice perspective. While Nvidia stuff works out of the box (even on Linux), I had to install ROCm and a special version of PyTorch for ROCm. Then it only kinda worked after an assortment of environment variables to set the GFX version. And even then it uses twice as much video memory as what torch uses on Nvidia, limiting what I can do.
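For context, this is the flavor of workaround usually needed on RDNA2 consumer cards like the 6800M. The exact override value is an assumption (it targets gfx103x parts), and it must be set before ROCm/PyTorch initializes.

```python
# Sketch: make ROCm's PyTorch build treat a consumer RDNA2 card as a supported gfx1030 part.
import os
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")  # assumed value for gfx103x cards

import torch
print(torch.cuda.is_available())          # ROCm builds expose the GPU through the cuda API
print(torch.cuda.get_device_name(0))
```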

okamiueru

When you say "out of the box", then it's distros like pop!_os? Or has it become more common to automatically install proprietary drivers? Asking out of curiosity.

rpep

The ROCm libraries and their documentation are nowhere near as good though.

For e.g. compare rocFFT to cuFFT. They're worlds apart still.

asciimike

> What is the carbon footprint of GPUs? How can I use GPUs without polluting the environment?

> I worked on a project that produced carbon offsets about ten years ago. The carbon offsets were generated by burning leaking methane from mines in China. UN officials tracked the process, and they required clean digital data and physical inspections of the project site. In that case, the carbon offsets that were produced were highly reliable. I believe many other projects have similar quality standards.

Crusoe Cloud (https://crusoecloud.com) does the same thing; powering GPUs off otherwise flared methane (and behind the meter renewables), to get carbon-reducing GPUs. A year of A100 usage offsets the equivalent emissions of taking a car off the road.

Disclosure: I run product at Crusoe Cloud

throw_pm23

> burning methane

Probably not the first thing that comes to mind when people hear about carbon offsets. Things have gone further than I thought.

CrazyStat

Methane that has been burned to convert it to CO2 is much less bad of a greenhouse gas than methane that has been allowed to just float off into the atmosphere.

Perhaps counterintuitive given how much we usually go around saying burning fossil fuels is bad for the environment, but the science is sound.

asciimike

Methane has an 80x higher GWP than CO2 over the first 20 years, so ending routine flaring (even if the answer is "complete combustion") is an immensely important problem to help address the effects of climate change. Another source: https://www.edf.org/climate/methane-crucial-opportunity-clim...

Some other info we've published: https://www.crusoeenergy.com/digital-flare-mitigation


sebow

I really hope AMD cleans up and invests some money into their software stack. Granted, it's really hard to catch up to Nvidia, but I think it's doable in ~5 years. The barrier to entry for ROCm compared to CUDA is pretty high, even accounting for the fact that things have gotten better in the last few years. AMD has potent hardware for AI/ML (see Instinct), they just don't have it in the consumer space. However, one of the key factors in getting adoption in the consumer space is the aforementioned software stack, which I recall was a pain to set up. The fact that they're going FOSS with ROCm shows promise in this regard.

karolist

In its current state Nvidia has no competition, and you buy AMD only because you hate Nvidia, not because it's better. I had tons of AMD cards, my first being an ATI 9000, but if you want to do more than game, the hassle is not worth it.

My last AMD card was the Radeon VII. Once I got a bit serious about Blender there's just no comparison; even with enterprise drivers the crashes, random issues, and comparative slowness are just not worth it. What took 3 minutes to render on the Radeon VII takes 10s on a 3090 Ti, Stable Diffusion renders take 5-6s without any hassle playing with ROCm, and gaming is also no comparison with RTX (I don't even use DLSS). Fun fact: I sold my RVII after 3 years and added $500 for a new 3090 Ti.

Nvidia sucks for their business practices, but technically they dominate because they invested in software really early on and established themselves with CUDA, OptiX, RTX, and DLSS. Older AMD cards are nice for hackintosh if you're into that, though; Apple dropped Nvidia hard. There's also the Linux driver blob thing if you want to be a purist, but IIRC that's supposedly changing (sorry, don't have a source right now).

muxr

AMD has been improving a lot. Also you're comparing 3090ti to a 2 generation older RVII how is that a fair comparison?

moneycantbuy

Is a 4090 practically better than a 3090? I just built a new home DL PC with two 3090s because I knew I could fit them both in the case, whereas with the 4090 it seems more than one could be difficult. Also wondering if I can pool the RAM somehow, nvlink won't work because the 3090s are different sizes, and apparently nvlink doesn't do much more than pcie anyway.

JonathanFly

Basically 'it depends' is the answer to all your questions, but dual 3090s is a perfectly fine choice. Though ideally you would have NVLINK since it is an advantage over the 4090. In some specialized situations it is possible to have NVLINK act a lot like 48GB of memory, but if you don't already know if you can leverage NVLINK, you very likely aren't in that situation.

raihansaputra

Tangential, but the size of the 4090 seems to be a mistake and is hindering use cases like this. I believe NVIDIA switched manufacturing processes (away from Samsung) a bit late and the chip produces less heat than expected, but they had already communicated the cooling requirements to OEMs, so nobody wants to redesign their card. I expect some aftermarket brands to create "slimmer" 4090 coolers to enable air-cooled, dense 4090 workstations.

kkielhofner

Gigabyte produced a dual slot 3090 for a little while before Nvidia pressured them to discontinue production[0].

I suppose we might see "slimmer" 4090s at some point but even if the design is (somehow) possible it's clear Nvidia won't allow their partners to manufacture dual slot versions of RTX cards that could possibly compete with higher end cards.

[0] - https://www.crn.com/news/components-peripherals/gigabyte-axe...

shaunsingh0207

NVLink was designed for exactly that use case, pooling memory.

JonathanFly

But most projects aren't designed to use it that way, as a virtual 48GB single GPU.

ianbutler

The flow chart was nice, but I am not an organization and I like training multi-billion-param models. My next two cards were going to be the RTX 6000 Ada. The memory capacity alone almost makes it necessary.

muffles

Interesting no mention or discussion of FPGAs for DL Neural networks.

"Our enhanced NPU on Stratix 10 NX delivers 24× and 12× higher core compute performance on average compared to the T4 and V100 GPUs at batch-6, despite the smaller NX die size."

"Results show that the Stratix 10 NX NPU running batch 6 inference achieves 12-16× and 8-12× higher average energy efficiency (i.e. TOPS/Watt) on the studied workloads compared to the T4 and V100 GPUs, respectively."

https://users.ece.cmu.edu/~jhoe/distribution/2020/fpt2020.pd...

buildbot

FPGAs are awesome, but are even less usable than AMD GPUs for ML by comparison - you may have to write a kernel to get a new net to work, and that really limits adoption. Software is the #1 thing that will enable you to get research done.

Disclaimer - I work on the team that was originally behind Brainwave at Microsoft.

shaklee3

V100 is 2 major generations old.

karolist

You can score a 3090 for $800-900 used; with 24GB of VRAM it's superb value.

jszymborski

Depends on your market. I just checked in Montreal, Canada and you can't get anything used for less than 1600 CAD (around 1200 USD).

nanidin

I just checked Canadian eBay recently sold and there were several that went for 1000-1100 CAD in the last week.

jszymborski

I was looking at local classifieds (Kijiji), but it is true there are plenty of pretty reasonable (~1K CAD) listings on ebay.ca that ship from abroad.

nightski

Wow that is crazy, Microcenter in the USA had 4090s going for under $1100 USD not too long ago.

wellthisisgreat

4090s?? No way

humanistbot

But how many of those used 3090s have been run 24/7 by crypto farms?

kkielhofner

I'll take a retired crypto mining card over a random desktop/gamer card any day.

Crypto cards are almost always run with lower power limits, managed operating temperature ranges, fixed fan speeds, cleaner environments, etc. They're also typically installed, configured, and managed by "professionals". Eventual resell of cards is part of the profit/business model of miners so they're generally much better about care and feeding of the cards, storing boxes/accessories/packing supplies, etc.

Compared to a desktop/gamer card with sporadic usage (more hot/cold cycles), power maxed out for a couple more FPS, running in unknown environmental conditions (temperature control for humans, vape/smoke residue, cat hair, who knows) and installed by anyone.

smoldesu

Hard disagree, honestly. A mining card might be undervolted, but it will always have lived under sustained VRAM temps of 80c+. That's awful for the lifespan of the GPU (even relative to bursty gaming workloads) and once the memory dies, it's game over for the card. Used GPUs are always a gamble, but mostly because it depends on how used they are. No matter how you slice it, a mining card is more likely to hit the bathtub curve than a gaming one.

kika

And also often crazily overclocked

jszymborski

a surprising number are claiming to be Brand New in Box. Perhaps old scalper stock or maybe even resealed crypto cards.

kika

I hope I'm not getting downvoted into complete white-on-white for asking: Is there a good resource to learn myself a little ML for the greater good, if I'm a complete math idiot? Something very practical: start with {this} and build an image recognizer to tell birds from dogs. Or start with {that} and build an algo trading machine that will make me a trillionaire. I do have a 4090 in a Windows machine which I can turn into a Linux machine.

jamessb

If you're interested in Deep Learning specifically, the fast.ai "Practical Deep Learning for coders" course [1] is often recommended. It says "You don’t need any university math either — we’ll teach you the calculus and linear algebra you need during the course".

[1]: https://course.fast.ai/

kika

Thanks! I don't know if it's DL or not, but what I'm mostly interested in is "finding patterns". Like, I have this very long stream of numbers (or pairs of numbers, like coordinates) and I use such streams to train. Then I have a shorter stream and the model infers the rest. Not sure if I'm talking complete nonsense or not :-)
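That's essentially next-step sequence prediction. A very rough sketch of what that looks like in PyTorch, with fake data and placeholder hyperparameters:

```python
# Toy sketch: next-step prediction on streams of (x, y) coordinates.
import torch
import torch.nn as nn

class NextStep(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, x):                 # x: (batch, time, dim)
        h, _ = self.rnn(x)
        return self.head(h)               # prediction of the next value at each step

model = NextStep()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
streams = torch.cumsum(torch.randn(32, 100, 2), dim=1)   # fake coordinate streams

for _ in range(200):
    pred = model(streams[:, :-1])
    loss = nn.functional.mse_loss(pred, streams[:, 1:])  # target = stream shifted by one step
    opt.zero_grad(); loss.backward(); opt.step()
# Inference: feed the shorter stream and roll the model forward one step at a time.
```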


gmiller123456

Math isn't an absolute requirement for training an AI, even programming isn't a requirement, as there are quite a few pre-built tools. I would recommend browsing through some of the many, many books available and see which one(s) speak to your current experience level. You won't compete against world class systems without going deeper, but as long as you're willing to start small, you have to start somewhere just to see if it's something you like doing.

knolan

Andrew Ng’s introductory courses on Coursera are a good way to start.

If you’re a bit more comfortable with command line stuff the fast.ai is good.

gmac

I found François Chollet's Deep Learning with Python to be an excellent intro. https://www.manning.com/books/deep-learning-with-python-seco...

mjburgess

Youtube is pretty full with things like "Make a Netflix Clone"

See, then, eg.,

https://www.youtube.com/results?search_query=build+an+image+...

kika

What I'm looking for is more like the Rust Book: concepts explained with examples of more or less real-world problems. Like, they don't just tell you there's this awkward thing called "interior mutability", they also explain why you might need it.
