jakear
TomVDB
Even after the edit, it's a bit confusing in that, without looking at the table, you get the impression that FP16 is slower than FP32.
Sayrus
Perhaps it has been edited. Now the table contains the following title "Training performance in images processed per second".
A "Higher is better" might still be interesting although redundant.
hughes
Even after being edited, it's still wrong. It shows the significantly lower Inception4 performance as a "40% speedup" instead of 40% of baseline images/sec.
zamadatix
"Training performance in images processed per second"?
mamon
Do you mean this table, which has a caption over it that reads "Training performance in images processed per second"? Looks pretty self-explanatory.
ml_hardware
This is a poor comparison of performance. All of these networks are CNNs, and very old architectures at that. They are all probably memory bottlenecked which is why you see the consistent 50% improvement in FP32 perf.
It is also not clear what batch sizes are being used for any of the tests. If you switch to FP16 training, you must increase the batch size to properly utilize the Tensor Cores.
If you compare these cards at FP16 performance on large language models (think GPT-style with a large model dimension), I am confident you will see the Titan RTX outperform the 3090. The former has 130 TF/s of FP16 (FP32-accumulate) tensor core performance while the latter has only 70 TF/s.
Link: https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/a...
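The memory-vs-compute point above can be made concrete with a back-of-envelope roofline check. This is only a sketch with illustrative numbers (71 TFLOPS FP16 tensor and 936 GB/s are the 3090's headline specs; the matmul sizes are made up, and real kernels tile and reuse data in cache):

```python
# Roofline back-of-envelope: is a layer compute- or memory-bound?

def machine_balance(peak_tflops, bandwidth_gbs):
    """FLOPs the card can do per byte moved from VRAM."""
    return peak_tflops * 1e12 / (bandwidth_gbs * 1e9)

def matmul_intensity(m, n, k, bytes_per_elem=2):
    """Arithmetic intensity of an (m,k) @ (k,n) FP16 matmul, in FLOPs/byte."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

# RTX 3090, FP16 tensor cores: ~71 TFLOPS, 936 GB/s memory bandwidth
balance = machine_balance(71, 936)          # ~76 FLOPs/byte

small = matmul_intensity(32, 1024, 1024)    # tiny batch: ~30 FLOPs/byte
large = matmul_intensity(2048, 1024, 1024)  # big batch:  ~410 FLOPs/byte

# Below the machine balance you're memory-bound; above it, compute-bound.
print(round(balance), round(small), round(large))
```

Which is the quantitative version of "increase the batch size to properly utilize the Tensor Cores": the small-batch matmul sits below the machine balance (memory-bound), the large one well above it (compute-bound).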
binarymax
The 3090 RTX is also $1000 cheaper than the Titan, so there's that. It would be nice if there was a good way to express value per dollar. Perhaps in GLUE accuracy and training time.
ml_hardware
Totally agree! I think 3090 could be a lot more cost effective for researchers to dabble with NLP. But it really grinds my gears when people post these misleading benchmarks... the 3090 is handicapped at half-rate tensor core performance while the Titan RTX is not.
So if you're someone who does their work mainly in FP32, you will see improved performance with the 3090. On the other hand, if you are an FP16 speed demon who needs to train GPT-3 over the weekend, stick with your Titans :)
bitL
What do you think about TF32 in 3090? Could it replace FP32 with 5x speedup?
dplavery92
For many of us, the Inception-style CNN workloads--especially at FP32--are much more realistic than large language models that may be better suited to take advantage of the tensor cores. If I'm going to be memory bottlenecked either way, I probably don't want to spend an extra $1000 on 400 tensor cores I can't take full advantage of.
ml_hardware
If I may ask, why are Inception-style workloads still popular, rather than architectures like EfficientNet?
Also, why FP32? CNNs are some of the most robust models to train in FP16 (much easier than language models) so you could get yourself a quick XXX speedup and 2x memory savings by switching over.
(btw not intending to be accusatory or anything, I just think FP16 training deserves a lot more adoption than it currently seems to have :)
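To illustrate the FP16 point: the usual catch with half-precision training is gradient underflow, which is why frameworks pair FP16 with loss scaling. A stdlib-only sketch of the mechanics (the `to_fp16` helper is just for illustration; in practice something like PyTorch's AMP handles this automatically):

```python
import struct

def to_fp16(x):
    """Round-trip a float through IEEE half precision via struct's 'e' format."""
    return struct.unpack('e', struct.pack('e', x))[0]

# Gradients below fp16's smallest subnormal (~6e-8) vanish outright:
print(to_fp16(1e-8))  # 0.0 -- the update is silently lost

# Loss scaling keeps them representable: multiply the loss up before the
# fp16 cast, divide the gradients back down in fp32 afterwards.
scale = 1024.0
tiny_grad = 1e-6
recovered = to_fp16(tiny_grad * scale) / scale
print(abs(recovered - tiny_grad) < 1e-8)  # True: the gradient survives
```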
systemvoltage
Aside/nit: don't use gradients for discrete categories in a graph. Use a discrete color palette that perceptually distances colors as much as possible, using a tool like this: https://medialab.github.io/iwanthue/
valine
Seems like a good speedup relative to the Titan, especially for the money. I’d be interested to see the performance relative to the 3080 though. There are obviously vram limitations with the 3080 but it would still be interesting to see the difference in raw compute performance.
In games the 3090 only gives a 15% performance bump relative to the 3080. If that pattern holds for machine learning tasks there is probably a scenario where it makes sense to buy two 3080s rather than one 3090.
If you are vram constrained then obviously the 3090 is the way to go.
wombatmobile
If this isn't OT...
Could you kindly advise what kind of computer would make sense to purchase to begin learning about ML? I was assuming I'd get a 3080. Should I get a case that could potentially house 2 x 3080's? Does the case require any special cooling considerations, or just whatever will fit the cards? What CPU would you get?
gambiting
If you're "learning about ML" there is no point in buying anything. Just get the cloud compute instead, and for home use and testing literally anything will do. I have friends who work with ML professionally and even they say it's just hard to justify running any computations at home once you factor in the electricity and hardware cost - GCP compute just beats the cost, easily.
jjoonathan
What's the (pre-Ampere) GCP price for a V100? On AWS it was $3/hr, so at 100% use and market prices a Titan V would pay for itself vs the cloud inside a month. Is GCP significantly cheaper? Or are we talking about pricing at ~0% utilization?
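Utilization is the crux of that question; a quick breakeven sketch using the numbers quoted above (the $3/hr V100 rate is from the comment; the ~$3000 Titan V price and the utilization levels are my assumptions):

```python
# Cloud-vs-local breakeven: how long until a purchased card pays for itself?

def breakeven_days(card_price, cloud_rate_per_hr, utilization):
    """Days of use at the given duty cycle before buying beats renting."""
    hours_per_day = 24 * utilization
    return card_price / (cloud_rate_per_hr * hours_per_day)

for util in (1.0, 0.25, 0.05):
    days = breakeven_days(3000, 3.0, util)
    print(f"{util:4.0%} utilization -> breakeven in {days:7.1f} days")
```

At full utilization the card pays off in roughly six weeks; at a hobbyist's 5% duty cycle it takes years, which is the scenario the "just use the cloud" advice is really about.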
claudeganon
Agreed. If you’re just learning or building hobby stuff, you can use Colab, Paperspace, or any number of other services for free or very cheaply.
gbrown
I'd honestly start with cloud options if learning is the only reason you're building the computer. You don't want to dump a bunch of money into depreciating GPUs if you're not going to end up using them.
GPUs are only really required in ML if you want to do deep neural network stuff. You can do plenty in CPU on reasonable data sets using any modern laptop.
valine
Well to start off I’m not advising buying two 3080s. I haven’t seen the benchmarks, and on top of that the 3080 doesn’t support SLI so if you do buy two of them you will need to be using software which can utilize two independent GPUs.
If you’re just wanting to learn machine learning you don’t need anything particularly special. I think you would be happy with a GTX 1070. There is also the cloud computing route, where you basically rent the GPU from AWS. That will initially be more cost effective than buying your own hardware.
One thing to keep in mind if you do go with the 3080 is the power consumption. Ampere cards are going to be much more power hungry than previous generations, and you will need to budget about 320W just for the graphics card. The recommended power supply for the 3080 is 850W.
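A rough back-of-envelope for PSU sizing (rule-of-thumb numbers, not Nvidia's spec; only the 320W board power comes from the discussion above):

```python
# Rough PSU sizing for a single-3080 build.
gpu_w = 320    # RTX 3080 board power
cpu_w = 125    # a typical high-end desktop CPU under load
rest_w = 75    # motherboard, RAM, drives, fans
margin = 1.5   # headroom for transient spikes and the PSU efficiency sweet spot

recommended = (gpu_w + cpu_w + rest_w) * margin
print(round(recommended))  # ~780 W, in line with the 750-850 W recommendations
```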
posix_compliant
Nvidia recommends 750W for the power supply.
l33tman
Go for it! Get a motherboard with reinforced PCIe slots for both GPUs though; the cheaper mobos only have one armored slot. Also, for a dual setup you really should use the 3080 Founders Edition, I guess, as they blow part of their heat out the back. Otherwise you need good airflow and the thinnest 3080 cards you can find (so there's some space between them in the case). Still waiting for good thermal benchmarks on this kind of setup...
Of course if money is an issue, you are well off with only a single 3080 :)
There is no point in buying the 20xx series anymore. The 30xx are twice as good for the same money (if you can get one).
duhi88
Woah, there. They said they were just learning. No need to purchase special hardware until you're trying to run state of the art models.
You can get very far on any laptop before hardware becomes the main blocker. And before building an ML machine, there are cloud compute options available for far cheaper.
bitL
The 3080 has no NVLink, so 2x 3080 would communicate only via PCIe, and it's unlikely they would form a single 20-40GB virtual RAM pool the way 2x 2080 Ti with NVLink did under Linux.
paol
If you're just beginning, a 3080 is overkill, never mind two. Get a used 1080 Ti (or even a 2080 Ti if not overpriced); it'll be cheaper and even has a bit more RAM.
kevingadd
I think for high throughput scenarios the 3090 probably has more headroom due to its higher TDP and better (larger) cooling solution, which might really matter here if you're driving the tensor cores at max the whole time.
wnevets
Most video games probably aren't going to make the most of all of the extra CUDA cores on the 3090. I'm assuming that helps a lot with machine learning; can someone who knows for sure confirm?
Jhsto
Most parallel processing scales linearly with core count. But the 3090 is more interesting for machine learning because of its RAM: it has 24GB against the 3080's 10GB. With machine learning you spend most of the time copying memory between the CPU and GPU, so being able to fit more data on the card reduces computation latency.
option
“ With machine learning, you spent most of the time copying memory between the CPU and GPU”
- this is a sign that you are most likely doing it wrong. Yes, some operations are inherently bandwidth bound, but most important ones such as larger matrix multiplies (transformers) and convolutions are compute bound.
p1esk
There will be 20GB version of 3080 soon.
oivey
There’s been rumors of this, but it’s not confirmed. A big percentage of the cost of the 3090 (and 3080) is the GDDR6X. A 3080 with 20GB of GDDR6X will still be really expensive, so it’s unclear to me that they will actually release something like that. Potentially they could put that much RAM on a card and then use slower GDDR6, but that’s kind of an odd part in Nvidia’s product offerings because then the 3080 with more RAM would be slower in a lot of situations.
bitL
There are rumors about 48GB 3090 with older GDDR6 chips so having the same config for 3080 wouldn't be unexpected, at least until Micron could produce 2GB GDDR6X chips.
duhi88
what would be the benefit there? 20GB can't be that much cheaper than 24GB, right?
Sohcahtoa82
It will be for the gamers who think that 10 GB isn't enough VRAM, and as a way for nVidia to have an answer for rumors that AMD's next GPU will have 16 GB.
nullifidian
2x 20GB 3080 for $1000 each is more cost effective than 2x 3090 for $1500 each, if your model fits into 20GB.
p1esk
My guess is it’s going to be $999
hhhhhuu
A 3080 with 20gb is planned already
YetAnotherNick
Very likely not with the same price as 3080. I am guessing it won't be much cheaper than 3090 as vram is expensive.
BookPage
Honestly, I had an RTX Titan for home use for a while. Eventually I moved to just using a 2080 Super, and it performed nearly as well for my models. If you don't need ALL the extra memory and have the space for a triple-slot card, the better value proposition by far for last gen seemed to be a good Super.
arijun
See also Tim Dettmers' fantastic post on GPU performance (which doesn't use benchmarks for the latest cards but instead calculates performance with a model):
https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...
liuliu
Seems to be a good speedup overall relative to the 2080 Ti, including at FP16 (see the 2080 Ti vs Titan relative numbers: https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks...). Given the FP16 performance, does this suggest an even more expensive Titan card is in the pipeline? Or maybe TF32 performance is what NVIDIA will promote in this generation (only if they have better numbers than FP16?)?
jjoonathan
Here's hoping for an A100 titan with un-nerfed FP64. The 3090 is twice as nerfed as previous generations, which were also bad at 1:32. Now it's 1:64 :(
nullifidian
It seems that the Radeon VII and Titan V are the last cards with decent FP64 performance for the foreseeable future. Both Nvidia and AMD now basically have different architectures for their consumer and data-center products.
jjoonathan
Yep, sure looks that way. I'll still be dreaming of a Titan A!
Dylan16807
The FP64 units are a separate addition that eat a lot of die space, right? I wouldn't use the word "nerf" for the tradeoff between having more SMs versus having more features in the SMs.
jjoonathan
They eat die space but not TDP.
bryan0
Can someone explain the difference between fp16 and fp32 in these benchmarks because the difference is pretty dramatic. I assume it's floating point precision(?) but why would lower precision be slower relatively on the 3090? For training jobs how does the precision impact accuracy of the model?
Edit: clarified that I am referring to slower relative performance
bufo
Nvidia nerfed the FP16 performance at the software level to disincentivize people from using this card as a TITAN / datacenter ML card replacement.
my123
It isn't at the software level; FP16 goes through the tensor cores on Turing onwards: https://www.anandtech.com/show/13973/nvidia-gtx-1660-ti-revi...
bufo
See this thread https://twitter.com/wightmanr/status/1309583916362117120
dogma1138
The ALUs are capable of half precision regardless of the tensor cores and aren’t restricted.
For “tensor ops” in GeForce cards FP16 with FP32 accumulate is done at half rate so you don’t get double the performance which you do get in Quadro and Titan cards using the same die.
Der_Einzige
FP16 is faster in this article on most models...
bufo
That's because of the improved memory bandwidth. See https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...
smallnamespace
FP16 is faster (units are images per second)
bitL
The 3090 opted for bundling 2x FP32 units Bulldozer-style, and FP16 is now processed by those cores as well, so FP16 and FP32 have the same peak throughput (35.58 TFLOPS).
https://www.techpowerup.com/gpu-specs/geforce-rtx-3090.c3622
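The doubled-FP32 arithmetic checks out against the quoted figure; as a sketch, with the core count and boost clock taken from that TechPowerUp page:

```python
# Sanity-check the 35.58 TFLOPS figure:
# peak = CUDA cores x 2 FLOPs per FMA x boost clock.
cuda_cores = 10496   # RTX 3090
boost_ghz = 1.695

peak_tflops = cuda_cores * 2 * boost_ghz / 1000
print(f"{peak_tflops:.2f} TFLOPS")  # 35.58 -- matches the quoted FP32/FP16 number
```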
riku_iki
> FP16 is faster (units are images per second)
But does the model take a quality hit: does it need to train for more steps before converging to similar performance, or need more parameters?
FP16 obviously contains less information than FP32.
bryan0
Sorry I was referring to the relative performance, I edited my question to be clearer
danbr
I just want to know how they installed the new nvidia cuda drivers without borking their Ubuntu/tf install.
rkwasny
Hi, we just installed everything using nvidia repo and .deb packages.
+ tf-nightly and other python libraries installed through pipenv
paol
Nvidia has official PPAs with all versions of CUDA, libcudnn and drivers. If you install from there you will not have problems.
It helps to stick to Ubuntu LTS versions though; that's what they support best.
opless
NVidia drivers b0rked rebooting my box for a long time.
A couple of months ago I removed all the references in apt sources and followed the newer instructions (several times, to get the right driver/cuda/tensorflow match), and now my reboots are great, with only one GPU lock-up so far (probably due to overheating - I've had to replace a couple of components flagged as failed due to the heatwave in summer).
JupyterHub is just great. I'd like to implement better diagnostics though... have yet to find a good tutorial for that.
fareesh
If I have a really remote location and I need to do on-premises inference, am I better off buying one of the gaming GPUs or are they far behind the T4, etc.?
motorcitycobra
I thought I read Nvidia was nerfing the GeForce cards. Does this disprove it?
fomine3
NVIDIA has nerfed FP64 performance on consumer GeForce cards in recent years. It's critical for scientific calculations but not needed for ML. They have also banned running GeForce cards in datacenters.
bitL
No, the 3090 has nerfed tensor cores, and in some apps the Titan RTX is 5x faster (Siemens NX). FP16 with FP32 accumulate runs at 0.5x rate like on the 2080 Ti, while the Titan's is at 1x.
That second table is a good example of why always including units (or even just a "higher is better") is a good idea... I have no clue what I'm looking at.
Edit: It's been edited, thx Evolution :) (or I totally glossed over it the first time around... but I don't think so)