Finally, thanks for the clear-cut answer. I don't have any experience with training on AMD, but the errors from Nvidia are usually very obscure.
As for using GPUs other than Nvidia's, there's a slew of problems, mainly that on the cloud, where most of our projects are deployed, our options seem limited to either Nvidia GPUs or Cloud TPUs.
Each AI experiment usually costs thousands of dollars and runs on a cluster of GPUs. We have built and modified our system to fully utilize such an environment. I can't even imagine shifting to AMD GPUs at this point. The amount of work involved and the red tape... *shudder*
Oh yeah, the equation completely changes for the cloud. I’m only familiar with local usage where you can’t easily scale out of your resource constraints (and into budgetary ones). It’s certainly easier to pivot to a different vendor/ecosystem locally.
By the way, AMD does have one additional edge locally: they tend to put more VRAM into consumer GPUs at a comparable price point. For example, the 7900 XTX competes with the 4080 on price but has as much memory as a 4090 (24 GB vs. the 4080's 16 GB). In systems with one or a few GPUs (like a hobbyist's mixed-use machine), those few extra gigabytes can make a real difference. Of course, this leads to a trade-off between Nvidia's superior speed and AMD's superior capacity.
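To make the "few extra gigabytes" point concrete, here's a rough back-of-the-envelope sketch. It only counts the memory for model weights (parameters × bytes per parameter, e.g. 2 bytes for fp16) and ignores activations, optimizer state, KV caches, and framework overhead, so real usage will be higher. The function names and the example model sizes are purely illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough lower bound: decimal GB needed just to hold the weights."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, bytes_per_param: float, vram_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM (no headroom for anything else)."""
    return weight_memory_gb(num_params, bytes_per_param) <= vram_gb


if __name__ == "__main__":
    # Hypothetical model sizes, fp16 weights (2 bytes per parameter).
    for params in (7e9, 10e9, 13e9):
        need = weight_memory_gb(params, 2)
        print(f"{params / 1e9:.0f}B params: {need:.0f} GB | "
              f"16 GB card: {fits(params, 2, 16)} | 24 GB card: {fits(params, 2, 24)}")
```

Under these assumptions a ~10B-parameter model in fp16 needs about 20 GB for its weights alone: that rules out a 16 GB card but still fits on a 24 GB one, which is exactly the kind of gap where the extra capacity matters.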