Community
Arnold GPU Forum
General discussions about GPU rendering with Arnold.

GPU Cache Population Crashing with 4 GPUs... fine with just 1

SOLVED
Message 1 of 31
Eden_Soto2
1692 Views, 30 Replies


4x 1080 Ti's... C4D R21 up-to-date, latest NVIDIA Studio Driver, latest C4DtoA plugin, here's the log...

gpu_cache_population_log.txt

I've deleted the cache in AppData\Local\NVIDIA\OptixCache\arnold-6.0.1.0_driver-441.66, but each time I try to run it with all four GPUs in the system, it fails (see screenshot below for final crash entry)... I delete the cache before each new attempt.

If I run it with just one GPU in the machine, it completes successfully... with all four in, it's a guaranteed crash.

For what it's worth, I can render with all 4 GPUs no problem with Redshift, so the GPUs themselves are fine.

Would love to know if anyone knows why this could be happening and/or how to get around it

5604-annotation-2020-01-07-154700.png

Message 2 of 31

We had multiple reports of similar issues, seems to be an issue with multiple GPUs. We're investigating. Thanks.

Message 3 of 31
Stephen.Blair
in reply to: Eden_Soto2

I'm going to restate the problem to make sure I have it right, because the screenshot shows a crash during rendering.

But really the problem happens before the render, when the pre-population fails with this error:

[ERROR]     internal error during render permutation 11/40:

So, GPU Pre-Population failed, and then when you tried to render using that GPU cache, the render failed.
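For anyone who wants to check their own pre-population log for the same failure, here is a minimal sketch that scans log text for lines like the one quoted above and pulls out the permutation numbers. The pattern is based only on that single quoted line, and the function name `find_failed_permutations` is made up for this example; other Arnold log lines may be formatted differently.

```python
import re

# Pattern derived from the one error line quoted in this thread.
# This is an assumption: other log lines may not match this shape.
PERMUTATION_ERROR = re.compile(
    r"\[ERROR\].*?render permutation\s+(\d+)/(\d+)", re.IGNORECASE
)


def find_failed_permutations(log_text):
    """Return (index, total) pairs for each failed permutation in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = PERMUTATION_ERROR.search(line)
        if match:
            failures.append((int(match.group(1)), int(match.group(2))))
    return failures


sample = "[ERROR]     internal error during render permutation 11/40:"
print(find_failed_permutations(sample))  # [(11, 40)]
```

To use it on a real log, read the file into a string first (for example with `Path("gpu_cache_population_log.txt").read_text()`) and pass that in.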


Can you delete everything in AppData\Local\NVIDIA\OptixCache, not just that one folder?

Then try to pre-populate again
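If you end up clearing the cache before every attempt, a small script can save some clicking. This is only a sketch: it assumes Windows with the `LOCALAPPDATA` environment variable pointing at `AppData\Local`, and the helper names `default_optix_cache_dir` and `clear_cache_dir` are made up for this example. It deletes everything inside the folder, so double-check the path before running it.

```python
import os
import shutil
from pathlib import Path


def default_optix_cache_dir():
    """Best guess at the cache folder, assuming LOCALAPPDATA is set (Windows)."""
    fallback = Path.home() / "AppData" / "Local"
    local_app_data = os.environ.get("LOCALAPPDATA", str(fallback))
    return Path(local_app_data) / "NVIDIA" / "OptixCache"


def clear_cache_dir(cache_dir):
    """Delete everything inside cache_dir but keep the folder itself.

    Returns the names of the entries that were removed.
    """
    cache_dir = Path(cache_dir)
    removed = []
    if not cache_dir.is_dir():
        return removed  # nothing to do if the folder does not exist
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed


# Example (destructive, run with Cinema 4D closed):
# print(clear_cache_dir(default_optix_cache_dir()))
```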



// Stephen Blair
// Arnold Renderer Support
Message 4 of 31
Eden_Soto2
in reply to: Stephen.Blair

I emptied the whole AppData\Local\NVIDIA\OptixCache folder and ran the command again... it crashed again. New log attached.

gpu-cache-population-log.txt

Message 5 of 31

Would love to get this resolved... I still haven't really been able to gauge speed differences between the GPUs and the CPU... I can get the IPR to sometimes work without crashing, but it will eventually crash C4D completely, sometimes even locking up the system requiring a hard reset

Message 6 of 31
thiago.ize
in reply to: Eden_Soto2

If you skip the cache prepopulation and just render a simple scene using 1, 2, and 4 GPUs, does it work? The cache prepopulation is just there as a convenience, but not using it is ok if you don't mind the first few times you render scenes it might take a few minutes before pixels start to appear.

Knowing the answer to this will also help us pinpoint where the problem might be coming from.
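To keep that kind of test systematic, it can help to write down which device combinations to try. The sketch below only lists GPU index sets for a 4-GPU machine; it does not talk to Arnold or C4DtoA at all, and you would still select the devices by hand in the render settings. The function name `gpu_test_plan` and the 0-based device indices are assumptions made for this example.

```python
from itertools import combinations


def gpu_test_plan(gpu_count, sizes):
    """List device-index tuples to try, in the order the sizes are given."""
    plan = []
    for size in sizes:
        if 1 <= size <= gpu_count:
            plan.extend(combinations(range(gpu_count), size))
    return plan


# 1, 2 and 4 GPUs on a 4-GPU machine, as suggested above.
for devices in gpu_test_plan(4, sizes=(1, 2, 4)):
    print(devices)
```

Trying every single card and every pair, rather than just "any one" and "any two", also shows whether a crash follows one particular card or only the number of cards in use.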

Message 7 of 31
thiago.ize
in reply to: Eden_Soto2

If your machine is locking up, that could maybe mean your power supply is unable to power your four GPUs and CPU. If you disable a GPU or two, does it now work? It's possible Arnold is putting more demands on all of your system which ends up tipping it over the power threshold.

Another possibility for a system appearing to lock up is that you ran out of CPU memory. If you run the task manager you might be able to see if that is happening.

Otherwise, system lockups can be indicative of a driver bug.

Message 8 of 31
Eden_Soto2
in reply to: thiago.ize

Just last week I did a 3:00 Redshift render with all 4 GPUs that took 18 hours... not a single hiccup, the GPUs are fine and I’m on the latest NVIDIA Studio Driver that hasn’t changed since then... Have a 2990WX Threadripper with 128GB of RAM and a 1600W power supply... there’s plenty there to handle it

Message 9 of 31
Eden_Soto2
in reply to: thiago.ize

The only way I can render with the GPU is if I have just one in the system... but check my reply to your other note below... the system is plenty capable, so it seems as though it’s something specific to Arnold rendering on the GPUs... CPU is perfectly fine

Message 10 of 31

I'm running into the same problem. With multiple GPUs (I'm running 4 RTX 2080 Ti's), Arnold will use all the cards UNTIL you turn textures on. With Textures disabled in the DEBUG window, it renders fine. I've tried converting the textures to different formats (TGA/TIF/EXR/etc), with and without MIPs, etc. Nothing works, unless you want to run ONLY one card and see your textures in the render. Any update on when we might see a fix?

Message 11 of 31
Eden_Soto2
in reply to: drinaldi

I never tried all four when disabling textures... I'll try that today and see if I get the same results

Message 12 of 31
thiago.ize
in reply to: Eden_Soto2

Thanks Eden and Dante for your detailed investigations! It looks like NVIDIA has been able to reproduce a driver bug where using 4 or more GPUs with textures will usually cause a crash. This bug could in theory also cause crashes when not using textures or when using fewer than 4 GPUs, though we haven't yet been able to confirm how common this is. They're making good progress towards fixing it, so fingers crossed, it will hopefully get fixed in one of the upcoming driver releases.

In the meantime, you can render with 3 GPUs in order to drastically reduce the likelihood of a crash and we'll also investigate whether there are any workarounds we can do in Arnold.

Message 13 of 31
Eden_Soto2
in reply to: thiago.ize

Fantastic news! Looking forward to getting a working solution so I can do some thorough testing

Message 14 of 31
Anonymous
in reply to: Eden_Soto2

Why do you need GPU cache population? Does it render faster? Arnold 6 is very snappy without it...

Message 15 of 31
thiago.ize
in reply to: Anonymous

You don't need it. All it does is pre-compile some shaders into the GPU cache so that, the first few times you render after upgrading your Arnold or NVIDIA driver version, you don't need to wait a few minutes for the GPU shaders to compile, since they are already in the cache. On the other hand, once you've rendered a scene, those shaders will also be in the cache, so there's no longer any need to pre-populate.

Personally, I don't usually bother to pre-populate.
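As a side note, the cache folder name from the first post, `arnold-6.0.1.0_driver-441.66`, looks like it encodes both the Arnold version and the driver version, which would fit with the cache needing to be rebuilt after upgrading either one. That reading is an assumption based on that one folder name, not documented behaviour. A rough sketch for checking which versions a cache folder appears to belong to (the function name `parse_cache_dir_name` is made up for this example):

```python
import re

# Naming pattern inferred from a single folder name in this thread;
# treat it as a guess, not a documented format.
CACHE_DIR_NAME = re.compile(
    r"^arnold-(?P<arnold>[\d.]+)_driver-(?P<driver>[\d.]+)$"
)


def parse_cache_dir_name(name):
    """Return (arnold_version, driver_version), or None if the name does not match."""
    match = CACHE_DIR_NAME.match(name)
    if match is None:
        return None
    return match.group("arnold"), match.group("driver")


print(parse_cache_dir_name("arnold-6.0.1.0_driver-441.66"))  # ('6.0.1.0', '441.66')
```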

Message 16 of 31
Anonymous
in reply to: thiago.ize

Yes, i think it is the old technology already... 🙂

Message 17 of 31
Eden_Soto2
in reply to: Anonymous

For me it’s been a way to test and determine why my machine crashes with more than one GPU in the system... that’s why I continue to run it... if the machine crashes in the IPR or during output render, I don’t have a way to figure out why without the log? At least I don’t know of any other way

Message 18 of 31
Eden_Soto2
in reply to: thiago.ize

FWIW, no combination of GPU selection makes Arnold GPU render stable, so selecting only three didn’t help for me... I’m going to physically remove one and see if that will do it... but like I’ve said in another reply, I can render with all four GPUs in Redshift till the cows come home and it never flinches, which is why I think the issue is unique to Arnold in my setup.

Message 19 of 31
thiago.ize
in reply to: Eden_Soto2

You can get logs from IPR and batch renders. Links to instructions for how to do this are on the right side of this page. Search for "Instructions for generating full verbosity log files are available for". Let us know if you need further help getting logs -- we LOVE logs, so happy to help with that!

We are getting several reports of crashes in the GPU cache pre-population, so it's possible that pre-population triggers a bug more aggressively, possibly the driver bug mentioned earlier.

Message 20 of 31
thiago.ize
in reply to: Eden_Soto2

Arnold 6.0.1.1, released today, should fix the multi-GPU texture hangs. This won't help with the GPU cache pre-population issues.
