Same node slower with CR than locally

I'm not having any luck speeding up renders with CR. In fact, if I set load balancing to use just my local machine, it takes a LOT longer to render than F12 does: for example, 00:24 with F12 vs 05:09 with CR on my latest attempt.

It seems to lag in 'Rendering' before it gets to 'Get tile'.

When using multiple nodes I see no gains over a single machine, even when tweaking the load balancing or rendering animation sequences. The same render as above took 02:08 on my last attempt, splitting the load 0.43, 0.28, 0.28. Again, I'd get much faster renders with any one of my nodes alone.

They're well equipped machines:

Local:

Ryzen 3900x

RTX 3090

RTX 2080TI

64GB RAM

Nodes 1 & 2 Each:

Ryzen 3700x

2x 2080TIs

64GB RAM

What am I doing wrong?

5 comments

James Crowther

Dec 28, 2020

Hi again :), ok, so let's break this into two issues, syncing and performance, by the looks of it.

1. Syncing - each node records information about its synchronisation and rendering. This data is logged and available for inspection in the Crowdrender panel. You can view it by clicking the speech bubble icon next to a node.

If you like, you could try saving the output for the nodes that won't sync and sending it to us so we can take a look :).

Other things to look out for would be using different versions of Blender and/or Crowdrender across machines; this tends to cause sync failures.

2. Performance - ok, those are some pretty decent rigs you have, by the way. When it comes to balancing, it might help to think of rendering as not just one operation. There are actually a few phases, and they differ with respect to how the hardware is used and how best to optimise them.

Starting background process - Crowdrender starts a separate process to render your file. This means that Blender and all your addons must be loaded into memory on each computer. We've had some folk try some exotic things like hosting Blender or their addons in a remote folder on their network. This does have a purpose: it means that all nodes will always use the same version of the addons and Blender, which is a key requirement for things working. If you use this approach, then the speed of your network is a key factor in the time each frame takes to render, since we use a new process for each frame that is rendered. I am guessing this might not apply to you though. If you install Blender locally on each node, then starting the background process is really fast and probably won't differ that much between your nodes, unless you have lightning-fast SSDs in some and old platter drives in others; that might make a difference.
Loading your project - Once the process for rendering starts, it immediately loads your project file. This phase of the render can be impacted by the size of your file and whether your nodes all have similar speeds for disk access/read. What may be a major factor is if you decide to share things like textures/models/sim cache from your local node. These files can be significant and since the local already has them it doesn't have to wait to transfer them across the network, so all the other nodes on your network will have a time delay until they even start rendering since they have to transfer the files from the local node first.
Synchronising the render kernel - The next stage after loading the project is when the render engine begins to process your scene and load its representation of your scene into memory prior to starting to draw pixels. The speed of this phase mainly depends on how fast your CPU and RAM are, and a little on the speed at which you can transfer data to your GPU (if you're using GPU rendering). One thing that can impact this phase is not having enough system RAM to complete the sync. If your system gets short on RAM it will start paging to the hard drive, using it as a much slower type of RAM so it can continue the calculations without crashing. It really hurts performance if this happens. You seem to have plenty of system RAM though, so this might not apply to you.
ACTUALLY RENDERING!!! - When the render engine starts drawing, the time taken is mainly down to the speed of your render device. This is the part of the rendering by which you likely judge how fast each system is. It's not the whole story though, especially for renders where the drawing phase is short compared to the other parts. If that is the case, then the other phases of rendering might start to dominate, and they will shift the optimal load balancing values for each node.
Collecting the render tiles - Once the render finishes, the finished tiles of the frame are collected from each render node. The time taken for this process can vary since it depends on the speed of your network, the size of the tiles, and the compression codec used. You can change this codec, see below. The default is lossless but can result in tiles of over 100MB; the more layers and render passes you choose, the bigger these tiles get and the longer they take. Since the local doesn't need to do this part, it has a speed advantage. Choosing a lossy codec can reduce that advantage a bit.
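To make the effect of those fixed costs concrete, here is a rough sketch. It is not Crowdrender's balancing algorithm; it is a toy model that assumes each node's frame time is a fixed overhead (process start, project load, sync, tile collection) plus its share of the drawing work divided by its render speed. The `Node` class, the `speed` and `overhead` values and the `work` units are all hypothetical numbers chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float     # hypothetical: work units drawn per second
    overhead: float  # hypothetical: fixed seconds per frame (start, load, sync, collect)


def frame_time(nodes, shares, work):
    """Wall-clock time for one frame: the slowest participating node decides."""
    return max(
        n.overhead + s * work / n.speed
        for n, s in zip(nodes, shares)
        if s > 0
    )


def balanced_shares(nodes, work):
    """Shares that make every participating node finish at the same moment."""
    active = list(nodes)
    while True:
        total_speed = sum(n.speed for n in active)
        finish = (work + sum(n.speed * n.overhead for n in active)) / total_speed
        # A node whose overhead alone reaches the finish time cannot help.
        too_slow = [n for n in active if n.overhead >= finish]
        if not too_slow:
            break
        active = [n for n in active if n not in too_slow]
    return [
        n.speed * (finish - n.overhead) / work if n in active else 0.0
        for n in nodes
    ]


# Made-up farm: a fast local machine with little overhead, and two remote
# nodes that must first receive assets and later send their tiles back.
nodes = [
    Node("local", speed=10.0, overhead=2.0),
    Node("node1", speed=8.0, overhead=10.0),
    Node("node2", speed=8.0, overhead=10.0),
]

for work in (240.0, 40.0):  # a long render and a short one
    shares = balanced_shares(nodes, work)
    print(work, [round(s, 2) for s in shares], round(frame_time(nodes, shares, work), 2))
```

In this model a short render gives the remote nodes a share of zero, because their overhead alone takes longer than the local machine needs for the whole frame. That is one way a farm can end up slower than a single machine even when every GPU is fast.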

In summary, optimising a render requires understanding the data flow and the hardware involved. Happy to discuss further if all of that didn't help. It was rather a lot, looking back. But that's the nature of the beast :)

Pro

Dec 29, 2020

Thank you for such a detailed reply! I made a handful of changes to my setup based on your response.

Swapped in slightly faster SSDs I had lying around on the two 3700x nodes (I've ordered NVMe M.2 SSDs to install later)

Checked XMP was enabled on all my machines' RAM (I actually missed one, so lucky I checked)
Moved all of my models, textures and plugins onto a NAS so each node is referencing assets from the same location. I'm still setting up each node to save the CR files locally. Would it make sense to have them share the same network location?

Installed the latest CR version

I also ran plenty of animation frames to get the load balance adjusted and the last render saw some pretty remarkable gains. (Last Render on Local @ 03:43 compared to 01:11 with CR).

I'm still getting the CRMAIN.handle_disc_req error when I close my remote nodes, but I only seem to be getting sync failures on my first attempts. I'll save any logs if the previous problems come back.

{"time_logged": 1609240090, "time_stamp": "Tue Dec 29 11:08:10 2020", "message": "Info: node status updated: connected:", "level": 6}

{"time_logged": 1609240091, "time_stamp": "Tue Dec 29 11:08:11 2020", "message": "Info: node status updated: sync failed", "level": 6}

{"time_logged": 1609240091, "time_stamp": "Tue Dec 29 11:08:11 2020", "message": "Info: node status updated: repairing...", "level": 6}

{"time_logged": 1609240092, "time_stamp": "Tue Dec 29 11:08:12 2020", "message": "Info: node status updated: sync failed", "level": 6}

Thanks again!

James Crowther

Dec 29, 2020

@Pro Right on man! That's some wonderful news. Keen to hear more. I'm pretty passionate about helping guys like you squeeze every last drop of performance out of your rigs, so keen to help.

Sync errors: if you can share any log data from the nodes that captures them, I'm very interested to see it and see if we can iron it out. Those log entries show a sync fail but don't report why, so that's something I've got to figure out. If there's a part of your scene that is different, the data block and its properties that differ should be logged. So that one's on me. Glad it's repairing it, and remember that the resync button can be used to force upload the scene again.

The CRMAIN.handle_disc_err_req error is interesting, a connect error suggests something had trouble contacting our servers. Any chance you can zip your logs from your client machine and send them to us? You probably want to do that via e-mail, info at crowdrender dot com dot au :)

Whatever you do, DON'T set up CR to save its cr folder to a network location; it will break your setup in strange and hard to debug ways! We've been through the pain of diagnosing this before. Those folders are meant to be unique to each node, so saving them on a network drive will cause havoc, and also slow down CR as you'll be double hopping a lot of data between your NAS, each node and your client node.

Let me know how things go, and if you decide to zip up those logs :)

Cheers!

Pro

Dec 28, 2020

I did some tests with and without CPU and it was faster with it selected on all my nodes. I've had success speeding things up by deleting app data on my nodes and reinstalling the plugin, but syncing is temperamental and frequently fails. I'm getting this error when I close Blender on each of the nodes:

WARNING ; CRmain.handle_disc_req: ; got a response that wasn't in the correct format :connect_error

I'm still convinced that I'm not getting the most out of the farm. Currently I'm shaving 12 seconds off a 60 second render by balancing around 80, 10, 10. Admittedly, my local machine has an RTX 3090 and more cores, but I'd expect something closer to 60, 20, 20 at the very least.

Are there any other potential bottlenecks I could look into?

James Crowther

Dec 28, 2020

Hi, ok, one thing that would be helpful to know is whether you're using CUDA, OptiX or CPU rendering.

The large difference on the local should not be occurring, since Crowdrender uses Blender in background mode to render, which is usually the same speed if not a little quicker.

One possibility is that if you are rendering with a GPU, you could have the CPU active as well. This tends to lag on some systems.

I've personally seen renders last a lot longer when using CPU, particularly if you use large render tile sizes. A CPU can get a tile that is highly detailed, and spend a long time finishing it, the GPU might have already finished and the CPU is where the delay is.

So I'd highly recommend turning off the CPU in Crowdrender's settings for each node if you experienced this lag trying to use the GPU. Let me know if that has the desired effect?
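The straggler effect described above can be illustrated with a toy simulation. This is not how Cycles schedules tiles internally; it assumes a simple greedy rule where whichever device becomes free first takes the next tile. The device speeds and tile costs are made-up numbers, and `makespan` is a name invented for this sketch.

```python
import heapq


def makespan(tile_costs, device_speeds):
    """Time until the last tile finishes under greedy tile assignment."""
    # Heap of (time the device becomes free, device index).
    free_at = [(0.0, i) for i in range(len(device_speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for cost in tile_costs:
        start, device = heapq.heappop(free_at)  # earliest free device
        done = start + cost / device_speeds[device]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, device))
    return finish


GPU, CPU = 10.0, 1.0          # hypothetical speeds in work units per second
large_tiles = [10.0] * 4      # same total work split two ways
small_tiles = [1.0] * 40

print("large tiles, GPU only :", makespan(large_tiles, [GPU]))
print("large tiles, GPU + CPU:", makespan(large_tiles, [GPU, CPU]))
print("small tiles, GPU + CPU:", makespan(small_tiles, [GPU, CPU]))
```

With these numbers the GPU alone finishes the four large tiles in 4 seconds, but once the CPU grabs one of them the frame takes 10 seconds, because everything waits on that single slow tile. Smaller tiles limit how long the CPU can hold up the frame.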

James