Proposal for an alternate Load Balancer algorithm for Crowdrender based on a multicolored checkerboard
Introduction
Crowdrender is an addon for Blender 2.8 that distributes the render process across multiple computers by assigning areas of the render image (tiles) to each computer. The goal is to speed up the render process overall. The assumption is that the render load can be distributed efficiently and finish faster in total than on a single computer.
The distributed process involves several steps and latencies (bottlenecks); a rough code sketch follows this list:
* The Load Balancer uploads a .blend file to each computer – this adds time to the overall process due to network bandwidth and latency
* The Load Balancer coordinates tiles for every computer – each computer gets some tiles assigned to render, based on a certain metric or assumption
* Upon finishing its task, each computer uploads the rendered tiles to the Load Balancer
* The Load Balancer composes the individual tiles to form the final rendered image
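To make these steps concrete, here is a minimal sketch of the pipeline in Python; `Node`, `assign()` and `compose()` are hypothetical stand-ins for Crowdrender internals, not its actual API:

```python
# Minimal sketch of the four steps above. Node is a stand-in for a
# remote Crowdrender machine; assign() and compose() are hypothetical.
class Node:
    def upload_blend(self, blend_file):
        ...  # step 1: transfer the .blend file over the network

    def render_tile(self, tile):
        ...  # step 3: render one tile and send the pixels back

def distributed_render(blend_file, nodes, tiles, assign, compose):
    for node in nodes:
        node.upload_blend(blend_file)          # step 1: upload to each node
    assignments = assign(tiles, nodes)         # step 2: coordinate tiles up front
    rendered = [(tile, node.render_tile(tile)) # step 3: render and collect
                for node, batch in assignments.items()
                for tile in batch]
    return compose(rendered)                   # step 4: final composition
```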
Implications
It turns out that the Load Balancer (LB) has no good way of determining or guessing how to distribute the tiles and is therefore inefficient.
A heuristic approach would not improve the overall efficiency either, since we would also have to account for the time needed to train the LB.
Also, each scene and each tile has a different complexity and renders with different efficiency on each computer's CPU/GPU.
In short, we are dealing with a typical case of a bounded touring problem: computing an optimal tile assignment up front is computationally intractable.
Current example
Assume we have 2 or more computers involved in the distributed rendering.
Each computer would get roughly 50% of the image. These two workloads would finish at very different times, so one of the two nodes ends up idling while the other is still rendering, and the overall rendering time can be worse than on a single computer.
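A tiny worked example (the per-tile timings below are made-up assumptions, since node speeds are not known in advance) shows how the static split can backfire:

```python
# Two nodes, 16 tiles, static 50/50 split. Per-tile render times are
# illustrative assumptions, not measurements.
fast_secs_per_tile = 10
slow_secs_per_tile = 30
tiles_each = 8

fast_finish = tiles_each * fast_secs_per_tile   # 80 s, then the node idles
slow_finish = tiles_each * slow_secs_per_tile   # 240 s
makespan = max(fast_finish, slow_finish)        # 240 s in total

# The fast node alone would render all 16 tiles in 160 s, so the
# static split is actually slower than not distributing at all.
print(fast_finish, slow_finish, makespan)
```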
Proposal
Use a multicolored checkerboard algorithm where every tile can potentially be dispatched to a different computer.
Split the image into many tiles (4, 8, 16, …) and dispatch each tile to an idle computer.
After a tile has been dispatched, rendered and sent back, the LB simply sends the next tile in line to any idle node. After all tiles have been sent back, the LB does the final composition.
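A minimal sketch of this greedy dispatch, assuming a blocking `render_tile(node, tile)` call that stands in for the real Crowdrender transport:

```python
import queue
import threading

def checkerboard_dispatch(tiles, nodes, render_tile):
    """Greedy pull-based dispatch: every idle node takes the next tile.

    render_tile(node, tile) is a hypothetical blocking call; tiles are
    assumed hashable (e.g. (x, y) grid coordinates).
    """
    todo = queue.Queue()
    for tile in tiles:
        todo.put(tile)

    results = {}
    lock = threading.Lock()

    def worker(node):
        while True:
            try:
                tile = todo.get_nowait()     # take the next tile in line
            except queue.Empty:
                return                       # no tiles left: this node is done
            rendered = render_tile(node, tile)
            with lock:
                results[tile] = rendered     # collect for composition

    threads = [threading.Thread(target=worker, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results                           # ready for final composition
```

Because each node pulls a new tile as soon as it finishes the previous one, fast nodes automatically render more tiles, and no node idles while work remains.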
Efficiency
The LB would not need to determine performance characteristics for the tile distribution. Network latencies and CPU/GPU characteristics would average out over the course of the network rendering process.
Cheers
Hello, I am wondering whether it would be easy, for rendering animations (as a workaround for network and security issues), to just implement the following proposed hack with G-Drive or an equivalent, for instance with PyDrive2:
Blender Network Rendering (Hack) (English)
https://www.youtube.com/watch?v=i4RXzQgQlGE
PyDrive2 is a wrapper library of google-api-python-client that simplifies many common Google Drive API V2 tasks.
https://pypi.org/project/PyDrive2/
If you were able to run the other stations in 'headless' mode, it should be much faster, but it would require a 'free-standing' node that is listening and gets invoked by the master station. Hoping your progress continues, as I am a little bit stuck at the moment.
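For what it's worth, a rough sketch of what the Drive hand-off could look like with PyDrive2 (file titles like 'scene.blend' and 'frame_0001.png' are just placeholders):

```python
# Master side: upload the scene file to Drive.
from pydrive2.auth import GoogleAuth
from pydrive2.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()   # needs a client_secrets.json; opens a browser for OAuth
drive = GoogleDrive(gauth)

scene = drive.CreateFile({'title': 'scene.blend'})
scene.SetContentFile('scene.blend')
scene.Upload()

# Render node side: poll for the scene, fetch it, render headless, upload.
found = drive.ListFile({'q': "title = 'scene.blend' and trashed = false"}).GetList()
if found:
    found[0].GetContentFile('scene.blend')
    # Headless render, e.g.:
    #   blender --background scene.blend --render-output //frame_ --render-frame 1
    result = drive.CreateFile({'title': 'frame_0001.png'})
    result.SetContentFile('frame_0001.png')
    result.Upload()
```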
Thanks for all this Martin, really great to see this contribution from you :D. I've replied to your other post about this; in short, your suggestions are not too different from what we're planning on building. I say planning because most of our activities right now are absorbed, somewhat frustratingly (for the community and us), with trying to get enough funding to support new development.
I won't go on and on about that, but we started a dev fund in December last year which looked like it was going to grow real quick and help us fund a dev team the size we need to deliver a brand new system that would fix this and other issues, but then 2020 'happened' (good material for a meme, I guess).
The funding growth is still there, but it is real slow. So we're distracted right now with how to 'get there', with 'there' being a sustainable project. So far we've raised enough to cover all costs of our operations, and then we have a teensy amount on top of that for development, which powers keeping the current system current as Blender continues to evolve. It also helps with responding to issues people find with the current system. All this means new patch releases are happening right now; a new one will drop soon.
New stuff is being worked on too, but it's painfully slow. However, this is not your problem, it's ours. We're the ones who need to find new funding avenues to build our stuff. And that is what we're spending a big chunk of our time on.