Over the past few years, Visual Effects (VFX) studios have gone more global by opening satellite studios or hiring artists working from home all around the world.
Whether the reasons are location specific financial incentives, unlocking local talent, or co-locating with a production team, having a render farm in a single location is no longer ideal for a workforce that spans the globe.
AWS Thinkbox Deadline is a render manager which provides a wide range of compute management options allowing customers to easily access any combination of on-premises, hybrid or cloud-based resources for rendering. Deadline ships with multiple plugins that allow customers to customize it for their specific workflows. The Spot Event Plugin (SEP) manages and automatically scales a cloud-based render farm based on the number of tasks in the render queue. The plugin can dynamically launch Amazon EC2 Spot Instances as needed, then shut them down when they have been idle for a certain period.
The most recent release of the SEP unlocked the power of multi-Region rendering, allowing a single Deadline repository to launch render nodes in any of the 32 (and growing) AWS Regions. With multi-Region, customers can operate a follow-the-sun workflow, optimize their costs by taking advantage of Region-specific pricing, and meet their sustainability goals by sending render workloads to Regions which use more renewable energy.
However, distributed rendering comes with its own challenges. The data used by the render node needs to be available in the Region the rendering is taking place. Coordinating data movement across multiple locations can be a complicated task. A global file system can help address these challenges.
Today, AWS is excited to announce a new, open-source Hammerspace event plugin for Deadline that helps customers use the Hammerspace global file system with Deadline multi-Region rendering to solve this data movement problem.
What is a global file system?
A global file system (GFS) presents a single global namespace to multiple locations. This means that a client mounting the file system at any of the locations will see the same file and folder structure as clients in any other location. The benefit of using a GFS over syncing the data to multiple locations is that a GFS can be more efficient with the amount of data it transfers.
A GFS works by keeping a file’s metadata, such as its creation date, size, name and location, separate from the main payload of the file content (sometimes referred to as the essence data). The metadata is continuously synced between locations, while the payload is only transferred when required. Since the metadata for even large files takes up only a couple of kilobytes, using this system in production ends up requiring less data transfer than a full syncing system would.
The GFS abstracts these concepts away from the user. From an application’s point of view, the file system behaves the same as a local file system would. The end result is that the workflows built on top of local file systems will not need any modifications to work with a GFS.
Different global file system solutions have different mechanisms for syncing the metadata and payload. In this post we will look at how Hammerspace handles this synchronization.
Hammerspace global file system
Hammerspace uses two types of nodes for its synchronization:
- Anvil Node: responsible for the metadata
- Data Services Node (DSX): responsible for the payloads
Each location will need at least one Anvil and one DSX node, however multiple instances of each node is recommended for high availability. Each of these nodes can be run on EC2 instances in your cloud environment.
The Anvil nodes communicate with each other directly, but the DSX nodes use an object store like Amazon Simple Storage Service (S3) as a staging area. When payloads need to be synced, one DSX node will upload the data to S3 and the DSX at the other side will download it to its site.
By default, Hammerspace only syncs the metadata automatically. The payloads are kept on the site where they are created. When the file is requested at another site, the DSX nodes coordinate the payload transfer.
Hammerspace offers the ability to set up rule-based payload syncing. If you know certain files will be accessed from certain locations, such as review material for dailies, you can use the directives feature in Hammerspace and create a rule which automatically pushes the review materials to the various locations ahead of time, making sure that the file payloads are already present when dailies start.
Like any distributed system, there are certain situations that require extra care. When rendering into a global file system, we need to pay attention when tasks from a single job are rendered on different sites.
Race conditions inherent to distributed rendering
Different render tasks don’t typically write to the same file, but writing to the same folder is common. If these folders are created at the same time on different sites, then at sync time there needs to be a mechanism to merge the metadata. Different merging strategies will lead to different outcomes and it is important to understand the differences.
How Hammerspace handles folder collisions
When Hammerspace detects a folder collision, it presents the folder as multiple folders with the site name appended to the folder names.
How the Hammerspace plugin for Deadline works
The event plugin for Deadline works by injecting a new Deadline job that the original Deadline job becomes dependent on. This new job creates the output directories at a single location, then polls Hammerspace until the directories are safely synced across all render locations, finally releasing the original job when it’s safe to do so. This centralizes the folder creation and prevents the race condition.
Here is how it looks in Deadline Monitor:
Setting up the event plugin
In order for Deadline to find the plugin, download the code from here and copy it to the custom events folder in your Deadline repository. The final directory should be:
<Deadline Repository Path>/custom/events/Hammerspace
Once the plugin is copied to that location, it should now show up in Deadline Monitor. First make sure you are in Super User Mode:
Then go to Tools/Configure Events:
If the Hammerspace plugin was copied to the right location in the Deadline repository (see above), it should show up in the Configure Event Plugins window, and be ready to be configured. If you do not see the plugin in the window but have confirmed the plugin was copied to the right directory, try synchronizing the plugins using the Deadline Monitor Tools/“Synchronize Monitor Scripts and Plugins” menu item.
Configuring the plugin
The Job Options section of the configuration allows you to set scheduling parameters for the “Create Directories” job that the plugin creates. This allows you to control the placement and priority of the directory creation job on your farm, perhaps sending it to a Deadline group which is specifically configured and sized for quick, small jobs such as this.
The Hammerspace Options section lets you fine tune the plugin’s behavior by controlling how many times the plugin will poll for the directory status across sites (Iterations) and how long to wait between polling (Iteration Count, in seconds). Once the full iteration count has been reached, the render will proceed regardless of the sync status of the directories.
For more details on how to set up and configure the plugin, please refer to the plugin’s README.
In this post, we detailed how studios using the Hammerspace global file system, can use this new open-source event plugin in a Deadline based render farm to solve this data movement problem and enable multi-site rendering in either an on-premises, hybrid cloud, or all-in AWS architecture.
If you want to flag potential issues or suggest additional features with the Hammerspace plugin within AWS Thinkbox Deadline, please reach out to Thinkbox Support via email: firstname.lastname@example.org or feel free to contribute to this open-source project on GitHub.
Information surrounding Hammerspace Deadline event plugin can be found here:
Deadline documentation can be found here: