
Guest Blog: Multi-Region Rendering with Deadline and Hammerspace

Over the past few years, visual effects (VFX) studios have become more global, opening satellite studios or hiring artists who work from home all around the world.

Whether the reasons are location-specific financial incentives, access to local talent, or co-location with a production team, a render farm in a single location is no longer ideal for a workforce that spans the globe.

AWS Thinkbox Deadline is a render manager that provides a wide range of compute management options, allowing customers to easily access any combination of on-premises, hybrid, or cloud-based resources for rendering. Deadline ships with multiple plugins that let customers customize it for their specific workflows. The Spot Event Plugin (SEP) manages a cloud-based render farm and automatically scales it based on the number of tasks in the render queue. The plugin can dynamically launch Amazon EC2 Spot Instances as needed, then shut them down after they have been idle for a certain period.

The most recent release of the SEP unlocked the power of multi-Region rendering, allowing a single Deadline repository to launch render nodes in any of the 32 (and growing) AWS Regions. With multi-Region rendering, customers can operate a follow-the-sun workflow, optimize costs by taking advantage of Region-specific pricing, and meet sustainability goals by sending render workloads to Regions that use more renewable energy.

However, distributed rendering comes with its own challenges. The data used by a render node must be available in the Region where the rendering takes place, and coordinating data movement across multiple locations can be complicated. A global file system can help address these challenges.

Today, AWS is excited to announce a new, open-source Hammerspace event plugin for Deadline that helps customers use the Hammerspace global file system with Deadline multi-Region rendering to solve this data movement problem.

What is a global file system?

Figure 1: There are three locations shown on a world map, western US, eastern US, and Europe. At each location an overlay on the map shows a folder structure. Each location has the same folder structure, but different files are highlighted at different locations.

Figure 1: Global file systems make the same file and folder structure visible at all sites, but the file payload is only transferred when needed

A global file system (GFS) presents a single global namespace to multiple locations. This means that a client mounting the file system at any of the locations will see the same file and folder structure as clients in any other location. The benefit of using a GFS over syncing the data to multiple locations is that a GFS can be more efficient with the amount of data it transfers.

A GFS works by keeping a file’s metadata, such as its creation date, size, name and location, separate from the main payload of the file content (sometimes referred to as the essence data). The metadata is continuously synced between locations, while the payload is only transferred when required. Since the metadata for even large files takes up only a couple of kilobytes, using this system in production ends up requiring less data transfer than a full syncing system would.
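To make the data-transfer argument concrete, here is a small Python sketch comparing a full-sync system, which copies every payload to every other site, with a GFS that syncs metadata everywhere but moves a payload only when a remote site reads it. The 2 KB per-file metadata size, the file sizes, and the access pattern are illustrative assumptions, not measurements.

```python
# Toy comparison of bytes moved between sites: full sync vs. a GFS that
# syncs metadata eagerly and payloads only on remote access.
# All sizes and access patterns here are illustrative assumptions.

METADATA_BYTES = 2 * 1024  # assumed per-file metadata size ("a couple of kilobytes")


def full_sync_bytes(file_sizes, remote_sites):
    """Every file's full payload is copied to every other site."""
    return sum(file_sizes) * remote_sites


def gfs_bytes(file_sizes, remote_sites, remotely_read):
    """Metadata goes to every site; payload moves only for files read remotely.

    remotely_read: indices of files that are opened at one remote site each.
    """
    metadata = len(file_sizes) * METADATA_BYTES * remote_sites
    payload = sum(file_sizes[i] for i in remotely_read)
    return metadata + payload


if __name__ == "__main__":
    GB = 1024 ** 3
    sizes = [5 * GB] * 100  # 100 files of 5 GB each, two remote sites
    print(f"full sync: {full_sync_bytes(sizes, 2) / GB:.1f} GB")
    print(f"GFS:       {gfs_bytes(sizes, 2, range(10)) / GB:.1f} GB")
```

With 10 of the 100 files read at one remote site, this prints 1000.0 GB for full sync and 50.0 GB for the GFS; the metadata itself contributes only a few hundred kilobytes.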

The GFS abstracts these concepts away from the user. From an application’s point of view, it behaves the same as a local file system, so workflows built on top of local file systems need no modifications to work with a GFS.

Different global file system solutions have different mechanisms for syncing the metadata and payload. In this post we will look at how Hammerspace handles this synchronization.

Hammerspace global file system

Figure 2: Architecture diagram showing two Regions, each of which has a file system and two Hammerspace nodes. The Hammerspace DSX nodes are connected to an Amazon S3 bucket for payload transfer. The Hammerspace Anvil nodes are connected to each other for metadata transfer.

Figure 2: Hammerspace setup to sync file systems across two AWS Regions

Hammerspace uses two types of nodes for its synchronization:

  • Anvil Node: responsible for the metadata
  • Data Services Node (DSX): responsible for the payloads

Each location needs at least one Anvil node and one DSX node, although multiple instances of each are recommended for high availability. These nodes can run on EC2 instances in your cloud environment.

The Anvil nodes communicate with each other directly, but the DSX nodes use an object store such as Amazon Simple Storage Service (Amazon S3) as a staging area. When a payload needs to be synced, the DSX node at the source site uploads the data to S3 and the DSX node at the destination site downloads it.

By default, Hammerspace only syncs the metadata automatically. The payloads are kept on the site where they are created. When the file is requested at another site, the DSX nodes coordinate the payload transfer.
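The split between eager metadata sync and on-demand payload transfer can be modeled in a few lines of Python. This is a toy sketch under stated assumptions: a plain dict stands in for the S3 staging bucket, the `Site` class and its methods are hypothetical, and real Anvil and DSX nodes are far more involved.

```python
# Toy model of a GFS: metadata is replicated to every site on write
# ("Anvil" role); payloads stay at the creating site until read elsewhere,
# then move through a shared staging store ("DSX" role).
# The Site class is hypothetical; a dict stands in for the S3 bucket.


class Site:
    def __init__(self, name, staging):
        self.name = name
        self.staging = staging  # shared object store (stand-in for S3)
        self.metadata = {}      # path -> (size, name of owning site)
        self.payloads = {}      # path -> bytes held locally
        self.peers = []

    def write(self, path, data):
        # The payload stays local; metadata is pushed to every site at once.
        self.payloads[path] = data
        entry = (len(data), self.name)
        for site in [self, *self.peers]:
            site.metadata[path] = entry

    def read(self, path):
        if path not in self.metadata:
            raise FileNotFoundError(path)
        if path not in self.payloads:
            owner = self._peer(self.metadata[path][1])
            # Owner uploads to staging, this site downloads and clears it.
            self.staging[path] = owner.payloads[path]
            self.payloads[path] = self.staging.pop(path)
        return self.payloads[path]

    def _peer(self, name):
        return next(p for p in self.peers if p.name == name)


if __name__ == "__main__":
    bucket = {}
    oregon, london = Site("oregon", bucket), Site("london", bucket)
    oregon.peers, london.peers = [london], [oregon]
    oregon.write("/Movie/SEQA/A_001/00001.exr", b"pixels")
    print(london.read("/Movie/SEQA/A_001/00001.exr"))
```

After the write, London already sees the file's path and size, but the payload only crosses over when London reads it.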

Hammerspace offers the ability to set up rule-based payload syncing. If you know certain files will be accessed from certain locations, such as review material for dailies, you can use the directives feature in Hammerspace and create a rule which automatically pushes the review materials to the various locations ahead of time, making sure that the file payloads are already present when dailies start.
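A rule-based prefetch policy can be sketched as a list of path patterns mapped to the sites that should receive payloads ahead of time. The pattern, site names, and `prefetch_targets` helper are hypothetical; this does not reproduce the syntax of Hammerspace directives.

```python
from fnmatch import fnmatch

# Hypothetical rules: path pattern -> sites that should hold the payload
# before anyone asks for it (e.g. review media for dailies).
# Note: fnmatch's "*" also matches "/" characters.
PREFETCH_RULES = [
    ("/Movie/*/review/*", {"oregon", "london"}),
]


def prefetch_targets(path, rules=PREFETCH_RULES):
    """Return the set of sites that should receive this file's payload eagerly."""
    targets = set()
    for pattern, sites in rules:
        if fnmatch(path, pattern):
            targets |= sites
    return targets
```

Files that match no rule fall back to the default on-demand behavior.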

Like any distributed system, there are certain situations that require extra care. When rendering into a global file system, we need to pay attention when tasks from a single job are rendered on different sites.

Race conditions inherent to distributed rendering

Different render tasks don’t typically write to the same file, but writing to the same folder is common. If these folders are created at the same time on different sites, then at sync time there needs to be a mechanism to merge the metadata. Different merging strategies will lead to different outcomes and it is important to understand the differences.

Figure 3: Two folder structures are shown labeled Oregon and London. In both locations the folder hierarchy is Movie, SEQA, A_001. In Oregon a file named 00001.exr is shown. In London the file is called 00002.exr

Figure 3: Different frames from the same job can be rendered in different Regions

How Hammerspace handles folder collisions

When Hammerspace detects a folder collision, it presents the colliding folders as separate folders, with a site identifier appended to each folder name.

Figure 4: The same folder structure as figure 3 is shown. At each location, there is an extra folder with a similar structure but slightly different name containing the frame file from the other location. Instead of being named Movie, the top-level folders are called Movie[#S=10] in Oregon and Movie[#S=0] in London.

Figure 4: Hammerspace resolves folder collisions by separating them out by site ID

While this can be useful for determining which site created which file, it also means that the files end up in different folders, so the file paths that the studio’s pipeline tools and artists expect will not work correctly. Instead of the view above, we would like to get to this:

Figure 5: The folder structures in Oregon and London are shown to be identical. The folder structure is Movie, SEQA, A_001. In both locations, the A_001 folder contains two files named 00001.exr and 00002.exr

Figure 5: The desired view of the file system after merging multiple sites
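The difference between the outcomes in Figures 4 and 5 can be illustrated with directory trees modeled as nested Python dicts. `merge_by_site` tags colliding top-level folders with a site ID, loosely mimicking the `[#S=<id>]` naming in Figure 4, while `merge_union` recursively combines same-named folders into the single view of Figure 5. Both functions are hypothetical illustrations; Hammerspace’s actual merge logic may differ.

```python
from collections import Counter

# Trees are nested dicts: folder name -> subtree, file name -> None.


def merge_by_site(trees):
    """Keep colliding top-level folders apart by appending a site ID tag."""
    counts = Counter(name for tree in trees.values() for name in tree)
    merged = {}
    for site_id, tree in trees.items():
        for name, subtree in tree.items():
            key = f"{name}[#S={site_id}]" if counts[name] > 1 else name
            merged[key] = subtree
    return merged


def merge_union(a, b):
    """Recursively combine two trees so same-named folders become one folder.

    If both trees contain the same *file*, b's entry wins; render tasks
    normally write different files, so this case should not arise.
    """
    merged = dict(a)
    for name, subtree in b.items():
        if isinstance(merged.get(name), dict) and isinstance(subtree, dict):
            merged[name] = merge_union(merged[name], subtree)
        else:
            merged[name] = subtree
    return merged


if __name__ == "__main__":
    oregon = {"Movie": {"SEQA": {"A_001": {"00001.exr": None}}}}
    london = {"Movie": {"SEQA": {"A_001": {"00002.exr": None}}}}
    print(sorted(merge_by_site({10: oregon, 0: london})))
    print(merge_union(oregon, london))
```

The first strategy preserves provenance but splits the frames across two paths; the second produces the single `Movie/SEQA/A_001` folder that pipeline tools expect.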

How the Hammerspace plugin for Deadline works

The event plugin for Deadline works by injecting a new Deadline job that the original Deadline job becomes dependent on. This new job creates the output directories at a single location, then polls Hammerspace until the directories are synced across all render locations (or until a maximum number of polls is reached), and finally releases the original job. This centralizes folder creation and prevents the race condition.
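The directory-creation step amounts to collecting the unique parent folders of a job’s output paths and creating each one once, at a single site. The sketch below is a hypothetical Python illustration of that step, not the plugin’s code; how the plugin actually reads output paths from the Deadline job is not shown here, and the function names are made up for this example.

```python
import os


def output_directories(output_paths):
    """Unique, sorted parent directories of a job's output file paths."""
    return sorted({os.path.dirname(p) for p in output_paths if os.path.dirname(p)})


def make_output_directories(output_paths):
    """Create every output directory once, from a single site.

    exist_ok=True makes the step safe to re-run if the job is requeued.
    """
    created = []
    for directory in output_directories(output_paths):
        os.makedirs(directory, exist_ok=True)
        created.append(directory)
    return created
```

Because only this one job creates the folders, render tasks at other sites find them already present instead of racing to create their own copies.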

Figure 6: The flow diagram starts with ‘Create folders at one site’. The next block is a decision ‘Have folders been synced to all other sites?’. If yes, the flow moves to ‘Release render job’. If no, the next decision is ‘Has maximum iteration count been reached?’. If yes, go to ‘Release render job’. If no, go to ‘Wait for some time’ and then back to ‘Have folders been synced to all other sites?’

Figure 6: The flow diagram of how the Hammerspace plugin for AWS Thinkbox Deadline works
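The loop in Figure 6 can be sketched as a polling function. The `is_synced` callable is an assumption standing in for however the plugin actually checks sync status; `folders_visible`, which checks that every directory exists under each site’s mount point, is one hypothetical way to implement it. `sleep` is injectable so the loop can be tested without waiting.

```python
import os
import time


def folders_visible(site_mounts, relative_dirs):
    """Hypothetical sync check: every directory exists under every site's mount."""
    return all(
        os.path.isdir(os.path.join(mount, d))
        for mount in site_mounts
        for d in relative_dirs
    )


def wait_for_sync(is_synced, iteration_count, iteration_delay, sleep=time.sleep):
    """Poll until is_synced() is true or iteration_count checks have been made.

    Returns True if sync was confirmed, False if the limit was reached.
    Either way, the caller releases the render job afterwards (Figure 6).
    """
    for attempt in range(1, iteration_count + 1):
        if is_synced():
            return True
        if attempt < iteration_count:  # no need to wait after the final check
            sleep(iteration_delay)
    return False
```

A caller might run `wait_for_sync(lambda: folders_visible(mounts, dirs), 30, 10)` and release the dependent render job regardless of the return value, logging a warning when it is `False`.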

Here is how it looks in Deadline Monitor:

Figure 7: On the left, a single Deadline job is shown in the Deadline Monitor window. It is called ‘Render Job’. On the right, a batch job also named ‘Render Job’ is shown. It contains two jobs called ‘Render Job [Make Directories]’ and ‘Render Job’

Figure 7: The same Deadline job submission as it looks in Deadline Monitor with and without the Hammerspace plugin enabled

Setting up the event plugin

For Deadline to find the plugin, download the plugin code from its GitHub repository (linked under Further Reading) and copy it to the custom events folder in your Deadline repository. The final directory should be:

<Deadline Repository Path>/custom/events/Hammerspace
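The copy step can also be scripted. This Python sketch assumes you have already downloaded and extracted the plugin folder; `install_event_plugin` is a hypothetical helper, and any paths you pass to it are placeholders for your own download location and repository path.

```python
import shutil
from pathlib import Path


def install_event_plugin(plugin_src, repository, name="Hammerspace"):
    """Copy an extracted event plugin folder to <repository>/custom/events/<name>.

    dirs_exist_ok=True (Python 3.8+) lets you re-run this to update an
    existing install; intermediate folders are created if missing.
    """
    dest = Path(repository) / "custom" / "events" / name
    shutil.copytree(plugin_src, dest, dirs_exist_ok=True)
    return dest
```

For example, `install_event_plugin("/tmp/Hammerspace", "/mnt/DeadlineRepository")` would place the files in `/mnt/DeadlineRepository/custom/events/Hammerspace`.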

Once the plugin is copied to that location, it should show up in Deadline Monitor. First, make sure you are in Super User Mode:

Figure 8: The user is clicking on ‘Super User Mode’ in Deadline Monitor’s ‘Tools’ menu

Figure 8: Enable Super User Mode

Then go to Tools/Configure Events:

Figure 9: The user is clicking on ‘Configure Events…’ in Deadline Monitor’s ‘Tools’ menu

Figure 9: Choose “Configure Events…”

If the Hammerspace plugin was copied to the right location in the Deadline repository (see above), it should show up in the Configure Event Plugins window, and be ready to be configured. If you do not see the plugin in the window but have confirmed the plugin was copied to the right directory, try synchronizing the plugins using the Deadline Monitor Tools/“Synchronize Monitor Scripts and Plugins” menu item.

Figure 10: Deadline Monitor’s ‘Configure Event Plugins’ window is shown with ‘Hammerspace’ selected from a list of plugins. On the right, there are options to enable the plugin, place the plugin in various pool, group, limit, and priority configurations, and to set the iteration delay and iteration count for the plugin.

Figure 10: Hammerspace event plugin configuration options

Configuring the plugin

The Job Options section of the configuration allows you to set scheduling parameters for the “Create Directories” job that the plugin creates. This allows you to control the placement and priority of the directory creation job on your farm, perhaps sending it to a Deadline group which is specifically configured and sized for quick, small jobs such as this.

The Hammerspace Options section lets you fine-tune the plugin’s behavior by controlling how many times the plugin polls for the directory status across sites (Iteration Count) and how long it waits between polls (Iteration Delay, in seconds). Once the full iteration count has been reached, the render proceeds regardless of the sync status of the directories.
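Because the render is released after the last poll whether or not the directories have synced, Iteration Count and Iteration Delay together bound how long the directory job can hold back a render: at most roughly count × delay seconds, ignoring the time each poll takes. The helpers below are hypothetical sketches of that arithmetic, for example to pick a count that covers an expected sync window.

```python
import math


def max_wait_seconds(iteration_count, iteration_delay):
    """Approximate upper bound on how long the render can be held back.

    Ignores poll duration; the exact figure depends on whether the plugin
    also waits after its final poll.
    """
    return iteration_count * iteration_delay


def iterations_for_window(window_seconds, iteration_delay):
    """Smallest iteration count whose waits cover the given sync window."""
    return math.ceil(window_seconds / iteration_delay)
```

For instance, with a 20-second delay, covering a 10-minute sync window needs 30 iterations; a shorter delay detects a completed sync sooner at the cost of more polls.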

For more details on how to set up and configure the plugin, please refer to the plugin’s README.

Summary

In this post, we detailed how studios using the Hammerspace global file system can use this new open-source event plugin in a Deadline-based render farm to solve the data movement problem and enable multi-site rendering in an on-premises, hybrid cloud, or all-in AWS architecture.

If you want to flag potential issues or suggest additional features for the Hammerspace plugin for AWS Thinkbox Deadline, please reach out to Thinkbox Support via email at support@awsthinkbox.zendesk.com, or feel free to contribute to this open-source project on GitHub.

Further Reading

Information about the Hammerspace Deadline event plugin can be found here:

Hammerspace Deadline plugin

Deadline documentation can be found here:

Deadline Documentation

This blog was originally published on the AWS for M&E blog on Aug. 8, 2023 and was authored by Ehsan Shokrgozar, DJ Rahming, and Mike Owen of AWS.