Have you had a chance to listen to our recent webinar on Building a Global Data Environment for Decentralized Scientific Computing?
Scientific research organizations face a significant challenge as they try to accommodate a globally dispersed workforce, myriad edge devices, and applications that may be spread across multiple data centers and the cloud. If left unattended, this data and resource sprawl can slow innovation, reduce the accuracy of results, and drive up costs.
Defining a Global Data Environment
In today’s global economy, data needs to be accessed from many places by many different users, each with different needs. The challenge is how to effectively bridge those users and applications with distributed storage resources.
This is the inspiration for Hammerspace software, which enables customers to create a Global Data Environment that unifies access to and management of data that may be distributed on any storage platform from any vendor, across one or more locations, including the cloud. Hammerspace does this by bridging otherwise incompatible storage types and locations with a high-performance Parallel Global File System. This means users can get direct file access to their data across storage silos without shuffling copies between locations, and applications and processes such as machine learning can also seamlessly access data globally where it lives today, as well as where it may need to move tomorrow.
Historically, file systems have been embedded at the infrastructure level within storage platforms. The problem is that when all or part of a dataset needs to migrate to another storage type or location, new copies of the files and their metadata are created and sent to another, often incompatible, file system on a different vendor's storage platform. The result is that IT administrators must wrangle multiple manual processes and point solutions to orchestrate file copies, and users have to navigate multiple file systems via different mount points, or shares, to find their files. This fragments the data collection into silos.
The result is increased complexity, which reduces user productivity and adds to IT operational expenses. That complexity is compounded when organizations have remote access requirements or need to move data to the cloud. IT teams continually struggle to get control of the resulting proliferation of copies, point solutions, and data protection strategies just to deal with this problem in siloed environments.
Hammerspace software fixes this issue by rapidly assimilating file system metadata without needing to move the data off of existing storage. By elevating the file system above the infrastructure layer into a high-performance Parallel Global File System that can span multiple vendor storage types and locations, Hammerspace enables users to access their files globally via standard file protocols exactly as they are used to regardless of where they are, where the data is today, or where it moves to tomorrow.
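To make the idea of separating the file path from the physical storage location more concrete, here is a minimal Python sketch. It is purely illustrative and is not Hammerspace's implementation or API: the `GlobalNamespace` class, its method names, and the storage identifiers are all hypothetical. It assumes only what is described above, namely that metadata can be recorded without copying data and that a file's path stays the same when its data is placed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Metadata for one file; the data itself stays on the backing storage."""
    size_bytes: int
    location: str  # hypothetical storage identifier, e.g. "nas-boston"


class GlobalNamespace:
    """Toy metadata layer: one path space spanning many storage systems."""

    def __init__(self):
        self._records = {}

    def assimilate(self, path, size_bytes, location):
        # Only metadata is recorded here; no file data is copied.
        self._records[path] = FileRecord(size_bytes, location)

    def locate(self, path):
        return self._records[path].location

    def move(self, path, new_location):
        # The data's placement changes, but the user-visible path does not.
        self._records[path].location = new_location


ns = GlobalNamespace()
ns.assimilate("/genomics/run42/reads.fastq", 5_000_000_000, "nas-boston")
ns.move("/genomics/run42/reads.fastq", "cloud-object-store")
print(ns.locate("/genomics/run42/reads.fastq"))
```

In this toy model, a user keeps opening the same path before and after the move; only the metadata record changes.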
Moreover, administrators now have global control of all their storage resources from a single pane of glass, enabling them to automate a rich collection of data services across all files on any storage type or location at a file-granular level. These services include cross-platform tiering, data placement, data protection, and many other services that may be automated with metadata-driven, objectives-based policies. This cross-platform control also reduces or eliminates the expense of siloed, single-purpose gateways and other point solutions that try to address these issues.
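As a rough illustration of what a metadata-driven, objectives-based policy could look like, the following Python sketch picks a storage tier for each file from its metadata alone. Everything in it is an assumption made for illustration: the `FileMeta` fields, the tier names, the 90-day threshold, and the first-match rule ordering are hypothetical and do not describe Hammerspace's actual policy language.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    path: str
    days_since_access: int
    tags: frozenset


# Each objective pairs a metadata predicate with a desired placement.
# Tier names and thresholds are illustrative assumptions.
OBJECTIVES = [
    (lambda f: "active-study" in f.tags, "fast-nvme"),
    (lambda f: f.days_since_access > 90, "cloud-archive"),
]
DEFAULT_TIER = "standard-nas"


def desired_tier(meta):
    """Return the tier of the first objective whose predicate matches."""
    for predicate, tier in OBJECTIVES:
        if predicate(meta):
            return tier
    return DEFAULT_TIER


old_file = FileMeta("/archive/2019/plate7.tiff", 400, frozenset())
print(desired_tier(old_file))
```

Because each decision is made per file from metadata, a policy like this applies at file granularity regardless of which storage system currently holds the data.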
This seamless global reach is critical for life science organizations conducting research in the field – sometimes in extremely remote locations without the infrastructure and connectivity found in traditional laboratories. Since metadata is lightweight, it enables users even in remote locations to access a global file system from anywhere without needing to shuffle copies of large datasets to their location.
Designing IT Equitably to Enable Research Anywhere
Laura Boykin Okalebo is a TED Senior Fellow and Senior Scientific Consultant at BioTeam, an IT consultancy working to close the gap between scientists and technology.
“One of the things I care most about is being an equity designer,” Boykin Okalebo explained. Generating massive amounts of data off the grid on edge devices like genomic sequencers is a real challenge for scientists operating in extreme and remote environments like sub-Saharan Africa. These machines don’t work well where electric power and internet connectivity are limited, if not entirely non-existent.
Boykin Okalebo noted that global accessibility and collaboration are the only way forward, and that IT teams need to take on the role of equity designers. “A hybrid solution for data sharing and collaboration is the only globally inclusive way forward at the moment,” she said. “All of us can do better designing things equitably for our colleagues around the world to solve the things we’re trying to solve. We can all do a better job of thinking more globally and designing things more equitably.”
What Boykin Okalebo described is an extreme example of a common issue many life sciences organizations face – managing and classifying massive amounts of data spread across devices, users, and locations. Over time, it’s inevitable that the data will need to be moved, often between incompatible storage types and locations. Silos often emerge, which results in fragmented access by users, especially when they are also distributed across multiple locations.
By elevating the metadata out of the infrastructure, Hammerspace enables all users to have direct file access, everywhere. From a user perspective, the data is there as it always has been, whether you’re a scientist examining crop disease on a remote African farm or working in a research lab at MIT.
Leveraging Hybrid Cloud Solutions to Accelerate Time to Results
Brad Winett is president of TrackIt, an Amazon Web Services Advanced Consulting Partner specializing in cloud management, consulting, and software development solutions. He explained that there has always been the problem of getting both the data and the compute to where they need to be. This is where the cloud can play a key role in supporting a geographically dispersed or siloed infrastructure.
Drawing on his work with different life sciences organizations, Winett said that it is increasingly common for them to manage large teams distributed across many continents. Sure, it’s easier to have your instruments, storage, and people all located in the same data center, but that’s simply not possible anymore. “Take advantage of resources and tools where you need them,” he said. “When you think about ways to share data, you need the cloud to accomplish it because you can’t put it on USBs or use flash drives, etc. to accomplish your goals.”
You can view the on-demand webinar here. For those interested in learning more about the Hammerspace solution and how you can create your own global data environment, we’ll soon be hosting a second webinar offering a deeper dive into our solution.
Learn More
Contact us to set up a meeting and learn more about how Hammerspace can help you drive business value from your data and accomplish your digital transformation goals.