3 Data Storage Paradigms That Have Changed in a Decentralized World 

Enterprises are increasingly decentralized 

Data, IT systems, and the workforce are spread across different geographies, and organizations need solutions to bridge the gap between where data is stored and the people, applications, and computers that need to use it.

These decentralization trends have led to a shift in how organizations are assessing data storage and data management solutions in three key areas. 

Paradigm 1: Scale Requirements 

In the past: Scale used to be measured by capacity in a single rack or the number of nodes in a single namespace.

Buyers and IT decision makers would parse through datasheets to determine the capacity of the SSDs and HDDs in a storage system, consider the storage density per rack unit in an array, and take into account the number of nodes a storage cluster could scale to as they created their decision criteria.

In today’s decentralized world: Scale needs to be measured by the variety of storage types, the number of data centers, the number of regions, and the number of clouds.

Scale is about more than just capacity management and data center space efficiency. Data creation and access must be global, so scale is increasingly also about how to create a data environment that spans multiple types of storage, multiple data centers, multiple cloud regions, and even multiple cloud vendors.

Decision criteria for selecting data management solutions now need to include how to manage data access and data storage efficiently and cost-effectively across distributed resources. Buyers and IT decision makers need to consider which data center storage types (block, file, object) are supported, as well as the variety of cloud offerings, the number of regions supported, and which cloud services the data management solution must integrate with. Most importantly, they need to ensure that users and applications are able to act locally within a globally distributed infrastructure, and to do so in a way that does not burden IT administrators with manual processes or proprietary vendor lock-in, both of which hamper agility and add cost.
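
As a rough illustration, those criteria can be captured in a simple checklist. The sketch below is hypothetical (the solution name, fields, and sample values are placeholders, not any vendor's actual capabilities), but it shows how protocol coverage, cloud coverage, region count, and lock-in can be evaluated together rather than one datasheet line at a time.

```python
from dataclasses import dataclass, field

# Illustrative only: the fields and sample values below are assumptions,
# not any vendor's actual spec sheet.

@dataclass
class CandidateSolution:
    name: str
    protocols: set = field(default_factory=set)   # e.g. {"block", "file", "object"}
    clouds: set = field(default_factory=set)      # e.g. {"aws", "azure", "gcp"}
    regions: int = 0                              # number of cloud regions supported
    vendor_lock_in: bool = True                   # proprietary clients, stubs, gateways, etc.

def meets_criteria(c, required_protocols, required_clouds, min_regions):
    """True if the candidate covers the required protocols and clouds,
    supports enough regions, and avoids proprietary lock-in."""
    return (required_protocols <= c.protocols
            and required_clouds <= c.clouds
            and c.regions >= min_regions
            and not c.vendor_lock_in)

candidate = CandidateSolution("example-dms", {"file", "object"}, {"aws", "azure"}, 12, False)
print(meets_criteria(candidate, {"file", "object"}, {"aws", "azure"}, 10))  # True
```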

Paradigm 2: Performance Requirements 

In the past: Performance was measured only by local IOPS and I/O streaming throughput.

Performance was focused on raw IOPS and I/O throughput between applications and local storage residing in close proximity within the data center. Extensive benchmarks were run, and systems were carefully optimized to increase these point-to-point metrics, which often meant adding specialized networking and proprietary client software.

In today’s decentralized world: Performance needs to be measured by how fast a remote user or application can view a data set, how fast they can move the required data to a new geography, and how effectively latency can be minimized so they can work with that data.

Performance considerations now must also include how efficiently users can get their jobs done from remote work environments, especially when data is distributed across multiple data silos, clouds, and a variety of vendors’ technologies.

Data users and applications must be able to view and access all the data they have permissions for in real time, regardless of where they or the data are located.

Once the data needed for a job is identified, users need local access to just those files, seamlessly, and without having to move duplicate copies of stale data everywhere. Doing so with minimal friction for users and minimal complexity for IT staff is itself a measure of workforce productivity.
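
To make that idea concrete, here is a minimal sketch of on-demand access, as opposed to copying entire datasets to every location. It assumes a hypothetical remote_read() callable standing in for whatever transport the environment provides; only the files a user actually opens are pulled to local storage.

```python
import os

# A minimal sketch of on-demand access: only files a user actually opens are
# pulled to local storage. remote_read() is a placeholder, not a real API.

LOCAL_CACHE = "/tmp/local_cache"

def open_locally(remote_path, remote_read):
    """Return a local path for the requested file, fetching it only if it is
    not already cached. Files nobody asks for never move."""
    local_path = os.path.join(LOCAL_CACHE, remote_path.lstrip("/"))
    if not os.path.exists(local_path):
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        with open(local_path, "wb") as f:
            f.write(remote_read(remote_path))  # fetch just this one file
    return local_path

# Example with a fake remote that just returns bytes:
print(open_locally("/projects/design/part01.cad", lambda p: b"...file contents..."))
```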

Lastly, the ability for distributed teams to collaborate as effectively as if they were local is the final new measure of performance success. Synchronizing changes in a shared global data environment that all designated resources can access ensures teams can design, analyze, and produce results collaboratively from wherever they are working.

Paradigm 3: Data Service Requirements 

In the past: Data services were designed to work within a single datacenter and system.

Data services were typically built into proprietary software in the vendor’s environment. Snapshots could only be restored to that same vendor’s environment. The more storage types a data center needed, the more incompatible systems IT staff had to manage. Point solutions, third-party tiering software, stubs, symbolic links, gateways, and other proprietary techniques were needed to manage them all.

Not only did this add significant complexity for IT staff, who had to manage otherwise incompatible systems with many separate tools, but replication of data across tiers, and especially across locations, resulted in the proliferation of multiple file copies, which wasted resources and added risk and confusion.

IT staff were faced with either going all in with a single storage vendor, in the hope of having a single integrated system, even though this cost more and limited future storage choices, or doing their own integration of multiple vendor solutions and then dealing with the added OPEX and complexity of tying them all together.

Either way, it limited choices, added costs, and adversely affected end-user productivity.

In today’s decentralized world: Data services need to be set once and apply globally across the distributed environment. 

IT teams are hungry for a solution that manages both their data and their storage choices at a global level. They need to know what datasets the organization has, where the files are located, how to limit the proliferation of copies, who has access to the data, how it is being protected, and so on. They need to ensure that data is placed in the performance tiers that users and applications require, while also ensuring that data in higher performance tiers is not sitting idle and wasting expensive resources.
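
As one illustration of that last point, the sketch below shows a simple idle-data rule over a hypothetical file catalog that tracks last-access timestamps and tier labels. The threshold and tier names are assumptions for the example, not a recommendation.

```python
import time

# A minimal idle-data sketch over a hypothetical file catalog. The threshold
# and tier names are illustrative assumptions.

DEMOTE_AFTER_DAYS = 30

def plan_tier_moves(catalog, now=None):
    """List (path, from_tier, to_tier) moves: files idle on the performance
    tier for longer than the threshold are demoted to a capacity tier,
    freeing expensive resources without deleting anything."""
    now = now or time.time()
    cutoff = now - DEMOTE_AFTER_DAYS * 86400
    return [(e["path"], "performance", "capacity")
            for e in catalog
            if e["tier"] == "performance" and e["last_access"] < cutoff]

catalog = [
    {"path": "/proj/a/model.bin", "tier": "performance", "last_access": time.time() - 45 * 86400},
    {"path": "/proj/b/data.csv",  "tier": "performance", "last_access": time.time() - 2 * 86400},
]
print(plan_tier_moves(catalog))  # only the idle file is flagged for demotion
```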

Users in siloed environments with distributed resources often spend too much time trying to find their files or jumping between systems, to the point that some analysts reckon the average knowledge worker spends about half of their time gathering data into the appropriate workspace before they can even become productive working with it.

IT teams also must manage data sprawl. When storage costs grow beyond budget, organizations become desperate for technology to manage copy proliferation and orphaned data lost somewhere in their storage systems. They need a way to unify control with a data-centric approach, one that seamlessly bridges incompatible point solutions and eliminates unnecessary copies and duplicated processes, so users anywhere can simply work with their data as though it were local rather than waste time wrangling it.

Data services that operate on a global level make it easy to set policies on all data while managing storage resources through a single pane of glass, regardless of the underlying storage platform.

Data services that are agnostic to the storage solution and offer file-granular policies make it possible to apply data protection and disaster recovery processes across all storage types and locations globally. Snapshots can become global rather than siloed. Deduplication, WORM, encryption, versioning, undelete, and other data recovery and ransomware mitigation strategies need to be applied globally based upon the value of the data, not piecemeal with techniques that vary depending on what the underlying storage supports.
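
For a sense of what file-granular, storage-agnostic policies might look like, here is a minimal "set once, apply globally" sketch in which protection is keyed to the value (classification) of the data rather than to a storage system. The class, fields, and policy table are illustrative assumptions, not any product's API.

```python
from dataclasses import dataclass

# Illustrative only: protection is tied to the classification of the data,
# not to any particular storage system. Names and values are assumptions.

@dataclass
class ProtectionPolicy:
    snapshots_per_day: int
    worm: bool          # write-once-read-many retention
    encrypted: bool
    versioning: bool

POLICIES = {
    "regulated": ProtectionPolicy(snapshots_per_day=24, worm=True,  encrypted=True,  versioning=True),
    "project":   ProtectionPolicy(snapshots_per_day=4,  worm=False, encrypted=True,  versioning=True),
    "scratch":   ProtectionPolicy(snapshots_per_day=0,  worm=False, encrypted=False, versioning=False),
}

def policy_for(file_metadata):
    """Look up protection by classification, regardless of whether the file
    lives on block, file, or object storage, on premises or in any cloud."""
    return POLICIES[file_metadata["classification"]]

print(policy_for({"path": "/finance/ledger.db", "classification": "regulated"}))
```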

Conclusion 

We are rapidly arriving at a reality in which enterprises must not only reconcile the inefficiencies of manually bridging incompatible storage silos in a single data center, but increasingly must also find a way to cost-effectively manage globally distributed data resources across multiple locations and a distributed workforce. Organizations will need to alter how they assess their data and IT strategies in order to design forward for this decentralized world.

Molly Presley, SVP of Marketing

Floyd Christofferson, VP of Product Marketing