Data is increasingly the lifeblood of all public and private enterprises.
Despite this, only 10% of enterprise data (the structured portion) is easily processed and analyzed for the benefit of the business. Structured data is by nature organized: it has predefined formats and lives within database applications and environments specifically designed to derive insights from it.
The previous data cycle for enterprises revolved around managing the business primarily through this structured data: business intelligence, ERP software, project and work management, financial information, HR management software, and so on. Growth of structured data is comparatively modest, at around 12% per year according to Gartner.
The problem is that, according to MIT, the remaining 90% of data is unstructured: file data that is not organized in a predefined manner. It is also by far the most rapidly expanding segment of data growth; research firm IDC predicts that the volume of unstructured data will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025, or 175 billion terabytes!
This explosion of unstructured data makes it difficult for enterprises to understand what they have and, more importantly, to find and extract the value held within their digital assets. Instead of an asset that fully contributes value to the enterprise, under-utilized unstructured data becomes a rapidly expanding cost center and a drag on the business.
Not only is the amount of unstructured data growing exponentially, but so is the complexity and number of point solutions needed to manage, protect, and leverage this data. This is made even more difficult because the files are not in one centralized repository, but typically scattered across silos of otherwise incompatible storage types and locations, making global access and control of even a single enterprise’s data extremely difficult.
Artificial intelligence and machine learning engines, large compute clusters, and end-user applications are now distributed across edge devices, data centers, and different cloud vendor offerings as well, so matching these next-gen applications with all the distributed and siloed data they need is the primary technology challenge of our era.
This is precisely the problem space that Hammerspace’s Data Orchestration System solves.
The Next Data Cycle is Happening Now
With the rise of artificial intelligence (AI) and machine learning (ML) technologies, the enterprises and public sector organizations that can effectively bridge these multiple types of storage silos to exploit this growing volume of unstructured data will transform their businesses in the Next Data Cycle. It marks an evolution in how all data, especially unstructured data, is accessed, analyzed, and activated.
For example, companies like NVIDIA are revolutionizing AI, deep learning, and data analytics with dramatic processing acceleration, achieving unprecedented performance with large data models in high-performance computing. At the other end of the spectrum, the growth of applications like Zoom reflects the new normal of access and collaboration for distributed workforces. But meeting the current challenge requires both the ability to process extreme volumes of unstructured data and a distributed model of global access, and these sit at opposite ends of the spectrum.
Traditionally, it has been assumed that unstructured data must be collected into a central repository, or "data lake," to enable the next steps of pre-processing, analysis, and use. Because this typically involves acquiring new storage devices and performing difficult data migrations, it often puts the hand-brake on data projects, adding cost and inertia; the resulting "data gravity" limits such initiatives.
But what if unstructured data could be accessed with standards-based file access globally across silos of multi-vendor storage types, on-prem, in the cloud, and across multiple locations? Effectively defying data gravity by creating a virtual data lake of unstructured data without the pain or added expense of migrating it all somewhere else?
This is how unstructured data orchestration can bring to the world of siloed file and object stores what structured data has enjoyed for years: an orchestration system that seamlessly bridges storage silos from any vendor, across clouds and multiple geographic locations. Doing so gives AI and ML models, along with other applications, direct global access to unstructured data wherever it resides today and wherever the models run, without migrating data into a new repository or requiring proprietary clients to access it.
“The industry has long been focused on the complexity of storing and accessing massive quantities of unstructured data,” said Senior Strategist Randy Kerns of Futurum Group. “With the technologies of today and requirements for data, the urgent need in the latest data cycle is the orchestration of a data pipeline of unstructured data to analytics for AI and machine learning applications, enabling speed and agility for enterprises.”
Hammerspace is accelerating the Next Data Cycle by delivering a standards-based, high-performance data orchestration system for unstructured data, designed to provide the unified access and control that has been commonplace for years with structured data.
Hammerspace technology makes unstructured data a global resource, so teams can access, activate, and monetize their data wherever it is, bridging multi-vendor storage silos, clouds, and locations without the need to migrate data into a new repository. It enables organizations to automate data orchestration without interrupting user access and to get the best utilization of their existing resources, leveraging standards-based, non-proprietary file access to data that may reside on any storage, in any cloud, or in one or more data centers anywhere.
Today’s announcement of our first institutional funding round of $56.7M accelerates the Next Data Cycle and expands our ability to support our innovative customers and help companies find opportunities in their unstructured data.