Hammerspace
Dictionary of Terms
Active Archive
- An active archive is the strategy for managing data that is too valuable for a company to discard or store in an offline state, but which is accessed only occasionally. It is the concept of ensuring that any data is available for users to access directly, without IT intervention.
- Active archive solutions include a wide range of technologies and vendors. Some are simply HSM solutions that present a file system view of the data via a NAS share but move the file essence to tape or some other lower-cost storage type. These often use file stubs or symbolic links to redirect users or applications to wherever the file essence has moved.
- Symbolic links and file stubs are highly proprietary and tend to lock customers into the storage platforms that require them.
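To see why stubs create lock-in, consider this small Python sketch of the mechanism (illustrative only, not any vendor's implementation): the file essence is moved to a cheaper tier and a symbolic link is left in its place, so the original path keeps working only as long as the link target survives.

```python
import os
import tempfile

# Hypothetical sketch of an HSM-style "stub": the file's contents move to
# cheap storage; a symlink is left behind so the original path still resolves.
tier1 = tempfile.mkdtemp(prefix="tier1_")   # expensive, fast storage
tier3 = tempfile.mkdtemp(prefix="tier3_")   # cheap, slow storage

original = os.path.join(tier1, "report.dat")
with open(original, "w") as f:
    f.write("file essence")

# "Migrate" the essence downstream, then leave a stub in its place.
migrated = os.path.join(tier3, "report.dat")
os.rename(original, migrated)
os.symlink(migrated, original)

# Reads through the stub still work transparently...
with open(original) as f:
    assert f.read() == "file essence"

# ...but the stub is just a pointer: if the archive path changes or the
# software that maintains the links goes away, the data appears lost.
print(os.path.realpath(original))  # resolves into the tier3 directory
```

The fragility is the point: the user-visible namespace now depends on a pointer that only the migrating software understands.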
HSM (Hierarchical Storage Management)
- Hierarchical storage management (HSM) is a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as solid-state drive arrays, are more expensive (per byte stored) than slower devices, such as hard disk drives, optical discs and magnetic tape drives.
- HSM is a long-established concept, dating back to the beginnings of commercial data processing. HSM solutions typically use symbolic links, file stubs, or other proprietary techniques to leave digital breadcrumbs on the expensive high-performance storage while moving the file essence downstream to cheaper storage. This creates significant vendor lock-in and limits data mobility.
- The remaining major HSM solutions, such as DMF from HPE, HPSS from IBM, IBM Spectrum Protect (formerly Tivoli, used with IBM Spectrum Scale), Quantum StorNext Storage Manager, and Versity, all employ such techniques.
- Traditional HSMs use file age to determine whether a file should be moved downstream.
- Hammerspace can perform HSM-like functions on a file-granular basis without the need for proprietary symbolic links. In addition, it can leverage not only file age but multiple other metadata types, including user-created metadata tags, and can perform file-level data orchestration with a level of intelligence that traditional HSMs simply cannot match.
- In this way, a Hammerspace Global Data Environment provides, as part of its basic capabilities, far greater granularity in placing data on the appropriate storage of any type, without the downsides of proprietary symbolic links, stubs, or limited metadata triggers.
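To make the contrast concrete, here is a toy Python policy engine (the field names and tags are hypothetical, not a Hammerspace API): an age-only rule of the kind a traditional HSM applies, versus a rule that also consults user-created metadata tags.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: contrasting age-only HSM logic with
# metadata-driven placement. Field names and tags are invented.
@dataclass
class FileRecord:
    path: str
    age_days: float
    size_bytes: int
    tags: set = field(default_factory=set)

def hsm_tier(f: FileRecord) -> str:
    # Traditional HSM: file age is the only trigger.
    return "archive" if f.age_days > 60 else "performance"

def metadata_tier(f: FileRecord) -> str:
    # Metadata-driven placement: age is one input among several.
    if "keep-on-flash" in f.tags:
        return "performance"
    if "cold-project" in f.tags or f.age_days > 60:
        return "archive"
    return "performance"

f = FileRecord("scratch/frame0001.exr", age_days=90,
               size_bytes=40 << 20, tags={"keep-on-flash"})
print(hsm_tier(f), metadata_tier(f))  # age alone would archive this file
```

The example file is 90 days old, so the age-only rule would demote it even though a user tag marks it as active, which is exactly the failure mode richer metadata avoids.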
Tiered Storage
- Storage tiering is the practice of allocating data to different storage mediums to match performance with cost, resulting in a lower overall total cost of ownership (TCO). Storage managers often manage three tiers of data. Tier 0 or 1 is the highest performance tier and holds data that is often needed immediately, such as transactional data, or for applications requiring low latency. These are typically flash storage or high performance NAS or SAN systems. Think of this as the important documents sitting atop your desk at work.
- Tier 2 is data that requires less immediate or frequent access than Tier 1, yet is still vital to stakeholders, such as last month’s financial data. These may be lower cost/performing NAS. Think of this data as the folders in the drawers of your desk.
- Tier 3 and 4 data is used as the archive. This data may need to be accessed occasionally, but must be protected and stored for long periods of time. Storage in this tier may be object storage, tape, optical media or other lower cost per TB and lower performing storage. Think of this data as the data in your filing cabinet.
- Active archive solutions, and HSMs are often spoken of synonymously with Tiered storage solutions.
- The problem in most data environments is that typically 80% of the data sitting on the most expensive tier 0 or tier 1 storage has not been accessed for 60 days. And according to multiple studies, if it has not been accessed in 60 days, it is unlikely that it will ever be accessed.
- But most storage environments have great difficulty in identifying which data can be moved downstream to less expensive tiers.
- And even fewer can do so without interrupting user access.
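The identification problem above can be approximated with a short script. This Python sketch (assuming file access times are reliable, which `noatime` mounts break) walks a directory tree and flags files not accessed in 60 days as candidates for a lower tier; real tiering engines track access patterns far more carefully.

```python
import os
import time

# Minimal sketch: find files whose last access is older than the cutoff.
CUTOFF_DAYS = 60

def tiering_candidates(root: str, cutoff_days: int = CUTOFF_DAYS):
    cutoff = time.time() - cutoff_days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    yield path
            except OSError:
                pass  # file vanished or unreadable; skip it

# Example: list cold files under the current directory.
cold = list(tiering_candidates("."))
```

Even this naive scan shows why the problem is hard at scale: the scan itself touches every file's metadata, which is expensive when that metadata is trapped inside each storage silo.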
How big is a petabyte?
- Byte = 8 binary digits (bits) = one character
- Kilobyte = 1000 bytes = a paragraph of text
- Megabyte = 1000 kilobytes = about a minute of MP3 audio
- Gigabyte = 1000 megabytes = roughly an hour of HD video
- Terabyte = 1000 gigabytes = 100+ full-length HD films
- Petabyte = 1000 terabytes = 80 million filing cabinets full of text
- Exabyte = 1000 petabytes = five exabytes could hold all words ever spoken by mankind
- Zettabyte = 1000 exabytes = one million city-block-sized data centers full of terabyte hard drives
- Yottabyte = 1000 zettabytes = all information, images, video, and content ever created or posted on the entire world wide web
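As a quick sanity check, the decimal ladder above is easy to express in code. This Python sketch converts between the units; note that these are the decimal (SI) units storage vendors quote, while operating systems often report binary units (1 KiB = 1024 bytes), which is why a "1 TB" drive shows up as roughly 0.91 TiB.

```python
# Each unit in the ladder is 1000x the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def to_bytes(value: float, unit: str) -> float:
    """Convert a value in the given decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)

# A petabyte is a million gigabytes:
assert to_bytes(1, "PB") == to_bytes(1_000_000, "GB")
print(f'1 PB = {to_bytes(1, "PB"):.0e} bytes')  # 1e+15 bytes
```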
What is shared storage?
- In a shared storage environment, multiple users or applications can simultaneously access data from a storage device. This means that two separate workstations could access files from a share (or folder) on a network drive.
- NAS and SAN solutions are the simplest implementation of Shared Storage. Often these solutions, however, are limited to a proprietary file system, and can result in multiple incompatible data silos. In such cases, users must use different file shares to access data across the different vendor storage types.
- Hammerspace bridges incompatible data silos with a parallel global file system, by which users are able to access shared storage across any storage type from any vendor, and indeed across multiple locations and clouds. This is a Global Data Environment.
Data Silos (also, Storage Silo, Information Silo)
- A data silo is typically a vendor-specific storage system in which one information system or subsystem is incapable of seamless sharing of information with others. Thus data is not adequately shared but rather remains sequestered within each system or subsystem, figuratively trapped within a container like grain is trapped within a silo: there may be much of it, and it may be stacked quite high and freely available within those limits, but it has no effect outside those limits.
- Such data silos are proving to be an obstacle for businesses, creating operational complexity for administrators who must manage each independently, often with vendor specific point solutions.
- Data or storage silos also inhibit user productivity, forcing them to look for their data across multiple different platforms and shares.
Unstructured Data
- Unstructured data refers to information and files that are not housed in a traditional database. Unstructured data accounts for 80% of enterprise data, and is the fastest growing data type.
- These are typically seen by users and applications as individual files. They could be anything from office documents, to genome sequences, video files, weather images, etc.
Structured Data (data model)
- Structured data is the type of data that is housed within a database, as opposed to loose files such as might be accessible by users or applications in a file system.
- Information put into a CRM or online bank portal, for example, may include many data fragments, but users must go through that structured environment to access the data. As such, database environments such as Oracle, SAP, etc. tend to be end-to-end solutions that include all storage and compute resources.
- Structured data environments are not cost effective for managing large volumes of files such as described in Unstructured Data.
File copy management
- File copy management (also called copy data management) describes the management of copies of information outside the primary systems. Copy data can be used for data protection purposes, such as snapshots or backups, or to seed test and development environments for ongoing application development. Other uses include testing upgrades of existing applications and using the data for mining or analytics.
NAS
- Network-attached storage (NAS) is a file-level (as opposed to block-level storage) computer data storage server connected to a computer network providing data access to a heterogeneous group of clients.
- NAS is specialized for serving files either by its hardware, software, or configuration. It is often manufactured as a computer appliance – a purpose-built specialized computer.
- NAS systems are networked appliances that contain one or more storage drives, often arranged into logical, redundant storage containers or RAID.
- Network-attached storage removes the responsibility of file serving from other servers on the network. NAS systems typically provide access to files using network file sharing protocols such as NFS or SMB/CIFS.
NFS
- NFS is a file-system protocol that allows a client (such as a server or users) to access files from a storage device over a network. It’s a common way for data to be accessed by users and applications in Linux and VMware environments.
CIFS
- CIFS (Common Internet File System) is a standard connectivity protocol that lets computers and users share files across networks. CIFS is viewed as a complement to existing internet application protocols such as HTTP, and is widely used in Windows environments. CIFS is an early dialect of SMB.
SMB
- Server Message Block (SMB) operates as an application-layer network protocol mainly used for providing shared access to files, printers, and serial ports and miscellaneous communications between nodes on a network. Most usage of SMB involves computers running Microsoft Windows or MacOS.
Global Namespace
- A Global Namespace is a heterogeneous, enterprise-wide abstraction of all file information, accessible via standard network protocols such as SMB, NFS, etc.
- A Global Namespace has the unique ability to aggregate disparate and remote network-based file systems, providing a consolidated view that can greatly reduce complexities of localized file management and administration.
- Hammerspace, with its Global File System, incorporates but also goes beyond traditional Global Namespace capabilities: it bridges not only individual storage silos, but also provides direct file system capabilities across multiple locations and clouds.
Global Data Environment
- A Global Data Environment (GDE) is defined as a solution to provide users and applications with the experience of ‘local’ access to data that may be stored across widely distributed storage types/locations, while at the same time providing global control for data services transparently across them all at the infrastructure level.
- This could include multiple otherwise incompatible storage silos in a data center, or perhaps across multiple data centers, and even may include one or more Cloud storage offerings.
- A GDE directly addresses the problem experienced by most organizations where users and applications need the experience of local, read/write access to all of their file data across all silos and locations based upon their permissions. And IT managers equally need the ability to manage all their storage resources globally, without interrupting access to users and applications, and without being overwhelmed with the complexity of silo-based point solutions for data services.
Parallel Global File System – (Metadata Control Plane)
- Hammerspace is unique in the industry with the ability to bridge any vendor storage type and location in a high-performance parallel global file system.
- Traditionally a file system is limited to an individual storage platform, and is a method and data structure that the operating system uses to control how data is stored and retrieved.
- Without a file system, data placed in a storage medium would be one large body of bytes with no way to tell where one piece of data stopped and the next began, or where any piece of data was located when it was time to retrieve it.
- By separating the data into pieces and giving each piece a name, the data is easily isolated and identified.
- Taking its name from the way a paper-based data management system is named, each group of data is called a “file.” The structure and logic rules used to manage the groups of data and their names is called a “file system.”
- The traditional file system structure is captured in its metadata, or information about the underlying file data that is housed on the storage device. This metadata on traditional storage platforms is housed in the individual storage device, or in a metadata controller, and is the window through which users and applications can access the files.
- The problem with traditional file systems is that they are siloed by default. All user or application action must access each individual storage device via its file system metadata.
- Any data movement or actions that involve services that must bridge across other storage locations require going through many such siloed views via the file system metadata on each device.
- This siloed architecture in traditional storage platforms is rooted in concepts designed decades ago, when data volumes were much smaller and contained within a single array. It creates serious limitations for modern environments whose data volumes exceed the capacity of a single system and, in many cases, extend beyond a single site.
- Hammerspace overcomes these limitations by creating a parallel global file system that can span not only multiple otherwise incompatible storage platforms from any vendor, but can also be extended to span multiple locations with a single, common metadata control plane or layer.
- So instead of the file system metadata being trapped in each individual storage device, Hammerspace globally assimilates the metadata from all devices, elevating it above the physical layer.
- Data does not need to move to do this, and remains on the storage platform as before. But when users or applications access those files, they are doing so now via this global file system layer that spans all of the storage silos and locations.
- Because metadata is lightweight, this ability to separate the metadata from the data path means that data movement can be orchestrated directly with the underlying storage to achieve linear scalable performance in the data path.
- This enables a level of parallelism across multiple storage resources that would be impossible with the metadata trapped at the individual storage platform level.
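The assimilation step described above can be shown with a toy sketch (purely illustrative Python, not Hammerspace internals): per-silo file metadata is merged into a single global namespace while the file data itself stays where it is.

```python
# Toy illustration of metadata assimilation. The silo contents and the
# dict-based "namespace" are invented for the example.
silo_a = {"/projects/a/run1.dat": {"size": 10, "silo": "nas-vendor-1"}}
silo_b = {"/projects/b/run2.dat": {"size": 20, "silo": "object-vendor-2"}}

global_namespace = {}
for silo in (silo_a, silo_b):
    global_namespace.update(silo)   # metadata only; no file data moves

# A client resolves a path through the global layer, which records where
# the data actually lives; reads then go direct to that storage.
entry = global_namespace["/projects/b/run2.dat"]
print(entry["silo"])  # object-vendor-2
```

Because only the lightweight metadata is lifted into the shared layer, lookups are global while the heavy data path stays parallel and direct to each storage system.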
Universal Data Access Layer
- In traditional storage devices there are specific file access protocols that they support, such as NFS, SMB, S3, etc.
- Certain storage devices enable users to access via both NFS and SMB. But if an S3 object storage platform is added, for example, it often breaks the ability in a siloed environment to provide users with a common way to access all their data across all storage devices.
- Hammerspace solves this by providing a multi-protocol Universal Data Access Layer that can span any backend storage type.
- Even though the backend storage may only support NFS, SMB, or S3 protocols, users and applications can access all of them via the Global File System using the standard file protocols NFS, pNFS, and SMB, individually or in mixed-protocol mode.
- All data may be viewed via any protocol, regardless of the protocols supported by the underlying storage resources.
Data Management
- Data management is a term used in the industry for many things. It may refer to an HSM that automatically moves stale data downstream to tape.
- It may also refer to data lifecycle management (DLM) or document management systems.
- All of these usages are correct, but they are limited in scope.
- An HSM might manage data placement based upon file age, but has no capability to apply file-level granularity based upon other file attributes, or to bridge incompatible storage types or locations.
- A document management or DLM system may provide users with a catalog of the data, but typically has no ability to orchestrate data across storage platforms and locations, or manage data services.
- Hammerspace is unique in bridging both of these worlds with the ability to manage data by intent through service-level objectives down to a file-granular basis.
- This is also a powerful by-product of the parallel global file system, which enables file level management based upon filesystem metadata as well as user-created custom metadata. But then it also includes the capability to implement those data policies across any storage type or location.
Data Orchestration
- A powerful by-product of the parallel global file system is the ability to implement data movement policies across any storage type based upon service-level objectives.
- In this way, file system metadata and/or user-created custom metadata tags may be referenced to automate data movement to facilitate distributed workflows, or align with data protection or retention policy objectives, and so on.
- This can be done on a file-granular basis, and applied locally, globally, or whatever combination is required for the specific use case.
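A minimal sketch of this idea in Python (the rule names, fields, and objectives are hypothetical, not a real Hammerspace API): declarative rules match on file metadata and map each file to placement objectives.

```python
from dataclasses import dataclass

# Hypothetical sketch of objective-driven orchestration: rules match on
# file metadata and yield objectives such as "replicate to the DR site".
@dataclass
class File:
    name: str
    project: str
    modified_days_ago: int

RULES = [
    # (predicate, objective) pairs, evaluated in order for each file.
    (lambda f: f.name.endswith(".raw") and f.modified_days_ago < 7,
     "place-on-nvme"),
    (lambda f: f.project == "trial-2024",
     "replicate-to-dr-site"),
    (lambda f: f.modified_days_ago > 60,
     "move-to-object-tier"),
]

def objectives_for(f: File):
    return [obj for pred, obj in RULES if pred(f)]

f = File("sample.raw", project="trial-2024", modified_days_ago=2)
print(objectives_for(f))  # ['place-on-nvme', 'replicate-to-dr-site']
```

The point of the sketch is the declarative shape: administrators state intent per file attribute, and the engine, not the user, decides which copies move where.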
Data Protection
- Modern data protection requires mitigating multiple threats to data. Hammerspace provides multiple mechanisms to protect data from disasters and attacks through immutable snapshots, undelete, WORM (Write Once Read Many), and file versioning, which provide comprehensive, layered protection to ensure data availability.
- In addition, file copy management may be automated, to ensure global DR policies are applied across files living on any storage type and location.
- In this way, Hammerspace provides a rich palette of data protection capabilities that IT administrators may apply across all data on any storage type globally.
- No longer do IT staff need to wrangle multiple point solutions that are specific to a particular storage type.
Disaster Recovery
- Hammerspace reinvents the data management component of planning for disaster recovery.
- If a site or system goes down, no failover or failback procedure is needed as the same global file system metadata exists in multiple locations providing continuous online access.
- Users and applications can be automatically and transparently redirected to the alternative location and quickly continue operations without needing to re-point applications, or mount different shares.
- In the event of an outage at one site, the global file system will still be available to users and applications to view, and data is still accessible as long as there is an instantiation of the data in an alternate location or DR site.
Cross-platform Data Services
- Hammerspace provides file-granular global data services across all local and remote storage resources, leveraging its policy engine or on-demand capabilities.
- File-granular services give individual files, or sets of files, the ability to be managed by policies for any metadata attribute including file names, creation dates, modify times, file types, in addition to custom metadata tags.
- Hammerspace global data services enable companies to manage their digital business assets globally in ways that were previously impractical, or even impossible, due to price and performance challenges.
- Because these data services can be applied globally across all storage resources and locations, the implementation of global control via the Hammerspace GDE eliminates the need for IT organizations to manage multiple point solutions to migrate, protect, or perform other functions, as is typically the case in siloed environments today.
Distributed workforce
- The way businesses work today has profoundly changed, with many companies no longer requiring their employees to work from office locations. However, providing remote access for employees to have a unified view of all an organization’s network shares is extremely challenging, as data is typically stored in multiple data silos in legacy data center systems.
- Hammerspace makes network shares visible and accessible to anyone, anywhere, as though the user were sitting next to local storage in the data center. It does this using its metadata-based global file system, and invokes file-granular replication to move remote users' files geographically closer to them when needed.
Cloud Tiering
- Cloud tiering extends the concept of Tiering beyond the local data center to include off site storage, such as public cloud.
- So instead of simply moving data between flash, NAS, object storage and/or tape in the datacenter, Cloud Tiering enables data to move by policy to public cloud.
- Some solutions do this with a gateway concept, where a point solution acts as a data mover to bridge on-premises data to the cloud.
Burst for capacity
- This is a concept related to Cloud tiering, whereby a data center administrator may need to stage data or burst to the Cloud because local on-premises storage resources are filling up.
- Burst for capacity is typically a temporary staging to accommodate an influx of data that exceeds normal day-to-day data volumes.
Burst for Compute
- As with Burst for Capacity, Burst for Compute is a strategy for dealing with situations in which application workflows exceed the local compute resources in a data center. Typically this involves moving workloads to cloud-based compute environments for the life of a project, or until an expansion of on-premises infrastructure can be completed.
Hybrid Cloud NAS
- Hybrid Cloud NAS means the ability for a company to support file access across a distributed data environment that may include on-premises storage resources with cloud-based storage.
- Because Hammerspace can span both on-premises and cloud storage with a single parallel global file system, this means that IT organizations can provide the types of NAS capabilities previously only possible at a local level within single storage systems.
Multi-Cloud NAS
- As with Hybrid-Cloud NAS, Hammerspace can provide NAS-like file services to distributed environments where data may be distributed across multiple cloud regions, or even across multiple vendor cloud solutions.
Multi-Datacenter NAS
- As with Multi-Cloud NAS, Hammerspace can bridge incompatible storage silos within a single data center, and can also provide that same universal view and NAS-type access to data distributed across multiple data centers.