Unlock the Power of Unstructured Data with Dell EMC PowerScale
As organizations look to transform their businesses with AI and machine learning, a crucial first step is identifying the right storage solution for their unstructured data. This article explores Gartner’s highest-ranked network-attached storage system, Dell EMC PowerScale, and common use cases.
A massive amount of data, roughly 2.5 quintillion bytes, is produced every day. While it’s difficult to imagine such a quantity, it’s not too surprising.
Consider your own data-generating habits: sending texts, conducting Google searches, posting on social media, placing online orders and so on. You probably produce more data in a day than you realize.
Not all data is the same. There are two main categories: structured and unstructured. Structured data has predefined data types, which make it easy to search and analyze (e.g., fixed fields in an airline reservation). Unstructured data, essentially everything else, is qualitative data that does not follow a predefined data model. A few examples include photos, videos, audio, emails and Microsoft Office documents. Unstructured data accounts for as much as 90 percent of all data and continues to grow at a rate of 55 to 65 percent each year due to the rise of the Internet of Things (IoT).
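To put the 55 to 65 percent annual growth figure in perspective, here is a minimal sketch of the compound-growth arithmetic. The 100 TB starting volume and the function names are illustrative assumptions, not figures from any vendor or analyst.

```python
import math


def project_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth at a fixed annual rate."""
    return current_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the volume to double at a fixed annual rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, for illustration only
    for rate in (0.55, 0.65):  # growth range quoted in the article
        in_three_years = project_volume(start_tb, rate, 3)
        print(
            f"{rate:.0%} growth: {in_three_years:.0f} TB after 3 years, "
            f"doubling roughly every {doubling_time_years(rate):.1f} years"
        )
```

At either end of that range, the volume doubles in well under two years, which is why scale-out capacity matters for planning.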
Many organizations are focused on finding ways to better access and leverage their unstructured data. Whereas analyzing structured data is as simple as running a report within a relational database, unstructured big data requires artificial intelligence (AI) tools to extract deep, meaningful insights and a high-performance storage solution to meet availability needs.
Why network-attached storage (NAS) for unstructured data
Network-attached storage (NAS) is file-level storage attached to a network that provides data access to a heterogeneous group of clients. Unstructured data is a natural fit for NAS because it’s typically in file format. NAS can read and write files quickly and doesn’t require separate file servers, making it ideal for storing unstructured big data with high-performance computing requirements. Shared file access not only enables collaboration among users but is also crucial for “mission-critical” data that must remain readily available.
NAS also provides dedicated, centralized storage for unstructured data, which keeps management simple no matter how large the data environment becomes. Organizations can scale out capacity and performance as needed to prevent bottlenecks and improve overall storage performance.
Dell EMC PowerScale: Gartner’s highest-ranked NAS system
Dell EMC PowerScale is the industry’s No. 1 family of scale-out network-attached storage platforms for high-volume storage, backup and archiving of unstructured data. It is a leader in Gartner’s Magic Quadrant for Distributed File Systems and Object Storage (2019). PowerScale is available in three options: all-flash, hybrid scale-out NAS and archive scale-out NAS. Our team at WWT recommends PowerScale for the following benefits.
Streamlined management and seamless scalability
Powered by Dell EMC’s OneFS operating system, PowerScale delivers a single file system, single-volume architecture that makes it easy for organizations to manage their data storage under one namespace. Organizations can “scale out” with PowerScale by adding nodes, up to 252 per system, in a matter of minutes without downtime or migration. Once a new node is added, PowerScale automatically rebalances data among the nodes and performs deduplication to deliver up to 80 percent utilization.
Gain cost efficiencies
PowerScale allows organizations to reduce costs by utilizing a policy-based approach for inactive data. Based on a threshold set by the organization, PowerScale automatically moves inactive data to more cost-effective storage.
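The idea of a threshold-based policy for inactive data can be sketched in a few lines. This is an illustrative model only, not the OneFS policy engine or its API: the `FileRecord` type, the `select_inactive` function and the 90-day threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FileRecord:
    """Hypothetical record describing one stored file."""
    path: str
    last_accessed: datetime
    size_bytes: int


def select_inactive(
    files: List[FileRecord], now: datetime, threshold_days: int
) -> List[FileRecord]:
    """Return files not accessed within the last `threshold_days` days."""
    cutoff = now - timedelta(days=threshold_days)
    return [f for f in files if f.last_accessed < cutoff]


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    files = [
        FileRecord("/projects/report.docx", now - timedelta(days=5), 120_000),
        FileRecord("/archive/scan_2019.tif", now - timedelta(days=400), 85_000_000),
    ]
    # Assumed policy for illustration: anything untouched for 90 days moves
    # to a cheaper tier.
    for f in select_inactive(files, now, threshold_days=90):
        print(f"would move to lower-cost tier: {f.path}")
```

In a real deployment the threshold and the target tier are set by the organization; the sketch only shows the selection step.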
OneFS integrates with several industry-standard protocols, including Hadoop Distributed File System (HDFS). Organizations can take a scale-out data lake approach in which their Hadoop data can be used across applications, eliminating the need to manually move data around to generate business analytics.
Increased data protection and security
OneFS includes FlexProtect, a data protection technology that allows storage administrators to protect specific files with higher protection levels than others based on data sensitivity. OneFS also enables data-at-rest encryption (DARE) for tightened security against potential data loss.
Best use cases for PowerScale
Determining the right storage solution can be difficult. Organizations must consider several factors to identify the most efficient and cost-effective option. While every organization’s situation varies, our team finds that PowerScale is typically best suited for the following use cases.
Every organization has some form of file-sharing data. This is essentially any text, program or directory data created by users, including Excel spreadsheets, PDFs, Word documents and PowerPoints. While individually these documents may not be very large, data can quickly add up with a large user base. PowerScale is a good option to reduce data management overhead while ensuring users (employees, students, etc.) can quickly access what they need.
OneFS supports Isilon Swift, an object storage interface, which allows organizations to access file-based data stored on a PowerScale cluster as objects. An object typically includes data, metadata and a unique identifier. Examples of appropriate object data to store on PowerScale might include:
- Healthcare content: Picture archiving and communication system (PACS) imaging is ideal for PowerScale because healthcare workers must be able to quickly load images — x-rays, MRIs, CT scans, etc. — to evaluate and treat patients.
- Call recordings: Many retailers and financial institutions record customer support calls. Depending on the length of a call, these files can be massive. If a problem or complaint arises, organizations must be able to pull and review the support call in question.
- Research: As the most traditional form of unstructured data, this is typically the use case customers think of first. Scientific and academic research is a good fit for PowerScale because it produces large amounts of data, sometimes over the course of several months or years. PowerScale can easily scale to make room for new data being collected while remaining readily available to researchers.
- Social media: Platforms like Facebook, Twitter and Instagram have millions of users uploading pictures, videos and audio daily. Similar to research, PowerScale is a good storage option for these platforms because it can easily scale out to keep up with the increasing amount of data while allowing users to instantly access content.
- Media and entertainment: Creating full-production videos, movies and broadcast segments requires storing large amounts of footage — sometimes years’ worth. Storing this data in PowerScale enables ease of use and availability for the film crew and editing team to work simultaneously.
- Surveillance: Surveillance footage is only as valuable as a user’s ability to quickly access it. For instance, if a retailer experienced a burglary, it must be able to immediately view the surveillance footage during the time period of the crime. Other examples include video footage from police body cameras and interviews; security cameras for hospitals, government buildings and schools; aerial video footage; and in-car security system monitoring.
- Data analytics: Many organizations are conducting intensive customer analytics to drive business growth. This requires high-performance storage to effectively leverage this information in a timely manner. For example, some retailers are using facial recognition when customers enter their stores to deliver instant offers and discounts via their app.
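The object model mentioned above (data, metadata and a unique identifier) can be illustrated with a small in-memory sketch. This is a toy model for explanation, not the Isilon Swift interface; `StoredObject`, `ObjectStore` and their methods are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class StoredObject:
    """An object bundles the payload, descriptive metadata and a unique ID."""
    data: bytes
    metadata: Dict[str, str]
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ObjectStore:
    """Toy flat store: objects are looked up by ID, not by directory path."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, data: bytes, metadata: Dict[str, str]) -> str:
        obj = StoredObject(data=data, metadata=dict(metadata))
        self._objects[obj.object_id] = obj
        return obj.object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]


if __name__ == "__main__":
    store = ObjectStore()
    # Placeholder payload and metadata, for illustration only.
    image_id = store.put(b"...image bytes...", {"type": "x-ray", "dept": "radiology"})
    retrieved = store.get(image_id)
    print(retrieved.object_id, retrieved.metadata)
```

The metadata travels with the payload, which is what lets content such as medical images or call recordings be located and described without relying on a folder hierarchy.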
Remote office/edge storage
Dell Technologies offers IsilonSD Edge, which is built specifically to support unstructured data needs at edge locations, such as remote or branch offices. Running on standard x86 servers in a VMware environment, IsilonSD Edge can scale up to 36 TB across six nodes to improve performance, reduce costs, shrink the storage footprint and simplify data management.
Many organizations use public cloud storage to reduce costs and on-site footprint. The CloudPools feature in PowerScale OneFS allows tiering of data to public and private cloud storage. Organizations can also deploy PowerScale within the public cloud to run a fully cloud-based, scale-out NAS while retaining PowerScale’s functionality. This can be a cost-effective approach for organizations wanting to use public cloud services to conduct data analytics and AI.
Another option is creating a hybrid architecture in which PowerScale is implemented on-premises and data is backed up to the public cloud for disaster recovery (DR) purposes.
Archive data is any data that an organization must store or maintain for future reference or compliance reasons. PowerScale is a good fit for archive data that must remain readily available because of its utilization rate of up to 80 percent. For example, a healthcare organization might be required to store patient records to maintain HIPAA compliance, or a law enforcement agency might store body camera footage to meet department policy.
Start leveraging your unstructured data
As organizations look to transform their businesses with AI and machine learning, there’s no doubt that unstructured data — and how to store it — will be at the center of that conversation. Leveraging our Advanced Technology Center (ATC), our team can demonstrate PowerScale’s vast capabilities and provide proofs of concept (POCs) to help determine if PowerScale is the right fit for your organization.