Article • May 23, 2019 • 8 minute read

Nutanix Files Analytics At-A-Glance (Part I)

With the release of Nutanix Files 3.5, let's discuss its analytics features and areas for possible improvement.

Reviewing the analytics features of Nutanix Files version 3.5

Nutanix recently released a new version of Files (previously Acropolis File Services, or AFS), and WWT has finally deployed it in our Advanced Technology Center (ATC). I'm excited about the new features and where they might take the product.

There's a lot to unpack about Nutanix Files, and I encourage you to read up on it or reach out for more information. This blog does not cover all of the old or new Files features. Instead, it focuses on the analytics portion of the platform, one of the biggest additions in the 3.5 release. In fact, while writing this, it became clear that I'd need to split the information into two blogs. Welcome to Part I.

Before I get going, I want to get a few things out of the way:

The analytics portion of Nutanix Files is still in early access (EA) at the time of writing. There's potential for significant changes as the product moves to general availability (GA).
Some of the statements in this blog series are my personal thoughts and hopes for the future of Files. I do not mean to imply that Nutanix will implement these suggestions for improvement.
This is not a comparison of Files vs. other products in the market.

Complexity, cost and motivation

I've historically been cautious about recommending hyper-converged infrastructure (HCI) as a platform to host large file shares of any type. Depending on the data types (structured vs. unstructured) and file types (general office files vs. large files, such as medical images), much of that data may not dedupe or compress well, which just hasn't made file serving a good use case for HCI. For end users, there were always better, simpler and cheaper options, including the typical "stay with what we have."

Putting this workload on HCI meant either using a Windows server to host the file share or running some virtual appliance as a VM to expose the functionality to end users (something like Dell's Virtual Unity). At scale, this just didn't make much sense. The complexity and costs introduced were not justified.

Another important factor for our customers is the question of motivation. Why move to a new platform? Will the pain of moving all their data be worth the additional bells and whistles, or will the value add be minimal?

Files has been a Nutanix feature for some time, and I believe release 3.5 is a critical turning point. The new Files can compete with other products on scale, cost and simplicity. It also includes some nice analytics to help end users manage everything, which should help motivate customer adoption.

As it evolves, I suspect the analytics aspect of Files will become more granular, with even more useful correlated data for end users to manipulate. Based on my conversations with the Nutanix product management team, analytics is one of their top priorities for Files, and they're doubling down on ensuring its success.

Nutanix in the ATC

After reading this blog series, if you think you'd benefit from further testing in an ecosystem that resembles your environment or use case, we recommend booking some lab time in WWT's ATC. The ATC is a collaborative ecosystem to design, build, educate, demo and deploy innovative technology products and integrated architectural solutions for our customers, partners and employees. We have a variety of Nutanix gear running, including All-Flash systems, dedicated to customer proofs-of-concept. Reach out to your WWT account manager for more details on our lab services.

Deployment

In typical Nutanix fashion, deploying Files was quite easy. One click? No, but maybe 10. A few IP addresses, some DNS entries and 20 minutes later, I was creating shares and importing data. (Note that you'll want to make sure reverse DNS lookup works.)

The analytics portion of Files is currently a separate deployment (its own VM) that is not part of the default 3.5 deployment. I fully expect Nutanix to integrate it as an option in the initial deployment walkthrough when the product reaches GA.

If you didn't catch it, the analytics function in its current form requires its own VM. Like anything else, there's no such thing as a free lunch: crawling through a lot of metadata and aggregating it into a human-readable format takes resources. The screenshot below shows the default resources required. These can be adjusted as your data grows, but they give you an idea.

Analytics function default resources required view

Additionally, once deployed, analytics opens in a new browser tab when you click its link instead of being integrated directly into Prism. I attribute the current format to EA status and expect Nutanix to integrate it directly into Prism for GA.

After the initial deployment, analytics takes a while to crawl through and scan the metadata; how long depends on how much data you have. Once that's done, the fun starts. Keep in mind that the information and screenshots below were generated in our labs and may not apply to your specific use case or "the real world," so to speak. But they should help get the idea across.

Capacity Trend

The first thing you'll see upon opening the analytics tab is a "Capacity Trend" graph. Personally, I would rename this graph. It's not really a capacity trend for the share or file server, but more an overview of how much data has been added to or removed from the file server. It doesn't relate back to how big the share is, nor does it estimate when the share will fill up based on historical data.

Regardless, this graph can help administrators visually detect anomalies if a bunch of data were added/deleted and then aid with subsequent investigation.
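The fill-date estimate described above is straightforward to approximate from historical data. As an illustration only (this is not how Files computes anything), the Python sketch below assumes you have collected one used-capacity sample per day for a share yourself; `days_until_full` is a hypothetical helper that fits a least-squares line to those samples and divides the remaining space by the daily growth rate.

```python
def days_until_full(used_history, capacity):
    """Estimate days until a share fills, from daily used-capacity samples.

    used_history: used capacity, one sample per day, oldest first (hypothetical input).
    capacity: total share capacity, in the same units as the samples.
    Returns None when usage is flat or shrinking.
    """
    n = len(used_history)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(used_history) / n
    # Ordinary least-squares slope: average growth per day.
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, used_history))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    if slope <= 0:
        return None  # not growing, so no fill date to project
    remaining = max(capacity - used_history[-1], 0)
    return remaining / slope
```

For example, with daily samples of 100, 110, 120 and 130 units against a capacity of 200, the fitted growth is 10 units per day and `days_until_full` returns `7.0`. A straight line is the simplest possible model; bursty workloads would need something more forgiving.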

Capacity Trend graphs

Hidden within this graph is the ability to drill down a bit further. A mouse-over gives you basic information, but what isn't immediately apparent is that clicking a bar gives you more detail about the types of files being added. While this feature is useful, it'd be nice to see where within a share the data is actually being created or deleted (hint: this is coming). With that view, a rogue application creating or deleting a bunch of files, possibly causing IOPS issues, would be easy to spot.

Capacity Trend details for April 28 - May 3

Another powerful feature not currently available would be the ability to drill down on a per-share basis. In its current form, the capacity trend graph shows information for the entire file server over fixed timeframes (seven days, 30 days, one year). Having this information at the per-share level could shorten time-to-resolution when an issue arises. The data is already there, which means this feature is possible. It's just a matter of time.
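A per-share view with an arbitrary time window is easy to reason about. As a rough illustration, not a description of how Files stores or queries its metadata, the Python sketch below sums the size of recently modified files under each top-level folder of a locally mounted path (for example, one folder per share). `recent_bytes_by_folder` is a hypothetical helper, and it uses modification time as a stand-in for "added," so it cannot see deletions or tell new files from edited ones.

```python
import os
import time
from collections import defaultdict


def recent_bytes_by_folder(root, window_seconds, now=None):
    """Sum the sizes of files modified within the last window_seconds,
    grouped by top-level folder under root.

    Modification time is a proxy for "added"; deleted files are invisible here.
    """
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files sitting directly under root are grouped under "."
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime >= cutoff:
                totals[top] += st.st_size
    return dict(totals)
```

Passing `3 * 86400` as the window, for instance, would cover a three-day period instead of the preset seven days, 30 days or one year.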

File distribution by type

One of the main points of interest for me is the ability to see what kinds of files are hosted on the shares and their proportion of total space. This can help an end user identify where space can be freed up quickly. Have a bunch of .BAK files? If you're running out of space, this view will make them stand out, and the share probably isn't the best place to store them.

File distribution by type

Clicking "View Details" breaks the view above down further. First, you can see which file extensions are included in each category (e.g., "Others"); a simple mouse-over lists them all. You can also see how much of each category was added in the last seven days, 30 days and year. As mentioned above, these timeframes are preset, but the functionality is there for further improvement as the product evolves.
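To make the category breakdown concrete, here is a minimal Python sketch of the same kind of aggregation, run against a share mounted locally. It is not Nutanix's implementation: the extension-to-category map in `CATEGORIES` is hypothetical (the categories Files uses may differ), anything unmapped falls into "Others," and `distribution_by_type` is an illustrative helper name.

```python
import os
from collections import defaultdict

# Hypothetical extension-to-category map; Files' own categories may differ.
CATEGORIES = {
    ".docx": "Documents", ".xlsx": "Documents", ".pdf": "Documents",
    ".jpg": "Images", ".png": "Images",
    ".mp3": "Audio", ".wav": "Audio",
    ".bak": "Backups",
}


def distribution_by_type(root):
    """Return {category: (total_bytes, fraction_of_total)} for files under root,
    ordered from largest to smallest category."""
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()  # ".BAK" and ".bak" match
            category = CATEGORIES.get(ext, "Others")
            try:
                totals[category] += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    grand_total = sum(totals.values())
    return {
        cat: (size, size / grand_total if grand_total else 0.0)
        for cat, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    }
```

Sorting by size puts the biggest space consumers first, which is the question this view answers when a share is running low.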

Details of file distribution

The data displayed cannot be manipulated in its current form. However, adding the ability to search for and pinpoint the location of a specific file type within a share would be quite powerful. Don't want your end users storing MP3 files on the network (assuming MP3s are still a thing)? A customized report filtered by file extension, location and user would be a nice addition to this feature. The data has already been gathered, so it's just a matter of time before something like this is implemented (or it should be).
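Until something like that exists, a rough version of the report can be scripted against a mounted share. The Python sketch below is hypothetical and independent of Files analytics: `find_by_extension` walks a directory tree and yields the path, size and owning user ID of every file whose extension matches. Owner IDs are only meaningful on POSIX mounts; on Windows, `st_uid` is not populated and typically reads as 0.

```python
import os


def find_by_extension(root, extensions):
    """Yield (path, size_bytes, owner_uid) for files under root whose
    extension is in extensions (e.g., {".mp3"}); matching ignores case."""
    wanted = {e.lower() for e in extensions}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in wanted:
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # st_uid identifies the owner on POSIX systems only.
            yield path, st.st_size, st.st_uid
```

Grouping the results by `os.path.dirname(path)` or by owner ID turns this into the per-location or per-user report described above.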

For more information

That's it for Part I. WWT is here to help you determine the best HCI solution for your organization's needs. Access the latest hyper-converged labs, including many Nutanix solutions, by reaching out to your account manager or connecting with us in the comment section below.

Look for Part II where I'll address Data Age, Top Accessed Files, Top Active Users and permission denials.