Lead Image © Sergey Nivens, 123RF.com

Tool your HPC systems for data analytics

Get a Grip

Article from ADMIN 22/2014

By Jeff Layton

As data analytics workloads become more common, administrators need to assess their hardware, software, and processes.

I was very hesitant to use the phrase "Big Data" in the title, because it's somewhat ill-defined (plus, some HPC people would have given me no end of grief), so I chose the more generic "data analytics." I prefer this term because its basic definition refers to the process, not the size of the data or the "three Vs" [1] of Big Data: velocity, volume, and variety.

The definition I tend to favor is from TechTarget [2]: "Data Analytics is the science of examining raw data with the purpose of drawing conclusions about that information." It doesn't mention the amount of data, although the implication is that there has to be enough to be meaningful. Nor does it say anything about velocity, variety, or volume. It simply communicates the high-level process.

Another way to think of data analytics is as the combination of two concepts: "data analysis" and "analytics." Data analysis [3] is very similar to data analytics, in that it is the process of massaging data with the goal of discovering useful information that can be used for suggesting conclusions and supporting decision making. Analytics [4], on the other hand, is the discovery and communication of meaningful patterns in data. Even though one could argue that analytics is really a subset of data analysis, I prefer to combine the two terms so that the result covers everything from collecting the data in raw form to examining the data with algorithms or mathematics (typically implying computation) to look for possible information. I'm sure some people will disagree with me, and that's perfectly fine. We're blind men trying to define something that we can't easily see and that isn't easy to define even when we can see it.

...
