Lead Image © Ivan Trifonenko, 123RF.com

Where Is Your Big Data?


Article from ADMIN 42/2017
You'd think that massive amounts of data wouldn't have the opportunity to be elusive, but we know this isn't true from the sheer number of data breaches in the past couple of years. Big data seems to be one of the greatest sources of pain for enterprises and online businesses alike. But where does all that data come from, where does it go, and why is it so hard to maintain? At first glance, the answers seem simple. Upon further inspection, the answers are still pretty simple.

Where does big data come from? The answer to this admittedly oversimplified question is logfiles. Logfiles are by far the biggest generators of big data. Every device on your network generates some type of logfile. Those logfiles are either kept on the local systems that produce them or sent to some type of log aggregator for further processing. Or not – meaning that someone might collect them but never bother parsing them. Preserving logfiles simply for posterity is a waste of bandwidth and disk space. If you collect logs, then you should parse, scrape, and process them for relevant and actionable information, including security breach data.
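As a rough illustration of what "parse and process" can mean in practice, the following minimal Python sketch counts failed SSH password attempts per source address. The regular expression assumes OpenSSH-style syslog lines, and the function name, hostnames, usernames, and sample entries are made up for this example; your own log format will likely need a different pattern.

```python
import re
from collections import Counter

# Assumption: OpenSSH-style failure lines. Adjust the pattern to your own logs.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+)"
)


def failed_logins_by_source(lines):
    """Count failed password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Made-up sample entries using documentation-only IP ranges.
sample = [
    "Oct  3 10:01:12 web1 sshd[811]: Failed password for root from 203.0.113.7 port 4022 ssh2",
    "Oct  3 10:01:15 web1 sshd[811]: Failed password for invalid user admin from 203.0.113.7 port 4023 ssh2",
    "Oct  3 10:02:40 web1 sshd[902]: Accepted publickey for alice from 198.51.100.4 port 5100 ssh2",
    "Oct  3 10:03:02 web1 sshd[915]: Failed password for alice from 198.51.100.9 port 5200 ssh2",
]

counts = failed_logins_by_source(sample)
for ip, hits in counts.most_common():
    print(ip, hits)
```

In real use, you would pass an open logfile instead of the sample list; because the function reads one line at a time, it does not need to hold the whole file in memory.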

Where does the data go? The answer to this question shouldn't be much of a mystery, because logfiles are either saved locally or sent to another system for processing. Unfortunately, logfiles are often forgotten. Someone once called logfiles our digital exhaust. The moniker is accurate enough, because once we've jettisoned those logfiles, they're out of sight and out of mind. For a lot of us, their fate falls into the "good riddance" category. "No one looks at those stupid logfiles anyway" goes the refrain of many well-meaning but shortsighted system administrators. If you're not looking at your logfiles with some sort of aggregator and alerting system, then ignoring your big data is destined to become your biggest mistake – a mistake because you're missing security information, performance data, and user behaviors whose discovery will help you better maintain your systems and your security.

Why is big data so hard to maintain? To slightly change a quote from Douglas Adams' Hitchhiker's Guide to the Galaxy series about the size of space, big data is so hard to maintain because it's big. Really big. You just won't believe how vastly hugely mind-bogglingly big it is, until you try to archive it, retrieve it, or search through it. Maintain your big data, in this case logfiles, with a log aggregator. If you don't want to pay for a commercial solution like Splunk, you can take a chance on one of the many aggregators that offer a free tier, such as Loggly, which also offers commercial options.

If you've made it this far, you might get the idea that I think you should collect and use those logfiles for something more than an excuse to upgrade to a 10GbE network. You're correct. Those logfiles, once collected into terabytes of "digital exhaust," are actually digital gold for those who care to have a look inside them. I know no one has the time or the patience to go plowing through even a few gigabytes of logs, but you can set up some automated scripts to capture interesting entries and send them to an email distribution group or to an alert console that hopefully someone watches with interest.
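One way such an automated script might look is sketched below in Python: it keeps only the entries that match a short list of patterns and wraps them in an email message. The patterns, function names, addresses, and sample lines are all hypothetical, and the sketch only builds the message rather than sending it.

```python
import re
from email.message import EmailMessage

# Hypothetical patterns: what counts as "interesting" depends on your environment.
INTERESTING = [
    re.compile(r"\b(error|critical|panic)\b", re.IGNORECASE),
    re.compile(r"authentication failure", re.IGNORECASE),
    re.compile(r"segfault", re.IGNORECASE),
]


def interesting_entries(lines):
    """Return the lines that match at least one pattern, in original order."""
    return [ln.rstrip("\n") for ln in lines if any(p.search(ln) for p in INTERESTING)]


def build_digest(entries, sender, recipient):
    """Wrap the matching entries in a plain-text email message (not sent here)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Log digest: {len(entries)} entries need attention"
    msg.set_content("\n".join(entries) if entries else "Nothing to report.")
    return msg


# Made-up sample entries for illustration only.
sample = [
    "Oct  3 11:00:01 db1 kernel: myapp[2211]: segfault at 0 ip 00007f error 4",
    "Oct  3 11:00:05 db1 cron[300]: (root) CMD (run-parts /etc/cron.hourly)",
    "Oct  3 11:02:17 web1 sshd[915]: pam_unix(sshd:auth): authentication failure; logname= uid=0",
    "Oct  3 11:05:00 web1 nginx: 200 GET /index.html",
]

hits = interesting_entries(sample)
digest = build_digest(hits, "logwatch@example.com", "admins@example.com")
print(digest["Subject"])
```

To deliver the digest, you could hand the message to `smtplib.SMTP(...).send_message(digest)` from a scheduled job, assuming a mail relay is reachable from the host that runs the script.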

Where is your big data? It's all around you. You're collecting it. You're backing it up. You're probably ignoring it. Don't ignore it. Ignoring it could cost your company a lot of money and its reputation. If you don't have a budget sufficient to purchase some decent tools, there are always those junior-level system administrators eager to learn what "real" system administrators do.

Ken Hess * ADMIN Senior Editor
