Lead Image © Ivan Trifonenko, 123RF.com

Where Is Your Big Data?


Article from ADMIN 42/2017
You'd think that massive amounts of data wouldn't have the opportunity to be elusive, but we know this isn't true from the sheer number of data breaches in the past couple of years. Big data seems to be one of the greatest sources of pain for enterprises and online businesses alike. But where does all that data come from, where does it go, and why is it so hard to maintain? At first glance, the answers seem simple. Upon further inspection, the answers are still pretty simple.

Where does big data come from? The answer to this somewhat loosely phrased question is logfiles. Logfiles are by far the biggest generators of big data. Every device on your network generates some type of logfile. Those logfiles are either kept on the local systems that produce them or sent to some type of log aggregator for further processing. Or not, meaning that someone might collect them but never bother parsing them. Preserving logfiles simply for posterity is a waste of bandwidth and disk space. If you collect logs, then you should parse, scrape, and process them for relevant and actionable information, including security breach data.
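As a rough illustration of what "parse and process" can mean in practice, the following Python sketch counts failed SSH logins per source address. It assumes OpenSSH-style "Failed password" messages; the sample lines, hostnames, and addresses are invented for the example, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: entries follow the OpenSSH "Failed password" message format.
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) from (?P<ip>\S+)"
)


def count_failed_logins(lines):
    """Return a Counter mapping each source IP to its failed-login count."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts


# Invented sample entries (documentation-range IP addresses).
SAMPLE_LOG = [
    "Jan 10 12:00:01 web01 sshd[811]: Failed password for root from 203.0.113.5 port 4122 ssh2",
    "Jan 10 12:00:03 web01 sshd[811]: Failed password for invalid user admin from 203.0.113.5 port 4123 ssh2",
    "Jan 10 12:00:09 web01 sshd[902]: Accepted publickey for alice from 198.51.100.7 port 5022 ssh2",
    "Jan 10 12:01:15 web01 sshd[950]: Failed password for root from 192.0.2.44 port 3901 ssh2",
]

if __name__ == "__main__":
    # Print the noisiest sources first.
    for ip, hits in count_failed_logins(SAMPLE_LOG).most_common():
        print(f"{ip}: {hits} failed logins")
```

A real deployment would read from the actual logfile or from the aggregator's export rather than a hard-coded list, but the idea is the same: reduce raw entries to a handful of numbers someone can act on.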

Where does the data go? The answer to this question shouldn't be much of a mystery because of how logfiles are saved or sent to another system for processing. Unfortunately, logfiles are often forgotten. Someone once called logfiles our digital exhaust. The moniker is accurate enough, because once we've jettisoned those logfiles, they're out of sight and out of mind. For a lot of us, their fate falls into the "good riddance" category. "No one looks at those stupid logfiles anyway" goes the swan song of many well-meaning but shortsighted system administrators. If you're not looking at your logfiles with some sort of aggregator and alerting system, then ignoring your big data is destined to become your biggest mistake – a mistake because you're missing security information, performance data, and user behaviors whose discovery will help you better maintain your systems and your security.

Why is big data so hard to maintain? To slightly change a quote from Douglas Adams' The Hitchhiker's Guide to the Galaxy series about the size of space, big data is so hard to maintain because it's big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is until you try to archive it, retrieve it, or search through it. Maintain your big data, in this case logfiles, with a log aggregator. If you don't want to pay for a commercial solution like Splunk, you can take a chance on one of the many aggregators with a free tier, such as Loggly, which also offers commercial options.

If you've made it this far, you might get the idea that I think you should collect and use those logfiles for something more than an excuse to upgrade to a 10GbE network. You're correct. Those logfiles, once collected into terabytes of "digital exhaust," are actually digital gold for those who care to have a look inside them. I know no one has the time or the patience to go plowing through even a few gigabytes of logs, but you can set up some automated scripts to capture interesting entries and send them to an email distribution group or to an alert console that hopefully someone watches with interest.
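One way such an automated script might look is sketched below in Python. Everything specific here is an assumption made for illustration: the keyword list, the addresses at example.com, the sample entries, and the default SMTP host of localhost. The mail handling uses only the standard library's `email.message.EmailMessage` and `smtplib.SMTP`.

```python
import re
import smtplib
from email.message import EmailMessage

# Hypothetical keyword list; tune it to what matters on your systems.
INTERESTING = re.compile(r"error|denied|failed|segfault|out of memory", re.IGNORECASE)


def interesting_entries(lines):
    """Return only the log lines that match one of the keywords."""
    return [line.rstrip("\n") for line in lines if INTERESTING.search(line)]


def build_alert(entries, sender, recipient):
    """Wrap the matching entries in an email message."""
    msg = EmailMessage()
    msg["Subject"] = f"Log alert: {len(entries)} interesting entries"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(entries))
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to a mail relay (assumed to accept local mail)."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)


if __name__ == "__main__":
    # Invented sample entries; a real script would read a logfile instead.
    sample = [
        "Jan 10 12:00:01 db01 kernel: Out of memory: Kill process 4211\n",
        "Jan 10 12:00:05 db01 cron[300]: job finished\n",
        "Jan 10 12:00:09 db01 sudo: pam_unix(sudo:auth): authentication failed\n",
    ]
    hits = interesting_entries(sample)
    if hits:
        # Print instead of sending so the demo has no side effects.
        print(build_alert(hits, "logwatch@example.com", "admins@example.com"))
```

Run from cron against the day's logs and with `send_alert` called in place of `print`, a script like this would deliver a short digest to the distribution group rather than leaving gigabytes unread.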

Where is your big data? It's all around you. You're collecting it. You're backing it up. You're probably ignoring it. Don't ignore it. Ignoring it could cost your company a lot of money and its reputation. If you don't have a budget sufficient to purchase some decent tools, there are always those junior-level system administrators eager to learn what "real" system administrators do.

Ken Hess * ADMIN Senior Editor
