Image © Maksim Kabakou,

The Fine Art of Troubleshooting


Article from ADMIN 53/2019
System troubleshooting is an art. It is a science. And, sometimes, it's brute force.

Junior system administrators have often asked, "How do you troubleshoot a problem when you have no clue where to start?" My answer has never changed: Start with the simple things first. This advice has helped me resolve every problem I've ever encountered over the past 20 years. Sure, some problems are difficult to solve, and some even seem impossible, but if you start with the simple things first, your chances of success are very high.

People in general tend to complicate problems and solutions. They reach for the least probable cause of a problem and then apply the least likely solution to resolve it. I guess it's just human nature to assume that there is no easy problem or easy solution. I have found just the opposite: most of the problems I've seen have a reasonable cause and a relatively simple solution. I've been on many root cause analysis and postmortem calls where I said, "I rebooted the system and everything came back as it should." Of course, I always had to explain why that resolution was the correct one, and the explanation was usually met with unhealthy skepticism and much criticism.

I can't count the number of times I heard, "Well, rebooting fixed the issue temporarily, but you didn't really resolve the problem or apply a permanent fix to it." My task was to restore service and not to spend days or weeks researching a memory leak in an application. A reboot fixed the problem. Subsequent reboots will continue to resolve the problem. Until the developers fix the application, rebooting is the correct response to the problem.

System administrators, especially junior admins, love to see long uptimes for systems. It is impressive to see a system that has an uptime of 500+ days. Everyone loves the bragging rights that come with long uptimes. I once worked on a system that had an uptime of more than 1,300 days – a Sun Enterprise 450
