Autonomous File Recovery

Summary

Normally, having users remove data and then yell about getting it recovered quickly is a fairly rare occurrence. I talked to a great number of administrators, and most of them rarely encountered this scenario. When they did, they restored the data, talked to the user to help educate them, developed scripts or tools to help alleviate the problem, or performed some combination of these actions.

As system sizes grow, the probability of catastrophic events that require the restoration of data or other extreme measures increases. In general terms, the majority of administrators feel that this problem is only going to get worse with more users and more data, so they are looking for solutions.

I see two aspects to the problem. The first is a policy aspect, wherein upper-level management needs to be brought into discussions to develop appropriate policies. As part of this, people need to remove emotion from the discussions and present real information about the frequency of data restoration requests, how much work each one requires, and how much it disrupts normal operations. In essence, the discussion, like many others, should center on resource allocation and the associated benefits. The benefit of having upper management involved is agreement on policies at the highest levels. The policies should be published to all users, with the implication that management is very aware of the issues surrounding data recovery and no more squeaky wheels will be tolerated. (Score one for the administrators.)

The second aspect, which really accompanies the first, is technical. Can tools help easily restore or recover erased data or prevent a user from accidentally erasing data? Backups can help, but they are only part of the solution. Going back to my early Unix education, I discussed how administrators can either alias the rm command so the data is moved to a temporary disk location or create their own script to accomplish the same thing. Coupled with normal backups, this method could help alleviate some of the problems administrators are having. All you need is some sort of temporary disk-based storage and you are off to the races. Make sure the size of the temporary storage is adjustable, so if you need more space, it’s fairly easy to add more hardware (with its associated costs).
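To make the idea of a script that moves data instead of erasing it a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a tested production tool: the function names (soft_remove, restore, prune), the timestamp-prefix naming scheme, and the holding directory are all assumptions of mine, and it handles regular files only. The max_bytes argument stands in for the adjustable size of the temporary storage.

```python
import shutil
import time
from pathlib import Path

# All names below are hypothetical, chosen for illustration only.


def soft_remove(path, trash_dir):
    """Move a file into trash_dir instead of deleting it; return its new location."""
    src = Path(path)
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)
    # Prefix the name with a nanosecond timestamp so that removing two files
    # with the same name does not overwrite the earlier copy.
    stamp = time.time_ns()
    dest = trash / f"{stamp}__{src.name}"
    while dest.exists():
        stamp += 1
        dest = trash / f"{stamp}__{src.name}"
    shutil.move(str(src), str(dest))
    return dest


def restore(trashed, target_dir):
    """Move a held file back into target_dir under its original name."""
    trashed = Path(trashed)
    original_name = trashed.name.split("__", 1)[1]
    dest = Path(target_dir) / original_name
    if dest.exists():
        # Refuse to clobber a newer file that now has the same name.
        raise FileExistsError(dest)
    shutil.move(str(trashed), str(dest))
    return dest


def prune(trash_dir, max_bytes):
    """Permanently delete the oldest held files until the total size fits in max_bytes.

    Assumes every file in trash_dir was placed there by soft_remove.
    """
    entries = sorted(
        (p for p in Path(trash_dir).iterdir() if p.is_file()),
        key=lambda p: int(p.name.split("__", 1)[0]),
    )
    total = sum(p.stat().st_size for p in entries)
    removed = []
    for p in entries:
        if total <= max_bytes:
            break
        total -= p.stat().st_size
        p.unlink()
        removed.append(p)
    return removed
```

A wrapper like this could sit behind an rm alias, with prune run periodically (from cron, for example) to keep the holding area within its size limit. Raising max_bytes after adding hardware is the only change needed to grow the space.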

Have I made a case for aliasing the rm command or using a script? I think the answer is unique to each system, the administrators, and the users. Many times, this approach can help users recover needed files quickly, but it takes work to develop and test the scripts.