User Authentication and Disk Monitoring Discussions

SMART: Usage Within Big Clusters

In the past I have mentioned the SMART (Self-Monitoring, Analysis and Reporting Technology) system included in virtually all modern hard drives. SMART-capable hard drives have added intelligence in the firmware to monitor the drive and attempt to detect impending failures. SMART-capable drives can also perform various types of self-tests, which are very useful for diagnostics, and can monitor the temperature of the hard drive (note: not all hard drives report the same information). There is a nice package for Linux, called smartmontools, that lets you access the SMART information and run self-tests on SMART-capable drives to help detect drives that are failing.

On February 14, 2004, Konstantin Kudin asked if anyone was using SMART monitoring of IDE drives in big clusters. He was curious how often SMART was able to give some kind of warning of a failing drive within 24 hours of failure. Steve Timm responded that they had been using SMART monitoring tools on their cluster and that SMART was able to predict failure about 50% of the time. Steve seemed very happy with this number.

Joe Mack posted a question about how one can get information out of smartd (the daemon in smartmontools). Steve Timm replied that they were using an older version that didn't have smartd and just used a cron script to run a short test every night and capture the output to a file. Steve also said that they were probably going to switch over to using smartd and an agent that is already grep-ing through /var/log/messages to capture the SMART information.
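A nightly job like the one Steve describes could look roughly like the sketch below. It is not his actual script: the function names, the report directory, and the file naming scheme are all assumptions made for illustration. It relies only on the `smartctl -t short` and `smartctl -a` options from smartmontools, and it would normally need to run as root.

```python
import subprocess
from datetime import date
from pathlib import Path


def report_path(log_dir, device, day):
    """Build a per-device, per-day report name (hypothetical naming scheme)."""
    return Path(log_dir) / f"{Path(device).name}-{day.isoformat()}.txt"


def nightly_check(device, log_dir, run=subprocess.run):
    """Start a short self-test and save the full SMART report to a file.

    `run` is injectable so the function can be exercised without smartctl.
    """
    # The short test runs in the background on the drive, so this call
    # returns right away rather than waiting for the test to finish.
    run(["smartctl", "-t", "short", device], capture_output=True, text=True)

    # Capture everything smartctl knows about the drive. The self-test log
    # in this report shows tests that have already completed, for example
    # the one started the previous night.
    report = run(["smartctl", "-a", device], capture_output=True, text=True)

    out = report_path(log_dir, device, date.today())
    out.write_text(report.stdout)
    return out
```

A crontab entry would then invoke a small wrapper that calls `nightly_check("/dev/hda", "/var/log/smart")` once per night for each drive; both the device name and the directory here are placeholders.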

Felix Rauch posted that he was using smartmontools as well and had some trouble grep-ing through the system logs, particularly when the logs rotate. He now uses a simple setuid-root program to monitor temperatures on the drives. Daniel Fernandez also mentioned that it's possible to have smartd write to a file other than the system logs and check that file regularly for temperature. He also mentioned that you can have smartd run a script if a problem develops.
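As an alternative to grep-ing logs, the temperature can be read straight from the attribute table that `smartctl -A` prints. The sketch below is not Felix's program; it assumes the drive reports temperature as attribute 194 (Temperature_Celsius), which not all drives do, and the function names are made up for this example.

```python
import subprocess


def parse_temperature(attr_text):
    """Return the raw value of attribute 194 as an int, or None if absent."""
    for line in attr_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns: ID#, name, flag, value,
        # worst, thresh, type, updated, when_failed, raw_value. Some drives
        # append extra text after the raw value, so index 9 is used rather
        # than the last field.
        if len(fields) >= 10 and fields[0] == "194":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def read_temperature(device):
    """Run `smartctl -A` on a device (usually needs root) and parse it."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_temperature(result.stdout)
```

Keeping the parsing separate from the `smartctl` call means the parser can be checked against saved output from each drive model in the cluster before it is trusted in a monitoring loop.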


Sidebar One: Links Mentioned in Column

Beowulf Archives

Smartmontools Archive

Smartmontools

NIS on Linux

LDAP on Linux HOWTO

LDAP Implementation HOWTO

Rsync

Kerberos


This article was originally published in ClusterWorld Magazine. It has been updated and formatted for the web. If you want to read more about HPC clusters and Linux you may wish to visit Linux Magazine.

Jeff Layton has been a cluster enthusiast since 1997 and spends far too much time reading mailing lists. He can be found hanging around the Monkey Tree at ClusterMonkey.net (don't stick your arms through the bars though).

    Creative Commons License
    ©2005-2012 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.